(This is an abridged version of the proposal document)
Big data open source projects have long been used to store and analyze geospatial data, and a flourishing ecosystem has grown around them. Examples include GeoParquet for Parquet, Sedona for Spark, GeoMesa for HBase and Cassandra, and in-development or completed native support in Hive and Trino. Given the central position of the Apache Iceberg table format in this stack, it would be valuable for Iceberg to support geospatial data natively as well.
Geospatial support has already been implemented on top of Iceberg by Geolake and Havasu, with promising results. Unfortunately, because Iceberg lacks extension points, both take the form of forks of the project. It would be great to build on the efforts and findings of these projects when adding native support to Iceberg.
This will add the following capabilities to the Iceberg project:
Create a table with a geospatial type: CREATE TABLE geom_table (geom GEOMETRY);
Insert geospatial data as WKT: INSERT INTO geom_table VALUES ('POINT(1 2)'), ('LINESTRING(1 2, 3 4)')
Query using geospatial predicates: SELECT * FROM geom_table WHERE ST_COVERS(geom, ST_POINT(0.5, 0.5))
Define a geospatial partition transform to allow partition filtering for geospatial queries: ALTER TABLE geom_table ADD PARTITION FIELD xz2(geom)
Rewrite data files using a geospatial sort order to allow file and row-group filtering for geospatial queries: CALL rewrite_data_files(table => 'geom_table', sort_order => 'hilbert(geom)')
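The filtering items above rely on two ideas that the abridged text does not spell out: per-file (or per-row-group) bounding boxes that let a spatial predicate skip data, and a space-filling-curve sort order that keeps nearby geometries in the same file so those boxes stay tight. The Python sketch below is a minimal, hypothetical illustration of both, assuming geometries are plain lists of (x, y) vertices already parsed from WKT, the data extent is known up front, and a "file" is simply a chunk of sorted rows. `hilbert_index` follows the classic xy2d formulation of the Hilbert curve; every other name, the grid size, and the chunking are illustrative assumptions, not Iceberg's or the proposal's actual design, and the sketch does not implement the proposed `xz2` transform.

```python
from typing import List, Tuple

# Hypothetical stand-ins: a geometry is a list of (x, y) vertices,
# e.g. already parsed from WKT such as 'LINESTRING(1 2, 3 4)'.
Coord = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bbox(coords: List[Coord]) -> BBox:
    """Axis-aligned bounding box of a geometry's vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def union(boxes: List[BBox]) -> BBox:
    """Box covering all given boxes, i.e. a per-file or per-row-group bound."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve over an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_key(geom: List[Coord], extent: BBox, n: int = 1024) -> int:
    """Sort key: Hilbert index of the grid cell holding the geometry's bbox center."""
    xmin, ymin, xmax, ymax = bbox(geom)
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    ex0, ey0, ex1, ey1 = extent
    # Clamp to the grid in case a center falls on or outside the extent edge.
    gx = min(max(int((cx - ex0) / (ex1 - ex0) * n), 0), n - 1)
    gy = min(max(int((cy - ey0) / (ey1 - ey0) * n), 0), n - 1)
    return hilbert_index(n, gx, gy)


def write_files(geoms: List[List[Coord]], extent: BBox, rows_per_file: int):
    """Sort rows by Hilbert key, chunk them into 'files', and record each file's bbox."""
    ordered = sorted(geoms, key=lambda g: hilbert_key(g, extent))
    chunks = [ordered[i:i + rows_per_file] for i in range(0, len(ordered), rows_per_file)]
    return [(union([bbox(g) for g in c]), c) for c in chunks]


def may_cover(file_box: BBox, px: float, py: float) -> bool:
    """A geometry can only cover a point inside its own bbox, so a file whose bbox
    excludes the point can be skipped for ST_COVERS(geom, ST_POINT(px, py))."""
    xmin, ymin, xmax, ymax = file_box
    return xmin <= px <= xmax and ymin <= py <= ymax


# Usage: 64 points on an 8 x 8 grid, written as 4-row files in Hilbert order.
points = [[(x + 0.5, y + 0.5)] for x in range(8) for y in range(8)]
files = write_files(points, extent=(0, 0, 8, 8), rows_per_file=4)
kept = [box for box, _ in files if may_cover(box, 0.5, 0.5)]
print(len(files), len(kept))  # prints: 16 1
```

Because the Hilbert curve visits each aligned quadrant contiguously, every file in this toy run holds a compact 2 x 2 cluster of points, so the point query only has to open one of the sixteen files. The pruning is conservative: a kept file may still contain no matching rows, but a skipped file can never contain one.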
Note: special thanks to @jiayuasu and @Kontinuation from Wherobots for invaluable domain-specific advice and POC support from the Havasu Iceberg fork and Geolake, and to @badbye and other members of Geolake for their support.
Proposal document
https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI
Specifications