
Native geometry Parquet support

jordanpinder
New Contributor

Hi there!

With the recent GeoParquet 2.0 announcements, I'm curious to understand how this impacts storing geospatial data in Databricks and Delta. For reference:

Since it's being added to the Parquet specification itself, does it mean it'll soon end up native to Delta as well?


mark_ott
Databricks Employee

The formalization of geospatial types within the Apache Parquet specification, alongside the GeoParquet 2.0 announcements, is a significant step for native geospatial data storage across the modern data ecosystem, including platforms like Databricks and Delta Lake. In summary: because Delta Lake stores its data as Parquet files, native geospatial types are likely to be supported in Delta once Parquet readers and writers broadly support them. How quickly that happens also depends on implementation work in the Delta Lake protocol and the Databricks platform layers.

Current State of Geospatial Data in Parquet, Iceberg, and Delta

  • GeoParquet 2.0 & Parquet Specification
    The latest GeoParquet announcement together with Parquet’s official geospatial guidance means GEOMETRY and GEOGRAPHY types have standard encoding rules and metadata. This unifies storage conventions, making geospatial interoperability simpler and less vendor-specific.

  • Iceberg 3 Specification
    Iceberg has already specified native support for GEOMETRY and GEOGRAPHY types in its table format. This means that as these types are stored natively in Parquet, Iceberg table engines can leverage them with appropriate semantics for query and indexing.

  • Delta Lake and Databricks
    Delta Lake is built atop Parquet. While Delta Lake does not (as of now) maintain a separate geospatial type specification, its tight coupling with Parquet means new features or types (like native GEOMETRY/GEOGRAPHY columns) usually become available once Parquet writers/readers adopt them, provided the Delta transaction log can reference and manage those types.
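
To make the "before native types" situation concrete: a common workaround is to serialize each geometry to Well-Known Binary (WKB) and keep it in an ordinary binary column. The sketch below uses only the Python standard library and handles 2D points only. The helper names point_to_wkb and wkb_to_point are made up for illustration and are not part of any Databricks, Delta, or Parquet API.

```python
import struct
from typing import Tuple

WKB_POINT = 1  # WKB geometry type code for a 2D Point


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # Layout: 1-byte byte-order flag (1 = little endian),
    # 4-byte unsigned geometry type, then two 8-byte doubles.
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def wkb_to_point(blob: bytes) -> Tuple[float, float]:
    """Decode a WKB point back into (x, y) (hypothetical helper)."""
    # The first byte tells us which byte order the rest uses.
    prefix = "<" if blob[0] == 1 else ">"
    geom_type, x, y = struct.unpack(prefix + "Idd", blob[1:21])
    if geom_type != WKB_POINT:
        raise ValueError(f"expected a WKB Point, got type code {geom_type}")
    return x, y


# This byte string is what would sit in a plain BINARY column today.
blob = point_to_wkb(-122.4, 37.8)
lon, lat = wkb_to_point(blob)
```

With native GEOMETRY/GEOGRAPHY columns, the encode and decode steps above would be the engine's job, and the column's type would say that the bytes are a geometry rather than an opaque blob.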

Implications for Delta Lake

  • Native Support Timeline
    As Parquet readers and writers implement these new types, Databricks and Delta Lake can build on that support. Full native handling (reading, writing, indexing, and querying) also depends on their internal libraries and Spark integrations being updated to recognize and work with the new Parquet geospatial encodings.

  • Interoperability
    Expect increased interoperability between Spark, Databricks, Delta, and external tools (e.g., GDAL, QGIS) as these standards are adopted.

  • Layered Adoption
    Full benefit arrives when:

    • The Spark engine (used by Databricks/Delta) natively supports the new Parquet geospatial schemas

    • Delta Lake transaction log and APIs understand and preserve these types

    • Downstream libraries/tools can query, filter, and optimize over native geospatial columns
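
As a rough picture of what "optimize over native geospatial columns" can mean: if an engine has a bounding box for the geometries in each row group, it can skip row groups that cannot match a spatial filter. The sketch below is conceptual and assumes such per-row-group bounding boxes are available. BBox, row_groups_to_scan, and the sample values are hypothetical and do not reflect how Spark, Delta, or any Parquet reader implements this.

```python
from typing import List, NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box (hypothetical type for this sketch)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two boxes overlap or touch."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def row_groups_to_scan(stats: List[BBox], query: BBox) -> List[int]:
    """Return indexes of row groups whose bounding box overlaps the query."""
    return [i for i, box in enumerate(stats) if intersects(box, query)]


# Made-up bounding boxes for three row groups of one geometry column.
row_group_stats = [
    BBox(-10.0, -10.0, 0.0, 0.0),
    BBox(0.0, 0.0, 10.0, 10.0),
    BBox(20.0, 20.0, 30.0, 30.0),
]
query_window = BBox(5.0, 5.0, 25.0, 25.0)

# Only row groups 1 and 2 overlap the query window; row group 0 is skipped.
selected = row_groups_to_scan(row_group_stats, query_window)
```

When geometries live in an opaque binary column, an engine has no such statistics to prune on, which is one reason all three layers above need to understand the type.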

Conclusion

Most likely, yes. As geospatial types become native in Parquet and are used in table formats like Iceberg, it is highly likely they will be adopted "natively" in Delta too, meaning as first-class geospatial columns without extra serialization/deserialization or user hacks. The exact timing depends on when Databricks and Delta Lake update their software stacks to fully leverage the new Parquet geospatial features, so check the Databricks and Delta Lake release notes for specific support timelines as this rolls out.

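| Feature            | Parquet (GeoParquet 2.0) | Iceberg 3     | Delta Lake (future)                                          |
|--------------------|--------------------------|---------------|--------------------------------------------------------------|
| GEOMETRY/GEOGRAPHY | Native (official spec)   | Native (spec) | Likely (pending implementation, depends on Parquet support)  |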

For practical use, keep a close eye on:

  • Delta Lake and Databricks changelogs/releases

  • Spark’s geospatial type support PRs/roadmap

Databricks’ rapid adoption of other Parquet features suggests native geospatial support will arrive soon after it matures in upstream formats.