<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Native geometry Parquet support in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/native-geometry-parquet-support/m-p/111465#M43900</link>
    <description>&lt;P&gt;Hi there!&lt;/P&gt;&lt;P&gt;With the recent &lt;A href="https://medium.com/radiant-earth-insights/geoparquet-2-0-going-native-840066371c77" target="_self"&gt;GeoParquet 2.0 announcements&lt;/A&gt;, I'm curious to understand how this impacts storing geospatial data in Databricks and Delta. For reference:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;The Parquet specification has officially adopted &lt;A href="https://github.com/apache/parquet-format/blob/94b9d631aef332c78b8f1482fb032743a9c3c407/Geospatial.md?plain=1#L27" target="_self"&gt;geospatial guidance&lt;/A&gt; that allows native storage of GEOMETRY and GEOGRAPHY types&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;Iceberg 3 includes GEOMETRY and GEOGRAPHY as part of its &lt;A href="https://github.com/apache/iceberg/blob/main/format/spec.md" target="_self"&gt;official specification&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Since this is being added to the Parquet specification itself, does that mean it will soon be native to Delta as well?&lt;/P&gt;</description>
    <pubDate>Fri, 28 Feb 2025 14:51:13 GMT</pubDate>
    <dc:creator>jordanpinder</dc:creator>
    <dc:date>2025-02-28T14:51:13Z</dc:date>
    <item>
      <title>Native geometry Parquet support</title>
      <link>https://community.databricks.com/t5/data-engineering/native-geometry-parquet-support/m-p/111465#M43900</link>
      <description>&lt;P&gt;Hi there!&lt;/P&gt;&lt;P&gt;With the recent &lt;A href="https://medium.com/radiant-earth-insights/geoparquet-2-0-going-native-840066371c77" target="_self"&gt;GeoParquet 2.0 announcements&lt;/A&gt;, I'm curious to understand how this impacts storing geospatial data in Databricks and Delta. For reference:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;The Parquet specification has officially adopted &lt;A href="https://github.com/apache/parquet-format/blob/94b9d631aef332c78b8f1482fb032743a9c3c407/Geospatial.md?plain=1#L27" target="_self"&gt;geospatial guidance&lt;/A&gt; that allows native storage of GEOMETRY and GEOGRAPHY types&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;Iceberg 3 includes GEOMETRY and GEOGRAPHY as part of its &lt;A href="https://github.com/apache/iceberg/blob/main/format/spec.md" target="_self"&gt;official specification&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Since this is being added to the Parquet specification itself, does that mean it will soon be native to Delta as well?&lt;/P&gt;</description>
      <pubDate>Fri, 28 Feb 2025 14:51:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/native-geometry-parquet-support/m-p/111465#M43900</guid>
      <dc:creator>jordanpinder</dc:creator>
      <dc:date>2025-02-28T14:51:13Z</dc:date>
    </item>
    <item>
      <title>Re: Native geometry Parquet support</title>
      <link>https://community.databricks.com/t5/data-engineering/native-geometry-parquet-support/m-p/137751#M50807</link>
      <description>&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The adoption of native geospatial types in the Apache Parquet specification, which the GeoParquet 2.0 announcement builds on, is a significant step for geospatial data storage across the modern data ecosystem, including platforms like Databricks and Delta Lake. In summary, because Delta Lake stores its data in Parquet, native geospatial types are likely to reach Delta once they are broadly supported in Parquet implementations, though the speed of adoption also depends on implementation details in the Delta Lake runtime and Databricks platform layers.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Current State of Geospatial Data in Parquet, Iceberg, and Delta&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;GeoParquet 2.0 &amp;amp; Parquet Specification&lt;/STRONG&gt;&lt;BR /&gt;The latest GeoParquet announcement, together with &lt;A class="reset interactable cursor-pointer decoration-1 underline-offset-1 text-super hover:underline font-semibold" href="https://github.com/apache/parquet-format/blob/94b9d631aef332c78b8f1482fb032743a9c3c407/Geospatial.md?plain=1#L27" target="_blank" rel="nofollow noopener"&gt;&lt;SPAN class="text-box-trim-both"&gt;Parquet’s official geospatial guidance&lt;/SPAN&gt;&lt;/A&gt;, means GEOMETRY and GEOGRAPHY types now have standard encoding rules and metadata defined in the file format itself. This unifies storage conventions and makes geospatial interoperability simpler and less vendor-specific.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Iceberg 3 Specification&lt;/STRONG&gt;&lt;BR /&gt;Iceberg 3 already specifies GEOMETRY and GEOGRAPHY types in its table format. Once these types are stored natively in Parquet, Iceberg table engines can apply the appropriate semantics to them for querying and indexing.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Delta Lake and Databricks&lt;/STRONG&gt;&lt;BR /&gt;Delta Lake stores table data in Parquet files and tracks each table's schema in its own transaction log. While Delta Lake does not (as of now) maintain a separate geospatial type specification, its close relationship with Parquet means new types (like native GEOMETRY/GEOGRAPHY columns) can become available once Parquet writers/readers adopt them, provided the Delta transaction log can also represent and manage those types.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
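&lt;P&gt;To make the "standard encoding rules" concrete, here is a minimal sketch of the kind of bytes involved. It assumes geometry values are carried as Well-Known Binary (WKB), as the linked Parquet guidance describes, and it hand-encodes a single 2D point using only the Python standard library. The function names are hypothetical, and the sketch uses big-endian WKB (WKB allows either byte order, flagged by the first byte); a real pipeline would normally rely on a geometry library rather than packing bytes by hand.&lt;/P&gt;
&lt;PRE&gt;
```python
import struct

WKB_POINT = 1  # WKB geometry type code for a 2D Point


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as big-endian WKB (byte-order flag 0)."""
    # Layout: 1-byte order flag, 4-byte geometry type, two 8-byte doubles.
    return struct.pack(">BIdd", 0, WKB_POINT, x, y)


def wkb_to_point(data: bytes) -> tuple[float, float]:
    """Decode a big-endian 2D WKB point; reject anything else."""
    byte_order, geom_type, x, y = struct.unpack(">BIdd", data)
    if byte_order != 0 or geom_type != WKB_POINT:
        raise ValueError("not a big-endian WKB Point")
    return x, y


# Hypothetical coordinate, longitude first and latitude second.
wkb = point_to_wkb(-122.4194, 37.7749)
print(len(wkb), wkb_to_point(wkb))
```
&lt;/PRE&gt;
&lt;P&gt;Without a native type, a byte string like this is just an opaque binary value to the engine. A native GEOMETRY or GEOGRAPHY type lets the file itself declare what those bytes mean.&lt;/P&gt;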
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Implications for Delta Lake&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Native Support Timeline&lt;/STRONG&gt;&lt;BR /&gt;As Parquet implementations add these new types, Databricks and Delta Lake can build on that support. Full native handling (reading, writing, indexing, and querying) still depends on their internal libraries and Spark integrations being updated to recognize and work with the new Parquet geospatial encodings.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Interoperability&lt;/STRONG&gt;&lt;BR /&gt;Expect increased interoperability between Spark, Databricks, Delta, and external tools (e.g., GDAL, QGIS) as these standards are adopted.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Layered Adoption&lt;/STRONG&gt;&lt;BR /&gt;Full benefit arrives when:&lt;/P&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;The Spark engine (used by Databricks/Delta) natively supports the new Parquet geospatial schemas&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Delta Lake transaction log and APIs understand and preserve these types&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Downstream libraries/tools can query, filter, and optimize over native geospatial columns&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
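&lt;P&gt;The point about the transaction log can be illustrated with a small sketch. A Delta table records its schema in the log as a JSON schema string, so a geometry column that is stored as WKB bytes appears there as plain binary. The table and column names below are hypothetical, and the helper only handles primitive column types; it is meant to show the shape of the problem, not an official API.&lt;/P&gt;
&lt;PRE&gt;
```python
import json

# Hypothetical table: an id plus a geometry column stored as WKB bytes.
schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {"name": "geom_wkb", "type": "binary", "nullable": True, "metadata": {}},
    ],
}
schema_string = json.dumps(schema)


def column_types(schema_json: str) -> dict:
    """Map column name to declared type (primitive types only in this sketch)."""
    fields = json.loads(schema_json)["fields"]
    return {field["name"]: field["type"] for field in fields}


# The log only knows that geom_wkb is binary, not that it holds geometry.
print(column_types(schema_string))
```
&lt;/PRE&gt;
&lt;P&gt;Nothing in this schema tells a reader that the column holds geometry, which is why the table format itself has to learn about the type before engines can treat it as first-class.&lt;/P&gt;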
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Conclusion&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Most likely, yes. As geospatial types become native in Parquet and are used by table formats like Iceberg, they are highly likely to be adopted “natively” in Delta as well (i.e., as first-class geospatial columns without extra serialization/deserialization or user workarounds), given the shared ecosystem. The exact timing depends on when Databricks and Delta Lake update their software stacks to fully leverage the new Parquet geospatial features, so check the Databricks and Delta Lake release notes for specific support timelines as this rolls out.&lt;/P&gt;
&lt;DIV class="group relative"&gt;
&lt;DIV class="w-full overflow-x-auto md:max-w-[90vw] border-subtlest ring-subtlest divide-subtlest bg-transparent"&gt;
&lt;TABLE class="border-subtler my-[1em] w-full table-auto border-separate border-spacing-0 border-l border-t"&gt;
&lt;THEAD class="bg-subtler"&gt;
&lt;TR&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Feature&lt;/TH&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Parquet (GeoParquet 2.0)&lt;/TH&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Iceberg 3&lt;/TH&gt;
&lt;TH class="border-subtler p-sm break-normal border-b border-r text-left align-top"&gt;Delta Lake (future)&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;GEOMETRY/GEOGRAPHY&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Native (official spec)&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Native (spec)&lt;/TD&gt;
&lt;TD class="px-sm border-subtler min-w-[48px] break-normal border-b border-r"&gt;Likely (pending implementation, depends on Parquet support)&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;/DIV&gt;
&lt;DIV class="bg-base border-subtler shadow-subtle pointer-coarse:opacity-100 right-xs absolute bottom-0 flex rounded-lg border opacity-0 transition-opacity group-hover:opacity-100 [&amp;amp;&amp;gt;*:not(:first-child)]:border-subtle [&amp;amp;&amp;gt;*:not(:first-child)]:border-l"&gt;
&lt;DIV class="flex"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="flex"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;For practical use, keep a close eye on:&lt;/P&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Delta Lake and Databricks changelogs/releases&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Spark’s geospatial type support PRs/roadmap&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Databricks’ track record of adopting other Parquet features suggests native geospatial support could follow soon after it matures in the upstream formats.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Nov 2025 12:43:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/native-geometry-parquet-support/m-p/137751#M50807</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-11-05T12:43:03Z</dc:date>
    </item>
  </channel>
</rss>

