<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Query a "partition metadata logging" enabled external parquet table on Databricks SQL in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/query-a-quot-partition-metadata-logging-quot-enabled-external/m-p/120935#M46281</link>
    <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;We have a fairly large Hive-partitioned Parquet table on S3, and we followed the &lt;A href="https://docs.databricks.com/aws/en/tables/external-partition-discovery#enable-partition-metadata-logging" target="_self"&gt;documentation&lt;/A&gt; to recreate the table in Unity Catalog with partition metadata logging enabled.&lt;/P&gt;&lt;P&gt;We're using Databricks Runtime 16.4 LTS. Although the &lt;A href="https://docs.databricks.com/aws/en/release-notes/runtime/16.4lts#move-partition-metadata-log-enablement-anchor-to-table" target="_self"&gt;release note&lt;/A&gt; says the partition metadata logging setting is anchored to the table, we found that every query session still has to run&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;SET spark.databricks.nonDelta.partitionLog.enabled = true;&lt;/LI-CODE&gt;&lt;P&gt;to keep queries from scanning every directory.&lt;/P&gt;&lt;P&gt;On DBR clusters we can add this to the cluster's Spark config, but Databricks SQL doesn't let us set this config, and it doesn't seem to honor the table setting automatically either. It simply scans all directories, which makes queries extremely slow.&lt;/P&gt;&lt;P&gt;We tried both the Current and Preview channels, and the behavior is the same. Is there any way to make Databricks SQL honor the partition metadata logging setting?&lt;/P&gt;</description>
    <pubDate>Wed, 04 Jun 2025 14:10:26 GMT</pubDate>
    <dc:creator>Samael</dc:creator>
    <dc:date>2025-06-04T14:10:26Z</dc:date>
    <item>
      <title>Query a "partition metadata logging" enabled external parquet table on Databricks SQL</title>
      <link>https://community.databricks.com/t5/data-engineering/query-a-quot-partition-metadata-logging-quot-enabled-external/m-p/120935#M46281</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;We have a fairly large Hive-partitioned Parquet table on S3, and we followed the &lt;A href="https://docs.databricks.com/aws/en/tables/external-partition-discovery#enable-partition-metadata-logging" target="_self"&gt;documentation&lt;/A&gt; to recreate the table in Unity Catalog with partition metadata logging enabled.&lt;/P&gt;&lt;P&gt;We're using Databricks Runtime 16.4 LTS. Although the &lt;A href="https://docs.databricks.com/aws/en/release-notes/runtime/16.4lts#move-partition-metadata-log-enablement-anchor-to-table" target="_self"&gt;release note&lt;/A&gt; says the partition metadata logging setting is anchored to the table, we found that every query session still has to run&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;SET spark.databricks.nonDelta.partitionLog.enabled = true;&lt;/LI-CODE&gt;&lt;P&gt;to keep queries from scanning every directory.&lt;/P&gt;&lt;P&gt;On DBR clusters we can add this to the cluster's Spark config, but Databricks SQL doesn't let us set this config, and it doesn't seem to honor the table setting automatically either. It simply scans all directories, which makes queries extremely slow.&lt;/P&gt;&lt;P&gt;We tried both the Current and Preview channels, and the behavior is the same. Is there any way to make Databricks SQL honor the partition metadata logging setting?&lt;/P&gt;</description>
      <pubDate>Wed, 04 Jun 2025 14:10:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/query-a-quot-partition-metadata-logging-quot-enabled-external/m-p/120935#M46281</guid>
      <dc:creator>Samael</dc:creator>
      <dc:date>2025-06-04T14:10:26Z</dc:date>
    </item>
    <item>
      <title>Re: Query a "partition metadata logging" enabled external parquet table on Databricks SQL</title>
      <link>https://community.databricks.com/t5/data-engineering/query-a-quot-partition-metadata-logging-quot-enabled-external/m-p/120954#M46287</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/166260"&gt;@Samael&lt;/a&gt;&lt;/P&gt;&lt;P&gt;The documentation (&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/tables/external-partition-discovery" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/tables/external-partition-discovery&lt;/A&gt;, &lt;A href="https://docs.databricks.com/en/tables/external-partition-discovery.html" target="_blank"&gt;https://docs.databricks.com/en/tables/external-partition-discovery.html&lt;/A&gt;) states that partition metadata logging persists once it is enabled on a table at creation time. In your case, however, Databricks SQL warehouses don't honor this setting automatically, so you have to set spark.databricks.nonDelta.partitionLog.enabled = true manually in every session.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Current Workarounds&lt;/STRONG&gt;&lt;BR /&gt;Because Databricks SQL warehouses don't let you set Spark configurations directly, here are several approaches you can try:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;1. Use SQL Warehouses with Custom Spark Configurations&lt;/STRONG&gt;&lt;BR /&gt;Some organizations have worked with Databricks support to enable custom Spark configurations on SQL warehouses for specific use cases. This isn't a standard feature, but it may be available to enterprise customers.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2. Create Views with Partition Predicates&lt;/STRONG&gt;&lt;BR /&gt;You can create views that include partition predicates to help the query optimizer prune partitions:&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;CREATE OR REPLACE VIEW your_table_optimized AS
SELECT * FROM your_table
WHERE partition_column IS NOT NULL&lt;/LI-CODE&gt;&lt;P&gt;The predicate only reduces the scan if it is selective. For example, partition_column IS NOT NULL matches every partition that has a value, so replace it with a filter that matches only the partitions your queries need.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;3. Use Databricks Runtime Clusters for Heavy Queries&lt;/STRONG&gt;&lt;BR /&gt;For queries that require partition metadata logging, consider using regular Databricks clusters, where you can set the Spark config, rather than SQL warehouses.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;4. Table Properties Alternative&lt;/STRONG&gt;&lt;BR /&gt;Try setting a table property that might influence query planning:&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;ALTER TABLE your_table SET TBLPROPERTIES (
  'spark.databricks.nonDelta.partitionLog.enabled' = 'true'
);&lt;/LI-CODE&gt;&lt;P&gt;&lt;STRONG&gt;5. Contact Databricks Support&lt;/STRONG&gt;&lt;BR /&gt;This appears to be a gap between the documented behavior and the actual implementation in SQL warehouses. I'd recommend opening a support ticket with Databricks, since this looks like either:&lt;BR /&gt;- A bug, if SQL warehouses are supposed to honor the table-level partition metadata logging setting&lt;BR /&gt;- A missing feature that should be prioritized given the performance impact&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Long-term Considerations&lt;/STRONG&gt;&lt;BR /&gt;Beyond the immediate issue on DBR 16.4 LTS, you might also consider:&lt;BR /&gt;- Migrating to Delta Lake, if feasible, which has better partition handling&lt;BR /&gt;- Evaluating whether the current partitioning strategy still fits your query patterns&lt;BR /&gt;- Using liquid clustering (which requires Delta Lake) for better performance without traditional partitioning&lt;/P&gt;</description>
      <pubDate>Wed, 04 Jun 2025 16:40:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/query-a-quot-partition-metadata-logging-quot-enabled-external/m-p/120954#M46287</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-06-04T16:40:37Z</dc:date>
    </item>
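The following minimal Python sketch shows what a selective partition predicate buys in a Hive-style layout, where each directory name encodes a column=value pair. It is illustrative only and does not reflect how Databricks SQL or Spark actually plan scans. The helper names and the two earlier partition dates are hypothetical; only the path pattern comes from this thread.

```python
# Illustrative sketch of Hive-style partition pruning over directory paths.
# NOT how Databricks implements it; helper names and dates are hypothetical.
from typing import Callable, Dict, List


def parse_partition_values(path: str) -> Dict[str, str]:
    """Extract key=value pairs from the directory segments of a Hive-style path."""
    values = {}
    for segment in path.rstrip("/").split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = value
    return values


def prune(paths: List[str], column: str, predicate: Callable[[str], bool]) -> List[str]:
    """Keep only directories that have a value for `column` satisfying the predicate."""
    kept = []
    for path in paths:
        value = parse_partition_values(path).get(column)
        if value is not None and predicate(value):
            kept.append(path)
    return kept


paths = [
    "s3://bucket/prefix/partition_column_date=20250614/",
    "s3://bucket/prefix/partition_column_date=20250615/",
    "s3://bucket/prefix/partition_column_date=20250616/",
]

# Always-true predicate (like partition_column IS NOT NULL): every partition is kept.
all_dirs = prune(paths, "partition_column_date", lambda v: True)
# A selective predicate narrows the scan to the matching directory only.
latest = prune(paths, "partition_column_date", lambda v: v == "20250616")
print(len(all_dirs), latest)
```

The engine can only skip directories that the predicate rules out. A filter that every partition passes therefore leaves the scan as large as before.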
    <item>
      <title>Re: Query a "partition metadata logging" enabled external parquet table on Databricks SQL</title>
      <link>https://community.databricks.com/t5/data-engineering/query-a-quot-partition-metadata-logging-quot-enabled-external/m-p/122088#M46647</link>
      <description>&lt;P&gt;Thanks for helping!&lt;/P&gt;&lt;P&gt;Unfortunately, setting the table property didn't do the trick. For fast queries, we ended up creating a view that points to the latest partition, like this:&lt;/P&gt;&lt;LI-CODE lang="sql"&gt;SELECT *
FROM parquet.`s3://bucket/prefix/partition_column_date=20250616/`&lt;/LI-CODE&gt;&lt;P&gt;We haven't found a better solution yet.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jun 2025 07:11:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/query-a-quot-partition-metadata-logging-quot-enabled-external/m-p/122088#M46647</guid>
      <dc:creator>Samael</dc:creator>
      <dc:date>2025-06-18T07:11:16Z</dc:date>
    </item>
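The latest-partition view above hard-codes one date, so it has to be recreated whenever a new partition lands. A hypothetical Python helper like the one below could regenerate the statement, for example from a scheduled job. The helper, the view name, and the yyyymmdd format (inferred from the example path) are assumptions, not part of any Databricks API.

```python
# Hypothetical helper that rebuilds the "latest partition" view statement.
# View name, base path, and the yyyymmdd date format are assumptions based on
# the example path in this thread; adapt them to your own layout.
from datetime import date


def latest_partition_view_sql(view_name: str, base_path: str, column: str, day: date) -> str:
    """Return a CREATE OR REPLACE VIEW statement that reads one partition directory directly."""
    partition_dir = f"{base_path.rstrip('/')}/{column}={day.strftime('%Y%m%d')}/"
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT * FROM parquet.`{partition_dir}`"
    )


print(latest_partition_view_sql("latest_snapshot", "s3://bucket/prefix", "partition_column_date", date(2025, 6, 16)))
```

The helper only builds the SQL text. Running it is left to whichever client or job scheduler you already use against the warehouse.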
  </channel>
</rss>

