Query a "partition metadata logging" enabled external parquet table on Databricks SQL

Samael — Wed, 04 Jun 2025 14:10:26 GMT

Hi there,

We have a pretty large Hive-partitioned Parquet table on S3. We followed the documentation to recreate the table in Unity Catalog with partition metadata logging enabled.

We're using Databricks Runtime 16.4 LTS. Although the release notes say that the partition metadata logging setting is anchored to the table, we noticed that every query session still has to set

SET spark.databricks.nonDelta.partitionLog.enabled = true;

so that the query doesn't scan all directories.
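
A minimal sketch of the per-session setup described above, for a notebook or cluster session. The table name `main.default.events` and the integer literal in the filter are hypothetical, and the `DESCRIBE EXTENDED` step is only a suggested way to inspect the table; the exact field to look for should be confirmed against the external partition discovery documentation.

```sql
-- Enable partition metadata logging for the current session only.
SET spark.databricks.nonDelta.partitionLog.enabled = true;

-- Inspect the table details (hypothetical table name).
-- Assumption: the output reports which partition provider the table uses.
DESCRIBE EXTENDED main.default.events;

-- Filter on the partition column so that only matching partitions are read.
-- Assumption: the partition value is stored as an integer date key.
SELECT count(*)
FROM main.default.events
WHERE partition_column_date = 20250616;
```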

With DBR clusters we can make this part of the cluster's Spark config. Databricks SQL, however, doesn't allow us to set this config, and it doesn't seem to honor the table setting automatically either. It simply scans all directories, which makes queries extremely slow.

We tried both the current and preview channels, but the behavior is the same. Is there any way to make Databricks SQL honor the partition metadata logging setting?

Re: Query a "partition metadata logging" enabled external parquet table on Databricks SQL

lingareddy_Alva — Wed, 04 Jun 2025 16:40:37 GMT

Hi @Samael

The documentation states that partition metadata logging, once enabled when a table is created, should persist for that table:
https://learn.microsoft.com/en-us/azure/databricks/tables/external-partition-discovery
https://docs.databricks.com/en/tables/external-partition-discovery.html
However, you're finding that queries only use it when spark.databricks.nonDelta.partitionLog.enabled = true is set in each session, and that Databricks SQL warehouses neither allow that setting nor honor the table-level setting automatically.

Current Workarounds
Since Databricks SQL warehouses don't allow you to set Spark configurations directly, here are several approaches you can try:

1. Use SQL Warehouses with Custom Spark Configurations
Some organizations have found success by working with Databricks support to enable custom Spark configurations on SQL warehouses for specific use cases.
This isn't a standard feature, but may be available for enterprise customers.

2. Create Views with Partition Hints
You can create views that include partition predicates to help the query optimizer:

CREATE OR REPLACE VIEW your_table_optimized AS
SELECT * FROM your_table
WHERE partition_column IS NOT NULL

3. Use Databricks Runtime Clusters for Heavy Queries
For queries that require partition metadata logging, consider using regular Databricks clusters (where you can set the Spark config)
rather than SQL warehouses.

4. Table Properties Alternative
Try setting table properties that might influence query planning:

ALTER TABLE your_table SET TBLPROPERTIES (
'spark.databricks.nonDelta.partitionLog.enabled' = 'true'
);

5. Contact Databricks Support
This appears to be a gap between the documented behavior and actual implementation in SQL warehouses.
I'd recommend opening a support ticket with Databricks, as this seems like either:
- A bug where SQL warehouses should honor the table-level partition metadata logging setting
- A missing feature that should be prioritized given the performance impact.

Long-term Considerations
Given that you're using DBR 16.4 LTS, you might also want to consider:
- Migrating to Delta Lake format if feasible, which has better partition handling
- Evaluating whether the partition strategy is still optimal for your query patterns
- Using liquid clustering (if applicable) for better performance without traditional partitioning

Re: Query a "partition metadata logging" enabled external parquet table on Databricks SQL

Samael — Wed, 18 Jun 2025 07:11:16 GMT

Thanks for helping!

Unfortunately, setting table properties didn't do the trick. We ended up with a view that points to the latest partition, like this, for fast queries:

SELECT *
FROM parquet.`s3://bucket/prefix/partition_column_date=20250616/`

We haven't found a better solution yet.
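
For anyone trying the same workaround, it could be sketched as a view definition along these lines. The view name `main.default.events_latest` is hypothetical, and the integer type of the re-added partition column is an assumption. Reading a partition directory by path does not infer the partition column from that directory's own name, so the sketch adds it back as a literal.

```sql
-- Hypothetical view name; the path is the one used in the query above.
CREATE OR REPLACE VIEW main.default.events_latest AS
SELECT
  *,
  -- The partition column is not inferred when the partition directory
  -- itself is the read path, so re-add it as a constant.
  -- Assumption: an integer date key.
  20250616 AS partition_column_date
FROM parquet.`s3://bucket/prefix/partition_column_date=20250616/`;
```

The view is pinned to a single partition directory, so it has to be recreated with the new path whenever a newer partition should become the "latest" one.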