Query a "partition metadata logging" enabled external parquet table on Databricks SQL

Samael
New Contributor II

Hi there,

We have a fairly large Hive-partitioned Parquet table on S3, and we followed the documentation to recreate the table with partition metadata logging enabled in Unity Catalog.
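
For reference, this is roughly the pattern the docs describe and that we followed (the catalog, schema, table, column, and path below are placeholders):

-- Enable the conf in the session that creates the table, then create
-- the external table so Unity Catalog logs its partition metadata.
SET spark.databricks.nonDelta.partitionLog.enabled = true;

CREATE TABLE my_catalog.my_schema.my_table
USING PARQUET
PARTITIONED BY (partition_column_date)
LOCATION 's3://bucket/prefix/';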

We're using Databricks Runtime 16.4 LTS, but although the release notes say the partition metadata logging setting is anchored to the table, we've noticed that every query session must set

SET spark.databricks.nonDelta.partitionLog.enabled = true;

to stop queries from scanning all directories.

With DBR clusters we can put this in the cluster's Spark config, but Databricks SQL doesn't allow us to set this config, and it doesn't seem to honor the table setting automatically either. It simply scans all directories, making queries extremely slow.
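
For reference, the cluster-level entry (Compute > Advanced options > Spark config) is just the key-value pair:

spark.databricks.nonDelta.partitionLog.enabled true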

We tried both the current and preview channels, but the behavior is the same. Is there any way to make Databricks SQL honor the partition metadata logging setting?

 

2 REPLIES

lingareddy_Alva
Honored Contributor II

Hi @Samael 

The documentation states that partition metadata logging should persist once it's enabled on a table at creation time
(https://learn.microsoft.com/en-us/azure/databricks/tables/external-partition-discovery, https://docs.databricks.com/en/tables/external-partition-discovery.html),
but you're seeing that Databricks SQL warehouses don't automatically honor the setting and require spark.databricks.nonDelta.partitionLog.enabled = true to be set manually in each session.

Current Workarounds
Since Databricks SQL warehouses don't allow you to set Spark configurations directly, here are several approaches you can try:

1. Use SQL Warehouses with Custom Spark Configurations
Some organizations have found success by working with Databricks support to enable custom Spark configurations on SQL warehouses for specific use cases.
This isn't a standard feature, but it may be available to enterprise customers.

2. Create Views with Partition Hints
You can create views that include partition predicates to help the query optimizer:

CREATE OR REPLACE VIEW your_table_optimized AS
SELECT * FROM your_table
WHERE partition_column IS NOT NULL;
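
Queries against the view then still need the partition predicate spelled out to get pruning, e.g. (the date value is illustrative):

SELECT * FROM your_table_optimized
WHERE partition_column = '20250616';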

3. Use Databricks Runtime Clusters for Heavy Queries
For queries that require partition metadata logging, consider using regular Databricks clusters (where you can set the Spark config)
rather than SQL warehouses.

4. Table Properties Alternative
Try setting table properties that might influence query planning:

ALTER TABLE your_table SET TBLPROPERTIES (
'spark.databricks.nonDelta.partitionLog.enabled' = 'true'
);
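
SHOW TBLPROPERTIES will confirm whether the property was recorded on the table, though not whether a SQL warehouse actually honors it:

SHOW TBLPROPERTIES your_table;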

5. Contact Databricks Support
This appears to be a gap between the documented behavior and actual implementation in SQL warehouses.
I'd recommend opening a support ticket with Databricks, as this seems like either:
- A bug where SQL warehouses should honor the table-level partition metadata logging setting
- A missing feature that should be prioritized given the performance impact.

Long-term Considerations
Given that you're using DBR 16.4 LTS, you might also want to consider:
- Migrating to Delta Lake format if feasible, which has better partition handling (see the sketch after this list)
- Evaluating whether the partition strategy is still optimal for your query patterns
- Using liquid clustering (if applicable) for better performance without traditional partitioning
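
For the Delta option, a minimal sketch, assuming your_table is the registered external Parquet table (CONVERT TO DELTA builds the Delta log in place without rewriting the data files):

CONVERT TO DELTA your_table;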

 

LR

Samael
New Contributor II

Thanks for helping!

Setting table properties unfortunately didn't do the trick. We ended up creating a view that points to the latest partition, like this, for fast queries (the view name below is just a placeholder):

CREATE OR REPLACE VIEW latest_partition_view AS  -- placeholder name
SELECT *
FROM parquet.`s3://bucket/prefix/partition_column_date=20250616/`;

We haven't found a better solution yet.
