Hi @Samael
The documentation states that partition metadata logging, once enabled on a table at creation time, should persist for that table:
https://learn.microsoft.com/en-us/azure/databricks/tables/external-partition-discovery
https://docs.databricks.com/en/tables/external-partition-discovery.html
However, you're finding that Databricks SQL warehouses don't automatically honor this setting, forcing you to set spark.databricks.nonDelta.partitionLog.enabled = true manually in each session.
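For reference, the manual stopgap you described looks like this on compute that allows session configs (a sketch; your_table and partition_column are placeholders):

```sql
-- Enable partition metadata logging for the current session only.
-- Note: SQL warehouses reject this; it works on interactive/job clusters.
SET spark.databricks.nonDelta.partitionLog.enabled = true;

-- Then query the external table as usual.
SELECT * FROM your_table WHERE partition_column = '2024-01';
```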
Current Workarounds
Since Databricks SQL warehouses don't allow you to set Spark configurations directly, here are several approaches you can try:
1. Use SQL Warehouses with Custom Spark Configurations
Some organizations have had success working with Databricks support to enable custom Spark configurations on SQL warehouses for specific use cases.
This isn't a standard feature, but it may be available to enterprise customers.
2. Create Views with Partition Hints
You can create views that include partition predicates to help the query optimizer:
CREATE OR REPLACE VIEW your_table_optimized AS
SELECT * FROM your_table
WHERE partition_column IS NOT NULL
3. Use Databricks Runtime Clusters for Heavy Queries
For queries that require partition metadata logging, consider using regular Databricks clusters (where you can set the Spark config)
rather than SQL warehouses.
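If you go this route, you can set the config at cluster scope (under the cluster's Advanced options > Spark config) so every session on that cluster inherits it, rather than running SET each time. A sketch of the config entry:

```
spark.databricks.nonDelta.partitionLog.enabled true
```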
4. Table Properties Alternative
Try setting table properties that might influence query planning:
ALTER TABLE your_table SET TBLPROPERTIES (
'spark.databricks.nonDelta.partitionLog.enabled' = 'true'
);
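After the ALTER, you can at least confirm the property was recorded in the table metadata. Keep in mind this only shows the property is stored; it does not guarantee SQL warehouses act on it:

```sql
-- List all properties stored on the table ('your_table' is a placeholder).
SHOW TBLPROPERTIES your_table;

-- Or inspect just the one key:
SHOW TBLPROPERTIES your_table ('spark.databricks.nonDelta.partitionLog.enabled');
```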
5. Contact Databricks Support
This appears to be a gap between the documented behavior and actual implementation in SQL warehouses.
I'd recommend opening a support ticket with Databricks, as this seems like either:
- A bug, where SQL warehouses should honor the table-level partition metadata logging setting, or
- A missing feature that should be prioritized given the performance impact.
Long-term Considerations
Given that you're using DBR 16.4 LTS, you might also want to consider:
- Migrating to Delta Lake format if feasible, which has better partition handling
- Evaluating whether the partition strategy is still optimal for your query patterns
- Using liquid clustering (if applicable) for better performance without traditional partitioning
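For the liquid clustering option, here's a minimal sketch of what a migration could look like. This assumes you can move the data into a Delta table; the table and column names are placeholders:

```sql
-- Liquid clustering requires Delta Lake; CLUSTER BY replaces
-- traditional hive-style partitioning and needs no partition discovery.
CREATE TABLE your_table_delta
CLUSTER BY (partition_column)
AS SELECT * FROM your_table;
```

Clustering keys can be changed later with ALTER TABLE, which is a practical advantage over a fixed partition scheme if your query patterns shift.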
LR