Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Conflict between Predictive Optimization and High Frequency Writes

aonurdemir
Contributor

(Dear moderators, why do you remove this question? It is a genuine question; please do not.)

We have a continuous DLT pipeline whose tables update every minute and are partitioned by the "partition_key" column. The table is 4 TB and has 16k files. We sometimes get the error below, caused by a conflict with predictive optimization. After the exception, the pipeline restarts itself and continues operating, but we want to avoid this, and we also don't want to disable predictive optimization. What do you recommend?

Error:

[DELTA_METADATA_CHANGED] MetadataChangedException: The metadata of the Delta table has been changed by a concurrent update. Please try the operation again.
Conflicting commit: {"timestamp":1771277469269,"userId":"******","userName":"*********","operation":"SET TBLPROPERTIES","operationParameters":{"properties":{"delta.workloadBasedColumns.optimizerStatistics":"`partition_key`"}},"job":{"jobId":"*****","jobName":"Predictive Optimization Job-*****","jobRunId":"*****","runId":"*****","jobOwnerId":"*****","triggerType":"manual"},"clusterId":"*****","readVersion":247493,"isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{},"tags":{"maintenance":"true","delta.rowTracking.preserved":"true"},"engineInfo":"Databricks-Runtime/18.0.x-aarch64-photon-scala2.13","txnId":"******"}

Code is this:

@DP.table(
    name="***",
    comment="***",
    partition_cols=["partition_key"],
    table_properties={
        "description": "***",
        # we thought this could solve the problem; apparently it did not
        "delta.workloadBasedColumns.optimizerStatistics": "partition_key"
    }
)
def ***():
    df = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        # we intentionally do not specify a schema, since this stream has ~700 columns and is constantly evolving
        .option("cloudFiles.inferColumnTypes", "true")
        .option("cloudFiles.useManagedFileEvents", "true")
        .option("cloudFiles.schemaEvolutionMode", "rescue")
        .load(f"***")
    )
    return df
 

 

3 REPLIES

saurabh18cs
Honored Contributor III

Hi @aonurdemir, this is expected behaviour: with the default WriteSerializable isolation level, there is a conflict window when both jobs update the table at the same time.

Maybe you can limit optimization to a scheduled window when the main job is not processing any data, so fewer conflicts happen?

 

balajij8
Contributor

The issue arises because Predictive Optimization runs while you have high-frequency ingestion. With high-frequency writes, the transaction log grows rapidly, and optimization and writes collide.

You can:

  • Move to 30-minute table updates.
  • Move to explicit liquid clustering.
  • Move to manual optimization for the specific tables.

SteveOstrowski
Databricks Employee

Hi @aonurdemir,

This is a well-known conflict pattern in Delta Lake, and the root cause is clearly documented. Let me break it down and give you the concrete options.


ROOT CAUSE

The Databricks documentation on isolation levels and write conflicts explicitly states:

"Metadata changes cause all concurrent write operations to fail. These operations include changes to table protocol, table properties, or data schema."

This applies regardless of isolation level (both WriteSerializable and Serializable). The table properties documentation further confirms:

"All operations that set or update table properties conflict with other concurrent write operations, causing them to fail."

What is happening in your case: Predictive Optimization's ANALYZE operation is updating the delta.workloadBasedColumns.optimizerStatistics table property via an implicit SET TBLPROPERTIES. Because your continuous DLT pipeline writes every minute, there is a high probability that a write microbatch is in-flight when this metadata change commits, causing the MetadataChangedException.

Docs:
- Isolation levels and write conflicts: https://docs.databricks.com/en/delta/isolation-level.html
- Delta table properties: https://docs.databricks.com/en/delta/table-properties.html
- Concurrency control: https://docs.databricks.com/en/delta/concurrency-control.html
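To see why this class of conflict is structural rather than a bug, here is a toy Python model of optimistic concurrency control in the spirit of Delta's commit protocol. This is a simplified illustrative sketch, not Delta Lake's actual implementation: any commit that changes metadata invalidates every snapshot taken before it, so a write that was in flight when the metadata commit landed fails at commit time and must retry.

```python
# Toy model of Delta-style optimistic concurrency control, illustrating why a
# metadata commit (SET TBLPROPERTIES) fails any write that started earlier.
# Illustrative sketch only; NOT Delta Lake's real implementation.
from dataclasses import dataclass, field

@dataclass
class Commit:
    version: int
    operation: str          # e.g. "WRITE", "SET TBLPROPERTIES"
    changes_metadata: bool  # metadata changes conflict with all concurrent writes

@dataclass
class ToyDeltaLog:
    commits: list = field(default_factory=list)

    @property
    def version(self):
        return len(self.commits) - 1

    def try_commit(self, read_version, operation, changes_metadata):
        # Check every commit that landed after our snapshot was taken.
        for c in self.commits[read_version + 1:]:
            if c.changes_metadata:
                # Mirrors [DELTA_METADATA_CHANGED]: a concurrent metadata change
                # invalidates our snapshot, regardless of isolation level.
                raise RuntimeError("MetadataChangedException: retry the operation")
        self.commits.append(Commit(self.version + 1, operation, changes_metadata))
        return self.version

log = ToyDeltaLog()
log.commits.append(Commit(0, "CREATE TABLE", True))

# A streaming microbatch reads the table at version 0...
snapshot = log.version
# ...then Predictive Optimization's ANALYZE commits SET TBLPROPERTIES first.
log.try_commit(log.version, "SET TBLPROPERTIES", True)
# The in-flight write now fails on commit and must restart from a fresh snapshot.
try:
    log.try_commit(snapshot, "WRITE", False)
except RuntimeError as e:
    print(e)  # MetadataChangedException: retry the operation
```

Note that a write committed against a fresh snapshot (taken after the metadata change) succeeds, which is exactly why the DLT pipeline recovers after its restart.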


WHY SETTING THE PROPERTY YOURSELF DOES NOT HELP

You mentioned you tried manually setting delta.workloadBasedColumns.optimizerStatistics. This does not prevent the conflict because:

1. Predictive Optimization's ANALYZE operation will still attempt to UPDATE the property with fresh statistics values, even if it already exists.
2. The conflict is triggered by the act of committing a SET TBLPROPERTIES transaction, not by the property being absent.


SOLUTION 1: DISABLE PREDICTIVE OPTIMIZATION ON THIS SPECIFIC TABLE (RECOMMENDED)

You can disable Predictive Optimization at the individual table level:

ALTER TABLE your_catalog.your_schema.your_table DISABLE PREDICTIVE OPTIMIZATION;

This stops all three PO operations (OPTIMIZE, VACUUM, ANALYZE) for this table only. All other tables in your schema will continue to benefit from PO.

To restore inheritance later:

ALTER TABLE your_catalog.your_schema.your_table INHERIT PREDICTIVE OPTIMIZATION;

You can then run OPTIMIZE and VACUUM manually during low-write windows (e.g., via a scheduled job), while avoiding the ANALYZE statistics collection that triggers the SET TBLPROPERTIES conflict.

This is the most reliable solution since it eliminates the conflict at its source while keeping PO active for the rest of your tables.

Docs: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-alter-table.html
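The "low-write window" scheduling can be sketched as a small guard in the maintenance job's driver code. This is an illustrative Python sketch: the window hours, table name, and the run_sql callable are assumptions for the example, not a Databricks API (in a real job, run_sql would be spark.sql).

```python
# Sketch of a guard for a scheduled maintenance job: only run OPTIMIZE/VACUUM
# when the current time falls inside a configured low-write window.
# Window hours and table name are assumptions for illustration.
from datetime import time

def in_maintenance_window(now, start, end):
    """True if `now` is inside [start, end), handling windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # e.g. a 23:00-02:00 window

def maybe_run_maintenance(now, run_sql):
    """Run OPTIMIZE/VACUUM only inside an assumed 02:00-04:00 quiet window."""
    if not in_maintenance_window(now, time(2, 0), time(4, 0)):
        return False
    run_sql("OPTIMIZE your_catalog.your_schema.your_table")
    run_sql("VACUUM your_catalog.your_schema.your_table")
    return True

executed = []
maybe_run_maintenance(time(2, 30), executed.append)
# executed now holds the OPTIMIZE and VACUUM statements, in order
```

In practice the simpler route is to set the Databricks job's cron schedule to the quiet window directly; the guard above is only useful if the job may also be triggered ad hoc.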


SOLUTION 2: DEDICATED SCHEMA FOR HIGH-FREQUENCY TABLES

If you have multiple tables with high-frequency writes, consider placing them in a dedicated schema with PO disabled:

ALTER SCHEMA your_catalog.high_frequency_schema DISABLE PREDICTIVE OPTIMIZATION;

Then schedule manual maintenance (OPTIMIZE, VACUUM) via Databricks Jobs on your own cadence, timed to avoid peak write windows.


SOLUTION 3: CONSIDER LIQUID CLUSTERING (LONGER-TERM)

Liquid clustering replaces traditional partitioning and ZORDER. While it does NOT directly prevent the SET TBLPROPERTIES metadata conflict from ANALYZE, it offers several benefits for high-frequency write workloads:

- OPTIMIZE operates incrementally, only rewriting files that need clustering (reducing the window for conflicts with OPTIMIZE specifically)
- It works with streaming tables in Databricks Runtime 16.0+
- It is the recommended approach for "tables that have concurrent write requirements" per the documentation
- You can modify clustering keys without rewriting existing data

However, migrating to liquid clustering alone will NOT solve the ANALYZE/TBLPROPERTIES conflict. You would still need to combine it with Solution 1 or 2.

Docs: https://docs.databricks.com/en/delta/clustering.html


SOLUTION 4: ACCEPT THE RESTART (STATUS QUO WITH MONITORING)

DLT continuous pipelines are designed to recover from transient failures. The MetadataChangedException causes a retry, and the pipeline resumes from its checkpoint. If the restart time is acceptable (typically seconds), you could:

- Add alerting/monitoring to track how frequently this occurs
- Accept it as a known transient failure that self-heals
- Confirm that no data is lost (Delta Lake's exactly-once guarantee with checkpointing ensures this)

This is a pragmatic approach if the restart frequency is low (e.g., a few times per day).

Docs: https://docs.databricks.com/en/structured-streaming/delta-lake.html
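If you go this route, the alerting can be as simple as counting DELTA_METADATA_CHANGED events per day and flagging days above a threshold. A minimal Python sketch, assuming you have already extracted (epoch-millisecond timestamp, error class) pairs from the DLT event log; the event format and threshold are assumptions for illustration.

```python
# Sketch: count MetadataChangedException restarts per UTC day and flag days
# above a threshold. Event tuples are an assumed format; in practice you would
# pull these from the DLT event log.
from collections import Counter
from datetime import datetime, timezone

def restarts_per_day(events):
    """events: iterable of (epoch_millis, error_class); returns Counter keyed by date."""
    counts = Counter()
    for ts_millis, error_class in events:
        if error_class == "DELTA_METADATA_CHANGED":
            day = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc).date()
            counts[day] += 1
    return counts

def days_over_threshold(events, max_per_day=5):
    """Days where the restart count exceeds the (assumed) acceptable budget."""
    counts = restarts_per_day(events)
    return sorted(day for day, n in counts.items() if n > max_per_day)
```

Feeding the result into a Databricks SQL alert or a simple notification task gives you a signal if the conflict frequency ever grows beyond "a few times per day".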


WHAT I RECOMMEND

Solution 1 is the cleanest fix: disable PO on this specific table, then schedule your own OPTIMIZE and VACUUM jobs timed to run during quiet periods. For example:

-- Disable PO on the problematic table
ALTER TABLE your_catalog.your_schema.your_table DISABLE PREDICTIVE OPTIMIZATION;

-- Then create a scheduled Databricks Job that runs:
OPTIMIZE your_catalog.your_schema.your_table;
VACUUM your_catalog.your_schema.your_table;
-- Schedule this during a known low-write window


DOCUMENTATION REFERENCES

- Isolation levels and write conflicts: https://docs.databricks.com/en/delta/isolation-level.html
- Delta table properties: https://docs.databricks.com/en/delta/table-properties.html
- Concurrency control: https://docs.databricks.com/en/delta/concurrency-control.html
- ALTER TABLE syntax: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-alter-table.html
- Predictive Optimization: https://docs.databricks.com/en/optimizations/predictive-optimization.html
- Liquid clustering: https://docs.databricks.com/en/delta/clustering.html
- Streaming with Delta Lake: https://docs.databricks.com/en/structured-streaming/delta-lake.html

Hope this helps! Let me know if you have questions about any of these approaches.

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.