01-29-2026 04:14 AM
Hi all,
Lately I've been experimenting with the newish feature of scheduling jobs on a table update trigger. There's one thing that's blocking me from implementing it, however, and I was hoping someone has found a solution to it.
We occasionally perform a VACUUM operation in our PRD environment. This is not synchronized across the platform, so some of my DPs receive an update while others don't. When these are then updated by the daily load the day after, the job often starts running with only half of its tables actually updated.
Do you know of ways to filter on specific table update trigger types (e.g. filtering on specific operations), or other ways to make this more stable?
Thanks for reading!
Wednesday
Hi @Garybary,
Quick clarification on how table update triggers actually behave, because this changes the answer significantly.
Table update triggers fire on data-changing operations only (writes, merges, updates, deletes). A standalone VACUUM does NOT fire the trigger. From the docs: "A table update trigger can be configured to monitor one or more tables for data changes such as updates, merges and deletes." The trigger inspects the operation recorded in the Delta log and filters out pure maintenance commits.
So if you are seeing your job run after what you believe was "just a VACUUM," the most likely cause is that another operation in the same job or sequence wrote a data-changing commit. A few common patterns that look like maintenance but actually fire the trigger:
- DELETE followed by VACUUM (the DELETE is the trigger, not the VACUUM)
- REORG TABLE ... APPLY (PURGE) which rewrites files
- RESTORE TABLE to a prior version
- CLONE (shallow or deep) into the monitored table
- A streaming or batch write running concurrently with the maintenance window
- Predictive optimization or auto-compaction running outside your scheduled window
DIAGNOSING WHAT FIRED THE TRIGGER
The cleanest way to confirm which operation actually triggered the run is to log the commit metadata from the trigger itself and correlate with DESCRIBE HISTORY. Add a small task at the start of the job that captures the dynamic trigger parameter:
Key: updated_tables
Value: {{job.trigger.table_update.updated_tables}}
Then in the task:
import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The trigger passes the list of updated tables as a JSON string parameter
updated_tables = json.loads(dbutils.widgets.get("updated_tables"))

for entry in updated_tables:
    table_name = entry["table_name"]
    commit_ts = entry["commit_timestamp"]
    print(f"Triggered by {table_name} at {commit_ts}")
    spark.sql(f"DESCRIBE HISTORY {table_name} LIMIT 5").show(truncate=False)
Run that for a few of the unexpected firings and look at the "operation" column for the commit at or just before the trigger timestamp. That will tell you exactly which operation caused it (WRITE, MERGE, DELETE, RESTORE, REORG, etc.).
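If you want to automate that correlation, a small helper can pick out the commit at or just before the trigger timestamp once you have collected the history rows into (timestamp, operation) pairs. This is a sketch, not an official API; it assumes you have already pulled those two columns out of DESCRIBE HISTORY:

```python
def operation_before(history, trigger_ts):
    """history: iterable of (commit_timestamp, operation) pairs taken from
    DESCRIBE HISTORY. Timestamps only need to be mutually comparable
    (datetime objects, or ISO-8601 strings, which compare correctly as text).
    Returns the operation of the latest commit at or before trigger_ts,
    or None if no such commit exists."""
    candidates = [(ts, op) for ts, op in history if ts <= trigger_ts]
    if not candidates:
        return None
    # Latest commit wins; its operation is what fired (or preceded) the trigger
    return max(candidates, key=lambda pair: pair[0])[1]
```

Feeding it the trigger's commit_timestamp then gives you the suspect operation directly instead of eyeballing the history output.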
If you do find a case where the operation column shows only VACUUM and the trigger still fired, that would be unexpected behavior and worth reporting with the run ID and table name so the Jobs team can investigate.
ADDITIONAL CONTROLS WORTH KNOWING
- "Minimum time between triggers" lets you batch frequent updates so a flurry of small commits does not fan out into many runs.
- The "All tables updated" condition (when monitoring multiple tables) fires only after every monitored table has had a data change since the last run, which is useful when your pipeline truly depends on all sources being refreshed.
- If a particular upstream process is producing trigger-firing commits you do not want to act on, separating it into a different table or scheduling it through a path that does not touch the monitored table is usually cleaner than trying to filter at the trigger.
RELEVANT DOCUMENTATION
- Table update triggers: https://docs.databricks.com/aws/en/jobs/trigger-table-update
- Job triggers overview: https://docs.databricks.com/aws/en/jobs/triggers
- Delta table history: https://docs.databricks.com/aws/en/delta/history
- REORG TABLE: https://docs.databricks.com/aws/en/sql/language-manual/delta-reorg-table
- Predictive optimization: https://docs.databricks.com/aws/en/optimizations/predictive-optimization
If you can share the DESCRIBE HISTORY output around one of the unexpected runs, it should be straightforward to pinpoint the actual operation that fired the trigger.
* This reply was drafted by an agent system I built, which researches responses based on the wide set of documentation I have available and previous memory. I personally review each draft for obvious issues, monitor the system's reliability, and update it when I detect drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand-new features.
If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.
01-29-2026 07:48 AM
Hi @Garybary, I am not sure whether this trigger supports filtering on operations, but I can suggest another approach: every time the job is triggered, add an extra check in the first cell of the pipeline notebook or code:
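A minimal sketch of such a first-cell guard: inspect the most recent operation in DESCRIBE HISTORY and bail out if it was maintenance only. The set of maintenance operation names below is an assumption based on typical DESCRIBE HISTORY output, so verify it against your own tables:

```python
# Operation names that typically mark maintenance-only commits in the
# Delta log (assumed set; check DESCRIBE HISTORY on your own tables).
MAINTENANCE_OPS = {"VACUUM START", "VACUUM END", "OPTIMIZE"}

def should_run(last_operation: str) -> bool:
    """Return False when the latest commit was a maintenance-only operation."""
    return last_operation.upper() not in MAINTENANCE_OPS

# In the notebook itself (Databricks-only, shown as comments here):
# last_op = spark.sql("DESCRIBE HISTORY my_table LIMIT 1").collect()[0]["operation"]
# if not should_run(last_op):
#     dbutils.notebook.exit("Skipping run: latest commit was maintenance only")
```

The downside, as noted below, is that the cluster still has to spin up before the guard can decide to exit.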
02-02-2026 01:34 AM
Hi @saurabh18cs, thanks for thinking along! I think your approach would work, but it has some downsides, e.g. it still launches the required compute before being told to shut down again. I read another post that uses the minimum-time-between-triggers window on the job trigger; that might also provide a solution. By setting the window to roughly 23 hours we could stop all unintended updates from triggering a run. It's a bit of a brute-force method, though :).
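For reference, that roughly-23-hour window can be set on the trigger itself. A sketch of what the job settings might look like via the Jobs API; the exact field names and condition values here are my assumption extrapolated from the documented trigger settings, so double-check them against the current Jobs API reference before using:

```json
{
  "trigger": {
    "pause_status": "UNPAUSED",
    "table_update": {
      "table_names": ["catalog.schema.my_table"],
      "condition": "ALL_UPDATED",
      "min_time_between_triggers_seconds": 82800
    }
  }
}
```

Here 82800 seconds is 23 hours, so at most one run per day can fire regardless of how many commits land in between.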