<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Scheduling jobs with table update triggers in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/scheduling-jobs-with-table-update-triggers/m-p/150182#M53291</link>
    <description>Reply from SteveOstrowski: table update triggers cannot filter by operation type, so VACUUM commits also fire the trigger. Suggested workarounds: a DESCRIBE HISTORY guard task, the "All tables updated" condition, coordinated VACUUM scheduling, and a dispatcher job. Full text in the matching item below.</description>
    <pubDate>Sun, 08 Mar 2026 07:31:15 GMT</pubDate>
    <dc:creator>SteveOstrowski</dc:creator>
    <dc:date>2026-03-08T07:31:15Z</dc:date>
    <item>
      <title>Scheduling jobs with table update triggers</title>
      <link>https://community.databricks.com/t5/data-engineering/scheduling-jobs-with-table-update-triggers/m-p/145725#M52569</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;Lately I've been experimenting with the newish feature of scheduling jobs on a table update trigger. There's one thing that's blocking me from implementing it, however, and I was hoping someone has found a solution to it.&lt;/P&gt;&lt;P&gt;We occasionally perform a VACUUM operation in our PRD environment. This is not synchronized across the platform, so some of my DP's receive an update while others don't. When these then get updated on the daily load the day after, the job often starts running with only half of its tables actually updated.&lt;/P&gt;&lt;P&gt;Do you know of ways to filter on specific table update trigger types (e.g. filtering on specific operations), or other ways to make this more stable?&lt;/P&gt;&lt;P&gt;Thanks for reading!&lt;/P&gt;</description>
      <pubDate>Thu, 29 Jan 2026 12:14:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/scheduling-jobs-with-table-update-triggers/m-p/145725#M52569</guid>
      <dc:creator>Garybary</dc:creator>
      <dc:date>2026-01-29T12:14:09Z</dc:date>
    </item>
    <item>
      <title>Re: Scheduling jobs with table update triggers</title>
      <link>https://community.databricks.com/t5/data-engineering/scheduling-jobs-with-table-update-triggers/m-p/145765#M52573</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/140116"&gt;@Garybary&lt;/a&gt;&amp;nbsp;I am not sure whether this trigger supports operation filtering, but I can suggest another approach: every time the job gets triggered, add an extra check in the first cell of the pipeline notebook or code:&lt;/P&gt;&lt;P&gt;tbl = "catalog.schema.table_name"&lt;BR /&gt;last = (spark.sql(f"DESCRIBE HISTORY {tbl}")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.orderBy("version", ascending=False)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.limit(1)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.collect()[0])&lt;BR /&gt;op = (last["operation"] or "").upper()&lt;BR /&gt;MAINT_OPS = {"VACUUM", "OPTIMIZE", "ZORDER BY"}&lt;BR /&gt;if op in MAINT_OPS:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;dbutils.notebook.exit(f"SKIP: maintenance operation detected: {op}")&lt;/P&gt;</description>
      <pubDate>Thu, 29 Jan 2026 15:48:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/scheduling-jobs-with-table-update-triggers/m-p/145765#M52573</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2026-01-29T15:48:17Z</dc:date>
    </item>
    <item>
      <title>Re: Scheduling jobs with table update triggers</title>
      <link>https://community.databricks.com/t5/data-engineering/scheduling-jobs-with-table-update-triggers/m-p/146500#M52661</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/22314"&gt;@saurabh18cs&lt;/a&gt;, thanks for thinking along! I think your method works, but I see some downsides to it. For example, it still launches the required compute before being told to shut down again. I read another post that uses the time window setting on the job trigger; that might also provide a solution. By setting the window to around 23 hours we could also stop unintended updates from triggering a run. It's a bit of a brute-force method though :).&lt;/P&gt;</description>
      <pubDate>Mon, 02 Feb 2026 09:34:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/scheduling-jobs-with-table-update-triggers/m-p/146500#M52661</guid>
      <dc:creator>Garybary</dc:creator>
      <dc:date>2026-02-02T09:34:24Z</dc:date>
    </item>
    <item>
      <title>Re: Scheduling jobs with table update triggers</title>
      <link>https://community.databricks.com/t5/data-engineering/scheduling-jobs-with-table-update-triggers/m-p/150182#M53291</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/140116"&gt;@Garybary&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;This is a common scenario when using table update triggers. Currently, table update triggers do not support filtering by operation type. The trigger fires on any commit to the Delta transaction log, and VACUUM does write a commit entry to the log (you can verify this with DESCRIBE HISTORY on the table). So a VACUUM on any of your monitored tables will look like a "table update" to the trigger, even though no actual data changed.&lt;/P&gt;
&lt;P&gt;Here are a few approaches to work around this:&lt;/P&gt;
&lt;P&gt;OPTION 1: ADD A GUARD CHECK AT THE START OF YOUR JOB&lt;/P&gt;
&lt;P&gt;Use the dynamic job parameter that provides the list of updated tables along with commit metadata. In the first task of your job, query DESCRIBE HISTORY on each triggered table and check whether the most recent commit was a data-changing operation or just a maintenance operation (VACUUM, OPTIMIZE, etc.).&lt;/P&gt;
&lt;P&gt;Delta history includes a column called "operation" which tells you exactly what type of commit it was (WRITE, MERGE, DELETE, VACUUM, OPTIMIZE, etc.). You can use this to short-circuit the job early if the trigger was caused only by maintenance operations.&lt;/P&gt;
&lt;P&gt;Example Python task at the start of your job:&lt;/P&gt;
&lt;P&gt;import json&lt;BR /&gt;from pyspark.sql import SparkSession&lt;/P&gt;
&lt;P&gt;spark = SparkSession.builder.getOrCreate()&lt;/P&gt;
&lt;P&gt;# Get the list of updated tables from the trigger parameter&lt;BR /&gt;updated_tables_json = dbutils.widgets.get("updated_tables")&lt;BR /&gt;updated_tables = json.loads(updated_tables_json)&lt;/P&gt;
&lt;P&gt;# Check if any table had a real data change (not just VACUUM/OPTIMIZE)&lt;BR /&gt;# Note: VACUUM is logged in Delta history as "VACUUM START" / "VACUUM END"&lt;BR /&gt;maintenance_ops = {"VACUUM", "OPTIMIZE", "FSCK", "REORG"}&lt;BR /&gt;has_data_change = False&lt;/P&gt;
&lt;P&gt;for table_name in updated_tables:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;history = spark.sql(f"DESCRIBE HISTORY {table_name} LIMIT 1").collect()&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if history:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;last_op = history[0]["operation"].upper()&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Prefix match so "VACUUM START" / "VACUUM END" are caught as well&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if not last_op.startswith(tuple(maintenance_ops)):&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;has_data_change = True&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;break&lt;/P&gt;
&lt;P&gt;if not has_data_change:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;dbutils.notebook.exit("SKIP: triggered by maintenance operation only")&lt;/P&gt;
&lt;P&gt;Note that dbutils.notebook.exit still ends the guard task as "succeeded", so a plain "Run if all dependencies succeeded" setting will not stop the downstream tasks on its own. Instead, have the guard publish its result with dbutils.jobs.taskValues.set and branch on that value with an If/else condition task, or raise an exception in the guard and rely on the "Run if" dependency conditions of the downstream tasks.&lt;/P&gt;
&lt;P&gt;When setting up this job, pass the dynamic parameter to the first task by adding a job parameter:&lt;/P&gt;
&lt;P&gt;Key: updated_tables&lt;BR /&gt;Value: {{job.trigger.table_update.updated_tables}}&lt;/P&gt;
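&lt;P&gt;As a side note, the guard logic is easy to unit-test off-cluster if you factor it into a pure function. A minimal sketch (the helper name is mine, not part of any Databricks API; it assumes VACUUM appears in Delta history as "VACUUM START" / "VACUUM END"):&lt;/P&gt;

```python
# Hypothetical helper: decide whether a batch of last-commit operations
# represents a real data change or only table maintenance.
MAINTENANCE_PREFIXES = ("VACUUM", "OPTIMIZE", "FSCK", "REORG")

def has_real_change(last_operations):
    """Return True if any operation is not a maintenance commit.

    last_operations: iterable of Delta history 'operation' strings,
    e.g. ["VACUUM END", "WRITE"]. VACUUM is logged as
    "VACUUM START" / "VACUUM END", hence the prefix match.
    """
    return any(
        not op.upper().startswith(MAINTENANCE_PREFIXES)
        for op in last_operations
    )

# Examples:
# has_real_change(["VACUUM END", "OPTIMIZE"]) -> False
# has_real_change(["VACUUM END", "MERGE"]) -> True
```

&lt;P&gt;In the job, the guard task would feed this function the latest "operation" value from DESCRIBE HISTORY for each triggered table.&lt;/P&gt;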
&lt;P&gt;OPTION 2: USE THE "ALL TABLES UPDATED" CONDITION&lt;/P&gt;
&lt;P&gt;If your job depends on multiple source tables all being refreshed, set the trigger condition to "All tables updated" rather than "Any table updated." This way, a VACUUM on just one or two tables will not cause the job to run. The trigger will only fire once all monitored tables have been updated since the last run. This does not completely solve the problem (if all tables get vacuumed, it would still fire), but it significantly reduces false triggers in environments where VACUUM runs are staggered.&lt;/P&gt;
&lt;P&gt;OPTION 3: COORDINATE YOUR VACUUM SCHEDULE&lt;/P&gt;
&lt;P&gt;Consider scheduling VACUUM operations during a known maintenance window and combine them with the "Minimum time between triggers" setting. For example, if your daily data loads happen at 8:00 AM and your VACUUM jobs run at 2:00 AM, you could set the minimum time between triggers to avoid responding to changes during the maintenance window. This is not a filter, but it can help batch the trigger so that a VACUUM at 2:00 AM and a data load at 8:00 AM do not cause two separate runs.&lt;/P&gt;
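&lt;P&gt;For reference, both of these settings live on the job's trigger configuration. Here is a sketch of the payload you might send to the Jobs API to combine them. The field names follow my reading of the table update trigger docs (table_names, condition, min_time_between_triggers_seconds, wait_after_last_change_seconds) and the table names are placeholders, so double-check against the current API reference before relying on this:&lt;/P&gt;

```python
import json

# Sketch of a table-update trigger configuration combining Option 2
# (fire only when ALL monitored tables changed) and Option 3 (debounce
# rapid successive commits). Field names are my reading of the Jobs API
# docs; verify against the current reference.
trigger_settings = {
    "trigger": {
        "table_update": {
            "table_names": [
                "main.sales.orders",      # placeholder table names
                "main.sales.customers",
            ],
            "condition": "ALL_UPDATED",
            "min_time_between_triggers_seconds": 3600,
            "wait_after_last_change_seconds": 600,
        }
    }
}

# Payload shape for POST /api/2.1/jobs/update (job_id is a placeholder)
payload = {"job_id": 123, "new_settings": trigger_settings}
print(json.dumps(payload, indent=2))
```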
&lt;P&gt;OPTION 4: USE A WRAPPER JOB PATTERN&lt;/P&gt;
&lt;P&gt;Instead of triggering your main pipeline directly, create a lightweight "dispatcher" job that fires on table updates. The dispatcher checks whether the updates are data changes (using the DESCRIBE HISTORY approach from Option 1), and if so, programmatically triggers your main pipeline job via the Databricks Jobs API.&lt;/P&gt;
&lt;P&gt;import requests&lt;/P&gt;
&lt;P&gt;# After confirming a real data change occurred...&lt;BR /&gt;response = requests.post(&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;f"https://{workspace_url}/api/2.1/jobs/run-now",&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;headers={"Authorization": f"Bearer {token}"},&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;json={"job_id": YOUR_MAIN_JOB_ID},&lt;BR /&gt;)&lt;BR /&gt;response.raise_for_status()&lt;/P&gt;
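&lt;P&gt;A slightly fuller sketch of the dispatcher's decision flow, with the history lookup and the job-start call injected as callables so the logic can be tested without a workspace (the function and parameter names here are mine, purely illustrative):&lt;/P&gt;

```python
# Dispatcher sketch: trigger the main job only when the latest commit on
# at least one updated table was a real data change. The lookup and the
# job-start call are injectable, so this runs anywhere.
MAINTENANCE_PREFIXES = ("VACUUM", "OPTIMIZE", "FSCK", "REORG")

def dispatch(updated_tables, last_op_for, run_job):
    """updated_tables: table names from the trigger parameter.
    last_op_for: callable mapping a table name to its latest Delta
    history 'operation' string (in the real job: DESCRIBE HISTORY).
    run_job: callable that starts the main job (in the real job:
    POST to /api/2.1/jobs/run-now).
    Returns True if the main job was triggered."""
    for table in updated_tables:
        op = (last_op_for(table) or "").upper()
        if not op.startswith(MAINTENANCE_PREFIXES):
            run_job()
            return True
    return False
```

&lt;P&gt;In the dispatcher job itself, last_op_for would run the DESCRIBE HISTORY ... LIMIT 1 query from Option 1 and run_job would issue the requests.post call above.&lt;/P&gt;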
&lt;P&gt;RELEVANT DOCUMENTATION&lt;/P&gt;
&lt;P&gt;- Table update triggers: &lt;A href="https://docs.databricks.com/aws/en/jobs/trigger-table-update" target="_blank"&gt;https://docs.databricks.com/aws/en/jobs/trigger-table-update&lt;/A&gt;&lt;BR /&gt;- Job triggers overview: &lt;A href="https://docs.databricks.com/aws/en/jobs/triggers" target="_blank"&gt;https://docs.databricks.com/aws/en/jobs/triggers&lt;/A&gt;&lt;BR /&gt;- Delta table history: &lt;A href="https://docs.databricks.com/aws/en/delta/history" target="_blank"&gt;https://docs.databricks.com/aws/en/delta/history&lt;/A&gt;&lt;BR /&gt;- VACUUM documentation: &lt;A href="https://docs.databricks.com/aws/en/delta/vacuum" target="_blank"&gt;https://docs.databricks.com/aws/en/delta/vacuum&lt;/A&gt;&lt;BR /&gt;- Dynamic job parameters for triggers: &lt;A href="https://docs.databricks.com/aws/en/jobs/trigger-table-update" target="_blank"&gt;https://docs.databricks.com/aws/en/jobs/trigger-table-update&lt;/A&gt; (see "Reference updated tables and commit timestamps in job configurations")&lt;/P&gt;
&lt;P&gt;The ability to filter triggers by operation type would be a useful enhancement. If this is important to your workflow, I would encourage submitting a feature request through the Databricks Ideas portal, as community votes help prioritize the roadmap.&lt;/P&gt;
&lt;P&gt;* This reply was researched and drafted with an agent system I built, using the documentation I have available and previous memory. I personally review each draft for obvious issues, monitor the system's reliability, and update replies when I detect drift, but there is still a small chance something is inaccurate, especially if you are experimenting with brand-new features.&lt;/P&gt;
      <pubDate>Sun, 08 Mar 2026 07:31:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/scheduling-jobs-with-table-update-triggers/m-p/150182#M53291</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-08T07:31:15Z</dc:date>
    </item>
  </channel>
</rss>

