<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Observable API and Delta Table merge in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/observable-api-and-delta-table-merge/m-p/152875#M53890</link>
    <description>&lt;DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9268"&gt;@Malthe&lt;/a&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;You have hit a very specific, known behavioral gap in how Apache Spark and Delta Lake interact.&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;&lt;SPAN&gt;To answer your question directly: &lt;STRONG&gt;Yes&lt;/STRONG&gt;, the Observable API is effectively incompatible with Delta Table merges when used directly.&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;&lt;STRONG&gt;Why It Hangs Indefinitely&lt;/STRONG&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;The deadlock you are experiencing boils down to how Delta Lake plans its queries versus how Spark listens for metrics:&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;How Observation works:&lt;/STRONG&gt; The pyspark.sql.Observation object relies on a standard Spark action (such as .collect(), .count(), or a DataFrame write) completing on the observed DataFrame. When the action finishes, Spark fires a query-execution listener event, and that event is what populates your observation object.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;How Delta MERGE works:&lt;/STRONG&gt; A Delta MERGE is not processed as a standard Spark action. Internally, the Delta engine intercepts the logical plan, heavily modifies it to figure out matching/non-matching rows, and executes custom physical writes.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;The Clash:&lt;/STRONG&gt; During this plan rewriting, the logical node attached by .observe() often gets stripped out or fails to trigger the expected listener event.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;The Hang:&lt;/STRONG&gt; Observation.get simply blocks until the listener delivers the metrics, and it has no timeout. Because the listener never receives the signal that the data flowed through, the call waits forever.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV&gt;&lt;SPAN&gt;This is compounded inside foreachBatch, where you are dealing with a static micro-batch DataFrame, but the core issue remains the Delta execution plan itself.&lt;BR /&gt;&lt;BR /&gt;Feel free to add more details if I have misunderstood your issue, or if you need help with a workaround.&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;</description>
    <pubDate>Wed, 01 Apr 2026 08:53:48 GMT</pubDate>
    <dc:creator>AnthonyAnand</dc:creator>
    <dc:date>2026-04-01T08:53:48Z</dc:date>
    <item>
      <title>Observable API and Delta Table merge</title>
      <link>https://community.databricks.com/t5/data-engineering/observable-api-and-delta-table-merge/m-p/150575#M53474</link>
      <description>&lt;P&gt;Using the Observable API on the source dataframe to a Delta Table merge seems to hang indefinitely.&lt;/P&gt;&lt;P&gt;Steps to reproduce:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Create one or more &lt;A href="https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Observation.html" target="_self"&gt;pyspark.sql.Observation&lt;/A&gt; objects.&lt;/LI&gt;&lt;LI&gt;Use &lt;A href="https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.observe.html" target="_self"&gt;DataFrame.observe&lt;/A&gt; on the merge source.&lt;/LI&gt;&lt;LI&gt;Run merge.&lt;/LI&gt;&lt;LI&gt;Accessing &lt;A href="https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Observation.get.html#pyspark.sql.Observation.get" target="_self"&gt;Observation.get&lt;/A&gt; blocks indefinitely.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;The source dataframe here is a batch dataframe, executed within the&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/structured-streaming/foreach" target="_self"&gt;foreachBatch&lt;/A&gt;&amp;nbsp;framework on a streaming data source.&lt;/P&gt;&lt;P&gt;Is the Observable API not compatible with Delta Table merges?&lt;/P&gt;</description>
      <pubDate>Wed, 11 Mar 2026 11:01:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/observable-api-and-delta-table-merge/m-p/150575#M53474</guid>
      <dc:creator>Malthe</dc:creator>
      <dc:date>2026-03-11T11:01:23Z</dc:date>
    </item>
    <item>
      <title>Re: Observable API and Delta Table merge</title>
      <link>https://community.databricks.com/t5/data-engineering/observable-api-and-delta-table-merge/m-p/152875#M53890</link>
      <description>&lt;DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9268"&gt;@Malthe&lt;/a&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;You have hit a very specific, known behavioral gap in how Apache Spark and Delta Lake interact.&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;&lt;SPAN&gt;To answer your question directly: &lt;STRONG&gt;Yes&lt;/STRONG&gt;, the Observable API is effectively incompatible with Delta Table merges when used directly.&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;&lt;STRONG&gt;Why It Hangs Indefinitely&lt;/STRONG&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;The deadlock you are experiencing boils down to how Delta Lake plans its queries versus how Spark listens for metrics:&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;How Observation works:&lt;/STRONG&gt; The pyspark.sql.Observation object relies on a standard Spark action (such as .collect(), .count(), or a DataFrame write) completing on the observed DataFrame. When the action finishes, Spark fires a query-execution listener event, and that event is what populates your observation object.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;How Delta MERGE works:&lt;/STRONG&gt; A Delta MERGE is not processed as a standard Spark action. Internally, the Delta engine intercepts the logical plan, heavily modifies it to figure out matching/non-matching rows, and executes custom physical writes.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;The Clash:&lt;/STRONG&gt; During this plan rewriting, the logical node attached by .observe() often gets stripped out or fails to trigger the expected listener event.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;STRONG&gt;The Hang:&lt;/STRONG&gt; Observation.get simply blocks until the listener delivers the metrics, and it has no timeout. Because the listener never receives the signal that the data flowed through, the call waits forever.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV&gt;&lt;SPAN&gt;This is compounded inside foreachBatch, where you are dealing with a static micro-batch DataFrame, but the core issue remains the Delta execution plan itself.&lt;BR /&gt;&lt;BR /&gt;Feel free to add more details if I have misunderstood your issue, or if you need help with a workaround.&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Wed, 01 Apr 2026 08:53:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/observable-api-and-delta-table-merge/m-p/152875#M53890</guid>
      <dc:creator>AnthonyAnand</dc:creator>
      <dc:date>2026-04-01T08:53:48Z</dc:date>
    </item>
  </channel>
</rss>

