<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT | Cluster terminated by System-User | INTERNAL_ERROR: Communication lost with driver. Cluster 0312-140502-k9monrjc was not reachable for 120 seconds in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-cluster-terminated-by-system-user-internal-error/m-p/7854#M3622</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Could you please confirm your cluster configuration details? Also, have you verified the network configuration between the control plane and the data plane?&lt;/P&gt;&lt;P&gt;Please tag&amp;nbsp;&lt;A href="https://community.databricks.com/s/profile/0053f000000WWwvAAG" alt="https://community.databricks.com/s/profile/0053f000000WWwvAAG" target="_blank"&gt;@Debayan&lt;/A&gt;&amp;nbsp;in your next response so that I am notified. Thank you!&lt;/P&gt;</description>
    <pubDate>Mon, 13 Mar 2023 06:22:01 GMT</pubDate>
    <dc:creator>Debayan</dc:creator>
    <dc:date>2023-03-13T06:22:01Z</dc:date>
    <item>
      <title>DLT | Cluster terminated by System-User | INTERNAL_ERROR: Communication lost with driver. Cluster 0312-140502-k9monrjc was not reachable for 120 seconds</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-cluster-terminated-by-system-user-internal-error/m-p/7853#M3621</link>
      <description>&lt;P&gt;Dear Community, I hope you are doing well.&lt;/P&gt;&lt;P&gt;For the last couple of days I have been seeing a strange issue with my DLT pipeline: every 60-70 minutes it fails in &lt;B&gt;&lt;I&gt;continuous mode&lt;/I&gt;&lt;/B&gt; with the &lt;B&gt;&lt;U&gt;error:&lt;/U&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;INTERNAL_ERROR: Communication lost with driver. Cluster 0312-140502-k9monrjc was not reachable for 120 seconds&lt;/U&gt;&lt;/B&gt; at 2023-03-12 21:26:51&amp;nbsp;IST (please see the screenshot).&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="DLT_ERROR"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/556iB5B719D04E32B648/image-size/large?v=v2&amp;amp;px=999" role="button" title="DLT_ERROR" alt="DLT_ERROR" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;When I check the cluster events for the driver, they show &lt;B&gt;"Cluster terminated by system-user"&lt;/B&gt; (at 2023-03-12 21:26:47 IST), and I could not find any details associated with this event. This happens again and again: each time the pipeline restarts, it runs fine for about 1 to 1.5 hours and then fails the same way.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="DLT_Cluster_events"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/552i5F074E01591F9BB2/image-size/large?v=v2&amp;amp;px=999" role="button" title="DLT_Cluster_events" alt="DLT_Cluster_events" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Could anyone please help us, on priority, understand what is happening and why it started suddenly? No changes were made recently to the pipeline code or to the data volume. I also tried increasing the workers from 6 to 10, but the issue remains the same.&lt;/P&gt;&lt;P&gt;Please note that it was running fine earlier with 6 workers.&lt;/P&gt;&lt;P&gt;Provider: Azure Databricks&lt;/P&gt;&lt;P&gt;&lt;B&gt;Any help on priority would be really appreciated, as this is impacting our production data pipelines.&lt;/B&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 12 Mar 2023 17:20:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-cluster-terminated-by-system-user-internal-error/m-p/7853#M3621</guid>
      <dc:creator>vgupta</dc:creator>
      <dc:date>2023-03-12T17:20:31Z</dc:date>
    </item>
    <item>
      <title>Re: DLT | Cluster terminated by System-User | INTERNAL_ERROR: Communication lost with driver. Cluster 0312-140502-k9monrjc was not reachable for 120 seconds</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-cluster-terminated-by-system-user-internal-error/m-p/7855#M3623</link>
      <description>&lt;P&gt;Thanks for your response, @Debayan Mukherjee.&lt;/P&gt;&lt;P&gt;Below is a screenshot of the cluster configuration:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/548i1161DE1C102A6937/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;If I understand correctly, we currently have no restrictions at the network layer between the control plane and the data plane; everything is at the defaults.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/557i22EB590F983E0A75/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Please let me know if you are looking for anything specific in the networking configuration.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Mar 2023 07:18:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-cluster-terminated-by-system-user-internal-error/m-p/7855#M3623</guid>
      <dc:creator>vgupta</dc:creator>
      <dc:date>2023-03-14T07:18:16Z</dc:date>
    </item>
    <item>
      <title>Re: DLT | Cluster terminated by System-User | INTERNAL_ERROR: Communication lost with driver. Cluster 0312-140502-k9monrjc was not reachable for 120 seconds</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-cluster-terminated-by-system-user-internal-error/m-p/7854#M3622</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Could you please confirm your cluster configuration details? Also, have you verified the network configuration between the control plane and the data plane?&lt;/P&gt;&lt;P&gt;Please tag&amp;nbsp;&lt;A href="https://community.databricks.com/s/profile/0053f000000WWwvAAG" alt="https://community.databricks.com/s/profile/0053f000000WWwvAAG" target="_blank"&gt;@Debayan&lt;/A&gt;&amp;nbsp;in your next response so that I am notified. Thank you!&lt;/P&gt;</description>
      <pubDate>Mon, 13 Mar 2023 06:22:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-cluster-terminated-by-system-user-internal-error/m-p/7854#M3622</guid>
      <dc:creator>Debayan</dc:creator>
      <dc:date>2023-03-13T06:22:01Z</dc:date>
    </item>
    <item>
      <title>Re: DLT | Cluster terminated by System-User | INTERNAL_ERROR: Communication lost with driver. Cluster 0312-140502-k9monrjc was not reachable for 120 seconds</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-cluster-terminated-by-system-user-internal-error/m-p/7856#M3624</link>
      <description>&lt;P&gt;Hi @Vishnu Gupta, thanks for the details.&lt;/P&gt;&lt;P&gt;Please refer to &lt;A href="https://kb.databricks.com/en_US/jobs/driver-unavailable" alt="https://kb.databricks.com/en_US/jobs/driver-unavailable" target="_blank"&gt;https://kb.databricks.com/en_US/jobs/driver-unavailable&lt;/A&gt;, which probably describes the issue here.&lt;/P&gt;&lt;P&gt;Please let us know if this helps, and tag&amp;nbsp;&lt;A href="https://community.databricks.com/s/profile/0053f000000WWwvAAG" alt="https://community.databricks.com/s/profile/0053f000000WWwvAAG" target="_blank"&gt;@Debayan&lt;/A&gt;&amp;nbsp;in your next response so that I am notified. Thank you!&lt;/P&gt;</description>
      <pubDate>Thu, 16 Mar 2023 05:49:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-cluster-terminated-by-system-user-internal-error/m-p/7856#M3624</guid>
      <dc:creator>Debayan</dc:creator>
      <dc:date>2023-03-16T05:49:00Z</dc:date>
    </item>
    <item>
      <title>Re: DLT | Cluster terminated by System-User | INTERNAL_ERROR: Communication lost with driver. Cluste</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-cluster-terminated-by-system-user-internal-error/m-p/44068#M27600</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/26078"&gt;@Debayan&lt;/a&gt;, I am facing the same issue while running a Delta Live Tables pipeline. The job runs in production but does not work in dev. I have tried increasing the number of worker nodes, but it did not help. Can you please help with this?&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Reddy24_0-1694165445213.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/3608i5C7CCAAB57D9E8AC/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="Reddy24_0-1694165445213.png" alt="Reddy24_0-1694165445213.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 08 Sep 2023 09:31:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-cluster-terminated-by-system-user-internal-error/m-p/44068#M27600</guid>
      <dc:creator>Reddy-24</dc:creator>
      <dc:date>2023-09-08T09:31:19Z</dc:date>
    </item>
    <item>
      <title>Re: DLT | Cluster terminated by System-User | INTERNAL_ERROR: Communication lost with driver. Cluste</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-cluster-terminated-by-system-user-internal-error/m-p/117851#M45577</link>
      <description>&lt;P&gt;We had a similar error in one of our DLT pipelines. This can sometimes be caused by the compute size; we increased the compute size for our DLT pipeline but still saw this error while processing a very large file.&lt;/P&gt;&lt;P&gt;We then added the parameters below to the DLT pipeline configuration, raising the timeout from its default of 120s to 3600s, after which the pipeline ran successfully:&lt;/P&gt;&lt;P&gt;pipeline.timeout=3600s&lt;BR /&gt;pipeline.clusterShutdown.delay=120s&lt;/P&gt;</description>
      <pubDate>Tue, 06 May 2025 08:53:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-cluster-terminated-by-system-user-internal-error/m-p/117851#M45577</guid>
      <dc:creator>Rahiman</dc:creator>
      <dc:date>2025-05-06T08:53:49Z</dc:date>
    </item>
  </channel>
</rss>

