<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT Pipeline failing (due &amp;gt; 500 tables) any graph tables limitation in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-pipeline-failing-due-amp-gt-500-tables-any-graph-tables/m-p/79889#M35871</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/112779"&gt;@venkatgmf&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Yes, you are right that a high number of tables could be the problem.&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;If the driver node is becoming unresponsive due to garbage collection (GC), it might be a sign that the resources allocated to the driver are insufficient.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;To manage the ingestion of a large number of tables, consider batching them: create multiple DLT pipelines, each handling a subset of the tables. This distributes the load across pipelines, reduces the pressure on any single pipeline, and may mitigate the GC issue.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;In terms of compute type on Azure, consider using larger VM sizes for your Databricks clusters, especially for the driver node, to handle the load of reading a large number of tables. The right VM size depends on the size and complexity of your tables.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Also consider tuning the Spark configurations related to memory management and GC. For instance, you can adjust the Spark driver memory, the fraction of memory dedicated to Spark's storage and execution, and the GC settings.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Could you also attach the cluster logs? In addition, take a look at the article below to find the most probable cause of this issue:&lt;/P&gt;&lt;P&gt;&lt;A href="https://kb.databricks.com/en_US/jobs/driver-unavailable" target="_blank" rel="noopener"&gt;https://kb.databricks.com/en_US/jobs/driver-unavailable&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 22 Jul 2024 15:15:12 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2024-07-22T15:15:12Z</dc:date>
    <item>
      <title>DLT Pipeline failing (due &amp;gt; 500 tables) any graph tables limitation</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipeline-failing-due-amp-gt-500-tables-any-graph-tables/m-p/79879#M35869</link>
      <description>&lt;P&gt;The DLT pipeline is failing due to &lt;SPAN&gt;&lt;SPAN class=""&gt;INTERNAL_ERROR: Communication lost with driver. Cluster 0719-162209-rx37csry was not reachable for 120 seconds&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="DLT communication error.png" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/9778i2C3E06FA2311F160/image-size/large/is-moderation-mode/true?v=v2&amp;amp;px=999" role="button" title="DLT communication error.png" alt="DLT communication error.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 14:26:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipeline-failing-due-amp-gt-500-tables-any-graph-tables/m-p/79879#M35869</guid>
      <dc:creator>venkatgmf</dc:creator>
      <dc:date>2024-07-22T14:26:26Z</dc:date>
    </item>
    <item>
      <title>Re: DLT Pipeline failing (due &amp;gt; 500 tables) any graph tables limitation</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-pipeline-failing-due-amp-gt-500-tables-any-graph-tables/m-p/79889#M35871</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/112779"&gt;@venkatgmf&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Yes, you are right that a high number of tables could be the problem.&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;If the driver node is becoming unresponsive due to garbage collection (GC), it might be a sign that the resources allocated to the driver are insufficient.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;To manage the ingestion of a large number of tables, consider batching them: create multiple DLT pipelines, each handling a subset of the tables. This distributes the load across pipelines, reduces the pressure on any single pipeline, and may mitigate the GC issue.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;In terms of compute type on Azure, consider using larger VM sizes for your Databricks clusters, especially for the driver node, to handle the load of reading a large number of tables. The right VM size depends on the size and complexity of your tables.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Also consider tuning the Spark configurations related to memory management and GC. For instance, you can adjust the Spark driver memory, the fraction of memory dedicated to Spark's storage and execution, and the GC settings.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Could you also attach the cluster logs? In addition, take a look at the article below to find the most probable cause of this issue:&lt;/P&gt;&lt;P&gt;&lt;A href="https://kb.databricks.com/en_US/jobs/driver-unavailable" target="_blank" rel="noopener"&gt;https://kb.databricks.com/en_US/jobs/driver-unavailable&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 15:15:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-pipeline-failing-due-amp-gt-500-tables-any-graph-tables/m-p/79889#M35871</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-07-22T15:15:12Z</dc:date>
    </item>
  </channel>
</rss>

