
DLT Pipeline failing (due to > 500 tables): is there a limit on tables in the pipeline graph?

venkatgmf
New Contributor II

DLT pipeline failing due to INTERNAL_ERROR: Communication lost with driver. Cluster 0719-162209-rx37csry was not reachable for 120 seconds.

[Attachment: DLT communication error.png]

1 ACCEPTED SOLUTION

Accepted Solutions

szymon_dybczak
Contributor

Hi @venkatgmf ,

Yeah, you are right that a high number of tables could be a problem.

If you're experiencing issues with the driver node becoming unresponsive due to garbage collection (GC), it might be a sign that the resources allocated to the driver are insufficient.

To manage the ingestion of a large number of tables, you can consider batching the tables. You can create multiple DLT pipelines, each handling a subset of the tables. This way, you distribute the load across multiple pipelines, reducing the pressure on a single pipeline and potentially mitigating the GC issue.

In terms of compute type on Azure, you might want to consider using larger VM sizes for your Databricks clusters, especially for the driver node, to handle the load of reading a large number of tables. The choice of VM size depends on the size and complexity of your tables.

Also, consider tuning the Spark configurations related to memory management and GC. For instance, you can adjust the Spark driver memory, the fraction of memory dedicated to Spark's storage and execution, and the GC settings.
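For illustration, here is a minimal sketch of the batching idea in a DLT Python notebook. It assumes each pipeline is given a "table_group" value in its pipeline configuration and only defines tables for its own slice; the table names, landing paths, source format, and group size are hypothetical placeholders, not anything from your setup:

import dlt

# "spark" is provided by the Databricks runtime inside a DLT notebook.
# Each pipeline instance handles only one group of tables, selected via the
# pipeline configuration key "table_group" (hypothetical key name).
ALL_TABLES = [f"source_table_{i:03d}" for i in range(500)]  # placeholder table names
GROUP_SIZE = 100
group = int(spark.conf.get("table_group", "0"))
tables_for_this_pipeline = ALL_TABLES[group * GROUP_SIZE:(group + 1) * GROUP_SIZE]

def define_table(table_name: str):
    # Define one streaming live table per source table in this pipeline's group.
    @dlt.table(name=table_name, comment=f"Ingested copy of {table_name}")
    def _ingest():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "parquet")  # assumed source format
            .load(f"/mnt/raw/{table_name}")          # hypothetical landing path
        )

for t in tables_for_this_pipeline:
    define_table(t)

You would then create several pipelines from this same notebook, each differing only in the table_group value set in its configuration, so no single driver has to plan the full graph of 500+ tables.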

Could you also attach the cluster logs? Also, take a look at the article below to find the most probable cause of this issue:

https://kb.databricks.com/en_US/jobs/driver-unavailable

