
Delta Live Tables for a large number of tables

priyanananthram
New Contributor II

Hi There 

I am hoping for some guidance. I have some 850 tables that I need to ingest using a DLT pipeline. When I do this, my event log shows that the driver node dies or becomes unresponsive, likely due to GC.

Can DLT be used to ingest a large number of tables?

Is there some way for me to batch these tables so that I can create DLT tables 50 or so at a time? My tables will be streaming tables, and the plan is for them to run continuously.

What can I do to ameliorate these?

I am on Azure. Is there a particular compute type that would be beneficial for reading a larger number of tables?

Kind Regards

Priya

4 REPLIES

Faisal
Contributor

This can be controlled at the workflow level. My suggestion would be to batch the tables by schema.

priyanananthram
New Contributor II

The only issue, though, is that the tables are largely from one schema 🙂 I wonder if there is an upper limit on the number of tables in a DLT pipeline?

Sidhant07
New Contributor III

Delta Live Tables (DLT) can indeed be used to ingest a large number of tables. However, if you're experiencing issues with the driver node becoming unresponsive due to garbage collection (GC), it might be a sign that the resources allocated to the driver are insufficient.

To manage the ingestion of a large number of tables, you can consider batching the tables. You can create multiple DLT pipelines, each handling a subset of the tables. This way, you can distribute the load across multiple pipelines, reducing the pressure on a single pipeline and potentially mitigating the GC issue.

In terms of compute type on Azure, you might want to consider using larger VM sizes for your Databricks clusters, especially for the driver node, to handle the load of reading a large number of tables. The choice of VM size would depend on the size and complexity of your tables.

Also, consider tuning the Spark configurations related to memory management and GC. For instance, you can adjust the Spark driver memory, the fraction of memory dedicated to Spark's storage and execution, and the GC settings.
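As a rough illustration of the batching idea, here is a minimal Python sketch that splits a list of table names into groups of 50 and builds a partial settings fragment per pipeline with a larger driver. Everything specific in it is an assumption rather than something from this thread: the table names, the source path layout, the Auto Loader (`cloudFiles`) source with a `parquet` format, the `ingest.batch_index` configuration key, and the `Standard_E8ds_v4` node types are placeholders to adapt.

```python
from typing import Dict, Iterator, List, Sequence


def batch_tables(table_names: Sequence[str], batch_size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `batch_size` table names."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(table_names), batch_size):
        yield list(table_names[start:start + batch_size])


def pipeline_settings_for_batch(batch_index: int) -> Dict:
    """Build a partial settings fragment for one pipeline (not a full spec)."""
    return {
        "name": f"ingest_batch_{batch_index:02d}",  # hypothetical naming scheme
        "continuous": True,  # the tables are meant to run continuously
        # Hypothetical key that the pipeline code can read to pick its batch.
        "configuration": {"ingest.batch_index": str(batch_index)},
        "clusters": [
            {
                "label": "default",
                "node_type_id": "Standard_E8ds_v4",         # example size only
                "driver_node_type_id": "Standard_E8ds_v4",  # example size only
            }
        ],
    }


def define_streaming_tables(spark, table_names: Sequence[str], source_root: str) -> None:
    """Register one streaming table per name. Runs only inside a DLT pipeline."""
    import dlt  # only importable in the Delta Live Tables runtime

    def make_table(name: str) -> None:
        # A factory function, so each table definition captures its own `name`.
        @dlt.table(name=name)
        def _ingest():
            return (
                spark.readStream.format("cloudFiles")
                .option("cloudFiles.format", "parquet")  # assumed source format
                .load(f"{source_root}/{name}")            # assumed path layout
            )

    for name in table_names:
        make_table(name)


# Hypothetical names standing in for the 850 source tables.
all_tables = [f"table_{i:03d}" for i in range(850)]
batches = list(batch_tables(all_tables, batch_size=50))
settings = [pipeline_settings_for_batch(i) for i in range(len(batches))]
```

Inside each pipeline, the source code could read its batch index from the pipeline configuration and call `define_streaming_tables` with only that batch, so no single driver has to plan all 850 streams at once. The batch size of 50 is just the figure from the question; it is worth checking driver memory and GC behaviour in the event log before settling on a size.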

 

Kaniz
Community Manager

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
