Data Engineering

Delta Live Tables: Too much time to do the "setting up"

jorgemarmol
New Contributor II

Hello community!

Recently I have been working with Delta Live Tables on a big project. My team and I have studied it a lot, and we have finally built a good pipeline with CDC that loads 608 entities (and, therefore, 608 Delta Live Tables and 608 materialized views).

The issue is not the load, even though there are many tables; the issue is the setup process. The total run time is 1 h 30 min, of which almost 1 h (or more) is spent in the "setting up tables" step.

[Attached screenshot: jorgemarmol_0-1688633577282.png]

 

Does anyone have an idea why this could be happening, or how I could speed up this first step? Thank you so much!

10 REPLIES

PearceR
New Contributor III

I have been having similar problems. Any insight into this would be great.

anandh
New Contributor II

I am facing the same issue when acquiring resources to run the DLT pipeline: it takes too much time to set up the resources for a DLT run.

PearceR
New Contributor III

In the end I had to scrap DLT for this many entities. The other option is to run it continuously, but that is expensive.

-werners-
Esteemed Contributor III

Interesting...
DLT probably spends x seconds/table for the setup.
If you have time, you could do some tests to see if the table setup scales linearly (e.g. 1 table takes 5 s to set up, 10 tables take 50 s, etc.).
If you do, please share the outcome.
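
A minimal sketch of how such a test could be evaluated: fit a straight line to (table count, setup seconds) pairs and read off a fixed overhead plus a per-table cost. The measurements below are made up purely for illustration, and `fit_setup_time` is a hypothetical helper, not part of any Databricks API.

```python
from statistics import mean


def fit_setup_time(measurements):
    """Least-squares fit of: setup_seconds = fixed + per_table * n_tables.

    measurements: list of (n_tables, setup_seconds) pairs.
    Returns (fixed_overhead_seconds, seconds_per_table).
    """
    xs = [n for n, _ in measurements]
    ys = [s for _, s in measurements]
    x_bar, y_bar = mean(xs), mean(ys)
    # Standard simple-linear-regression formulas.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    per_table = sxy / sxx
    fixed = y_bar - per_table * x_bar
    return fixed, per_table


# HYPOTHETICAL data: replace with setup times observed in your own runs.
measurements = [(1, 35), (10, 80), (50, 280), (100, 530)]

fixed, per_table = fit_setup_time(measurements)
print(f"fixed overhead: {fixed:.1f} s, per table: {per_table:.2f} s")
```

If the points sit close to the fitted line, the setup is roughly linear in the number of tables. If larger pipelines land well above it, something worse than linear is going on.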

charl-p-botha
New Contributor III

Same issue here. Medallion architecture in a test setup with 150 tables. SETTING_UP_TABLES takes 11 to 15 minutes, everything else together takes 10 minutes to do the incremental update (40K bronze rows, these fan out to asset tables, not much data!).

This is with serverless DLT, ADLS gen2 storage.

Any ideas how to make SETTING_UP_TABLES faster with this number of tables?
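
For a rough sense of scale, the figures reported in this thread can be turned into a per-table setup cost. This is only a back-of-the-envelope sketch: it assumes setup time is spread evenly across tables (nobody here has confirmed that), it rounds the "almost 1 h" from the original post to 60 minutes, and `seconds_per_table` is just an illustrative helper name.

```python
def seconds_per_table(setup_minutes, n_tables):
    """Average setup time per table, assuming an even spread across tables."""
    return setup_minutes * 60 / n_tables


# Figures quoted in this thread (the 60 min value is an approximation).
reports = [
    ("608 tables, ~60 min", 60, 608),
    ("150 tables, 11 min", 11, 150),
    ("150 tables, 15 min", 15, 150),
]

for label, minutes, tables in reports:
    print(f"{label}: {seconds_per_table(minutes, tables):.1f} s per table")
```

Both reports land in the same range of a few seconds per table, although they come from different pipelines and compute setups, so the comparison is only indicative.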

charl-p-botha
New Contributor III

We got a performance boost with the latest DLT preview release, see https://community.databricks.com/t5/data-engineering/thank-you-for-the-quot-setting-up-tables-quot-s...

With my 300 table test setup, this speedup was consistent.

I'm currently testing with 900 tables (there's a 1000 table limit for a DLT pipeline) to see how it goes, but I am also considering trying it without DLT.

In the end we ditched DLT. It offers the nice advantage of a quick setup, but it couldn't handle the entity scaling. We can do everything DLT can do in the native setup (and Workflows are handy), so I would recommend the switch.

charl-p-botha
New Contributor III

Thank you @PearceR for your recommendation, which aligns nicely with my experimentation up to now. I'll get started on moving away from DLT, which indeed was very handy for getting everything set up quickly.

It's a pity we can't have our cake and eat it. 😉

Good luck!

DataEngineer
New Contributor II

Increase the worker and driver to a higher configuration on the pipeline. The setup will still take time initially, but once it is complete, the ingestion will be faster. That is where you can save the one hour that the ingestion took.

 
