07-06-2023 01:54 AM
Hello community!
Recently I have been working with Delta Live Tables on a big project. My team and I have studied it a lot and have finally built a good CDC pipeline that loads 608 entities (and, therefore, 608 Delta Live Tables and 608 materialized views).
The issue is not the load itself, even though there are many tables; the issue is the setup process. The total run time is 1 h 30 min, of which almost 1 h (or more) is spent setting up the tables.
Does anyone have an idea why this could be happening, or how I could speed up this first step? Thank you so much!
07-16-2023 05:12 AM
I have been having similar problems; any insight into this would be great.
01-09-2024 04:20 AM
I am facing the same issue when acquiring resources to run the DLT pipeline: it takes too much time to set up the resources for a DLT run.
01-09-2024 06:03 AM
In the end I had to scrap DLT for this many entities. The other option is to run it continuously, but that is expensive.
01-09-2024 07:44 AM
Interesting...
DLT probably spends x seconds/table for the setup.
If you have time, you could run some tests to see whether the table setup scales linearly (1 table: 5 sec for setup, 10 tables: 50 sec, etc.).
If you do, please share the outcome.
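For anyone who wants to try the linear-scaling test, here is a minimal sketch of how the numbers could be checked once you have them. It assumes you note down the setup duration by hand for a few pipeline sizes; the measurements below are made up (they just repeat the 5 sec/table example), and `fit_line` and `r_squared` are hypothetical helper names, not part of any Databricks API.

```python
# Hypothetical measurements: (number of tables, setup time in seconds).
# Replace these with durations observed in your own pipeline runs.
measurements = [(1, 5.0), (10, 50.0), (50, 250.0)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination; values near 1 mean a good linear fit."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot


tables = [m[0] for m in measurements]
seconds = [m[1] for m in measurements]
slope, intercept = fit_line(tables, seconds)
fit_quality = r_squared(tables, seconds, slope, intercept)

print(f"per-table setup cost: {slope:.2f} s, fixed overhead: {intercept:.2f} s")
print(f"R^2 of the linear fit: {fit_quality:.3f}")
```

If the fit is good, the slope is the per-table setup cost and the intercept is the fixed overhead. An R^2 well below 1, or a per-table cost that grows with the pipeline size, would suggest the setup does not scale linearly.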
3 weeks ago
Same issue here. Medallion architecture in a test setup with 150 tables. SETTING_UP_TABLES takes 11 to 15 minutes, everything else together takes 10 minutes to do the incremental update (40K bronze rows, these fan out to asset tables, not much data!).
This is with serverless DLT, ADLS gen2 storage.
Any ideas how to make SETTING_UP_TABLES faster with this number of tables?
3 weeks ago
We got a performance boost with the latest DLT preview release, see https://community.databricks.com/t5/data-engineering/thank-you-for-the-quot-setting-up-tables-quot-s...
With my 300 table test setup, this speedup was consistent.
I'm currently testing with 900 tables (there's a 1000 table limit for a DLT pipeline) to see how it goes, but I am also considering trying it without DLT.
3 weeks ago
In the end we ditched DLT. It gives the nice advantage of a quick setup, but it couldn't handle the entity scaling. You can do everything DLT can do in the native setup (and workflows are handy), so I would recommend the switch.
2 weeks ago
Thank you @PearceR for your recommendation, which aligns nicely with my experimentation up to now. I'll get started on moving away from DLT, which indeed was very handy for getting everything set up quickly.
It's a pity we can't have our cake and eat it. 😉
2 weeks ago
Good luck!
2 weeks ago
Increase the worker and driver to a higher configuration on the pipeline. The initial setup will still take time, but once it is completed, the ingestion will be faster. That way you can save the one hour the ingestion took.