Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT setup taking longer than actually building the tables

dsldan
New Contributor II

Hi all!

We are using DLT for our ETL jobs, and we're noticing that the setup steps (Initializing, Resetting tables, Setting up tables, Rendering graph) take much longer than actually ETL'ing the data into our tables. We have about 110 tables combined across our bronze, silver, and gold layers. Using a bigger compute cluster does not seem to reduce this time.

Is there any way to speed this up?

Cheers,

Daan

Accepted Solution

Anonymous
Not applicable

@daan duppen:

There are a few strategies you could try to reduce your Delta Live Tables setup time:

  1. Minimize the number of tables: 110 tables is a significant number, and the per-table setup time adds up quickly. Consolidate tables where possible and remove any that are no longer needed (see the first sketch after this list).
  2. Optimize your table schema: keep each table's schema as streamlined as possible. Remove unnecessary columns, use appropriate data types, and leverage Delta Lake optimizations such as partitioning and clustering (see the second sketch after this list).
  3. Optimize your data source: If your data is coming from an external source, consider optimizing that source to reduce the amount of data that needs to be read and processed during setup.
  4. Optimize your ETL code: Ensure that your ETL code is as efficient as possible, with appropriate use of parallelism and optimized SQL queries.
  5. Leverage caching: Consider using caching to reduce the amount of time required to load data into your tables. This can be particularly effective if you are repeatedly querying the same data set.
  6. Leverage pre-processing: Depending on the nature of your data, it may be possible to pre-process it before loading it into your tables. For example, you could aggregate data at a higher level before loading it into your tables, or remove any unnecessary data that is not required for downstream processing.
  7. Use a performance profiling tool: profile your ETL job to identify bottlenecks that could be inflating setup time. For DLT, the pipeline's event log is a natural starting point (see the event-log sketch at the end of this answer).
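
For item 1, one way to keep the table count visible and easy to trim is to generate similar tables from a single parameterized function, a common DLT pattern. A minimal sketch, with hypothetical table names and source paths (this runs inside a DLT pipeline notebook, where `spark` and `dlt` are available):

```python
import dlt

# Hypothetical config: one entry per bronze table. Centralizing the
# definitions like this makes it easy to see and prune the table count.
SOURCES = {
    "bronze_orders": "/mnt/raw/orders/",       # placeholder path
    "bronze_customers": "/mnt/raw/customers/", # placeholder path
}

def make_bronze_table(name, path):
    # Factory function so each loop iteration binds its own name/path.
    @dlt.table(name=name)
    def _table():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load(path)
        )

for name, path in SOURCES.items():
    make_bronze_table(name, path)
```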

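For item 2, a sketch of a lean table definition: an explicit schema avoids inference work, dropping columns early keeps the table narrow, and partitioning on a low-cardinality column keeps the file layout predictable. The table name, columns, and upstream table are placeholders:

```python
import dlt
from pyspark.sql import functions as F

# Hypothetical silver table: explicit schema, only the needed columns,
# partitioned on a low-cardinality date column.
@dlt.table(
    name="silver_events",
    partition_cols=["event_date"],
    schema="event_id STRING, ts TIMESTAMP, event_date DATE",
)
def silver_events():
    return (
        dlt.read_stream("bronze_events")   # hypothetical upstream table
        .select("event_id", "ts")          # drop unneeded columns early
        .withColumn("event_date", F.to_date("ts"))
    )
```
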
The right mix of strategies depends on the specifics of your data and use case, so you may need to experiment with different approaches and measure the impact on setup time to find the most effective one.
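
And for item 7: a DLT pipeline writes a timestamped event log (initialization, table setup, flow progress), so you can see exactly where the time goes. A sketch for reading it from a regular notebook, assuming the pipeline has a configured storage location (the path below is a placeholder):

```python
from pyspark.sql import functions as F

# The DLT event log is stored as a Delta table under the pipeline's
# storage location; replace the placeholder path with yours.
events = spark.read.format("delta").load(
    "dbfs:/pipelines/<pipeline-id>/system/events"
)

# Timestamped lifecycle events show how long each phase takes.
(events
 .select("timestamp", "event_type", "message")
 .orderBy("timestamp")
 .show(truncate=False))
```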


Anonymous
Not applicable

Hi @daan duppen

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
