Streaming delta table - Performance with incremental refresh

Fnazar
New Contributor

Hi Team,

We are hitting performance issues with streaming live Delta tables, specifically when processing large tables of more than 10 million rows.
What workarounds are there for loading large tables like these into streaming live tables?
Also, if we can use PARTITION BY, please help me with the syntax.

Thanks

1 REPLY

Priyanka_Biswas
Valued Contributor

Hi @Fnazar 

Streaming ingestion often produces many small files, which makes reads inefficient. Use Delta Lake's OPTIMIZE command to compact them into larger files, and ZORDER BY to colocate related information in the same set of files. Z-ordering is particularly useful for columns that are often used together in query filters.
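As a rough sketch of the compaction step, the helper below builds an OPTIMIZE statement with an optional ZORDER BY clause and runs it through an existing SparkSession. The function names, the table name and the column name are hypothetical; `spark` is assumed to be the session that a Databricks notebook provides.

```python
def build_optimize_sql(table, zorder_cols=()):
    """Return an OPTIMIZE statement, optionally with a ZORDER BY clause."""
    stmt = f"OPTIMIZE {table}"
    if zorder_cols:
        stmt += " ZORDER BY (" + ", ".join(zorder_cols) + ")"
    return stmt


def compact_table(spark, table, zorder_cols=()):
    """Run OPTIMIZE on a Delta table using an existing SparkSession."""
    return spark.sql(build_optimize_sql(table, zorder_cols))


# Example call (hypothetical table and column names):
# compact_table(spark, "my_schema.my_table", ["customer_id"])
```

Note that Z-ordering applies to non-partition columns, so the Z-order column here is deliberately different from the partition column.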

For partitioning, select a column whose values split the data evenly. Common choices include dates (for time-based data) or a categorical column that is well balanced.
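One way to check whether a candidate column is evenly distributed is to compare the largest partition with the average one. The sketch below is only an illustration: `partition_skew` is a hypothetical helper, and the row counts are assumed to come from something like `df.groupBy("date_column").count()` collected into a dictionary.

```python
def partition_skew(row_counts):
    """Ratio of the largest partition to the mean partition size.

    A value close to 1.0 means the column splits the data evenly;
    larger values mean a few partitions hold most of the rows.
    """
    sizes = list(row_counts.values())
    if not sizes or sum(sizes) == 0:
        raise ValueError("row_counts must contain at least one non-empty partition")
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean


# Made-up counts per date, for illustration only
counts = {"2024-01-01": 30, "2024-01-02": 10}
skew = partition_skew(counts)
```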

When creating or writing to a Delta table, you can specify the partitioning with the PARTITIONED BY clause in SQL or with partitionBy in the DataFrame API. For instance, if you're partitioning by a date column: df.write.format("delta").partitionBy("date_column").save("/mnt/delta/my_table")

This command creates one partition in the Delta table for each distinct value in date_column.
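The batch write above has a streaming counterpart. The following is a minimal sketch, assuming a plain Structured Streaming job rather than a Delta Live Tables pipeline; the function name and the checkpoint path are hypothetical, and `stream_df` is assumed to be a streaming DataFrame.

```python
def start_partitioned_stream(stream_df, table_path, checkpoint_path, partition_col):
    """Append a streaming DataFrame to a Delta table partitioned by one column."""
    return (
        stream_df.writeStream
        .format("delta")
        .outputMode("append")
        .partitionBy(partition_col)
        # Structured Streaming needs a checkpoint location to track progress
        .option("checkpointLocation", checkpoint_path)
        .start(table_path)
    )


# Example call (paths are placeholders):
# start_partitioned_stream(stream_df, "/mnt/delta/my_table",
#                          "/mnt/delta/_checkpoints/my_table", "date_column")
```

In a Delta Live Tables Python pipeline, the equivalent setting should be the partition_cols argument of the @dlt.table decorator, for example @dlt.table(partition_cols=["date_column"]).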

If you're ingesting streaming data into Delta Lake, consider using Auto Loader for efficient and incremental processing of new data.
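A minimal Auto Loader read might look like the sketch below. The function name, the paths and the JSON file format are assumptions for illustration; `spark` is assumed to be an existing SparkSession on Databricks, where the cloudFiles source is available.

```python
def read_with_auto_loader(spark, source_path, schema_path, file_format="json"):
    """Return a streaming DataFrame that incrementally picks up new files."""
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", file_format)
        # Where Auto Loader stores the inferred schema between runs
        .option("cloudFiles.schemaLocation", schema_path)
        .load(source_path)
    )


# Example call (paths are placeholders):
# stream_df = read_with_auto_loader(spark, "/mnt/raw/events", "/mnt/delta/_schemas/events")
```

Because Auto Loader keeps track of which files it has already processed, each run reads only the files that arrived since the previous one.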

https://docs.delta.io/latest/best-practices.html

https://docs.databricks.com/en/sql/language-manual/delta-optimize.html
