Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

alejandrofm
by Valued Contributor
  • 3209 Views
  • 2 replies
  • 2 kudos

Resolved! Lots of shuffle write on OPTIMIZE + ZORDER, is it normal?

Hi! I'm optimizing several TB of partitioned data with ZSTD level 9. The amount of shuffle write surprises me; it could make sense because of ZORDER, but I want to be sure that I'm not missing something. Here is some context: Could I be missing something...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Alejandro Martinez, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...

1 More Reply
User16835756816
by Valued Contributor
  • 2860 Views
  • 3 replies
  • 1 kudos

How can I optimize my data pipeline?

Delta Lake provides optimizations that can help you accelerate your data lake operations. Here's how you can improve query speed by optimizing the layout of data in storage. There are two ways you can optimize your data pipeline: 1) Notebook Optimizat...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

Some tips from me: Look for data skew; some partitions can be huge and some small because of incorrect partitioning. You can use the Spark UI to check that, but also debug your code a bit (getNumPartitions()), especially since SQL can divide it unequally into parti...

2 More Replies
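A minimal sketch of what Hubert's getNumPartitions() tip could look like in a notebook. The table name "events" and the use of spark_partition_id to estimate rows per partition are assumptions for illustration, not from the thread:

    from pyspark.sql.functions import spark_partition_id, count

    # `spark` is the notebook's SparkSession; "events" is a hypothetical Delta table
    df = spark.table("events")

    # How many partitions Spark uses when reading the table
    print(df.rdd.getNumPartitions())

    # Approximate rows per partition; a few huge partitions next to many tiny ones suggests skew
    (df.withColumn("pid", spark_partition_id())
       .groupBy("pid")
       .agg(count("*").alias("rows"))
       .orderBy("rows", ascending=False)
       .show(20))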
MaximS
by New Contributor
  • 1373 Views
  • 1 reply
  • 1 kudos

OPTIMIZE command failed to complete on partitioned dataset

Trying to optimize a Delta table with the following stats: size: 212,848 blobs, 31,162,417,246,985 bytes; command: OPTIMIZE <table> ZORDER BY (X, Y, Z). In the Spark UI I can see all the work divided into batches, and each batch starts with 400 tasks to collect data. But ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Can you share some sample datasets for this, so that we can debug and help you accordingly? Thanks, Aviral

NOOR_BASHASHAIK
by Contributor
  • 4011 Views
  • 4 replies
  • 4 kudos

Azure Databricks VM type for OPTIMIZE with ZORDER on a single column

Dears, I was trying to check which Azure Databricks VM type is best suited for executing OPTIMIZE with ZORDER on a single timestamp-valued (but string data type) column for around 5000+ tables in the Delta Lake. I chose Standard_F16s_v2 with 6 workers & 1...

Latest Reply
jose_gonzalez
Databricks Employee
  • 4 kudos

Hi, the Standard_F16s_v2 is a compute-optimized machine type. On the other hand, for Delta OPTIMIZE (both bin-packing and Z-Ordering), we recommend the Standard_DS_v2-series. Also, follow Hubert's recommendations.

3 More Replies
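For context, a rough sketch of how OPTIMIZE with ZORDER might be driven across many tables from a notebook, as described in this thread. The table list and the event_ts column name are hypothetical:

    # Hypothetical list; in practice this could be built from the metastore
    tables = ["sales.orders_2021", "sales.orders_2022"]  # ... up to the ~5000 tables mentioned

    for t in tables:
        # ZORDER on the single timestamp-like (string) column discussed above
        spark.sql(f"OPTIMIZE {t} ZORDER BY (event_ts)")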
AP
by New Contributor III
  • 4025 Views
  • 5 replies
  • 3 kudos

Resolved! AutoOptimize, OPTIMIZE command and VACUUM command: order and production implementation best practices

So Databricks gives us a great toolkit in the form of OPTIMIZE and VACUUM. But in terms of operationalizing them, I am really confused about the best practice. Should we enable "optimized writes" by setting the following at a workspace level? spark.conf.set...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@AKSHAY PALLERLA Just checking in to see if you got a solution to the issue you shared above. Let us know! Thanks to @Werner Stinckens for jumping in, as always!

4 More Replies
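A hedged sketch of one common ordering discussed in this thread: enable optimized writes for newly written data, compact with OPTIMIZE, and only then VACUUM. The session-level config keys, retention value, and table name below are assumptions to verify against your runtime's documentation:

    # Session-level settings (these can also be set as table properties or at workspace level)
    spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
    spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

    # ... ETL writes to the table happen here ...

    spark.sql("OPTIMIZE my_db.my_table")                 # compact small files first
    spark.sql("VACUUM my_db.my_table RETAIN 168 HOURS")  # then remove files no longer referenced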
PJ
by New Contributor III
  • 2067 Views
  • 3 replies
  • 3 kudos

Resolved! How should you optimize <1GB delta tables?

I have seen the following documentation that details how you can work with the OPTIMIZE function to improve storage and querying efficiency. However, most of the documentation focuses on big data, 10 GB or larger. I am working with a ~7 million row ...

Latest Reply
PJ
New Contributor III
  • 3 kudos

Thank you @Hubert Dudek!! So I gather from your response that it's totally fine to have a Delta table that lives in one file that's roughly 211 MB. And I can use OPTIMIZE in conjunction with ZORDER to filter on a frequently filtered, high cardina...

2 More Replies
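A small sketch of how one might confirm that such a table really is backed by a single file and then compact it; the table and column names are made up for illustration:

    # DESCRIBE DETAIL reports file count and size for a Delta table
    detail = spark.sql("DESCRIBE DETAIL small_table").select("numFiles", "sizeInBytes").first()
    print(detail)

    # For a ~211 MB table, one OPTIMIZE with ZORDER on the frequently filtered,
    # high-cardinality column is usually all that's needed
    spark.sql("OPTIMIZE small_table ZORDER BY (customer_id)")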
brickster_2018
by Databricks Employee
  • 840 Views
  • 1 reply
  • 0 kudos
Latest Reply
amr
Databricks Employee
  • 0 kudos

If the data in your table is huge, try combining OPTIMIZE with a WHERE clause so you only run OPTIMIZE on a subset of the data rather than all of it. See the documentation here.

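A sketch of amr's suggestion, assuming a table partitioned by a date column (OPTIMIZE's WHERE clause can only filter on partition columns); the table and column names are hypothetical:

    # Compact only recent partitions instead of the whole multi-TB table
    spark.sql("""
        OPTIMIZE my_db.big_table
        WHERE date >= '2023-01-01'
        ZORDER BY (user_id)
    """)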
User16826992666
by Valued Contributor
  • 2204 Views
  • 1 reply
  • 0 kudos

Resolved! How often should I run OPTIMIZE on my Delta Tables?

I know it's important to periodically run Optimize on my Delta tables, but how often should I be doing this? Am I supposed to do this after every time I load data?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

It would depend on how frequently you update the table and how often you read it. If you have a daily ETL job updating a delta table, it might make sense to run OPTIMIZE at the end of it so that subsequent reads would benefit from the performance imp...

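A sketch of what running OPTIMIZE at the end of a daily ETL job could look like; the source path, table, and column names are made up for illustration:

    def daily_etl(spark, run_date):
        # Load and append the day's data (source format assumed to be JSON)
        new_rows = spark.read.format("json").load(f"/raw/events/{run_date}")
        new_rows.write.format("delta").mode("append").saveAsTable("analytics.events")

        # Compact the newly written small files so subsequent reads benefit immediately
        spark.sql("OPTIMIZE analytics.events ZORDER BY (event_ts)")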
User16790091296
by Contributor II
  • 1228 Views
  • 1 reply
  • 0 kudos

What’s the best instance type to run OPTIMIZE (bin-packing and Z-Ordering) on?

I've been doing some research on optimizing data storage while implementing delta, however, I'm not sure which instance type would be best for this.

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

OPTIMIZE, as you alluded, has two operations: bin-packing and multi-dimensional clustering (ZORDER). Bin-packing optimization is idempotent, meaning that if it is run twice on the same dataset, the second run has no effect. Z-Ordering is not idempotent b...

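A small sketch of the idempotence point, assuming a table named my_table, a made-up column user_id, and the metrics struct that OPTIMIZE returns on Databricks:

    # Bin-packing only (no ZORDER): a second run finds nothing left to compact
    first = spark.sql("OPTIMIZE my_table").select("metrics.numFilesAdded").first()
    second = spark.sql("OPTIMIZE my_table").select("metrics.numFilesAdded").first()
    print(first, second)  # the second run typically reports 0 files added

    # With ZORDER the data is re-clustered each time, so repeated runs do rewrite files
    spark.sql("OPTIMIZE my_table ZORDER BY (user_id)")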