Topics with Label: Optimize

Forum Posts

by User16869510359 • Esteemed Contributor

06-25-2021 3:43:20 PM

558 Views
1 replies
0 kudos

I have a table with 150 columns and my OPTIMIZE command never finishes. How do I tune it?

Data Engineering

Latest Reply

amr
New Contributor III

06-28-2021 11:06:40 AM

0 kudos

If the data in your table is huge, try combining OPTIMIZE with a WHERE clause so that you optimize only a subset of the data rather than the whole table. See the documentation for details.
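As a rough illustration of the suggestion above, here is a minimal Python sketch that builds an OPTIMIZE statement restricted by a WHERE clause. The table name `events` and the column `date` are hypothetical, and the sketch assumes the predicate refers to a partition column of the table.

```python
from typing import Optional


def build_optimize_sql(table: str, predicate: Optional[str] = None) -> str:
    """Return an OPTIMIZE statement, optionally limited to a subset of the data."""
    statement = f"OPTIMIZE {table}"
    if predicate:
        # Restrict compaction to the rows (partitions) matching the predicate.
        statement += f" WHERE {predicate}"
    return statement


# Hypothetical table and partition column, for illustration only.
sql = build_optimize_sql("events", "date >= '2021-06-01'")
print(sql)
```

On a Databricks cluster the resulting string could then be run with `spark.sql(sql)`; limiting each run to recent partitions keeps the amount of data rewritten per run small.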

by User16826992666 • Valued Contributor

06-25-2021 9:15:20 AM

813 Views
1 replies
0 kudos

How can I run OPTIMIZE on a table if I am streaming to it 24/7?

I have a table that I need to stream into continuously. I know it's best practice to run OPTIMIZE on my tables periodically. But if I never stop writing to the table, how and when can I run OPTIMIZE against it?

Data Engineering

Latest Reply

User16869510359
Esteemed Contributor

06-25-2021 2:28:20 PM

0 kudos

If the streaming job is making blind appends to the Delta table, then it's perfectly fine to run an OPTIMIZE query in parallel. However, if the streaming job is performing MERGE or UPDATE, then it can conflict with the OPTIMIZE operations. In such cases w...

by User16869510359 • Esteemed Contributor

06-25-2021 1:43:39 PM

556 Views
1 replies
1 kudos

Optimize Command not performing the bin packing

I have a daily OPTIMIZE job running; however, the number of files in storage is not going down. It looks like OPTIMIZE is not helping to reduce the number of files.

Data Engineering

Latest Reply

User16869510359
Esteemed Contributor

06-25-2021 1:45:16 PM

1 kudos

The OPTIMIZE command does not physically remove files from storage. It writes new, larger files and marks the old ones as removed in the transaction log. A VACUUM command has to be run to delete the old files.
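A minimal sketch of the follow-up step, assuming a hypothetical table named `events`. The default of 168 hours matches the 7-day default retention period; a different retention value is a choice you would need to make for your own workload.

```python
def build_vacuum_sql(table: str, retain_hours: int = 168) -> str:
    """Return a VACUUM statement that keeps files newer than retain_hours."""
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    return f"VACUUM {table} RETAIN {retain_hours} HOURS"


# Hypothetical table name, for illustration only.
sql = build_vacuum_sql("events")
print(sql)
```

Only files that are no longer referenced by the table and are older than the retention threshold are deleted, so the file count in storage drops some time after OPTIMIZE has run, not immediately.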

by User16869510359 • Esteemed Contributor

06-25-2021 12:24:56 PM

710 Views
1 replies
0 kudos

Resolved! Delta Streaming and Optimize

I have a master Delta table that is continuously written to by a streaming job. I have optimized writes enabled and, in addition, I run the OPTIMIZE command every 3 hours. However, I think the downstream streaming jobs which are streaming the data...

Data Engineering

Latest Reply

User16869510359
Esteemed Contributor

06-25-2021 12:26:24 PM

0 kudos

This is working as expected. For Delta streaming, the data files created in the first place will be used for streaming. The optimized files are not considered by the downstream streaming job. This is the reason it's not recommended to run VACUUM with f...

by User16783854657 • New Contributor III

06-23-2021 2:28:51 PM

7044 Views
2 replies
0 kudos

What is the difference between OPTIMIZE and Auto Optimize?

I see that Delta Lake has an OPTIMIZE command and also table properties for Auto Optimize. What are the differences between these and when should I use one over the other?

Data Engineering

Latest Reply

User16869510359
Esteemed Contributor

06-24-2021 12:18:25 PM

0 kudos

From my Data+AI talk on Operating and Supporting Delta Lake in production
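For context on the question, the two mechanisms are invoked differently: OPTIMIZE is a command you run explicitly, while Auto Optimize is switched on through table properties. The sketch below is not taken from the talk; it assumes the Databricks property names `delta.autoOptimize.optimizeWrite` and `delta.autoOptimize.autoCompact` and a hypothetical table named `events`.

```python
def build_auto_optimize_sql(
    table: str, optimize_write: bool = True, auto_compact: bool = True
) -> str:
    """Return an ALTER TABLE statement that sets the Auto Optimize properties."""
    properties = {
        "delta.autoOptimize.optimizeWrite": optimize_write,
        "delta.autoOptimize.autoCompact": auto_compact,
    }
    # Render Python booleans as SQL literals (true / false).
    rendered = ", ".join(
        f"{name} = {str(value).lower()}" for name, value in properties.items()
    )
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({rendered})"


# Hypothetical table name, for illustration only.
print(build_auto_optimize_sql("events"))
print("OPTIMIZE events")  # the explicit command, run on demand or on a schedule
```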

by aladda • Honored Contributor II

06-23-2021 9:16:35 PM

655 Views
1 replies
0 kudos

How frequently should OPTIMIZE be run on a Delta table?

Data Engineering

Latest Reply

aladda
Honored Contributor II

06-23-2021 9:18:00 PM

0 kudos

It's typically a good idea to run OPTIMIZE at a frequency aligned with the frequency of updates to the Delta table. However, you also don't want to overdo it, as there's a cost/performance trade-off. Unless there are very frequent updates to the table that can cause sma...

by aladda • Honored Contributor II

06-23-2021 9:13:35 PM

716 Views
1 replies
0 kudos

Resolved! What type of cluster configuration should one use to run OPTIMIZE on a Delta table?

Data Engineering

Latest Reply

aladda
Honored Contributor II

06-23-2021 9:15:49 PM

0 kudos

OPTIMIZE merges small files into larger ones and can involve shuffling and the creation of large in-memory partitions. Thus it's recommended to use a memory-optimized executor configuration to prevent spilling to disk. In addition, use of autoscaling wil...

by User16783854657 • New Contributor III

06-23-2021 3:08:09 PM

714 Views
1 replies
1 kudos

Does running OPTIMIZE on a Delta table destroy the transaction history of the table?

If I run OPTIMIZE on a Delta Lake table, will it prevent me from time travelling to a version before OPTIMIZE was run?

Data Engineering

Latest Reply

User16783854657
New Contributor III

06-23-2021 3:34:03 PM

1 kudos

No, you will still be able to time travel to versions previous to the OPTIMIZE command. OPTIMIZE is just another transaction like MERGE, UPDATE, etc. Check out these docs to learn more about retention periods and the VACUUM command.
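A minimal sketch of how this could be checked, assuming a hypothetical table named `events` and a hypothetical version number: list the table history to find the version written before the OPTIMIZE commit, then query that version.

```python
def build_history_sql(table: str) -> str:
    """Return a statement that lists the table's commits, including OPTIMIZE."""
    return f"DESCRIBE HISTORY {table}"


def build_time_travel_sql(table: str, version: int) -> str:
    """Return a query that reads the table as of an earlier version."""
    if version < 0:
        raise ValueError("version must be non-negative")
    return f"SELECT * FROM {table} VERSION AS OF {version}"


# Hypothetical table name and version, for illustration only.
print(build_history_sql("events"))
print(build_time_travel_sql("events", 12))
```

The older version stays readable only as long as its data files have not been deleted by VACUUM, which is why the retention period matters.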

by Srikanth_Gupta_ • Valued Contributor

06-21-2021 10:17:45 AM

817 Views
1 replies
0 kudos

Resolved! Does the size of optimized files after running OPTIMIZE vary between cloud providers (S3, Blob and GCS)?

Are there any other parameters to consider when running OPTIMIZE, depending on the cloud vendor?

Data Engineering

Latest Reply

Ryan_Chynoweth
Honored Contributor III

06-21-2021 11:03:17 AM

0 kudos

OPTIMIZE does not depend on the cloud provider at all. It produces the same results regardless of the underlying storage. It is idempotent, meaning that if it is run twice on the same dataset, the second execution has no effect.

by JigaoLuo • New Contributor

12-25-2019 4:01:36 AM

4146 Views
3 replies
0 kudos

OPTIMIZE error: org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'OPTIMIZE'

Hi everyone. I am trying to learn the keyword OPTIMIZE from this blog using Scala: https://docs.databricks.com/delta/optimizations/optimization-examples.html#delta-lake-on-databricks-optimizations-scala-notebook. But my local Spark seems not able t...

Data Engineering

Latest Reply

Anonymous
Not applicable

05-13-2020 2:30:18 PM

0 kudos

Hi Jigao, OPTIMIZE isn't in the open-source Delta API, so it won't run on your local Spark instance - https://docs.delta.io/latest/api/scala/io/delta/tables/index.html?search=optimize
