- 4057 Views
- 2 replies
- 2 kudos
Latest Reply
Hi @Alejandro Martinez Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...
- 3406 Views
- 3 replies
- 1 kudos
Delta Lake provides optimizations that can help you accelerate your data lake operations. Here's how you can improve query speed by optimizing the layout of data in storage. There are two ways you can optimize your data pipeline: 1) Notebook Optimizat...
Latest Reply
Some tips from me: look for data skews; some partitions can be huge and some small because of incorrect partitioning. You can use the Spark UI to spot that, but also debug your code a bit (check getNumPartitions()); SQL especially can divide it unequally to parti...
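The skew check suggested in the reply above can be sketched in pure Python. The helper name and the skew ratio are my own inventions for illustration; in a PySpark notebook you would typically gather the per-partition row counts with something like `df.rdd.glom().map(len).collect()` (an assumption about your setup, not something stated in the thread).

```python
# Hypothetical helper (not a Databricks API): flag skewed partitions given
# a list of per-partition row counts, e.g. collected in PySpark via
# df.rdd.glom().map(len).collect() after checking df.rdd.getNumPartitions().
def find_skewed_partitions(counts, ratio=3.0):
    """Return indices of partitions whose row count exceeds
    `ratio` times the mean partition size (threshold is arbitrary)."""
    if not counts:
        return []
    mean = sum(counts) / len(counts)
    return [i for i, c in enumerate(counts) if c > ratio * mean]

# Example: one oversized partition among otherwise even ones.
sizes = [100, 120, 90, 5000, 110]
print(find_skewed_partitions(sizes))  # → [3]
```

A hit from this check is a hint to repartition on a better key before writing, as the reply recommends.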
- 1590 Views
- 1 reply
- 1 kudos
Trying to optimize a delta table with the following stats: size 212,848 blobs, 31,162,417,246,985 bytes; command: OPTIMIZE <table> ZORDER BY (X, Y, Z). In the Spark UI I can see all the work divided into batches, and each batch starts with 400 tasks to collect data. But ...
Latest Reply
Can you share some sample datasets for this, so that we can debug and help you accordingly? Thanks, Aviral
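For reference, the command shape discussed in this thread can be composed as a plain SQL string. The helper below is purely illustrative (its name and structure are mine, not a Databricks API); in a notebook you would pass the resulting string to `spark.sql(...)`.

```python
# Illustrative helper (not part of any library): build the SQL text for an
# OPTIMIZE command, optionally with ZORDER BY, like the one in the question.
def optimize_sql(table, zorder_cols=None):
    stmt = f"OPTIMIZE {table}"
    if zorder_cols:
        stmt += " ZORDER BY (" + ", ".join(zorder_cols) + ")"
    return stmt

print(optimize_sql("events", ["X", "Y", "Z"]))
# → OPTIMIZE events ZORDER BY (X, Y, Z)
```

Keeping the Z-order column list short (low single digits) is the usual advice, since effectiveness drops as columns are added.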
- 4735 Views
- 4 replies
- 4 kudos
Latest Reply
Hi, the Standard_F16s_v2 is a compute-optimized machine. On the other hand, for Delta OPTIMIZE (both bin-packing and Z-Ordering), we recommend the Standard_DS_v2-series. Also, follow Hubert's recommendations.
by AP • New Contributor III
- 4603 Views
- 5 replies
- 3 kudos
So Databricks gives us a great toolkit in the form of OPTIMIZE and VACUUM. But in terms of operationalizing them, I am really confused about the best practice. Should we enable "optimized writes" by setting the following at a workspace level? spark.conf.set...
Latest Reply
@AKSHAY PALLERLA Just checking in to see if you got a solution to the issue you shared above. Let us know! Thanks to @Werner Stinckens for jumping in, as always!
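The session-level settings the question is asking about can be applied in one place. The conf keys below are the Databricks Delta names for optimized writes and auto-compaction as I understand them; verify them against your runtime's documentation before relying on this. The `apply_confs` helper and the stand-in setter are my own so the sketch runs without a cluster.

```python
# Session-level settings for optimized writes and auto-compaction.
# Key names are an assumption — check your Databricks runtime docs.
DELTA_WRITE_CONFS = {
    "spark.databricks.delta.optimizeWrite.enabled": "true",
    "spark.databricks.delta.autoCompact.enabled": "true",
}

def apply_confs(spark_conf_set, confs=DELTA_WRITE_CONFS):
    """Apply each key/value via the provided setter, e.g. spark.conf.set."""
    for key, value in confs.items():
        spark_conf_set(key, value)

# Stand-in for spark.conf.set so this runs outside a notebook.
applied = {}
apply_confs(lambda k, v: applied.update({k: v}))
print(applied == DELTA_WRITE_CONFS)  # → True
```

Note that these can alternatively be set per table as table properties rather than workspace-wide, which is often the safer operational choice.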
by PJ • New Contributor III
- 2528 Views
- 3 replies
- 3 kudos
I have seen the following documentation that details how you can work with the OPTIMIZE function to improve storage and querying efficiency. However, most of the documentation focuses on big data, 10 GB or larger. I am working with a ~7 million row ...
Latest Reply
Thank you @Hubert Dudek!! So I gather from your response that it's totally fine to have a delta table that lives in one file of roughly 211 MB, and I can use OPTIMIZE in conjunction with ZORDER to filter on a frequently filtered, high cardina...
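A quick way to sanity-check whether a small table like this even needs compaction is to compare its file count to its size. The heuristic and helper below are my own, not an official rule; in a notebook, `numFiles` and `sizeInBytes` would come from `spark.sql("DESCRIBE DETAIL my_table").collect()[0]` (an assumption about your setup).

```python
# Heuristic sketch (thresholds invented): a table already packed into the
# number of files its size warrants gains little from another OPTIMIZE.
def needs_compaction(num_files, size_in_bytes, target_file_bytes=1 << 30):
    """True when the table holds more files than its size warrants,
    assuming a ~1 GB target file size."""
    ideal_files = max(1, size_in_bytes // target_file_bytes)
    return num_files > ideal_files

# The ~211 MB single-file table from this thread: nothing to compact.
print(needs_compaction(num_files=1, size_in_bytes=211 * 1024**2))    # → False
# The same data fragmented into 500 small files would qualify.
print(needs_compaction(num_files=500, size_in_bytes=211 * 1024**2))  # → True
```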
- 2702 Views
- 1 reply
- 0 kudos
I know it's important to periodically run Optimize on my Delta tables, but how often should I be doing this? Am I supposed to do this after every time I load data?
Latest Reply
It would depend on how frequently you update the table and how often you read it. If you have a daily ETL job updating a delta table, it might make sense to run OPTIMIZE at the end of it so that subsequent reads would benefit from the performance imp...
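The advice above (run OPTIMIZE at the end of a regular ETL job, scaled to how often the table is written and read) can be sketched as a simple decision rule. The function and both thresholds are my own invention for illustration, not Databricks guidance.

```python
# Toy decision rule (thresholds invented): run OPTIMIZE once enough small
# files have accumulated, or at least once per ETL cycle.
def should_optimize(small_files_added, days_since_optimize,
                    file_threshold=100, day_threshold=1):
    """Decide whether this pipeline run should end with an OPTIMIZE."""
    return (small_files_added >= file_threshold
            or days_since_optimize >= day_threshold)

print(should_optimize(small_files_added=250, days_since_optimize=0))  # → True
print(should_optimize(small_files_added=10, days_since_optimize=0))   # → False
```

In practice a daily job would simply call OPTIMIZE unconditionally as its last step; a rule like this matters more for high-frequency streaming writes.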
- 1445 Views
- 1 reply
- 0 kudos
I've been doing some research on optimizing data storage while implementing delta, however, I'm not sure which instance type would be best for this.
Latest Reply
OPTIMIZE, as you alluded, has two operations: bin-packing and multi-dimensional clustering (Z-Ordering). Bin-packing optimization is idempotent, meaning that if it is run twice on the same dataset, the second run has no effect. Z-Ordering is not idempotent b...
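The idempotency point in this reply can be demonstrated with a toy model. The code below is not how Delta's bin-packing actually works internally; it is a minimal sketch of the concept: merging small files toward a target size, where a second pass over the already-compacted layout changes nothing.

```python
# Toy model of bin-packing compaction: greedily merge file sizes into
# groups of at most `target` bytes. Once compacted, a second pass leaves
# this layout unchanged — the idempotency the reply describes.
def bin_pack(file_sizes, target=128):
    packed, current = [], 0
    for size in sorted(file_sizes):
        if current + size > target and current > 0:
            packed.append(current)
            current = 0
        current += size
    if current:
        packed.append(current)
    return packed

once = bin_pack([10, 20, 30, 40, 120])
twice = bin_pack(once)
print(once, once == twice)  # → [100, 120] True
```

Z-Ordering, by contrast, must rewrite and re-cluster data on each run, which is why it is not idempotent and is the more expensive of the two operations.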