12-01-2023 02:39 PM
The process to export a Delta table is taking ~2 hrs.
The Delta table has 66 partitions, with a total size of ~6 GB, 4 million rows, and 270 columns.
I used the command below:
df.coalesce(1).write.csv("path")
What are my options to reduce the time?
12-02-2023 01:58 PM - edited 12-02-2023 02:02 PM
A very interesting task in front of you... let me know how you solve it!
12-03-2023 07:40 AM
Hi @561064, exporting a Delta table can indeed be time-consuming, especially when dealing with large datasets.
Let's explore some strategies to optimize the export process and reduce the time:
Partitioning: write the output partitioned (or repartitioned) so multiple tasks write files in parallel, instead of funneling everything through a single writer.
Compaction: run OPTIMIZE on the Delta table first to merge many small files into fewer large ones, reducing read and task-scheduling overhead.
V-Order Optimization: if your platform supports V-Order, it rewrites files in a read-optimized layout that can speed up the scan side of the export.
Delta Table Properties: tune table properties such as delta.targetFileSize, or enable optimized writes and auto compaction, so files are already well-sized before export.
These optimizations can significantly improve export times, but the actual impact will vary with your specific use case.
Experiment with different combinations to find what works best for your Delta table export process.
12-04-2023 11:07 AM
Hi Kainz,
None of the options I tried helped, as the challenge is not reading the data but writing it to a single CSV file. df.repartition(numFiles).write.csv("path") consumed the same amount of time as df.coalesce(1).write.csv("path") in my case.
Are there any other options I can explore?
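One workaround worth trying, since the single-writer step is the bottleneck: let Spark write many part files in parallel (no coalesce(1)), then concatenate the parts into one CSV afterwards. A minimal sketch, assuming Spark has already produced header-bearing part files via something like df.write.option("header", True).csv(part_dir); the function name and file layout below are illustrative:

```python
import glob
import os
import shutil

def merge_part_csvs(part_dir: str, out_path: str) -> int:
    """Concatenate Spark part-*.csv files into a single CSV.

    Keeps the header row from the first part file and skips the
    duplicated header line in every subsequent part. Returns the
    number of part files merged. Assumes each part ends with a newline,
    as Spark's CSV writer produces.
    """
    parts = sorted(glob.glob(os.path.join(part_dir, "part-*.csv")))
    with open(out_path, "wb") as out:
        for i, part in enumerate(parts):
            with open(part, "rb") as src:
                if i > 0:
                    src.readline()  # skip this part's header row
                shutil.copyfileobj(src, out)
    return len(parts)
```

Because the heavy lifting (serializing 4 million rows to CSV) happens across all executors, only the cheap byte-level concatenation is single-threaded. On DBFS or cloud storage you would download or mount the part directory first, or use your storage provider's server-side concatenation if available.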
12-03-2023 09:05 PM
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!