06-06-2023 12:01 AM
Hi Team,
We have a few production tables stored in an S3 bucket that have grown very large. These tables receive real-time data continuously from Databricks workflows that run around the clock. We would like to run the optimization commands (OPTIMIZE, ZORDER BY) without stopping or pausing the jobs. Could you please suggest a way to accomplish this?
Thanks in advance !!
06-07-2023 01:10 AM
@Sriram Kumar Are the jobs only doing inserts? If so, you can run OPTIMIZE while they keep writing, without affecting them: https://docs.databricks.com/optimizations/isolation-level.html#write-conflicts-on-databricks.
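For illustration, here is a minimal sketch of how such a run could be issued from a notebook or a separate scheduled job while the ingestion jobs keep appending. It assumes the tables are Delta tables and that a SparkSession named `spark` is available; the table name, column names, and the helper functions `build_optimize_sql` and `optimize_table` are hypothetical and not from the thread.

```python
from typing import Optional, Sequence


def build_optimize_sql(
    table: str,
    zorder_cols: Sequence[str] = (),
    where: Optional[str] = None,
) -> str:
    """Build a Delta Lake OPTIMIZE statement as a SQL string.

    `where` is optional and, if given, should be a predicate on
    partition columns so only a subset of the table is compacted.
    """
    parts = [f"OPTIMIZE {table}"]
    if where:
        parts.append(f"WHERE {where}")
    if zorder_cols:
        parts.append(f"ZORDER BY ({', '.join(zorder_cols)})")
    return " ".join(parts)


def optimize_table(spark, table, zorder_cols=(), where=None):
    """Run the statement on an existing SparkSession (assumed to be passed in)."""
    return spark.sql(build_optimize_sql(table, zorder_cols, where))
```

In a notebook this might be called as `optimize_table(spark, "prod.events", ["customer_id"], where="event_date < '2023-06-01'")`, where every name is a placeholder. Restricting the run to older partitions is one way to keep each OPTIMIZE pass small on a very large table; whether that suits your workload depends on how the table is partitioned.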
06-09-2023 04:26 AM
Hi @Sriram Kumar,
Hope all is well!
Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
06-13-2023 07:27 AM
@Sriram Kumar :
To run optimization commands like OPTIMIZE and ZORDER BY on large tables in an S3 bucket without stopping or pausing the Databricks workflows that continuously update the tables, you can follow these steps:
Hope this helps!
06-14-2023 11:03 PM
Hi @Sriram Kumar
We haven't heard from you since the last response from @Suteja Kanuri . Kindly share the information with us, and in return, we will provide you with the necessary solution.
Thanks and Regards