06-06-2023 12:01 AM
Hi Team,
We have a few production tables, stored in an S3 bucket, that have grown very large. These tables receive real-time data continuously from round-the-clock Databricks workflows. We would like to run the optimization commands (OPTIMIZE, ZORDER BY) without stopping or pausing the jobs. Could you please suggest a way to accomplish this?
Thanks in advance!
06-07-2023 01:10 AM
@Sriram Kumar Are the writes just inserts? If so, you can run OPTIMIZE without affecting them: https://docs.databricks.com/optimizations/isolation-level.html#write-conflicts-on-databricks.
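To illustrate the point above: under Delta Lake's optimistic concurrency control, OPTIMIZE rewrites existing small files into larger ones and does not conflict with blind appends, so an append-only streaming job can keep writing while compaction runs. A minimal sketch, with an illustrative table name:

```sql
-- Sketch, assuming an append-only Delta table (prod_db.events is illustrative).
-- OPTIMIZE compacts existing data files; blind INSERTs from the streaming
-- workflow add new files and do not conflict with the compaction transaction.
OPTIMIZE prod_db.events;
```

This can be scheduled as a separate job on its own cluster so it does not compete for resources with the ingestion workflows.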
06-09-2023 04:26 AM
Hi @Sriram Kumar,
Hope all is well!
Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
06-13-2023 07:27 AM
@Sriram Kumar:
To run optimization commands like OPTIMIZE and ZORDER BY on large tables in an S3 bucket without stopping or pausing the Databricks workflows that continuously update them, you can follow these steps:
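The steps themselves did not survive in the post. As a hedged sketch of the commands named above (table and column names are illustrative, not from the thread):

```sql
-- Compact small files and co-locate rows by a frequently filtered column.
-- ZORDER BY improves data skipping for queries that filter on event_date.
-- Readers are unaffected while this runs, thanks to snapshot isolation,
-- and append-only writers do not conflict with the rewrite.
OPTIMIZE prod_db.events
ZORDER BY (event_date);
```

Note that ZORDER BY rewrites data files rather than just compacting them, so it is a heavier operation than plain OPTIMIZE; running it on a schedule (e.g. nightly) rather than continuously is a common pattern.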
Hope this helps!
06-14-2023 11:03 PM
Hi @Sriram Kumar,
We haven't heard from you since the last response from @Suteja Kanuri. Please share an update with us, and we will help you find the solution you need.
Thanks and Regards