Z-ordering optimization with multithreading

yliu
New Contributor

Hi, 

I am wondering whether multithreading would improve the performance of Z-ordering optimization across multiple Delta tables.

We periodically run OPTIMIZE on thousands of tables, and the job easily takes a few days to finish, so we are looking for a way to optimize several tables in parallel. Would multithreading make sense here to speed up the process? We did a few rounds of testing in our dev environment, and the multithreaded version seemed to perform better. But we couldn't be sure, because the dev tables are not updated very often, so sometimes OPTIMIZE doesn't actually rewrite anything.

Thank you in advance for your help!

Accepted Solution

Kaniz
Community Manager

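Hi @yliu, using multithreading can indeed help with the performance of Z-ordering optimization on multiple Delta tables. Databricks has also improved the OPTIMIZE command itself: it now commits batches as soon as possible instead of at the end, and the default number of threads OPTIMIZE runs in parallel has been increased, which is a strict performance increase for large tables. Keep in mind that those threads work inside a single OPTIMIZE command; running OPTIMIZE on several tables at once from your own threads is a separate, complementary kind of parallelism.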

However, how much multithreading helps depends on your tables and your cluster. Each OPTIMIZE is already a distributed Spark job, so running several at once pays off mainly when a single job leaves executor cores idle, for example when many of your tables are small. If one large table already keeps the whole cluster busy, extra threads mostly compete for the same resources, so it is worth capping the number of tables optimized concurrently.

Furthermore, the OPTIMIZE operation now uses Hilbert space-filling curves by default, which provide better clustering characteristics than Z-order in higher dimensions. This mainly benefits read queries, which can skip more data than with Z-order, rather than the runtime of OPTIMIZE itself.

So, based on your testing in a dev environment and the provided information, it does make sense to use multithreading to speed up the Z-ordering optimization process.
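For illustration, here is a minimal sketch of the multithreaded approach, assuming Python on the driver and the standard-library `concurrent.futures.ThreadPoolExecutor`. The function name `optimize_tables`, the table-to-columns mapping, and the `max_workers` default are hypothetical. `run_sql` is injected so the sketch runs anywhere; in a notebook you would pass something like `lambda s: spark.sql(s).collect()`. Each thread only submits a SQL statement and waits on it, so the actual rewriting still happens on the cluster's executors.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Sequence, Tuple


def optimize_tables(
    tables: Dict[str, Sequence[str]],
    run_sql: Callable[[str], object],
    max_workers: int = 4,
) -> Dict[str, Tuple[bool, float, str]]:
    """Run OPTIMIZE (with ZORDER BY when columns are given) on many tables concurrently.

    tables:  maps a table name to its Z-order columns (hypothetical input shape).
    run_sql: executes one SQL statement, e.g. lambda s: spark.sql(s).collect().
    Returns {table: (succeeded, elapsed_seconds, error_message)}.
    """

    def optimize_one(table: str, cols: Sequence[str]) -> Tuple[bool, float, str]:
        start = time.monotonic()
        stmt = f"OPTIMIZE {table}"
        if cols:
            stmt += f" ZORDER BY ({', '.join(cols)})"
        try:
            run_sql(stmt)  # blocks this thread until the Spark job finishes
            return True, time.monotonic() - start, ""
        except Exception as exc:  # one failing table should not stop the others
            return False, time.monotonic() - start, str(exc)

    results: Dict[str, Tuple[bool, float, str]] = {}
    # max_workers caps how many OPTIMIZE jobs share the cluster at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(optimize_one, t, c): t for t, c in tables.items()}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

In a notebook this might be called as `optimize_tables({"db.events": ["event_date", "user_id"]}, lambda s: spark.sql(s).collect(), max_workers=8)`, where the table and column names are placeholders. Starting with a small `max_workers` and raising it only while the cluster still shows idle capacity keeps the threads from simply fighting over the same executors.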

However, because your dev tables are rarely updated, some OPTIMIZE runs there had nothing to rewrite, so the dev timings may not reflect production. It would be advisable to keep monitoring and testing this approach in production to confirm it continues to deliver the expected speedup, ideally comparing only runs that actually rewrote files.
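Because no-op runs skew timings, one way to compare dev and production fairly is to record, per table, how long OPTIMIZE took and how many files it added and removed, then time only the runs that rewrote something. This is a sketch under assumptions: the `OptimizeRun` record and the `summarize` helper are hypothetical, and the commented extraction step assumes the OPTIMIZE result row exposes a `metrics` struct with `numFilesAdded` and `numFilesRemoved` fields, as in open-source Delta Lake. Check the output schema on your runtime before relying on it.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class OptimizeRun:
    """One OPTIMIZE execution (hypothetical record type)."""
    table: str
    elapsed_s: float
    num_files_added: int
    num_files_removed: int


# Assumed way to build a record in a notebook (verify the schema first):
#   row = spark.sql(f"OPTIMIZE {table}").collect()[0]
#   m = row["metrics"]
#   OptimizeRun(table, elapsed, m["numFilesAdded"], m["numFilesRemoved"])


def rewrote_files(run: OptimizeRun) -> bool:
    """True if OPTIMIZE actually added or removed data files."""
    return run.num_files_added > 0 or run.num_files_removed > 0


def summarize(runs: List[OptimizeRun]) -> Dict[str, object]:
    """Split runs into real rewrites and no-ops; report timings for real ones only."""
    real = [r for r in runs if rewrote_files(r)]
    noop = [r for r in runs if not rewrote_files(r)]
    return {
        "real_runs": len(real),
        "noop_runs": len(noop),
        "noop_tables": [r.table for r in noop],
        "total_real_s": sum(r.elapsed_s for r in real),
        "median_real_s": median(r.elapsed_s for r in real) if real else None,
    }
```

Comparing `median_real_s` between the sequential and multithreaded jobs, rather than total wall-clock time, avoids crediting either version for tables that had nothing to compact.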

yliu
New Contributor

Thank you for the detailed explanation and quick response! I will proceed with multithreading then and keep monitoring it. Thanks a lot :))