Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Help with Databricks SQL Queries

alexacas
New Contributor II

Hi everyone,

I'm relatively new to Databricks and trying to optimize some SQL queries for better performance. I've noticed that certain queries take longer to run than expected. Does anyone have tips or best practices for writing efficient SQL in Databricks? Specifically, I'm interested in how to handle large datasets and any strategies for indexing or partitioning data effectively.


1 ACCEPTED SOLUTION

Accepted Solutions

mhiltner
Databricks Employee

You can find some tips here: https://community.databricks.com/t5/technical-blog/top-10-query-performance-tuning-tips-for-databric... 

And here: https://www.databricks.com/discover/pages/optimize-data-workloads-guide 

My overall recommendation would be to open the query profile and find which operations are taking the longest. Then you can judge whether a broadcast join, repartitioning, or some other strategy would help.
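As a rough illustration of the broadcast idea, here is a minimal Python sketch that builds a join with a `BROADCAST` hint and an `EXPLAIN FORMATTED` statement to inspect the resulting plan. The table names (`sales`, `stores`) and the key (`store_id`) are hypothetical, and the helper functions are just for illustration; whether a broadcast actually helps depends on the smaller table fitting comfortably in memory.

```python
# Sketch only: table and column names below are hypothetical placeholders.

def broadcast_join_sql(fact: str, dim: str, key: str) -> str:
    """Build a join that hints Spark to broadcast the smaller (dim) table."""
    return (
        f"SELECT /*+ BROADCAST(d) */ * "
        f"FROM {fact} AS f "
        f"JOIN {dim} AS d ON f.{key} = d.{key}"
    )


def explain_sql(query: str) -> str:
    """Wrap a query in EXPLAIN FORMATTED so the physical plan can be inspected."""
    return f"EXPLAIN FORMATTED {query}"


query = broadcast_join_sql("sales", "stores", "store_id")
plan_query = explain_sql(query)

# In a Databricks notebook you would typically run these with
# spark.sql(plan_query).show(truncate=False) and then spark.sql(query).
```

If the plan shows a broadcast hash join instead of a sort-merge join, the hint took effect; comparing the query profile before and after is the real test of whether it helped.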


2 REPLIES

filipniziol
Contributor III

Hi @alexacas ,

The best thing is to share the queries and table structures 🙂

But my general approach is:
1. Use partitioning or Z-ordering, or, if you can upgrade the runtime to 15.4, use liquid clustering, which is the newer optimization technique.

2. Make sure you do not have many small files. Run DESCRIBE DETAIL on your tables to check whether the files are around 128 MB each. If they are not, set up a maintenance job that runs OPTIMIZE on your tables on a regular basis.
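To make the file-size check in point 2 concrete, here is a minimal Python sketch. It assumes you have already read `numFiles` and `sizeInBytes` from the `DESCRIBE DETAIL` output; the 50% tolerance, the helper names, and the table and column names are my own assumptions for illustration, not Databricks defaults.

```python
# Sketch only: thresholds and names are illustrative assumptions.
TARGET_FILE_BYTES = 128 * 1024 * 1024  # ~128 MB, the file size mentioned above


def average_file_bytes(num_files: int, size_in_bytes: int) -> float:
    """Average data file size, from DESCRIBE DETAIL's numFiles and sizeInBytes."""
    return size_in_bytes / num_files if num_files else 0.0


def needs_compaction(num_files: int, size_in_bytes: int, tolerance: float = 0.5) -> bool:
    """True when the average file is well below the target size (assumed cutoff)."""
    if num_files <= 1:
        return False
    return average_file_bytes(num_files, size_in_bytes) < TARGET_FILE_BYTES * tolerance


def optimize_sql(table: str, zorder_cols: tuple = ()) -> str:
    """Build an OPTIMIZE statement, optionally with ZORDER BY (point 1)."""
    stmt = f"OPTIMIZE {table}"
    if zorder_cols:
        stmt += f" ZORDER BY ({', '.join(zorder_cols)})"
    return stmt


def liquid_clustering_sql(table: str, cols: tuple) -> str:
    """Build the ALTER TABLE statement that enables liquid clustering.
    Liquid clustering replaces partitioning/ZORDER on a table; they are not combined."""
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(cols)})"


# Example: 4000 files totalling 20 GiB averages about 5 MiB per file.
many_small = needs_compaction(4000, 20 * 1024**3)
well_sized = needs_compaction(10, 10 * TARGET_FILE_BYTES)
stmt = optimize_sql("sales", ("store_id", "sale_date"))
cluster_stmt = liquid_clustering_sql("sales", ("store_id",))
```

In a notebook, the generated statements would be passed to `spark.sql(...)` from a scheduled job; the check itself is only a heuristic for deciding when that job is worth running.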

 
