Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

How do you analyze performance?

Newbienewbster
New Contributor

Curious to hear how you guys optimize compute. As in, how do you dig into the details of the Spark execution and improve it?

1 REPLY

mhiltner
Contributor III

That's pretty much it. Usually, people use the time it takes to run a job/query/process as their KPI.
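To make the run-time KPI concrete, here is a minimal sketch of timing each step and listing the slowest first. The names `timed`, `timings` and `slowest_first` are made up for illustration, and the `time.sleep` calls are stand-ins for real work.

```python
import time
from contextlib import contextmanager

# Hypothetical registry: step name -> wall-clock seconds.
timings = {}

@contextmanager
def timed(step_name):
    """Record how long the enclosed block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step_name] = time.perf_counter() - start

def slowest_first(durations):
    """Return (step, seconds) pairs sorted from slowest to fastest."""
    return sorted(durations.items(), key=lambda kv: kv[1], reverse=True)

# Stand-ins for real pipeline steps.
with timed("load"):
    time.sleep(0.01)
with timed("aggregate"):
    time.sleep(0.05)

for name, seconds in slowest_first(timings):
    print(f"{name}: {seconds:.3f}s")
```

With Spark, keep in mind that transformations are lazy, so a timed block only measures real work if it contains an action such as a write or a count.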

Then you check which steps take the most time, drilling down one by one. Sometimes the culprit is a misplaced .cache(), .collect() or display() that makes Spark actually compute everything. You can do the same for queries with the query profiler, checking whether there was a shuffle, how many rows were processed, and whether there was disk spill. You can also check for skew.
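For the skew check, one rough approach is to compare the largest partition against the median one. The sketch below assumes you already have a list of row counts per partition (in PySpark you could get these by grouping on `spark_partition_id()` and counting); the function name `skew_ratio` and the sample numbers are made up for illustration.

```python
from statistics import median

def skew_ratio(rows_per_partition):
    """Return max / median partition size; values near 1.0 mean an even spread."""
    if not rows_per_partition:
        raise ValueError("need at least one partition count")
    mid = median(rows_per_partition)
    if mid == 0:
        # All-empty partitions are trivially even; otherwise one partition holds everything.
        return float("inf") if max(rows_per_partition) > 0 else 1.0
    return max(rows_per_partition) / mid

# Illustrative numbers only, not measurements.
even = [1000, 980, 1020, 1005]
skewed = [1000, 980, 1020, 50000]

print(f"even:   {skew_ratio(even):.2f}")
print(f"skewed: {skew_ratio(skewed):.2f}")
```

A ratio far above 1 suggests that one task is doing most of the work, which usually shows up as a single long-running task in the stage view.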

I really like this blog: https://www.databricks.com/discover/pages/optimize-data-workloads-guide
