
Best practices for optimizing Spark jobs

chris0991
New Contributor III

What are some best practices for optimizing Spark jobs in Databricks, especially when dealing with large datasets? Any tips or resources would be greatly appreciated! I'm trying to analyze data on restaurant menu prices, so insights relevant to that would be especially helpful!

2 REPLIES

-werners-
Esteemed Contributor III

There are so many.
Here are a few:
- look for data skew
- shuffle as little as possible
- avoid many small files
- use Spark and not only pure Python
- if using an autoscaling cluster: check that you aren't losing a lot of time scaling up/down
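
To make the first point concrete, here is a minimal sketch of one way to check for data skew: count the rows in each partition and compare the largest partition to the median. The helper names `skew_ratio` and `partition_row_counts` are made up for this illustration and are not part of Spark or Databricks; `spark_partition_id` is a real PySpark function. Treat it as a rough diagnostic under those assumptions, not as the recommended method.

```python
from statistics import median


def skew_ratio(partition_counts):
    """Return the largest partition size divided by the median size.

    A value near 1 means rows are spread evenly across partitions;
    a large value suggests that a few partitions hold most of the data.
    Empty partitions are ignored; returns 0.0 if there is no data.
    """
    non_empty = [c for c in partition_counts if c > 0]
    if not non_empty:
        return 0.0
    return max(non_empty) / median(non_empty)


def partition_row_counts(df):
    """Count the rows in each Spark partition of a DataFrame.

    Hypothetical helper: it needs pyspark and a live DataFrame, and it
    triggers a Spark job, so run it on a sample if the data is very large.
    """
    # Imported here so skew_ratio stays usable without a Spark installation.
    from pyspark.sql.functions import spark_partition_id

    rows = df.groupBy(spark_partition_id().alias("pid")).count().collect()
    return [row["count"] for row in rows]
```

In a notebook you could call `skew_ratio(partition_row_counts(df))` on the DataFrame that feeds a slow stage. What counts as "too high" depends on the job, but if one partition is many times larger than the median, repartitioning on a better-distributed key is one option to look at.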

szymon_dybczak
Esteemed Contributor III

Good one @john34567, made me chuckle, but this is still spam 😄