Hi @Yeshwanth, thank you for directing me to the documentation. I don't know much about these computations, so I'm still figuring things out. Is there a straightforward (standard) way to calculate the compute (number of cores and memory) required to run Spark jobs, based on the data volume per job, the frequency of the jobs, and the number of jobs? I read that data is generally partitioned into 128 MB chunks, and that executor memory is divided into 300 MB of reserved memory, 60% execution memory, and 40% storage memory. How would this help me calculate the compute needed for a dataset of, say, 1.5 TB?
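
To make the question concrete, this is the kind of back-of-envelope arithmetic I'm trying to do, sketched in Python below. The numbers I plug in (128 MB partitions, 5 cores per executor, 4 GB of memory per core, processing the tasks in 3 "waves") are my own assumptions, not anything I've confirmed from the docs, so please correct them if they're off:

```python
# Rough, back-of-envelope Spark sizing sketch (all numbers are assumptions).

DATA_SIZE_GB = 1.5 * 1024          # 1.5 TB expressed in GB
PARTITION_SIZE_MB = 128            # assumed default partition size
CORES_PER_EXECUTOR = 5             # assumed rule of thumb
WAVES = 3                          # assumed: how many "rounds" of tasks are acceptable
EXEC_MEM_PER_CORE_GB = 4           # assumed executor memory per core

# 1. How many partitions (and therefore tasks) would 1.5 TB produce?
num_partitions = (DATA_SIZE_GB * 1024) / PARTITION_SIZE_MB    # ~12,288 tasks

# 2. If the tasks run in WAVES rounds, how many cores run in parallel?
total_cores = num_partitions / WAVES                          # ~4,096 cores

# 3. Translate cores into executors and memory.
num_executors = total_cores / CORES_PER_EXECUTOR              # ~820 executors
mem_per_executor_gb = CORES_PER_EXECUTOR * EXEC_MEM_PER_CORE_GB   # 20 GB each

# 4. Sanity-check per-task memory against the split I mentioned above:
#    usable = executor memory - 300 MB reserved, with 60% for execution.
usable_gb = mem_per_executor_gb - 300 / 1024
execution_gb = usable_gb * 0.6
per_task_gb = execution_gb / CORES_PER_EXECUTOR

print(f"partitions/tasks   : {num_partitions:,.0f}")
print(f"cores ({WAVES} waves)     : {total_cores:,.0f}")
print(f"executors          : {num_executors:,.0f} x {mem_per_executor_gb} GB")
print(f"execution mem/task : {per_task_gb:.2f} GB")
```

If this is roughly the right way to think about it, I'd appreciate any correction to the assumed numbers, or a pointer to what actually drives the sizing in practice (e.g. job deadlines, shuffle size, or how many jobs run concurrently).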