Databricks Community

kseyser · 05-19-2024

I'm working on a project to predict the compute (cores) required to run Spark jobs. Has anyone worked on this or something similar before? How did you get started?

Yeshwanth · 05-20-2024

@kseyser good day,

This documentation might help with your use case: https://docs.databricks.com/en/compute/cluster-config-best-practices.html#compute-sizing-considerati...

Kind regards,

Yesh

kseyser · 05-21-2024

Hi @Yeshwanth, thank you for pointing me to the documentation. I don't know much about compute, so I'm still figuring things out. Is there a straightforward (standard) way to calculate the compute (number of cores and memory) required to run Spark jobs based on the data volume of each job, the frequency of the jobs, and the number of jobs? I read that data is generally split into 128 MB partitions and that executor memory is divided into 300 MB of reserved memory, 60% execution memory, and 40% storage memory. How would this help me calculate the compute for data of, say, 1.5 TB?
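A minimal back-of-the-envelope sketch of how those numbers can be combined, assuming open-source Spark defaults (a Databricks runtime or your cluster config may override them). The 128 MB figure is the default of `spark.sql.files.maxPartitionBytes`, so scanning 1.5 TB of files yields roughly 1.5 TB / 128 MB input tasks, and with C cores those tasks run in about ceil(tasks / C) waves. On memory, Spark's unified memory model works slightly differently from the 60/40 execution/storage split described above: after the 300 MB reserve, `spark.memory.fraction` (default 0.6) of the remaining heap is a single pool shared by execution and storage, `spark.memory.storageFraction` (default 0.5) of that pool is storage protected from eviction, and the other 40% is user memory. The cluster shape below (16 executors × 4 cores, 8 GB heap) is purely hypothetical, and the sketch ignores compression, shuffles, skew, and job frequency.

```python
import math

# Assumed open-source Spark defaults; your Databricks runtime or cluster config may differ.
RESERVED_MB = 300        # reserved system memory per executor heap
MEMORY_FRACTION = 0.6    # spark.memory.fraction: unified (execution + storage) share
STORAGE_FRACTION = 0.5   # spark.memory.storageFraction: share of unified pool protected for storage
PARTITION_MB = 128       # spark.sql.files.maxPartitionBytes default


def num_input_partitions(data_mb: float, partition_mb: int = PARTITION_MB) -> int:
    """Rough count of input partitions (one task each) when scanning files."""
    return math.ceil(data_mb / partition_mb)


def num_waves(tasks: int, total_cores: int) -> int:
    """Each core runs one task at a time, so tasks finish in ceil(tasks / cores) waves."""
    return math.ceil(tasks / total_cores)


def executor_memory_split(heap_mb: float) -> dict:
    """Split one executor's JVM heap according to Spark's unified memory model."""
    usable = heap_mb - RESERVED_MB
    unified = usable * MEMORY_FRACTION
    return {
        "reserved": RESERVED_MB,
        "unified": unified,                               # shared by execution and storage
        "storage_protected": unified * STORAGE_FRACTION,  # cached blocks execution cannot evict
        "user": usable - unified,                         # user data structures, internal metadata
    }


data_mb = 1.5 * 1024 * 1024            # 1.5 TB of input, counted in binary units
tasks = num_input_partitions(data_mb)
print("input tasks:", tasks)            # 12288

# Hypothetical cluster: 16 executors x 4 cores each, 8 GB heap per executor.
total_cores = 16 * 4
print("waves on 64 cores:", num_waves(tasks, total_cores))  # 192

split = executor_memory_split(8 * 1024)
for name, mb in split.items():
    print(f"{name:>18}: {mb:8.1f} MB")

# Rough upper bound on execution memory per task when 4 tasks share one executor
# and nothing is cached (execution can then borrow the whole unified pool).
print("approx unified MB per task:", round(split["unified"] / 4, 1))
```

Read the wave count as the main lever: assuming tasks take similar time, doubling the cores roughly halves the number of waves, so your runtime target (and how often the jobs must finish) drives the core count more than data volume alone, while the per-task memory figure tells you whether 128 MB partitions are likely to fit or spill.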

Predicting compute required to run Spark jobs