Question Submitted
How can I tune a job to avoid paying extra cost for EXPAND DISK events? Are they caused by shuffle or by data skew? Is there a way to configure the workers with larger disks?
Without EXPAND DISK, the job fails because there is no space left on the disk.
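On the "larger disk" part of the question: EXPAND DISK events generally come from autoscaling local storage adding volumes when workers run low on local space, so one option is to provision bigger fixed volumes up front. Below is a minimal sketch using the Databricks Clusters API on AWS; the workspace URL, token, node type, runtime version, and volume sizes are placeholders, not recommendations.

```python
# Sketch: create a cluster with fixed, larger local disks instead of relying
# on autoscaling local storage (the feature behind EXPAND DISK events).
# Assumes AWS and the Clusters API 2.0; host, token, and sizes are placeholders.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                   # placeholder

cluster_spec = {
    "cluster_name": "large-local-disk-job-cluster",
    "spark_version": "13.3.x-scala2.12",   # example runtime
    "node_type_id": "i3.xlarge",           # example node type
    "num_workers": 8,
    # Provision extra EBS space up front so workers start with enough room
    # for shuffle/spill instead of expanding disks mid-job.
    "aws_attributes": {
        "ebs_volume_type": "GENERAL_PURPOSE_SSD",
        "ebs_volume_count": 1,
        "ebs_volume_size": 500,            # GB per volume, illustrative
    },
    # Autoscaling local storage is what triggers EXPAND DISK; disabling it
    # only makes sense once the fixed volumes above are large enough.
    "enable_elastic_disk": False,
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json())  # returns the new cluster_id
```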
Your worker disk size should almost never matter. You should be using cloud storage such as S3/ADLS Gen2/GCS. What operations are you running, and what error message are you getting?
No error; I am just seeing EXPAND DISK events in the cluster event logs. This is a regular Spark application. I am not sure cloud storage matters here - the application already uses it for input and output.
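Since the events happen with no failure, the disk pressure is most likely shuffle spill or a few skewed partitions rather than the input/output storage. As a general sketch (illustrative values, not a tuned recommendation), these standard Spark settings are a common starting point for reducing how much local disk a job needs, and the group-by at the end is one way to check for key skew:

```python
# Sketch: Spark settings that often reduce local-disk pressure from shuffle
# spill and skewed partitions; values are illustrative only.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("reduce-shuffle-disk-usage")
    # Adaptive Query Execution can coalesce small partitions and split skewed
    # ones, limiting how much any single worker spills to its local disk.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    # More shuffle partitions -> smaller per-task spill files (tune per job).
    .config("spark.sql.shuffle.partitions", "800")
    .getOrCreate()
)

# Check whether a few keys dominate a join/groupBy; heavy skew is a common
# reason one worker's disk fills up while the others stay nearly empty.
df = spark.read.parquet("s3://<bucket>/<input-path>")  # placeholder path
df.groupBy("join_key").count().orderBy("count", ascending=False).show(20)
```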