Question Submitted
How can I tune a job to avoid paying extra cost for EXPAND DISK events? Are they caused by shuffle or by data skew? Is there a way to configure the workers with larger disks?
Without EXPAND DISK, the job fails because there is no space left on the disk.
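On the "larger disk" part of the question: EXPAND DISK events generally come from autoscaling local storage adding volumes when workers run low on local space, so one option is to provision bigger fixed volumes up front. Below is a minimal sketch using the Databricks Clusters API on AWS; the workspace URL, token, node type, runtime version, and volume sizes are placeholders, not recommendations.

```python
# Sketch: create a cluster with fixed, larger local disks instead of relying
# on autoscaling local storage (the feature behind EXPAND DISK events).
# Assumes AWS and the Clusters API 2.0; host, token, and sizes are placeholders.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                   # placeholder

cluster_spec = {
    "cluster_name": "large-local-disk-job-cluster",
    "spark_version": "13.3.x-scala2.12",   # example runtime
    "node_type_id": "i3.xlarge",           # example node type
    "num_workers": 8,
    # Provision extra EBS space up front so workers start with enough room
    # for shuffle/spill instead of expanding disks mid-job.
    "aws_attributes": {
        "ebs_volume_type": "GENERAL_PURPOSE_SSD",
        "ebs_volume_count": 1,
        "ebs_volume_size": 500,            # GB per volume, illustrative
    },
    # Autoscaling local storage is what triggers EXPAND DISK; disabling it
    # only makes sense once the fixed volumes above are large enough.
    "enable_elastic_disk": False,
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json())  # returns the new cluster_id
```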
Your worker disk size should almost never matter. You should be using cloud storage such as S3/ADLS Gen2/GCS. What operations are you running, and what error message are you getting?
No error; I am just seeing EXPAND DISK events in the cluster event logs. This is a regular Spark application. I am not sure cloud storage matters here - the application already uses it for input and output.
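Since the events happen with no failure, the disk pressure is most likely shuffle spill or a few skewed partitions rather than the input/output storage. As a general sketch (illustrative values, not a tuned recommendation), these standard Spark settings are a common starting point for reducing how much local disk a job needs, and the group-by at the end is one way to check for key skew:

```python
# Sketch: Spark settings that often reduce local-disk pressure from shuffle
# spill and skewed partitions; values are illustrative only.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("reduce-shuffle-disk-usage")
    # Adaptive Query Execution can coalesce small partitions and split skewed
    # ones, limiting how much any single worker spills to its local disk.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    # More shuffle partitions -> smaller per-task spill files (tune per job).
    .config("spark.sql.shuffle.partitions", "800")
    .getOrCreate()
)

# Check whether a few keys dominate a join/groupBy; heavy skew is a common
# reason one worker's disk fills up while the others stay nearly empty.
df = spark.read.parquet("s3://<bucket>/<input-path>")  # placeholder path
df.groupBy("join_key").count().orderBy("count", ascending=False).show(20)
```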