Question Submitted
How can I tune a job to avoid paying extra cost for EXPAND DISK events? Are they caused by shuffle or by data skew? Is there a way to configure the workers with larger disks?
Without EXPAND DISK, the job fails because there is no space left on the disk.
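For concreteness, is something like the following the right direction? This is only a rough sketch against the Clusters API, assuming an AWS workspace; the host, token, node type, runtime version, and volume size are placeholders, not recommendations.

```python
# Rough sketch (not a recommendation): create a cluster with fixed, larger
# EBS volumes instead of autoscaling local storage, which is what triggers
# EXPAND DISK events. Host, token, node type, and sizes are placeholders.
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                       # placeholder

cluster_spec = {
    "cluster_name": "shuffle-heavy-job",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "r5.xlarge",  # a node type without local NVMe, so EBS volumes apply
    "num_workers": 8,
    # Disable autoscaling local storage; provision a fixed disk per worker instead.
    "enable_elastic_disk": False,
    "aws_attributes": {
        "ebs_volume_type": "GENERAL_PURPOSE_SSD",
        "ebs_volume_count": 1,
        "ebs_volume_size": 500,  # GiB per volume, sized for the expected shuffle spill
    },
}

resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])
```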
Your worker disk size should almost never matter; you should be using cloud storage such as S3, ADLS Gen2, or GCS for your data. What operations are you running, and what error message are you getting?
No error; I'm just seeing EXPAND DISK events in the cluster event log. This is a regular Spark application. I'm not sure cloud storage matters here: the application already reads its input from and writes its output to cloud storage, yet the local disk still fills up, presumably from shuffle.
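In case it helps others, the lever I'm experimenting with, assuming the spill does come from shuffle as I suspect, is shrinking how much shuffle data each task writes. A rough, untuned sketch; the partition count is a guess for my job, not a recommendation:

```python
# Rough, untuned sketch: settings I'm trying in order to shrink per-task
# shuffle output so the local disk fills more slowly. Values are guesses.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Let AQE coalesce small partitions and split skewed ones so no single
# task produces an outsized shuffle file.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

# More shuffle partitions -> smaller shuffle blocks per task.
spark.conf.set("spark.sql.shuffle.partitions", "800")
```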