We recently observed that when we run driver-intensive code on an all-purpose compute, parallel runs of jobs of the same pattern/kind fail. Example: jobs triggered on an all-purpose compute with 4 cores and 8 GB RAM for the driver.
Let's say my job is driver-expensive and will exhaust all of the driver's resources, and I have jobs of the same pattern (kind: driver-expensive) running in parallel (assume 5 parallel jobs have been triggered).
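To make the pattern concrete, here is a simplified sketch of the kind of driver-heavy step each of these jobs runs (the table name and aggregation are placeholders, not our actual code):

```python
# Simplified sketch of a driver-heavy workload.
# "sales_raw" and the aggregation below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.table("sales_raw")

# toPandas() collects the full result into the driver's memory;
# with a 4-core / 8 GB driver, several such jobs running at once
# can exhaust the driver heap and trigger the OOM we see.
pdf = df.toPandas()

# ...single-node processing continues on the driver (pandas-style)...
summary = pdf.groupby("region").agg({"amount": "sum"})
print(summary.head())
```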
If my first job exhausts all of the driver's compute (CPU/memory), I would expect the other 4 jobs to be queued until resources free up. Instead, the other jobs fail with an OOM on the driver. Yes, we can use job clusters for this kind of workload (see the sketch at the end of this post), but is there a reason why jobs are not queued when there aren't enough driver resources, whereas when executor resources are exhausted, jobs do get queued until resources become available for the workload?
I don't feel this should be the expected behaviour. Please share your insights if I am missing something.
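For completeness, this is roughly the workaround we are considering: moving these workloads onto dedicated job clusters with run queueing enabled. A minimal sketch using the Jobs 2.1 REST API (workspace URL, token, notebook path, node type, and DBR version are placeholders; exact field names should be verified against the Jobs API docs):

```python
# Sketch: create a job on a dedicated job cluster with run queueing enabled,
# so additional runs wait instead of competing for the same driver.
# HOST, TOKEN, notebook_path, node_type_id and spark_version are placeholders.
import requests

HOST = "https://<workspace-url>"
TOKEN = "<personal-access-token>"

payload = {
    "name": "driver-heavy-job",
    "max_concurrent_runs": 1,       # serialize runs of this job
    "queue": {"enabled": True},     # extra runs are queued rather than failed/skipped
    "tasks": [
        {
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Repos/team/driver_heavy_notebook"},
            "new_cluster": {        # each run gets its own driver, so no shared-driver OOM
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }
    ],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # returns the new job_id
```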