One worker is one executor and one physical node
08-27-2024 02:50 PM
Hi team,
It seems that in Databricks, unlike when running Spark jobs on a k8s cluster, a workflow running on Job Compute (a job cluster) or an instance pool can have only one executor per physical node. Is this understanding right? If it is, then when I create a job cluster for my workflow with a high-end instance type, I need to configure the executor with bigger values. For example, if I specify a node type with 122 GB and 16 cores, and one node runs one executor, then a typical k8s-style config like
spark.executor.memory 16g
spark.executor.cores 4
would waste most of each node's resources, right?
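If that is the case, I'd expect to size the executor to roughly the whole node instead. Something like this (just a sketch; the numbers are illustrative, and the memory value has to stay well under the full 122 GB to leave headroom for spark.executor.memoryOverhead and the OS):
spark.executor.memory 90g
spark.executor.cores 16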
Thanks
08-28-2024 05:18 AM
In the Databricks workflow context, executor = worker: each worker node runs a single executor.
So indeed, if you set spark.executor.cores = 4 on a 16-core node, 12 of its cores would sit unused.
Adjusting spark.executor.cores and spark.task.cpus to match the node size is therefore a good idea.
There is also the option to let Spark scale the number of executors dynamically (at job level!) with spark.dynamicAllocation.enabled.
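For a 122 GB / 16-core node type, that could look something like this (a sketch; the min/max executor counts are made-up values, and on Databricks the cluster's own autoscaling settings may govern the node count instead):
spark.executor.cores 16
spark.task.cpus 1
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 8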
08-28-2024 11:44 PM
Thanks. With spark.dynamicAllocation.enabled, does each executor still use one physical node?
08-29-2024 12:34 AM
If I am not mistaken, it can change the number of executors dynamically.
But Databricks may well have set the 1:1 executor-to-node ratio in stone; something to test, I'd say.
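One quick way to test it (a sketch in PySpark; note that _jsc is an internal handle, so this is not a supported API):
# Run in a notebook attached to the cluster: list the host of every executor.
# If the 1:1 mapping holds, each worker host appears exactly once.
infos = spark.sparkContext._jsc.sc().statusTracker().getExecutorInfos()
print(sorted(info.host() for info in infos))  # the driver's host is included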