Monday
Hi everyone, sorry, I’m new here. I’m considering migrating to Databricks, but I need to clarify a few things first.
When I define and launch an application, I see that I can specify the number of workers, and then later configure the number of executors.
My questions are:
Are those workers running on different machines?
Can I define how many executors run on each worker?
If so, is this controlled through Spark configuration variables?
Thanks!
Tuesday
Hi @dvd_lg_bricks ,
1. Yes. In Databricks:
A cluster = driver node + one or more worker nodes
Each worker is a separate virtual machine (VM)
2. Databricks runs one executor per worker node, so the terms executor and worker are used interchangeably in the context of the Databricks architecture.
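You can verify this from a notebook by reading the executor settings the platform applied. A minimal sketch, using the `spark` session that Databricks notebooks provide automatically:

```python
# Inspect the executor settings Databricks chose for this cluster.
# `spark` is the SparkSession a Databricks notebook provides automatically.
print(spark.conf.get("spark.executor.memory", "not set"))
print(spark.conf.get("spark.executor.cores", "not set"))

# Total task slots across all executors (one executor per worker here)
print(spark.sparkContext.defaultParallelism)
```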
Tuesday
@dvd_lg_bricks - Databricks handles the Spark cluster management for you.
Are those workers running on different machines? - Yes, it runs one VM per worker.
Can I define how many executors run on each worker? - No, Databricks runs one executor per worker.
If so, is this controlled through Spark configuration variables? - Not on Databricks. On vanilla Spark, you would manage it using these three variables (see the sketch after the list):
spark.executor.instances
spark.executor.cores
spark.executor.memory
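On vanilla Spark these are usually set when the session is built (or passed to spark-submit). A minimal sketch with purely illustrative numbers; on Databricks this is managed for you:

```python
from pyspark.sql import SparkSession

# Vanilla Spark only -- Databricks sets these per worker automatically.
# The numbers are illustrative, not recommendations.
spark = (
    SparkSession.builder
    .appName("executor-sizing-demo")
    .config("spark.executor.instances", "4")  # total executors (YARN/k8s)
    .config("spark.executor.cores", "4")      # cores per executor
    .config("spark.executor.memory", "8g")    # heap per executor
    .getOrCreate()
)
```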
Tuesday
Thank you both, @szymon_dybczak and @Raman_Unifeye, for your feedback. I really appreciate the time and clarity you provided.
Tuesday
No problem @dvd_lg_bricks 🙂
Tuesday
While we're at it, @szymon_dybczak or @Raman_Unifeye: is there a place where all available Databricks configuration parameters are documented? I have some pipelines that rely on special settings, such as changing the serializer, enabling Apache Arrow, and a few other uncommon configs.
Tuesday
Perhaps you are looking for this: https://docs.databricks.com/aws/en/spark/conf
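One caveat worth knowing for the settings you listed: Arrow is a session-scoped SQL conf you can toggle from a notebook, but core settings like the serializer must go into the cluster's Spark config (Advanced options) because they can't be changed at runtime. A short sketch of both:

```python
# Session-scoped SQL conf: can be set from a running notebook.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

# Core conf: set at cluster creation (Advanced options > Spark config),
# not at runtime -- spark.conf.set() would reject it:
#   spark.serializer org.apache.spark.serializer.KryoSerializer
```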
10 hours ago
Hello again,
I have a couple of questions regarding Databricks cluster configuration and best practices.
Are there any recommended best practices or guidelines for configuring Databricks clusters (e.g. sizing, cores per executor, memory settings, etc.) depending on the workload?
In on-premise Spark deployments, it is sometimes recommended to leave a certain number of CPU cores or a percentage of CPU/memory reserved for the operating system and the JVM (for example, not allocating 100% of resources to Spark executors).
Is there an equivalent recommendation or consideration in Databricks-managed environments, or is resource management fully handled by the platform?
Thanks in advance for your help.
5 hours ago
Hi @dvd_lg_bricks ,
There's a section about cluster configuration and tuning in the Comprehensive Guide to Optimize Databricks, Spark and Delta Lake Workloads.
6 hours ago
Resource management for a cluster is fully handled by Databricks. Ideally, you would focus on sizing the cluster for your workload type and on setting the autoscaling parameters to balance cost and load.
In our recent implementations we have been moving more towards serverless to get rid of the management overhead. Obviously, serverless doesn't fit every use case, so we still work with a hybrid method, taking a serverless-first approach. A sketch of the autoscaling setup is below.
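For illustration, this is roughly how an autoscaling cluster can be created programmatically. A sketch assuming the `databricks-sdk` Python package; the node type, runtime version, and worker bounds are placeholders, not recommendations:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute

w = WorkspaceClient()  # picks up credentials from env vars or a config profile

# Databricks adds and removes workers between min_workers and max_workers
# based on load; all values here are placeholders.
cluster = w.clusters.create(
    cluster_name="autoscaling-demo",
    spark_version="15.4.x-scala2.12",   # placeholder runtime version
    node_type_id="i3.xlarge",           # placeholder node type
    autoscale=compute.AutoScale(min_workers=2, max_workers=8),
    autotermination_minutes=30,
).result()  # create() returns a waiter; result() blocks until the cluster is up
```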