Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Questions About Workers and Executors Configuration in Databricks

dvd_lg_bricks
New Contributor

Hi everyone, sorry, I’m new here. I’m considering migrating to Databricks, but I need to clarify a few things first.

When I define and launch an application, I see that I can specify the number of workers, and then later configure the number of executors.
My questions are:

  1. Are those workers running on different machines?

  2. Can I define how many executors run on each worker?

  3. If so, is this controlled through Spark configuration variables?

Thanks!

10 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @dvd_lg_bricks ,

1. Yes. In Databricks:

  • A cluster = driver node + one or more worker nodes

  • Each worker is a separate virtual machine (VM)

2. Databricks runs one executor per worker node. Therefore, the terms executor and worker are used interchangeably in the context of the Databricks architecture.

 

Raman_Unifeye
Contributor III

@dvd_lg_bricks - Databricks handles the Spark cluster management for you.

  1. Are those workers running on different machines? - Yes, Databricks runs each worker on its own VM.

  2. Can I define how many executors run on each worker? - No, Databricks runs one executor per worker.

  3. If so, is this controlled through Spark configuration variables? - Not on Databricks. On vanilla Spark, however, you would manage it using these three variables:

    • spark.executor.instances

    • spark.executor.cores

    • spark.executor.memory
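On vanilla Spark, those three settings are typically passed at launch time. A minimal sketch of a spark-submit invocation (the master, values, and application file are placeholders, not recommendations):

```shell
spark-submit \
  --master yarn \
  --conf spark.executor.instances=6 \
  --conf spark.executor.cores=4 \
  --conf spark.executor.memory=8g \
  my_app.py   # placeholder application
```

On Databricks these particular knobs are managed by the platform, so setting them there generally has no effect.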

       

       


RG #Driving Business Outcomes with Data Intelligence

dvd_lg_bricks
New Contributor

Thank you both, @szymon_dybczak @Raman_Unifeye for your feedback. I really appreciate the time and clarity you provided.

szymon_dybczak
Esteemed Contributor III

No problem @dvd_lg_bricks  🙂

dvd_lg_bricks
New Contributor

While we’re at it, @szymon_dybczak or @Raman_Unifeye: is there a place where all available Databricks configuration parameters are documented? I have some pipelines that rely on special settings, such as changing the serializer, enabling Apache Arrow, and a few other uncommon configs.

Perhaps you are looking for this: https://docs.databricks.com/aws/en/spark/conf
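For the settings you mentioned (serializer, Arrow), entries like the following would typically go into the cluster's Spark config box. A sketch only; verify each key against the docs linked above before relying on it:

```
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.sql.execution.arrow.pyspark.enabled true
```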

 


RG #Driving Business Outcomes with Data Intelligence

dvd_lg_bricks
New Contributor

Hello again,

I have a couple of questions regarding Databricks cluster configuration and best practices.

Are there any recommended best practices or guidelines for configuring Databricks clusters (e.g. sizing, cores per executor, memory settings, etc.) depending on the workload?

In on-premise Spark deployments, it is sometimes recommended to leave a certain number of CPU cores or a percentage of CPU/memory reserved for the operating system and the JVM (for example, not allocating 100% of resources to Spark executors).
Is there an equivalent recommendation or consideration in Databricks-managed environments, or is resource management fully handled by the platform?

Thanks in advance for your help.

szymon_dybczak
Esteemed Contributor III

Hi @dvd_lg_bricks ,

There's a section about cluster configuration and tuning in Comprehensive Guide to Optimize Databricks, Spark and Delta Lake Workloads:

Comprehensive Guide to Optimize Data Workloads | Databricks

Raman_Unifeye
Contributor III

Resource management for a cluster is fully handled by Databricks. Ideally, you would focus on sizing the cluster for your workload type and setting the autoscaling parameters to balance cost and load.

In our recent implementations, we have been moving towards serverless to avoid the management overhead. Obviously, serverless does not fit every use case, so we still work with a hybrid, serverless-first approach.
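To illustrate the on-prem reservation rule asked about earlier (this is not a Databricks mechanism; the platform handles it for you), here is a hypothetical vanilla-Spark sizing heuristic. The function name, default values, and reservation amounts are illustrative assumptions, not official guidance:

```python
def suggest_executor_config(cores_per_node, mem_gb_per_node, nodes,
                            cores_per_executor=5, reserve_cores=1,
                            reserve_mem_gb=1, overhead_frac=0.10):
    """Hypothetical sizing heuristic for vanilla Spark (not Databricks).

    Reserves one core and some memory per node for the OS and daemons,
    splits the remainder into executors of `cores_per_executor` cores,
    and leaves ~10% of each executor's memory for JVM overhead.
    """
    usable_cores = cores_per_node - reserve_cores
    executors_per_node = usable_cores // cores_per_executor
    total_executors = executors_per_node * nodes
    usable_mem = mem_gb_per_node - reserve_mem_gb
    mem_per_executor = usable_mem / executors_per_node
    # spark.executor.memory is the heap; keep off-heap overhead out of it
    heap_gb = int(mem_per_executor * (1 - overhead_frac))
    return {
        "spark.executor.instances": total_executors,
        "spark.executor.cores": cores_per_executor,
        "spark.executor.memory": f"{heap_gb}g",
    }

# Example: 3 worker nodes with 16 cores / 64 GB each
print(suggest_executor_config(16, 64, 3))
```

On Databricks you would instead pick a worker VM type and node count (or autoscaling range) and let the platform derive the executor layout.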


RG #Driving Business Outcomes with Data Intelligence

Abeshek
Visitor

Regarding your Databricks question about workers versus executors: many teams encounter the same sizing and configuration issues when evaluating a migration. At Kanerika, we help companies plan cluster architecture, optimize Spark workloads, and avoid overspend during the move. If you want, I can share a quick sizing framework we use with clients. Would you be open to a brief 15-minute exchange next week?