Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Questions About Workers and Executors Configuration in Databricks

dvd_lg_bricks
New Contributor

Hi everyone, sorry, I’m new here. I’m considering migrating to Databricks, but I need to clarify a few things first.

When I define and launch an application, I see that I can specify the number of workers, and then later configure the number of executors.
My questions are:

  1. Are those workers running on different machines?

  2. Can I define how many executors run on each worker?

  3. If so, is this controlled through Spark configuration variables?

Thanks!

6 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @dvd_lg_bricks ,

1. Yes. In Databricks:

  • A cluster = driver node + one or more worker nodes

  • Each worker is a separate virtual machine (VM)

2. Databricks runs one executor per worker node, so the terms executor and worker are used interchangeably in the context of the Databricks architecture.
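To illustrate: when you define a cluster (for example via the Clusters API), you only choose the worker count and node type; there is no executor-count field. A minimal sketch, where the `spark_version` and `node_type_id` values are just examples:

```json
{
  "cluster_name": "example-cluster",
  "spark_version": "15.4.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 4
}
```

With `num_workers: 4`, Databricks provisions one driver VM plus four worker VMs, each running a single executor.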

 

Raman_Unifeye
Contributor III

@dvd_lg_bricks - Databricks manages the Spark cluster for you.

  1. Are those workers running on different machines? - Yes, each worker runs on its own VM.

  2. Can I define how many executors run on each worker? - No, Databricks runs one executor per worker.

  3. If so, is this controlled through Spark configuration variables? - Not on Databricks, but on vanilla Spark you would manage it using these three variables:

    • spark.executor.instances

    • spark.executor.cores

    • spark.executor.memory
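On vanilla Spark, you would typically pass these on the `spark-submit` command line. A sketch (the values and the application name `my_app.py` are illustrative only):

```shell
spark-submit \
  --conf spark.executor.instances=4 \
  --conf spark.executor.cores=2 \
  --conf spark.executor.memory=4g \
  my_app.py
```

This asks the cluster manager for 4 executors, each with 2 cores and 4 GB of heap; how they are packed onto worker machines is then up to the resource manager (YARN, Kubernetes, standalone).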

RG #Driving Business Outcomes with Data Intelligence

dvd_lg_bricks
New Contributor

Thank you both, @szymon_dybczak @Raman_Unifeye for your feedback. I really appreciate the time and clarity you provided.

szymon_dybczak
Esteemed Contributor III

No problem @dvd_lg_bricks  🙂

dvd_lg_bricks
New Contributor

Actually, while we’re at it @szymon_dybczak or @Raman_Unifeye, is there a place where all available Databricks configuration parameters are documented? I have some pipelines that rely on special settings, such as changing the serializer, enabling Apache Arrow, and a few other uncommon configs.

Raman_Unifeye
Contributor III

@dvd_lg_bricks - Perhaps you are looking for this: https://docs.databricks.com/aws/en/spark/conf
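For the specific settings you mentioned, you can set them per cluster under Advanced options > Spark in the cluster UI. For example (the exact Arrow key can vary by Spark/DBR version; this is the current PySpark name):

```
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.sql.execution.arrow.pyspark.enabled true
```

Note that some low-level settings are managed by Databricks and may be ignored or overridden, so it's worth testing each one on your target runtime.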

 


RG #Driving Business Outcomes with Data Intelligence