yesterday
Hi everyone, sorry, I’m new here. I’m considering migrating to Databricks, but I need to clarify a few things first.
When I define and launch an application, I see that I can specify the number of workers, and then later configure the number of executors.
My questions are:
Are those workers running on different machines?
Can I define how many executors run on each worker?
If so, is this controlled through Spark configuration variables?
Thanks!
5 hours ago
Hi @dvd_lg_bricks ,
1. Yes. In Databricks:
A cluster = driver node + one or more worker nodes
Each worker node is a separate VM
2. Databricks runs one executor per worker node, so the terms executor and worker are used interchangeably in the context of the Databricks architecture.
4 hours ago
@dvd_lg_bricks - Databricks does the spark-cluster management.
Are those workers running on different machines? - Yes, Databricks runs one VM per worker
Can I define how many executors run on each worker? - No, Databricks runs one executor per worker
If so, is this controlled through Spark configuration variables? - Not on Databricks. On vanilla Spark, you would manage it using these three variables:
spark.executor.instances
spark.executor.cores
spark.executor.memory
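To make the three settings above concrete, here's a minimal sketch of how they might be set on vanilla Spark (not Databricks, where the platform manages this for you). The values are purely illustrative, not recommendations:

```python
# Illustrative executor-sizing confs for vanilla Spark (values are examples).
# You would pass these via `spark-submit --conf key=value` or the
# SparkSession builder's .config(key, value) calls.
executor_conf = {
    "spark.executor.instances": "4",  # total executors across the cluster
    "spark.executor.cores": "2",      # CPU cores per executor
    "spark.executor.memory": "4g",    # JVM heap per executor
}

for key, value in executor_conf.items():
    print(f"--conf {key}={value}")
```

On Databricks you'd instead control capacity by choosing the worker VM type and worker count, since each worker gets exactly one executor.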
4 hours ago
Thank you both, @szymon_dybczak and @Raman_Unifeye, for your feedback. I really appreciate the time and clarity you provided.
3 hours ago
No problem @dvd_lg_bricks 🙂
44m ago
While we’re at it, @szymon_dybczak or @Raman_Unifeye: is there a place where all available Databricks configuration parameters are documented? I have some pipelines that rely on special settings, such as changing the serializer, enabling Apache Arrow, and a few other uncommon configs.
25m ago
Perhaps you are looking for this: https://docs.databricks.com/aws/en/spark/conf
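For the specific settings mentioned above (serializer, Arrow), here's a hedged sketch of what those confs typically look like as Spark key/value pairs. These are standard Apache Spark config keys; whether and how they apply on a given Databricks runtime should be checked against the docs page linked above:

```python
# Example Spark confs for the "uncommon" settings mentioned in the question.
# Standard Apache Spark keys; values shown are the commonly used ones.
tuning_conf = {
    # Swap the default JavaSerializer for Kryo
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    # Enable Apache Arrow for pandas <-> Spark DataFrame conversion in PySpark
    "spark.sql.execution.arrow.pyspark.enabled": "true",
}

for key, value in tuning_conf.items():
    print(f"{key} = {value}")
```

On Databricks, confs like these can be set in the cluster's Spark config section (Advanced options) rather than per-application.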