05-29-2025 12:28 AM
I am trying to run a Spark job in Community Edition, but I noticed in the Spark UI that all of the data is being read on the driver node instead of on the worker nodes.
Does Community Edition not support worker nodes?
- Labels: Workflows
05-29-2025 01:15 AM
In Databricks Community Edition, the compute environment is a single-node cluster: one node serves as both the driver and the worker. Because of this, all data processing, including reading data, is performed on that single node. Spark still splits the work into tasks that run in parallel on the node's CPU cores, but there are no separate worker nodes, so you won't see processing distributed across multiple machines. That's why, in the Spark UI, it appears that all data is being read and processed on the driver node.
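If you want to confirm this from a notebook, one option is to look at the Spark master URL, which PySpark exposes as `spark.sparkContext.master`. Master URLs of the form `local`, `local[K]`, `local[*]` or `local[K,F]` mean the driver and executors share one JVM on one machine. The helper below is a hypothetical sketch for illustration only (not a Databricks or Spark API), and I'm not certain of the exact string Community Edition reports, so treat the check as a rough indicator:

```python
import re

# Spark "local mode" master URLs: local, local[K], local[*], local[K,F], local[*,F].
# In local mode the driver and the executor threads run in a single JVM on one machine.
_LOCAL_MASTER = re.compile(r"^local(\[\s*(\*|\d+)\s*(,\s*\d+\s*)?\])?$")


def runs_in_local_mode(master: str) -> bool:
    """Hypothetical helper: True if the master URL denotes single-JVM local mode.

    "local-cluster[...]" is deliberately excluded: it is a testing mode that
    starts separate executor processes, so it does not match the pattern.
    """
    return bool(_LOCAL_MASTER.match(master.strip()))
```

In a notebook you would call it as `runs_in_local_mode(spark.sparkContext.master)`. A `True` result is consistent with what you see in the Spark UI, where the Executors tab typically lists only the driver.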
If you need true distributed processing with separate worker nodes, you will need a paid Databricks workspace or another Spark environment that supports multi-node clusters.
Aviral
05-29-2025 02:37 AM
Okay, got it. Thank you for the response, @Aviral-Bhardwaj.