Error: Community edition spark job skewnees

Pavankumar7
New Contributor III

I am trying to run the spark job in community edition, but when I noticed in the spark UI the whole data is reading on the driver node, instead of reading it on the worker node.

does community version will not support for the worker node?

Aviral-Bhardwaj
Esteemed Contributor III

In Databricks Community Edition, the compute environment is set up as a single-node cluster. This means there is only one node, which serves both as the driver and the worker. Because of this, all data processing—including reading data—is performed on this single node. There are no separate worker nodes available in the Community Edition, so you won’t see distributed data processing across multiple workers. That’s why, in the Spark UI, it appears that all data is being read and processed on the driver node.

If you need true distributed processing with separate worker nodes, you would need to use the full (paid) version of Databricks or another Spark environment that supports multi-node clusters. In summary, the Community Edition does not support separate worker nodes; all operations happen on the single node available.

Aviral

AviralBhardwaj

View solution in original post

okay.. got it. Thank you for the response @Aviral-Bhardwaj