Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Error: Community Edition Spark job skewness

Pavankumar7
New Contributor III

I am trying to run a Spark job in Community Edition, but in the Spark UI I noticed that all the data is being read on the driver node instead of on the worker nodes.

Does the Community Edition not support worker nodes?

1 ACCEPTED SOLUTION

Accepted Solutions

Aviral-Bhardwaj
Esteemed Contributor III

In Databricks Community Edition, the compute environment is set up as a single-node cluster. This means there is only one node, which serves as both the driver and the worker. Because of this, all data processing, including reading data, is performed on this single node. There are no separate worker nodes available in the Community Edition, so you won't see distributed data processing across multiple workers. That's why, in the Spark UI, it appears that all data is being read and processed on the driver node.

If you need true distributed processing with separate worker nodes, you would need to use the full (paid) version of Databricks or another Spark environment that supports multi-node clusters. In summary, the Community Edition does not support separate worker nodes; all operations happen on the single node available.
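
If you want to verify this yourself, here is a minimal sketch, assuming a Databricks Python notebook where `spark` and `sc` are predefined, that inspects the cluster topology:

```python
# Minimal sketch for a Databricks Python notebook, where `spark` and `sc`
# are predefined. It checks whether the cluster has separate workers.

# On a single-node cluster Spark runs in local mode, so the master is
# "local[*]" (or "local[N]") rather than a cluster manager URL.
print("Master:", sc.master)

# Parallelism comes from the driver's cores, not from a fleet of workers.
print("Default parallelism:", sc.defaultParallelism)

# One entry per executor host; a single entry means the driver is the
# only "executor". Note: `_jsc` is an internal handle, not a public API.
print("Executor entries:", sc._jsc.sc().getExecutorMemoryStatus().size())
```

On Community Edition the first line typically prints a local[*]-style master, which confirms the single-node setup described above.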

Aviral

AviralBhardwaj


2 REPLIES


Pavankumar7
New Contributor III

Okay, got it. Thank you for the response, @Aviral-Bhardwaj.
