Getting python version errors when using pyspark rdd using databricks connect

Surajv
New Contributor III

Hi community,

When I use PySpark RDD functions via Databricks Connect, I get the error below. Databricks cluster runtime version: 12.2.

`RuntimeError: Python in worker has different version 3.9 than that in driver 3.10, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON...`

How can I resolve it? 
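For context, here is a minimal sketch of how I check the client-side Python version against the cluster's. The cluster version tuple is an assumption taken from the Databricks Runtime 12.2 release notes, which list Python 3.9:

```python
import sys

# Databricks Runtime 12.2 is documented to ship Python 3.9 (assumption:
# confirm against your own cluster's runtime release notes).
CLUSTER_PYTHON = (3, 9)

def versions_match(local, cluster):
    """True when both sides agree on the major.minor pair PySpark compares."""
    return tuple(local[:2]) == tuple(cluster[:2])

local = sys.version_info
if not versions_match(local, CLUSTER_PYTHON):
    print(f"Mismatch: client Python {local.major}.{local.minor} "
          f"vs cluster Python {CLUSTER_PYTHON[0]}.{CLUSTER_PYTHON[1]}")
```

In my case this reports a mismatch, since my local environment is on 3.10.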

1 REPLY

Kaniz
Community Manager

Hi @Surajv, the error message you’re encountering indicates a Python version mismatch between the Spark worker (3.9) and the Spark driver (3.10).

To resolve this issue, follow these steps:

  1. Install Correct Python Version on Worker Node:

    • Ensure the worker nodes run the same minor Python version as the driver (here, align both sides on either 3.9 or 3.10).

  2. Update Spark Configuration:

    • Edit or create the ./conf/spark-defaults.conf file (you can copy it from spark-defaults.conf.template) on both the master and worker nodes.
    • Add the following line to the configuration file:
      spark.pyspark.python /usr/bin/python3
      
    • Save the changes and restart both the master and worker nodes.
  3. Verify Environment Variables:

    • Confirm that PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON point to the same minor Python version on the client, driver, and workers.
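The verification in step 3 can be sketched as a small read-only helper. The function name is hypothetical; it only reports what the two variables named in the error message resolve to, without changing anything:

```python
import os
import sys

# The two environment variables the error message tells you to check.
PYSPARK_VARS = ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")

def pyspark_env_report(env=None):
    """Collect the PySpark interpreter variables (hypothetical helper name)."""
    env = os.environ if env is None else env
    return {var: env.get(var, "<not set>") for var in PYSPARK_VARS}

# Print the driver's own version next to the variables PySpark consults,
# so a 3.9-vs-3.10 mismatch is visible at a glance.
print("driver python:", f"{sys.version_info.major}.{sys.version_info.minor}")
for var, value in pyspark_env_report().items():
    print(f"{var} = {value}")
```

Run this in the same environment where you launch Databricks Connect; if either variable is unset or points at a different minor version than the cluster, set it accordingly before retrying.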
