
Installing pyspark.pandas

DiCamps
New Contributor II

Hello guys,

I'm trying to migrate a Python project from pandas to the pandas API on Spark, on Azure Databricks, using MLflow with a conda env.

However, I'm getting the following error:

Traceback (most recent call last):
  File "/databricks/mlflow/projects/x/data_validation.py", line 13, in <module>
    import pyspark.pandas as ps
ModuleNotFoundError: No module named 'pyspark.pandas'

Isn't the package supposed to be part of Spark already? We're using clusters on runtime version 10.4 LTS, which I understand ships Apache Spark 3.2.1, and I've read that the pandas API on Spark has been included since 3.2.

I also tried installing it through the config file I use to create the conda env, but that didn't work 😞

Accepted Solutions

-werners-
Esteemed Contributor III

Yes, it should be.

Can you elaborate on how you create your notebook (and the conda env you mention)?
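
While gathering those details, a quick check like the one below might help narrow things down. It is only a minimal sketch, assuming it is run with the same interpreter that executes data_validation.py (i.e. inside the MLflow project's conda env). It reports which pyspark that interpreter actually sees and whether pyspark.pandas can be found there. The function names and the report layout are made up for illustration; the only facts relied on are that pyspark exposes `__version__` and that the pandas API on Spark ships with pyspark 3.2 and later.

```python
import importlib.util


def supports_pandas_api(version):
    """Return True if a pyspark version string is 3.2 or newer."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (3, 2)


def diagnose_pandas_on_spark():
    """Describe the pyspark installation visible to the current interpreter.

    Hypothetical helper: the keys of the returned dict are illustrative only.
    """
    report = {
        "pyspark_version": None,     # e.g. "3.2.1", or None if pyspark is absent
        "pyspark_location": None,    # file the pyspark package is loaded from
        "has_pyspark_pandas": False, # whether `import pyspark.pandas` can resolve
    }

    # Look for the top-level package first; find_spec on a submodule raises
    # ModuleNotFoundError when the parent package itself is missing.
    spec = importlib.util.find_spec("pyspark")
    if spec is None:
        return report
    report["pyspark_location"] = spec.origin

    try:
        import pyspark

        report["pyspark_version"] = getattr(pyspark, "__version__", None)
        report["has_pyspark_pandas"] = (
            importlib.util.find_spec("pyspark.pandas") is not None
        )
    except ImportError:
        # pyspark is present but cannot be imported (for example, a missing
        # dependency); leave the remaining fields at their defaults.
        pass
    return report


if __name__ == "__main__":
    info = diagnose_pandas_on_spark()
    print(info)
    if info["pyspark_version"] and not supports_pandas_api(info["pyspark_version"]):
        print("This pyspark predates 3.2, so pyspark.pandas is not expected to exist.")
```

If the reported location points into the conda env rather than the cluster's Spark installation, or the version is below 3.2, the env is probably shadowing the runtime's pyspark with an older copy.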
