Installing pyspark.pandas

DiCamps
New Contributor II

Hello guys,

I'm trying to migrate a Python project from pandas to the pandas API on Spark, on Azure Databricks, using MLflow with a conda environment.

The problem is that I'm getting the following error:

Traceback (most recent call last):
  File "/databricks/mlflow/projects/x/data_validation.py", line 13, in <module>
    import pyspark.pandas as ps
ModuleNotFoundError: No module named 'pyspark.pandas'

Isn't this package supposed to ship with Spark already? We're using clusters on Databricks Runtime 10.4 LTS, which as I understand it includes Apache Spark 3.2.1, and I've read that the pandas API on Spark has been included since Spark 3.2.
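Because `pyspark.pandas` is a subpackage of PySpark 3.2.0 and later rather than a separate install, one plausible explanation (not confirmed in this thread) is that the interpreter running the MLflow project sees a different pyspark than the cluster's, or none at all. The sketch below, whose helper names are hypothetical, reports which pyspark the run actually loads and whether its version is new enough to include the pandas API.

```python
import importlib.util


def spark_version_has_pandas_api(version: str) -> bool:
    """True if a PySpark version string is 3.2 or newer (pyspark.pandas first shipped in 3.2.0)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (3, 2)


def pandas_on_spark_importable() -> bool:
    """True if the current interpreter can locate pyspark.pandas."""
    try:
        # find_spec on a dotted name imports the parent package (pyspark) first
        return importlib.util.find_spec("pyspark.pandas") is not None
    except ModuleNotFoundError:
        # pyspark itself is not installed in this environment
        return False


if __name__ == "__main__":
    try:
        import pyspark
        print("pyspark", pyspark.__version__, "loaded from", pyspark.__file__)
        print("version ships pyspark.pandas:", spark_version_has_pandas_api(pyspark.__version__))
    except ModuleNotFoundError:
        print("pyspark is not importable in this environment")
    print("pyspark.pandas importable:", pandas_on_spark_importable())
```

Running this as the first step of the project (or temporarily in place of line 13 of the script) shows whether the failure comes from a missing pyspark, an older pyspark, or something else.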

I also tried installing it through the config file I use to create the conda env, but that didn't work either 😞
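There is no standalone `pyspark.pandas` package to list in a conda file; the module comes with `pyspark` itself from 3.2.0 on and needs `pandas` and `pyarrow` at runtime. A minimal sketch, assuming the env file is the one referenced by the `conda_env` entry of the project's MLproject file: the env name, channel, and pins are illustrative, with Python 3.8 and pyspark 3.2.1 chosen to match the 10.4 LTS runtime. The YAML string is the part that matters; the Python wrapper only writes it out.

```python
from pathlib import Path

# Hypothetical conda environment for the MLflow project; names and pins are
# illustrative. pyspark.pandas is not a separate package: it ships inside
# pyspark 3.2.0+, and it needs pandas and pyarrow installed alongside it.
CONDA_ENV_YAML = """\
name: data-validation
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pip
  - pip:
      - mlflow
      - pandas
      - pyarrow
      - pyspark==3.2.1
"""


def write_conda_env(path: Path) -> Path:
    """Write the environment spec to `path` and return the path."""
    path.write_text(CONDA_ENV_YAML)
    return path


if __name__ == "__main__":
    print(write_conda_env(Path("conda.yaml")).read_text())
```

Whether you want a PyPI pyspark inside the project env at all, rather than reusing the runtime's own Spark installation, depends on how the run is launched, so treat this as a starting point to compare against your current file.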

Accepted Solution

-werners-
Esteemed Contributor III

It should be, yes.

Can you elaborate on how you create your notebook (and the conda env you mention)?
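Those details are easiest to gather from inside the failing run itself. A small sketch, with a hypothetical helper name, that reports which interpreter and conda environment the MLflow project actually executes in, and where `pyspark` would be imported from. It assumes `CONDA_DEFAULT_ENV` is set by conda activation, which is typical but may not hold for every launch method.

```python
import importlib.util
import os
import sys


def describe_environment() -> dict:
    """Collect details about the interpreter running this script."""
    # find_spec on a top-level name does not import it; None means not installed
    spec = importlib.util.find_spec("pyspark")
    return {
        "python": sys.executable,
        "prefix": sys.prefix,
        # usually set by conda activation; None when no conda env is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # file pyspark would be loaded from, or None if it is absent
        "pyspark_origin": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `pyspark_origin` points inside the conda env prefix rather than the Databricks runtime's own Spark installation, the env's pyspark is the one being imported.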

