Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Installing pyspark.pandas

DiCamps
New Contributor II

Hello guys,

I'm trying to migrate a Python project from pandas to the pandas API on Spark, on Azure Databricks, using MLflow with a conda env.

However, I'm getting the following error:

Traceback (most recent call last):
  File "/databricks/mlflow/projects/x/data_validation.py", line 13, in <module>
    import pyspark.pandas as ps
ModuleNotFoundError: No module named 'pyspark.pandas'

Isn't the package supposed to be part of Spark already? We're using clusters on runtime version 10.4 LTS, which I understand ships with Apache Spark 3.2.1, and I've seen that the pandas API on Spark has been included since 3.2.

I also tried to install it from the config file I use to create the conda env, but that isn't working either 😞

1 ACCEPTED SOLUTION

Accepted Solutions

-werners-
Esteemed Contributor III

It should be, yes.

Can you elaborate on how you create your notebook (and the conda env you mention)?
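
For anyone gathering those details, here is a minimal diagnostic sketch. It assumes it is run at the top of the failing script, inside the same conda env, and rests on one unconfirmed guess: that the env resolves its own pyspark rather than the one bundled with the cluster runtime. The helper names describe_module and supports_pandas_api are made up for illustration and are not part of pyspark or MLflow.

```python
import importlib.util
import sys
from importlib import metadata


def describe_module(name: str, dist: str) -> dict:
    """Report whether module `name` can be found and which version of
    distribution `dist` is installed in the current interpreter."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package (e.g. `pyspark`) is missing or broken.
        spec = None
    try:
        version = metadata.version(dist)
    except metadata.PackageNotFoundError:
        # No pip/conda metadata for this distribution in the current env.
        version = None
    return {
        "module": name,
        "importable": spec is not None,
        "location": getattr(spec, "origin", None),
        "dist_version": version,
    }


def supports_pandas_api(version: str) -> bool:
    """The pandas API on Spark ships with PySpark 3.2 and later."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (3, 2)


if __name__ == "__main__":
    # Which interpreter is actually running the script?
    print("python executable:", sys.executable)
    for module in ("pyspark", "pyspark.pandas"):
        print(describe_module(module, "pyspark"))
```

If the reported pyspark location points into the conda env and the version is older than 3.2, the env is probably shadowing the runtime's Spark. Posting this output together with the conda config file would answer the question above.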

