
Installing pyspark.pandas

DiCamps
New Contributor II

Hello guys,

I'm trying to migrate a Python project from pandas to the pandas API on Spark, on Azure Databricks, using MLflow in a conda env.

The problem is that I'm getting the following error:

Traceback (most recent call last):
  File "/databricks/mlflow/projects/x/data_validation.py", line 13, in <module>
    import pyspark.pandas as ps
ModuleNotFoundError: No module named 'pyspark.pandas'

Isn't the package supposed to be part of Spark already? We're using clusters on runtime version 10.4 LTS, which I understand ships with Apache Spark 3.2.1, and I've read that the pandas API on Spark has been included since 3.2.

I also tried to install it through the config file I use to create the conda env, but that isn't working 😞

1 ACCEPTED SOLUTION

-werners-
Esteemed Contributor III

It should be, yes.

Can you elaborate on how you create your notebook (and the conda env you mention)?
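
One way to gather that information is a quick check run inside the same conda env that the MLflow project uses. The sketch below is only a diagnostic under stated assumptions: the helper name `describe_pyspark_env` is hypothetical (it is not part of pyspark, MLflow, or Databricks), and it only reports which Python interpreter and which pyspark installation the environment resolves to. It does not fix anything.

```python
import importlib
import importlib.util
import sys


def describe_pyspark_env():
    """Report which Python and pyspark (if any) this environment resolves to.

    Hypothetical helper for illustration only; not part of any library.
    """
    info = {
        "python_executable": sys.executable,
        "pyspark_version": None,
        "pyspark_location": None,
        "has_pyspark_pandas": False,
    }
    # find_spec returns None when the top-level package cannot be found.
    spec = importlib.util.find_spec("pyspark")
    if spec is None:
        return info
    info["pyspark_location"] = spec.origin
    try:
        pyspark = importlib.import_module("pyspark")
        info["pyspark_version"] = getattr(pyspark, "__version__", None)
        # pyspark.pandas exists only in pyspark 3.2 and later.
        info["has_pyspark_pandas"] = (
            importlib.util.find_spec("pyspark.pandas") is not None
        )
    except ImportError:
        # pyspark is present on disk but cannot be imported in this env.
        pass
    return info


if __name__ == "__main__":
    for key, value in describe_pyspark_env().items():
        print(f"{key}: {value}")
```

If `pyspark_version` turns out to be older than 3.2, or `pyspark_location` points at a copy installed inside the conda env rather than the one provided by the cluster runtime, that could explain the `ModuleNotFoundError`. This is an assumption to verify, not a confirmed diagnosis.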
