09-29-2022 04:25 AM
Hello,
I'm trying to migrate a Python project from Pandas to the Pandas API on Spark, on Azure Databricks, using MLflow in a conda env.
The problem is that I'm getting the following error:
```
Traceback (most recent call last):
File "/databricks/mlflow/projects/x/data_validation.py", line 13, in <module>
import pyspark.pandas as ps
ModuleNotFoundError: No module named 'pyspark.pandas'
```
Isn't the package supposed to be part of Spark already? We're using clusters on runtime version 10.4 LTS, which I understand ships Apache Spark 3.2.1, and I've read that the Pandas API on Spark is included since Spark 3.2.
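As a quick sanity check, the following sketch tests whether `pyspark.pandas` is importable in the environment the job actually runs in (the helper name `has_pandas_on_spark` is just for illustration; `pyspark.pandas` ships inside the `pyspark` package from Spark 3.2 onward):

```python
import importlib.util

def has_pandas_on_spark() -> bool:
    """Return True if pyspark.pandas can be imported (pyspark >= 3.2)."""
    try:
        # find_spec imports the parent package, so guard against pyspark
        # itself being absent from the environment
        return importlib.util.find_spec("pyspark.pandas") is not None
    except ModuleNotFoundError:
        return False

print("pyspark.pandas available:", has_pandas_on_spark())
```

If this prints `False` inside the MLflow run but `True` in a plain notebook, the conda env that MLflow builds for the project is resolving a different (or missing) `pyspark` than the cluster's own.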
I also tried installing it via my config file (the one I use to create the conda env), but that didn't work either 😞
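For reference, a conda env spec that pulls in a `pyspark` new enough to include the Pandas API might look like the sketch below (the env name and version pins are assumptions, not the poster's actual file; note that pip-installing `pyspark` inside a Databricks cluster env can shadow the runtime's own Spark, so this is mainly relevant when MLflow builds an isolated env for the project):

```yaml
name: pandas-on-spark-env
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pip
  - pip:
      - pyspark>=3.2.1   # pyspark.pandas ships with pyspark from 3.2 on
      - mlflow
```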
- Labels:
  - Azure
  - Python Project
  - Spark Pandas Api
Accepted Solutions
09-29-2022 05:40 AM
It should be, yes.
Can you elaborate on how you create your notebook (and the conda env you mention)?

