Hello guys,
I'm trying to migrate a Python project from pandas to the pandas API on Spark, on Azure Databricks, using MLflow in a conda environment.
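For context, the change is essentially swapping the pandas import for the Spark one. A minimal sketch of what I mean (the file path is just illustrative):

import pyspark.pandas as ps   # was: import pandas as pd

df = ps.read_csv("/tmp/example.csv")   # was: pd.read_csv(...)
print(df.head())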
The problem is that I'm getting the following error:
Traceback (most recent call last):
  File "/databricks/mlflow/projects/x/data_validation.py", line 13, in <module>
    import pyspark.pandas as ps
ModuleNotFoundError: No module named 'pyspark.pandas'
Isn't the package supposed to be part of Spark already? We're using clusters on Databricks Runtime 10.4 LTS, which I understand ships with Apache Spark 3.2.1, and from what I've seen the pandas API on Spark has been included in PySpark since 3.2.
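For reference, this is the sanity check I'd run in a notebook attached to the same cluster to confirm the version (a quick sketch, nothing project-specific):

import pyspark
print(pyspark.__version__)      # should print 3.2.1 on runtime 10.4 LTS

import pyspark.pandas as ps     # the same import that fails in the project run
print(ps)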
I also tried installing it through the config file I use to create the conda environment, but that didn't work either.
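In case it helps with diagnosing this, I can add a quick check at the top of data_validation.py to see which interpreter, and which PySpark if any, the MLflow run actually picks up; a diagnostic sketch, not part of the real script:

import sys
print(sys.executable)   # which Python the MLflow project run is using

try:
    import pyspark
    print(pyspark.__version__, pyspark.__file__)   # which PySpark this env sees
except ImportError as exc:
    print("pyspark is not importable in this environment:", exc)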