Databricks Community

DiCamps · ‎09-29-2022

Hello guys,

I'm trying to migrate a Python project from pandas to the pandas API on Spark, on Azure Databricks, using MLflow with a conda env.

The problem is that I'm getting the following error:

Traceback (most recent call last):
  File "/databricks/mlflow/projects/x/data_validation.py", line 13, in <module>
    import pyspark.pandas as ps
ModuleNotFoundError: No module named 'pyspark.pandas'

Isn't the package supposed to be part of Spark already? We're using clusters on Databricks Runtime 10.4 LTS, which as I understand it ships Apache Spark 3.2.1, and I've read that the pandas API on Spark has been included since 3.2.

I also tried installing it through the config file I use to create the conda env, but that isn't working either 😞

-werners- · ‎09-29-2022

Yes, it should be.

Can you elaborate on how you create your notebook (and the conda env you mention)?
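
One way to gather that information is to run a small diagnostic from inside the same conda env that executes data_validation.py. The sketch below is only an illustration, not a confirmed fix: the helper name `pyspark_report` is made up for this example, and it rests on the assumption that the env may be picking up its own pyspark (possibly older than 3.2, where `pyspark.pandas` does not exist) rather than the one bundled with the cluster runtime.

```python
import importlib.util
import sys


def pyspark_report():
    """Collect facts about the running interpreter and the pyspark it can see.

    Hypothetical helper for troubleshooting; not part of pyspark or MLflow.
    """
    report = {
        "python": sys.executable,      # which environment is actually running
        "pyspark_version": None,
        "pyspark_path": None,
        "has_pandas_api": False,
    }
    try:
        import pyspark
    except ImportError:
        # pyspark is not installed in this environment at all
        return report

    report["pyspark_version"] = pyspark.__version__
    report["pyspark_path"] = getattr(pyspark, "__file__", None)
    # find_spec locates the submodule on disk without importing it
    report["has_pandas_api"] = importlib.util.find_spec("pyspark.pandas") is not None
    return report


if __name__ == "__main__":
    for key, value in pyspark_report().items():
        print(f"{key}: {value}")
```

If the reported version is below 3.2, or the path points into the conda env rather than the Databricks runtime, that would be consistent with the ModuleNotFoundError above and worth including in your reply.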
