Databricks Community

DiCamps · ‎09-29-2022

Hello guys,

I'm trying to migrate a Python project from pandas to the pandas API on Spark, on Azure Databricks, using MLflow in a conda env.

The thing is, I'm getting the following error:

Traceback (most recent call last):
  File "/databricks/mlflow/projects/x/data_validation.py", line 13, in <module>
    import pyspark.pandas as ps
ModuleNotFoundError: No module named 'pyspark.pandas'

Isn't the package supposed to be part of Spark already? We're using clusters on runtime version 10.4 LTS, which I understand ships with Apache Spark 3.2.1, and I've seen that the pandas API on Spark should be included since 3.2.

I also tried to install it from my config file, the one I use to create the conda env, but it's not working 😞

-werners- · ‎09-29-2022

It should be, yes.

Can you elaborate on how you create your notebook (and the conda env you mention)?
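
To help answer that, here is a minimal diagnostic sketch that could be run at the top of the failing script, before the `import pyspark.pandas` line. It reports which Python interpreter is active, which `pyspark` (if any) that interpreter resolves, and whether `pyspark.pandas` can be found. The function name `describe_pyspark_env` is hypothetical, and the idea that the conda env resolves a different `pyspark` than the cluster runtime is only an assumption to check, not a confirmed cause.

```python
import importlib.util
import os
import sys


def describe_pyspark_env():
    """Collect basic facts about the pyspark visible to this interpreter.

    Hypothetical helper for illustration; it only inspects the environment
    and does not start a Spark session.
    """
    info = {
        "python": sys.executable,   # interpreter actually running the script
        "pyspark_version": None,    # stays None if pyspark is not importable
        "pyspark_path": None,       # directory the pyspark package was loaded from
        "has_pandas_api": False,    # True only if pyspark.pandas can be located
    }
    try:
        import pyspark
    except ImportError:
        return info

    info["pyspark_version"] = pyspark.__version__
    info["pyspark_path"] = os.path.dirname(pyspark.__file__)
    # find_spec locates the submodule without executing it.
    info["has_pandas_api"] = importlib.util.find_spec("pyspark.pandas") is not None
    return info


if __name__ == "__main__":
    for key, value in describe_pyspark_env().items():
        print(f"{key}: {value}")
```

If the reported version is below 3.2, or the path points into the conda env rather than the cluster's Spark installation, that would be consistent with the `ModuleNotFoundError` above and would suggest looking at the `pyspark` entry in the conda config file.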
