Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Setting up a justfile command to launch your Spark application

seefoods
Valued Contributor

Hello guys,

I built a justfile for my project to run my wheel job task from the command line, but when I run the wheel task I encounter this error:

from pyspark.sql.connect.expressions import PythonUDFEnvironment
ImportError: cannot import name 'PythonUDFEnvironment' from 'pyspark.sql.connect.expressions'

Does anyone know how to solve this issue?

This is my justfile:

default:
    @just --list

install:
    poetry install

poetry-show-pyspark:
    poetry show pyspark  # Is PySpark already installed?

poetry-uninstall-pyspark:
    poetry remove pyspark

test-connexion-databricks:
    databricks-connect test

poetry-install-pyspark:
    poetry add "pyspark>=3.5.5,<=4.1.1"

poetry-add-pyspark-connect:
    poetry add databricks-connect@~17.3  # Or X.Y to match your cluster version.

edpiqual-example:
    python -c "from edpiqual.entrypoint import main; main()" \
        --ingestion_catalog_name default \
        --product_catalog_name workspace \
        --data_bundle Transactions --data_bundle_object test1 \
        --format csv --source_type autoloader --trigger availableNow \
        --agreement_version 1 --start_date 2026-03-08T04:01:40.285Z \
        --write_mode upsert --upsert_column reference_id

2 REPLIES

mderela
Contributor

This error typically happens when there’s a version mismatch between your local pyspark installation and databricks-connect.
PythonUDFEnvironment was introduced in a specific version of the Databricks Connect SDK. If you have a standalone pyspark package installed alongside databricks-connect, it shadows the correct one bundled with the connector.
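One way to confirm this conflict is to check which distributions are actually installed in the environment. This is just a diagnostic sketch using only the standard library, not part of either package's API:

```python
# Diagnostic sketch: detect the pyspark / databricks-connect conflict
# described above. Uses only the standard library.
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

pyspark_ver = installed_version("pyspark")
connect_ver = installed_version("databricks-connect")
if pyspark_ver and connect_ver:
    print(f"Conflict: standalone pyspark {pyspark_ver} may shadow "
          f"the pyspark bundled with databricks-connect {connect_ver}")
```

You can also run `python -c "import pyspark; print(pyspark.__file__)"` to see which package the `pyspark` import actually resolves to.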

 

If this is the main issue: remove the standalone pyspark and keep only databricks-connect. Then verify via Poetry that pyspark is no longer installed (poetry show).

anuj_lathi
Databricks Employee

This ImportError happens because you have both standalone pyspark and databricks-connect installed, and they conflict with each other. databricks-connect bundles its own version of PySpark internally — when the standalone pyspark package is also present, Python imports from the wrong one, which doesn't have PythonUDFEnvironment.

Fix: Remove the standalone pyspark and only use databricks-connect:

# Remove standalone pyspark first
poetry-remove-pyspark:
    poetry remove pyspark

# Install databricks-connect (which bundles a compatible pyspark)
poetry-add-databricks-connect:
    poetry add databricks-connect@~17.3

# Verify no standalone pyspark is installed
check-deps:
    poetry show pyspark 2>&1 || echo "OK: no standalone pyspark"
    poetry show databricks-connect

Key rules to avoid this:

  1. Never install `pyspark` alongside `databricks-connect` — they conflict
  2. databricks-connect version must match your cluster DBR version (e.g., ~17.3 for DBR 17.3)
  3. After removing pyspark, clear any cached .pyc files: find . -name "*.pyc" -delete

If you need standalone PySpark for local-only testing (no Databricks), keep them in separate Poetry dependency groups and never activate both simultaneously.
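A minimal sketch of that separate-groups setup in pyproject.toml (the group name `local` and the Python constraint are illustrative; assumes Poetry >= 1.2 dependency groups):

```toml
[tool.poetry.dependencies]
python = "^3.11"               # illustrative; use your project's constraint
databricks-connect = "~17.3"   # match your cluster DBR version

# Optional group for local-only PySpark testing; not installed by default.
[tool.poetry.group.local]
optional = true

[tool.poetry.group.local.dependencies]
pyspark = ">=3.5.5,<4.0"
```

With this layout, `poetry install` pulls in only databricks-connect, while `poetry install --with local` activates the standalone PySpark group for local-only runs.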

Anuj Lathi
Solutions Engineer @ Databricks