Re: Cannot import pyspark.pipelines module

Saf4Databricks · 10-17-2025

Question: What could be causing the following error in my Databricks notebook, and how can I fix it? I'm using the latest Free Edition of Databricks, which has runtime version 17.2 and PySpark version 4.0.0.

Error:
ImportError: cannot import name 'pipelines' from 'pyspark' (/databricks/python/lib/python3.12/site-packages/pyspark/__init__.py)

The following line, at the top of the notebook, throws the error:

from pyspark import pipelines as dp

NOTE: According to the following quote from "Basics of Python for pipeline development" by the Databricks team, we need to import this module to create Lakeflow Declarative Pipelines using Python:

All Lakeflow Declarative Pipelines Python APIs are implemented in the `pyspark.pipelines` module.

Also, as we know, PySpark is an integral part of the Databricks platform and its primary programming interface. So what am I missing that causes this error?
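One quick way to narrow this down is to check whether `pyspark.pipelines` is visible to the notebook's Python environment at all. The sketch below uses only the standard library; `module_available` is a hypothetical helper name, not a Databricks API. The expectation in the comment is based only on the error reported in this thread and is not a documented guarantee.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be imported in the current Python environment.

    find_spec() imports the parent packages of a dotted name, so a missing
    parent raises ModuleNotFoundError (a subclass of ImportError). Treat that
    as "not available" instead of letting it propagate.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    # Hypothetical check: given the ImportError above, this would presumably
    # print False when run directly in a standalone notebook on that runtime.
    print("pyspark.pipelines importable:", module_available("pyspark.pipelines"))
```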

dkushari · 10-18-2025

Hi @Saf4Databricks - Are you trying to use it from a standalone Databricks notebook? You should only use it from within a Lakeflow Declarative Pipeline (LDP). The link you shared is about LDP. Here is an example where I used it.

Saf4Databricks · 10-18-2025

Hi @dkushari, thank you for responding. I'm working through the following tutorial from your team: What is change data capture (CDC)? | Databricks on AWS. The code in step 1 runs fine, but the code in step 2 fails at this line at the top:

from pyspark import pipelines as dp

What would you suggest I do to make this tutorial from your Databricks team work?

dkushari · 10-18-2025

Hi @Saf4Databricks - Yes, this code is for LDP only, as you can see here in the heading. And as you can see here, points 1 and 2 are where you create the notebook and copy the code into it, but you do not run it. In point 3 you create the pipeline, and then you can run it as I showed you in my previous post.
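If the step 2 notebook is still occasionally run by hand, one option is to wrap the import so that it fails with a message pointing at the pipeline workflow instead of only reporting a missing name. This is a minimal sketch, not a Databricks-recommended pattern: `import_pipeline_module` is a hypothetical helper, the error text is illustrative, and `importlib.import_module("pyspark.pipelines")` is assumed to be a close stand-in for `from pyspark import pipelines as dp`.

```python
import importlib
from types import ModuleType


def import_pipeline_module(name: str = "pyspark.pipelines") -> ModuleType:
    """Import the pipelines module, failing with an explanatory message.

    Roughly equivalent to `from pyspark import pipelines as dp`, but the
    re-raised ImportError explains what to do instead of running the cells.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:  # also catches ModuleNotFoundError
        raise ImportError(
            f"Could not import {name!r}. This notebook is meant to be a source "
            "file for a Lakeflow Declarative Pipeline: create a pipeline, add "
            "this notebook to it, and run the pipeline instead of running the "
            "cells directly."
        ) from exc


# Hypothetical usage at the top of the pipeline source notebook:
# dp = import_pipeline_module()
```

The original exception is chained with `from exc`, so the underlying import error is still visible in the traceback.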
