Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Cannot import pyspark.pipelines module

Saf4Databricks
New Contributor III

Question: What could be causing the following error in my Databricks notebook code, and how can I fix it? I'm using the latest Free Edition of Databricks, which has runtime version 17.2 and PySpark version 4.0.0.

Error:
ImportError: cannot import name 'pipelines' from 'pyspark' (/databricks/python/lib/python3.12/site-packages/pyspark/__init__.py)

The following is the top line of the Databricks notebook that throws the error:

from pyspark import pipelines as dp

NOTE: According to the following quote from Basics of Python for pipeline development from the Databricks team, we need to import the above module to create Lakeflow Declarative Pipelines using Python:

All Lakeflow Declarative Pipelines Python APIs are implemented in the `pyspark.pipelines` module.

Also, as we know, PySpark is an integral and primary programming interface within the Databricks platform. So, what might I be missing here that causes the error?
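For reference, a minimal cell like the one below (nothing else imported or configured) reproduces what I'm seeing on this runtime:

# Minimal reproduction in a fresh notebook cell.
try:
    from pyspark import pipelines as dp
    print("pyspark.pipelines imported:", dp)
except ImportError as e:
    # On runtime 17.2 with PySpark 4.0.0, this branch is hit in a regular notebook.
    print("Import failed:", e)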

1 ACCEPTED SOLUTION

Accepted Solutions

Hi @Saf4Databricks - Yes, this is for LDP only, as you can see in the heading there. Points 1 and 2 are where you create and copy the notebook, but you do not run it. In point 3 you create the pipeline, and then you can run it as I showed in my previous post.

(screenshots of the tutorial steps)

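If you prefer to script point 3 rather than clicking through the UI, a rough sketch with the Databricks Python SDK looks like the following - the notebook path, catalog, and schema are placeholders, and parameter names can vary slightly between SDK versions, so treat it only as a starting point:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.pipelines import NotebookLibrary, PipelineLibrary

w = WorkspaceClient()  # picks up your workspace authentication

# Point the pipeline at the notebook you created and copied in points 1 and 2.
pipeline = w.pipelines.create(
    name="cdc-tutorial-pipeline",
    catalog="main",       # placeholder catalog
    target="cdc_demo",    # placeholder target schema
    libraries=[PipelineLibrary(notebook=NotebookLibrary(path="/Users/you@example.com/cdc_step2"))],
)

# Point 3: run the pipeline - this is the context where pyspark.pipelines is available.
w.pipelines.start_update(pipeline_id=pipeline.pipeline_id)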

3 REPLIES

dkushari
Databricks Employee

Hi @Saf4Databricks - Are you trying to use it from a standalone Databricks notebook? You should only use it from within a Lakeflow Declarative Pipeline (LDP). The link you shared is about LDP. Here is an example where I used it.
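Roughly, the source notebook for a pipeline looks like the sketch below - the dataset name and source table are just placeholders, and @dp.table is the decorator described in the Python pipeline development docs:

from pyspark import pipelines as dp

# Placeholder dataset definition. The pipeline runtime resolves the decorator and
# materializes the table when the pipeline runs, not when a notebook cell runs.
@dp.table
def customers_clean():
    # `spark` is the session Databricks provides to pipeline source code.
    return spark.read.table("samples.tpch.customer")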

(screenshots of the example pipeline)

Hi @dkushari, thank you for responding. I'm working through the following tutorial from your team: What is change data capture (CDC)? | Databricks on AWS. The code in step 1 runs fine, but the code in step 2 fails at the following line at the top:

from pyspark import pipelines as dp

What would you suggest I do to make this tutorial from your Databricks team work?
