Declarative Pipeline error: Name 'kdf' is not defined. Did you mean: 'sdf'

liquibricks
Contributor

We have a Lakeflow Spark Declarative Pipeline using the new PySpark Pipelines API. It was working fine until about 7am (Central European Time) this morning, when the pipeline started failing with PYTHON.NAME_ERROR: name 'kdf' is not defined. Did you mean: 'sdf'?

The code has not changed and nothing in our infrastructure has changed, so I'm not sure why this has suddenly started happening. I've raised a support ticket with Azure (we're on Azure Databricks), but they are slow, so I'm shouting into the void here. Any suggestions?  😅

 
Usage:
 
      from pyspark import pipelines as dp
      ...

      @dp.table(name=f"{mycatalog}.{myschema}.{mytable}")
      def create_streaming_table():
          return (
              ...
          )
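
For reference, here's a slightly fuller sketch of the pattern we're using (the catalog, schema, and source table names below are made up for illustration, and `spark` is provided by the pipeline runtime rather than created in the source file):

      from pyspark import pipelines as dp
      from pyspark.sql import functions as F

      # Hypothetical names standing in for our real ones
      mycatalog, myschema, mytable = "main", "demo", "events"

      @dp.table(name=f"{mycatalog}.{myschema}.{mytable}")
      def create_streaming_table():
          # Stream from an existing raw table (name is made up) and stamp each row
          return (
              spark.readStream.table("main.demo.raw_events")
                  .withColumn("ingested_at", F.current_timestamp())
          )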
9 Replies

liquibricks
Contributor

The Azure region (Norway East) might have something to do with it. An example pipeline works fine in Sweden Central.

liquibricks
Contributor

Another clue: the SQL API seems to work, while a Python-based pipeline still fails with this error.

zkaliszamisza
New Contributor

Hi, we have the same issue - could you share the JSON with the detailed error?
Does it by any chance mention an error on line 79?

liquibricks
Contributor

I'm not sure if I can share the complete JSON. For us, the JSON error refers to line 99, which is a blank line in the source file it points to.

zkaliszamisza
New Contributor

For us it happened in westeurope around the same time.

liquibricks
Contributor

Is it still occurring for you today (+24 hours after the start)? It is for us.

zkaliszamisza
New Contributor

It is - we have raised a ticket with Databricks.

liquibricks
Contributor

It turns out this problem was caused by a package that was pip-installed via an init script. For some reason this package had started pulling in pandas 3.x (even though the package itself had not been updated), and our Databricks contact informed us that DLT does not support pandas 3 at this time (at least until the next DBR release). Once we pinned the pandas version to < 3, the pipelines started working again.

No idea why pandas 3 suddenly started being pulled in, nor why the error message is so cryptic, but I'm glad we found a solution in the end!
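
For anyone else hitting this: we pinned in the init script with pip install "pandas<3". If you'd rather fail fast with a readable message than the cryptic NAME_ERROR, a guard like this at the top of the pipeline source works (purely illustrative on our part, not a Databricks feature):

      import pandas as pd

      # Abort early if pandas 3.x sneaks back in, since DLT doesn't support it yet
      if int(pd.__version__.split(".")[0]) >= 3:
          raise RuntimeError(
              f"pandas {pd.__version__} detected; pin pandas<3 until DLT supports pandas 3"
          )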

yassine_eal
Visitor

 

Hi, I ran into the same error starting late last week.

The issue was caused by dependency version conflicts between my custom package and the libraries preinstalled on the Databricks serverless environment.

I fixed it by pinning all of my package's dependencies to the versions already provided by the serverless environment.
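
In case it helps, this is roughly how I listed what the environment already ships so I could pin against it (the package names below are just examples; substitute your own dependencies):

      from importlib import metadata

      # Print pinned-style requirements for the versions preinstalled in the
      # environment, so your package's dependencies can match them exactly.
      for name in ("pandas", "numpy", "pyarrow"):
          try:
              print(f"{name}=={metadata.version(name)}")
          except metadata.PackageNotFoundError:
              print(f"# {name} is not installed")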