Thursday
We have a Lakeflow Spark Declarative Pipeline using the new PySpark Pipelines API. It was working fine until about 7am (Central European Time) this morning, when the pipeline started failing with: PYTHON.NAME_ERROR: name 'kdf' is not defined. Did you mean: 'sdf'?
The code has not changed and nothing in our infrastructure has changed, so I'm not sure why this has suddenly started happening. I've raised a support ticket with Azure (we're on Azure Databricks), but they are slow, so I'm shouting into the void here. Any suggestions? 😅
Thursday
The Azure region (Norway East) might have something to do with it. An example pipeline works fine in Sweden Central.
Thursday
Another clue: the SQL API seems to work, while a Python-based pipeline still fails with this error.
Thursday
Hi, we have the same issue - could you share the JSON with the detailed error?
Does it by any chance mention an error on line 79?
Thursday
I'm not sure if I can share the complete JSON. For us the error JSON refers to line 99, which is a blank line in the source file it points to.
Thursday
For us it happened in westeurope around the same time
Thursday
Is it still occurring for you today (+24 hours after the start)? It is for us.
Friday
It is - we have raised a ticket with Databricks
yesterday
It turns out this problem was caused by a package that was pip-installed via an init script. For some reason that package had started pulling in pandas 3.x (even though the package itself had not been updated), and our Databricks contact informed us that DLT does not support pandas 3 at this time (at least until the next DBR release). Once we pinned the pandas version to < 3, the pipelines started working again.
No idea why pandas 3 suddenly started being picked up, nor why the error message is so cryptic, but glad we found a solution in the end!
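In case it helps anyone else, this is roughly what our fix looked like. It's a minimal sketch of a classic (non-serverless) cluster init script - the package name is a placeholder, and the pip path assumes the standard Databricks cluster Python environment:

```bash
#!/bin/bash
# Cluster init script (sketch; package name is a placeholder).
# Installing the package and the pandas pin in the same pip invocation
# stops pip's resolver from pulling pandas 3.x in as a transitive dependency.
/databricks/python/bin/pip install "pandas<3.0" "our-internal-package"
```

The important part is just that the pin is applied wherever the package is installed, otherwise the same transitive resolution happens again on the next cluster start.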
15 hours ago
Hi, I ran into the same error starting late last week.
The issue was caused by dependency version conflicts between my custom package and the libraries preinstalled on the Databricks serverless environment.
I fixed it by pinning all of my package's dependencies to the versions already provided by the serverless environment.
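If anyone needs to find out what versions the serverless environment already provides, here is a quick sketch (plain Python, so it also runs on serverless compute; the package list is just an example):

```python
# Print the versions the environment already provides, so a custom package's
# dependencies can be pinned to match them.
import importlib.metadata as metadata

for pkg in ["pandas", "numpy", "pyarrow"]:  # example packages only
    try:
        print(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg} is not installed")
```

The release notes for the serverless environment version should also list the preinstalled library versions, which lets you pin without running anything.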