01-22-2026 06:27 AM
We have a Lakeflow Spark Declarative Pipeline using the new PySpark Pipelines API. This was working fine until about 7 am (Central European Time) this morning, when the pipeline started failing with a PYTHON.NAME_ERROR: "name 'kdf' is not defined. Did you mean: 'sdf'?"
The code has not changed and nothing in our infrastructure has changed, so I'm not sure why this has suddenly started happening. I've raised a support ticket with Azure (we're on Azure Databricks), but they are slow, so I'm shouting into the void here. Any suggestions? 😅
01-22-2026 06:30 AM
The Azure region (Norway East) might have something to do with it. An example pipeline works fine in Sweden Central.
01-22-2026 07:09 AM
Another clue: the SQL API seems to work, while a Python-based pipeline still fails with this error.
01-22-2026 10:46 AM
Hi, we have the same issue - could you share the JSON with the detailed error?
Does it by any chance mention an error on line 79?
01-22-2026 01:06 PM
I'm not sure if I can share the complete JSON. For us, the JSON error refers to line 99, which is a blank line in the src file it points to.
01-22-2026 10:47 AM
For us it happened in West Europe (westeurope) around the same time.
01-22-2026 11:07 PM
Is it still occurring for you today (+24 hours after the start)? It is for us.
01-23-2026 02:07 AM
It is - we have raised a ticket with Databricks.
01-26-2026 05:45 AM
It turns out this problem was caused by a package that was pip installed using an init script. This package had, for some reason, started pulling in pandas 3.x (even though the package itself had not been updated), and our Databricks contact informed us that DLT does not support pandas 3 at this time (at least until the next DBR release). Once we pinned the pandas version to < 3, the pipelines started working again.
No idea why pandas 3 was suddenly chosen, nor why the error message is so cryptic, but glad we found a solution in the end!
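For anyone else hitting this, here is a minimal sanity-check sketch (the function name is mine, not from any Databricks API) that you could drop at the top of a pipeline notebook so the incompatibility fails fast with a clear message instead of surfacing as the cryptic NAME_ERROR:

```python
def pandas_is_supported(ver: str) -> bool:
    """Return True when the pandas version is below 3.0 (DLT-compatible)."""
    return int(ver.split(".")[0]) < 3

# In a pipeline notebook you would pass pandas.__version__ here:
assert pandas_is_supported("2.3.3")       # OK on pandas 2.x
assert not pandas_is_supported("3.0.0")   # would raise a clear error first
```

In practice you would call it as `pandas_is_supported(pandas.__version__)` and raise a descriptive RuntimeError when it returns False.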
01-27-2026 01:13 AM
Hi, I ran into the same error starting late last week.
The issue was caused by dependency version conflicts between my custom package and the libraries preinstalled on the Databricks serverless environment.
I fixed it by pinning all my package dependencies to the versions already provided by the serverless cluster, which resolved the error.
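To find the versions to pin, one option is a small helper like this sketch (names and the example dependency list are mine): run it on the serverless environment and it prints `==` pins for whatever is actually installed there, which you can paste into your package's dependency list.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_line(name: str) -> str:
    """Emit a '==' pin for an installed distribution, or a comment if absent."""
    try:
        return f'"{name}=={version(name)}",'
    except PackageNotFoundError:
        return f"# {name}: not installed in this environment"


# Hypothetical direct dependencies; replace with your own list.
for dep in ["pandas", "openpyxl", "tenacity"]:
    print(pin_line(dep))
```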
01-28-2026 10:25 AM
Thank you @yassine_eal , it was the same thing here. I haven't yet analyzed which dependency is breaking the DLT pipeline, but here are my dependencies in case it helps anyone.
dependencies = [
"azure-identity>=1.25.1",
"httpx>=0.27.0",
"msal>=1.34.0",
"office365-rest-python-client>=2.6.2",
"openpyxl>=3.1.5",
"pandas>=2.3.3",
"tenacity>=9.0.0",
]
01-29-2026 02:37 AM
@dhpaulino_ you need to add an upper bound to your pandas dependency to ensure that v3 is not installed:
"pandas>=2.3.3,<3"