4 weeks ago - last edited 4 weeks ago
Hello,
Suddenly our DLT pipelines were failing with:
LookupError: Traceback (most recent call last):
result_df = result_df.withColumn("input_file_path", col("_metadata.file_path")).withColumn(
^^^^^^^^^^^^^^^^^^^^^^^^^^
LookupError: <ContextVar name='parent_header'

For the failing pipelines, the Update Details -> Logs -> Configuration tab shows that they run on runtime "dlt:16.4.10-delta-pipelines-dlt-release-dp-20251009-rc0-commit-8c6b818-image-4a72116".
Did something change on the Databricks end? Nothing changed in our settings, and this looks like a sudden disruption of DLT pipelines that were previously running successfully.
Thank you in advance.
3 weeks ago
Greetings @maninegi05 , I did some digging internally and I believe some recent changes to the DLT image may be to blame. We are aware of the regression issues and are actively working to address them.
TL;DR
This specific error originates from Python’s contextvars usage in IPython/Jupyter kernels. In notebook-driven pipelines, certain libraries (logging, display hooks, pretty printers, or transitive dependencies) can attempt to access a Jupyter context that isn’t present in the DLT execution environment, and a change in the 16.4.10 image appears to have made this interaction more brittle. The symptom can show up at innocuous lines (like withColumn(col("_metadata.file_path"))) because the failure is triggered when the runtime tries to format or log dataframe expression objects, not necessarily by the Spark API itself. The above runtime-level changes and regressions match the timeframe of your disruption.
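For intuition, here is the failure mechanism in miniature. This is a hedged, self-contained sketch, not the actual kernel code; it only shows how reading an unset ContextVar produces exactly the LookupError in your traceback.

import contextvars

# IPython kernels track the current request in a ContextVar named
# 'parent_header'. Reading a ContextVar that was never set, and that has
# no default, raises LookupError with the variable's repr as the message.
parent_header = contextvars.ContextVar("parent_header")

try:
    parent_header.get()  # nothing set it, as in a DLT worker process
except LookupError as err:
    print(f"LookupError: {err}")  # -> LookupError: <ContextVar name='parent_header' ...>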
Try the following low-risk steps while the hotfix completes across regions:
If you’re on the Preview channel, switch the pipeline to the Current channel for production workloads. DLT does not let you pick an exact DBR; channel selection is the supported control surface.
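If you manage pipelines programmatically, the channel can also be flipped through the Pipelines REST API. A minimal sketch, assuming a personal access token; the host, token, and pipeline ID are placeholders, and the response shape may differ slightly across API versions:

import requests

host = "https://<workspace-host>"
token = "<personal-access-token>"
pipeline_id = "<pipeline-id>"
headers = {"Authorization": f"Bearer {token}"}

# Fetch the current settings, flip the channel, and write them back.
spec = requests.get(f"{host}/api/2.0/pipelines/{pipeline_id}", headers=headers).json()["spec"]
spec["channel"] = "CURRENT"  # "PREVIEW" is the other supported value
requests.put(f"{host}/api/2.0/pipelines/{pipeline_id}", headers=headers, json=spec)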
Replace _metadata.file_path with the built-in input_file_name() for now:
from pyspark.sql import functions as F

# input_file_name() resolves the source file path at read time
result_df = result_df.withColumn("input_file_path", F.input_file_name())
This often sidesteps the Jupyter contextvar involvement and is compatible with Auto Loader/file-based sources, even if it’s not identical to _metadata.file_path in all edge cases.
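In a DLT pipeline the workaround might look like the sketch below; the table name, source path, and file format are assumptions, not your actual pipeline, and spark is the ambient session DLT provides:

import dlt
from pyspark.sql import functions as F

@dlt.table(name="raw_events")  # placeholder table name
def raw_events():
    return (
        spark.readStream.format("cloudFiles")      # Auto Loader source
        .option("cloudFiles.format", "json")       # assumed file format
        .load("/Volumes/landing/events/")          # assumed source path
        # input_file_name() instead of col("_metadata.file_path")
        .withColumn("input_file_path", F.input_file_name())
    )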
Scan for implicit IPython/Jupyter hooks in your pipeline notebooks or shared utils (logging handlers, display hooks, or pretty printers that assume a live kernel); a defensive guard is sketched below.
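One way to make shared utilities safe in both notebooks and DLT workers is to probe for a live kernel before touching any IPython machinery. A minimal sketch, assuming your utils register display or logging hooks:

def in_ipython_kernel() -> bool:
    """Return True only when a live IPython/Jupyter kernel is present."""
    try:
        from IPython import get_ipython  # IPython may not be installed at all
    except ImportError:
        return False
    return get_ipython() is not None  # returns None outside an interactive shell

# Register kernel-dependent hooks (display formatters, pretty printers,
# rich logging handlers) only when a kernel actually exists.
if in_ipython_kernel():
    pass  # e.g., install notebook-only display hooks here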
If the failures persist, collect and share these details to expedite an engineering review: the pipeline and update IDs, the full runtime string from the Configuration tab, and the complete stack trace.
Pass explicit values (rather than None) to @dlt.table(... temporary=...) or private=...; one 16.4.10 regression specifically affected Python typing in those decorators and was hotfixed. See the sketch below.
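A minimal illustration, with a placeholder table name and body:

import dlt

# Pass concrete values, not None, for the decorator arguments hit by the
# 16.4.10 typing regression.
@dlt.table(
    name="example_table",  # placeholder name
    temporary=False,       # explicit boolean instead of None
)
def example_table():
    return spark.read.table("source_table")  # placeholder logic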
Hope this helps get you to a quick resolution.

Cheers, Louis.
3 weeks ago
Maybe there were some internal updates on the Databricks side.
Check and switch your pipeline channel: in the DLT pipeline settings (under Advanced > Channel), confirm whether it's set to "Preview". Switch to "Current" for a more stable engine version, then trigger a full refresh. This often resolves issues from preview builds.
If that's already the case, then someone from Databricks can answer.
3 weeks ago
Thanks for the reply, but it's already set to Current.