4 weeks ago - last edited 4 weeks ago
Hello,
Suddenly our DLT pipelines were failing with:
LookupError: Traceback (most recent call last):
result_df = result_df.withColumn("input_file_path", col("_metadata.file_path")).withColumn(
^^^^^^^^^^^^^^^^^^^^^^^^^^
LookupError: <ContextVar name='parent_header'

For the failing pipelines, the Update Details -> Logs -> Configuration tab shows that they run on runtime "dlt:16.4.10-delta-pipelines-dlt-release-dp-20251009-rc0-commit-8c6b818-image-4a72116".
Did something change on the Databricks end? Nothing changed in our settings, and this looks like a sudden disruption of DLT pipelines that were previously running successfully.
Thank you in advance.
3 weeks ago
Greetings @maninegi05 , I did some digging internally and I believe some recent changes to the DLT image may be to blame. We are aware of the regression issues and are actively working to address them.
TL;DR
This specific error originates from Python’s contextvars usage in IPython/Jupyter kernels. In notebook-driven pipelines, certain libraries (logging, display hooks, pretty printers, or transitive dependencies) can attempt to access a Jupyter context that isn’t present in the DLT execution environment, and a change in the 16.4.10 image appears to have made this interaction more brittle. The symptom can show up at innocuous lines (like withColumn(col("_metadata.file_path"))) because the failure is triggered when the runtime tries to format or log dataframe expression objects, not necessarily by the Spark API itself. The above runtime-level changes and regressions match the timeframe of your disruption.
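For intuition, here is the failure mechanism in miniature. This is a hedged, self-contained sketch, not the actual kernel code; it only shows how reading an unset ContextVar produces exactly the LookupError in your traceback.

import contextvars

# IPython kernels track the current request in a ContextVar named
# 'parent_header'. Reading a ContextVar that was never set, and that has
# no default, raises LookupError with the variable's repr as the message.
parent_header = contextvars.ContextVar("parent_header")

try:
    parent_header.get()  # nothing set it, as in a DLT worker process
except LookupError as err:
    print(f"LookupError: {err}")  # -> LookupError: <ContextVar name='parent_header' ...>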
Try the following low-risk steps while the hotfix completes across regions:
If you’re on the Preview channel, switch the pipeline to the Current channel for production workloads. DLT does not let you pick an exact DBR; channel selection is the supported control surface.
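If you manage pipelines programmatically, the channel can also be flipped through the Pipelines REST API. A minimal sketch, assuming a personal access token; the host, token, and pipeline ID are placeholders, and the response shape may differ slightly across API versions:

import requests

host = "https://<workspace-host>"
token = "<personal-access-token>"
pipeline_id = "<pipeline-id>"
headers = {"Authorization": f"Bearer {token}"}

# Fetch the current settings, flip the channel, and write them back.
spec = requests.get(f"{host}/api/2.0/pipelines/{pipeline_id}", headers=headers).json()["spec"]
spec["channel"] = "CURRENT"  # "PREVIEW" is the other supported value
requests.put(f"{host}/api/2.0/pipelines/{pipeline_id}", headers=headers, json=spec)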
Replace _metadata.file_path with the built-in input_file_name() for now:
from pyspark.sql import functions as F

# input_file_name() resolves the source file path at read time
result_df = result_df.withColumn("input_file_path", F.input_file_name())
This often sidesteps the Jupyter contextvar involvement and is compatible with Auto Loader/file-based sources, even if it’s not identical to _metadata.file_path in all edge cases.
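In a DLT pipeline the workaround might look like the sketch below; the table name, source path, and file format are assumptions, not your actual pipeline, and spark is the ambient session DLT provides:

import dlt
from pyspark.sql import functions as F

@dlt.table(name="raw_events")  # placeholder table name
def raw_events():
    return (
        spark.readStream.format("cloudFiles")      # Auto Loader source
        .option("cloudFiles.format", "json")       # assumed file format
        .load("/Volumes/landing/events/")          # assumed source path
        # input_file_name() instead of col("_metadata.file_path")
        .withColumn("input_file_path", F.input_file_name())
    )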
Scan for implicit IPython/Jupyter hooks in your pipeline notebooks or shared utils (logging handlers, display hooks, or pretty printers that assume a live kernel); a defensive guard is sketched below.
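One way to make shared utilities safe in both notebooks and DLT workers is to probe for a live kernel before touching any IPython machinery. A minimal sketch, assuming your utils register display or logging hooks:

def in_ipython_kernel() -> bool:
    """Return True only when a live IPython/Jupyter kernel is present."""
    try:
        from IPython import get_ipython  # IPython may not be installed at all
    except ImportError:
        return False
    return get_ipython() is not None  # returns None outside an interactive shell

# Register kernel-dependent hooks (display formatters, pretty printers,
# rich logging handlers) only when a kernel actually exists.
if in_ipython_kernel():
    pass  # e.g., install notebook-only display hooks here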
If the failures persist, collect and share these details to expedite an engineering review: the pipeline and update IDs, the full runtime string from the Configuration tab, and the complete stack trace.
Pass explicit values (rather than None) to @dlt.table(... temporary=...) or private=...; one 16.4.10 regression specifically affected Python typing in those decorators and was hotfixed. See the sketch below.
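A minimal illustration, with a placeholder table name and body:

import dlt

# Pass concrete values, not None, for the decorator arguments hit by the
# 16.4.10 typing regression.
@dlt.table(
    name="example_table",  # placeholder name
    temporary=False,       # explicit boolean instead of None
)
def example_table():
    return spark.read.table("source_table")  # placeholder logic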
Hope this helps get you to a quick resolution.

Cheers, Louis.
3 weeks ago
Maybe there were some internal updates on the Databricks side.
Check and switch your pipeline channel: in the DLT pipeline settings (under Advanced > Channel), confirm whether it's set to "Preview". Switch to "Current" for a more stable engine version, then trigger a full refresh. This often resolves issues from preview builds.
If that's already the case, then someone from Databricks can answer.
3 weeks ago
Thanks for the reply, but it's already set to Current.