mark_ott
Databricks Employee

The error message you are observing in your DLT pipeline logs, specifically:

java.lang.NumberFormatException: For input string: "Fri, 29 Aug 2025 09:02:07 GMT"

suggests that something in your pipeline, most likely the library or code responsible for Azure Data Lake Gen2 (ADL Gen2) operations, is trying to parse a date string as a numeric value (such as an epoch timestamp) and failing.

Root Cause

  • The error originates from the NativeADLGen2RequestComparisonHandler, which is likely part of the Databricks/Spark library that talks to Azure Data Lake Gen2.

  • The handler expects a numeric value (typically a Unix timestamp such as 1693296000), but it is receiving a formatted date string such as "Fri, 29 Aug 2025 09:02:07 GMT", the HTTP-date format used in headers like Last-Modified.
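
To make the mismatch concrete, here is a minimal Python sketch of the same class of failure. It is only an analogue: the handler itself is JVM code whose source is not shown here, and the variable names are illustrative. Parsing the string from the error message as an integer fails, while parsing it as an HTTP-style date succeeds and yields the epoch value a numeric-only code path would have expected.

```python
from email.utils import parsedate_to_datetime

# The exact value from the error message.
header_value = "Fri, 29 Aug 2025 09:02:07 GMT"

# Numeric parsing fails; this is the Python counterpart of the
# java.lang.NumberFormatException seen in the logs.
try:
    int(header_value)
    numeric_parse_failed = False
except ValueError:
    numeric_parse_failed = True

# Parsing the same value as an HTTP-style date works and gives epoch seconds.
parsed = parsedate_to_datetime(header_value)
epoch_seconds = int(parsed.timestamp())

print(numeric_parse_failed)  # True
print(epoch_seconds)         # 1756458127
```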

Why is this happening now?

  • Library Update or Backend Change: The format of the value returned (or logged) may have changed either due to a code/library update or a backend change on Microsoft/Azure's side.

  • Misconfigured Pipeline or Upstream Data Issue: If a step in your pipeline changes a timestamp format or passes metadata with an unexpected type, it can also cause this type of error.

  • External API/Response Change: If ADL Gen2 or some middleware changed how it formats headers or metadata (for instance, Last-Modified or similar fields), this could result in the current code being unable to handle the new format.

Why execution is unaffected

  • The handler appears to be used for debugging, logging, or non-essential request validation. It catches and logs the error but does not propagate it or halt processing.

  • The error may fire on each streaming trigger or update cycle, which would explain the high frequency.

How to Fix or Mitigate

Immediate Workarounds:

  • Since the error doesn't break functionality, you can keep running the pipeline as-is, though the frequent messages can obscure real issues and fill up logs quickly.

  • If possible, reduce the log level for this handler in your log4j configuration to avoid clutter in your logs.
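
Until the log level is changed, one way to keep the noise from hiding real issues is to filter it out when reviewing exported logs. This is not the log4j change itself, just a rough sketch of a post-hoc filter. The function name `split_noise`, the regular expression, and the placeholder sample line are assumptions made for illustration; adjust the pattern to match your actual log lines.

```python
import re

# Matches the known, non-fatal message: a NumberFormatException whose input
# looks like an HTTP-style date ("Fri, 29 Aug 2025 ...").
NOISE_PATTERN = re.compile(
    r'NumberFormatException: For input string: "[A-Z][a-z]{2}, \d{2} [A-Z][a-z]{2} \d{4} '
)

def split_noise(lines):
    """Return (kept_lines, noise_count) for an iterable of log lines."""
    kept, noise_count = [], 0
    for line in lines:
        if NOISE_PATTERN.search(line):
            noise_count += 1
        else:
            kept.append(line)
    return kept, noise_count

# Sample input: the message from this thread plus one made-up placeholder line.
sample_lines = [
    'java.lang.NumberFormatException: For input string: "Fri, 29 Aug 2025 09:02:07 GMT"',
    "placeholder: some unrelated log line",
    'java.lang.NumberFormatException: For input string: "Fri, 29 Aug 2025 09:02:07 GMT"',
]

kept, noise_count = split_noise(sample_lines)
print(noise_count)  # 2
print(kept)         # ['placeholder: some unrelated log line']
```

Counting the filtered lines instead of discarding them silently lets you still see whether the frequency changes after a library or runtime upgrade.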

Long-term Solutions:

  • Check for library updates: Make sure your Databricks, Spark, or any custom connector libraries for ADL Gen2 are up to date. Recent versions may have patched this issue if it’s a known bug.

  • Raise a support ticket: If you are using a managed service like Databricks, raise a ticket with their support team, quoting the handler name and the full error message. They may know of recent changes.

  • Check pipeline config and metadata: Make sure that all fields, especially those involving timestamps or modification dates, are passed in the correct expected format.

  • Review release notes for Spark, Databricks Runtime, and Azure ADLS SDKs for any breaking changes related to date/time handling in the past few months.

Additional Notes

  • If you're using custom code/logic for ADLS file interactions, audit any places where you serialize or deserialize timestamps.

  • If this only happens after certain DLT operations, consider temporarily disabling streaming tasks or checkpointing to see if the error stops.
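
If you do find custom code that deserializes timestamps, a tolerant parser is one way to avoid this class of error. The sketch below assumes the value arrives either as epoch seconds or as an HTTP-style date string. The helper name `to_epoch_seconds` is hypothetical and is not part of any Databricks or Azure library.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

def to_epoch_seconds(value):
    """Hypothetical helper: accept epoch seconds or an HTTP-style date string."""
    if isinstance(value, (int, float)):
        return int(value)
    text = str(value).strip()
    try:
        # Numeric strings such as "1693296000" are treated as epoch seconds.
        return int(text)
    except ValueError:
        pass
    # Fall back to an HTTP-style date such as "Fri, 29 Aug 2025 09:02:07 GMT".
    parsed = parsedate_to_datetime(text)
    if parsed.tzinfo is None:
        # Assumption: values without a usable zone are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())

print(to_epoch_seconds("1693296000"))                     # 1693296000
print(to_epoch_seconds("Fri, 29 Aug 2025 09:02:07 GMT"))  # 1756458127
```

Input that matches neither form still raises an exception, which is usually preferable to silently storing a wrong timestamp.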

This class of error commonly appears when the serialization or deserialization of metadata fields changes across cloud storage SDK versions. Keeping versions compatible and reporting the issue to your cloud provider can help resolve it at the root if it is a backend or SDK bug.