
Problems with cluster shutdown in DLT

LucasAntoniolli
Visitor

[Issue] DLT finishes processing, but cluster remains active due to log write error

Hi everyone, I'm running into a problem with my DLT pipeline and was hoping someone here could help or has experienced something similar.

Problem Description

The pipeline completes data processing successfully, but the cluster stays active for a long time, even though no data is being processed anymore.

After checking the Driver Logs, I noticed that the system keeps trying to write execution logs and cluster information, but encounters an error each time. As a result, it retries every minute and ends up stuck in this loop.

Error Snippet 

09/25/12 11:13:57 ERROR NativeADLGen2RequestComparisonHandler: Error in request comparison
java.lang.NumberFormatException: For input string: "Fri, 12 Sep 2025 11:13:58 GMT"
at java.base/java.lang.Long.parseLong(Long.java:711)
...
at com.databricks.sql.io.NativeADLGen2RequestComparisonHandler.doHandle(NativeADLGen2RequestComparisonHandler.scala:94)

It seems that when DLT tries to write to its own event log, it first attempts to read the current log state (e.g., Loading version 306944). The bug appears during this read operation, where it throws a NumberFormatException when parsing a timestamp.
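For illustration only, here is a minimal Python analogue of the failure in the stack trace. The real code is JVM code inside Databricks and its source is not shown in this thread, so the function names below are hypothetical. The sketch only shows why an HTTP-style date string cannot be parsed as an integer, and that the same string parses cleanly as a date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Value copied from the NumberFormatException message in the driver log.
HEADER_VALUE = "Fri, 12 Sep 2025 11:13:58 GMT"


def parse_as_long(value: str) -> int:
    """Hypothetical analogue of Long.parseLong: accepts digits only."""
    return int(value)


def parse_as_http_date(value: str) -> datetime:
    """Parse an HTTP-style (RFC 1123) date string into a timezone-aware datetime."""
    return parsedate_to_datetime(value)


try:
    parse_as_long(HEADER_VALUE)
    failed = False
except ValueError as exc:  # Python's counterpart to NumberFormatException
    failed = True
    print(f"integer parse failed: {exc}")

parsed = parse_as_http_date(HEADER_VALUE)
print(parsed.isoformat())
```

If this analogy holds, the handler is receiving a date header where it expects a numeric value, which would point to a client-side bug rather than anything in the pipeline code.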

Observations

  • The error does not crash the pipeline, but it seems to trigger a retry mechanism.

  • This leads to a loop (try to read → fail → wait → try again) that keeps the cluster alive unnecessarily.
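To confirm the retry cadence from a downloaded driver log, something like the following sketch may help. It assumes the log line layout shown in the error snippet and that all matching lines fall on the same day. The sample log is hypothetical: the first line is taken from the post and the other two are made up for the example.

```python
import re
from datetime import datetime

# Matches the ERROR line layout shown in the error snippet.
ERROR_LINE = re.compile(
    r"^\S+ (\d{2}:\d{2}:\d{2}) ERROR NativeADLGen2RequestComparisonHandler"
)


def error_gaps_seconds(log_text: str) -> list[int]:
    """Return the gaps, in seconds, between consecutive matching ERROR lines.

    Assumes every line falls on the same day, so only the time of day is compared.
    """
    times = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# Hypothetical sample: only the first line comes from the original post.
sample_log = """\
09/25/12 11:13:57 ERROR NativeADLGen2RequestComparisonHandler: Error in request comparison
09/25/12 11:14:57 ERROR NativeADLGen2RequestComparisonHandler: Error in request comparison
09/25/12 11:15:58 ERROR NativeADLGen2RequestComparisonHandler: Error in request comparison
"""

gaps = error_gaps_seconds(sample_log)
print(gaps)
```

Gaps that stay close to 60 seconds long after the pipeline reports completion would be consistent with the retry loop described above.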

Question

Has anyone else faced this issue? Any idea how to work around it or resolve it?

Thanks in advance!

3 REPLIES

nayan_wylde
Honored Contributor II

Here are some quick workarounds you can try:

1. Development mode keeps the cluster warm for rapid iteration, while production mode stops the cluster right after the run finishes. Switching to production mode is the simplest way to save cost. If you must stay in development mode, lower pipelines.clusterShutdown.delay so the cluster doesn’t linger.

2. In the driver logs, you’ll see the NumberFormatException repeating roughly every minute even after the pipeline reports “completed”. That’s the smoking gun. If you’re on a recent DBR (e.g., 15.x/16.x), try pinning the pipeline to DBR 14.3 LTS or, conversely, to the latest LTS, to see whether the ADLS client code path differs.
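As a rough sketch of the first workaround, the fragment below builds the relevant part of a pipeline settings JSON in Python. The "60s" delay is only an example value, and the fragment is hypothetical in that it omits everything else a real pipeline definition needs. Check the field names against the pipeline settings documentation for your workspace before merging it.

```python
import json

# Hypothetical settings fragment; merge it into your own pipeline JSON.
settings_patch = {
    # False selects production mode, which stops the cluster after the run.
    "development": False,
    "configuration": {
        # Only matters if you stay in development mode; "60s" is an example value.
        "pipelines.clusterShutdown.delay": "60s",
    },
}

print(json.dumps(settings_patch, indent=2))
```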

I tried going back to 15.4 LTS, where the problem did not occur (the current version is 16.4), but the pipeline does not allow pinning to an earlier version. I tried to force it with a cluster policy, but the pipeline automatically pulls the latest version. The pipeline's Channel setting offers only two options, Current and Preview, so the version I set in the policy is ignored. I also tried specifying the LEGACY runtime in the JSON, but DLT no longer accepts that parameter.

nayan_wylde
Honored Contributor II

Can you please try one more option? If you’re on Preview, move to Current (or vice versa). Sometimes the regression only exists in one channel.
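A small sketch of that switch, assuming you edit the pipeline settings as JSON and that the channel is stored under a "channel" key with the values "CURRENT" and "PREVIEW". The helper name and the example settings are hypothetical.

```python
import copy


def switch_channel(settings: dict) -> dict:
    """Return a copy of the settings with the channel flipped.

    Treats a missing "channel" key as "CURRENT" (an assumption for this sketch).
    """
    current = str(settings.get("channel", "CURRENT")).upper()
    updated = copy.deepcopy(settings)
    updated["channel"] = "PREVIEW" if current == "CURRENT" else "CURRENT"
    return updated


# Hypothetical pipeline settings used only for this example.
example = {"name": "my_pipeline", "channel": "CURRENT"}
switched = switch_channel(example)
print(switched["channel"])
```

The helper returns a copy rather than mutating its input, so the original settings remain available if you need to switch back.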