07-15-2025 02:11 AM
We have a job submitted through the Spark Connect API and running on Serverless compute.
The job got canceled twice and left a total of 14 queries orphaned. They are in a weird state: the running time is not increasing, but they still show up as running.
There is no UI for Serverless compute, and the Spark UI is not available either, since the compute is managed by Databricks. The API for cancelling the queries returns an empty response, which is apparently expected, but the queries are still there in a running state.
Any way to cancel these queries? There is no cancel button in the UI either.
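For reference, a cancellation attempt along the lines described above could look like the following. This is only a sketch, assuming the SQL Statement Execution API's cancel endpoint was the one used (the thread does not say which API was called); the workspace URL, token, and statement ID are placeholders.

```python
import urllib.request

def cancel_url(workspace_url: str, statement_id: str) -> str:
    # Builds the cancel endpoint path for a given statement ID.
    return f"{workspace_url}/api/2.0/sql/statements/{statement_id}/cancel"

def cancel_statement(workspace_url: str, token: str, statement_id: str) -> int:
    """POST to the cancel endpoint; a successful call returns an empty body."""
    req = urllib.request.Request(
        cancel_url(workspace_url, statement_id),
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # empty response body is expected on success
```

An empty response here only means the cancel request was accepted, which matches the behavior described above; it does not guarantee the query actually left the running state.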
07-15-2025 02:37 AM
07-15-2025 02:52 AM
Hey @Khaja_Zaffer, I appreciate your reply, but cancelling the query does not work. This is Serverless compute, so once a new session is created, it can't communicate with the old one anymore.
07-15-2025 02:38 AM
You can configure a timeout for your Spark queries by setting the spark.databricks.queryWatchdog.timeoutInSeconds configuration property. This will automatically terminate any query that exceeds the specified execution time, preventing it from becoming a long-running orphan.
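A minimal sketch of applying that suggestion on an existing session. The property name is taken from the reply above, and the 9000-second value is just an example; this assumes a `SparkSession` named `spark` already exists (e.g. via Spark Connect), so it is a config fragment rather than a standalone script.

```python
# Assumes an existing SparkSession named `spark`.
# Property name as suggested above; 9000 seconds (2.5h) is an example value.
spark.conf.set("spark.databricks.queryWatchdog.timeoutInSeconds", "9000")
```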
07-15-2025 02:54 AM
We do have timeouts, and there are default timeouts as well. The issue is not that the query is running longer than that timeout, but that it is stuck in this weird state where it shows as running without any updates to its running-time metrics or anything like that.
07-15-2025 02:56 AM
I think we need to check the internals on this issue.
Better to create a ticket with Databricks.
Please raise the ticket using this link: https://help.databricks.com/s/contact-us?ReqType=training. Please explain the issue clearly so that it will be easy for the support team to help.
07-15-2025 02:59 AM
I already did, and support redirected me here. The ticket I opened is 00699724.
07-15-2025 03:18 AM
Just asking: are you using the Azure cloud?
07-15-2025 03:26 AM
Nope, AWS.
07-15-2025 03:21 AM
Also, did you make any recent code or network changes?
07-15-2025 03:28 AM
The only change was to increase "spark.databricks.execution.timeout", because the query unfortunately needed more than 2.5 hours.