11-04-2022 12:54 PM
Hello,
I'm trying to use Databricks on Azure with a Spark Structured Streaming job and am having a very mysterious issue.
I boiled the job down to its basics for testing: reading from a Kafka topic and writing to the console in a foreachBatch.
On local, everything works fine indefinitely.
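For reference, the boiled-down version looks roughly like this (broker, topic, and checkpoint path are placeholders, not my real values):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().getOrCreate()

// Read from a Kafka topic (placeholder broker/topic names)
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "my-topic")
  .load()

// Write each micro-batch to the console via foreachBatch
val query = df.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    println(s"batch $batchId: ${batch.count()} rows")
    batch.show(truncate = false)
  }
  .option("checkpointLocation", "/tmp/checkpoints/console-test") // placeholder path
  .start()

query.awaitTermination()
```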
On Databricks, the task terminates after just over 5 minutes with a "Cancelled" status.
There are no errors in the log, just this, which appears to be a graceful shutdown request of some kind, but I don't know where it's coming from:
22/11/04 18:31:30 INFO DriverCorral$: Cleaning the wrapper ReplId-1ea30-8e4c0-48422-a (currently in status Running(ReplId-1ea30-8e4c0-48422-a,ExecutionId(job-774316032912321-run-84401-action-5645198327600153),RunnableCommandId(9102993760433650959)))
22/11/04 18:31:30 INFO DAGScheduler: Asked to cancel job group 2207618020913201706_9102993760433650959_job-774316032912321-run-84401-action-5645198327600153
22/11/04 18:31:30 INFO ScalaDriverLocal: cancelled jobGroup:2207618020913201706_9102993760433650959_job-774316032912321-run-84401-action-5645198327600153
22/11/04 18:31:30 INFO ScalaDriverWrapper: Stopping streams for commandId pattern: CommandIdPattern(2207618020913201706,None,Some(job-774316032912321-run-84401-action-5645198327600153)).
22/11/04 18:31:30 INFO DatabricksStreamingQueryListener: Stopping the stream [id=d41eff2a-4de6-4f17-8d1c-659d1c1b8d98, runId=5bae9fb4-b5e1-45a0-af1e-a2f2553592c9]
22/11/04 18:31:30 INFO DAGScheduler: Asked to cancel job group 5bae9fb4-b5e1-45a0-af1e-a2f2553592c9
22/11/04 18:31:30 INFO TaskSchedulerImpl: Cancelling stage 366
22/11/04 18:31:30 INFO TaskSchedulerImpl: Killing all running tasks in stage 366: Stage cancelled
22/11/04 18:31:30 INFO MicroBatchExecution: QueryExecutionThread.interruptAndAwaitExecutionThreadTermination called with streaming query exit timeout=15000 ms
Any thoughts?
11-08-2022 11:24 AM
Hi @JESSE LANCASTER, Please share your code stack here.
11-08-2022 11:27 AM
Hi @JESSE LANCASTER, Structured Streaming provides fault-tolerance and data consistency for streaming queries; using Databricks workflows, you can easily configure your Structured Streaming queries to restart on failure automatically.
You can restart the query after a failure by enabling checkpointing for a streaming query.
The restarted query continues where the failed one left off.
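For example, enabling checkpointing is just a matter of pointing the query at a durable checkpoint location (the path below is a placeholder; on Databricks this would typically be a DBFS or cloud-storage path):

```scala
// With a checkpointLocation set, a restarted query resumes from the last
// committed offsets instead of starting over.
val query = df.writeStream
  .format("console")
  .option("checkpointLocation", "/mnt/checkpoints/my-stream") // placeholder path
  .start()
```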
11-09-2022 08:55 AM
Hi @JESSE LANCASTER, We haven't heard from you since my last response, and I was checking back to see if my suggestions helped you.
Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.
Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
11-09-2022 09:16 AM
Kaniz,
Unfortunately that information is not useful.
1) I'm familiar with Structured Streaming and checkpoints; I've developed with Spark for many years, just not on Databricks.
2) This doesn't address the reason for the failure; a streaming job should run without interruption, not have to be restarted every 5 minutes.
3) I tried setting up a retry policy, but it doesn't trigger (presumably because the status is a cancellation, not a failure), so even if I wanted to just restart the job every 5 minutes with a retry policy, I cannot.
11-09-2022 09:19 AM
Hi @JESSE LANCASTER, Thank you for your response. Can you please share your code stack here?
11-09-2022 10:05 AM
Scala, Spark with EventHubs via Kafka interface
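The read side is the standard Event Hubs-over-Kafka setup, something along these lines (namespace, hub name, and connection string are placeholders; the `kafkashaded` JAAS module prefix is what Databricks' bundled Kafka client expects, as I understand it):

```scala
// Event Hubs connection string from the Azure portal (placeholder)
val connectionString = "Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."

val df = spark.readStream
  .format("kafka")
  // Event Hubs exposes a Kafka endpoint on port 9093
  .option("kafka.bootstrap.servers", "mynamespace.servicebus.windows.net:9093")
  .option("kafka.security.protocol", "SASL_SSL")
  .option("kafka.sasl.mechanism", "PLAIN")
  // Authenticate with the literal username "$ConnectionString" and the
  // connection string as the password
  .option("kafka.sasl.jaas.config",
    s"""kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required username="$$ConnectionString" password="$connectionString";""")
  .option("subscribe", "my-eventhub") // event hub name acts as the topic
  .load()
```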
11-11-2022 02:03 PM
??