Databricks Workflow is stuck on the first task and doesn't do any workload

cool_cool_cool
New Contributor II

Heya 🙂

I have a workflow in Databricks with 2 tasks. They are configured to run on the same job cluster, and the second task depends on the first.
I've seen a weird behavior twice now - the job takes a long time (it usually finishes within 30 minutes) but it has been running for more than 10 hours. The weird part is that the first task is in the "Running" state, but when I look at the Spark UI I don't see any jobs/stages/tasks/SQL queries - except for the fact that all of the executors are up and running.
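
For context, the job is defined roughly like the sketch below (a minimal reconstruction using the Databricks SDK for Python - the notebook paths, cluster spec, and key names are placeholders, not the real values):

```

# Rough sketch of the workflow layout: two tasks sharing one job cluster,
# with the second task depending on the first. All values are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs, compute

w = WorkspaceClient()

created = w.jobs.create(
    name="two-task-workflow",
    job_clusters=[
        jobs.JobCluster(
            job_cluster_key="shared_job_cluster",
            new_cluster=compute.ClusterSpec(
                spark_version="14.3.x-scala2.12",  # placeholder DBR version
                node_type_id="i3.xlarge",          # placeholder node type
                num_workers=2,
            ),
        )
    ],
    tasks=[
        jobs.Task(
            task_key="first_task",
            job_cluster_key="shared_job_cluster",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/Jobs/first_task"),
        ),
        jobs.Task(
            task_key="second_task",
            job_cluster_key="shared_job_cluster",
            depends_on=[jobs.TaskDependency(task_key="first_task")],
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/Jobs/second_task"),
        ),
    ],
)
print(created.job_id)

```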

In both cases I saw the following message in the error logs:

```

appcds_setup elapsed time: 0.000
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
Tue Oct 15 06:08:16 2024 Connection to spark from PID 1478
Tue Oct 15 06:08:16 2024 Initialized gateway on port 38197
Tue Oct 15 06:08:17 2024 Connected to spark.
Tue Oct 15 06:08:23 2024 Connection to spark from PID 1572
Tue Oct 15 06:08:23 2024 Initialized gateway on port 45679
Tue Oct 15 06:08:23 2024 Connected to spark.
ERROR:root:KeyboardInterrupt while sending command.
Traceback (most recent call last):
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1038, in send_command
    response = connection.send_command(command)
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/clientserver.py", line 536, in send_command
    answer = smart_decode(self.stream.readline()[:-1])
  File "/usr/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
KeyboardInterrupt

```


This workflow is scheduled to run every 2 hours and usually works fine, but in the last 3 days or so it has happened twice, and I haven't found anything about it.
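
For reference, the trigger is just a 2-hour cron schedule. The sketch below shows roughly what it looks like via the SDK, plus a job-level timeout as one possible stopgap so a hung run gets killed instead of sitting for 10+ hours (the job id and the 1-hour limit are placeholders, not part of the current setup):

```

# Sketch: 2-hour Quartz cron trigger plus a job-level timeout as a stopgap
# against hung runs. The job_id and the 1-hour limit are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

w.jobs.update(
    job_id=123456789,  # placeholder job id
    new_settings=jobs.JobSettings(
        schedule=jobs.CronSchedule(
            quartz_cron_expression="0 0 0/2 * * ?",  # every 2 hours
            timezone_id="UTC",
        ),
        timeout_seconds=3600,  # cancel runs that exceed 1 hour
    ),
)

```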

Any ideas?
