Heya 🙂
I have a workflow in Databricks with 2 tasks. They are configured to run on the same job cluster, and the second task depends on the first.
I've seen a weird behavior twice now: the job usually finishes within 30 minutes, but it has been running for more than 10 hours. The strange part is that the first task is in the "Running" state, yet when I look at the Spark UI I don't see any jobs/stages/tasks/SQL queries, aside from the fact that all of the executors are up and running.
In both cases I saw the following message in the error logs:
```
appcds_setup elapsed time: 0.000
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
Tue Oct 15 06:08:16 2024 Connection to spark from PID 1478
Tue Oct 15 06:08:16 2024 Initialized gateway on port 38197
Tue Oct 15 06:08:17 2024 Connected to spark.
Tue Oct 15 06:08:23 2024 Connection to spark from PID 1572
Tue Oct 15 06:08:23 2024 Initialized gateway on port 45679
Tue Oct 15 06:08:23 2024 Connected to spark.
ERROR:root:KeyboardInterrupt while sending command.
Traceback (most recent call last):
File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1038, in send_command
response = connection.send_command(command)
File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/clientserver.py", line 536, in send_command
answer = smart_decode(self.stream.readline()[:-1])
File "/usr/lib/python3.10/socket.py", line 705, in readinto
return self._sock.recv_into(b)
KeyboardInterrupt
```
This workflow is scheduled to run every 2 hours and usually works fine, but in the last 3 days or so this has happened twice, and I haven't been able to find anything about it.
Any ideas?