Hey,
we're observing the following problem when trying to run a notebook on a cluster with libraries installed:
All notebook cells are stuck in "Waiting to run" (even cells containing only a '1+1' statement).
When running the cluster without installing the packages, notebook execution works fine.
We're using a private setup with public network access disabled in Azure Databricks.
We tried this on both a Shared Cluster and a Single User Cluster.
The runtime version is 13.3 LTS with Worker/Driver Type of Standard_DS3_v2 and 2 workers.
Termination is configured after 30 minutes of inactivity.
The packages we are trying to install are spark-xml and a custom-written plain Python package with no dependencies. Both packages are uploaded as files to the workspace (for the Single User Cluster) and to a Unity Catalog Volume (for the Shared Cluster).
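For reference, here is a minimal sketch of how we attach the libraries via the Libraries API (the cluster ID, Maven version, and Volume path below are placeholders, not our actual values):

```python
import json

# Hypothetical cluster ID -- placeholder, not our real cluster.
cluster_id = "0000-000000-abcdefgh"

# Request body for POST /api/2.0/libraries/install on the Databricks REST API.
payload = {
    "cluster_id": cluster_id,
    "libraries": [
        # spark-xml from Maven (version is illustrative)
        {"maven": {"coordinates": "com.databricks:spark-xml_2.12:0.17.0"}},
        # custom wheel uploaded to a Unity Catalog Volume (Shared Cluster case);
        # path is a placeholder
        {"whl": "/Volumes/main/default/libs/my_package-0.1-py3-none-any.whl"},
    ],
}

print(json.dumps(payload, indent=2))
```

The install reports success either way (green checkmarks, see below), so the request itself does not seem to be the issue.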
In the Event log, the clusters show the expected STARTING -> RUNNING -> DRIVER HEALTHY messages. The checkmarks in the Compute UI and on the library page are all green, i.e. the libraries report as successfully installed.
One thing that caught my eye when checking the Driver Logs:
For a cluster without packages installed (i.e. running normally), the Standard error looks like this:
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
Wed Nov 15 08:28:57 2023 Connection to spark from PID 1157
Wed Nov 15 08:28:57 2023 Initialized gateway on port 39503
Wed Nov 15 08:28:58 2023 Connected to spark.
On a cluster with packages installed, the last three lines (the Spark gateway/connection messages) are missing.
Also, after a few minutes, the Event log shows a METASTORE DOWN message for the cluster with libraries installed.
I would greatly appreciate any thoughts on this!