Databricks Community

DataGuy2 · ‎01-15-2026

Hello Databricks Community,

I’m facing multiple issues while working in Azure Databricks notebooks, and I’d appreciate guidance or troubleshooting suggestions.

Issue 1: Failed to reconnect

While running a notebook, I frequently see a “Failed to reconnect” message in the UI.

Error details:

org.apache.http.conn.HttpHostConnectException: Connect to westus.azuredatabricks.net:443 [10.115.69.106] failed: Connection timed out (Connection timed out)

This causes the notebook to stay in a Waiting state and prevents execution.

Issue 2: Network connection timeout

The error suggests a timeout while connecting to the Databricks control plane (westus.azuredatabricks.net:443).
This occurs intermittently and impacts productivity.

Issue 3: Notebooks taking longer than expected to run

Even when the notebook starts successfully:

- Jobs take significantly longer than expected
- Some cells remain in Waiting state for a long time
- Performance degradation is noticeable compared to earlier runs

Environment details:
- Azure Databricks (Region: West US)
- Notebook language: Python (PySpark)
- Workload includes reading and transforming data, then writing it to Snowflake
- Issue observed across multiple notebooks

emma_s · ‎01-20-2026

Hi, there are a few things that could cause these types of problems.

1. Azure service availability (when these errors happen, check the Azure service availability to make sure there are no outages)

2. Local network connection problems (verify all your other internet connections are working fine)

3. The biggest and most likely cause is that the clusters you're trying to run on are incorrectly sized for the jobs, so they run slowly or you're unable to connect at all.
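
To help tell cause 2 apart from the others, you could time a few plain TCP connections to the host named in the error message. This is only a minimal sketch: `check_endpoint` and `probe` are hypothetical helper names, the 5-second timeout is an arbitrary assumption, and a successful TCP connect only shows the port is reachable from wherever you run it, not that the workspace itself is healthy.

```python
import socket
import time


def check_endpoint(host, port=443, timeout=5.0):
    """Try one TCP connection; return (reachable, seconds, error_text)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, None
    except OSError as exc:  # covers timeouts, refusals and DNS failures
        return False, time.monotonic() - start, repr(exc)


def probe(host, port=443, attempts=5, pause=1.0, timeout=5.0):
    """Repeat the check a few times, since the problem is intermittent."""
    results = []
    for _ in range(attempts):
        results.append(check_endpoint(host, port, timeout))
        time.sleep(pause)
    return results
```

For example, `probe("westus.azuredatabricks.net")` returns one tuple per attempt. If some attempts time out while your other connections are fine, that points more towards the network path than towards cluster sizing.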

I would recommend starting by looking at the cluster utilisation metrics in the UI. You should be able to see whether the clusters you're running on are over-utilised. Look for the following:

CPU Usage: Consistently high (>80%) usage may indicate under-provisioned nodes.
Memory Usage: Look for memory saturation, which can lead to task spill or slowdowns.
Disk I/O: High or maxed-out disk I/O could be a bottleneck if your tasks read/write intensive datasets.
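
If you copy a series of readings off the metrics view, one rough way to make "consistently high" concrete is to check what share of the samples sit above the threshold. This is a sketch under assumptions: the helper names are hypothetical, the sample values below are made up for illustration, and the 0.9 cut-off for "consistently" is my own choice. Only the 80% CPU figure comes from the list above.

```python
def fraction_above(samples, threshold):
    """Share of samples strictly above the threshold (0.0 if empty)."""
    if not samples:
        return 0.0
    return sum(1 for value in samples if value > threshold) / len(samples)


def is_consistently_high(samples, threshold=80.0, min_fraction=0.9):
    """True when at least min_fraction of samples exceed the threshold.

    min_fraction=0.9 is an assumed definition of "consistently".
    """
    return fraction_above(samples, threshold) >= min_fraction


# Made-up CPU readings in percent, e.g. one per minute during a slow run.
cpu_percent = [91.0, 88.5, 93.2, 79.0, 95.1, 90.4, 86.7, 92.3, 89.9, 94.0]
print(is_consistently_high(cpu_percent))
```

A short spike above 80% would not trip this check, which is the point: sustained saturation is the signal that the nodes may be under-provisioned.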
