Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks notebook Issue

DataGuy2
New Contributor

Hello Databricks Community,

I’m facing multiple issues while working in Azure Databricks notebooks, and I’d appreciate guidance or troubleshooting suggestions.

Issue 1: Failed to reconnect

While running a notebook, I frequently see a “Failed to reconnect” message in the UI.

Error details:

org.apache.http.conn.HttpHostConnectException: Connect to westus.azuredatabricks.net:443 [10.115.69.106] failed: Connection timed out (Connection timed out)

This causes the notebook to stay in a Waiting state and prevents execution.
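To tell whether the timeout comes from the network path rather than from the notebook itself, a plain TCP probe against the same host and port can help. The following is a minimal sketch using only the Python standard library; the function name `probe_tcp` and the 5-second default timeout are illustrative choices, not part of any Databricks tooling, and the probe should be run from the machine that reports the error.

```python
import socket
import time


def probe_tcp(host, port=443, timeout=5.0):
    """Try to open a TCP connection; return (ok, seconds, error_text)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and connects within `timeout`.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, None
    except OSError as exc:  # covers timeouts, refusals, and DNS failures
        return False, time.monotonic() - start, str(exc)
```

Calling `probe_tcp("westus.azuredatabricks.net")` a few times, both when things work and when the message appears, shows whether the connection attempts themselves are timing out intermittently.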

Issue 2: Network connection timeout

The error suggests a timeout while connecting to the Databricks control plane (westus.azuredatabricks.net:443).
This occurs intermittently and impacts productivity.

Issue 3: Notebooks taking longer than expected to run

Even when the notebook starts successfully:

  • Jobs take significantly longer than expected

  • Some cells remain in Waiting state for a long time

  • Performance degradation is noticeable compared to earlier runs

Environment details:

  • Azure Databricks (Region: West US)

  • Notebook language: Python (PySpark)

  • Workload includes reading, transforming, and writing data to Snowflake

  • Issue observed across multiple notebooks
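To put numbers on "longer than expected", each stage of the workload can be timed separately so that runs can be compared. Below is a minimal sketch using only the standard library; the helper name `timed` and the `timings` dictionary are hypothetical. Because Spark evaluates lazily, the timed block would need to contain an action (for example a write or a count) for the measurement to reflect real work.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Store the duration even if the enclosed block raises.
        results[label] = time.perf_counter() - start


timings = {}
with timed("example_step", timings):
    sum(range(100_000))  # stand-in for a read, transform, or write step
```

Wrapping the read, the transformation, and the Snowflake write in separate `timed` blocks would show which stage accounts for the slowdown.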

Accepted Solutions

emma_s
Databricks Employee

Hi, there are a few things that could cause these types of problems.

1. Azure service availability (when these errors occur, check the Azure service status to make sure there are no outages)

2. Local network connection problems (verify that your other internet connections are working)

3. The most likely cause is that the clusters you're trying to run on are incorrectly sized for the jobs, so they are slow or you're unable to connect at all.

I would recommend starting with the cluster utilisation metrics in the UI, where you should be able to see whether the clusters you're running on are over-utilised. Look for the following:

 

  • CPU Usage: Consistently high (>80%) usage may indicate under-provisioned nodes.
  • Memory Usage: Look for memory saturation, which can lead to task spill or slowdowns.
  • Disk I/O: High or maxed-out disk I/O could be a bottleneck if your tasks are read/write intensive.
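If the utilisation figures are noted down or exported, the "consistently high" check can be made explicit. This is only a sketch: the sample values are hypothetical, and treating 90% of samples above the threshold as "consistent" is an assumption made for illustration, not a Databricks guideline.

```python
def fraction_above(samples, threshold=80.0):
    """Return the share of utilisation samples (in percent) above threshold."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(1 for s in samples if s > threshold) / len(samples)


def looks_under_provisioned(samples, threshold=80.0, min_fraction=0.9):
    """True when at least min_fraction of the samples exceed threshold."""
    return fraction_above(samples, threshold) >= min_fraction


# Hypothetical CPU readings in percent, not real measurements.
cpu_samples = [85, 91, 88, 95, 83, 90, 87, 92, 89, 86]
print(looks_under_provisioned(cpu_samples))
```

The same functions could be applied to memory or disk readings with a threshold that suits those metrics.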

