
databricks.sql.exc.RequestError OpenSession error None


I'm trying to access a Databricks SQL Warehouse with Python. I'm able to connect with a token from a Compute Instance on Azure Machine Learning. It's a VM with conda installed, where I created a Python 3.10 environment.

from databricks import sql as dbsql


But once I create a job and launch it on a Compute Cluster with a custom Dockerfile:


ENV https_proxy http://xxxxxx:yyyy
ENV no_proxy xxxxxx

RUN mkdir -p /usr/share/man/man1

RUN wget \
    && tar xvf openjdk-19.0.1_linux-x64_bin.tar.gz \
    && mv jdk-19.0.1 /opt/

ENV JAVA_HOME /opt/jdk-19.0.1

# Install requirements with pip conf for Jfrog
COPY pip.conf pip.conf

# python installs (python 3.10 inside all azure ubuntu images)
COPY requirements.txt .
RUN pip install -r requirements.txt && rm requirements.txt

# set command
CMD ["bash"]

My image is built and starts running my code, but it fails on the previous code sample. I am using the same values of https_proxy and no_proxy on my compute instance and my compute cluster.
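One way to double-check that both environments really resolve the proxy settings the same way is to print what Python's standard library sees from inside the job. This is only a minimal sketch: `proxy_report` is a hypothetical helper name, and the hostname in the usage line is a placeholder for the real workspace host.

```python
import os
import urllib.request


def proxy_report(host):
    """Summarise the proxy settings Python resolves for an HTTPS call to `host`."""
    proxies = urllib.request.getproxies()  # reads *_proxy environment variables
    return {
        "https_proxy": proxies.get("https"),
        "no_proxy": os.environ.get("no_proxy") or os.environ.get("NO_PROXY"),
        # True means requests to `host` would skip the proxy entirely.
        "bypassed": bool(urllib.request.proxy_bypass(host)),
    }


if __name__ == "__main__":
    # Placeholder hostname (assumption): replace with the workspace host.
    print(proxy_report("adb-0000000000000000.0.azuredatabricks.net"))
```

Running the same snippet on the compute instance and inside the container on the compute cluster, then comparing the two dictionaries, would show whether a variable is missing or a `no_proxy` entry matches the host in only one of them.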

2024-01-22 13:30:13,520 - thrift_backend - Error during request to server: {"method": "OpenSession", "session-id": null, "query-id": null, "http-code": null, "error-message": "", "original-exception": "Retry request would exceed Retry policy max retry duration of 900.0 seconds", "no-retry-reason": "non-retryable error", "bounded-retry-delay": null, "attempt": "1/30", "elapsed-seconds": "846.7684090137482/900.0"}
Traceback (most recent call last):
  File "/mnt/azureml/cr/j/67f1e8c93a8942d582fb7babc030101b/exe/wd/", line 198, in <module>
  File "/mnt/azureml/cr/j/67f1e8c93a8942d582fb7babc030101b/exe/wd/", line 31, in main
    return dbsql.connect(
  File "/opt/miniconda/lib/python3.10/site-packages/databricks/sql/", line 51, in connect
    return Connection(server_hostname, http_path, access_token, **kwargs)
  File "/opt/miniconda/lib/python3.10/site-packages/databricks/sql/", line 235, in __init__
    self._open_session_resp = self.thrift_backend.open_session(
  File "/opt/miniconda/lib/python3.10/site-packages/databricks/sql/", line 576, in open_session
    response = self.make_request(self._client.OpenSession, open_session_req)
  File "/opt/miniconda/lib/python3.10/site-packages/databricks/sql/", line 505, in make_request
    self._handle_request_error(error_info, attempt, elapsed)
  File "/opt/miniconda/lib/python3.10/site-packages/databricks/sql/", line 335, in _handle_request_error
    raise network_request_error
databricks.sql.exc.RequestError: Error during request to server

In both, I am using the latest version of databricks-sql-connector (3.0.1).


Esteemed Contributor III

Hi, could you please try adding a new token and let us know if this helps?


I am already creating a new token each time I initialize my Spark session. I use Azure's OAuth2 service to get a token that lasts 1 hour, then use the Databricks API 2.0 to generate a new PAT.
This code works locally and on compute instances in Azure, but not on Compute Clusters.
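For readers following along, the second step of that flow (exchanging the Azure token for a PAT) could look roughly like the sketch below. It is not the code from my job: the function names and the default comment string are made up for illustration, and obtaining the Azure OAuth2 token itself is left out. It assumes the Token API 2.0 `token/create` endpoint with its `lifetime_seconds` and `comment` fields.

```python
import json
import urllib.request


def build_pat_request(host, aad_token, lifetime_seconds=3600, comment="job token"):
    """Build (but do not send) a POST to the Token API 2.0 create endpoint."""
    body = json.dumps({"lifetime_seconds": lifetime_seconds, "comment": comment})
    return urllib.request.Request(
        f"https://{host}/api/2.0/token/create",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {aad_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_pat(host, aad_token, timeout=30):
    """Send the request and return the new PAT from the JSON response."""
    request = build_pat_request(host, aad_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["token_value"]
```

Since `urllib` raises `urllib.error.HTTPError` on a 4xx answer, a permission problem at this step would show up immediately with its status code rather than as a generic failure.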

What I also tried: generating a token in the UI, which works locally, then using it in my code on my compute cluster. It fails with the error above.

Could it be a network issue? I'm creating both the compute instance and the compute cluster in Terraform:


resource "azurerm_machine_learning_compute_cluster" "cluster" {
  for_each = local.compute_cluster_configurations

  name     = each.key
  location = var.context.location

  vm_priority                   = each.value.vm_priority
  vm_size                       = each.value.vm_size
  machine_learning_workspace_id =
  subnet_resource_id            =

  # AML-05
  ssh_public_access_enabled = false
  node_public_ip_enabled    = false

  identity {
    type = "UserAssigned"
    identity_ids = [

  scale_settings {
    min_node_count                       = each.value.min_node_count
    max_node_count                       = each.value.max_node_count
    scale_down_nodes_after_idle_duration = each.value.scale_down_nodes_after_idle_duration


# For each user, create a compute instance
resource "azurerm_machine_learning_compute_instance" "this" {
  for_each = local.all_users

  name                          = "${split("@", trimspace(local.all_users[each.key]["user_principal_name"]))[0]}-DS2-V2"
  location                      = var.context.location
  machine_learning_workspace_id =
  virtual_machine_size          = "STANDARD_DS2_V2"
  identity {
    type = "UserAssigned"
    identity_ids = [
  assign_to_user {
    object_id = each.key
    tenant_id = var.tenant_id
  node_public_ip_enabled = false
  subnet_resource_id     =
  description            = "Compute instance generated by Terraform for : ${local.all_users[each.key]["display_name"]} | ${local.all_users[each.key]["user_principal_name"]} | ${each.key} "

I'm using the same subnet for both, so they should behave the same at the network level.


The issue was that the new version of databricks-sql-connector (3.0.1) does not handle error messages well. It gave a generic error after a timeout, where it should have given me a 403 and an immediate error message without the 900-second timeout.

I've commented on a GitHub issue for more debugging.

But I'm still wondering why I got a 403 error from my compute cluster and not from my compute instance, when they have the same roles. I had to add a role on the group handling both Service Principals in Databricks so that they can use the SQL warehouse, which is odd.
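For anyone hitting the same generic error, a single authenticated HTTP request made outside the connector may surface the status code right away instead of after the retry window. The sketch below is an assumption-laden illustration, not part of the connector: `probe_status` is a hypothetical helper, and it calls the SQL Warehouses REST endpoint (`/api/2.0/sql/warehouses`), whose permission checks may not be identical to those applied when opening a SQL session.

```python
import urllib.error
import urllib.request


def probe_status(host, token, path="/api/2.0/sql/warehouses", timeout=10, opener=None):
    """Return the HTTP status code of one authenticated GET, with no retries."""
    request = urllib.request.Request(
        f"https://{host}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )
    # `opener` exists only so the call can be replaced in tests.
    open_url = opener or urllib.request.urlopen
    try:
        with open_url(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers are raised as exceptions; the code is what we want.
        return err.code
    # Network-level failures (DNS, proxy, TLS) raise URLError and propagate.
```

Calling `probe_status(host, token)` with the same token from the compute instance and from the compute cluster would show whether one of them gets a 403 while the other gets a 200, which points at permissions rather than at the network.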