
databricks.sql.exc.RequestError OpenSession error None

Etyr
Contributor

I'm trying to access a Databricks SQL Warehouse with Python. I'm able to connect with a token from a Compute Instance on Azure Machine Learning (a VM with conda installed, where I create an environment with Python 3.10).

from databricks import sql as dbsql

dbsql.connect(
    server_hostname="databricks_address",
    http_path="http_path",
    access_token="dapi....",
)
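
On the compute instance this opens a session without any issue. For completeness, here is roughly how I use the connection afterwards (the SELECT 1 probe is only an example):

from databricks import sql as dbsql

# Same placeholder values as above.
with dbsql.connect(
    server_hostname="databricks_address",
    http_path="http_path",
    access_token="dapi....",
) as connection:
    with connection.cursor() as cursor:
        # Trivial probe query to confirm the session actually opened.
        cursor.execute("SELECT 1")
        print(cursor.fetchall())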

But once I create a job and launch it on a Compute Cluster with a custom Dockerfile:

FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest


ENV https_proxy http://xxxxxx:yyyy
ENV no_proxy xxxxxx

RUN mkdir -p /usr/share/man/man1

RUN wget https://download.java.net/java/GA/jdk19.0.1/afdd2e245b014143b62ccb916125e3ce/10/GPL/openjdk-19.0.1_linux-x64_bin.tar.gz \
    && tar xvf openjdk-19.0.1_linux-x64_bin.tar.gz \
    && mv jdk-19.0.1 /opt/

ENV JAVA_HOME /opt/jdk-19.0.1
ENV PATH="${PATH}:$JAVA_HOME/bin"

# Install requirements with pip conf for Jfrog
COPY pip.conf pip.conf
ENV PIP_CONFIG_FILE pip.conf


# python installs (python 3.10 inside all azure ubuntu images)
COPY requirements.txt .
RUN pip install -r requirements.txt && rm requirements.txt

# set command
CMD ["bash"]

My image is built and starts running my code, but it fails on the previous code sample. I am using the same values of https_proxy and no_proxy on my compute instance and my compute cluster.

2024-01-22 13:30:13,520 - thrift_backend - Error during request to server: {"method": "OpenSession", "session-id": null, "query-id": null, "http-code": null, "error-message": "", "original-exception": "Retry request would exceed Retry policy max retry duration of 900.0 seconds", "no-retry-reason": "non-retryable error", "bounded-retry-delay": null, "attempt": "1/30", "elapsed-seconds": "846.7684090137482/900.0"}
Traceback (most recent call last):
  File "/mnt/azureml/cr/j/67f1e8c93a8942d582fb7babc030101b/exe/wd/main.py", line 198, in <module>
    main()
  File "/mnt/azureml/cr/j/67f1e8c93a8942d582fb7babc030101b/exe/wd/main.py", line 31, in main
    return dbsql.connect(
  File "/opt/miniconda/lib/python3.10/site-packages/databricks/sql/__init__.py", line 51, in connect
    return Connection(server_hostname, http_path, access_token, **kwargs)
  File "/opt/miniconda/lib/python3.10/site-packages/databricks/sql/client.py", line 235, in __init__
    self._open_session_resp = self.thrift_backend.open_session(
  File "/opt/miniconda/lib/python3.10/site-packages/databricks/sql/thrift_backend.py", line 576, in open_session
    response = self.make_request(self._client.OpenSession, open_session_req)
  File "/opt/miniconda/lib/python3.10/site-packages/databricks/sql/thrift_backend.py", line 505, in make_request
    self._handle_request_error(error_info, attempt, elapsed)
  File "/opt/miniconda/lib/python3.10/site-packages/databricks/sql/thrift_backend.py", line 335, in _handle_request_error
    raise network_request_error
databricks.sql.exc.RequestError: Error during request to server

In both cases, I am using the latest version of databricks-sql-connector (3.0.1).


Debayan
Esteemed Contributor III

Hi, could you please try https://github.com/databricks/databricks-sql-python/issues/23 (i.e., adding a new token) and let us know if this helps?

Hello,

I am already creating a new token each time I initialize my Spark session: I use Azure's OAuth2 service to get a token valid for 1 hour, and then use the Databricks API 2.0 to generate a new PAT.
This code works locally and on Compute Instances in Azure, but not on Compute Clusters.
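
Roughly, that flow looks like this (a sketch only: the resource ID is the standard Azure Databricks application ID, and the hostname, lifetime and comment values are placeholders):

import requests
from azure.identity import DefaultAzureCredential

# AAD token for the Azure Databricks resource (its well-known first-party application ID).
AZURE_DATABRICKS_RESOURCE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"
# With a user-assigned identity you may need DefaultAzureCredential(managed_identity_client_id=...).
credential = DefaultAzureCredential()
aad_token = credential.get_token(f"{AZURE_DATABRICKS_RESOURCE}/.default").token

# Exchange the AAD token for a short-lived Databricks PAT via the Token API.
workspace = "https://databricks_address"  # placeholder workspace URL
resp = requests.post(
    f"{workspace}/api/2.0/token/create",
    headers={"Authorization": f"Bearer {aad_token}"},
    json={"lifetime_seconds": 3600, "comment": "AML job token"},
)
resp.raise_for_status()
pat = resp.json()["token_value"]  # passed as access_token to dbsql.connect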

I also tried generating a token in the UI (which works locally), then using it in my code on the compute cluster, and it fails with the same error as above.

Could it be a network issue? I'm creating both the compute instance and the compute cluster with Terraform:

 

resource "azurerm_machine_learning_compute_cluster" "cluster" {
  for_each = local.compute_cluster_configurations

  name     = each.key
  location = var.context.location

  vm_priority                   = each.value.vm_priority
  vm_size                       = each.value.vm_size
  machine_learning_workspace_id = module.mlw_01.id
  subnet_resource_id            = module.subnet_aml.id

  # AML-05
  ssh_public_access_enabled = false
  node_public_ip_enabled    = false

  identity {
    type = "UserAssigned"
    identity_ids = [
      azurerm_user_assigned_identity.compute_cluster_managed_identity.id
    ]
  }

  scale_settings {
    min_node_count                       = each.value.min_node_count
    max_node_count                       = each.value.max_node_count
    scale_down_nodes_after_idle_duration = each.value.scale_down_nodes_after_idle_duration
  }
}

 

# For each user, create a compute instance
resource "azurerm_machine_learning_compute_instance" "this" {
  for_each = local.all_users

  name                          = "${split("@", trimspace(local.all_users[each.key]["user_principal_name"]))[0]}-DS2-V2"
  location                      = var.context.location
  machine_learning_workspace_id = module.mlw_01.id
  virtual_machine_size          = "STANDARD_DS2_V2"
  identity {
    type = "UserAssigned"
    identity_ids = [
      azurerm_user_assigned_identity.this[each.key].id
    ]
  }
  assign_to_user {
    object_id = each.key
    tenant_id = var.tenant_id
  }
  node_public_ip_enabled = false
  subnet_resource_id     = module.subnet_aml.id
  description            = "Compute instance generated by Terraform for : ${local.all_users[each.key]["display_name"]} | ${local.all_users[each.key]["user_principal_name"]} | ${each.key} "
}

I'm using the same subnet for both, so they should behave the same on the network.
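
To rule the network out, a quick check I can run inside the cluster job is a plain HTTPS request to the workspace through the same proxy settings (hostname is a placeholder). Any HTTP status coming back, even 401/403, means the path is open; a timeout or proxy error would point at the network rather than the token:

import os
import requests

host = "databricks_address"  # placeholder workspace hostname
print("https_proxy:", os.environ.get("https_proxy"))
print("no_proxy:", os.environ.get("no_proxy"))

# requests honours the https_proxy/no_proxy environment variables by default,
# so this goes through the same proxy configuration as the connector.
resp = requests.get(f"https://{host}/api/2.0/clusters/list", timeout=30)
print(resp.status_code, resp.text[:200])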

Etyr
Contributor

The issue was that the new version of databricks-sql-connector (3.0.1) does not handle error messages well. It gave a generic error and a timeout where it should have given me a 403 and an immediate error message, without a 900-second retry timeout.

https://github.com/databricks/databricks-sql-python/issues/333

I've commented on the GitHub issue above with more debugging details.
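
A rough way to see the real status without waiting out the 900-second retry is to call the warehouse HTTP path directly with the same credentials (hostname, path and token below are the same placeholders as above). The POST body is not a valid Thrift message, but the HTTP status is enough to tell a 403 apart from a network problem:

import requests

host = "databricks_address"  # placeholder workspace hostname
http_path = "http_path"      # same value passed to dbsql.connect, e.g. /sql/1.0/warehouses/<id>
token = "dapi...."           # same PAT

# The connector sends Thrift-over-HTTP POSTs to this path with a Bearer token,
# so a 403 here means the token/permissions are rejected before any retry loop starts.
resp = requests.post(
    f"https://{host}{http_path}",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
print(resp.status_code, resp.reason)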

But I'm still wondering why I got a 403 error from my compute cluster and not from my compute instance, since they have the same roles. I had to add a permission in Databricks on the group containing both Service Principals so they could use the SQL warehouse, which is odd.
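
For reference, that grant can be sketched against the Databricks Permissions API; the warehouse ID and group name below are placeholders, and the sql/warehouses object path is what I would expect rather than something verified in this setup:

import requests

host = "databricks_address"    # placeholder workspace hostname
warehouse_id = "warehouse_id"  # placeholder SQL warehouse ID
admin_token = "dapi...."       # token allowed to manage the warehouse

# Grant CAN_USE on the warehouse to the group that contains both Service Principals.
resp = requests.patch(
    f"https://{host}/api/2.0/permissions/sql/warehouses/{warehouse_id}",
    headers={"Authorization": f"Bearer {admin_token}"},
    json={
        "access_control_list": [
            {"group_name": "my-service-principals-group", "permission_level": "CAN_USE"}
        ]
    },
)
resp.raise_for_status()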

 

 
