databricks.sql.exc.RequestError OpenSession error None
01-22-2024 06:59 AM
I'm trying to access a Databricks SQL Warehouse from Python. I'm able to connect with a token from a compute instance on Azure Machine Learning (a VM with conda installed, where I create an env with Python 3.10):

from databricks import sql as dbsql

dbsql.connect(
    server_hostname="databricks_address",
    http_path="http_path",
    access_token="dapi....",
)
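For context, a minimal sketch of the kind of check I run (the hostname, HTTP path and token below are placeholders, not my real values):

from databricks import sql as dbsql

# Placeholder values -- replace with your workspace hostname,
# SQL Warehouse HTTP path and personal access token.
with dbsql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    access_token="dapi....",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchall())  # a single row back confirms the session opened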
But once I create a job and launch it on a compute cluster with a custom Dockerfile:

FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu22.04:latest

ENV https_proxy http://xxxxxx:yyyy
ENV no_proxy xxxxxx

RUN mkdir -p /usr/share/man/man1
RUN wget https://download.java.net/java/GA/jdk19.0.1/afdd2e245b014143b62ccb916125e3ce/10/GPL/openjdk-19.0.1_linux-x64_bin.tar.gz \
    && tar xvf openjdk-19.0.1_linux-x64_bin.tar.gz \
    && mv jdk-19.0.1 /opt/
ENV JAVA_HOME /opt/jdk-19.0.1
ENV PATH="${PATH}:$JAVA_HOME/bin"

# Install requirements with pip conf for Jfrog
COPY pip.conf pip.conf
ENV PIP_CONFIG_FILE pip.conf

# python installs (python 3.10 inside all azure ubuntu images)
COPY requirements.txt .
RUN pip install -r requirements.txt && rm requirements.txt

# set command
CMD ["bash"]
My image is built and my code starts to run, but it fails on the previous code sample. I am using the same values of https_proxy and no_proxy on my compute instance and my compute cluster.
2024-01-22 13:30:13,520 - thrift_backend - Error during request to server: {"method": "OpenSession", "session-id": null, "query-id": null, "http-code": null, "error-message": "", "original-exception": "Retry request would exceed Retry policy max retry duration of 900.0 seconds", "no-retry-reason": "non-retryable error", "bounded-retry-delay": null, "attempt": "1/30", "elapsed-seconds": "846.7684090137482/900.0"}
Traceback (most recent call last):
  File "/mnt/azureml/cr/j/67f1e8c93a8942d582fb7babc030101b/exe/wd/main.py", line 198, in <module>
    main()
  File "/mnt/azureml/cr/j/67f1e8c93a8942d582fb7babc030101b/exe/wd/main.py", line 31, in main
    return dbsql.connect(
  File "/opt/miniconda/lib/python3.10/site-packages/databricks/sql/__init__.py", line 51, in connect
    return Connection(server_hostname, http_path, access_token, **kwargs)
  File "/opt/miniconda/lib/python3.10/site-packages/databricks/sql/client.py", line 235, in __init__
    self._open_session_resp = self.thrift_backend.open_session(
  File "/opt/miniconda/lib/python3.10/site-packages/databricks/sql/thrift_backend.py", line 576, in open_session
    response = self.make_request(self._client.OpenSession, open_session_req)
  File "/opt/miniconda/lib/python3.10/site-packages/databricks/sql/thrift_backend.py", line 505, in make_request
    self._handle_request_error(error_info, attempt, elapsed)
  File "/opt/miniconda/lib/python3.10/site-packages/databricks/sql/thrift_backend.py", line 335, in _handle_request_error
    raise network_request_error
databricks.sql.exc.RequestError: Error during request to server
In both environments, I am using the latest version of databricks-sql-connector (3.0.1).
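To rule out the proxy/network layer, a quick sanity check I'd run from inside the container is something like this (the workspace hostname is a placeholder; requests picks up the https_proxy/no_proxy environment variables set in the Dockerfile):

import os
import requests

# Placeholder workspace hostname -- replace with your Databricks address.
host = "adb-1234567890123456.7.azuredatabricks.net"

print("https_proxy =", os.environ.get("https_proxy"))
print("no_proxy    =", os.environ.get("no_proxy"))

# requests honors https_proxy/no_proxy from the environment, so any HTTP
# response here means the workspace is reachable through the proxy; a
# connection timeout points at the network setup rather than the token.
resp = requests.get(f"https://{host}", timeout=30)
print(resp.status_code)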
01-22-2024 07:32 PM
Hi, could you please try https://github.com/databricks/databricks-sql-python/issues/23 (adding a new token) and let us know if this helps?
01-22-2024 11:57 PM
Hello,
I am already creating a new token each time I init my Spark session. I do this by using Azure's OAuth2 service to get a token that lasts one hour and then calling the Databricks 2.0 API to generate a new PAT (roughly as sketched below).
This code works locally and on compute instances in Azure, but not on compute clusters.
I also tried generating a token in the UI (which works locally) and then using it in my code on the compute cluster, and that fails with the same error as above.
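For reference, a minimal sketch of that token flow, assuming a managed identity is attached to the compute and using a placeholder workspace hostname (not my exact code):

import requests
from azure.identity import DefaultAzureCredential

# Placeholder workspace hostname.
host = "adb-1234567890123456.7.azuredatabricks.net"

# Fixed application ID of the AzureDatabricks first-party app, used as AAD scope.
DATABRICKS_SCOPE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"

# On AML compute this resolves to the attached user-assigned managed identity.
credential = DefaultAzureCredential()
aad_token = credential.get_token(DATABRICKS_SCOPE).token

# Exchange the AAD token for a short-lived Databricks PAT.
resp = requests.post(
    f"https://{host}/api/2.0/token/create",
    headers={"Authorization": f"Bearer {aad_token}"},
    json={"lifetime_seconds": 3600, "comment": "azureml job"},
    timeout=30,
)
resp.raise_for_status()
pat = resp.json()["token_value"]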
Could it be a network issue? I'm creating both the compute instance and the compute cluster in Terraform:
resource "azurerm_machine_learning_compute_cluster" "cluster" {
for_each = local.compute_cluster_configurations
name = each.key
location = var.context.location
vm_priority = each.value.vm_priority
vm_size = each.value.vm_size
machine_learning_workspace_id = module.mlw_01.id
subnet_resource_id = module.subnet_aml.id
# AML-05
ssh_public_access_enabled = false
node_public_ip_enabled = false
identity {
type = "UserAssigned"
identity_ids = [
azurerm_user_assigned_identity.compute_cluster_managed_identity.id
]
}
scale_settings {
min_node_count = each.value.min_node_count
max_node_count = each.value.max_node_count
scale_down_nodes_after_idle_duration = each.value.scale_down_nodes_after_idle_duration
}
}
# For each user, create a compute instance
resource "azurerm_machine_learning_compute_instance" "this" {
for_each = local.all_users
name = "${split("@", trimspace(local.all_users[each.key]["user_principal_name"]))[0]}-DS2-V2"
location = var.context.location
machine_learning_workspace_id = module.mlw_01.id
virtual_machine_size = "STANDARD_DS2_V2"
identity {
type = "UserAssigned"
identity_ids = [
azurerm_user_assigned_identity.this[each.key].id
]
}
assign_to_user {
object_id = each.key
tenant_id = var.tenant_id
}
node_public_ip_enabled = false
subnet_resource_id = module.subnet_aml.id
description = "Compute instance generated by Terraform for : ${local.all_users[each.key]["display_name"]} | ${local.all_users[each.key]["user_principal_name"]} | ${each.key} "
}
I'm using the same subnet for both, so they should behave the same on the network.
01-29-2024 12:39 AM
The issue was that the new version of databricks-sql-connector (3.0.1) does not handle error messages well: it gave a generic error and a 900-second retry timeout where it should have surfaced the 403 immediately, without any retrying.
https://github.com/databricks/databricks-sql-python/issues/333
I've commented on the GitHub issue above with more debugging details.
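What helped me see the underlying HTTP error sooner was turning on the connector's debug logging and shrinking its retry window. A sketch, assuming the underscore-prefixed retry parameters (which are effectively private and may change between versions) are still accepted by connect() in 3.0.1:

import logging
from databricks import sql as dbsql

# Surface the connector's internal request/retry logs.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("databricks.sql").setLevel(logging.DEBUG)

# Shrinking the retry budget makes a failing OpenSession give up after a few
# attempts instead of retrying for up to 900 seconds.
connection = dbsql.connect(
    server_hostname="databricks_address",  # placeholders, as above
    http_path="http_path",
    access_token="dapi....",
    _retry_stop_after_attempts_count=3,
    _retry_stop_after_attempts_duration=60,
)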
But I'm still wondering why I got a 403 error from my compute cluster and not from my compute instance, since they have the same roles. I had to grant the group containing both service principals permission to use the SQL warehouse in Databricks, which is odd.

