cancel
Showing results for 
Search instead for 
Did you mean: 
Administration & Architecture
cancel
Showing results for 
Search instead for 
Did you mean: 

Databricks SQL connectivity in Python with Service Principals

harripy
New Contributor III

Tried to use M2M OAuth connectivity on Databricks SQL Warehouse in Python:

from databricks.sdk.core import Config, oauth_service_principal
from databricks import sql

....
config = Config(host=f"https://{host}",
                                client_id=db_user,
                                client_secret=passwd)
oauth_sp = oauth_service_principal(config)
return sql.connect(server_hostname=host,
                   http_path=http_path,
                   catalog=catalog,
                   schema=schema,
                   credentials_provider=oauth_sp)

  The params mentioned are properly assigned, but this fails to an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/venv/lib/python3.11/site-packages/databricks/sql/__init__.py", line 50, in connect
    return Connection(server_hostname, http_path, access_token, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/venv/lib/python3.11/site-packages/databricks/sql/client.py", line 189, in __init__
    self._session_handle = self.thrift_backend.open_session(
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/venv/lib/python3.11/site-packages/databricks/sql/thrift_backend.py", line 464, in open_session
    response = self.make_request(self._client.OpenSession, open_session_req)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/venv/lib/python3.11/site-packages/databricks/sql/thrift_backend.py", line 393, in make_request
    self._handle_request_error(error_info, attempt, elapsed)
  File "/path/venv/lib/python3.11/site-packages/databricks/sql/thrift_backend.py", line 261, in _handle_request_error
    raise network_request_error
databricks.sql.exc.RequestError: Error during request to server

The credentials provider seems to be set correctly, yet the connection is tried to be established with the access_token - what am I missing?

7 REPLIES 7

Yeshwanth
Valued Contributor II
Valued Contributor II

@harripy , could you please brief us about your use case? So that we can understand the issue better.

harripy
New Contributor III

We have configured a service principal for Databricks SQL Warehouse and we want to do DML calls on the database through Python connection (mostly INSERT-SELECT-clauses). I found documentation stating that it SHOULD be possible to get the connection to the SQL Warehouse just by providing the m2m-auth configuration (having the databricks sp client id and databricks sp client secret) as the credentials_provider. During execution the error message though indicates that the connection is tried to be opened with access token instead of the m2m authentication. I might be missing out some crucial configuration somewhere, but it would be nice to get this working.

Current working solution is for us:

  1. connect to workspace with sp client/secret
  2. request token from workspace
  3. connect using the given token
  4. on connection close delete the token

daniel_sahal
Esteemed Contributor

@harripy 
I've just looked through the Databricks SDK code and it looks like m2m auth is generating oauth bearer token to authenticate, that's why you see "access_token", but it's different from PAT (https://github.com/databricks/databricks-sdk-py/blob/68feadf82f12d387621704f2957d743bfc86f4bd/databr...).

So it looks like it's able to authenticate with Databricks through M2M and get access token. 
It's hard to guess what's the issue with the logs that you've provided.

What i suggest is to enable debug logging and paste the logs.
https://github.com/databricks/databricks-sdk-py/tree/68feadf82f12d387621704f2957d743bfc86f4bd?tab=re... 

harripy
New Contributor III

alright, made following test call:

dbConfig = {'targetDvUser':db_user,
            'targetDvPassword':password, 
            'targetDvHost':'dbc-098946cd-8b3a.cloud.databricks.com',
            'targetDvCatalog':'ade_test',
            'targetHttpPath':'/sql/1.0/warehouses/7b15eacc3fef2e55',
            'targetSchema':'manager_test_rdv'}

import logging,sys
from databricks.sdk.core import Config, oauth_service_principal
from databricks import sql
from databricks.sql.exc import Error, OperationalError

logging.basicConfig(stream=sys.stderr,
                    level=logging.INFO,
                    format='%(asctime)s [%(name)s][%(levelname)s] %(message)s')
logging.getLogger('databricks.sdk').setLevel(logging.DEBUG)

config = Config(host=f"https://{dbConfig['targetDvHost']}",
                                client_id=dbConfig['targetDvUser'],
                                client_secret=dbConfig['targetDvPassword'])
oauth_sp = oauth_service_principal(config)
conn = sql.connect(server_hostname=dbConfig['targetDvHost'],
                   http_path=dbConfig['targetHttpPath'],
                   catalog=dbConfig['targetDvCatalog'],
                   schema=dbConfig['targetSchema'],
                   credentials_provider=oauth_sp)

And the response was as follows:

>>> dbConfig = {'targetDvUser':db_user,
...             'targetDvPassword':password,
...             'targetDvHost':'dbc-098946cd-8b3a.cloud.databricks.com',
...             'targetDvCatalog':'ade_test',
...             'targetHttpPath':'/sql/1.0/warehouses/7b15eacc3fef2e55',
...             'targetSchema':'manager_test_rdv'}
>>> import logging,sys
>>> from databricks.sdk.core import Config, oauth_service_principal
>>> from databricks import sql
>>> from databricks.sql.exc import Error, OperationalError
>>> logging.basicConfig(stream=sys.stderr,
...                     level=logging.INFO,
...                     format='%(asctime)s [%(name)s][%(levelname)s] %(message)s')
>>> logging.getLogger('databricks.sdk').setLevel(logging.DEBUG)
>>> config = Config(host=f"https://{dbConfig['targetDvHost']}",
...                                 client_id=dbConfig['targetDvUser'],
...                                 client_secret=dbConfig['targetDvPassword'])
2024-04-26 12:37:43,611 [databricks.sdk][DEBUG] Attempting to configure auth: pat
2024-04-26 12:37:43,611 [databricks.sdk][DEBUG] Attempting to configure auth: basic
2024-04-26 12:37:43,611 [databricks.sdk][DEBUG] Attempting to configure auth: metadata-service
2024-04-26 12:37:43,611 [databricks.sdk][DEBUG] Attempting to configure auth: oauth-m2m
>>> oauth_sp = oauth_service_principal(config)
>>> conn = sql.connect(server_hostname=dbConfig['targetDvHost'],
...                    http_path=dbConfig['targetHttpPath'],
...                    catalog=dbConfig['targetDvCatalog'],
...                    schema=dbConfig['targetSchema'],
...                    credentials_provider=oauth_sp)
2024-04-26 12:38:22,632 [databricks.sdk.oauth][DEBUG] Retrieving token for f15f66cb-d173-432b-b530-896d2834d4c7
2024-04-26 12:38:22,967 [databricks.sql.thrift_backend][INFO] Error during request to server: {"method": "OpenSession", "session-id": null, "query-id": null, "http-code": null, "error-message": "", "original-exception": "'dict' object is not callable", "no-retry-reason": "non-retryable error", "bounded-retry-delay": null, "attempt": "1/30", "elapsed-seconds": "0.00017309188842773438/900.0"}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/harri.pylkkanen/workspace/dv-executor/venv/lib/python3.11/site-packages/databricks/sql/__init__.py", line 50, in connect
    return Connection(server_hostname, http_path, access_token, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/harri.pylkkanen/workspace/dv-executor/venv/lib/python3.11/site-packages/databricks/sql/client.py", line 189, in __init__
    self._session_handle = self.thrift_backend.open_session(
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/harri.pylkkanen/workspace/dv-executor/venv/lib/python3.11/site-packages/databricks/sql/thrift_backend.py", line 464, in open_session
    response = self.make_request(self._client.OpenSession, open_session_req)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/harri.pylkkanen/workspace/dv-executor/venv/lib/python3.11/site-packages/databricks/sql/thrift_backend.py", line 393, in make_request
    self._handle_request_error(error_info, attempt, elapsed)
  File "/Users/harri.pylkkanen/workspace/dv-executor/venv/lib/python3.11/site-packages/databricks/sql/thrift_backend.py", line 261, in _handle_request_error
    raise network_request_error
databricks.sql.exc.RequestError: Error during request to server

So indeed the oauth bearer token is generated (or being generated), but then the open session crashes - is this a service principal permissions related thing or some configuration missed somewhere?

daniel_sahal
Esteemed Contributor

@harripy 
Looking at the SDK logs - i've got no idea what could happen here. I think that SQL logs could be handy. Could you set databricks.sql logging level to DEBUG as well?

harripy
New Contributor III

These log lines will be added with databricks.sql DEBUG:

2024-04-29 08:47:10,276 [databricks.sql.thrift_backend][DEBUG] retry parameter: _retry_delay_min given_or_default 1.0
2024-04-29 08:47:10,277 [databricks.sql.thrift_backend][DEBUG] retry parameter: _retry_delay_max given_or_default 60.0
2024-04-29 08:47:10,277 [databricks.sql.thrift_backend][DEBUG] retry parameter: _retry_stop_after_attempts_count given_or_default 30
2024-04-29 08:47:10,277 [databricks.sql.thrift_backend][DEBUG] retry parameter: _retry_stop_after_attempts_duration given_or_default 900.0
2024-04-29 08:47:10,277 [databricks.sql.thrift_backend][DEBUG] retry parameter: _retry_delay_default given_or_default 5.0
2024-04-29 08:47:10,309 [databricks.sql.thrift_backend][DEBUG] Sending request: TOpenSessionReq(client_protocol=None, username=None, password=None, configuration={'spark.thriftserver.arrowBasedRowSet.timestampAsString': 'false'}, getInfos=None, client_protocol_i64=42247, connectionProperties=None, initialNamespace=TNamespace(catalogName='ade_test', schemaName='manager_test_rdv'), canUseMultipleCatalogs=True, sessionId=None)
2024-04-29 08:47:10,310 [databricks.sql.thrift_backend][INFO] Error during request to server: {"method": "OpenSession", "session-id": null, "query-id": null, "http-code": null, "error-message": "", "original-exception": "'dict' object is not callable", "no-retry-reason": "non-retryable error", "bounded-retry-delay": null, "attempt": "1/30", "elapsed-seconds": "0.00023293495178222656/900.0"}
Traceback (most recent call last):

kavyakavuri
New Contributor

I am facing the same issue with the same error logs as @harripy. Can you please help @Yeshwanth @Dani ?