SDK Workspace client HTTP Connection Pool
07-18-2023 08:16 AM
Hello.
Do you know how to solve an issue with HTTPSConnectionPool when using the SDK WorkspaceClient in a notebook run from a workflow?
I would like to trigger a job when certain conditions are met. These conditions are evaluated in Python, and I use the SDK to trigger the job (WorkspaceClient() + run_now()). When I run the notebook manually using the 'Run all' button, everything works fine: the host and port are recognised correctly, and I can easily use the run_now() function to trigger another job.
However, when I added the notebook (with the whole logic) to an existing workflow as another task (dependent on a different one), HTTPSConnectionPool has no information about the host. The error below appeared:
HTTPSConnectionPool(host='none', port=443): Max retries exceeded with url: /api/2.1/jobs/list?name=XXX (Caused by NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known'))
Do you know how we can use a notebook with the SDK in a workflow? Should I set up some additional environment variables or credentials?
According to the Databricks documentation, the Databricks SDK for Python uses default Databricks notebook authentication, with no special setup required. In my case it doesn't work.
Any suggestions?
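A quick way to see what the workflow task's environment actually contains is to print the auth-related environment variables as the first cell of the notebook. This is a minimal diagnostic sketch (dump_auth_env is a hypothetical helper name; the variable names are the standard ones the Databricks SDK for Python reads):

```python
import os

# Print the auth-related variables the Databricks SDK for Python reads.
# An unset DATABRICKS_HOST inside the workflow task would explain
# HTTPSConnectionPool(host='none', ...).
def dump_auth_env():
    report = {}
    for var in ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_AUTH_TYPE"):
        value = os.environ.get(var)
        # Never print the token itself, only whether it is set.
        if var == "DATABRICKS_TOKEN" and value:
            report[var] = "<set>"
        else:
            report[var] = value or "<not set>"
    return report

print(dump_auth_env())
```

Comparing this output between a manual 'Run all' and a workflow run should show whether the workflow task is losing the host or credentials.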
07-18-2023 09:47 AM
07-19-2023 12:49 AM
11-27-2023 03:14 AM
Hi @KNYSJOA
I got the same error when I tried to use databricks-connect in a local Jupyter notebook.
Setting the log level to DEBUG might help you:
import logging
logging.basicConfig(level=logging.DEBUG)
I then discovered that the environment variable DATABRICKS_AUTH_TYPE was set to "metadata-service", which made the connection fail. I changed it to "pat" (the default in the Databricks SDK: https://databricks-sdk-py.readthedocs.io/en/latest/authentication.html) and it worked. Of course, you have to set up a profile in .databrickscfg, or pass the credentials in your code.
Hope it helps!
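As a sketch of that fix (the host and token values below are placeholders, and it assumes a personal access token is available), the inherited variable can be overridden before the client is constructed:

```python
import os

# Override an inherited auth type before the SDK resolves credentials.
os.environ["DATABRICKS_AUTH_TYPE"] = "pat"
os.environ["DATABRICKS_HOST"] = "https://my-workspace.cloud.databricks.com"  # placeholder
os.environ["DATABRICKS_TOKEN"] = "dapi-placeholder"  # placeholder PAT, use a real secret

# With the environment fixed, the client can be built normally:
# from databricks.sdk import WorkspaceClient
# w = WorkspaceClient()  # picks up pat auth from the environment
```

In a real notebook, the token should come from a secret scope rather than being hard-coded.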
11-28-2023 11:18 AM - edited 11-28-2023 11:20 AM
The issue you're facing with HTTPSConnectionPool in the SDK WorkspaceClient within a workflow is likely that environment variables or credentials are not being propagated correctly. When the notebook runs manually it recognises the host and port, but within the workflow the connection pool fails.

Check that the necessary environment variables and credentials are set properly within the workflow environment, and make sure the notebook task inherits the correct context or configuration from the preceding task. It may also help to set the host and port explicitly in your notebook code, or to review any specific requirements for SDK usage in workflows.

If default Databricks notebook authentication is not working as expected, you may need to explore alternative authentication methods or contact Databricks support for further assistance.
Maybe this will be useful for you too!
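The "Name or service not known" part of the original error can be separated from the SDK itself with a plain DNS check. This is a minimal sketch using only the standard library (check_host is a hypothetical helper name):

```python
import socket
from urllib.parse import urlparse

def check_host(host_url):
    """Classify why a workspace URL might fail before any HTTPS request."""
    host = urlparse(host_url).hostname if host_url else None
    if not host:
        # Matches the symptom HTTPSConnectionPool(host='none', port=443)
        return "no host configured"
    try:
        socket.getaddrinfo(host, 443)
        return "resolvable"
    except socket.gaierror:
        return "DNS resolution failed"

print(check_host(None))
print(check_host("https://localhost"))
```

If this reports "no host configured" inside the workflow task, the problem is the missing host configuration, not networking.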