Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

SDK Workspace client HTTP Connection Pool

KNYSJOA
New Contributor

Hello.

Do you know how to solve issue with the HTTPSConnectionPool when we are using SDK WorkspaceClient in notebook via workflow?

I would like to trigger a job when some conditions are met. These conditions are evaluated in Python, and I am using the SDK to trigger the job (WorkspaceClient() + run_now()). When I run the notebook manually using the 'Run all' button, everything works fine: the host and port are recognised correctly, and I can easily use the run_now() function to trigger another job.

However, when I added the notebook (with the whole logic) to an existing workflow as another task (dependent on a different one), the HTTPSConnectionPool had no information about the host. The error below appeared:

HTTPSConnectionPool(host='none', port=443): Max retries exceeded with url: /api/2.1/jobs/list?name=XXX (Caused by NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known'))

Do you know how we can use a notebook with the SDK in a workflow? Should I set up some additional environment variables or credentials?

Based on the Databricks documentation, the Databricks SDK for Python uses default Databricks notebook authentication by default, and no special setup should be required. In my case it doesn't work.
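For reference, one way to rule out the default notebook auth is to pass the host and token explicitly. This is only a sketch: DATABRICKS_HOST and DATABRICKS_TOKEN are the SDK's standard environment variables, but `build_client_kwargs` is a hypothetical helper, and whether those variables are populated inside a workflow task is exactly the open question here.

```python
import os

def build_client_kwargs(env=None):
    """Hypothetical helper: collect explicit host/token settings instead of
    relying on the SDK's default notebook authentication, which may not
    propagate into a workflow task."""
    env = os.environ if env is None else env
    host = env.get("DATABRICKS_HOST")
    token = env.get("DATABRICKS_TOKEN")
    if not host:
        raise ValueError("DATABRICKS_HOST is not set in this task's environment")
    return {"host": host, "token": token}

# In the notebook task (requires databricks-sdk):
# from databricks.sdk import WorkspaceClient
# w = WorkspaceClient(**build_client_kwargs())
# w.jobs.run_now(job_id=123)  # job_id is a placeholder
```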

Any suggestion?

 

4 REPLIES

Atanu
Databricks Employee

Anonymous
Not applicable

Hi @KNYSJOA 

Does @Atanu's answer help? If it does, would you be happy to mark it as the best answer? If it doesn't, please let us know so we can help you further.

We'd love to hear from you.

Thanks!

gouzmi
New Contributor II

Hi @KNYSJOA 
I got the same error when I tried to use databricks-connect in a local Jupyter notebook.
Setting the log level to DEBUG might help you:

import logging
logging.basicConfig(level=logging.DEBUG)

Then I discovered that the environment variable DATABRICKS_AUTH_TYPE was set to "metadata-service", which made the connection fail. I changed it to "pat" (the Databricks SDK default: https://databricks-sdk-py.readthedocs.io/en/latest/authentication.html) and it worked. Of course, you have to set up a profile in .databrickscfg, or provide the host and token in your code.
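A minimal sketch of this workaround, assuming the environment variable name from the SDK's authentication docs (`force_pat_auth` itself is a hypothetical helper, and "DEFAULT" is a placeholder profile name):

```python
def force_pat_auth(env):
    """Hypothetical helper: if the environment forces the auth type to
    'metadata-service', switch it to 'pat' so the SDK falls back to a
    profile/token from .databrickscfg."""
    if env.get("DATABRICKS_AUTH_TYPE") == "metadata-service":
        env["DATABRICKS_AUTH_TYPE"] = "pat"
    return env

# Before building the client:
# import os
# force_pat_auth(os.environ)
# from databricks.sdk import WorkspaceClient
# w = WorkspaceClient(profile="DEFAULT")  # profile name is an assumption
```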

Hope it helps !

Dribka
New Contributor III

The issue you're facing with the HTTPSConnectionPool in the SDK WorkspaceClient inside a workflow is likely that environment variables or credentials are not being propagated correctly. When you run the notebook manually it recognises the host and port, but within the workflow the connection pool receives no host.

Check that the necessary environment variables and credentials are set within the workflow environment, and ensure the notebook task inherits the correct context or configuration from the preceding task. It may also help to set the host and token explicitly in your notebook code, or to review any specific requirements for SDK usage in workflows.

If the default Databricks notebook authentication is not working as expected, you might need to explore alternative authentication methods or contact Databricks support for further assistance.
