Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

What is the best way to use credentials for API calls from databricks notebook?

lugger1
New Contributor III

Hello, I have a Databricks account on Azure, and the goal is to compare different image tagging services from Azure, GCP, and AWS via their corresponding API calls from a Python notebook. I have problems with the GCP Vision API calls, specifically with credentials: as far as I understand, one necessary step is to set the 'GOOGLE_APPLICATION_CREDENTIALS' environment variable in my Databricks notebook with something like

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/folder1/credentials.json'

where '/folder1/credentials.json' is the path where my notebook should find the JSON file with the credentials (the notebook is in the same folder, /folder1/notebook_api_test).
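For reference, this is a minimal sketch of the kind of call that depends on that variable (assuming the google-cloud-vision client library, v2+, is installed on the cluster; the image URI is just a placeholder):

import os
from google.cloud import vision

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/folder1/credentials.json'

# The client resolves GOOGLE_APPLICATION_CREDENTIALS when it is constructed,
# so this is typically where DefaultCredentialsError surfaces if the file is not found.
client = vision.ImageAnnotatorClient()

image = vision.Image()
image.source.image_uri = 'https://example.com/some-image.jpg'  # placeholder
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, label.score)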

I got this path via Workspace -> Copy file path in the Databricks web UI. But this approach doesn't work; when the cell is executed, I get this error:

DefaultCredentialsError: File /folder1/credentials.json was not found.

What is the right way to deal with credentials to access the Google Vision API from an Azure Databricks notebook?

1 ACCEPTED SOLUTION

Accepted Solutions

lugger1
New Contributor III

Ok, here is a trick: in my case, the file with the GCP credentials is stored in the notebook's workspace storage, which is not visible to OS-level file operations (so the credentials loader cannot open the path set via os.environ()).

So the solution is to read the content of this file and save it to the cluster-local storage attached to the notebook, which is created with the cluster and erased when the cluster is gone (so we need to repeat this procedure every time the cluster is re-created). According to this link, we can read the content of the credentials JSON file stored in the notebook workspace with

with open('/Workspace/folder1/cred.json') as f:  # note that I need the full path here, for some reason
    content = f.read()

and then, according to this doc, we need to save it in a new file somewhere else (with the same name in my case, cred.json), namely on the cluster storage attached to the notebook (which is visible to OS-level functions such as os.open()), with

fd = os.open("cred.json", os.O_RDWR | os.O_CREAT)  # create the file in the driver's working directory
ret = os.write(fd, content.encode())
# need .encode(), or os.write() raises TypeError: a bytes-like object is required, not 'str'
os.close(fd)

Only after that can we set the environment variable required for GCP authentication:

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = './cred.json'

and then the API calls should work fine, without the DefaultCredentialsError.
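Putting the pieces together, the whole workaround can be written as one cell roughly like this (same paths as above; using the built-in open() for the write instead of os.open() is just a stylistic choice):

import os

# 1. Read the service-account key from workspace storage (full /Workspace path).
with open('/Workspace/folder1/cred.json') as f:
    content = f.read()

# 2. Write a copy into the driver's working directory on the cluster-local disk,
#    which OS-level file functions can see.
with open('cred.json', 'w') as f:
    f.write(content)

# 3. Point the GCP client libraries at the local copy.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = './cred.json'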

