Databricks Community

lugger1 · ‎04-19-2023

Hello, I have a Databricks account on Azure, and the goal is to compare image tagging services from Azure, GCP, and AWS via the corresponding API calls from a Python notebook. I have problems with GCP Vision API calls, specifically with credentials: as far as I understand, one necessary step is to set the 'GOOGLE_APPLICATION_CREDENTIALS' environment variable in my Databricks notebook with something like

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] ='/folder1/credentials.json'

where '/folder1/credentials.json' is where my notebook should look for the JSON file with credentials (the notebook is in the same folder, /folder1/notebook_api_test).

I get this path from Workspace -> Copy file path in the Databricks web UI. But this approach doesn't work: when the cell is executed, I get this error:

DefaultCredentialsError: File /folder1/credentials.json was not found.

What is the right way to handle credentials to access the Google Vision API from an Azure Databricks notebook?

lugger1 · ‎04-20-2023

Ok, here is a trick: in my case, the file with GCP credentials is stored in the notebook workspace storage, which is not visible through the path I had set in os.environ.

So the solution is to read the content of this file and save it to the cluster storage attached to the notebook, which is created with the cluster and erased when the cluster is gone (so we need to repeat this procedure every time the cluster is re-created). According to this link, we can read the content of the credentials JSON file stored in the notebook workspace with

with open('/Workspace/folder1/cred.json') as f:  # note that I need the full path here, for some reason
    content = f.read()

and then, according to this doc, we need to save it to another place in a new file (with the same name in my case, cred.json), namely on the cluster storage attached to the notebook (which is visible to os-related functions and to paths set in os.environ), with

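import os

# O_TRUNC empties an existing cred.json first, so no stale bytes remain after a shorter rewrite
fd = os.open("cred.json", os.O_RDWR | os.O_CREAT | os.O_TRUNC)
# .encode() is needed, or os.write raises TypeError: a bytes-like object is required, not 'str'
ret = os.write(fd, content.encode())
os.close(fd)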
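The read-then-write steps above can also be sketched as a single copy using only the standard library. This is a minimal sketch, not the method from the linked docs: `stage_credentials` and its `local_dir` parameter are hypothetical names, and it assumes the workspace file can be opened from the notebook under its full '/Workspace/...' path, as in the snippet above.

```python
import os
import shutil
import tempfile


def stage_credentials(workspace_path, local_dir=None):
    """Copy a credentials file to local storage and return the copy's absolute path.

    `stage_credentials` and `local_dir` are hypothetical names used for illustration.
    """
    # Without an explicit target, use a fresh temporary directory on the local disk.
    if local_dir is None:
        local_dir = tempfile.mkdtemp(prefix="gcp_cred_")
    local_path = os.path.join(local_dir, os.path.basename(workspace_path))
    # copyfile overwrites an existing target, so stale content cannot survive.
    shutil.copyfile(workspace_path, local_path)
    # The file holds a private key, so restrict it to the current user.
    os.chmod(local_path, 0o600)
    return os.path.abspath(local_path)


# Example call in a notebook (path taken from the post above):
# local_path = stage_credentials('/Workspace/folder1/cred.json')
```

Returning an absolute path avoids depending on the notebook's current working directory, which a relative path like 'cred.json' does.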

Only after that can we continue with setting the environment variable required for GCP authentication:

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] ='./cred.json'

and then API calls should work fine, without DefaultCredentialsError.
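Since the original error was a missing file, it may help to check the path before setting the variable, so that a wrong path fails immediately instead of at the first API call. The following is a minimal sketch under that assumption; `use_gcp_credentials` is a hypothetical helper name, not part of any Google or Databricks library.

```python
import json
import os


def use_gcp_credentials(path):
    """Check that `path` is a readable JSON file, then export it for GCP clients.

    `use_gcp_credentials` is a hypothetical helper used for illustration.
    """
    abs_path = os.path.abspath(path)
    if not os.path.isfile(abs_path):
        raise FileNotFoundError(f"Credentials file not found: {abs_path}")
    with open(abs_path) as f:
        # Raises json.JSONDecodeError if the file is not valid JSON.
        json.load(f)
    # Store the absolute path so later changes of working directory do not matter.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = abs_path
    return abs_path


# Example call in a notebook, after cred.json has been written as above:
# use_gcp_credentials("cred.json")
```

The check only confirms that the file exists and parses as JSON; whether the key is valid is still decided by Google when the first API call is made.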


What is the best way to use credentials for API calls from databricks notebook?
