How to create connection between Databricks & BigQuery

519776
New Contributor III

Hi,

I would like to connect our BigQuery environment to Databricks, so I created a service account, but where should I configure the service account in Databricks? I read the Databricks documentation and it's not clear at all.

Thanks for your help


15 REPLIES

-werners-
Esteemed Contributor III

https://docs.databricks.com/external-data/bigquery.html

Can you elaborate on what is not clear?

519776
New Contributor III

Yeah, part number 2 - Set up Databricks. There is the below code:

credentials <base64-keys>

spark.hadoop.google.cloud.auth.service.account.enable true

spark.hadoop.fs.gs.auth.service.account.email <client_email>

spark.hadoop.fs.gs.project.id <project_id>

spark.hadoop.fs.gs.auth.service.account.private.key <private_key>

spark.hadoop.fs.gs.auth.service.account.private.key.id <private_key_id>

What should I put instead of <base64-keys>? The Google service account key (JSON)? If yes, which part of it?

-werners-
Esteemed Contributor III

The base64-keys value is generated from the JSON key file:

To configure a cluster to access BigQuery tables, you must provide your JSON key file as a Spark configuration. Use a local tool to Base64-encode your JSON key file. For security purposes do not use a web-based or remote tool that could access your keys.

The JSON key file is created right above the following section:

https://docs.databricks.com/external-data/bigquery.html#create-a-google-cloud-storage-gcs-bucket-for...
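
For example, a minimal local encoding step might look like this (just a sketch in Python; the file name key.json is only an illustration of wherever you saved the downloaded service account key):

import base64

# Read the downloaded service account JSON key file and Base64-encode it locally.
with open("key.json", "rb") as f:
    encoded_key = base64.b64encode(f.read()).decode("utf-8")

# Paste this string as the value of the credentials entry in the cluster Spark config.
print(encoded_key)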

519776
New Contributor III

So basically it should look like this:

credentials <adfasdfsadfadsfsdafsd>

spark.hadoop.google.cloud.auth.service.account.enable true

spark.hadoop.fs.gs.auth.service.account.email <user@service.com>

spark.hadoop.fs.gs.project.id <project-dd>

spark.hadoop.fs.gs.auth.service.account.private.key <fdsfsdfsdgfd>

spark.hadoop.fs.gs.auth.service.account.private.key.id <gsdfgsdgdsg>

? And do I need to add double quotes ("")?

-werners-
Esteemed Contributor III

Without the angle brackets - they are placeholders for values.

So unless you want to reference a variable you already declared (like credentials in your example), put the values in double quotes.
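
So, as a rough sketch with made-up values (none of these are real credentials; the email, project id and key fields come straight from the JSON key file), the filled-in Spark config could look something like:

credentials SGVsbG8sIHRoaXMgaXMgbm90IGEgcmVhbCBrZXk=
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.auth.service.account.email "my-sa@my-project.iam.gserviceaccount.com"
spark.hadoop.fs.gs.project.id "my-project"
spark.hadoop.fs.gs.auth.service.account.private.key "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
spark.hadoop.fs.gs.auth.service.account.private.key.id "0123456789abcdef0123456789abcdef01234567"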

519776
New Contributor III

Thanks werners.

It's working now. When I'm running the below script:

df = spark.read.format("bigquery").option("table","sandbox.test").load()

I'm getting the below error:

519776
New Contributor III

[screenshot of the error attached]

-werners-
Esteemed Contributor III

Are you sure the path to the table is correct?

The example is a bit different:

"bigquery-public-data.samples.shakespeare"

<catalog>.<db>.<table>
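
For instance, a sketch of a read using the fully-qualified public sample table from the docs:

df = spark.read.format("bigquery") \
    .option("table", "bigquery-public-data.samples.shakespeare") \
    .load()

df.show()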

519776
New Contributor III

I also changed the path to "test_proj.sandbox.test".

The error is:

A project ID is required for this service but could not be determined from the builder or the environment. Please set a project ID using the builder.

-werners-
Esteemed Contributor III

I guess something still has to be configured on BigQuery.

Can you check this thread?

https://github.com/GoogleCloudDataproc/spark-bigquery-connector/issues/40

519776
New Contributor III

Works 🙂

Thanks werners, many thanks.

308655
New Contributor II

Thank you. For me, setting the parent project ID solved it. This is also in the documentation:

spark.read.format("bigquery") \

.option("table", table) \

.option("project", <project-id>) \

.option("parentProject", <parent-project-id>) \

.load()

I didn't have to set the various spark.hadoop.fs.gs config variables for the cluster, as it seemed content with the base64 credentials.
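
So, as a sketch, the cluster-level Spark config might reduce to little more than the encoded key (assuming the service account enable flag is still set as in the docs):

credentials <base64-encoded-json-key>
spark.hadoop.google.cloud.auth.service.account.enable true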

519776
New Contributor III

I'm familiar with this doc; it is not clear (please see my previous comment).
