cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

How to create connection between Databricks & BigQuery

519776
New Contributor III

Hi,

I would like to connect our BigQuery env to Databricks, So I created a service account but where should I configure the service account in Databricks? I read databricks documention and it`s not clear at all.

Thanks for your help

1 ACCEPTED SOLUTION

Accepted Solutions

-werners-
Esteemed Contributor III

without the pointy brackets. they are placeholders for values.

so unless you want to enter a variable which you already declared (like credentials in your example), put the double quotes.

View solution in original post

15 REPLIES 15

-werners-
Esteemed Contributor III

https://docs.databricks.com/external-data/bigquery.html

Can you elaborate what is not clear?

519776
New Contributor III

yeah, part number 2 - setup Databricks, there is the below code

credentials <base64-keys>

spark.hadoop.google.cloud.auth.service.account.enable true

spark.hadoop.fs.gs.auth.service.account.email <client_email>

spark.hadoop.fs.gs.project.id <project_id>

spark.hadoop.fs.gs.auth.service.account.private.key <private_key>

spark.hadoop.fs.gs.auth.service.account.private.key.id <private_key_id>

what should it replace instead of <base64-keys> ? the google service account key (json) ? if yes what part of it ?

-werners-
Esteemed Contributor III

the base64-keys is generated from the json key file:

To configure a cluster to access BigQuery tables, you must provide your JSON key file as a Spark configuration. Use a local tool to Base64-encode your JSON key file. For security purposes do not use a web-based or remote tool that could access your keys.

The JSON key file is created right above the following section:

https://docs.databricks.com/external-data/bigquery.html#create-a-google-cloud-storage-gcs-bucket-for...

519776
New Contributor III

So basically it should look like this :

credentials <adfasdfsadfadsfsdafsd>

spark.hadoop.google.cloud.auth.service.account.enable true

spark.hadoop.fs.gs.auth.service.account.email <user@service.com>

spark.hadoop.fs.gs.project.id <project-dd>

spark.hadoop.fs.gs.auth.service.account.private.key <fdsfsdfsdgfd>

spark.hadoop.fs.gs.auth.service.account.private.key.id <gsdfgsdgdsg>

? Do I need to add "" ?

-werners-
Esteemed Contributor III

without the pointy brackets. they are placeholders for values.

so unless you want to enter a variable which you already declared (like credentials in your example), put the double quotes.

519776
New Contributor III

Thanks werners.

it now working, when I'm runnning the below script:

df = spark.read.format("bigquery").option("table","sandbox.test").load()

im getting the below error:

519776
New Contributor III

image.pngimage.png

-werners-
Esteemed Contributor III

are you sure the path to the table is correct?

the example is a bit different:

"bigquery-public-data.samples.shakespeare"

<catalog>.<db>.<table>

519776
New Contributor III

I also changed the path to "test_proj.sandbox.test".

the error is :

A project ID is required for this service but could not be determined from the builder or the environment. Please set a project ID using the builder.

-werners-
Esteemed Contributor III

I guess something still has to be configured on BigQuery.

can you check this thread?

https://github.com/GoogleCloudDataproc/spark-bigquery-connector/issues/40

519776
New Contributor III

Works 🙂

Thanks werners, many thanks .

308655
New Contributor II

Thank you. For me, setting parent project ID solved it. This is also in the documentation

spark.read.format("bigquery") \

.option("table", table) \

.option("project", <project-id>) \

.option("parentProject", <parent-project-id>) \

.load()

I didn't have to set the various spark.hadoop.fs.gs config variables for the cluster, as it seemed content with the base64 credentials.

mcwir
Contributor

519776
New Contributor III

I familar with this doc, it is not clear (please find my previous comment)

Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!