12-01-2022 01:42 AM
Hi,
I would like to connect our BigQuery environment to Databricks, so I created a service account, but where should I configure the service account in Databricks? I read the Databricks documentation and it's not clear at all.
Thanks for your help
12-01-2022 03:17 AM
https://docs.databricks.com/external-data/bigquery.html
Can you elaborate on what is not clear?
12-01-2022 03:21 AM
Yeah, in step 2 ("Set up Databricks") there is the below config:
credentials <base64-keys>
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.auth.service.account.email <client_email>
spark.hadoop.fs.gs.project.id <project_id>
spark.hadoop.fs.gs.auth.service.account.private.key <private_key>
spark.hadoop.fs.gs.auth.service.account.private.key.id <private_key_id>
What should I put instead of <base64-keys>? The Google service account key (JSON)? If yes, which part of it?
12-01-2022 03:24 AM
The <base64-keys> value is generated from the JSON key file:
To configure a cluster to access BigQuery tables, you must provide your JSON key file as a Spark configuration. Use a local tool to Base64-encode your JSON key file. For security purposes do not use a web-based or remote tool that could access your keys.
The JSON key file itself is created in the docs section right above that one.
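For example, a small local Python sketch of the encoding step ("key.json" is a placeholder path for your downloaded key file):

import base64

# Read the service-account JSON key and Base64-encode it locally,
# per the docs' advice to avoid web-based encoding tools.
with open("key.json", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

print(encoded)  # paste this value as the `credentials` Spark config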
12-01-2022 03:35 AM
So basically it should look like this:
credentials <adfasdfsadfadsfsdafsd>
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.auth.service.account.email <user@service.com>
spark.hadoop.fs.gs.project.id <project-dd>
spark.hadoop.fs.gs.auth.service.account.private.key <fdsfsdfsdgfd>
spark.hadoop.fs.gs.auth.service.account.private.key.id <gsdfgsdgdsg>
? And do I need to add double quotes ("")?
12-01-2022 03:46 AM
Without the pointy brackets; they are placeholders for values.
And unless you want to enter a variable which you already declared (like credentials in your example), put the double quotes.
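For illustration, a minimal notebook sketch of that quoting rule (all values are made up; setting the connector's credentials key via spark.conf.set is per the BigQuery connector's docs):

# Made-up values -- no angle brackets around real values.
credentials = "ewogIC4uLgp9"  # Base64-encoded key, declared earlier

spark.conf.set("credentials", credentials)                     # declared variable, passed bare
spark.conf.set("spark.hadoop.fs.gs.project.id", "my-project")  # literal string, in quotes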
12-01-2022 04:17 AM
Thanks werners.
It's working now, but when I run the below script:
df = spark.read.format("bigquery").option("table","sandbox.test").load()
I'm getting the below error:
12-01-2022 04:18 AM
[screenshot of the error; the message is quoted further down]
12-01-2022 04:25 AM
Are you sure the path to the table is correct?
The example is a bit different:
"bigquery-public-data.samples.shakespeare"
i.e. <project>.<dataset>.<table> in BigQuery terms
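That public-data example from the docs, assuming the cluster credentials are already in place, would read like this:

df = spark.read.format("bigquery") \
    .option("table", "bigquery-public-data.samples.shakespeare") \
    .load()
df.show(5)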
12-01-2022 04:33 AM
I also changed the path to "test_proj.sandbox.test".
The error is:
A project ID is required for this service but could not be determined from the builder or the environment. Please set a project ID using the builder.
12-01-2022 04:38 AM
I guess something still has to be configured on the BigQuery side.
Can you check this thread?
https://github.com/GoogleCloudDataproc/spark-bigquery-connector/issues/40
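One fix discussed in that issue (and confirmed in a later reply below) is to pass the billing project explicitly via the parentProject option. A sketch reusing your table path from above:

# "parentProject" is the GCP project billed for the read.
df = spark.read.format("bigquery") \
    .option("parentProject", "test_proj") \
    .option("table", "test_proj.sandbox.test") \
    .load()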
12-01-2022 04:43 AM
Works 🙂
Thanks werners, many thanks.
02-02-2023 07:53 PM
Thank you. For me, setting the parent project ID solved it. This is also in the documentation:
spark.read.format("bigquery") \
    .option("table", table) \
    .option("project", "<project-id>") \
    .option("parentProject", "<parent-project-id>") \
    .load()
I didn't have to set the various spark.hadoop.fs.gs config variables for the cluster, as it seemed content with the Base64 credentials.
12-01-2022 03:23 AM
I'm familiar with this doc; it is not clear (please see my previous comment).