12-01-2022 01:42 AM
Hi,
I would like to connect our BigQuery environment to Databricks, so I created a service account. But where should I configure the service account in Databricks? I read the Databricks documentation and it's not clear at all.
Thanks for your help
12-01-2022 03:17 AM
https://docs.databricks.com/external-data/bigquery.html
Can you elaborate on what is not clear?
12-01-2022 03:21 AM
Yeah, part number 2 (Set up Databricks) has the below code:
credentials <base64-keys>
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.auth.service.account.email <client_email>
spark.hadoop.fs.gs.project.id <project_id>
spark.hadoop.fs.gs.auth.service.account.private.key <private_key>
spark.hadoop.fs.gs.auth.service.account.private.key.id <private_key_id>
What should replace <base64-keys>? The Google service account key (JSON)? If yes, which part of it?
12-01-2022 03:24 AM
The base64-keys value is generated from the JSON key file:
To configure a cluster to access BigQuery tables, you must provide your JSON key file as a Spark configuration. Use a local tool to Base64-encode your JSON key file. For security purposes, do not use a web-based or remote tool that could access your keys.
In the docs, the JSON key file is created right above that section.
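As an aside, a minimal sketch of Base64-encoding the key locally in Python (the file path below is a hypothetical placeholder):
import base64

# hypothetical path to the service-account key file downloaded from GCP
with open("/path/to/key.json", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

print(encoded)  # paste this value as the `credentials` Spark config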
12-01-2022 03:35 AM
So basically it should look like this:
credentials <adfasdfsadfadsfsdafsd>
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.auth.service.account.email <user@service.com>
spark.hadoop.fs.gs.project.id <project-dd>
spark.hadoop.fs.gs.auth.service.account.private.key <fdsfsdfsdgfd>
spark.hadoop.fs.gs.auth.service.account.private.key.id <gsdfgsdgdsg>
Is that right? Do I need to add double quotes ("")?
12-01-2022 03:46 AM
Without the pointy brackets; they are placeholders for values.
So unless you are entering a variable you already declared (like credentials in your example), put the value in double quotes.
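For illustration, a sketch of the filled-in config following that advice; every value below is invented (fake email, project ID, and keys), so substitute the fields from your own JSON key file:
# all values below are fake placeholders
credentials ZmFrZQ==
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.auth.service.account.email "my-sa@my-project.iam.gserviceaccount.com"
spark.hadoop.fs.gs.project.id "my-project"
spark.hadoop.fs.gs.auth.service.account.private.key "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
spark.hadoop.fs.gs.auth.service.account.private.key.id "0123456789abcdef"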
12-01-2022 04:17 AM
Thanks werners.
It's working now. When I'm running the below script:
df = spark.read.format("bigquery").option("table","sandbox.test").load()
I'm getting the below error:
12-01-2022 04:18 AM
(screenshot of the error)
12-01-2022 04:25 AM
Are you sure the path to the table is correct?
The example is a bit different:
"bigquery-public-data.samples.shakespeare"
<catalog>.<db>.<table>
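For example, a minimal sketch using that public table (assumes a Databricks notebook where spark is already defined):
# fully qualified name: <catalog>.<db>.<table>
df = (spark.read.format("bigquery")
      .option("table", "bigquery-public-data.samples.shakespeare")
      .load())
df.show(5)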
12-01-2022 04:33 AM
I also changed the path to "test_proj.sandbox.test".
The error is:
A project ID is required for this service but could not be determined from the builder or the environment. Please set a project ID using the builder.
12-01-2022 04:38 AM
I guess something still has to be configured on BigQuery.
Can you check this thread?
https://github.com/GoogleCloudDataproc/spark-bigquery-connector/issues/40
12-01-2022 04:43 AM
Works 🙂
Thanks werners, many thanks.
02-02-2023 07:53 PM
Thank you. For me, setting the parent project ID solved it. This is also in the documentation:
spark.read.format("bigquery") \
.option("table", table) \
.option("project", <project-id>) \
.option("parentProject", <parent-project-id>) \
.load()
I didn't have to set the various spark.hadoop.fs.gs config variables for the cluster, as it seemed content with the base64 credentials.
12-01-2022 03:23 AM
I'm familiar with this doc; it is not clear (please see my previous comment).

