12-01-2022 01:42 AM
Hi,
I would like to connect our BigQuery environment to Databricks, so I created a service account, but where should I configure the service account in Databricks? I read the Databricks documentation and it's not clear at all.
Thanks for your help
12-01-2022 03:17 AM
https://docs.databricks.com/external-data/bigquery.html
Can you elaborate on what is not clear?
12-01-2022 03:21 AM
Yeah, in step 2 ("Set up Databricks") there is the below config:
credentials <base64-keys>
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.auth.service.account.email <client_email>
spark.hadoop.fs.gs.project.id <project_id>
spark.hadoop.fs.gs.auth.service.account.private.key <private_key>
spark.hadoop.fs.gs.auth.service.account.private.key.id <private_key_id>
What should I put instead of <base64-keys>? The Google service account key (JSON)? If yes, which part of it?
12-01-2022 03:24 AM
The <base64-keys> value is generated from the JSON key file:
To configure a cluster to access BigQuery tables, you must provide your JSON key file as a Spark configuration. Use a local tool to Base64-encode your JSON key file. For security purposes do not use a web-based or remote tool that could access your keys.
The JSON key file itself is created in the docs section right above that one.
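For example, a small local Python sketch of the encoding step ("key.json" is a placeholder path for your downloaded key file):

import base64

# Read the service-account JSON key and Base64-encode it locally,
# per the docs' advice to avoid web-based encoding tools.
with open("key.json", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

print(encoded)  # paste this value as the `credentials` Spark config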
12-01-2022 03:35 AM
So basically it should look like this:
credentials <adfasdfsadfadsfsdafsd>
spark.hadoop.google.cloud.auth.service.account.enable true
spark.hadoop.fs.gs.auth.service.account.email <user@service.com>
spark.hadoop.fs.gs.project.id <project-dd>
spark.hadoop.fs.gs.auth.service.account.private.key <fdsfsdfsdgfd>
spark.hadoop.fs.gs.auth.service.account.private.key.id <gsdfgsdgdsg>
? And do I need to add double quotes ("")?
12-01-2022 03:46 AM
Without the pointy brackets; they are placeholders for values.
And unless you want to enter a variable which you already declared (like credentials in your example), put the double quotes.
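For illustration, a minimal notebook sketch of that quoting rule (all values are made up; setting the connector's credentials key via spark.conf.set is per the BigQuery connector's docs):

# Made-up values -- no angle brackets around real values.
credentials = "ewogIC4uLgp9"  # Base64-encoded key, declared earlier

spark.conf.set("credentials", credentials)                     # declared variable, passed bare
spark.conf.set("spark.hadoop.fs.gs.project.id", "my-project")  # literal string, in quotes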
12-01-2022 04:17 AM
Thanks werners.
It's working now, but when I run the below script:
df = spark.read.format("bigquery").option("table","sandbox.test").load()
I'm getting the below error:
12-01-2022 04:18 AM
[screenshot of the error; the message is quoted further down]
12-01-2022 04:25 AM
Are you sure the path to the table is correct?
The example is a bit different:
"bigquery-public-data.samples.shakespeare"
i.e. <project>.<dataset>.<table> in BigQuery terms
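That public-data example from the docs, assuming the cluster credentials are already in place, would read like this:

df = spark.read.format("bigquery") \
    .option("table", "bigquery-public-data.samples.shakespeare") \
    .load()
df.show(5)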
12-01-2022 04:33 AM
I also changed the path to "test_proj.sandbox.test".
The error is:
A project ID is required for this service but could not be determined from the builder or the environment. Please set a project ID using the builder.
12-01-2022 04:38 AM
I guess something still has to be configured on the BigQuery side.
Can you check this thread?
https://github.com/GoogleCloudDataproc/spark-bigquery-connector/issues/40
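One fix discussed in that issue (and confirmed in a later reply below) is to pass the billing project explicitly via the parentProject option. A sketch reusing your table path from above:

# "parentProject" is the GCP project billed for the read.
df = spark.read.format("bigquery") \
    .option("parentProject", "test_proj") \
    .option("table", "test_proj.sandbox.test") \
    .load()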
12-01-2022 04:43 AM
Works 🙂
Thanks werners, many thanks.
02-02-2023 07:53 PM
Thank you. For me, setting the parent project ID solved it. This is also in the documentation:
spark.read.format("bigquery") \
    .option("table", table) \
    .option("project", "<project-id>") \
    .option("parentProject", "<parent-project-id>") \
    .load()
I didn't have to set the various spark.hadoop.fs.gs config variables for the cluster, as it seemed content with the Base64 credentials.
12-01-2022 03:23 AM
I'm familiar with this doc; it is not clear (please see my previous comment).