
Cannot read data from GCS

shihs
New Contributor

I am trying to use Databricks to read data on Google Cloud Storage (GCS) with Databricks on Google Cloud. I followed the steps from https://docs.gcp.databricks.com/storage/gcs.html.

I tried the "Access GCS buckets using Google Cloud service accounts on clusters" approach, but I still can't read the file on GCS with the code below:

 

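```
from pyspark.sql import SparkSession

# In a Databricks notebook a session already exists; getOrCreate() returns it.
spark = SparkSession.builder.appName("test").getOrCreate()

# Read the CSV object from the GCS bucket.
df = spark.read.format("csv").load("gs://mybucket/test.csv")
```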

The error message I got:
```
"xxx@xxx.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).",
```
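The message names both the account that was denied and the permission it lacks. A small sketch like the one below pulls those two values out, so they can be compared against the service account attached to the cluster. `parse_gcs_denial` is a hypothetical helper written for this post, not part of any Databricks or Google library, and the pattern only assumes the wording shown in the error above.

```python
import re

# Hypothetical helper: extract the denied principal and permission from a
# GCS "does not have ... access" error message like the one above.
_DENIED = re.compile(
    r"(?P<principal>[\w.\-]+@[\w.\-]+)\s+does not have\s+"
    r"(?P<permission>[\w.]+)\s+access"
)


def parse_gcs_denial(message):
    """Return (principal, permission), or None if the message does not match."""
    match = _DENIED.search(message)
    if match is None:
        return None
    return match.group("principal"), match.group("permission")


msg = (
    '"xxx@xxx.iam.gserviceaccount.com does not have storage.objects.get '
    'access to the Google Cloud Storage object."'
)
print(parse_gcs_denial(msg))
```

If the principal printed here is not the service account configured on the cluster, the cluster is probably running as a different account than expected. If it is the same account, the remaining possibilities named by the message are a missing permission on the bucket or a wrong object path.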

I also tried the "Access a GCS bucket directly with a Google Cloud service account key" approach, but I got stuck on steps 4 and 5. Step 5 uses `{{secrets/scope/gsa_private_key}}` and `{{secrets/scope/gsa_private_key_id}}` to get the gsa_private_key and gsa_private_key_id, so the secrets from step 4 must already exist, but I am not sure where step 4 should be run. It doesn't seem to make sense to run it on my local computer, but running it in the cluster terminal also seems odd.
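To make the step 5 configuration concrete, here is a minimal sketch that builds the Spark config entries with the secret references filled in. The property names are my reading of the linked docs page and should be checked against it. `gcs_key_spark_conf` and its parameters are hypothetical names used only for illustration. The `{{secrets/...}}` values are references that Databricks resolves when the cluster starts, not the key material itself.

```python
def gcs_key_spark_conf(scope, client_email, project_id):
    """Build cluster Spark config entries that reference secrets by name.

    Hypothetical helper; the property names are assumed from the linked
    Databricks GCS docs and should be verified there.
    """
    prefix = "{{secrets/" + scope + "/"
    return {
        "spark.hadoop.google.cloud.auth.service.account.enable": "true",
        "spark.hadoop.fs.gs.auth.service.account.email": client_email,
        "spark.hadoop.fs.gs.project.id": project_id,
        "spark.hadoop.fs.gs.auth.service.account.private.key":
            prefix + "gsa_private_key}}",
        "spark.hadoop.fs.gs.auth.service.account.private.key.id":
            prefix + "gsa_private_key_id}}",
    }


# Placeholder values; replace with the real scope, email, and project ID.
conf = gcs_key_spark_conf("scope", "xxx@xxx.iam.gserviceaccount.com", "my-project")
for key, value in conf.items():
    # One "key value" pair per line, as entered in the cluster's Spark config.
    print(key, value)
```

The scope and key names passed here have to match whatever was created in step 4, otherwise the references will not resolve.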

Please help me solve this problem. Thanks in advance! 

 

 
