<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Cannot read data from GCS in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/can-not-read-data-from-gcs/m-p/46394#M5911</link>
    <description>&lt;P&gt;I am trying to use Databricks to read data on Google Cloud Storage (GCS) with &lt;A href="https://cloud.google.com/databricks" target="_self"&gt;Databricks on Google Cloud&lt;/A&gt;. I followed the steps in &lt;A href="https://docs.gcp.databricks.com/storage/gcs.html" target="_self"&gt;https://docs.gcp.databricks.com/storage/gcs.html&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;First I tried &lt;A href="https://docs.gcp.databricks.com/storage/gcs.html#access-gcs-buckets-using-google-cloud-service-accounts-on-clusters" target="_self"&gt;Access GCS buckets using Google Cloud service accounts on clusters&lt;/A&gt;, but I still couldn't read the file on GCS with the code below:&lt;/P&gt;&lt;PRE&gt;from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("test").getOrCreate()
df = spark.read.format("csv").load("gs://mybucket/test.csv")&lt;/PRE&gt;&lt;P&gt;This is the error message I got:&lt;/P&gt;&lt;PRE&gt;xxx@xxx.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).&lt;/PRE&gt;&lt;P&gt;I also tried &lt;A href="https://docs.gcp.databricks.com/storage/gcs.html#access-a-gcs-bucket-directly-with-a-google-cloud-service-account-key" target="_self"&gt;Access a GCS bucket directly with a Google Cloud service account key&lt;/A&gt;, but I am stuck on Steps 4 and 5.
Since step 5 uses `&lt;SPAN class=""&gt;&lt;STRONG&gt;{{secrets/scope/gsa_private_key}}&lt;/STRONG&gt;` and `&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN class=""&gt;{{secrets/scope/gsa_private_key_id}}&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN class=""&gt;` to get thegsa_private_key and&amp;nbsp;gsa_private_key_id. I am not quite sure where should I do the step 4? I think it doesn't make to do it on local computer, however, it is also weird to do it on the cluster terminal.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Please help me solve this problem. Thanks in advance!&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 27 Sep 2023 20:02:38 GMT</pubDate>
    <dc:creator>shihs</dc:creator>
    <dc:date>2023-09-27T20:02:38Z</dc:date>
    <item>
      <title>Cannot read data from GCS</title>
      <link>https://community.databricks.com/t5/get-started-discussions/can-not-read-data-from-gcs/m-p/46394#M5911</link>
      <description>&lt;P&gt;I am trying to use Databricks to read data on Google Cloud Storage (GCS) with &lt;A href="https://cloud.google.com/databricks" target="_self"&gt;Databricks on Google Cloud&lt;/A&gt;. I followed the steps in &lt;A href="https://docs.gcp.databricks.com/storage/gcs.html" target="_self"&gt;https://docs.gcp.databricks.com/storage/gcs.html&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;First I tried &lt;A href="https://docs.gcp.databricks.com/storage/gcs.html#access-gcs-buckets-using-google-cloud-service-accounts-on-clusters" target="_self"&gt;Access GCS buckets using Google Cloud service accounts on clusters&lt;/A&gt;, but I still couldn't read the file on GCS with the code below:&lt;/P&gt;&lt;PRE&gt;from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("test").getOrCreate()
df = spark.read.format("csv").load("gs://mybucket/test.csv")&lt;/PRE&gt;&lt;P&gt;This is the error message I got:&lt;/P&gt;&lt;PRE&gt;xxx@xxx.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).&lt;/PRE&gt;&lt;P&gt;I also tried &lt;A href="https://docs.gcp.databricks.com/storage/gcs.html#access-a-gcs-bucket-directly-with-a-google-cloud-service-account-key" target="_self"&gt;Access a GCS bucket directly with a Google Cloud service account key&lt;/A&gt;, but I am stuck on Steps 4 and 5.
Since step 5 uses `&lt;SPAN class=""&gt;&lt;STRONG&gt;{{secrets/scope/gsa_private_key}}&lt;/STRONG&gt;` and `&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN class=""&gt;{{secrets/scope/gsa_private_key_id}}&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN class=""&gt;` to get thegsa_private_key and&amp;nbsp;gsa_private_key_id. I am not quite sure where should I do the step 4? I think it doesn't make to do it on local computer, however, it is also weird to do it on the cluster terminal.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Please help me solve this problem. Thanks in advance!&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 27 Sep 2023 20:02:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/can-not-read-data-from-gcs/m-p/46394#M5911</guid>
      <dc:creator>shihs</dc:creator>
      <dc:date>2023-09-27T20:02:38Z</dc:date>
    </item>
  </channel>
</rss>

