<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Exporting table to GCS bucket using job in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/exporting-table-to-gcs-bucket-using-job/m-p/117022#M45413</link>
<description>&lt;P&gt;Hi all,&lt;BR /&gt;&lt;BR /&gt;Use case: I want to write the result of a query to a GCS bucket location in JSON format.&lt;/P&gt;&lt;P&gt;Approach: From my Java-based application I create a job, and that job runs a notebook. The notebook contains something like this:&lt;/P&gt;&lt;PRE&gt;query = "SELECT * FROM table"
df = spark.sql(query)
gcs_path = "gs://&amp;lt;bucket&amp;gt;/path/"
df.write.option("maxRecordsPerFile", 100).mode("overwrite").json(gcs_path)&lt;/PRE&gt;&lt;P&gt;I am able to grant access to my GCS bucket using a service account JSON key that has access to it. But for my use case I can't provide the service account key to the Databricks account; I am, however, okay with exposing an access token created from that service account.&lt;BR /&gt;&lt;BR /&gt;I tried something like:&lt;/P&gt;&lt;PRE&gt;spark.conf.set("spark.hadoop.fs.gs.auth.type", "OAuth")
spark.conf.set("spark.hadoop.fs.gs.auth.access.token", access_token)&lt;/PRE&gt;&lt;P&gt;which didn't have any effect. I am getting the error below in my notebook:&lt;BR /&gt;Py4JJavaError: An error occurred while calling o476.json. : java.io.IOException: Error getting access token from metadata server at:&lt;/P&gt;&lt;P&gt;Kind of stuck on this. Any help would be appreciated.&lt;BR /&gt;Thanks,&lt;BR /&gt;Aswin&lt;/P&gt;</description>
    <pubDate>Tue, 29 Apr 2025 17:39:18 GMT</pubDate>
    <dc:creator>aswinvishnu</dc:creator>
    <dc:date>2025-04-29T17:39:18Z</dc:date>
    <item>
      <title>Exporting table to GCS bucket using job</title>
      <link>https://community.databricks.com/t5/data-engineering/exporting-table-to-gcs-bucket-using-job/m-p/117022#M45413</link>
<description>&lt;P&gt;Hi all,&lt;BR /&gt;&lt;BR /&gt;Use case: I want to write the result of a query to a GCS bucket location in JSON format.&lt;/P&gt;&lt;P&gt;Approach: From my Java-based application I create a job, and that job runs a notebook. The notebook contains something like this:&lt;/P&gt;&lt;PRE&gt;query = "SELECT * FROM table"
df = spark.sql(query)
gcs_path = "gs://&amp;lt;bucket&amp;gt;/path/"
df.write.option("maxRecordsPerFile", 100).mode("overwrite").json(gcs_path)&lt;/PRE&gt;&lt;P&gt;I am able to grant access to my GCS bucket using a service account JSON key that has access to it. But for my use case I can't provide the service account key to the Databricks account; I am, however, okay with exposing an access token created from that service account.&lt;BR /&gt;&lt;BR /&gt;I tried something like:&lt;/P&gt;&lt;PRE&gt;spark.conf.set("spark.hadoop.fs.gs.auth.type", "OAuth")
spark.conf.set("spark.hadoop.fs.gs.auth.access.token", access_token)&lt;/PRE&gt;&lt;P&gt;which didn't have any effect. I am getting the error below in my notebook:&lt;BR /&gt;Py4JJavaError: An error occurred while calling o476.json. : java.io.IOException: Error getting access token from metadata server at:&lt;/P&gt;&lt;P&gt;Kind of stuck on this. Any help would be appreciated.&lt;BR /&gt;Thanks,&lt;BR /&gt;Aswin&lt;/P&gt;</description>
      <pubDate>Tue, 29 Apr 2025 17:39:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/exporting-table-to-gcs-bucket-using-job/m-p/117022#M45413</guid>
      <dc:creator>aswinvishnu</dc:creator>
      <dc:date>2025-04-29T17:39:18Z</dc:date>
    </item>
    <item>
      <title>Re: Exporting table to GCS bucket using job</title>
      <link>https://community.databricks.com/t5/data-engineering/exporting-table-to-gcs-bucket-using-job/m-p/117067#M45418</link>
<description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/162656"&gt;@aswinvishnu&lt;/a&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;GCS support in Spark goes through the Hadoop GCS connector, which has specific limitations; using a raw OAuth access token instead of a service account key file is tricky, especially on Databricks. You’re trying to use access-token-based authentication, but the GCS Hadoop connector (used under the hood by Spark) typically expects either:&lt;OL&gt;&lt;LI&gt;a service account key file (the standard approach), or&lt;/LI&gt;&lt;LI&gt;Application Default Credentials (ADC) from the environment/metadata server, as in GCP-native services like GKE or Dataproc.&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Databricks is not natively GCP, so it doesn't have access to the GCP metadata server, hence the error "Error getting access token from metadata server".&lt;/P&gt;&lt;P&gt;Use spark.hadoop.fs.gs.auth.type=ACCESS_TOKEN (not "OAuth"). If you want to use an access token instead of a key file, change your auth type:&lt;/P&gt;&lt;PRE&gt;spark.conf.set("spark.hadoop.fs.gs.auth.type", "ACCESS_TOKEN")
spark.conf.set("spark.hadoop.fs.gs.auth.access.token", access_token)&lt;/PRE&gt;&lt;P&gt;This is the config for passing a bearer token manually (OAuth is for interactive user flows; ACCESS_TOKEN is for static token use like this). However, even this may not work reliably unless you're on a recent enough GCS connector (&amp;gt;= 2.2.0); Databricks may bundle older or customized versions.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Apr 2025 02:48:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/exporting-table-to-gcs-bucket-using-job/m-p/117067#M45418</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-04-30T02:48:13Z</dc:date>
    </item>
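The exchange above can be put together as one notebook cell. A minimal sketch, with assumptions flagged: the `ACCESS_TOKEN` auth type must be supported by the Runtime's bundled GCS connector, the token is minted from the service account outside Databricks (e.g. via `gcloud auth print-access-token`), and the `gcs_access_token` job parameter name is hypothetical:

```python
# Hedged sketch, not a verified recipe: pass a short-lived OAuth2 token
# (minted from the service account outside Databricks) into the job, then
# point the GCS Hadoop connector at it instead of a key file or the
# GCP metadata server (which is absent on Databricks).
access_token = dbutils.widgets.get("gcs_access_token")  # hypothetical parameter name

spark.conf.set("spark.hadoop.fs.gs.auth.type", "ACCESS_TOKEN")  # value per the reply above
spark.conf.set("spark.hadoop.fs.gs.auth.access.token", access_token)

df = spark.sql("SELECT * FROM table")
(df.write
   .option("maxRecordsPerFile", 100)
   .mode("overwrite")
   .json("gs://<bucket>/path/"))
```

Note that such tokens are short-lived (typically one hour), so a long-running job needs a token that outlives the write, or a refresh mechanism.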
    <item>
      <title>Re: Exporting table to GCS bucket using job</title>
      <link>https://community.databricks.com/t5/data-engineering/exporting-table-to-gcs-bucket-using-job/m-p/117461#M45503</link>
<description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24053"&gt;@lingareddy_Alva&lt;/a&gt;,&lt;BR /&gt;Thanks for the reply. I tried the 'ACCESS_TOKEN' auth type too, but it didn't make any difference.&lt;/P&gt;</description>
      <pubDate>Fri, 02 May 2025 03:13:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/exporting-table-to-gcs-bucket-using-job/m-p/117461#M45503</guid>
      <dc:creator>aswinvishnu</dc:creator>
      <dc:date>2025-05-02T03:13:24Z</dc:date>
    </item>
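One hedged possibility for why neither auth type changed anything: Hadoop caches `FileSystem` instances per URI scheme, so `fs.gs.*` options set through `spark.conf.set` after the first `gs://` access may never be re-read. A sketch of writing the options directly into the live Hadoop configuration and disabling the cache for the `gs` scheme; the `ACCESS_TOKEN` value is taken from the earlier reply, and on some connector versions the supported mechanism is an `AccessTokenProvider` class instead:

```python
# Hedged sketch: bypass the session conf and set the options on the live
# Hadoop configuration, before any gs:// path is touched in this notebook.
hconf = spark.sparkContext._jsc.hadoopConfiguration()
hconf.set("fs.gs.auth.type", "ACCESS_TOKEN")        # value per the reply above
hconf.set("fs.gs.auth.access.token", access_token)  # token minted from the service account
# Force a fresh FileSystem object so the new auth settings are actually read,
# rather than reusing a cached instance created with the old settings.
hconf.set("fs.gs.impl.disable.cache", "true")
```

If this still fails, setting the same `spark.hadoop.fs.gs.*` keys in the cluster's Spark config, so they exist before the first filesystem access, is another avenue to try.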
    <item>
      <title>Re: Exporting table to GCS bucket using job</title>
      <link>https://community.databricks.com/t5/data-engineering/exporting-table-to-gcs-bucket-using-job/m-p/117821#M45574</link>
      <description>&lt;P&gt;Consider using GCS signed URLs or access tokens for secure access.&lt;/P&gt;</description>
      <pubDate>Tue, 06 May 2025 08:23:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/exporting-table-to-gcs-bucket-using-job/m-p/117821#M45574</guid>
      <dc:creator>LorelaiSpence</dc:creator>
      <dc:date>2025-05-06T08:23:13Z</dc:date>
    </item>
  </channel>
</rss>

