<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Ingest files from GCS with Auto Loader in DLT pipeline running on AWS in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/ingest-files-from-gcs-with-auto-loader-in-dlt-pipeline-running/m-p/137749#M50805</link>
    <description>&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Your error is due to how Databricks on AWS is trying to access GCS: it's defaulting to using the GCP metadata server (which only exists on Google Cloud VMs), not the service account key you provided. This is a common issue when connecting GCS from non-GCP platforms like Databricks on AWS.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Why This Happens&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;By default, the Databricks GCS connector attempts to use the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="reset interactable cursor-pointer decoration-1 underline-offset-1 text-super hover:underline font-semibold" href="http://169.254.169.254/" target="_blank" rel="nofollow noopener"&gt;&lt;SPAN class="text-box-trim-both"&gt;GCE metadata server&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;to obtain authentication if not configured otherwise.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;On AWS, this metadata server doesn't exist (404 error you see), so authentication fails.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Merely supplying the credentials to Auto Loader's&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.option()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;dictionary is often not enough—the Hadoop GCS connector expects them as Spark Hadoop configuration properties.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Solution: Configure GCS Credentials in Spark/Hadoop&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;You need to set your GCS credentials in the Hadoop configuration, not just as&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;cloudFiles.*&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;options.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Steps to Fix:&lt;/H2&gt;
&lt;OL class="marker:text-quiet list-decimal"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Set Hadoop Options Globally&lt;/STRONG&gt;&lt;BR /&gt;Before you read from GCS, add this in your notebook or cluster initialization:&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;spark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;conf&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;set&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"spark.hadoop.google.cloud.auth.service.account.enable"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"true"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
spark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;conf&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;set&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"spark.hadoop.google.cloud.auth.service.account.json.keyfile"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"/dbfs/tmp/gcs-key.json"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Dump your key JSON to a file (ephemeral ok for testing):&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;&lt;SPAN class="token token"&gt;with&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;open&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"/dbfs/tmp/gcs-key.json"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"w"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;as&lt;/SPAN&gt; f&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
    f&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;write&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;json&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;dumps&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;gcs_key&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Remove Auth Info from .options()&lt;/STRONG&gt;&lt;BR /&gt;You don't need to pass individual credential fields as&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;cloudFiles.*&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;options. Only use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;cloudFiles.format&lt;/CODE&gt;, etc., there. The keyfile method is preferred for Hadoop.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Restart Your Cluster if Needed&lt;/STRONG&gt;&lt;BR /&gt;Hadoop configs may require a restart if set at cluster level.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Double Check Permissions&lt;/STRONG&gt;&lt;BR /&gt;Ensure your GCP service account actually has&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;Storage Object Viewer&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;(and perhaps&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;Storage Object Admin&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for writing) on the target bucket.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Sample Minimal Correct Setup&lt;/H2&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;&lt;SPAN class="token token"&gt;import&lt;/SPAN&gt; json

&lt;SPAN class="token token"&gt;# Write GCP key to DBFS&lt;/SPAN&gt;
&lt;SPAN class="token token"&gt;with&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;open&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"/dbfs/tmp/gcs-key.json"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"w"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;as&lt;/SPAN&gt; f&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
    f&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;write&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;json&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;dumps&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;gcs_key&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;

spark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;conf&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;set&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"spark.hadoop.google.cloud.auth.service.account.enable"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"true"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
spark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;conf&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;set&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"spark.hadoop.google.cloud.auth.service.account.json.keyfile"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"/dbfs/tmp/gcs-key.json"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;

cloudfile &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="token token punctuation"&gt;{&lt;/SPAN&gt;
    &lt;SPAN class="token token"&gt;"cloudFiles.format"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"csv"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt;
    &lt;SPAN class="token token"&gt;"cloudFiles.inferColumnTypes"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"true"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt;
    &lt;SPAN class="token token"&gt;"cloudFiles.allowOverwrites"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"true"&lt;/SPAN&gt;
&lt;SPAN class="token token punctuation"&gt;}&lt;/SPAN&gt;

&lt;SPAN class="token token decorator annotation punctuation"&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97035"&gt;@Dlt&lt;/a&gt;&lt;/SPAN&gt;&lt;SPAN class="token token decorator annotation punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token decorator annotation punctuation"&gt;view&lt;/SPAN&gt;
&lt;SPAN class="token token"&gt;def&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;consumo_das_contas_raw&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
    &lt;SPAN class="token token"&gt;return&lt;/SPAN&gt; &lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;
        spark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;readStream
        &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;format&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;'cloudFiles'&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
        &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;options&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token operator"&gt;**&lt;/SPAN&gt;cloudfile&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
        &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;load&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token string-interpolation"&gt;f'gs://&lt;/SPAN&gt;&lt;SPAN class="token token string-interpolation interpolation punctuation"&gt;{&lt;/SPAN&gt;&lt;SPAN class="token token string-interpolation interpolation"&gt;gcs_bucket_name&lt;/SPAN&gt;&lt;SPAN class="token token string-interpolation interpolation punctuation"&gt;}&lt;/SPAN&gt;&lt;SPAN class="token token string-interpolation"&gt;/cost-reports/daily/*/billing_report_*.csv'&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
    &lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;This setup should allow Databricks on AWS to read files from GCS using your provided service account key, sidestepping the metadata API.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;&amp;nbsp;&lt;/H2&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;References&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Databricks docs:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="reset interactable cursor-pointer decoration-1 underline-offset-1 text-super hover:underline font-semibold" href="https://docs.databricks.com/en/storage/google-cloud-storage.html" target="_blank" rel="nofollow noopener"&gt;&lt;SPAN class="text-box-trim-both"&gt;Accessing Google Cloud Storage from Databricks on AWS&lt;/SPAN&gt;&lt;/A&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Google docs:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="reset interactable cursor-pointer decoration-1 underline-offset-1 text-super hover:underline font-semibold" href="https://cloud.google.com/storage/docs/access-control/iam-roles" target="_blank" rel="nofollow noopener"&gt;&lt;SPAN class="text-box-trim-both"&gt;GCS service account permissions&lt;/SPAN&gt;&lt;/A&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;If you need to do this securely, consider using a secrets manager to inject paths/keys, and clean up the temp file after use.&lt;/P&gt;</description>
    <pubDate>Wed, 05 Nov 2025 12:40:27 GMT</pubDate>
    <dc:creator>mark_ott</dc:creator>
    <dc:date>2025-11-05T12:40:27Z</dc:date>
    <item>
      <title>Ingest files from GCS with Auto Loader in DLT pipeline running on AWS</title>
      <link>https://community.databricks.com/t5/data-engineering/ingest-files-from-gcs-with-auto-loader-in-dlt-pipeline-running/m-p/112516#M44239</link>
      <description>&lt;P&gt;I have some DLT pipelines working fine ingesting files from S3. Now I'm trying to build a pipeline to ingest files from GCS using Auto Loader. I'm running Databricks on AWS.&lt;/P&gt;&lt;P&gt;The code I have:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import dlt
import json
from pyspark.sql.functions import col

# Get GCS credentials
gcs_key = json.loads(dbutils.secrets.get(scope="finops", key="[REDACTED]"))

# Get environment from configuration
environment = spark.conf.get("environment")

gcs_bucket_name = "[REDACTED]"

# Define cloud file options
cloudfile = {
    "cloudFiles.format": "csv",
    "cloudFiles.inferColumnTypes": "true",
    "cloudFiles.allowOverwrites": "true",
    "cloudFiles.projectId": gcs_key.get("project_id"),
    "cloudFiles.client": gcs_key.get("client_id"),
    "cloudFiles.clientEmail": gcs_key.get("client_email"),
    "cloudFiles.privateKey": gcs_key.get("private_key"),
    "cloudFiles.privateKeyId": gcs_key.get("private_key_id")
}

# Consumo das contas

valid_records = {
    "valid_date": "day IS NOT NULL",
    "valid_project": "project_id IS NOT NULL",
    "valid_service": "service_id IS NOT NULL",
}

# Define the streaming view with metadata columns
@dlt.view
@dlt.expect_all_or_drop(valid_records)
def consumo_das_contas_raw():
    return (
        spark.readStream
        .format('cloudFiles')
        .options(**cloudfile)
        .load(f'gs://{gcs_bucket_name}/cost-reports/daily/*/billing_report_*.csv')
        .withColumn("update_timestamp", col("_metadata.file_modification_time"))  # Use file modification time
    )

full_table_name = f"{environment}_[REDACTED].gcp_fin_ops.consumo_das_contas"

dlt.create_streaming_table(full_table_name)

dlt.apply_changes(
  target = full_table_name,
  source = "consumo_das_contas_raw",
  keys = ["day", "project_id", "service_id"],
  sequence_by = "update_timestamp",
  except_column_list = ["update_timestamp"],
  stored_as_scd_type = 1
)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;When I run it I get the error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;py4j.protocol.Py4JJavaError: Traceback (most recent call last):
  File "/Workspace/Users/[REDACTED]/.bundle/FinOps/dev/files/src/finops_gcp_pipeline", cell 1, line 41, in consumo_das_contas_raw
    .load(f'gs://{gcs_bucket_name}/cost-reports/daily/*/billing_report_*.csv')
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

py4j.protocol.Py4JJavaError: An error occurred while calling o902.load.
: java.io.IOException: Error getting access token from metadata server at: http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token
	at shaded.databricks.com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:250)
	at shaded.databricks.com.google.cloud.hadoop.util.CredentialFactory.getCredential(CredentialFactory.java:389)
	at shaded.databricks.com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getCredential(GoogleHadoopFileSystemBase.java:1807)
	at shaded.databricks.com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.createGcsFs(GoogleHadoopFileSystemBase.java:1951)
	at shaded.databricks.com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.configure(GoogleHadoopFileSystemBase.java:1936)
	at shaded.databricks.com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.initialize(GoogleHadoopFileSystemBase.java:501)
	at com.databricks.common.filesystem.LokiGCFS.initialize(LokiGCFS.scala:37)
	...
Caused by: shaded.databricks.com.google.api.client.http.HttpResponseException: 404 Not Found
GET http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token
endpoint not available
	at shaded.databricks.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1113)
	at shaded.databricks.com.google.cloud.hadoop.util.CredentialFactory$ComputeCredentialWithRetry.executeRefreshToken(CredentialFactory.java:192)
	at shaded.databricks.com.google.api.client.auth.oauth2.Credential.refreshToken(Credential.java:470)
	at shaded.databricks.com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:247)
	... 269 more&lt;/LI-CODE&gt;&lt;P&gt;It looks like it's trying to use some function that only exists in Databricks on GCP to connect. I hope someone has a clue on what is happening.&lt;/P&gt;</description>
      <pubDate>Thu, 13 Mar 2025 18:18:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/ingest-files-from-gcs-with-auto-loader-in-dlt-pipeline-running/m-p/112516#M44239</guid>
      <dc:creator>fscaravelli</dc:creator>
      <dc:date>2025-03-13T18:18:57Z</dc:date>
    </item>
    <item>
      <title>Re: Ingest files from GCS with Auto Loader in DLT pipeline running on AWS</title>
      <link>https://community.databricks.com/t5/data-engineering/ingest-files-from-gcs-with-auto-loader-in-dlt-pipeline-running/m-p/137749#M50805</link>
      <description>&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Your error is due to how Databricks on AWS is trying to access GCS: it's defaulting to using the GCP metadata server (which only exists on Google Cloud VMs), not the service account key you provided. This is a common issue when connecting GCS from non-GCP platforms like Databricks on AWS.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Why This Happens&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;By default, the Databricks GCS connector attempts to use the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="reset interactable cursor-pointer decoration-1 underline-offset-1 text-super hover:underline font-semibold" href="http://169.254.169.254/" target="_blank" rel="nofollow noopener"&gt;&lt;SPAN class="text-box-trim-both"&gt;GCE metadata server&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;to obtain authentication if not configured otherwise.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;On AWS, this metadata server doesn't exist (404 error you see), so authentication fails.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Merely supplying the credentials to Auto Loader's&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.option()&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;dictionary is often not enough—the Hadoop GCS connector expects them as Spark Hadoop configuration properties.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Solution: Configure GCS Credentials in Spark/Hadoop&lt;/H2&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;You need to set your GCS credentials in the Hadoop configuration, not just as&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;cloudFiles.*&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;options.&lt;/P&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Steps to Fix:&lt;/H2&gt;
&lt;OL class="marker:text-quiet list-decimal"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Set Hadoop Options Globally&lt;/STRONG&gt;&lt;BR /&gt;Before you read from GCS, add this in your notebook or cluster initialization:&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;spark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;conf&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;set&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"spark.hadoop.google.cloud.auth.service.account.enable"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"true"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
spark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;conf&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;set&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"spark.hadoop.google.cloud.auth.service.account.json.keyfile"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"/dbfs/tmp/gcs-key.json"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Dump your key JSON to a file (ephemeral ok for testing):&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;&lt;SPAN class="token token"&gt;with&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;open&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"/dbfs/tmp/gcs-key.json"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"w"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;as&lt;/SPAN&gt; f&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
    f&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;write&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;json&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;dumps&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;gcs_key&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Remove Auth Info from .options()&lt;/STRONG&gt;&lt;BR /&gt;You don't need to pass individual credential fields as&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;cloudFiles.*&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;options. Only use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;cloudFiles.format&lt;/CODE&gt;, etc., there. The keyfile method is preferred for Hadoop.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Restart Your Cluster if Needed&lt;/STRONG&gt;&lt;BR /&gt;Hadoop configs may require a restart if set at cluster level.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;&lt;STRONG&gt;Double Check Permissions&lt;/STRONG&gt;&lt;BR /&gt;Ensure your GCP service account actually has&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;Storage Object Viewer&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;(and perhaps&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;Storage Object Admin&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for writing) on the target bucket.&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;Sample Minimal Correct Setup&lt;/H2&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-light selection:text-super selection:bg-super/10 my-md relative flex flex-col rounded font-mono text-sm font-normal bg-subtler"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl flex h-0 items-start justify-end md:sticky md:top-[100px]"&gt;
&lt;DIV class="overflow-hidden rounded-full border-subtlest ring-subtlest divide-subtlest bg-base"&gt;
&lt;DIV class="border-subtlest ring-subtlest divide-subtlest bg-subtler"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-quiet bg-subtle py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;python&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;&lt;CODE&gt;&lt;SPAN class="token token"&gt;import&lt;/SPAN&gt; json

&lt;SPAN class="token token"&gt;# Write GCP key to DBFS&lt;/SPAN&gt;
&lt;SPAN class="token token"&gt;with&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;open&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"/dbfs/tmp/gcs-key.json"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"w"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;as&lt;/SPAN&gt; f&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
    f&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;write&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;json&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;dumps&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;gcs_key&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;

spark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;conf&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;set&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"spark.hadoop.google.cloud.auth.service.account.enable"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"true"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
spark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;conf&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;set&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"spark.hadoop.google.cloud.auth.service.account.json.keyfile"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"/dbfs/tmp/gcs-key.json"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;

cloudfile &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="token token punctuation"&gt;{&lt;/SPAN&gt;
    &lt;SPAN class="token token"&gt;"cloudFiles.format"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"csv"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt;
    &lt;SPAN class="token token"&gt;"cloudFiles.inferColumnTypes"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"true"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt;
    &lt;SPAN class="token token"&gt;"cloudFiles.allowOverwrites"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"true"&lt;/SPAN&gt;
&lt;SPAN class="token token punctuation"&gt;}&lt;/SPAN&gt;

&lt;SPAN class="token token decorator annotation punctuation"&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97035"&gt;@Dlt&lt;/a&gt;&lt;/SPAN&gt;&lt;SPAN class="token token decorator annotation punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token decorator annotation punctuation"&gt;view&lt;/SPAN&gt;
&lt;SPAN class="token token"&gt;def&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;consumo_das_contas_raw&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;:&lt;/SPAN&gt;
    &lt;SPAN class="token token"&gt;return&lt;/SPAN&gt; &lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;
        spark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;readStream
        &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;format&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;'cloudFiles'&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
        &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;options&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token operator"&gt;**&lt;/SPAN&gt;cloudfile&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
        &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;load&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token string-interpolation"&gt;f'gs://&lt;/SPAN&gt;&lt;SPAN class="token token string-interpolation interpolation punctuation"&gt;{&lt;/SPAN&gt;&lt;SPAN class="token token string-interpolation interpolation"&gt;gcs_bucket_name&lt;/SPAN&gt;&lt;SPAN class="token token string-interpolation interpolation punctuation"&gt;}&lt;/SPAN&gt;&lt;SPAN class="token token string-interpolation"&gt;/cost-reports/daily/*/billing_report_*.csv'&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
    &lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;This setup should allow Databricks on AWS to read files from GCS using your provided service account key, sidestepping the metadata API.&lt;/P&gt;
&lt;HR /&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;&amp;nbsp;&lt;/H2&gt;
&lt;H2 class="mb-2 mt-4 font-display font-semimedium text-base first:mt-0"&gt;References&lt;/H2&gt;
&lt;UL class="marker:text-quiet list-disc"&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Databricks docs:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="reset interactable cursor-pointer decoration-1 underline-offset-1 text-super hover:underline font-semibold" href="https://docs.databricks.com/en/storage/google-cloud-storage.html" target="_blank" rel="nofollow noopener"&gt;&lt;SPAN class="text-box-trim-both"&gt;Accessing Google Cloud Storage from Databricks on AWS&lt;/SPAN&gt;&lt;/A&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI class="py-0 my-0 prose-p:pt-0 prose-p:mb-2 prose-p:my-0 [&amp;amp;&amp;gt;p]:pt-0 [&amp;amp;&amp;gt;p]:mb-2 [&amp;amp;&amp;gt;p]:my-0"&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;Google docs:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="reset interactable cursor-pointer decoration-1 underline-offset-1 text-super hover:underline font-semibold" href="https://cloud.google.com/storage/docs/access-control/iam-roles" target="_blank" rel="nofollow noopener"&gt;&lt;SPAN class="text-box-trim-both"&gt;GCS service account permissions&lt;/SPAN&gt;&lt;/A&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="my-2 [&amp;amp;+p]:mt-4 [&amp;amp;_strong:has(+br)]:inline-block [&amp;amp;_strong:has(+br)]:pb-2"&gt;If you need to do this securely, consider using a secrets manager to inject paths/keys, and clean up the temp file after use.&lt;/P&gt;</description>
      <pubDate>Wed, 05 Nov 2025 12:40:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/ingest-files-from-gcs-with-auto-loader-in-dlt-pipeline-running/m-p/137749#M50805</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-11-05T12:40:27Z</dc:date>
    </item>
  </channel>
</rss>

