<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Autoloader (GCP) Custom PubSub Queue in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/autoloader-gcp-custom-pubsub-queue/m-p/16394#M10588</link>
    <description>&lt;P&gt;I want to know whether what I describe below is possible with Auto Loader on Google Cloud Platform.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Problem Description:&lt;/P&gt;&lt;P&gt;We have GCS buckets for every client/account. Inside each bucket is a path/blob for each of the client's instances of our platform. A client can have one or more instances of our platform. Inside these paths/blobs are the incremental data files we need to process for the clients. The paths look something like:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;gs://&amp;lt;client specific bucket name&amp;gt;/&amp;lt;platform instance id&amp;gt;/data/&amp;lt;year&amp;gt;/&amp;lt;month&amp;gt;/&amp;lt;day&amp;gt;/datafile&amp;lt;some UUID&amp;gt;.json.gz&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I want to set up a SINGLE Auto Loader to load all data files across all of the buckets and paths. Is this possible?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Potential Solution:&lt;/P&gt;&lt;P&gt;From reading the docs, it looks like I might be able to create a PubSub topic and then manually set notifications on the buckets to send the file notifications to that topic.&lt;/P&gt;&lt;P&gt;After that, I should be able to set the `cloudFiles.subscription` option to point at the PubSub topic I created and then set `pathGlobFilter` to select only the correct data files, so we don't read every file in the bucket.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Will this work as I am expecting? I do not want Auto Loader to set up notifications on every bucket we have in our account when I add `gs://*/.....` to the `pathGlobFilter`.&lt;/P&gt;</description>
    <pubDate>Tue, 28 Jun 2022 15:54:22 GMT</pubDate>
    <dc:creator>Ryan512</dc:creator>
    <dc:date>2022-06-28T15:54:22Z</dc:date>
    <item>
      <title>Autoloader (GCP) Custom PubSub Queue</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-gcp-custom-pubsub-queue/m-p/16394#M10588</link>
      <description>&lt;P&gt;I want to know whether what I describe below is possible with Auto Loader on Google Cloud Platform.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Problem Description:&lt;/P&gt;&lt;P&gt;We have GCS buckets for every client/account. Inside each bucket is a path/blob for each of the client's instances of our platform. A client can have one or more instances of our platform. Inside these paths/blobs are the incremental data files we need to process for the clients. The paths look something like:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;gs://&amp;lt;client specific bucket name&amp;gt;/&amp;lt;platform instance id&amp;gt;/data/&amp;lt;year&amp;gt;/&amp;lt;month&amp;gt;/&amp;lt;day&amp;gt;/datafile&amp;lt;some UUID&amp;gt;.json.gz&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I want to set up a SINGLE Auto Loader to load all data files across all of the buckets and paths. Is this possible?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Potential Solution:&lt;/P&gt;&lt;P&gt;From reading the docs, it looks like I might be able to create a PubSub topic and then manually set notifications on the buckets to send the file notifications to that topic.&lt;/P&gt;&lt;P&gt;After that, I should be able to set the `cloudFiles.subscription` option to point at the PubSub topic I created and then set `pathGlobFilter` to select only the correct data files, so we don't read every file in the bucket.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Will this work as I am expecting? I do not want Auto Loader to set up notifications on every bucket we have in our account when I add `gs://*/.....` to the `pathGlobFilter`.&lt;/P&gt;</description>
      <pubDate>Tue, 28 Jun 2022 15:54:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-gcp-custom-pubsub-queue/m-p/16394#M10588</guid>
      <dc:creator>Ryan512</dc:creator>
      <dc:date>2022-06-28T15:54:22Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader (GCP) Custom PubSub Queue</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-gcp-custom-pubsub-queue/m-p/16396#M10590</link>
      <description>&lt;P&gt;Hi @Ryan Ebanks,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just a friendly follow-up: do you still need help, or did the article help you resolve your question? Please let us know.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Jul 2022 23:28:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-gcp-custom-pubsub-queue/m-p/16396#M10590</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-07-20T23:28:50Z</dc:date>
    </item>
    <item>
      <title>Re: Autoloader (GCP) Custom PubSub Queue</title>
      <link>https://community.databricks.com/t5/data-engineering/autoloader-gcp-custom-pubsub-queue/m-p/16397#M10591</link>
      <description>&lt;P&gt;Hello @Ryan Ebanks, please let us know if you need more help with this.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Aug 2022 05:13:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/autoloader-gcp-custom-pubsub-queue/m-p/16397#M10591</guid>
      <dc:creator>Noopur_Nigam</dc:creator>
      <dc:date>2022-08-02T05:13:39Z</dc:date>
    </item>
  </channel>
</rss>

