I want to know whether what I describe below is possible with Auto Loader on Google Cloud Platform.
Problem Description:
We have a GCS bucket for each client/account. Inside each bucket there is a path/blob prefix for each of that client's instances of our platform (a client can have one or many instances). Under those prefixes are the incremental data files we need to process for each client. The paths look something like:
gs://<client specific bucket name>/<platform instance id>/data/<year>/<month>/<day>/datafile<some UUID>.json.gz
I want to set up a SINGLE Auto Loader stream to load all data files across all of the buckets and paths. Is this possible?
Potential Solution:
From reading the docs, it looks like I could create a Pub/Sub topic and then manually configure notifications on each bucket so that file events are sent to that topic (roughly as sketched below).
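A minimal sketch of that manual setup, assuming the `google-cloud-pubsub` and `google-cloud-storage` client libraries and placeholder project, bucket, and topic names (everything named here is hypothetical, not part of our actual environment):

```python
from google.cloud import pubsub_v1, storage
from google.cloud.storage.notification import (
    JSON_API_V1_PAYLOAD_FORMAT,
    OBJECT_FINALIZE_EVENT_TYPE,
)

PROJECT_ID = "my-gcp-project"           # placeholder
TOPIC_ID = "autoloader-file-events"     # placeholder
CLIENT_BUCKETS = ["client-a-bucket", "client-b-bucket"]  # placeholders

# Create a single Pub/Sub topic that all client buckets will publish to.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)
publisher.create_topic(request={"name": topic_path})

# Attach an OBJECT_FINALIZE notification to each client bucket so every
# newly written object produces an event on the shared topic.
# NOTE: the GCS service agent also needs the pubsub.publisher role on the
# topic; that IAM step is omitted here.
gcs = storage.Client(project=PROJECT_ID)
for bucket_name in CLIENT_BUCKETS:
    bucket = gcs.bucket(bucket_name)
    notification = bucket.notification(
        topic_name=TOPIC_ID,
        topic_project=PROJECT_ID,
        event_types=[OBJECT_FINALIZE_EVENT_TYPE],
        payload_format=JSON_API_V1_PAYLOAD_FORMAT,
    )
    notification.create()
```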
After that, I should be able to set the `cloudFiles.subscription` option to point at a Pub/Sub subscription on the topic I created, and set `pathGlobFilter` to match only the correct data files so we don't read every file in the bucket.
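A hedged sketch of what I have in mind for the stream itself, with placeholder names throughout. It assumes a subscription called `autoloader-file-events-sub` has been created on the shared topic, and the load path, checkpoint, and target locations are illustrative only; whether a single load path can actually span all of the client buckets is exactly what I am asking about.

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Consume file events from the existing subscription instead of having
    # Auto Loader create its own notification resources.
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.subscription", "autoloader-file-events-sub")  # placeholder
    # Needed for schema inference/evolution when no explicit schema is given.
    .option("cloudFiles.schemaLocation", "gs://my-checkpoints/autoloader/schema/")  # placeholder
    # Only pick up the gzipped JSON data files, not other objects.
    .option("pathGlobFilter", "*.json.gz")
    # Placeholder base path; spanning all client buckets from one path is the
    # open question in this post.
    .load("gs://<client specific bucket name>/")
)

(
    df.writeStream.format("delta")
    .option("checkpointLocation", "gs://my-checkpoints/autoloader/")  # placeholder
    .start("gs://my-output/ingested/")  # placeholder target table path
)
```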
Will this work as I am expecting? I do not want Auto Loader to set up notifications on every bucket in our account when I add `gs://*/.....` to the `pathGlobFilter`.