05-22-2023 08:54 AM
Still no progress on this. To confirm: the cluster configurations are identical between the general-purpose compute cluster my notebook runs on and my job cluster, and both use the same GCP service account. On the compute cluster, Auto Loader works exactly as expected. Here is the code being used for Auto Loader (this works on the compute cluster).
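The sketch below reproduces the pattern only approximately; the source path, checkpoint path, file format, and target table name are hypothetical placeholders, not my actual values:

```python
# Minimal Auto Loader sketch (runs in a Databricks notebook, where `spark`
# and `dbutils` are provided). All paths and names below are placeholders.
source_path = "gs://my-source-bucket/data/"            # hypothetical GCS source
checkpoint_path = "gs://my-bucket/_checkpoints/demo/"  # hypothetical checkpoint location
table_name = "my_database.my_table"                    # hypothetical target table

# Clear the checkpoint so the stream reprocesses everything from scratch
# (removing this line is discussed further down).
dbutils.fs.rm(checkpoint_path, True)

df = (
    spark.readStream.format("cloudFiles")              # Auto Loader source
    .option("cloudFiles.format", "json")               # assumed file format
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .load(source_path)
)

(
    df.writeStream
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)                        # ingest available files, then stop
    .toTable(table_name)
)
```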
However, when I run this exact same code (from the same notebook) as a job, Auto Loader stops the stream (seemingly at `.writeStream`), and I simply see "Stream stopped" with no real clue as to why.
If I go to cloud storage, I can see that my checkpoint location was created, but the commits folder is empty, meaning Auto Loader never committed any writes to the stream.
If I run the notebook outside of Workflows, the commits folder gets populated, and if I remove the `dbutils.fs.rm(checkpoint_path, True)` command, Auto Loader correctly writes nothing until new files arrive in the source bucket.
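For reference, the reset in question is just the one line from the sketch above; with it removed, the checkpoint retains Auto Loader's record of already-ingested files, so only newly arrived files get processed:

```python
# Deleting the checkpoint discards Auto Loader's file-tracking state, so the
# next run re-ingests everything in the source bucket. Leaving the checkpoint
# in place means only new files are picked up. (Same hypothetical path as above.)
dbutils.fs.rm(checkpoint_path, True)
```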