I have a simple job scheduled every 5 minutes. Basically it listens for cloud files on a storage account and writes them into a Delta table, extremely simple. The code is something like this:

df = (spark
    .readStream
    .format("cloudFiles")
    .option('cloudFil...
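
For context, a minimal sketch of what such an Auto Loader job typically looks like; the original snippet is truncated, so the input format, paths, checkpoint location, trigger, and target table below are assumptions rather than the actual code:

df = (spark
    .readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")                                     # assumed input format
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/events/schema")   # hypothetical path
    .load("abfss://landing@mystorage.dfs.core.windows.net/events/"))         # hypothetical source

query = (df.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/events")                 # hypothetical checkpoint
    .trigger(availableNow=True)                                              # process new files, then stop
    .toTable("bronze.events"))                                               # hypothetical target table

query.awaitTermination()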
Greetings, I have a similar problem. Did you try using Databricks Workflows instead, scheduling the job there rather than in Data Factory? Inside Workflows it is possible to select a specific branch, so it may actually work. What do you think?
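
To make that suggestion concrete, here is a rough sketch of creating such a Workflows job through the Jobs API 2.1 and pinning it to a branch via git_source; the workspace URL, repository, notebook path, cluster id, and branch name are all placeholders, not anything from the original posts:

import requests

host = "https://<your-workspace>.azuredatabricks.net"    # placeholder workspace URL
token = "<personal-access-token>"                         # placeholder token

payload = {
    "name": "cloudfiles-ingest",
    "git_source": {
        "git_url": "https://github.com/<org>/<repo>",     # placeholder repository
        "git_provider": "gitHub",
        "git_branch": "feature/my-branch"                 # the specific branch to run
    },
    "tasks": [{
        "task_key": "ingest",
        "notebook_task": {
            "notebook_path": "notebooks/ingest_cloudfiles",   # path inside the repo
            "source": "GIT"
        },
        "existing_cluster_id": "<cluster-id>"             # placeholder cluster
    }],
    "schedule": {
        "quartz_cron_expression": "0 0/5 * * * ?",        # every 5 minutes
        "timezone_id": "UTC"
    }
}

resp = requests.post(f"{host}/api/2.1/jobs/create",
                     headers={"Authorization": f"Bearer {token}"},
                     json=payload)
resp.raise_for_status()
print(resp.json())    # returns the new job_id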
I resolved it by using .option('cloudFiles.useIncrementalListing', 'false'). Now, if I understand correctly, RocksDB reads the whole list of files instead of its mini "checkpoints" based on filenames and timestamps. My guess is: my json filenames are comp...
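
For anyone landing here later, a sketch of where that option goes in the read; only the useIncrementalListing setting is the actual fix described above, while the format, schema location, and source path are placeholders:

df = (spark
    .readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useIncrementalListing", "false")   # force a full directory listing on each run
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/events/schema")   # hypothetical path
    .load("abfss://landing@mystorage.dfs.core.windows.net/events/"))         # hypothetical source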