By reloading I mean loading all the existing data in that folder again. As mentioned above:

  • if the file names contain no special characters that make AutoLoader fail, we can do:

```
autoloader = spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", data_format) \
    .option("header", "true") \
    .option("cloudFiles.schemaLocation", schema_location) \
    .option("cloudFiles.allowOverwrites", "true") \
    .load(path)
```

  • in the second case, where AutoLoader will fail (we know from experience that it does when file names contain a colon), we use a simple batch load:

```
df = spark.read.format(data_format) \
    .option("header", "true") \
    .load(path)
```

That is why I mentioned that, luckily for us, this data folder is not that large, so the plain batch load runs quickly.
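To make the choice between the two cases explicit, here is a minimal sketch of how it could be automated. The helper names (`needs_batch_reload`, `reload_folder`) and the `PROBLEM_CHARS` set are hypothetical and only for illustration. The colon is the only character we know from experience to cause the failure, so the set may need extending. It also assumes you already have a list of the file names in the folder.

```python
import posixpath

# Assumption: only the colon is known (from experience) to break AutoLoader.
PROBLEM_CHARS = {":"}


def needs_batch_reload(file_names, problem_chars=PROBLEM_CHARS):
    """Return True if any file's base name contains a problematic character.

    Only the base name is checked, so a scheme prefix such as "dbfs:/"
    in a full path does not trigger a false positive.
    """
    return any(
        ch in posixpath.basename(name)
        for name in file_names
        for ch in problem_chars
    )


def reload_folder(spark, path, data_format, schema_location, file_names):
    """Hypothetical wrapper: pick the batch or the AutoLoader reload."""
    if needs_batch_reload(file_names):
        # Second case: plain batch read of everything in the folder.
        return (
            spark.read.format(data_format)
            .option("header", "true")
            .load(path)
        )
    # First case: AutoLoader stream over the same folder.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", data_format)
        .option("header", "true")
        .option("cloudFiles.schemaLocation", schema_location)
        .option("cloudFiles.allowOverwrites", "true")
        .load(path)
    )
```

Note that the two branches return different things (a batch DataFrame versus a streaming one), so the write step after it has to handle both.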