11-21-2022 06:33 AM
By reloading I mean loading all the existing data in that folder. As mentioned above:
- if there are no special characters in the file names that make AutoLoader fail, we can do:
```python
# Streaming read with AutoLoader
autoloader = spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", data_format) \
    .option("header", "true") \
    .option("cloudFiles.schemaLocation", schema_location) \
    .option("cloudFiles.allowOverwrites", "true") \
    .load(path)
```
- in the second case, where AutoLoader will fail (we know from experience that it does when file names contain a colon), we use a plain batch read instead:
```python
# Plain batch read of everything in the folder
df = spark.read.format(data_format) \
    .option("header", "true") \
    .load(path)
```
That is why I mentioned that, luckily for us, this data folder is not that large, so the full batch read runs fast.
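If it helps, the choice between the two cases could be wrapped in a small helper. This is only a rough sketch: `choose_reload_mode`, `reload_folder` and `PROBLEM_CHARS` are names made up for illustration, and it assumes the colon is the only problematic character, since that is the only one we have actually seen fail.

```python
from typing import Iterable

# Assumption: the colon is the only character known (from experience in this
# thread) to break AutoLoader; extend this set if others turn up.
PROBLEM_CHARS = frozenset(":")


def choose_reload_mode(file_paths: Iterable[str], problem_chars=PROBLEM_CHARS) -> str:
    """Return "batch" if any file name contains a problem character, else "autoloader"."""
    for file_path in file_paths:
        # Look only at the file name, so a scheme like "dbfs:/" or "s3://"
        # in the path does not count as a problem character.
        file_name = file_path.rstrip("/").rsplit("/", 1)[-1]
        if any(ch in problem_chars for ch in file_name):
            return "batch"
    return "autoloader"


def reload_folder(spark, path, data_format, schema_location, file_paths):
    """Hypothetical wrapper: reload `path` with whichever reader the file names allow."""
    if choose_reload_mode(file_paths) == "autoloader":
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", data_format)
            .option("header", "true")
            .option("cloudFiles.schemaLocation", schema_location)
            .option("cloudFiles.allowOverwrites", "true")
            .load(path)
        )
    # Fallback: plain batch read of everything in the folder.
    return spark.read.format(data_format).option("header", "true").load(path)
```

In a Databricks notebook the file list could come from something like `[f.path for f in dbutils.fs.ls(path)]`. Keep in mind that `dbutils.fs.ls` only lists the top level of the folder, so nested folders would need their own pass.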