05-09-2022 11:00 PM
Update:
It seems that using maxFileAge was not a good idea. Setting the option "cloudFiles.includeExistingFiles" to False instead solved my problem:
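# spark, extension, schema and streaming_path are defined elsewhere.
streaming_df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", extension)
    .option("cloudFiles.maxFilesPerTrigger", 20)
    # Skip files already present in the path; only evaluated when the stream first starts.
    .option("cloudFiles.includeExistingFiles", False)
    .option("multiLine", True)
    .option("pathGlobFilter", "*." + extension)
    .schema(schema)
    .load(streaming_path)
)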