05-06-2022 06:28 AM
It seems that "maxFileAge" solves the problem:
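# spark, schema and streaming_path are assumed to be defined earlier in the notebook
streaming_df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("maxFilesPerTrigger", 20)  # cap the number of files per micro-batch
    .option("multiLine", True)         # each JSON file may span multiple lines
    .option("maxFileAge", 1)
    .schema(schema)
    .load(streaming_path)
)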
This ignores files older than one week.
But how can I ignore files older than one day?
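To make the "older than N" cutoff concrete, here is a minimal sketch in plain Python, not Spark code. Spark's built-in file source documents `maxFileAge` as measured relative to the timestamp of the newest file rather than the current system time, and its default is written as the duration string `7d`. Whether Auto Loader (`cloudFiles`) applies the plain `maxFileAge` option the same way is an assumption not confirmed here. The helper `filter_recent_files` and the sample file names are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def filter_recent_files(
    files: Iterable[Tuple[str, datetime]],
    max_age: timedelta,
) -> List[str]:
    """Keep files whose modification time is within max_age of the newest file.

    Hypothetical helper for illustration only; not a Spark or Databricks API.
    """
    files = list(files)
    if not files:
        return []
    # The cutoff is measured from the newest file, not from the wall clock.
    newest = max(mtime for _, mtime in files)
    cutoff = newest - max_age
    return [path for path, mtime in files if mtime >= cutoff]


# Hypothetical sample: (path, modification time) pairs.
sample = [
    ("a.json", datetime(2022, 5, 6, 6, 0)),
    ("b.json", datetime(2022, 5, 5, 12, 0)),
    ("c.json", datetime(2022, 5, 1, 0, 0)),
]

print(filter_recent_files(sample, timedelta(days=1)))   # ['a.json', 'b.json']
print(filter_recent_files(sample, timedelta(weeks=1)))  # ['a.json', 'b.json', 'c.json']
```

The point of anchoring the cutoff to the newest file is that a stream which stops receiving data does not suddenly discard everything once the wall clock moves on; only the relative age between files matters.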