DLT cloudfiles trigger interval not working
08-07-2023 10:58 AM
I have the following streaming table definition that uses the cloudFiles format. I set pipelines.trigger.interval to 5 minutes to reduce file discovery costs, but the query still triggers every 12 seconds instead of every 5 minutes.
Is there another configuration I am missing, or does DLT with cloudFiles not honor that setting?
import dlt
from pyspark.sql.functions import input_file_name

@dlt.table
def s3_data(
    spark_conf={"pipelines.trigger.interval": "5 minutes"},
    table_properties={
        "quality": "bronze",
        "pipelines.reset.allowed": "false"  # preserves the data in the delta table if you do full refresh
    }
):
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("s3://my-bucket/")
        .withColumn("filePath", input_file_name())
    )
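
One thing that may be worth checking, assuming the snippet above matches the real pipeline code: spark_conf and table_properties appear as default parameters of the function rather than as arguments to the @dlt.table decorator. In plain Python, defaults declared on the function are not passed to a decorator applied without arguments. The sketch below illustrates this with a hypothetical stand-in decorator named table; it is not the real dlt implementation, and whether this explains the 12-second trigger is only a guess.

```python
# Hypothetical stand-in for dlt.table, used only to show where keyword
# arguments end up. It is NOT the real Delta Live Tables implementation.
registry = {}

def table(func=None, *, spark_conf=None, table_properties=None):
    def register(f):
        # Record only the settings that were handed to the decorator itself.
        registry[f.__name__] = {
            "spark_conf": spark_conf or {},
            "table_properties": table_properties or {},
        }
        return f
    # Support both the bare form (@table) and the called form (@table(...)).
    return register(func) if func is not None else register

# Same shape as the snippet in this post: settings are function defaults.
@table
def as_posted(spark_conf={"pipelines.trigger.interval": "5 minutes"}):
    return None

# Settings passed to the decorator instead.
@table(spark_conf={"pipelines.trigger.interval": "5 minutes"})
def with_decorator_args():
    return None

print(registry["as_posted"]["spark_conf"])            # {}
print(registry["with_decorator_args"]["spark_conf"])  # {'pipelines.trigger.interval': '5 minutes'}
```

In the stand-in, only the second form registers the interval. If the real decorator behaves the same way, the equivalent change here would be to write @dlt.table(spark_conf=..., table_properties=...) above a def s3_data() that takes no parameters.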