I have a job configured to run on file arrival.
The path I have provided is:
File arrival path: s3://test_bucket/test_cat/test_schema/
When a new Parquet file arrives at this path, the job triggers automatically and processes the file.
However, when I reload the data, i.e. overwrite the existing file by uploading the same file (with the same name) to this path again, no run is triggered.
(I am not worried about duplicating the data; I just need the job to trigger.)
The code is as follows:
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("inferSchema", "false")
    .option("cloudFiles.allowOverwrites", "true")
    .option("cloudFiles.schemaLocation", "checkpoint_dir")
    .load(data_source)
)
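For context, the full stream inside the job is wired up roughly like this. The checkpoint location, target table name, and the availableNow trigger are illustrative placeholders rather than my exact setup, and data_source is assumed to point at the same file arrival path:

# Minimal sketch of the job's streaming read/write. `spark` is the SparkSession
# that Databricks provides in the job. Checkpoint location, target table, and
# trigger mode below are placeholders; the read options match the snippet above.
data_source = "s3://test_bucket/test_cat/test_schema/"          # file arrival path
checkpoint_dir = "s3://test_bucket/_checkpoints/test_schema"    # placeholder checkpoint/schema location

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("inferSchema", "false")
    .option("cloudFiles.allowOverwrites", "true")                # let Auto Loader pick up overwritten files
    .option("cloudFiles.schemaLocation", checkpoint_dir)
    .load(data_source)
)

query = (
    df.writeStream
    .option("checkpointLocation", checkpoint_dir)
    .trigger(availableNow=True)                                  # process what's new, then stop, each time the job runs
    .toTable("test_cat.test_schema.target_table")                # placeholder target table
)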
Do I need to enable any other settings in order to trigger the job?