
File trigger options -- cloudFiles.allowOverwrites

srinivas_001
New Contributor III

I have a job configured to run on file arrival, with the path provided as:
File arrival path: s3://test_bucket/test_cat/test_schema/

When a new parquet file arrives in this path, the job triggers automatically and processes the file.

However, in the case of a reload, i.e. overwriting the existing file by uploading the same file again (with the same name) to this path, no run is triggered.
(Duplicating the data is not a concern; I just need the job to trigger.)
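
For context, the file arrival trigger on the job points at that same path; via the Jobs API 2.1 it is set roughly like this (the workspace host, token, and job_id below are placeholders for illustration, not my actual values):

import requests

# Sketch only: attach a file arrival trigger to an existing job via Jobs API 2.1.
# <workspace-host>, <token>, and job_id 123 are placeholders.
resp = requests.post(
    "https://<workspace-host>/api/2.1/jobs/update",
    headers={"Authorization": "Bearer <token>"},
    json={
        "job_id": 123,
        "new_settings": {
            "trigger": {
                "pause_status": "UNPAUSED",
                "file_arrival": {"url": "s3://test_bucket/test_cat/test_schema/"},
            }
        },
    },
)
resp.raise_for_status()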
The code is as below:

df = (spark.readStream.format("cloudFiles")          # Auto Loader source
      .option("cloudFiles.format", "parquet")        # incoming files are parquet
      .option("inferSchema", "false")
      .option("cloudFiles.allowOverwrites", "true")  # reprocess overwritten files
      .option("cloudFiles.schemaLocation", "checkpoint_dir")
      .load(data_source))

Do I need to enable any other settings in order to trigger the job?
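
For reference, the write side of the stream (omitted above) is started roughly as below; the checkpoint path, trigger mode, and target table are placeholder names for illustration, not the exact job code:

(df.writeStream
   .option("checkpointLocation", "checkpoint_dir")    # placeholder checkpoint path
   .trigger(availableNow=True)                        # drain available files, then stop
   .toTable("test_cat.test_schema.target_table"))     # placeholder target table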

1 REPLY

srinivas_001
New Contributor III

Hi Kaniz,

Thank you for the response.
I am using Databricks Runtime 11.3, and I have checked the checkpoint and data source locations, which are properly configured. Still, I am unable to trigger the job.

NOTE: Incoming files are pushed to the AWS S3 location from Apache Airflow with the REPLACE option set to TRUE.
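
For reference, the Airflow side uploads roughly like this (the connection id, local file name, and object key are placeholders, not the actual DAG code):

from airflow.providers.amazon.aws.hooks.s3 import S3Hook

hook = S3Hook(aws_conn_id="aws_default")           # placeholder connection id
hook.load_file(
    filename="/tmp/data.parquet",                  # file produced upstream (placeholder)
    key="test_cat/test_schema/data.parquet",       # placeholder object key
    bucket_name="test_bucket",
    replace=True,                                  # the REPLACE option: overwrite the existing object
)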
