I have a job configured to run on file arrival.
The path I have provided is:
File arrival path: s3://test_bucket/test_cat/test_schema/
When a new Parquet file arrives at this path, the job triggers automatically and processes the file.
However, when I reload the data, i.e. overwrite the existing file by uploading the same file (with the same name) to this path again, no run is triggered.
(I am not worried about duplicating the data; I just need the job to trigger.)
The code is as follows:
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("inferSchema", "false")
    .option("cloudFiles.allowOverwrites", "true")
    .option("cloudFiles.schemaLocation", "checkpoint_dir")
    .load(data_source)
)
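For context, the full stream inside the job is wired up roughly like this. The checkpoint location, target table name, and the availableNow trigger are illustrative placeholders rather than my exact setup, and data_source is assumed to point at the same file arrival path:

# Minimal sketch of the job's streaming read/write. `spark` is the SparkSession
# that Databricks provides in the job. Checkpoint location, target table, and
# trigger mode below are placeholders; the read options match the snippet above.
data_source = "s3://test_bucket/test_cat/test_schema/"          # file arrival path
checkpoint_dir = "s3://test_bucket/_checkpoints/test_schema"    # placeholder checkpoint/schema location

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("inferSchema", "false")
    .option("cloudFiles.allowOverwrites", "true")                # let Auto Loader pick up overwritten files
    .option("cloudFiles.schemaLocation", checkpoint_dir)
    .load(data_source)
)

query = (
    df.writeStream
    .option("checkpointLocation", checkpoint_dir)
    .trigger(availableNow=True)                                  # process what's new, then stop, each time the job runs
    .toTable("test_cat.test_schema.target_table")                # placeholder target table
)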
Do I need to enable any other settings in order to trigger the job?