
File trigger options -- cloudFiles.allowOverwrites

srinivas_001
New Contributor III

I have a job configured to run on file arrival, with the path provided as
File arrival path: s3://test_bucket/test_cat/test_schema/

When a new parquet file arrives at this path, the job triggers automatically and processes the file.

However, when reloading a file, i.e. overwriting the existing file by uploading the same file again (with the same name) to this path, no run is triggered.
(I am not worried about duplicating the data; I just need the job to trigger.)
Code as below:
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("inferSchema", "false")
    .option("cloudFiles.allowOverwrites", "true")   # expect overwritten files to be picked up
    .option("cloudFiles.schemaLocation", "checkpoint_dir")
    .load(data_source)
)

Do I need to enable any other settings in order to trigger the job?

2 REPLIES

Kaniz
Community Manager

Hi @srinivas_001, let's explore the configuration options related to triggering your job when reloading files using Auto Loader in Spark Structured Streaming.

  1. cloudFiles.allowOverwrites: You've already set this option to true, which allows input directory file changes to overwrite existing data. However, there are a few caveats regarding enabling this configuration; please refer to the Auto Loader FAQ for details. Make sure you're using Databricks Runtime 7.6 or above to take advantage of this setting.

  2. cloudFiles.schemaLocation: This option specifies the location to store the inferred schema and subsequent changes. It's required when inferring the schema. You've set it to "checkpoint_dir" in your code snippet, which is a good practice. Ensure that the checkpoint directory is correctly configured and accessible.

  3. cloudFiles.includeExistingFiles: By default, this option is set to true. It determines whether to include existing files in the stream processing input path or only process new files arriving after the initial setup. Note that this option is evaluated only when you start a stream for the first time. Changing it after restarting the stream has no effect.

  4. cloudFiles.inferColumnTypes: If you're leveraging schema inference, set this option to true. By default, columns are inferred as strings when inferring JSON and CSV datasets. Enabling this ensures that exact column types are inferred.

  5. cloudFiles.maxBytesPerTrigger: This option limits the maximum number of new bytes to be processed in every trigger. You can specify a byte string (e.g., "10g" for 10 GB) to control the size of each microbatch. Keep in mind that this is a soft maximum, and Databricks processes up to the lower limit of cloudFiles.maxFilesPerTrigger or cloudFiles.maxBytesPerTrigger, whichever is reached first. (A combined sketch of these options follows this list.)
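
Here is a minimal sketch that combines the options above; the schema location path and the byte limit are placeholders rather than values from your setup, and cloudFiles.inferColumnTypes mainly matters for JSON/CSV sources (parquet carries its own schema):

# Placeholder paths and limits; adjust to your environment
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.allowOverwrites", "true")        # reprocess files overwritten in place (DBR 7.6+)
    .option("cloudFiles.includeExistingFiles", "true")   # evaluated only on the first stream start
    .option("cloudFiles.inferColumnTypes", "true")       # relevant when inferring JSON/CSV schemas
    .option("cloudFiles.maxBytesPerTrigger", "10g")      # soft cap on bytes per micro-batch
    .option("cloudFiles.schemaLocation", "s3://test_bucket/_schemas/")
    .load("s3://test_bucket/test_cat/test_schema/")
)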

Remember to verify your overall job configuration, including the checkpoint location, and ensure that your data source path (data_source) points to the correct S3 location. If you've covered the points mentioned above, your job should trigger appropriately when reloading files.
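
For completeness, a hedged sketch of the write side is below; the target table name and checkpoint path are placeholders. trigger(availableNow=True) is a common choice for file-arrival-triggered jobs because the run processes the newly available files and then stops:

# Table name and checkpoint path are placeholders
(
    df.writeStream
    .option("checkpointLocation", "s3://test_bucket/_checkpoints/test_table/")
    .trigger(availableNow=True)   # process available files, then finish the run
    .toTable("test_cat.test_schema.test_table")
)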

Happy streaming! 🚀

 

srinivas_001
New Contributor III

Hi Kaniz,

Thank you for the response.
I am using Databricks Runtime 11.3 and have also checked the checkpoint and data source locations, which are properly configured. Still, I am unable to trigger the job.

NOTE: Incoming files are pushed to the AWS S3 location from Apache Airflow with the REPLACE option set to TRUE.
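
For reference, the Airflow upload presumably looks something like the sketch below (the hook usage, paths, and key names are assumptions, since the DAG is not shown here); with replace=True the object is overwritten under the same key, which is exactly the situation cloudFiles.allowOverwrites is meant to handle:

# Hypothetical sketch of the Airflow-side upload; names and paths are assumptions
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def upload_parquet():
    hook = S3Hook(aws_conn_id="aws_default")
    hook.load_file(
        filename="/tmp/test_file.parquet",               # local file produced upstream (placeholder)
        key="test_cat/test_schema/test_file.parquet",    # same key on every run
        bucket_name="test_bucket",
        replace=True,                                    # overwrite the existing object in place
    )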
