
Commit time is coming as null in Auto Loader

shrutikatyal
New Contributor III

As per the new Databricks Auto Loader feature, we can archive and move processed files. I am trying to use this feature on Databricks Runtime 16.4.x (Scala 2.12), but the commit time is still coming back as null, and the documentation mentions that this feature won't work if the commit time is null. How can I resolve this?

I am using the Auto Loader configuration below:

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .option("multiLine", "true")
    .option("cloudFiles.backfillInterval", "10 minutes")
    .option("cloudFiles.inferColumnTypes", "true")
    .load(ingestDirectory)
)

dfOutput = (
  df.writeStream
  .trigger(once=True)
  .format("delta")
  .option("mergeSchema", "true")
  .option("checkpointLocation",checkpoint_path)
  .start(CuratedDirectory)
)

dfOutput.awaitTermination()
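
For reference, this is how I am checking the commit time: a minimal query against the stream's checkpoint using the `cloud_files_state` table-valued function (assuming `checkpoint_path` is the same checkpoint location as above).

```python
# Inspect Auto Loader's per-file state for this checkpoint.
# cloud_files_state() returns one row per discovered file;
# commit_time is the column that keeps coming back null for me.
state = spark.sql(
    f"SELECT path, commit_time FROM cloud_files_state('{checkpoint_path}')"
)
state.show(truncate=False)
```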




9 Replies

Yogesh_378691
New Contributor III


You're right: the new **archival and move feature in Auto Loader** depends on the `_commit_timestamp` column. If that value comes back as `null`, the feature won't work, as mentioned in the documentation.

To fix this, you need to make sure you're explicitly enabling the `commitTime` metadata column using the following option in your Auto Loader configuration:

```python
.option("cloudFiles.addColumns", "commitTime")
```

This ensures that the `_commit_timestamp` field gets populated during ingestion.

Here’s the corrected version of your code:

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .option("multiLine", "true")
    .option("cloudFiles.backfillInterval", "10 minutes")
    .option("cloudFiles.inferColumnTypes", "true")
    .option("cloudFiles.addColumns", "commitTime")  # Required to get _commit_timestamp
    .load(ingestDirectory)
)
```

Once this is set, `_commit_timestamp` should be populated properly, and the archival/move feature should start working as expected.

 

shrutikatyal
New Contributor III

Hi Yogesh,
I am getting the error below when I try to add that option to my Auto Loader configuration, i.e. `.option("cloudFiles.addColumns", "commitTime")`.

I have attached screenshots for reference, including the cluster configuration.

mani_22
Databricks Employee
Accepted Solution

@shrutikatyal I believe commit_time only functions when the `cloudFiles.cleanSource` option is enabled, and I don't see that option in your snippet. Could you please enable it on the read and check?

Refer to the documentation below, which specifies that the commit_time column is supported in Databricks Runtime 16.4 and above when `cloudFiles.cleanSource` is enabled:

https://docs.databricks.com/aws/en/sql/language-manual/functions/cloud_files_state

Note that a file might be processed but only marked as committed arbitrarily later; commit_time is usually updated at the start of the next microbatch.
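
For example, a minimal sketch of the read with cleanSource enabled might look like the following (assuming Databricks Runtime 16.4+; `archiveDirectory` is a hypothetical destination path used here for illustration):

```python
# Sketch: enable cleanSource so Auto Loader tracks commit_time and can
# move fully committed files out of the ingest directory.
# archiveDirectory is a placeholder; point it at your own archive path.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .option("cloudFiles.cleanSource", "MOVE")  # or "DELETE"
    .option("cloudFiles.cleanSource.moveDestination", archiveDirectory)
    .load(ingestDirectory)
)
```

Once the stream has processed files and a subsequent microbatch has started, commit_time should be populated in the cloud_files_state output.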

shrutikatyal
New Contributor III

Thanks, it's working now.

shrutikatyal
New Contributor III

Hi,
I am interested in the Databricks Certified Data Engineer Associate certification. Can I get a voucher?

TheOC
New Contributor II

Hey @shrutikatyal,
I believe the only current route to a discount voucher is the following event:

https://community.databricks.com/t5/events/dais-2025-virtual-learning-festival-11-june-02-july-2025/...

I think it's the last day of the event, so you might need to be quick!

Hope this helps,
TheOC

shrutikatyal
New Contributor III

Hi,
I completed the certification through the Databricks Learning Festival, but I haven't received a voucher yet.
Thanks
