4 weeks ago
Databricks has a new archive-and-move feature in Auto Loader, and I am trying to use it on Databricks Runtime 16.4.x (Scala 2.12). However, the commit time is still coming back null, and the documentation says that if the commit time is null, this feature won't work. How can I resolve this?
I am using the Auto Loader configuration below:
3 weeks ago
@shrutikatyal I believe commit_time is only populated when the cloudFiles.cleanSource option is enabled. I don't see this option in your snippet. Could you please enable it for the read and check?
Refer to the documentation below, which specifies that the commit_time column is supported in Databricks Runtime 16.4 and above when cloudFiles.cleanSource is enabled:
https://docs.databricks.com/aws/en/sql/language-manual/functions/cloud_files_state
Note that a file may be processed but only marked as committed arbitrarily later; commit_time is usually updated at the start of the next microbatch.
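To illustrate, here is a minimal sketch of an Auto Loader configuration with cloudFiles.cleanSource enabled, which is what allows commit_time to be populated. The paths and the MOVE destination below are hypothetical placeholders, not values taken from this thread:

```python
# Sketch: Auto Loader options with cleanSource enabled so commit_time
# gets populated. All paths below are hypothetical placeholders.
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "/tmp/checkpoint",            # hypothetical
    "cloudFiles.inferColumnTypes": "true",
    # Clean up fully processed source files (Databricks Runtime 16.4+).
    "cloudFiles.cleanSource": "MOVE",                          # or "DELETE"
    # Where processed files are moved when cleanSource is MOVE.
    "cloudFiles.cleanSource.moveDestination": "/mnt/archive",  # hypothetical
}

def build_reader(spark, options, source_path):
    # Apply each option to a streaming reader; requires a live Spark session.
    reader = spark.readStream.format("cloudFiles")
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load(source_path)
```

Once the stream is running, commit_time can be inspected with `SELECT * FROM cloud_files_state('<checkpoint path>')` per the documentation linked above; rows where commit_time is still NULL are files not yet marked as committed.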
4 weeks ago
You're right: the new **archival and move feature in Auto Loader** depends on the `_commit_timestamp` column. If that value comes back `null`, the feature won't work, as mentioned in the documentation.
To fix this, make sure you explicitly enable the `commitTime` metadata column with the following option in your Auto Loader configuration:
```python
.option("cloudFiles.addColumns", "commitTime")
```
This ensures that the `_commit_timestamp` field gets populated during ingestion.
Here’s the corrected version of your code:
```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .option("multiLine", "true")
    .option("cloudFiles.backfillInterval", "10 minutes")
    .option("cloudFiles.inferColumnTypes", "true")
    .option("cloudFiles.addColumns", "commitTime")  # Required to get _commit_timestamp
    .load(ingestDirectory)
)
```
Once this is set, `_commit_timestamp` should be populated properly, and the archival/move feature should start working as expected.
3 weeks ago
Hi Yogesh,
I am getting the below error when I try to add this to my Auto Loader configuration, i.e. `.option("cloudFiles.addColumns", "commitTime")`.
3 weeks ago
Thanks, it's working now.
2 weeks ago
Hi,
I am interested in the Databricks Certified Data Engineer Associate certification. Can I get a voucher?
2 weeks ago
Hey @shrutikatyal,
I believe the only current route to a discount voucher is the following:
https://community.databricks.com/t5/events/dais-2025-virtual-learning-festival-11-june-02-july-2025/...
I think it's the last day of the event, so you might need to be quick!
Hope this helps,
TheOC
yesterday
Hi,
I completed the certification through the Databricks Learning Festival, but I haven't received a voucher yet.
Thanks