Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Auto Loader: Empty fields (discovery_time, commit_time, archive_time) in cloud_files_state

Benji0934
New Contributor II

Hi!

Why are the fields discovery_time, commit_time, and archive_time NULL in cloud_files_state?

Do I need to configure anything when creating my Auto Loader?

df = spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", "json") \
    .option("cloudFiles.tenantId", tenantId) \
    .option("cloudFiles.clientId", clientId) \
    .option("cloudFiles.clientSecret", clientSecret) \
    .option("cloudFiles.resourceGroup", resourceGroup) \
    .option("cloudFiles.subscriptionId", subscriptionId) \
    .option("cloudFiles.useNotifications", "true") \
    .option("cloudFiles.includeExistingFiles", "true") \
    .option("cloudFiles.schemaLocation", checkpoint_path) \
    .option("cloudFiles.schemaEvolutionMode", "rescue") \
    .option("recursiveFileLookup", "true") \
    .option("badRecordsPath", bad_records_path) \
    .option("multiLine", "true") \
    .schema(dfSchema.schema) \
    .load(sourceDir)
 
#Transforming dataframe stream...
 
df6.writeStream \
    .format("delta") \
    .foreachBatch(upsertToDelta) \
    .option("checkpointLocation", checkpoint_path) \
    .outputMode("update") \
    .start(targetDir) #target folder


Hubert-Dudek
Databricks MVP
  • Please make sure that the DBR version is 10.5 or higher.
  • commit_time and archive_time can be NULL, but discovery_time is declared NOT NULL in the table definition, so a NULL value there is a bit strange. Please try changing the DBR version first.
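
To see how widespread the NULLs are, it may help to count them per column. Below is a minimal sketch, assuming cloud_files_state is called as a table-valued function that takes the stream's checkpoint location; the helper name build_state_null_check is hypothetical, and checkpoint_path is the variable from the question.

```python
def build_state_null_check(checkpoint_path: str) -> str:
    """Build SQL that counts NULL timestamps in Auto Loader's file state.

    Hypothetical helper for illustration; it only assembles the query text.
    """
    if "'" in checkpoint_path:
        # Keep the sketch simple: refuse paths that would break the SQL literal.
        raise ValueError("checkpoint_path must not contain single quotes")
    columns = ["discovery_time", "commit_time", "archive_time"]
    # One NULL counter per timestamp column named in the question.
    null_counts = ", ".join(
        f"SUM(CASE WHEN {c} IS NULL THEN 1 ELSE 0 END) AS null_{c}"
        for c in columns
    )
    return (
        f"SELECT COUNT(*) AS files, {null_counts} "
        f"FROM cloud_files_state('{checkpoint_path}')"
    )


# On a Databricks cluster, where `spark` is predefined (assumption):
# spark.sql(build_state_null_check(checkpoint_path)).show()
```

If null_discovery_time equals files, every tracked file is affected, not just files ingested before a runtime upgrade.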

My blog: https://databrickster.medium.com/

Hi Hubert,

Thank you for your reply 🙂

The DBR version is 11.3. And yes, it is indeed very strange.
