Databricks Community

Benji0934 · 12-30-2022

Hi!

Why are the fields discovery_time, commit_time, and archive_time NULL in cloud_files_state?

Do I need to configure anything when creating my Auto Loader?

df = spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", "json") \
    .option("cloudFiles.tenantId", tenantId) \
    .option("cloudFiles.clientId", clientId) \
    .option("cloudFiles.clientSecret", clientSecret) \
    .option("cloudFiles.resourceGroup", resourceGroup) \
    .option("cloudFiles.subscriptionId", subscriptionId) \
    .option("cloudFiles.useNotifications", "true") \
    .option("cloudFiles.includeExistingFiles", "true") \
    .option("cloudFiles.schemaLocation", checkpoint_path) \
    .option("cloudFiles.schemaEvolutionMode", "rescue") \
    .option("recursiveFileLookup", "true") \
    .option("badRecordsPath", bad_records_path) \
    .option("multiLine", "true") \
    .schema(dfSchema.schema) \
    .load(sourceDir)
 
#Transforming dataframe stream...
 
df6.writeStream \
    .format("delta") \
    .foreachBatch(upsertToDelta) \
    .option("checkpointLocation", checkpoint_path) \
    .outputMode("update") \
    .start(targetDir) #target folder

Hubert-Dudek · 01-02-2023

Please make sure that the DBR version is 10.5 or higher.
commit_time and archive_time can be NULL, but discovery_time is declared NOT NULL in the table definition, so a NULL there is a bit strange. Please change the DBR version first.
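
For anyone checking this in a notebook, here is a minimal sketch of the two steps above: confirming the runtime version and then querying cloud_files_state. The helper names runtime_at_least and state_query are hypothetical and only for illustration. It assumes the DATABRICKS_RUNTIME_VERSION environment variable is set on the cluster, that spark is the session predefined in Databricks notebooks, and that checkpoint_path is the same checkpoint location passed to writeStream in the question.

```python
import os


def runtime_at_least(version, minimum=(10, 5)):
    """Return True if a 'major.minor' runtime string is >= minimum.

    Hypothetical helper; returns False for strings it cannot parse.
    """
    try:
        major, minor = version.split(".")[:2]
        return (int(major), int(minor)) >= minimum
    except ValueError:
        return False


def state_query(checkpoint_path):
    """Build the SQL that reads Auto Loader state for one checkpoint (hypothetical helper)."""
    return f"SELECT * FROM cloud_files_state('{checkpoint_path}')"


# Assumption: this variable is set on Databricks clusters and absent elsewhere,
# so the block below is skipped when run outside Databricks.
dbr = os.environ.get("DATABRICKS_RUNTIME_VERSION", "")
if dbr:
    print("DBR", dbr, "ok" if runtime_at_least(dbr) else "older than 10.5")
    # `spark` and `checkpoint_path` come from the notebook, not from this snippet.
    spark.sql(state_query(checkpoint_path)).show(truncate=False)
```

If the version check reports an older runtime, upgrading the cluster before looking further at the NULL columns is the first thing to try.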