Data Engineering

Auto Loader: Empty fields (discovery_time, commit_time, archive_time) in cloud_files_state

Benji0934
New Contributor II

Hi!

Why are the fields discovery_time, commit_time, and archive_time NULL in cloud_files_state?

Do I need to configure anything when creating my Auto Loader?
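
For reference, this is roughly how the ingestion state is being inspected (a minimal sketch; it assumes the cloud_files_state table-valued function is pointed at the same checkpoint_path used for the stream below):

# Inspect Auto Loader's file-level state for this stream's checkpoint.
# checkpoint_path is the same variable used in the readStream chain below.
spark.sql(
    f"SELECT path, discovery_time, commit_time, archive_time "
    f"FROM cloud_files_state('{checkpoint_path}')"
).show(truncate=False)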

# Auto Loader source in file notification mode, reading JSON from sourceDir
df = spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", "json") \
    .option("cloudFiles.tenantId", tenantId) \
    .option("cloudFiles.clientId", clientId) \
    .option("cloudFiles.clientSecret", clientSecret) \
    .option("cloudFiles.resourceGroup", resourceGroup) \
    .option("cloudFiles.subscriptionId", subscriptionId) \
    .option("cloudFiles.useNotifications", "true") \
    .option("cloudFiles.includeExistingFiles", "true") \
    .option("cloudFiles.schemaLocation", checkpoint_path) \
    .option("cloudFiles.schemaEvolutionMode", "rescue") \
    .option("recursiveFileLookup", "true") \
    .option("badRecordsPath", bad_records_path) \
    .option("multiLine", "true") \
    .schema(dfSchema.schema) \
    .load(sourceDir)
 
# Transformations on df that eventually produce df6 ...
 
df6.writeStream \
    .format("delta") \
    .foreachBatch(upsertToDelta) \
    .option("checkpointLocation", checkpoint_path) \
    .outputMode("update") \
    .start(targetDir) # target folder
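
The upsertToDelta function isn't shown above; for completeness, a typical foreachBatch upsert looks roughly like the sketch below (the merge key column "id" and the reuse of targetDir are assumptions for illustration, not the actual implementation):

from delta.tables import DeltaTable

def upsertToDelta(microBatchDF, batchId):
    # Merge each micro-batch into the target Delta table.
    # The join key "id" is a placeholder; replace it with the real key column(s).
    target = DeltaTable.forPath(microBatchDF.sparkSession, targetDir)
    (target.alias("t")
        .merge(microBatchDF.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())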


Hubert-Dudek
Esteemed Contributor III
  • Please make sure the DBR version is 10.5 or higher (a quick check is sketched below).
  • commit_time and archive_time can legitimately be NULL, but discovery_time is declared NOT NULL in the table definition, so seeing it empty is a bit strange. Please check the DBR version first.
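
One quick way to confirm the runtime version from a notebook (a sketch; it relies on the DATABRICKS_RUNTIME_VERSION environment variable that Databricks sets on cluster nodes):

import os

# Prints e.g. "11.3" on a DBR 11.3 cluster; None if not running on Databricks.
print(os.environ.get("DATABRICKS_RUNTIME_VERSION"))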

Hi Hubert,

Thank you for your reply 🙂

The DBR version is 11.3, and yes, it is indeed very strange.

[image attachment]
