Auto Loader: Empty fields (discovery_time, commit_time, archive_time) in cloud_files_state
12-30-2022 01:28 AM
Hi!
Why are the fields discovery_time, commit_time, and archive_time NULL in cloud_files_state?
Do I need to configure anything when creating my Auto Loader?
df = spark.readStream.format("cloudFiles") \
.option("cloudFiles.format", "json") \
.option("cloudFiles.tenantId", tenantId) \
.option("cloudFiles.clientId", clientId) \
.option("cloudFiles.clientSecret", clientSecret) \
.option("cloudFiles.resourceGroup", resourceGroup) \
.option("cloudFiles.subscriptionId", subscriptionId) \
.option("cloudFiles.useNotifications", "true") \
.option("cloudFiles.includeExistingFiles", "true") \
.option("cloudFiles.schemaLocation", checkpoint_path) \
.option("cloudFiles.schemaEvolutionMode", "rescue") \
.option("recursiveFileLookup", "true") \
.option("badRecordsPath", bad_records_path) \
.option("multiLine", "true") \
.schema(dfSchema.schema) \
.load(sourceDir)
#Transforming dataframe stream...
df6.writeStream \
.format("delta") \
.foreachBatch(upsertToDelta) \
.option("checkpointLocation", checkpoint_path) \
.outputMode("update") \
.start(targetDir) #target folder
Labels: Autoloader, Cloud_files_state
01-02-2023 08:03 AM
- Please make sure the DBR version is 10.5 or higher.
- commit_time and archive_time can be NULL, but discovery_time is declared NOT NULL in the table definition, so a NULL there is a bit strange. Please try changing the DBR version first.
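
To see which of the three fields are actually NULL, and for how many files, it may help to count them directly against the checkpoint. The following is only a rough sketch: the helper name null_count_query and the constant STATE_FIELDS are made up for illustration, and it assumes the checkpoint path contains no single quotes and that cloud_files_state is available on your runtime.

```python
# Fields named in the question; commit_time and archive_time may legitimately be NULL.
STATE_FIELDS = ("discovery_time", "commit_time", "archive_time")


def null_count_query(checkpoint_path: str) -> str:
    """Build a SQL query that counts NULLs per field in cloud_files_state."""
    if "'" in checkpoint_path:
        # Keep the sketch simple: refuse paths that would need SQL escaping.
        raise ValueError("checkpoint path must not contain a single quote")
    null_counts = ", ".join(
        f"SUM(CASE WHEN {field} IS NULL THEN 1 ELSE 0 END) AS {field}_nulls"
        for field in STATE_FIELDS
    )
    return (
        f"SELECT COUNT(*) AS files, {null_counts} "
        f"FROM cloud_files_state('{checkpoint_path}')"
    )


# In a Databricks notebook, where `spark` and `checkpoint_path` already exist:
# spark.sql(null_count_query(checkpoint_path)).show()
```

If discovery_time_nulls comes back non-zero, that points at the runtime version rather than at the stream options.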
My blog: https://databrickster.medium.com/
01-02-2023 08:09 AM