07-17-2025 04:43 AM
I am using Auto Loader to ingest JSON files into a managed table. Auto Loader saves only the first-level fields as new columns, while nested structs are stored as values within those columns.
My goal is to support schema evolution when loading new files. However, Auto Loader only seems to detect changes in top-level columns, not new fields added inside the nested structs. What are possible solutions to track and handle schema evolution for nested JSON structures?
Here's the code that I'm using:
# checkpoint_path, source_path and target_table are defined elsewhere
df = (
    spark.readStream
    .format("cloudFiles")
    # "trigger" is not a documented Auto Loader reader option;
    # the trigger is actually set on the writer below
    .option("trigger", "true")
    .option("multiLine", "false")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.inferColumnTypes", "true")
    .option("recursiveFileLookup", "true")
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .option("readerCaseSensitive", "false")
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .load(source_path)
)

(
    df.writeStream
    .format("delta")
    .option("mergeSchema", "true")
    .option("checkpointLocation", checkpoint_path)
    .outputMode("append")
    .trigger(availableNow=True)
    .toTable(target_table)
)
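
To make "tracking nested changes" concrete, here is a minimal sketch in plain Python of the comparison I have in mind. It is not part of Auto Loader. It assumes the schemas are dicts in the shape that PySpark's `StructType.jsonValue()` returns (for example from `df.schema.jsonValue()`); the helper names `field_paths` and `new_nested_fields` and the sample schemas are made up for illustration.

```python
def field_paths(node, prefix=""):
    """Collect dotted paths for every field in a schema dict
    shaped like the output of StructType.jsonValue()."""
    paths = set()
    if not isinstance(node, dict):
        return paths  # primitive type name such as "string" or "long"
    kind = node.get("type")
    if kind == "struct":
        for field in node["fields"]:
            path = f"{prefix}.{field['name']}" if prefix else field["name"]
            paths.add(path)
            paths |= field_paths(field["type"], path)
    elif kind == "array":
        # array elements share the path of the array column itself
        paths |= field_paths(node["elementType"], prefix)
    elif kind == "map":
        paths |= field_paths(node["valueType"], prefix)
    return paths


def new_nested_fields(old_schema, new_schema):
    """Return the sorted field paths present in new_schema but not in old_schema."""
    return sorted(field_paths(new_schema) - field_paths(old_schema))


def _field(name, dtype):
    # hypothetical helper that builds one StructField-style dict
    return {"name": name, "type": dtype, "nullable": True, "metadata": {}}


# Hypothetical schemas: the second one adds customer.address and an items array.
old_schema = {"type": "struct", "fields": [
    _field("id", "long"),
    _field("customer", {"type": "struct", "fields": [_field("name", "string")]}),
]}
new_schema = {"type": "struct", "fields": [
    _field("id", "long"),
    _field("customer", {"type": "struct", "fields": [
        _field("name", "string"),
        _field("address", {"type": "struct", "fields": [_field("city", "string")]}),
    ]}),
    _field("items", {"type": "array", "containsNull": True,
                     "elementType": {"type": "struct",
                                     "fields": [_field("sku", "string")]}}),
]}

added = new_nested_fields(old_schema, new_schema)
print(added)
```

With something like this, the paths added between two loads (top-level or nested) could be logged or compared against the target table's schema. What I'm missing is whether Auto Loader can surface the same information itself.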