Schema evolution for nested JSON files with Auto Loader

yit · 07-17-2025

I am using Auto Loader to ingest JSON files into a managed table. Auto Loader saves only the first-level fields as new columns, while nested structs are stored as values within those columns.

My goal is to support schema evolution when loading new files. However, Auto Loader only seems to detect changes in top-level columns. What are possible solutions to track and handle schema evolution for fields nested inside JSON structs?

Here's the code that I'm using:

# Read JSON files incrementally with Auto Loader.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.inferColumnTypes", "true")
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .option("multiLine", "false")
    .option("recursiveFileLookup", "true")
    .option("readerCaseSensitive", "false")
    # The trigger is configured on writeStream, not as a reader option.
    .load(source_path)
)

# Append to the Delta table, letting new columns merge into the table schema.
(
    df.writeStream
    .format("delta")
    .option("mergeSchema", "true")
    .option("checkpointLocation", checkpoint_path)
    .outputMode("append")
    .trigger(availableNow=True)  # process all available files, then stop
    .toTable(target_table)
)
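
To make the nested case concrete, here is a minimal plain-Python sketch of the kind of change I want to track. It is not something Auto Loader provides: `field_paths` is a hypothetical helper and the two sample records are made up. It flattens each record into dotted field paths and diffs the two sets, so a field added inside a struct shows up alongside a new top-level field.

```python
import json


def field_paths(value, prefix=""):
    """Return the set of dotted paths for every leaf field in a parsed JSON value."""
    if isinstance(value, dict) and value:
        paths = set()
        for key, child in value.items():
            paths |= field_paths(child, f"{prefix}.{key}" if prefix else key)
        return paths
    if isinstance(value, list) and value:
        # Arrays add no path segment of their own; look inside their elements.
        paths = set()
        for item in value:
            paths |= field_paths(item, prefix)
        return paths
    # Scalars, nulls, and empty containers count as a leaf at the current path.
    return {prefix} if prefix else set()


# Made-up sample records: the second adds a nested field and a top-level field.
old_record = json.loads(
    '{"id": 1, "customer": {"name": "a", "address": {"city": "x"}}}'
)
new_record = json.loads(
    '{"id": 2, "customer": {"name": "b", "address": {"city": "y", "zip": "1"}},'
    ' "tags": ["t"]}'
)

added = sorted(field_paths(new_record) - field_paths(old_record))
print(added)
```

Here `customer.address.zip` is the nested addition I would like to see handled the same way as the new top-level `tags` field.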