With Auto Loader configured with
.option("cloudFiles.schemaEvolutionMode", "addNewColumns")
I retried the stream after it failed with:
org.apache.spark.sql.catalyst.util.UnknownFieldException: [UNKNOWN_FIELD_EXCEPTION.NEW_FIELDS_IN_FILE]
Encountered unknown fields during parsing:
[test 1_2 Prime, test 1_2 Redundant, test 1_4 Prime, test 1_4 Redundant], which can be fixed by an automatic retry: true
The data is successfully written to the target Delta table and the new columns are added. However, the table also contains an extra column:
timestamptest_1_1_primetest_1_1_redundanttest_1_2_primetest_1_2_redundanttest_1_3_primetest_1_3_redundanttest_1_4_primetest_1_4_redundant:string
Why is this extra column added, and how can it be avoided?
Note that before calling df.writeStream, the code uses df.toDF() to rename the columns.
In summary, the pipeline is: readStream, rename columns, writeStream.
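For context, a minimal sketch of the pipeline described above. The paths, checkpoint/schema locations, target table name, source file format, and the renamed column list are placeholder assumptions, not taken from the actual job; the column names are only inferred from the error message and the extra column's name. This requires a Databricks runtime (the cloudFiles source is Databricks-specific), so it is illustrative rather than locally runnable:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Assumed target column names, guessed from the extra column's name in the post.
new_names = [
    "timestamp",
    "test_1_1_prime", "test_1_1_redundant",
    "test_1_2_prime", "test_1_2_redundant",
    "test_1_3_prime", "test_1_3_redundant",
    "test_1_4_prime", "test_1_4_redundant",
]

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")                   # assumed source format
    .option("cloudFiles.schemaLocation", "/tmp/schema")   # placeholder path
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
    .load("/tmp/source")                                  # placeholder path
)

# Rename every column; toDF(*names) requires exactly one name per
# existing column, so a schema-evolved input with extra columns will
# no longer match this fixed list.
df = df.toDF(*new_names)

(
    df.writeStream
    .option("checkpointLocation", "/tmp/checkpoint")      # placeholder path
    .option("mergeSchema", "true")                        # allow new columns in the Delta sink
    .trigger(availableNow=True)
    .toTable("target_table")                              # placeholder table name
)
```

Note that this sketch uses df.toDF(*new_names) with an explicit name list, since df.toDF() with no arguments would leave the column names unchanged; the exact rename call in the original job is not shown in the post.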