3 weeks ago - last edited 3 weeks ago
I ran into an unexpected error while writing a DataFrame to a managed table:

```python
df.write \
    .mode("append") \
    .option("overwriteSchema", "true") \
    .option("mergeSchema", "true") \
    .saveAsTable(bsr_sink_table_fqn)
```
Instead of appending, Spark attempted to create the table again and failed with:

```
[TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view `customer__castan__production`.`insights`.`used_bsr_insights` because it already exists.
Choose a different name, drop the existing object, add the IF NOT EXISTS clause, ...
SQLSTATE: 42P07
```
The table did get created, but it was left empty.
Why this is confusing:
- The `.mode("append")` call should append to the table if it exists, not trigger a CREATE TABLE.
- The presence of `.option("overwriteSchema", "true")` and `.option("mergeSchema", "true")` may be causing Databricks to attempt a DDL operation on the existing table, which conflicts with append mode.
Questions for the community:
1. Is this expected behavior when combining append with overwriteSchema/mergeSchema?
2. Could concurrent writers to the same target table trigger this error even in append mode?
3. Is there a recommended way to safely append with schema evolution without hitting this TABLE_OR_VIEW_ALREADY_EXISTS error?
3 weeks ago - last edited 3 weeks ago
Hi @Mikkel ,
I believe this is expected behaviour. Check the following excerpt from Jacek Laskowski's Delta Lake internals book.
It seems that setting overwriteSchema to true wipes out the old schema and lets you create a completely new table, so it doesn't make sense to use this option together with append; it should be used with overwrite mode.
And yes, if multiple concurrent writers try to write to the same target table, you can encounter this error even in append mode, because each of them will effectively try to recreate the table.
3 weeks ago
Hi @Mikkel,
I was testing this myself before posting and ran into exactly what @szymon_dybczak described. overwriteSchema basically tries to recreate the table even when you’re using append. If all you need is schema evolution, stick with mergeSchema — it only supports additive changes like adding new columns. For anything else (renames, type changes, etc.), you’ll need to run an ALTER TABLE first and then append.
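To make the "additive vs. non-additive" distinction concrete, here is a minimal sketch (plain Python, no Spark required; `is_additive_change` and the dict-based schema model are my own illustration, not a Spark API) of the decision that determines which path to take:

```python
# Hypothetical helper: decide whether a target-table schema change is purely
# additive (safe for mergeSchema + append) or needs an ALTER TABLE first.
# Schemas are modeled here as simple {column_name: type_string} dicts.

def is_additive_change(old_schema: dict, new_schema: dict) -> bool:
    """True if new_schema only adds columns and leaves existing ones untouched."""
    return all(new_schema.get(col) == typ for col, typ in old_schema.items())

old = {"id": "bigint", "price": "double"}

# Adding a column: mergeSchema can evolve the table during the append, e.g.
#   df.write.mode("append").option("mergeSchema", "true").saveAsTable(...)
print(is_additive_change(old, {"id": "bigint", "price": "double", "rank": "int"}))  # True

# Changing a column's type: run an ALTER TABLE on the target first, then append.
print(is_additive_change(old, {"id": "bigint", "price": "decimal(10,2)"}))  # False
```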
Thanks for sharing, @szymon_dybczak.
3 weeks ago
Thanks @szymon_dybczak and @WiliamRosa for the reply. I will probably keep mergeSchema then. If the options are incompatible or might cause issues I wish there was a warning of sorts. 🙂
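Since Spark itself doesn't emit such a warning, a pre-flight check along these lines could be run before calling `saveAsTable`. This is a hedged sketch: `check_write_options` and the `INCOMPATIBLE` table are invented names, not part of any Spark or Databricks API.

```python
import warnings

# Hypothetical guard (not a Spark API): flag writer options that conflict
# with the chosen save mode before the write is attempted.
INCOMPATIBLE = {
    # overwriteSchema implies recreating the table, so it only makes
    # sense together with overwrite mode.
    "overwriteSchema": {"overwrite"},
}

def check_write_options(mode: str, options: dict) -> list:
    """Return (and warn about) options that conflict with the given mode."""
    conflicts = []
    for opt, allowed_modes in INCOMPATIBLE.items():
        if options.get(opt, "false").lower() == "true" and mode not in allowed_modes:
            warnings.warn(
                f"Option {opt!r} is only meaningful with modes "
                f"{sorted(allowed_modes)}, not {mode!r}"
            )
            conflicts.append(opt)
    return conflicts

# The combination from the original post would be flagged:
print(check_write_options("append", {"overwriteSchema": "true", "mergeSchema": "true"}))  # ['overwriteSchema']

# A plain mergeSchema append passes cleanly:
print(check_write_options("append", {"mergeSchema": "true"}))  # []
```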