06-07-2023 09:12 PM
In my ETL case, I want to be able to adjust the table schema as needed: the number of columns may increase or decrease depending on the ETL script. At the same time, I would like to use dynamic partition overwrite to avoid the potential errors that come with the `replaceWhere` option.
However, according to the document "Selectively overwrite data with Delta Lake | Databricks on AWS", it seems that this combination is not yet supported. Is there a solution for this?
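For illustration, this is roughly the write I have in mind (a minimal sketch; the table and column names are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder data; in the real job the set of columns can change per run.
df = spark.createDataFrame(
    [("2023-06-01", 1, "a"), ("2023-06-02", 2, "b")],
    ["event_date", "id", "payload"],
)

(
    df.write.format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    # Replace only the partitions present in df; all others stay untouched.
    .option("partitionOverwriteMode", "dynamic")
    .saveAsTable("my_schema.events")
)
```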
I appreciate your support. 🙏
For your information:
Databricks Runtime: 11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12)
Accepted Solutions
06-08-2023 01:37 AM
No, and I don't see how it could work. With dynamic partition overwrite, existing logical partitions for which the write contains no data remain unchanged.
This only holds if every partition shares an identical schema, which overwriteSchema cannot guarantee.
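To make the conflict concrete, a minimal sketch (continuing the hypothetical table from the question; the exact error message may differ by runtime):

```python
# The next run produces an extra column and tries to combine a schema
# overwrite with dynamic partition overwrite.
new_df = spark.createDataFrame(
    [("2023-06-02", 2, "b", "x")],
    ["event_date", "id", "payload", "new_col"],
)

(
    new_df.write.format("delta")
    .mode("overwrite")
    .option("partitionOverwriteMode", "dynamic")
    .option("overwriteSchema", "true")  # Delta rejects this combination
    .saveAsTable("my_schema.events")
)
# A Delta table has a single schema in its transaction log. If this write
# succeeded, the untouched 2023-06-01 partition would no longer match the
# new schema, so overwriting the schema is disallowed in dynamic mode.
```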
06-09-2023 03:26 AM
Hi @Thanapat Sontayasara,
Does @Werner Stinckens's response answer your question? If so, would you mind marking it as the best answer so that other members can find the solution more quickly? If not, could you share more details?
Thanks!

