
Dynamic Partition Overwrite for Delta Tables

SamCallister
New Contributor II

Spark supports dynamic partition overwrite for parquet tables by setting the config:

spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")

before writing to a partitioned table. With Delta tables, it appears you need to manually specify which partitions you are overwriting with replaceWhere (https://docs.databricks.com/delta/delta-batch.html#overwrite-using-dataframes):

df.write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "date >= '2017-01-01' AND date <= '2017-01-31'")
  .save("/delta/events")

Is there any way to overwrite partitions in a Delta table without having to manually specify which partitions should be overwritten?

8 REPLIES

Anonymous
Not applicable

Is there an update on this? I've also noticed that the configuration spark.sql.sources.partitionOverwriteMode does not affect Delta tables. From what I understand, to overwrite partitions dynamically we are stuck with spark.databricks.optimizer.dynamicPartitionPruning, and that only applies when the partition column is used as a join key.

It would be useful to be able to use .option("partitionOverwriteMode", "dynamic") for INSERT OVERWRITE statements.

I am still waiting for an update on that...

Rajeswari
New Contributor II

I am facing the same issue. Is there an update/suggested solution on how we could overwrite the Delta file partitions dynamically?

OndrejHavlicek
New Contributor III

There seems to be some reluctance to implement this:

https://github.com/delta-io/delta/issues/348

But there is an open PR:

https://github.com/delta-io/delta/pull/371
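Until something like that is available in your runtime, one possible workaround is to derive the replaceWhere predicate from the partition values present in the data being written, rather than typing it by hand. The sketch below is only an illustration: build_replace_where is a hypothetical helper (not part of Delta or Spark), and it assumes a single partition column whose values can be written as quoted SQL string literals.

```python
from typing import Iterable


def build_replace_where(column: str, values: Iterable[object]) -> str:
    """Build a predicate such as "date IN ('2017-01-01', '2017-01-02')".

    Hypothetical helper: `column` is the partition column name and `values`
    are the partition values found in the DataFrame about to be written.
    """
    # De-duplicate and sort so the predicate is stable across runs.
    distinct = sorted({str(v) for v in values})
    if not distinct:
        raise ValueError("no partition values to overwrite")
    # Escape single quotes by doubling them, as in SQL string literals.
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in distinct)
    return f"{column} IN ({quoted})"


# Possible usage with PySpark (assumes `df` is partitioned by "date"):
#   dates = [row["date"] for row in df.select("date").distinct().collect()]
#   (df.write.format("delta").mode("overwrite")
#      .option("replaceWhere", build_replace_where("date", dates))
#      .save("/delta/events"))
```

Collecting the distinct partition values costs an extra pass over the data, so this is probably only reasonable when the number of touched partitions is small.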

MichaelDavidson
New Contributor II

I think it would be good to create a folder/partition for every FILE_DATE, because then you would not need to rewrite all files; only the files for a specific file_date would have to be overwritten.

thetermpapers.org

I don't see how this answers the question. Please do not spam this forum with irrelevant links.

MikeLivshutz
New Contributor III

Dynamic partition overwrite was implemented in Databricks Runtime 11.1:

https://docs.databricks.com/delta/selective-overwrite.html#
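For anyone landing here later, a minimal PySpark sketch of what that looks like, assuming a runtime that supports dynamic partition overwrite for Delta and a table already partitioned on the relevant column. The function name overwrite_partitions_dynamically is just a hypothetical wrapper for illustration; the writer option itself is the one described on the linked page.

```python
def overwrite_partitions_dynamically(df, path: str) -> None:
    """Overwrite only the partitions that `df` contains data for.

    Assumes `df` is a Spark DataFrame and `path` points at an existing
    partitioned Delta table; partitions not present in `df` are left alone.
    """
    (
        df.write
        .format("delta")
        .mode("overwrite")
        # Per-write setting; takes the place of a hand-written replaceWhere.
        .option("partitionOverwriteMode", "dynamic")
        .save(path)
    )


# Possible usage:
#   overwrite_partitions_dynamically(df, "/delta/events")
```

The same behaviour can apparently also be switched on for the whole session with spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic"), which is the config from the original question.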

alijen
New Contributor II

@SamCallister wrote:

 

Spark supports dynamic partition overwrite for parquet tables by setting the config:

spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")

before writing to a partitioned table. With Delta tables, it appears you need to manually specify which partitions you are overwriting with replaceWhere (https://docs.databricks.com/delta/delta-batch.html#overwrite-using-dataframes):

df.write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "date >= '2017-01-01' AND date <= '2017-01-31'")
  .save("/delta/events")

Is there any way to overwrite partitions in a Delta table without having to manually specify which partitions should be overwritten?


I'm also experimenting with this, and it's hard to find tutorials. Any help would be appreciated.
