Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Dynamic Partition Overwrite for Delta Tables

SamCallister
New Contributor II

Spark supports dynamic partition overwrite for parquet tables by setting the config:

spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")

before writing to a partitioned table. With Delta tables, it appears you need to manually specify which partitions you are overwriting with

replaceWhere
https://docs.databricks.com/delta/delta-batch.html#overwrite-using-dataframes

df.write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "date >= '2017-01-01' AND date <= '2017-01-31'")
  .save("/delta/events")

Is there any way to overwrite partitions in a Delta table without having to manually specify which partitions should be overwritten?
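
For reference, this is a rough sketch of the Parquet-style dynamic partition overwrite described above (the path and the "date" partition column are placeholders, not from any particular table); only the partitions present in the incoming DataFrame get replaced:

# Parquet tables: only partitions present in df are overwritten
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

(df.write
   .format("parquet")
   .mode("overwrite")
   .partitionBy("date")
   .save("/data/events_parquet"))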


Anonymous
Not applicable

Is there an update on this? I've also noticed that the configuration spark.sql.sources.partitionOverwriteMode does not affect Delta tables. From what I can understand, the only dynamic partition handling available is spark.databricks.optimizer.dynamicPartitionPruning, and that only applies when the partition column is used as a join key.

It would be useful to be able to use .option("partitionOverwriteMode", "dynamic") with the INSERT OVERWRITE statement.

I am still waiting for an update on that...

Rajeswari
New Contributor II

I am facing the same issue. Is there an update or a suggested solution for overwriting Delta table partitions dynamically?
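
One workaround that can be used until native support is available is to build the replaceWhere predicate from the incoming DataFrame itself, so the partitions to overwrite don't have to be hard-coded. A rough sketch, assuming a single partition column named date (adapt the column name, quoting, and path to your table):

# Collect the partition values present in the incoming data
dates = [row["date"] for row in df.select("date").distinct().collect()]
predicate = "date IN ({})".format(", ".join("'{}'".format(d) for d in dates))

# Overwrite only those partitions
(df.write
   .format("delta")
   .mode("overwrite")
   .option("replaceWhere", predicate)
   .save("/delta/events"))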

OndrejHavlicek
New Contributor III

There seems to be some reluctance to implement this:

https://github.com/delta-io/delta/issues/348

But there is an open PR:

https://github.com/delta-io/delta/pull/371

MichaelDavidson
New Contributor II

I think it would be good for you to create a folder/partition for every FILE_DATE, because you may not need to rewrite all files; only a specific file_date would have to be overwritten.
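
As a rough illustration of that suggestion (a sketch only; file_date comes from the post above, the path and the updates DataFrame are placeholders), partitioning the table by file_date lets you replace a single date with replaceWhere:

# Initial write, partitioned by file_date
(df.write
   .format("delta")
   .partitionBy("file_date")
   .save("/delta/events_by_file_date"))

# Later, replace only one file_date partition
# (updates should contain only rows matching the predicate)
(updates.write
   .format("delta")
   .mode("overwrite")
   .option("replaceWhere", "file_date = '2020-03-01'")
   .save("/delta/events_by_file_date"))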


MikeLivshutz
New Contributor III

Dynamic partition overwrite for Delta tables was implemented in Databricks Runtime 11.1:

https://docs.databricks.com/delta/selective-overwrite.html#
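
Based on that documentation, usage looks roughly like this (a sketch; the path and the requirement that the table be partitioned are assumptions you should check against your setup, and it needs Databricks Runtime 11.1+ or a Delta Lake release with dynamic partition overwrite support):

# Option 1: set the session configuration before overwriting
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

# Option 2: set it per write via the DataFrameWriter option
(df.write
   .format("delta")
   .mode("overwrite")
   .option("partitionOverwriteMode", "dynamic")
   .save("/delta/events"))

Either way, only the partitions that actually appear in df are replaced; the rest of the table is left untouched.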

alijen
New Contributor II

@SamCallister wrote:

 

Spark supports dynamic partition overwrite for parquet tables by setting the config:

spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")

before writing to a partitioned table. With Delta tables, it appears you need to manually specify which partitions you are overwriting with

replaceWhere

https://docs.databricks.com/delta/delta-batch.html#overwrite-using-dataframes

df.write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "date >= '2017-01-01' AND date <= '2017-01-31'")
  .save("/delta/events")

Is there any way to overwrite partitions in a Delta table without having to manually specify which partitions should be overwritten?


I'm also struggling with this and finding it hard to locate tutorials. Any help would be appreciated.
