11-22-2019 01:06 PM
Spark supports dynamic partition overwrite for parquet tables by setting the config:
spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")
before writing to a partitioned table. With Delta tables, it appears you need to manually specify which partitions you are overwriting with replaceWhere:
https://docs.databricks.com/delta/delta-batch.html#overwrite-using-dataframes
df.write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "date >= '2017-01-01' AND date <= '2017-01-31'")
  .save("/delta/events")
Is there any way to overwrite partitions in a Delta table without having to manually specify which partitions should be overwritten?
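One possible workaround, sketched here in Python as an assumption rather than an official method, is to derive the replaceWhere predicate from the partition values actually present in the DataFrame being written. The helper names build_replace_where and overwrite_touched_partitions are hypothetical, and the sketch assumes a single partition column whose values render as plain strings (such as dates).

```python
def build_replace_where(column, values):
    """Build a predicate such as: date IN ('2017-01-01', '2017-01-02')."""
    # De-duplicate and sort so the generated predicate is deterministic.
    literals = sorted({str(v) for v in values})
    if not literals:
        raise ValueError("no partition values to overwrite")
    if any("'" in lit for lit in literals):
        # Quoting rules are out of scope for this sketch.
        raise ValueError("partition values containing quotes are not handled")
    quoted = ", ".join(f"'{lit}'" for lit in literals)
    return f"{column} IN ({quoted})"


def overwrite_touched_partitions(df, path, column="date"):
    """Overwrite only the partitions of `column` that appear in `df`."""
    # Collect the distinct partition values present in the incoming data.
    values = [row[0] for row in df.select(column).distinct().collect()]
    (df.write
       .format("delta")
       .mode("overwrite")
       .option("replaceWhere", build_replace_where(column, values))
       .save(path))
```

The distinct values are collected to the driver, so this is only reasonable when the number of touched partitions is modest. Usage would look like overwrite_touched_partitions(df, "/delta/events", "date").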
12-12-2019 07:09 AM
Is there an update on this? I've also noticed that the configuration spark.sql.sources.partitionOverwriteMode does not affect Delta tables. From what I understand, to overwrite partitions dynamically we are stuck with spark.databricks.optimizer.dynamicPartitionPruning, and that only applies when the partition column is used as a join key.
It would be useful to be able to use .option("partitionOverwriteMode", "dynamic") for INSERT OVERWRITE statements.
05-04-2020 02:06 AM
I am still waiting for an update on that...
06-08-2020 08:57 PM
I am facing the same issue. Is there an update/suggested solution on how we could overwrite the Delta file partitions dynamically?
08-21-2020 02:15 AM
There seems to be some reluctance to implement this:
But there is an open PR:
12-16-2020 12:57 AM
I think it would be good to create a folder/partition for every FILE_DATE, because you may not need to rewrite all files; only a specific file_date would have to be overwritten.
thetermpapers.org
12-16-2020 02:16 AM
I don't see how this answers the question. Please do not spam this forum with links that are irrelevant to the question.
11-13-2022 05:38 PM
Dynamic partition overwrite for Delta tables was implemented in Databricks Runtime 11.1.
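For reference, a minimal Python sketch of what this looks like, assuming Databricks Runtime 11.1 or above and an existing partitioned Delta table at the path used earlier in this thread. The function name overwrite_partitions_dynamically is hypothetical; partitionOverwriteMode is the writer option mentioned above.

```python
def overwrite_partitions_dynamically(df, path):
    """Replace only the partitions for which `df` contains rows."""
    # With "dynamic" mode, partitions that receive no new rows are left
    # untouched instead of being dropped by the overwrite.
    (df.write
       .format("delta")
       .mode("overwrite")
       .option("partitionOverwriteMode", "dynamic")
       .save(path))

# Assumption: the session-level setting from the original question,
#   spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
# is expected to have the same effect on supported runtimes.
```

A call would look like overwrite_partitions_dynamically(df, "/delta/events"). Unlike replaceWhere, no predicate has to be written by hand.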
08-16-2023 02:15 AM
@SamCallister wrote:
Spark supports dynamic partition overwrite for parquet tables by setting the config:
spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")
before writing to a partitioned table. With Delta tables, it appears you need to manually specify which partitions you are overwriting with replaceWhere:
https://docs.databricks.com/delta/delta-batch.html#overwrite-using-dataframes
df.write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "date >= '2017-01-01' AND date <= '2017-01-31'")
  .save("/delta/events")
Is there any way to overwrite partitions in a Delta table without having to manually specify which partitions should be overwritten?
I'm also messing around, it's hard to find tutorials. Help me