11-22-2019 01:06 PM
Spark supports dynamic partition overwrite for parquet tables by setting the config:
spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")
before writing to a partitioned table. With Delta tables, it appears you need to manually specify which partitions you are overwriting with
replaceWhere
https://docs.databricks.com/delta/delta-batch.html#overwrite-using-dataframes
df.write
.format("delta")
.mode("overwrite")
.option("replaceWhere", "date >= '2017-01-01' AND date <= '2017-01-31'")
.save("/delta/events")
Is there any way to overwrite partitions in a Delta table without having to manually specify which partitions should be overwritten?
12-12-2019 07:09 AM
Is there an update on this? I've also noticed that the configuration spark.sql.sources.partitionOverwriteMode does not affect Delta tables. From what I understand, to overwrite partitions dynamically we are stuck with spark.databricks.optimizer.dynamicPartitionPruning, but only when used as a join key.
It would be useful to be able to use .option("partitionOverwriteMode", "dynamic") for INSERT OVERWRITE statements.
05-04-2020 02:06 AM
I am still waiting for an update on that...
06-08-2020 08:57 PM
I am facing the same issue. Is there an update/suggested solution on how we could overwrite the Delta file partitions dynamically?
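One possible workaround while replaceWhere is the only option is to derive the predicate from the data being written instead of typing it by hand. The sketch below is not from the Databricks docs: build_replace_where and overwrite_touched_partitions are hypothetical helper names, the partition column is assumed to be a single string-like column called date, and it assumes replaceWhere accepts an IN predicate on that column.

```python
def build_replace_where(column, values):
    """Build a replaceWhere predicate covering exactly the given partition values."""
    # De-duplicate and sort so the predicate is stable between runs.
    distinct = sorted({str(v) for v in values})
    if not distinct:
        raise ValueError("no partition values to overwrite")
    # Escape single quotes so the values are safe inside SQL string literals.
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in distinct)
    return f"{column} IN ({quoted})"


def overwrite_touched_partitions(df, path, column="date"):
    """Overwrite only the partitions that appear in df (hypothetical helper)."""
    # Collect the distinct partition values present in the incoming DataFrame.
    values = [row[0] for row in df.select(column).distinct().collect()]
    (df.write
       .format("delta")
       .mode("overwrite")
       .option("replaceWhere", build_replace_where(column, values))
       .save(path))
```

Collecting the distinct values pulls them to the driver, so this only makes sense when a write touches a modest number of partitions.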
08-21-2020 02:15 AM
There seems to be some reluctance to implement this:
But there is an open PR:
12-16-2020 12:57 AM
I think it would be good to create a folder/partition for every FILE_DATE, because you may not need to rewrite all files; only a specific file_date would have to be overwritten.
thetermpapers.org
12-16-2020 02:16 AM
I don't see how this answers the question. Please do not spam this forum with question-irrelevant links.
11-13-2022 05:38 PM
Dynamic partition overwrite for Delta tables was implemented in Databricks Runtime 11.1.
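For readers landing here, a minimal sketch of what the write from the original question might look like with dynamic partition overwrite. It assumes a runtime where Delta honors the partitionOverwriteMode writer option, and that the table at the target path already exists and is partitioned; overwrite_dynamic is a hypothetical wrapper name used only for illustration.

```python
def overwrite_dynamic(df, path):
    """Overwrite only the partitions that df contains data for (sketch)."""
    # Assumption: the Delta table at `path` is already partitioned, so only
    # partitions receiving new rows are replaced; all others are left as-is.
    (df.write
       .format("delta")
       .mode("overwrite")
       .option("partitionOverwriteMode", "dynamic")
       .save(path))

# Alternative (also an assumption about your runtime): set it session-wide with
# spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
```

Usage would be overwrite_dynamic(df, "/delta/events"), with no replaceWhere predicate needed.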
08-16-2023 02:15 AM
@SamCallister wrote:
Spark supports dynamic partition overwrite for parquet tables by setting the config:
spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")
before writing to a partitioned table. With Delta tables, it appears you need to manually specify which partitions you are overwriting with
replaceWhere
https://docs.databricks.com/delta/delta-batch.html#overwrite-using-dataframes
df.write
.format("delta")
.mode("overwrite")
.option("replaceWhere", "date >= '2017-01-01' AND date <= '2017-01-31'")
.save("/delta/events")
Is there any way to overwrite partitions in a Delta table without having to manually specify which partitions should be overwritten?
I'm also messing around, it's hard to find tutorials. Help me