Dynamic Partition Overwrite for Delta Tables
11-22-2019 01:06 PM
Spark supports dynamic partition overwrite for parquet tables by setting the config:
spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")
before writing to a partitioned table. With Delta tables it appears you need to manually specify which partitions you are overwriting with replaceWhere:
https://docs.databricks.com/delta/delta-batch.html#overwrite-using-dataframes
df.write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "date >= '2017-01-01' AND date <= '2017-01-31'")
  .save("/delta/events")
Is there any way to overwrite partitions in a Delta table without having to manually specify which partitions should be overwritten?
- Labels: Delta table, Overwrite, Partitions
12-12-2019 07:09 AM
Is there an update on this? I've also noticed that the configuration spark.sql.sources.partitionOverwriteMode does not affect Delta tables. From what I can tell, to overwrite partitions dynamically we are stuck with spark.databricks.optimizer.dynamicPartitionPruning, and only when the partition column is used as a join key.
It would be useful to be able to use .option("partitionOverwriteMode", "dynamic") with the INSERT OVERWRITE statement.
05-04-2020 02:06 AM
I am still waiting for an update on that...
06-08-2020 08:57 PM
I am facing the same issue. Is there an update/suggested solution on how we could overwrite the Delta file partitions dynamically?
08-21-2020 02:15 AM
There seems to be some reluctance to implement this:
But there is an open PR:
12-16-2020 12:57 AM
I think it would be good for you to create a folder/partition for every FILE_DATE, because then you may not need to rewrite all the files; only a specific FILE_DATE would have to be overwritten.
12-16-2020 02:16 AM
I don't see how this answers the question. Please do not spam this forum with links irrelevant to the question.
11-13-2022 05:38 PM
Dynamic partition overwrite was implemented in Databricks Runtime 11.1.
08-16-2023 02:15 AM
@SamCallister wrote:
Spark supports dynamic partition overwrite for parquet tables by setting the config:
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
before writing to a partitioned table. With Delta tables it appears you need to manually specify which partitions you are overwriting with replaceWhere:
https://docs.databricks.com/delta/delta-batch.html#overwrite-using-dataframes
df.write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "date >= '2017-01-01' AND date <= '2017-01-31'")
  .save("/delta/events")
Is there any way to overwrite partitions in a Delta table without having to manually specify which partitions should be overwritten?
I'm also experimenting with this and finding it hard to locate tutorials. Can anyone help?