
Declarative Pipelines - Dynamic Overwrite

ismaelhenzel
Contributor III

Regarding the limitations of declarative pipelines, specifically the inability to use replaceWhere: I discovered through testing that materialized views actually support dynamic partition overwrites. This covers several scenarios where replaceWhere would typically be used. MERGE is an option, but there are cases where no primary key exists and you need to overwrite data based on a specific column, such as a date. Surprisingly, dynamic overwrite works perfectly in materialized views for this purpose.
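For reference, this is the kind of write replaceWhere handles on a plain Delta table outside a pipeline; a minimal sketch (the table name, predicate, and df are illustrative):

# Plain Delta batch write; not allowed inside a pipeline definition.
# `df` holds the replacement rows for the target date.
(df.write.format("delta")
    .mode("overwrite")
    .option("replaceWhere", "Data = '2025-01-01'")
    .saveAsTable("TESTE"))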

Here is my code example:

from pyspark import pipelines as dp
from pyspark.sql import functions as F
from datetime import date, timedelta

@dp.table(
    name="TESTE",
    partition_cols=["Data"],
    spark_conf={
        # Only the partitions present in the returned DataFrame are replaced.
        "spark.sql.sources.partitionOverwriteMode": "dynamic"
    }
)
def teste():
    data_hoje = date.today()
    data_ontem = data_hoje - timedelta(days=1)

    dados_mock = [
        (data_hoje, "Linha A", 100.0, "D"),
        (data_hoje, "Linha B", 150.0, "N"),
        (data_hoje, "Linha A", 200.0, "D"),
        (data_hoje, "Linha B", 250.0, "N")
    ]

    # Create the DataFrame
    df = spark.createDataFrame(dados_mock, ["Data", "Linha", "QuantidadeProduzida", "Turno"])

    # Ensure the Data column is of type date
    return df.withColumn("Data", F.col("Data").cast("date"))

In the first run, the MV saved four records, split across the data_hoje and data_ontem partitions. Afterward, I modified dados_mock so that it contained only four records for data_hoje (the version shown above). As a result, the MV ended up with the expected six records: it replaced the two existing data_hoje records and added two new ones, leaving the data_ontem records untouched. I also tested adding partitions that didn't exist yet, and they were inserted correctly. My question is: how unusual is it to use this approach in a materialized view? I realize a full refresh would wipe this data, but the same applies to streaming tables in declarative pipelines. Has anyone else used this dynamic overwrite approach in declarative pipeline MVs?
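For reference, the first-run dados_mock was along these lines (reconstructed here for illustration):

# First run: two rows in the data_ontem partition, two in data_hoje.
# The second run then overwrote only the data_hoje partition with four
# rows, leaving the two data_ontem rows untouched: 2 + 4 = 6 records.
dados_mock = [
    (data_ontem, "Linha A", 100.0, "D"),
    (data_ontem, "Linha B", 150.0, "N"),
    (data_hoje, "Linha A", 200.0, "D"),
    (data_hoje, "Linha B", 250.0, "N")
]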

 

1 REPLY

omsingh
New Contributor III

This is a really interesting find, and honestly not something most people expect from materialized views.

Under the hood, MVs in Databricks declarative pipelines are still Delta tables. So when you set partitionOverwriteMode=dynamic and partition by a column like a date, Delta just does what it's supposed to do: it overwrites only the partitions present in the write, keeps the rest, and inserts new ones if they don't exist.
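You can reproduce the same behavior with a plain Delta batch write outside a pipeline; a minimal sketch, assuming a Delta table TESTE partitioned by Data already exists:

from datetime import date

# One row for today's partition; all other partitions stay as they are.
df = spark.createDataFrame(
    [(date.today(), "Linha A", 100.0, "D")],
    ["Data", "Linha", "QuantidadeProduzida", "Turno"]
)

# Dynamic mode: only the partitions present in `df` are replaced.
(df.write.format("delta")
    .mode("overwrite")
    .option("partitionOverwriteMode", "dynamic")
    .saveAsTable("TESTE"))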

That's why your results make sense:

- Rows for the same date get replaced
- New rows show up correctly
- Other dates are untouched

Most teams don't use MVs this way because people usually think of them as "derived and recomputed," not something that behaves like a controlled overwrite table. Since replaceWhere isn't allowed, folks normally jump straight to MERGE or streaming tables.

But in cases like yours, where there's no real primary key and a clear overwrite boundary like a date, this feels like a very reasonable solution. It's simpler than MERGE and still gives you idempotent behavior.
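For comparison, the MERGE route needs a key that uniquely identifies rows; a sketch, assuming a hypothetical id column, which is exactly what's missing in this scenario:

from delta.tables import DeltaTable

# `df` is the incoming batch; `id` is a hypothetical unique key.
(DeltaTable.forName(spark, "TESTE").alias("t")
    .merge(df.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())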

The big thing to be aware of, like you already mentioned, is full refresh. That would blow everything away and rebuild it. But that's not really unique to MVs; streaming tables in declarative pipelines have the same risk.
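If that risk matters, one mitigation worth testing is the pipelines.reset.allowed table property, which Databricks documents as a way to keep a table from being reset by a full refresh (mainly for streaming tables, so verify it applies to your MV). A sketch, assuming dp.table forwards table_properties the same way dlt.table does:

@dp.table(
    name="TESTE",
    partition_cols=["Data"],
    spark_conf={"spark.sql.sources.partitionOverwriteMode": "dynamic"},
    # Blocks a pipeline full refresh from truncating and rebuilding this table.
    table_properties={"pipelines.reset.allowed": "false"}
)
def teste():
    ...  # same body as in the original example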

So yeah, it's not a common pattern, but it's not wrong either. I've seen similar approaches used in batch-style DLT pipelines when people really understand how Delta behaves. As long as it's documented and intentional, this seems like a valid approach.

Om Singh