Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Declarative Pipelines - Dynamic Overwrite

ismaelhenzel
Contributor III

Regarding the limitations of declarative pipelines, specifically the inability to use replaceWhere: I discovered through testing that materialized views actually support dynamic overwrites. This handles several scenarios where replaceWhere would typically be used. While MERGE is an option, there are cases where a primary key doesn't exist and you need to overwrite data based on a specific column, such as a date. Surprisingly, dynamic overwrite works perfectly in materialized views for this purpose.
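To make the relationship between the two strategies concrete, here is a rough pure-Python model (hypothetical rows, not the Spark API): replaceWhere deletes rows matching a predicate before inserting, while dynamic partition overwrite replaces exactly the partitions present in the incoming data. When the predicate is "partition key is in the incoming partitions", the two agree.

```python
# Rough pure-Python model of the two overwrite strategies; rows and
# column positions are hypothetical, not tied to any real table.

def replace_where(table, new_rows, predicate):
    # replaceWhere: drop rows matching the predicate, then append new rows.
    return [r for r in table if not predicate(r)] + list(new_rows)

def dynamic_overwrite(table, new_rows, key=0):
    # Dynamic mode: drop rows in any partition the new data touches,
    # then append the new rows; untouched partitions are kept as-is.
    touched = {r[key] for r in new_rows}
    return [r for r in table if r[key] not in touched] + list(new_rows)

old = [("2024-01-01", 1), ("2024-01-02", 2)]
new = [("2024-01-02", 20), ("2024-01-02", 21)]

# With a predicate equivalent to "date is in the incoming dates",
# both strategies produce the same result.
a = replace_where(old, new, lambda r: r[0] == "2024-01-02")
b = dynamic_overwrite(old, new)
print(a == b)  # True
```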

Here is my code example:

from pyspark import pipelines as dp
from pyspark.sql import functions as F
from datetime import date, timedelta

@dp.table(
    name="TESTE",
    partition_cols=["Data"],
    spark_conf={
        "spark.sql.sources.partitionOverwriteMode": "dynamic"
    }
)
def teste():
    data_hoje = date.today()
    # data_ontem is left over from the first run's mock data (see below);
    # this version writes only to the data_hoje partition.
    data_ontem = data_hoje - timedelta(days=1)
    
    dados_mock = [
        (data_hoje, "Linha A", 100.0, "D"),
        (data_hoje, "Linha B", 150.0, "N"),
        (data_hoje, "Linha A", 200.0, "D"),
        (data_hoje, "Linha B", 250.0, "N")
    ]
    
    # Create the DataFrame
    df = spark.createDataFrame(dados_mock, ["Data", "Linha", "QuantidadeProduzida", "Turno"])
    
    # Ensure the Data column is of type date
    return df.withColumn("Data", F.col("Data").cast("date"))

In the first run, the MV saved four records (two in the data_hoje partition and two in data_ontem). Afterward, I modified dados_mock so that all four records fell on data_hoje, as shown above. As a result, the MV ended up with the expected six records: the two existing data_hoje records were replaced by the four new ones, while the data_ontem partition was left untouched. I also tested adding partitions that didn't exist yet, and they were inserted correctly. My question is: how unusual is it to use this approach in a materialized view? I realize a full refresh would wipe this data, but the same applies to streaming tables in declarative pipelines. Has anyone else used this dynamic overwrite approach in declarative pipeline MVs?
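The record counts above can be checked with a small pure-Python simulation of dynamic partition overwrite semantics (only partitions present in the new data are replaced; the rest are kept). The rows below mirror the mock data in the example; the helper function is an illustration, not Spark code.

```python
from datetime import date, timedelta

def dynamic_overwrite(table, new_rows, partition_key=0):
    # Replace only the partitions that appear in new_rows; keep the rest.
    touched = {row[partition_key] for row in new_rows}
    kept = [row for row in table if row[partition_key] not in touched]
    return kept + list(new_rows)

hoje = date.today()
ontem = hoje - timedelta(days=1)

# First run: four records across two partitions (two per date).
table = [
    (ontem, "Linha A", 90.0, "D"),
    (ontem, "Linha B", 95.0, "N"),
    (hoje, "Linha A", 100.0, "D"),
    (hoje, "Linha B", 150.0, "N"),
]

# Second run: four records, all in the hoje partition.
second_run = [
    (hoje, "Linha A", 100.0, "D"),
    (hoje, "Linha B", 150.0, "N"),
    (hoje, "Linha A", 200.0, "D"),
    (hoje, "Linha B", 250.0, "N"),
]

result = dynamic_overwrite(table, second_run)
# The hoje partition (2 rows) is fully replaced by 4 new rows; the
# ontem partition (2 rows) is untouched: 2 + 4 = 6 records total.
print(len(result))  # 6
```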

