Regarding the limitations of declarative pipelines, specifically the inability to use replaceWhere: I discovered through testing that materialized views actually support dynamic partition overwrites. This covers several scenarios where replaceWhere would typically be used. MERGE is an option, but in some cases there is no primary key and you need to overwrite data based on a specific column, such as a date. Surprisingly, dynamic overwrite works perfectly in materialized views for this purpose.
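Just for context, this is the kind of date-based overwrite that replaceWhere would normally handle on a plain Delta write outside a declarative pipeline (the table name and sample row below are only illustrative, not part of my pipeline):

from datetime import date
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# df: a DataFrame holding today's rows (same shape as dados_mock further down)
df = spark.createDataFrame(
    [(date.today(), "Linha A", 100.0, "D")],
    ["Data", "Linha", "QuantidadeProduzida", "Turno"]
)

# Overwrite only the rows matching the predicate; other dates stay untouched
(df.write
    .format("delta")
    .mode("overwrite")
    .option("replaceWhere", f"Data = '{date.today()}'")
    .saveAsTable("TESTE_DELTA"))  # hypothetical table name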
Here is my code example:
from pyspark import pipelines as dp
from pyspark.sql import functions as F
from datetime import date, timedelta

@dp.table(
    name="TESTE",
    partition_cols=["Data"],
    spark_conf={
        # Only the partitions present in the incoming data get overwritten
        "spark.sql.sources.partitionOverwriteMode": "dynamic"
    }
)
def teste():
    data_hoje = date.today()
    data_ontem = data_hoje - timedelta(days=1)

    # Mock rows; all four fall into today's (data_hoje) partition
    dados_mock = [
        (data_hoje, "Linha A", 100.0, "D"),
        (data_hoje, "Linha B", 150.0, "N"),
        (data_hoje, "Linha A", 200.0, "D"),
        (data_hoje, "Linha B", 250.0, "N")
    ]

    # Create the DataFrame from the mock rows
    df = spark.createDataFrame(dados_mock, ["Data", "Linha", "QuantidadeProduzida", "Turno"])

    # Make sure the Data column is a proper date type
    return df.withColumn("Data", F.col("Data").cast("date"))
In the first run, the MV saved four records. Afterward, I modified dados_mock to include only four records for data_hoje. As a result, the MV ended up with the expected six records: it replaced two existing records and added two new ones. I also tested adding partitions that didn't exist yet, and they were inserted correctly.

My question is: how unusual is it to use this approach in a materialized view? I realize a full refresh would wipe this data, but the same applies to streaming tables in declarative pipelines. Has anyone else used this dynamic overwrite approach in declarative pipeline MVs?
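For anyone who wants to reproduce the two-run test, a first-run dados_mock along these lines works; the exact values are just an assumption for illustration, what matters is having rows in two date partitions before switching to the all-data_hoje version shown above:

data_hoje = date.today()
data_ontem = data_hoje - timedelta(days=1)

# Hypothetical first-run mock: two rows per date, creating two partitions.
# Re-running with the four data_hoje rows then rewrites only today's partition.
dados_mock = [
    (data_hoje,  "Linha A", 100.0, "D"),
    (data_hoje,  "Linha B", 150.0, "N"),
    (data_ontem, "Linha A",  80.0, "D"),
    (data_ontem, "Linha B", 120.0, "N")
]

# Quick sanity check from a notebook after each pipeline update
# (qualify TESTE with catalog/schema as needed):
spark.table("TESTE").groupBy("Data").count().show()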