Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMIC

soumiknow
Contributor II

We have a BQ table partitioned by date (DD/MM/YYYY). We want to overwrite the data of one specific partition using PySpark. To do this, I set 'spark.sql.sources.partitionOverwriteMode' to 'DYNAMIC' as per the Spark BigQuery connector documentation. But it still deleted the data in the other partitions, which should not happen.

df_with_partition.write.format("bigquery") \
    .option("table", bq_table_full) \
    .option("partitionField", partition_date) \
    .option("partitionType", bq_partition_type) \
    .option("temporaryGcsBucket", temp_gcs_bucket) \
    .option("spark.sql.sources.partitionOverwriteMode", "DYNAMIC") \
    .option("writeMethod", "indirect") \
    .mode("overwrite") \
    .save()
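
For reference, a session-level variant of the same write (a sketch only, reusing the variable names above; spark.sql.sources.partitionOverwriteMode is normally read from the Spark session configuration rather than from per-write options):

# Sketch: set the overwrite mode on the session before the write,
# instead of passing it as a per-write option.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "DYNAMIC")

df_with_partition.write.format("bigquery") \
    .option("table", bq_table_full) \
    .option("partitionField", partition_date) \
    .option("partitionType", bq_partition_type) \
    .option("temporaryGcsBucket", temp_gcs_bucket) \
    .option("writeMethod", "indirect") \
    .mode("overwrite") \
    .save()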

Can anyone please suggest what I am doing wrong, or how to implement this dynamic partitionOverwriteMode correctly? Many thanks.
#pyspark #overwrite #partition #dynamic #bigquery

22 REPLIES

VZLA
Databricks Employee

@soumiknow ,

Just checking in: are there any further questions, and did my last comment help?

The issue got resolved with DBR 16.1. Many thanks to the Support Team.

ambar2595
New Contributor III

I'm using DBR 16.3 and all partitions are still being deleted. This is the code I'm using; no success.


from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit

# Session with dynamic partition overwrite enabled
spark = (
    SparkSession.builder
    .config("spark.datasource.bigquery.intermediateFormat", "orc")
    .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
    .getOrCreate()
)

# Read a single day's partition from the Delta source
visiting_client_day = (
    spark.read.format("delta")
    .load("s3://bucket-2/gold/visiting_client_day")
    .where(col("date_utc") == lit("2025-05-04"))
)

# Overwrite that day's data in BigQuery
(
    visiting_client_day.write.format("bigquery")
    .option("parentProject", "parentProject")
    .option("project", "project")
    .option("temporaryGcsBucket", "bucket")
    .mode("overwrite")
    .option("table", FINAL_TABLE)
    .save()
)

Hi @ambar2595 ,

Could you please try adding the 'writeMethod' option with the value 'indirect'?

option("writeMethod", "indirect")


ambar2595
New Contributor III

According to the documentation, this is the default value. https://github.com/GoogleCloudDataproc/spark-bigquery-connector/blob/master/README.md

[Screenshot of the spark-bigquery-connector README showing the writeMethod option and its default value]

and I just tried it and it didn't work. 😞


Yes, agreed, but please give it a try once. If it doesn't work, then this issue may have been reintroduced with DBR 16.3. DBR 15.4 LTS had the same issue earlier, and it was fixed in DBR 16.1.

ambar2595
New Contributor III

It didn't work with DBR 16.1 either.

I am still using DBR 16.1, and partitionOverwriteMode with the 'DYNAMIC' value is working for me. I rechecked the workflow just today.
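
As a sanity check on any DBR version, you can compare per-partition row counts before and after the single-partition overwrite. A sketch, with a placeholder table name and partition column:

# Sketch: per-partition row counts before the overwrite (placeholder names)
before = (
    spark.read.format("bigquery")
    .option("table", "my-project.my_dataset.my_table")  # placeholder
    .load()
    .groupBy("event_date")  # placeholder partition column
    .count()
)
before.show()

# ... run the single-partition overwrite here ...

# Counts after the overwrite; only the target partition's count should change
after = (
    spark.read.format("bigquery")
    .option("table", "my-project.my_dataset.my_table")
    .load()
    .groupBy("event_date")
    .count()
)
after.show()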