12-10-2024 11:00 PM
We have a date (DD/MM/YYYY) partitioned BigQuery table and want to overwrite the data of a specific partition using PySpark. Following the Spark BigQuery connector documentation, I set 'spark.sql.sources.partitionOverwriteMode' to 'DYNAMIC', but the write still deleted the data in the other partitions, which should not happen.
df_with_partition.write.format("bigquery") \
    .option("table", f"{bq_table_full}") \
    .option("partitionField", f"{partition_date}") \
    .option("partitionType", f"{bq_partition_type}") \
    .option("temporaryGcsBucket", f"{temp_gcs_bucket}") \
    .option("spark.sql.sources.partitionOverwriteMode", "DYNAMIC") \
    .option("writeMethod", "indirect") \
    .mode("overwrite") \
    .save()
Can anyone suggest what I am doing wrong, or how to implement dynamic partitionOverwriteMode correctly? Many thanks.
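Should I instead be setting the mode at the Spark session level before the write, rather than as a writer option? A minimal sketch of what I mean, using the same placeholder variables as above:

# Variation: set the overwrite mode on the session instead of the writer
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "DYNAMIC")

df_with_partition.write.format("bigquery") \
    .option("table", f"{bq_table_full}") \
    .option("partitionField", f"{partition_date}") \
    .option("partitionType", f"{bq_partition_type}") \
    .option("temporaryGcsBucket", f"{temp_gcs_bucket}") \
    .option("writeMethod", "indirect") \
    .mode("overwrite") \
    .save()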
#pyspark #overwrite #partition #dynamic #bigquery
- Labels: Spark
01-08-2025 04:51 AM
Just checking in: are there any further questions, and did my last comment help?
03-13-2025 10:18 PM
The issue got resolved with DBR 16.1. Many thanks to the Support Team.
05-05-2025 04:06 PM - edited 05-05-2025 04:08 PM
I'm using DBR 16.3 and all partitions are still being deleted. This is the code I'm using, with no success.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit

spark = (
    SparkSession.builder.config("spark.datasource.bigquery.intermediateFormat", "orc")
    .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
    .getOrCreate()
)

# Read a single day's partition from the Delta source
visiting_client_day = (
    spark.read.format("delta")
    .load("s3://bucket-2/gold/visiting_client_day")
    .where(col("date_utc") == lit("2025-05-04"))
)

# Write to BigQuery; FINAL_TABLE is defined elsewhere
(
    visiting_client_day.write.format("bigquery")
    .option("parentProject", "parentProject")
    .option("project", "project")
    .option("temporaryGcsBucket", "bucket")
    .mode("overwrite")
    .option("table", FINAL_TABLE)
    .save()
)
05-05-2025 10:11 PM
Hi @ambar2595 ,
Could you please try adding the 'writeMethod' option with the value 'indirect'?
.option("writeMethod", "indirect")
05-06-2025 01:46 AM - edited 05-06-2025 01:51 AM
According to the documentation, 'indirect' is already the default value: https://github.com/GoogleCloudDataproc/spark-bigquery-connector/blob/master/README.md
I just tried setting it explicitly and it didn't work. 😞
05-06-2025 01:51 AM
Yes, agreed. Give it a try anyway. If it doesn't work, then this issue was likely introduced with DBR 16.3. Earlier, DBR 15.4 LTS had the same issue, which was fixed in DBR 16.1.
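If it helps, a quick sketch (assuming a Databricks notebook, where spark is predefined) to confirm which runtime and Spark version the job actually runs on:

import os

# Databricks Runtime version, e.g. "16.1" (set by the platform on cluster nodes)
print(os.environ.get("DATABRICKS_RUNTIME_VERSION"))

# Underlying Apache Spark version
print(spark.version)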
05-06-2025 08:37 AM
It didn't work with DBR 16.1 either.
05-07-2025 05:59 AM
I am still using DBR 16.1, and partitionOverwriteMode set to 'DYNAMIC' is working for me. I rechecked the workflow again today.

