We use the following code to load data into a BigQuery table after reading Parquet files from Azure Data Lake Storage:
df.write.format("bigquery").option( "parentProject", gcp_project_id ).option("table", f"{bq_table_name}").option( "temporaryGcsBucket", f"{temp_gcs_bucket}" ).option( "spark.sql.sources.partitionOverwriteMode", "DYNAMIC" ).option( "writeMethod", "indirect" ).mode( "overwrite" ).save()
This code was working, but since last week we have been getting the following exception:
```
bigquery.storageapi.shaded.com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryException: Query error: Value has type STRUCT which cannot be inserted into column source, which has type STRING at [7:2393]
```
The exception clearly says that the incoming value has type STRUCT and cannot be inserted into the STRING column `source`. However, we cross-checked the data, and the `source` column is not a STRUCT at all.
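A check of this kind confirms the type on the Spark side (a minimal sketch; `df` is the same DataFrame we write above):

```python
from pyspark.sql.types import StringType

# Sanity-check the Spark-side type of the disputed column.
source_type = df.schema["source"].dataType
print(source_type)  # expected: StringType(), not a StructType
assert isinstance(source_type, StringType)
```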
However, when we changed the mode from 'overwrite' to 'append', the same data loaded successfully.
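That is, the only difference between the failing and the succeeding run is the save mode; everything else is unchanged:

```python
# Identical write, except mode("append") instead of mode("overwrite") -- this succeeds.
df.write.format("bigquery") \
    .option("parentProject", gcp_project_id) \
    .option("table", bq_table_name) \
    .option("temporaryGcsBucket", temp_gcs_bucket) \
    .option("spark.sql.sources.partitionOverwriteMode", "DYNAMIC") \
    .option("writeMethod", "indirect") \
    .mode("append") \
    .save()
```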
To re-confirm the incoming data type, we loaded the `source` column data into a temporary table in 'append' mode (the table is created automatically when only the dataset name is provided). In that newly created table, the `source` column was loaded with the STRING data type.
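The recorded type can also be confirmed directly in BigQuery; a minimal sketch using the Python client (`my_dataset` and `my_temp_table` are placeholders for the actual dataset and temporary table names):

```python
from google.cloud import bigquery

# Look up the column types BigQuery recorded for the temporary table.
client = bigquery.Client(project=gcp_project_id)
sql = """
    SELECT column_name, data_type
    FROM `my_dataset.INFORMATION_SCHEMA.COLUMNS`
    WHERE table_name = 'my_temp_table'
"""
for row in client.query(sql).result():
    print(row.column_name, row.data_type)  # `source` shows up as STRING
```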
So the issue happens with the 'overwrite' mode only.
We are using DBR 16.1 and spark-3.5-bigquery-0.41.0.jar.
Please suggest how we can overcome this issue with the 'overwrite' mode.