<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/103686#M41549</link>
    <description>&lt;P&gt;You can add the -verbose:class option to both the driver and executor extraJavaOptions and then check the Spark logs, for example:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;spark.driver.extraJavaOptions -verbose:class&lt;/LI&gt;
&lt;LI&gt;spark.executor.extraJavaOptions -verbose:class&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The jar distribution and localization will likewise be visible in the Spark logs.&lt;/P&gt;</description>
    <pubDate>Tue, 31 Dec 2024 13:32:02 GMT</pubDate>
    <dc:creator>VZLA</dc:creator>
    <dc:date>2024-12-31T13:32:02Z</dc:date>
    <item>
      <title>BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMIC</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/101706#M40778</link>
      <description>&lt;P&gt;&lt;SPAN&gt;We have a BQ table partitioned by date (DD/MM/YYYY). We want to overwrite the data of a specific partition using PySpark. To do this, I set 'spark.sql.sources.partitionOverwriteMode' to 'DYNAMIC' as per the Spark BigQuery connector&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://github.com/GoogleCloudDataproc/spark-bigquery-connector?tab=readme-ov-file" target="_blank" rel="nofollow noopener noreferrer"&gt;documentation&lt;/A&gt;&lt;SPAN&gt;. But it still deleted the data in the other partitions, which should not happen.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df_with_partition.write.format("bigquery") \
                .option("table", f"{bq_table_full}") \
                .option("partitionField", f"{partition_date}") \
                .option("partitionType", f"{bq_partition_type}") \
                .option("temporaryGcsBucket", f"{temp_gcs_bucket}") \
                .option("spark.sql.sources.partitionOverwriteMode", "DYNAMIC") \
                .option("writeMethod", "indirect") \
                .mode("overwrite") \
                .save()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Can anyone please suggest what I am doing wrong, or how to implement this dynamic partitionOverwriteMode? Many thanks.&lt;BR /&gt;&lt;/SPAN&gt;#pyspark #overwrite #partition #dynamic #bigquery&lt;/P&gt;
      <pubDate>Wed, 11 Dec 2024 07:00:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/101706#M40778</guid>
      <dc:creator>soumiknow</dc:creator>
      <dc:date>2024-12-11T07:00:20Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/101745#M40803</link>
      <description>&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;It appears that the issue&lt;/SPAN&gt;&amp;nbsp;is related to the behavior of the Spark BigQuery connector and how it handles partition overwrites. Here are a few points to consider:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Ensure that the configuration setting spark.sql.sources.partitionOverwriteMode is correctly applied. This can be set at the session level using spark.conf.set("spark.sql.sources.partitionOverwriteMode", "DYNAMIC") before performing the write operation.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;If the dynamic partition overwrite mode is not working as expected, you might consider using the replaceWhere option as an alternative. This option allows you to specify a condition to selectively overwrite data. For example:&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang="markup"&gt;df_with_partition.write.format("bigquery") \
    .option("table", f"{bq_table_full}") \
    .option("partitionField", f"{partition_date}") \
    .option("partitionType", f"{bq_partition_type}") \
    .option("temporaryGcsBucket", f"{temp_gcs_bucket}") \
    .option("replaceWhere", f"{partition_date} = 'specific_date'") \
    .mode("overwrite") \
    .save()&lt;/LI-CODE&gt;
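&lt;P&gt;The session-level setting from point 1 can be sketched as follows (a minimal sketch; the variable names are assumed from the snippet above and the config must be set before the write is triggered):&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Enable dynamic partition overwrite for the whole session, before the write
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

df_with_partition.write.format("bigquery") \
    .option("table", f"{bq_table_full}") \
    .option("temporaryGcsBucket", f"{temp_gcs_bucket}") \
    .mode("overwrite") \
    .save()&lt;/LI-CODE&gt;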
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Dec 2024 11:36:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/101745#M40803</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2024-12-11T11:36:04Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/101867#M40862</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/88823"&gt;@Walter_C&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;Thank you for your response on this issue.&lt;BR /&gt;&lt;BR /&gt;I tried both options:&lt;BR /&gt;- setting the Spark configuration at the session level, which did not work&lt;BR /&gt;- using the replaceWhere option for a specific partition date, which also did not work.&lt;BR /&gt;&lt;BR /&gt;In both cases, all BQ table records are overwritten or deleted across all partition dates, which is not acceptable.&lt;BR /&gt;&lt;BR /&gt;I also checked the Spark BQ connector documentation page and could not find the 'replaceWhere' option there.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2024 07:55:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/101867#M40862</guid>
      <dc:creator>soumiknow</dc:creator>
      <dc:date>2024-12-12T07:55:37Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/101930#M40897</link>
      <description>&lt;P&gt;Got it, can you also test the following code:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;df_with_partition.write.format("bigquery") \
    .option("table", f"{bq_table_full}") \
    .option("partitionField", f"{partition_date}") \
    .option("partitionType", f"{bq_partition_type}") \
    .option("temporaryGcsBucket", f"{temp_gcs_bucket}") \
    .option("partitionOverwriteMode", "dynamic") \
    .option("writeMethod", "indirect") \
    .mode("overwrite") \
    .save()&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 12 Dec 2024 14:39:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/101930#M40897</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2024-12-12T14:39:09Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/101936#M40901</link>
      <description>&lt;P&gt;I already tried this before, but it did not work. I then tried 'DYNAMIC' instead of 'dynamic', but no luck.&lt;BR /&gt;&lt;BR /&gt;One thing I found: even though I set '&lt;SPAN&gt;spark.sql.sources.partitionOverwriteMode&lt;/SPAN&gt;' to 'DYNAMIC' in the cluster's advanced options, or in the notebook before writing the DataFrame, the&amp;nbsp;&lt;SPAN&gt;partitionOverwriteMode property is never actually shown in the Spark UI SQL/DataFrame properties, whereas I can see the other properties I set during cluster creation or in the notebook.&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;So the problem could be with how the&amp;nbsp;'&lt;SPAN&gt;spark.sql.sources.partitionOverwriteMode&lt;/SPAN&gt;' property is set up within the Databricks cluster. Any idea how to overcome this?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2024 14:48:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/101936#M40901</guid>
      <dc:creator>soumiknow</dc:creator>
      <dc:date>2024-12-12T14:48:06Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/101942#M40904</link>
      <description>&lt;P&gt;'&lt;SPAN&gt;spark.sql.sources.partitionOverwriteMode&lt;/SPAN&gt;' is visible under the Spark UI Environment tab, so this is not the issue.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2024 15:15:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/101942#M40904</guid>
      <dc:creator>soumiknow</dc:creator>
      <dc:date>2024-12-12T15:15:37Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/101952#M40908</link>
      <description>&lt;P&gt;I reviewed this with a Spark resource; it seems the indirect write method will be required for this. You can follow the information in&amp;nbsp;&lt;A href="https://github.com/GoogleCloudDataproc/spark-bigquery-connector?tab=readme-ov-file#indirect-write" target="_blank"&gt;https://github.com/GoogleCloudDataproc/spark-bigquery-connector?tab=readme-ov-file#indirect-write&lt;/A&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Dec 2024 15:34:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/101952#M40908</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2024-12-12T15:34:29Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/102197#M41015</link>
      <description>&lt;P data-unlink="true"&gt;I raised an &lt;A href="https://github.com/GoogleCloudDataproc/spark-bigquery-connector/issues/1325" target="_self"&gt;issue&lt;/A&gt;&amp;nbsp;with spark-bigquery-connector, where they suggested using '&lt;SPAN&gt;spark-3.5-bigquery-0.41.0.jar&lt;/SPAN&gt;', whereas I was using&amp;nbsp;&lt;SPAN&gt;Databricks Runtime Version 15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12), which includes&amp;nbsp;spark-bigquery-with-dependencies_2.12-0.41.0.jar.&lt;BR /&gt;&lt;BR /&gt;So, I tried this:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;spark_v1 = SparkSession.builder \
    .appName("SampleSparkSession") \
    .config("spark.jars.packages", "/Workspace/Users/xxxxxxx@xxxx.xxxx/spark-3.5-bigquery-0.41.0.jar") \
    .config("spark.jars.excludes", "/databricks/jars/spark-bigquery-with-dependencies_2.12-0.41.0.jar") \
    .getOrCreate()&lt;/LI-CODE&gt;&lt;P data-unlink="true"&gt;&lt;SPAN&gt;But after applying the above include/exclude via the Spark config, the jars were not actually included or excluded in the running cluster (checked via the Spark UI System Properties).&lt;BR /&gt;&lt;BR /&gt;So, any idea how these jars can be included or excluded on a Databricks cluster that is using&amp;nbsp;Databricks Runtime Version 15.4 LTS?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 16 Dec 2024 06:21:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/102197#M41015</guid>
      <dc:creator>soumiknow</dc:creator>
      <dc:date>2024-12-16T06:21:42Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/102236#M41030</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Remove the Default JAR&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;SPAN&gt;Since &lt;CODE&gt;spark-bigquery-with-dependencies_2.12-0.41.0.jar&lt;/CODE&gt; is included by default in Databricks Runtime 15.4 LTS, you need to exclude it. This can be done by creating an init script that removes the JAR file from the cluster.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Create an Init Script&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;Create an init script that removes the default &lt;CODE&gt;spark-bigquery-with-dependencies_2.12-0.41.0.jar&lt;/CODE&gt; from the cluster. Here is an example of what the script might look like:&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="gb5fhw2"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-bash _1t7bu9hb hljs language-bash gb5fhw3"&gt;#!/bin/bash
&lt;SPAN class="hljs-built_in"&gt;rm&lt;/SPAN&gt; /databricks/jars/spark-bigquery-with-dependencies_2.12-0.41.0.jar&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Upload the Init Script&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;Upload this script to a location accessible by Databricks, such as DBFS (Databricks File System).&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Configure the Cluster to Use the Init Script&lt;/STRONG&gt;:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;Go to the cluster configuration page in Databricks.&lt;/LI&gt;
&lt;LI&gt;Under the "Advanced Options" section, find the "Init Scripts" tab.&lt;/LI&gt;
&lt;LI&gt;Add the path to your init script (e.g., &lt;CODE&gt;dbfs:/path/to/your/init-script.sh&lt;/CODE&gt;).&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Add the Custom JAR&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;SPAN&gt;Upload the &lt;CODE&gt;spark-3.5-bigquery-0.41.0.jar&lt;/CODE&gt; to DBFS or another accessible location.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;In the cluster configuration, go to the "Libraries" tab.&lt;/LI&gt;
&lt;LI&gt;Choose "Install New" and select "DBFS" or the appropriate option where your JAR is stored.&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;Provide the path to the &lt;CODE&gt;spark-3.5-bigquery-0.41.0.jar&lt;/CODE&gt;.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Restart the Cluster&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;Restart the cluster to apply the changes. The init script will run, removing the default JAR, and the new JAR will be added to the cluster.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
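&lt;P&gt;The "Create an Init Script" and "Upload the Init Script" steps above can be sketched from a local machine as follows (a minimal sketch assuming the Databricks CLI is configured; the file name and DBFS path are illustrative):&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;# remove-bq-jar.sh -- the init script from the step above, saved locally:
#   #!/bin/bash
#   rm /databricks/jars/spark-bigquery-with-dependencies_2.12-0.41.0.jar

# Upload it to DBFS so the cluster's "Init Scripts" tab can reference it
databricks fs cp remove-bq-jar.sh dbfs:/init-scripts/remove-bq-jar.sh&lt;/LI-CODE&gt;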
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 16 Dec 2024 12:03:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/102236#M41030</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2024-12-16T12:03:01Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/102958#M41283</link>
      <description>&lt;P&gt;The init script worked to remove the unwanted jars from the cluster. This is good news.&lt;BR /&gt;&lt;BR /&gt;But when I installed the required jar via the cluster configuration, using a GCS path as the library source (&lt;SPAN&gt;gs://spark-lib/bigquery/spark-3.5-bigquery-0.41.0.jar&lt;/SPAN&gt;), the UI showed that the jar was installed. Yet when I executed the notebook to run the Spark job, it returned the error below, which indicates that either the Spark session cannot find the spark-bigquery-connector jar or the jar was not installed successfully.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Py4JJavaError: An error occurred while calling o574.save.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: com.google.cloud.spark.bigquery.BigQueryRelationProvider. Make sure the provider name is correct and the package is properly registered and compatible with your Spark version. SQLSTATE: 42K02&lt;/LI-CODE&gt;&lt;P&gt;I tried to verify whether the jar was installed on the cluster by checking the Spark UI classpath properties and other properties, but could not find it there. The screenshot below shows that the jar was installed from the GCS path.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="soumiknow_0-1734937665284.png" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/13655i47D6A80F572FA1E8/image-size/large?v=v2&amp;amp;px=999" role="button" title="soumiknow_0-1734937665284.png" alt="soumiknow_0-1734937665284.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 23 Dec 2024 07:09:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/102958#M41283</guid>
      <dc:creator>soumiknow</dc:creator>
      <dc:date>2024-12-23T07:09:07Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/103654#M41532</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/117977"&gt;@soumiknow&lt;/a&gt;&amp;nbsp;this sounds like the jar was probably installed but not distributed to the Spark cluster. Do you see in the Spark logs (driver/executors) the jar getting distributed and localized? You may also add the class-loading verbose flag to check whether an older version is distributed before this new jar is placed first on the classpath.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Dec 2024 10:34:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/103654#M41532</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-12-31T10:34:39Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/103675#M41546</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34618"&gt;@VZLA&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;Please guide me on how to check whether the jar is getting distributed and localized, as well as how to add the class-loading verbose flag.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Dec 2024 12:59:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/103675#M41546</guid>
      <dc:creator>soumiknow</dc:creator>
      <dc:date>2024-12-31T12:59:06Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/103686#M41549</link>
      <description>&lt;P&gt;You can add the -verbose:class option to both the driver and executor extraJavaOptions and then check the Spark logs, for example:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;spark.driver.extraJavaOptions -verbose:class&lt;/LI&gt;
&lt;LI&gt;spark.executor.extraJavaOptions -verbose:class&lt;/LI&gt;
&lt;/UL&gt;
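&lt;P&gt;Concretely, these go as two lines in the cluster's Spark config under Advanced Options; afterwards the driver log should contain JVM class-loading output along these lines (illustrative, the exact lines will vary):&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;spark.driver.extraJavaOptions -verbose:class
spark.executor.extraJavaOptions -verbose:class

# Example class-loading line to look for in the logs:
# [Loaded com.google.cloud.spark.bigquery.BigQueryRelationProvider from file:/databricks/jars/...]&lt;/LI-CODE&gt;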
&lt;P&gt;The jar distribution and localization will likewise be visible in the Spark logs.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Dec 2024 13:32:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/103686#M41549</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-12-31T13:32:02Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/104031#M41644</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34618"&gt;@VZLA&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;I added the extraJavaOptions, and after that I can see the following in the logs:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;downloading https://maven-central.storage-download.googleapis.com/maven2/com/google/cloud/spark/spark-bigquery-connector-common/0.41.0/spark-bigquery-connector-common-0.41.0.jar ...
	[SUCCESSFUL ] com.google.cloud.spark#spark-bigquery-connector-common;0.41.0!spark-bigquery-connector-common.jar (379ms)

:: modules in use:
com.google.cloud.spark#bigquery-connector-common;0.41.0 from preferred-maven-central-mirror in [default]
	com.google.cloud.spark#spark-3.5-bigquery;0.41.0 from preferred-maven-central-mirror in [default]
	com.google.cloud.spark#spark-bigquery-connector-common;0.41.0 from preferred-maven-central-mirror in [default]
	com.google.cloud.spark#spark-bigquery-dsv2-common;0.41.0 from preferred-maven-central-mirror in [default]&lt;/LI-CODE&gt;&lt;P&gt;When I executed my notebook, it returned the same error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Py4JJavaError: An error occurred while calling o438.save.
: org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: com.google.cloud.spark.bigquery.BigQueryRelationProvider. Make sure the provider name is correct and the package is properly registered and compatible with your Spark version. SQLSTATE: 42K02&lt;/LI-CODE&gt;&lt;P&gt;Can you please suggest a Databricks Runtime Version that includes spark-bigquery-connector version 0.41.0 with Spark 3.5?&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2025 07:49:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/104031#M41644</guid>
      <dc:creator>soumiknow</dc:creator>
      <dc:date>2025-01-03T07:49:32Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/104104#M41666</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/117977"&gt;@soumiknow&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;This is not the output from -verbose:class. What you see is likely coming from importing the library from an external repository, and it shows the dependency-resolution process: it has pulled and downloaded "com.google.cloud.spark#spark-bigquery-connector-common;0.41.0!spark-bigquery-connector-common.jar", which was probably added via a SparkSession or Cluster Library with Maven as the source. The -verbose:class flag prints class-loading lines to STDOUT, something like this:
&lt;OL&gt;
&lt;LI&gt;&lt;U&gt;&lt;EM&gt;&lt;STRONG&gt;[Loaded com.google.cloud.spark.bigquery.BigQueryRelationProvider from file:/databricks/jars/----ws_3_5--third_party--bigquery-connector--spark-bigquery-connector-hive-2.3__hadoop-3.2_2.12--118181791--fatJar-assembly-0.22.2-SNAPSHOT.jar]&lt;/STRONG&gt;&lt;/EM&gt;&lt;/U&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;/LI&gt;
&lt;LI&gt;In DBR 15.4 LTS, which you're already using, you also have already available the:
&lt;OL&gt;
&lt;LI&gt;----ws_3_5--third_party--bigquery-connector--spark-bigquery-connector-hive-2.3__hadoop-3.2_2.12--118181791--fatJar-assembly-0.22.2-SNAPSHOT.jar, and&lt;/LI&gt;
&lt;LI&gt;----ws_3_5--third_party--bigquery-connector--spark-bigquery-connector-upgrade_scala-2.12--118181791--spark-bigquery-with-dependencies_2.12-0.41.0.jar&lt;/LI&gt;
&lt;/OL&gt;
&lt;/LI&gt;
&lt;LI&gt;The answer you got from the Google Support team refers to the&amp;nbsp;&lt;STRONG&gt;fatJar-assembly-0.22.2-SNAPSHOT&lt;/STRONG&gt;. But spark-bigquery v0.22 does have &lt;STRONG&gt;"com.google.cloud.spark.bigquery.BigQueryRelationProvider" &lt;/STRONG&gt;available, as you can see here:&amp;nbsp;&lt;A href="https://github.com/GoogleCloudDataproc/spark-bigquery-connector/blob/branch-0.22/connector/src/main/scala/com/google/cloud/spark/bigquery/BigQueryRelationProvider.scala" target="_blank" rel="noopener"&gt;https://github.com/GoogleCloudDataproc/spark-bigquery-connector/blob/branch-0.22/connector/src/main/scala/com/google/cloud/spark/bigquery/BigQueryRelationProvider.scala.&lt;/A&gt;&amp;nbsp;So the problem you're running into is not related to the jar file version itself.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;So at this point you have two different issues: &lt;STRONG&gt;1) you need to use the v0.40&lt;/STRONG&gt;, and&lt;STRONG&gt; 2) you are currently getting a DATA_SOURCE_NOT_FOUND error&lt;/STRONG&gt;.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;If you encounter the &lt;STRONG&gt;DATA_SOURCE_NOT_FOUND&lt;/STRONG&gt; error, it means the data source name provided to Spark is not resolvable either in its built-in registry or through any dynamically loaded libraries. I'm honestly unsure how you are running into this error and would need your clarifying comments about the current cluster status and setup to help you with it. If I had to guess, I would say you have manually deleted this fat jar v0.22 from the cluster altogether, using an init script maybe?&lt;/LI&gt;
&lt;LI&gt;When you call "format("bigquery")" this is what will happen behind the scenes in the DataSource.scala:&lt;BR /&gt;&lt;LI-CODE lang="markup"&gt;      case name if name.equalsIgnoreCase("bigquery") =&amp;gt;
          "com.google.cloud.spark.bigquery.BigQueryRelationProvider"
&lt;/LI-CODE&gt;&lt;/LI&gt;
&lt;LI&gt;So&amp;nbsp;lookupDataSource will try to find the "com.google.cloud.spark.bigquery.BigQueryRelationProvider" and then Spark instantiates the "BigQueryRelationProvider". In other words, when you use spark.read.format("bigquery"), Spark uses this mapping to locate and load the appropriate class.&amp;nbsp;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Could you please run this from a notebook in DBR 15.4 LTS with no additional libraries attached to it:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;%python
class_name = "com.google.cloud.spark.bigquery.BigQueryRelationProvider"

try:
    # Get the class reference
    cls = spark._jvm.Thread.currentThread().getContextClassLoader().loadClass(class_name)
    # Get the JAR file path
    jar_path = cls.getProtectionDomain().getCodeSource().getLocation().getPath()
    print(f"The class {class_name} is loaded from: {jar_path}")
except Exception as e:
    print(f"Error locating the class {class_name}: {e}")&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It should return:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;The class com.google.cloud.spark.bigquery.BigQueryRelationProvider is loaded from: /databricks/jars/----ws_3_5--third_party--bigquery-connector--spark-bigquery-connector-hive-2.3__hadoop-3.2_2.12--118181791--fatJar-assembly-0.22.2-SNAPSHOT.jar&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If it does not, then the reason is clearly the absence of a jar file containing the class mentioned in the error message.&lt;/P&gt;
&lt;P&gt;The MVN output you've shared is actually interesting, though. Solely based on my personal assumption, you've attempted to add the spark-bigquery artifact through a Cluster Library with Maven as the source, but the artifact you pulled does not contain the BigQueryRelationProvider either. If you're OK with going with the latest version, the Maven coordinates you should use are "&lt;STRONG&gt;com.google.cloud.spark:spark-bigquery_2.12:0.41.1&lt;/STRONG&gt;".&lt;/P&gt;
&lt;P&gt;Then rerunning the same code:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;class_name = "com.google.cloud.spark.bigquery.BigQueryRelationProvider"

try:
    cls = spark._jvm.Thread.currentThread().getContextClassLoader().loadClass(class_name)
    jar_path = cls.getProtectionDomain().getCodeSource().getLocation().getPath()
    print(f"The class {class_name} is loaded from: {jar_path}")

    # Print the package information (often contains version info)
    package = cls.getPackage()
    print(f"Package Specification Title: {package.getSpecificationTitle()}")
    print(f"Package Specification Version: {package.getSpecificationVersion()}")
    print(f"Package Implementation Version: {package.getImplementationVersion()}")
except Exception as e:
    print(f"Error: {e}")&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Should return:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;The class com.google.cloud.spark.bigquery.BigQueryRelationProvider is loaded from: /local_disk0/tmp/addedFile235d6f3b981a4f61bb72e599c2d013986386268607538077001/com_google_cloud_spark_spark_bigquery_2_12_0_41_1.jar
Package Specification Title: BigQuery DataSource v1 for Scala 2.12
Package Specification Version: 0.41
Package Implementation Version: 0.41.1&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm not sure whether the library and your cluster's current state after these changes are stable and supported, but you're always welcome to raise a support ticket and one of our engineers will kindly continue with the assistance.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2025 16:13:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/104104#M41666</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2025-01-03T16:13:22Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/104699#M41847</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/117977"&gt;@soumiknow&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Just checking if there are any further questions, and did my last comment help?&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jan 2025 12:51:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/104699#M41847</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2025-01-08T12:51:00Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/112536#M44245</link>
      <description>&lt;P&gt;Issue got resolved with DBR 16.1. Many thanks to the Support Team.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Mar 2025 05:18:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/112536#M44245</guid>
      <dc:creator>soumiknow</dc:creator>
      <dc:date>2025-03-14T05:18:19Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/117764#M45565</link>
      <description>&lt;P&gt;I'm using &lt;STRONG&gt;DBR 16.3&lt;/STRONG&gt; and all partitions are still being deleted. This is the code I'm using. No success.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit

spark = (
    SparkSession.builder
    .config("spark.datasource.bigquery.intermediateFormat", "orc")
    .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
    .getOrCreate()
)

visiting_client_day = (
    spark.read.format("delta")
    .load("s3://bucket-2/gold/visiting_client_day")
    .where(col("date_utc") == lit("2025-05-04"))
)

(
    visiting_client_day.write.format("bigquery")
    .option("parentProject", "parentProject")
    .option("project", "project")
    .option("temporaryGcsBucket", "bucket")
    .mode("overwrite")
    .option("table", FINAL_TABLE)
    .save()
)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 05 May 2025 23:08:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/117764#M45565</guid>
      <dc:creator>ambar2595</dc:creator>
      <dc:date>2025-05-05T23:08:27Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/117795#M45572</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/114737"&gt;@ambar2595&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;Could you please try adding the 'writeMethod' option with the 'indirect' value:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;option("writeMethod", "indirect")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 06 May 2025 05:11:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/117795#M45572</guid>
      <dc:creator>soumiknow</dc:creator>
      <dc:date>2025-05-06T05:11:36Z</dc:date>
    </item>
    <item>
      <title>Re: BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMI</title>
      <link>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/117844#M45575</link>
      <description>&lt;P&gt;According to the documentation, this is the default value.&amp;nbsp;&lt;A href="https://github.com/GoogleCloudDataproc/spark-bigquery-connector/blob/master/README.md" target="_blank" rel="noopener"&gt;https://github.com/GoogleCloudDataproc/spark-bigquery-connector/blob/master/README.md&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ambar2595_0-1746521270975.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/16513i9CE65A78D92F4B4B/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ambar2595_0-1746521270975.png" alt="ambar2595_0-1746521270975.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;and I just tried it and it didn't work. &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 06 May 2025 08:51:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/bq-partition-data-deleted-fully-even-though-spark-sql-sources/m-p/117844#M45575</guid>
      <dc:creator>ambar2595</dc:creator>
      <dc:date>2025-05-06T08:51:07Z</dc:date>
    </item>
  </channel>
</rss>

