Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks always loads built-in BigQuery connector (0.22.2), can’t override with 0.43.x

SupunK
Visitor

I am using Databricks Runtime 15.4 (Spark 3.5 / Scala 2.12) on AWS.

My goal is to use the latest Google BigQuery connector because I need the direct write method (BigQuery Storage Write API):

option("writeMethod", "direct")

This allows writing directly into BigQuery without requiring a temporary GCS bucket, which is necessary in my environment.
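For context, here are the exact connector options I need, pulled into a dict so the intent is clear (the table name is a placeholder; the option names follow the Google connector's documentation):

```python
# The connector options I need (table name is a placeholder).
# "writeMethod": "direct" makes the connector use the BigQuery Storage
# Write API, so no temporary GCS bucket is required.
write_options = {
    "table": "my_project.my_dataset.my_table",  # placeholder
    "writeMethod": "direct",
}

# In the notebook this is applied as:
#   df.write.format("bigquery").options(**write_options).mode("append").save()
print(write_options)
```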

To do this, I installed the official Google connector as a cluster library via Maven:

com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.43.1

The library installs successfully and shows as "Attached" on the cluster.

However, Databricks does not use this connector at runtime. To check which connector is actually being loaded, I run:

 

jvm = spark._jvm
provider = jvm.com.google.cloud.spark.bigquery.BigQueryRelationProvider()
location = provider.getClass().getProtectionDomain().getCodeSource().getLocation().toString()
print(location)

The output is always:

...spark-bigquery-connector-hive-2.3__hadoop-3.2_2.12--fatJar-assembly-0.22.2-SNAPSHOT.jar

This means Databricks always loads its built-in forked connector (0.22.2-SNAPSHOT) instead of the Google connector (0.43.x) that I installed.

Additional observations:

  • Restarting the cluster does not change anything.

  • The installed connector appears as "Attached" but never shows up in /databricks/jars.

  • /databricks/jars only contains:

    • spark-bigquery-connector-hive-2.3__hadoop-3.2_2.12--fatJar-assembly-0.22.2-SNAPSHOT.jar
    • spark-bigquery-with-dependencies_2.12-0.41.0.jar (Databricks' own copy)

  • spark.read.format("bigquery") still resolves to the built-in connector every time.
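To make the /databricks/jars observation easy to reproduce, this is the snippet I run on the driver to list BigQuery-related jars (glob simply returns an empty list when the directory does not exist, e.g. off-cluster):

```python
import glob

# List every BigQuery-related jar under /databricks/jars on the driver.
# glob.glob returns [] if the directory does not exist (e.g. off-cluster).
bq_jars = sorted(glob.glob("/databricks/jars/*bigquery*.jar"))
for jar in bq_jars:
    print(jar)
print(f"{len(bq_jars)} BigQuery jar(s) found")
```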

Question: Is there any supported way on Databricks Runtime 15.4 to override or replace the built-in BigQuery connector so that:

spark.read.format("bigquery")

uses the Google spark-bigquery-with-dependencies_2.12 (0.43.x) connector, specifically to allow using the direct write method without a temporary GCS bucket?

Or is the Databricks BigQuery connector version fixed and not user-overridable?


0 REPLIES