Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks always loads built-in BigQuery connector (0.22.2), can’t override with 0.43.x

SupunK
New Contributor II

I am using Databricks Runtime 15.4 (Spark 3.5 / Scala 2.12) on AWS.

My goal is to use the latest Google BigQuery connector because I need the direct write method (BigQuery Storage Write API):

option("writeMethod", "direct")

This allows writing directly into BigQuery without requiring a temporary GCS bucket, which is necessary in my environment.
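For reference, a minimal sketch of what such a direct write would look like (option names are from Google's spark-bigquery-connector documentation; the table name and DataFrame are placeholders):

```python
# Sketch only: assumes the open-source connector 0.43.x is actually the one on
# the classpath, which is exactly what fails on Databricks as described below.
def direct_write_options(table: str) -> dict:
    """Writer options for the BigQuery Storage Write API path.

    With writeMethod="direct" the connector streams rows straight into
    BigQuery, so no temporaryGcsBucket option is needed.
    """
    return {
        "table": table,           # target as project.dataset.table
        "writeMethod": "direct",  # use the Storage Write API
    }

opts = direct_write_options("my-project.my_dataset.events")
# df.write.format("bigquery").options(**opts).mode("append").save()
```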

To do this, I installed the official Google connector as a cluster library via Maven:

com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.43.1

The library installs successfully and shows as "Attached" on the cluster.

However, Databricks does not use this connector at runtime. To check which connector is actually being loaded, I run:

jvm = spark._jvm
# Instantiate the provider class and ask the JVM which jar it was loaded from
provider = jvm.com.google.cloud.spark.bigquery.BigQueryRelationProvider()
location = provider.getClass().getProtectionDomain().getCodeSource().getLocation().toString()
print(location)

The output is always:

...spark-bigquery-connector-hive-2.3__hadoop-3.2_2.12--fatJar-assembly-0.22.2-SNAPSHOT.jar

This means Databricks always loads its built-in forked connector (0.22.2-SNAPSHOT) instead of the Google connector (0.43.x) that I installed.

Additional observations:

  • Restarting the cluster does not change anything.

  • The installed connector appears as "Attached" but never shows up in /databricks/jars.

  • /databricks/jars only contains:

    • spark-bigquery-connector-hive-2.3__hadoop-3.2_2.12--fatJar-assembly-0.22.2-SNAPSHOT.jar
    • spark-bigquery-with-dependencies_2.12-0.41.0.jar (Databricks' own copy)

  • spark.read.format("bigquery") still resolves to the built-in connector every time.

Question: Is there any supported way on Databricks Runtime 15.4 to override or replace the built-in BigQuery connector so that:

spark.read.format("bigquery")

uses the Google spark-bigquery-with-dependencies_2.12 (0.43.x) connector, specifically to allow using the direct write method without a temporary GCS bucket?

Or is the Databricks BigQuery connector version fixed and not user-overridable?


1 REPLY

mark_ott
Databricks Employee

There is no supported way on Databricks Runtime 15.4 to override or replace the built-in BigQuery connector to use your own version (such as 0.43.x) in order to access the direct write method. Databricks clusters come preloaded with their own managed version of the BigQuery connector, which is loaded by default both for Scala and Python APIs, even if you attach a newer version as a Maven or cluster library.

Connector Override Is Not Supported

  • Databricks enforces usage of its internal/forked BigQuery connector jar. This is loaded from /databricks/jars by default and takes precedence over any user-attached or Maven-installed connectors.

  • As of Databricks Runtime 15.x, there is no documented or officially supported mechanism to replace or “shadow” the built-in connector jar with a later version.

  • Even when a newer Google connector jar is supplied, Databricks' class-loading mechanism gives priority to the built-in connector.

Built-in Connector Limitations

  • Databricks’ built-in BigQuery connector, as of runtime 15.4, does not support the Storage Write API direct method (writeMethod="direct"), nor does it support disabling GCS intermediate storage.

  • Requests to update or allow override of the connector version are tracked as feature requests with Google and Databricks, but as of November 2025 this feature is not available.
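In other words, writes through the built-in connector always take the indirect path, which stages rows in a temporary GCS bucket first. A sketch of the options that path requires (table and bucket names are placeholders; option names from Google's connector documentation):

```python
# Sketch of the indirect write the built-in connector forces: rows are staged
# in a temporary GCS bucket and then loaded into BigQuery from there.
def indirect_write_options(table: str, staging_bucket: str) -> dict:
    return {
        "table": table,                        # project.dataset.table
        "temporaryGcsBucket": staging_bucket,  # required for indirect writes
        "writeMethod": "indirect",             # the only path the built-in connector supports
    }

opts = indirect_write_options("my-project.my_dataset.events", "my-staging-bucket")
# df.write.format("bigquery").options(**opts).mode("append").save()
```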

Workarounds and Alternatives

  • Using the DataFrame API with spark.read.format("bigquery") on Databricks Runtime will always resolve to Databricks’ managed connector, not Google’s latest connector.

  • If you need features available only in the new connector (such as direct writes via Storage Write API), you must use a non-Databricks Spark platform (for example, self-managed Spark on EMR or Dataproc), or wait until Databricks updates its built-in connector with support for those features.

  • Some users have experimented with removing or replacing /databricks/jars items with init scripts, but this is not a supported path and can destabilize the cluster.
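For illustration only, the jar-swapping such an init script would do amounts to the following (shown as dry-runnable Python; a real cluster init script would be a bash script doing the equivalent mv). To repeat the warning above: this is unsupported and can leave the cluster unusable.

```python
# UNSUPPORTED illustration: what "replace the built-in jar via init script" amounts to.
# Do not rely on this; moving jars out of /databricks/jars can destabilize the cluster.
from pathlib import Path

def move_builtin_bigquery_jars(jars_dir: str) -> list:
    """Move bundled BigQuery connector jars out of the default classpath dir."""
    jars = Path(jars_dir)
    disabled = jars / "disabled"
    disabled.mkdir(exist_ok=True)
    moved = []
    for jar in sorted(jars.glob("spark-bigquery*.jar")):
        jar.rename(disabled / jar.name)  # take it off the default classpath
        moved.append(jar.name)
    return moved

# On a real cluster this would be called with "/databricks/jars".
```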

Table: Connector Handling in Databricks

| Approach                               | Result on Databricks Runtime 15.4         |
|----------------------------------------|-------------------------------------------|
| Attach newer BigQuery Maven connector  | Built-in/forked connector is used         |
| Provide Google's 0.43.x as cluster lib | Ignored; built-in 0.22.2-SNAPSHOT is used |
| Remove built-in jar via init script    | Not supported; may break the cluster      |
| Use spark.read.format("bigquery")      | Always resolves to the built-in connector |
| Use non-Databricks Spark distribution  | Latest Google connector can be used       |

You can follow Databricks and Google BigQuery release notes for changes on this limitation, but as of now, Databricks’ connector version is fixed and not user-overridable.
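If you do migrate to a self-managed Spark deployment, pulling in the open-source connector is a single --packages flag at submit time; a sketch (the job script name is a placeholder):

```python
# Sketch: on self-managed Spark (e.g. Dataproc or EMR) the open-source connector
# can be resolved from Maven Central at launch via --packages.
CONNECTOR = "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.43.1"

submit_cmd = [
    "spark-submit",
    "--packages", CONNECTOR,    # resolves the connector and its dependencies
    "bq_direct_write_job.py",   # placeholder job script using writeMethod="direct"
]
print(" ".join(submit_cmd))
```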