BigQuery query via foreign catalog changes behavior from runtime 15.4 to 16.4

mguirao

Hello,

I wanted to upgrade my all-purpose cluster from Databricks Runtime 15.4 to a newer version (e.g., 16.4), but I noticed a change in the behavior of my query against a BigQuery catalog.

On 15.4, my query runs as a single job:

[Screenshot: databricks_no_issue.jpg]

On 16.4, the same query on the same resources produces 35 jobs and runs slower:

[Screenshot: databricks_issue.jpeg]

Moreover, on the BigQuery side it appears to read the data as streams through the BigQuery Storage API.
Since my Databricks workspace is on Azure, this incurs charges under the SKU "BigQuery Storage API Network Internet Data Transfer Out Europe to Europe", which was not the case with 15.4.

I have read the 16.4 release notes and found nothing explaining this change in behavior.
Can you please help me understand it?
Regards,

pradeep_singh

In DBR 16.1+, Databricks switched the BigQuery federation connector from JDBC to the BigQuery Storage API.
The Storage API parallelizes reads (hence the extra jobs) and transfers data directly from BigQuery to Databricks compute, so cross-cloud queries (Databricks on Azure reading from BigQuery on GCP) can incur BigQuery Storage API egress charges. To mitigate network costs, maximize pushdown (filters, projections, limits, joins), reduce the number of selected columns, and consider co-locating compute (Databricks on GCP).
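As a rough illustration of the "maximize pushdown, reduce selected columns" advice, here is a minimal sketch that builds a query with an explicit column list (instead of `SELECT *`), an optional filter, and an optional `LIMIT`. The helper `build_pushdown_query`, the catalog/table name `bq_catalog.sales.orders`, and the column names are all hypothetical, and whether each clause is actually pushed down to BigQuery depends on the connector and runtime version.

```python
from typing import Optional, Sequence


def build_pushdown_query(
    table: str,
    columns: Sequence[str],
    where: Optional[str] = None,
    limit: Optional[int] = None,
) -> str:
    """Build a narrow SELECT (hypothetical helper).

    Naming only the needed columns, plus an optional WHERE and LIMIT,
    gives the federation connector a chance to apply the projection,
    filter, and limit in BigQuery instead of after the data is transferred.
    """
    if not columns:
        # Avoid SELECT *: every extra column is extra data sent across clouds.
        raise ValueError("select at least one explicit column instead of *")
    # Quote each column name with backticks (Databricks SQL identifier style).
    cols = ", ".join(f"`{c}`" for c in columns)
    sql = f"SELECT {cols} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    if limit is not None:
        if limit <= 0:
            raise ValueError("limit must be positive")
        sql += f" LIMIT {limit}"
    return sql


# Hypothetical foreign catalog table and columns, for illustration only.
query = build_pushdown_query(
    "bq_catalog.sales.orders",
    ["order_id", "amount"],
    where="order_date >= '2024-01-01'",
    limit=1000,
)
print(query)
# -> SELECT `order_id`, `amount` FROM bq_catalog.sales.orders WHERE order_date >= '2024-01-01' LIMIT 1000
```

On a cluster, the resulting string would be passed to `spark.sql(...)`; running `EXPLAIN` on the same query is one way to check which parts of it reached BigQuery rather than being applied on the Databricks side.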

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev
