Google BigQuery Foreign Catalog - Incorrect Data Format

RobsonNLPT
Contributor III

I've tested a foreign catalog connected to a Google BigQuery project.

The connection was OK and I was able to see my datasets and tables.

The problem: for columns with regular data types the data format is perfect, but for columns of type RECORD and REPEATED (arrays) I was expecting to see the JSON format that I see in Google BigQuery results.

The data is JSON, but with a completely different schema that doesn't make any sense. The foreign catalog maps the RECORD and REPEATED data types to varchar(65535).

Federation is a great feature but those incorrect data conversions are a disaster. 

Any help?

3 REPLIES

Alberto_Umana
Databricks Employee

Hi @RobsonNLPT,

This is a limitation. The data conversion behavior you are seeing is expected given the data type mappings currently supported by Lakehouse Federation: BigQuery types such as array, geography, interval, json, string, and struct are mapped to VarcharType in Spark. Unfortunately, this means the JSON format you see in Google BigQuery results is not preserved when the data is accessed through the foreign catalog in Databricks. I will check whether there is a feature request to adjust this.

Hi Alberto,

Converting to a string is one thing.

Delivering completely wrong JSON is another. At the very least, the JSON should be delivered as a string.

This is not just a limitation. You can't release a feature with issues this unacceptable. Data is an asset.

Hi Alberto,

I've found a solution using the Spark connector with the credentials:

spark.read.format("bigquery")
This returns the correct data, in the format I expect.
I highly recommend fixing the federation engine so that it properly supports BigQuery as a foreign catalog.
Best regards
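
For anyone wanting to try the same workaround, a minimal sketch of the connector-based read might look like the following. It assumes the spark-bigquery connector is available on the cluster and that authentication uses a service-account key passed through the connector's `credentials` option as a base64-encoded string. The helper names, the project, dataset, and table values, and the key variable are hypothetical placeholders, not details from this thread.

```python
import base64


def bigquery_reader_options(project, dataset, table, service_account_json=None):
    """Build option values for spark.read.format("bigquery").

    Hypothetical helper: all argument values are placeholders you supply.
    """
    options = {
        # Fully qualified table to read.
        "table": f"{project}.{dataset}.{table}",
        # Project billed for the read (assumed here to be the same project).
        "parentProject": project,
    }
    if service_account_json is not None:
        # The connector's "credentials" option takes the service-account
        # key JSON encoded as base64.
        encoded = base64.b64encode(service_account_json.encode("utf-8"))
        options["credentials"] = encoded.decode("ascii")
    return options


def read_bigquery_table(spark, project, dataset, table, service_account_json=None):
    """Read a BigQuery table through the Spark connector instead of the foreign catalog."""
    options = bigquery_reader_options(project, dataset, table, service_account_json)
    return spark.read.format("bigquery").options(**options).load()


# Usage in a notebook where `spark` already exists (placeholder values):
# df = read_bigquery_table(spark, "my-project", "my_dataset", "my_table", key_json)
# df.printSchema()  # inspect how the RECORD / REPEATED columns come through
```

Calling `printSchema()` on the result is a quick way to compare how the nested columns are typed here against the varchar(65535) columns exposed by the foreign catalog.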