cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Reading bigquery data using a query

naga_databricks
Contributor

To read Bigquery data using spark.read, i'm using a query. This query executes and creates a table on the materializationDataset. 


df = spark.read.format("bigquery") \
.option("query", query) \
.option("materializationProject", materializationProject) \
.option("materializationExpirationTimeInMinutes", materializationExpirationTimeInMinutes) \
.option("materializationDataset", materializationDataset) \
.load()
df.write.mode("overwrite").format("delta").saveAsTable(deltaTableName)

This documentation on databricks says, the spark.read will create a Materialized View, however i see a table. Is the documentation accurate? Or is there a way to select if spark can create a table or a materialized view?

2 REPLIES 2

Kaniz
Community Manager
Community Manager

Hi @naga_databricksThe Databricks documentation does not explicitly state that spark.read BigQuery format will create a Materialized View.

Instead, it mentions that it can read from a BigQuery table or the result of a BigQuery SQL query. When you specify the query option, it executes the SQL query in BigQuery and loads the result into a DataFrame.

In your case, the materializationDataset option specifies the BigQuery data set that will be used to store the temporary table resulting from the query execution. This table is materialized in BigQuery for the duration specified by the materializationExpirationTimeInMinutes option. 

However, this does not mean it creates a Materialized View in the BigQuery sense.

In BigQuery, a Materialized View is a precomputed result set of a SQL query that can be used to speed up SQL query execution, especially for complex queries. 

So, to answer your question, the Databricks documentation seems accurate based on the given information.

There is no direct way to select if spark.read you can create a table or a Materialized View in BigQuery. It simply executes the query and loads the result into a DataFrame, and the temporary table made in the process is more of an implementation detail.

Sources - https://docs.databricks.com/external-data/bigquery.html

Thanks Kaniz for the explanation. It sounds good to me that i'm working with tables here.

For me the only cause of confusion was the underlined writing:

naga_databricks_1-1692970117627.png

 

 

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.