cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Reading bigquery data using a query

naga_databricks
Contributor

To read Bigquery data using spark.read, i'm using a query. This query executes and creates a table on the materializationDataset. 


df = spark.read.format("bigquery") \
.option("query", query) \
.option("materializationProject", materializationProject) \
.option("materializationExpirationTimeInMinutes", materializationExpirationTimeInMinutes) \
.option("materializationDataset", materializationDataset) \
.load()
df.write.mode("overwrite").format("delta").saveAsTable(deltaTableName)

This documentation on databricks says, the spark.read will create a Materialized View, however i see a table. Is the documentation accurate? Or is there a way to select if spark can create a table or a materialized view?

2 REPLIES 2

Kaniz_Fatma
Community Manager
Community Manager

Hi @naga_databricksThe Databricks documentation does not explicitly state that spark.read BigQuery format will create a Materialized View.

Instead, it mentions that it can read from a BigQuery table or the result of a BigQuery SQL query. When you specify the query option, it executes the SQL query in BigQuery and loads the result into a DataFrame.

In your case, the materializationDataset option specifies the BigQuery data set that will be used to store the temporary table resulting from the query execution. This table is materialized in BigQuery for the duration specified by the materializationExpirationTimeInMinutes option. 

However, this does not mean it creates a Materialized View in the BigQuery sense.

In BigQuery, a Materialized View is a precomputed result set of a SQL query that can be used to speed up SQL query execution, especially for complex queries. 

So, to answer your question, the Databricks documentation seems accurate based on the given information.

There is no direct way to select if spark.read you can create a table or a Materialized View in BigQuery. It simply executes the query and loads the result into a DataFrame, and the temporary table made in the process is more of an implementation detail.

Sources - https://docs.databricks.com/external-data/bigquery.html

Thanks Kaniz for the explanation. It sounds good to me that i'm working with tables here.

For me the only cause of confusion was the underlined writing:

naga_databricks_1-1692970117627.png

 

 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group