
Spark persistent view on a partition parquet file

sage5616
Valued Contributor

In Spark, is it possible to create a persistent view on a partitioned Parquet file in Azure Blob Storage? The view must still be available after the cluster is restarted, without having to re-create it, so it cannot be a temp view.

I can create a temp view, but not a persistent view. The following code throws an exception:

spark.sql("CREATE VIEW test USING parquet OPTIONS (path \"/mnt/folder/file.c000.snappy.parquet\")")
ParseException: 
mismatched input 'USING' expecting {'(', 'UP_TO_DATE', 'AS', 'COMMENT', 'PARTITIONED', 'TBLPROPERTIES'}(line 1, pos 23)

Big thank you for taking a look 🙂

1 ACCEPTED SOLUTION

sage5616
Valued Contributor

Here is what worked for me. Hope this helps someone else: https://stackoverflow.com/questions/72913913/spark-persistent-view-on-a-partition-parquet-file/72914...

CREATE VIEW test as select * from parquet.`/mnt/folder-with-parquet-file(s)/`

@Hubert Dudek & @Tomasz Bacewicz, unfortunately your answers are not useful.

P.S. I cannot hard-code the columns or dynamically define the table DDL in order to create an external table. I need the schema to be inferred from the Parquet file at creation time, without specifying it explicitly in advance.
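
For anyone running this from a notebook, here is a minimal Python sketch of the statement above. The helper name `build_view_statement` is hypothetical (it is not a Spark API); it only assembles the SQL text. The final call is left commented out because it assumes a `spark` session already exists, as it does in a Databricks notebook.

```python
def build_view_statement(view_name: str, parquet_dir: str) -> str:
    """Return a CREATE VIEW statement that selects directly from Parquet files.

    Hypothetical helper for illustration only; it builds a string and
    does not talk to Spark.
    """
    # Backticks delimit the path in the parquet.`...` syntax, so this
    # sketch simply rejects paths that contain one.
    if "`" in parquet_dir:
        raise ValueError("path must not contain a backtick")
    return f"CREATE VIEW {view_name} AS SELECT * FROM parquet.`{parquet_dir}`"


# Same view name and directory as in the post above.
statement = build_view_statement("test", "/mnt/folder-with-parquet-file(s)/")
print(statement)

# Assumption: `spark` is an active SparkSession (predefined in Databricks notebooks).
# spark.sql(statement)
```

Pointing the path at the directory rather than at a single `.parquet` file lets Spark read every file under it.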

3 REPLIES

tomasz
Databricks Employee

Have you tried creating an external table on top of the existing parquet data? Views are built on top of existing tables registered in the metastore (not directly on files).

You would use the external table functionality by specifying LOCATION in your query (https://docs.databricks.com/data-governance/unity-catalog/create-tables.html#create-an-external-table).

Keep in mind that the path specified should be to a directory, not a specific parquet file.
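
A rough sketch of what this suggestion could look like, assuming the Parquet files sit under a directory such as `/mnt/folder/` (derived from the path in the question; adjust it to your mount). The helper `build_external_table_statement` and the table name `test_table` are made up for illustration. With no column list, Spark infers the schema of a Parquet data source table from the files at the location, so nothing is hard-coded here. Whether this works as-is in a given workspace (for example under Unity Catalog, which expects a configured external location) is an assumption to verify.

```python
def build_external_table_statement(table_name: str, parquet_dir: str) -> str:
    """Return a CREATE TABLE statement for an external Parquet table.

    Hypothetical helper for illustration only. No columns are listed,
    which leaves schema inference to Spark.
    """
    # The location is embedded in a single-quoted SQL literal, so this
    # sketch rejects paths containing a single quote instead of escaping.
    if "'" in parquet_dir:
        raise ValueError("path must not contain a single quote")
    return f"CREATE TABLE {table_name} USING PARQUET LOCATION '{parquet_dir}'"


# LOCATION points at a directory, not at an individual .parquet file.
table_statement = build_external_table_statement("test_table", "/mnt/folder/")
print(table_statement)

# Assumption: `spark` is an active SparkSession (predefined in Databricks notebooks).
# spark.sql(table_statement)
# A persistent view could then be defined on top of the registered table:
# spark.sql("CREATE VIEW test AS SELECT * FROM test_table")
```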

Hubert-Dudek
Esteemed Contributor III

A VIEW is a stored SELECT statement. Please register the Parquet data as an external TABLE.
