Spark persistent view on a partitioned parquet file

sage5616
Valued Contributor

In Spark, is it possible to create a persistent view on a partitioned parquet file in Azure Blob storage? The view must remain available after the cluster restarts, without having to re-create it, so it cannot be a temp view.

I can create a temp view, but not a persistent one. The following code raises an exception:

spark.sql("CREATE VIEW test USING parquet OPTIONS (path \"/mnt/folder/file.c000.snappy.parquet\")")
ParseException: 
mismatched input 'USING' expecting {'(', 'UP_TO_DATE', 'AS', 'COMMENT', 'PARTITIONED', 'TBLPROPERTIES'}(line 1, pos 23)
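The parser rejects this because the USING ... OPTIONS clause belongs to CREATE TABLE, not CREATE VIEW (after a view name the parser expects AS, as the error's expected-token list shows). A minimal sketch of what the parser would accept instead, registering an external table over the parquet data so the schema is inferred from the files (the path and table name here are placeholders):

```sql
-- USING/OPTIONS is CREATE TABLE syntax, not CREATE VIEW syntax.
-- Registers a table whose schema is inferred from the parquet files.
-- Note: the path should point at a directory, not a single part file.
CREATE TABLE test
USING parquet
OPTIONS (path "/mnt/folder/");
```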

Big thank you for taking a look 🙂

1 ACCEPTED SOLUTION

Accepted Solutions

sage5616
Valued Contributor

Here is what worked for me. Hope this helps someone else: https://stackoverflow.com/questions/72913913/spark-persistent-view-on-a-partition-parquet-file/72914...

CREATE VIEW test AS SELECT * FROM parquet.`/mnt/folder-with-parquet-file(s)/`

@Hubert Dudek​ & @Tomasz Bacewicz​ unfortunately your answers are not useful.

P.S. I cannot hard-code the columns or dynamically define the table DDL in order to create an external table. I need the schema of the parquet file to be inferred at creation time from the file itself, without explicitly specifying it ahead of time.
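Spelled out as a sketch (the mount path is a placeholder): the view works because Spark SQL allows querying files directly with the ``format`.`path``` syntax, so no schema or table registration is needed, and the view definition itself is saved in the metastore.

```sql
-- Persistent view directly over a parquet directory; the schema is
-- inferred from the files each time the view is queried.
CREATE VIEW test AS
SELECT * FROM parquet.`/mnt/folder-with-parquet-file(s)/`;

-- Because the view lives in the metastore, it survives cluster
-- restarts and can be queried like any table:
SELECT * FROM test LIMIT 10;
```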


3 REPLIES

tomasz
New Contributor III

Have you tried creating an external table on top of the existing parquet data? Views are built on top of existing tables registered in the metastore (not directly on files).

You can use the external table functionality by specifying LOCATION in your query (https://docs.databricks.com/data-governance/unity-catalog/create-tables.html#create-an-external-table).

Keep in mind that the path specified should be to a directory, not a specific parquet file.
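A sketch of that suggestion, assuming the data lives in a mounted directory (the table name and path are placeholders); Spark infers the schema from the parquet files, so no column list is required:

```sql
-- External table over an existing parquet directory.
-- LOCATION points at the directory; the schema is read from the files.
CREATE TABLE test_ext
USING parquet
LOCATION '/mnt/folder-with-parquet-file(s)/';
```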

Hubert-Dudek
Esteemed Contributor III

A VIEW is a stored SELECT statement over existing tables. Please register the parquet data as an external TABLE instead.

