Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Spark persistent view on a partitioned parquet file

sage5616
Valued Contributor

In Spark, is it possible to create a persistent view on a partitioned parquet file in Azure Blob storage? The view must still be available after the cluster is restarted, without having to re-create it, so it cannot be a temp view.

I can create a temp view, but not a persistent view. The following code returns an exception.

spark.sql("CREATE VIEW test USING parquet OPTIONS (path \"/mnt/folder/file.c000.snappy.parquet\")")
ParseException: 
mismatched input 'USING' expecting {'(', 'UP_TO_DATE', 'AS', 'COMMENT', 'PARTITIONED', 'TBLPROPERTIES'}(line 1, pos 23)

Big thank you for taking a look 🙂
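For context, the temporary-view route works roughly like this (the mount path /mnt/folder/ is just a placeholder), but the view disappears after every cluster restart:

# Temp view: works, but does not survive a cluster restart.
# "spark" is the SparkSession that Databricks notebooks provide by default.
df = spark.read.parquet("/mnt/folder/")
df.createOrReplaceTempView("test_temp")
spark.sql("SELECT COUNT(*) FROM test_temp").show()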


3 REPLIES

tomasz
Databricks Employee

Have you tried creating an external table on top of the existing parquet data? Views are built on top of existing tables registered in the metastore (not directly on files).

You can use the external table functionality by specifying LOCATION in your query (https://docs.databricks.com/data-governance/unity-catalog/create-tables.html#create-an-external-table).

Keep in mind that the path specified should be to a directory, not a specific parquet file.
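As a rough sketch of what that could look like (the directory /mnt/folder/ and the table name test_ext are placeholders, not from the original question):

# External table over the parquet directory; Spark infers the schema from the
# files, so no column definitions need to be hard-coded.
spark.sql("""
    CREATE TABLE IF NOT EXISTS test_ext
    USING PARQUET
    LOCATION '/mnt/folder/'
""")
spark.sql("SELECT * FROM test_ext LIMIT 10").show()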

Hubert-Dudek
Esteemed Contributor III

A VIEW is just a stored SELECT statement. Please register the parquet data as an external TABLE instead.
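
For example, assuming the hypothetical test_ext table from the sketch above has already been registered, a persistent view on top of it is just a stored SELECT that also lives in the metastore:

# Persistent view over a registered table; it survives cluster restarts because
# both the table and the view are stored in the metastore.
spark.sql("CREATE VIEW IF NOT EXISTS test_view AS SELECT * FROM test_ext")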

sage5616
Valued Contributor
Accepted Solution

Here is what worked for me. Hope this helps someone else: https://stackoverflow.com/questions/72913913/spark-persistent-view-on-a-partition-parquet-file/72914...

CREATE VIEW test AS SELECT * FROM parquet.`/mnt/folder-with-parquet-file(s)/`

@Hubert Dudek & @Tomasz Bacewicz, unfortunately your answers are not useful in my case.

P.S. I cannot hard-code the columns or dynamically define the table DDL in order to create an external table. I need the schema to be inferred from the parquet file at creation time, without explicitly hard-coding it ahead of time.
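
Wrapped in PySpark, the same approach looks roughly as follows (the mount path is the one from the statement above); the view is stored in the metastore, so it survives cluster restarts, and the schema is read from the parquet files whenever the view is queried:

# Persistent view directly over the parquet directory; no schema is hard-coded.
spark.sql("""
    CREATE VIEW IF NOT EXISTS test
    AS SELECT * FROM parquet.`/mnt/folder-with-parquet-file(s)/`
""")
spark.sql("SELECT * FROM test LIMIT 10").show()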
