07-08-2022 08:39 AM
In Spark, is it possible to create a persistent view on a partitioned parquet file in Azure Blob storage? The view must still be available after the cluster restarts, without having to re-create it, so it cannot be a temp view.
I can create a temp view, but not a persistent view. The following code returns an exception:
spark.sql("CREATE VIEW test USING parquet OPTIONS (path \"/mnt/folder/file.c000.snappy.parquet\")")
ParseException:
mismatched input 'USING' expecting {'(', 'UP_TO_DATE', 'AS', 'COMMENT', 'PARTITIONED', 'TBLPROPERTIES'}(line 1, pos 23)
Big thank you for taking a look 🙂
- Labels:
  - Azure
  - Persistent View
  - Pyspark
  - Spark
Accepted Solutions
07-08-2022 10:06 AM
Here is what worked for me. Hope this helps someone else: https://stackoverflow.com/questions/72913913/spark-persistent-view-on-a-partition-parquet-file/72914...
CREATE VIEW test AS SELECT * FROM parquet.`/mnt/folder-with-parquet-file(s)/`
@Hubert Dudek & @Tomasz Bacewicz, unfortunately your answers did not work for my case.
P.S. I cannot hard-code the columns or dynamically define the table DDL in order to create the external table. I need the schema of the parquet file to be inferred from the file at creation time, without explicitly hard-coding the schema ahead.
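For reference, the full pattern can be sketched as below. The view name and mount path are illustrative placeholders; substitute your own. Spark's `parquet.` file-source syntax reads the schema from the files themselves, so no columns are hard-coded:

```sql
-- A persistent view over a parquet *directory* (hypothetical path /mnt/events/).
-- The schema is inferred from the parquet files at query time.
CREATE VIEW IF NOT EXISTS events_view AS
SELECT * FROM parquet.`/mnt/events/`;

-- The view definition is stored in the metastore, so after a cluster
-- restart it can be queried directly without re-creation:
SELECT * FROM events_view LIMIT 10;
```

Because only the `SELECT` text is stored, files added to the directory later (including new partitions) show up automatically in the view.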
07-08-2022 08:56 AM
Have you tried creating an external table on top of the existing parquet data? Views are built on top of existing tables registered in the metastore (not directly on files).
You can use the external table functionality by specifying a LOCATION in your query (https://docs.databricks.com/data-governance/unity-catalog/create-tables.html#create-an-external-table).
Keep in mind that the path specified should be to a directory, not a specific parquet file.
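Concretely, such an external table can be sketched like this (table name and mount path are hypothetical; `USING PARQUET` with `LOCATION` lets Spark infer the schema from the existing files, so no column list is needed):

```sql
-- Hypothetical directory of parquet files; the path must point to the
-- folder, not to an individual part file.
CREATE TABLE IF NOT EXISTS events_ext
USING PARQUET
LOCATION '/mnt/events/';

-- The table is registered in the metastore and survives cluster restarts.
SELECT COUNT(*) FROM events_ext;
```

Since it is an external table, dropping it removes only the metastore entry; the underlying parquet files in Blob storage are left untouched.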
07-08-2022 09:51 AM
A VIEW is just a stored SELECT statement over tables registered in the metastore. Please register the parquet data as an external TABLE instead.

