<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Spark persistent view on a partition parquet file in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/spark-persistent-view-on-a-partition-parquet-file/m-p/14214#M8741</link>
    <description>&lt;P&gt;A VIEW is a stored SELECT statement. Please register the Parquet data as an external TABLE.&lt;/P&gt;</description>
    <pubDate>Fri, 08 Jul 2022 16:51:32 GMT</pubDate>
    <dc:creator>Hubert-Dudek</dc:creator>
    <dc:date>2022-07-08T16:51:32Z</dc:date>
    <item>
      <title>Spark persistent view on a partition parquet file</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-persistent-view-on-a-partition-parquet-file/m-p/14212#M8739</link>
      <description>&lt;P&gt;In Spark, is it possible to create a persistent view on a partitioned Parquet file in Azure Blob Storage? The view must remain available after the cluster restarts, without having to be re-created, so it cannot be a temp view.&lt;/P&gt;&lt;P&gt;I can create a temp view, but not a persistent view. The following code throws an exception:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.sql("CREATE VIEW test USING parquet OPTIONS (path \"/mnt/folder/file.c000.snappy.parquet\")")&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;ParseException: 
mismatched input 'USING' expecting {'(', 'UP_TO_DATE', 'AS', 'COMMENT', 'PARTITIONED', 'TBLPROPERTIES'}(line 1, pos 23)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Big thank you for taking a look &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 08 Jul 2022 15:39:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-persistent-view-on-a-partition-parquet-file/m-p/14212#M8739</guid>
      <dc:creator>sage5616</dc:creator>
      <dc:date>2022-07-08T15:39:55Z</dc:date>
    </item>
    <item>
      <title>Re: Spark persistent view on a partition parquet file</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-persistent-view-on-a-partition-parquet-file/m-p/14213#M8740</link>
      <description>&lt;P&gt;Have you tried creating an external table on top of the existing Parquet data? Views are built on top of existing tables registered in the metastore (not directly on files).&lt;/P&gt;&lt;P&gt;To create an external table, specify LOCATION in your CREATE TABLE statement (https://docs.databricks.com/data-governance/unity-catalog/create-tables.html#create-an-external-table).&lt;/P&gt;&lt;P&gt;Keep in mind that the path you specify should point to a directory, not to a specific Parquet file.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Jul 2022 15:56:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-persistent-view-on-a-partition-parquet-file/m-p/14213#M8740</guid>
      <dc:creator>tomasz</dc:creator>
      <dc:date>2022-07-08T15:56:25Z</dc:date>
    </item>
    <item>
      <title>Re: Spark persistent view on a partition parquet file</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-persistent-view-on-a-partition-parquet-file/m-p/14214#M8741</link>
      <description>&lt;P&gt;A VIEW is a stored SELECT statement. Please register the Parquet data as an external TABLE.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Jul 2022 16:51:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-persistent-view-on-a-partition-parquet-file/m-p/14214#M8741</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-07-08T16:51:32Z</dc:date>
    </item>
    <item>
      <title>Re: Spark persistent view on a partition parquet file</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-persistent-view-on-a-partition-parquet-file/m-p/14215#M8742</link>
      <description>&lt;P&gt;Here is what worked for me. I hope this helps someone else: &lt;A href="https://stackoverflow.com/questions/72913913/spark-persistent-view-on-a-partition-parquet-file/72914245#72914245" alt="https://stackoverflow.com/questions/72913913/spark-persistent-view-on-a-partition-parquet-file/72914245#72914245" target="_blank"&gt;https://stackoverflow.com/questions/72913913/spark-persistent-view-on-a-partition-parquet-file/72914245#72914245&lt;/A&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;CREATE VIEW test AS SELECT * FROM parquet.`/mnt/folder-with-parquet-file(s)/`&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;@Hubert Dudek &amp;amp; @Tomasz Bacewicz, unfortunately your answers are not useful.&lt;/P&gt;&lt;P&gt;P.S. I cannot hard-code the columns or dynamically define table DDL in order to create the external table. I need the schema to be inferred from the Parquet file at table creation, without explicitly hard-coding it ahead of time.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Jul 2022 17:06:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-persistent-view-on-a-partition-parquet-file/m-p/14215#M8742</guid>
      <dc:creator>sage5616</dc:creator>
      <dc:date>2022-07-08T17:06:20Z</dc:date>
    </item>
  </channel>
</rss>

