<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Is there a way to create a non-temporary Spark View with PySpark? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13426#M8114</link>
    <description>&lt;P&gt;Just create a table instead.&lt;/P&gt;</description>
    <pubDate>Thu, 14 Oct 2021 15:07:04 GMT</pubDate>
    <dc:creator>Hubert-Dudek</dc:creator>
    <dc:date>2021-10-14T15:07:04Z</dc:date>
    <item>
      <title>Is there a way to create a non-temporary Spark View with PySpark?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13425#M8113</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;When creating a Spark view using SparkSQL ("CREATE VIEW AS SELECT ..."), the view is &lt;B&gt;&lt;U&gt;non-temporary&lt;/U&gt;&lt;/B&gt; by default - the view definition will survive the Spark session as well as the Spark cluster.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In PySpark I can use DataFrame.createOrReplaceTempView or &lt;A href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.createOrReplaceGlobalTempView.html" alt="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.createOrReplaceGlobalTempView.html" target="_blank"&gt;DataFrame.createOrReplaceGlobalTempView&lt;/A&gt; to create a &lt;B&gt;&lt;U&gt;temporary&lt;/U&gt;&lt;/B&gt; view for a DataFrame.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is there a way to create a &lt;B&gt;non-temporary&lt;/B&gt; Spark view with &lt;B&gt;PySpark&lt;/B&gt; for a DataFrame programmatically?&lt;/P&gt;&lt;P&gt;spark.sql('CREATE VIEW AS SELECT ...') doesn't count &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I did not find a &lt;A href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.html?highlight=view" alt="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.html?highlight=view" target="_blank"&gt;DataFrame method&lt;/A&gt; to do so...&lt;/P&gt;</description>
      <pubDate>Thu, 14 Oct 2021 14:18:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13425#M8113</guid>
      <dc:creator>MartinB</dc:creator>
      <dc:date>2021-10-14T14:18:49Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a way to create a non-temporary Spark View with PySpark?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13426#M8114</link>
      <description>&lt;P&gt;Just create a table instead.&lt;/P&gt;</description>
      <pubDate>Thu, 14 Oct 2021 15:07:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13426#M8114</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2021-10-14T15:07:04Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a way to create a non-temporary Spark View with PySpark?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13427#M8115</link>
      <description>&lt;P&gt;Creating a table would imply &lt;B&gt;&lt;U&gt;data persistence&lt;/U&gt;&lt;/B&gt;, wouldn't it?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I don't want that.&lt;/P&gt;</description>
      <pubDate>Thu, 14 Oct 2021 16:45:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13427#M8115</guid>
      <dc:creator>MartinB</dc:creator>
      <dc:date>2021-10-14T16:45:36Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a way to create a non-temporary Spark View with PySpark?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13428#M8116</link>
      <description>&lt;P&gt;Hi @Martin B.&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;There are 2 types of views. TEMPORARY views are session-scoped and are dropped when the session ends, because their definitions are not persisted in the underlying metastore, if any. GLOBAL TEMPORARY views are tied to a system-preserved temporary database, global_temp. If you would like to know more, please refer to the &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-view.html#parameters" alt="https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-view.html#parameters" target="_blank"&gt;docs&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If neither of these two options works for you, the other option is to create a physical table, as @Hubert Dudek&amp;nbsp;mentioned.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 14 Oct 2021 17:18:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13428#M8116</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-10-14T17:18:03Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a way to create a non-temporary Spark View with PySpark?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13429#M8117</link>
      <description>&lt;P&gt;Hi @Jose Gonzalez&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would argue there are 3 types of views:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;TEMPORARY VIEWS&lt;UL&gt;&lt;LI&gt;CREATE TEMPORARY VIEW sam AS SELECT * FROM ...&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;GLOBAL TEMPORARY VIEWS&lt;UL&gt;&lt;LI&gt;CREATE GLOBAL TEMPORARY VIEW sam AS SELECT * FROM ...&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;B&gt;&lt;U&gt;NON-TEMPORARY VIEWS&lt;/U&gt;&lt;/B&gt;&lt;UL&gt;&lt;LI&gt;CREATE VIEW sam AS SELECT * FROM ...&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please see the example here: &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-aux-show-views.html#examples" alt="https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-aux-show-views.html#examples" target="_blank"&gt;https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-aux-show-views.html#examples&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2395i1510B54DCCBC48EF/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I want to create a non-temporary view (isTemporary=false) - not with SparkSQL but with PySpark.&lt;/P&gt;</description>
      <pubDate>Sat, 16 Oct 2021 08:37:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13429#M8117</guid>
      <dc:creator>MartinB</dc:creator>
      <dc:date>2021-10-16T08:37:34Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a way to create a non-temporary Spark View with PySpark?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13430#M8118</link>
      <description>&lt;P&gt;@Jose Gonzalez​&amp;nbsp; or @Piper Wilson​&amp;nbsp; any ideas?&lt;/P&gt;</description>
      <pubDate>Wed, 10 Nov 2021 13:20:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13430#M8118</guid>
      <dc:creator>MartinB</dc:creator>
      <dc:date>2021-11-10T13:20:02Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a way to create a non-temporary Spark View with PySpark?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13431#M8119</link>
      <description>&lt;P&gt;Why not create a managed table?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# write the DataFrame as a managed table, replacing any existing one
dataframe.write.mode("overwrite").saveAsTable("&amp;lt;example-table&amp;gt;")

# later, when we need the data
result_df = spark.read.table("&amp;lt;example-table&amp;gt;")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 10 Nov 2021 13:29:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13431#M8119</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2021-11-10T13:29:29Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a way to create a non-temporary Spark View with PySpark?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13432#M8120</link>
      <description>&lt;P&gt;@Hubert Dudek,&amp;nbsp;creating a managed table means persisting the DataFrame (writing the content of the DataFrame to storage).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Imagine you have a Spark table (in Delta Lake format) containing your raw data. Every 5 minutes, new data is appended to that raw table.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I want to refine the raw data: filter it, select just specific columns and convert some columns to a better data type (e.g. string to date). With VIEWs you can apply these transformations &lt;B&gt;&lt;U&gt;virtually&lt;/U&gt;&lt;/B&gt;. Every time you access the view, the data are transformed at access time - but you always get the &lt;B&gt;&lt;U&gt;current data&lt;/U&gt;&lt;/B&gt; from the underlying table.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;When creating a managed table, you basically create a copy of the content of the raw table with the transformations applied. Persisting data comes at performance and storage costs.&lt;/P&gt;&lt;P&gt;Moreover, every time I want to access my "clean" version of the data, I have to &lt;B&gt;&lt;U&gt;specify&lt;/U&gt;&lt;/B&gt; the transformation logic again. VIEWs allow me to just access my transformed raw data without the need for a manual refresh or persistence.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Nov 2021 16:24:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13432#M8120</guid>
      <dc:creator>MartinB</dc:creator>
      <dc:date>2021-11-10T16:24:45Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a way to create a non-temporary Spark View with PySpark?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13433#M8121</link>
      <description>&lt;P&gt;OK, now I finally get it &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; So the whole question is how to do CREATE VIEW AS SELECT via the PySpark API.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We can see in &lt;B&gt;dataframe.py&lt;/B&gt; that all views are temporary:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/apache/spark/blob/7d9a4fab7957086a12ba3e9e2856e20566531e3a/python/pyspark/sql/dataframe.py" alt="https://github.com/apache/spark/blob/7d9a4fab7957086a12ba3e9e2856e20566531e3a/python/pyspark/sql/dataframe.py" target="_blank"&gt;https://github.com/apache/spark/blob/7d9a4fab7957086a12ba3e9e2856e20566531e3a/python/pyspark/sql/dataframe.py&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This PySpark API routes to &lt;B&gt;Dataset.scala&lt;/B&gt;:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala" alt="https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala" target="_blank"&gt;https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In &lt;B&gt;Dataset.scala&lt;/B&gt; we can see the condition that would need to be rewritten. It does not allow PersistedView:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;val viewType = if (global) GlobalTempView else LocalTempView&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;It should be something like the logic in &lt;B&gt;SparkSqlParser.scala&lt;/B&gt;:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;      val viewType = if (ctx.TEMPORARY == null) {
        PersistedView
      } else if (ctx.GLOBAL != null) {
        GlobalTempView
      } else {
        LocalTempView
      }&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;So:&lt;/P&gt;&lt;P&gt;&lt;B&gt;1) &lt;/B&gt;private def createTempViewCommand in &lt;B&gt;Dataset.scala&lt;/B&gt; needs an additional viewType parameter and should probably be renamed (the name is already misleading since Global was added)&lt;/P&gt;&lt;P&gt;&lt;B&gt;2) &lt;/B&gt;then functions like createGlobalPersistantCommand etc. could be added in &lt;B&gt;Dataset.scala&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;3) &lt;/B&gt;then def createGlobalPersistantView etc. could be added to &lt;B&gt;dataframe.py&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Once that is done in Spark X (? :- ) it will be possible.&lt;/P&gt;&lt;P&gt;Maybe someone wants to contribute and create the commits &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 10 Nov 2021 19:17:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-there-a-way-to-create-a-non-temporary-spark-view-with-pyspark/m-p/13433#M8121</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2021-11-10T19:17:45Z</dc:date>
    </item>
  </channel>
</rss>

