<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: CREATE view USING json and *include* _metadata, _rescued_data in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/create-view-using-json-and-include-metadata-rescued-data/m-p/105120#M41999</link>
    <description>&lt;DIV&gt;I can run the query below against a Delta table:&lt;/DIV&gt;
&lt;DIV&gt;&lt;LI-CODE lang="markup"&gt;SELECT *, _metadata.file_name FROM anytable WHERE condition&lt;/LI-CODE&gt;
&lt;P&gt;&lt;A href="https://docs.databricks.com/en/ingestion/file-metadata-column.html" target="_blank"&gt;https://docs.databricks.com/en/ingestion/file-metadata-column.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;For JSON files, you can use something like:&lt;/P&gt;
&lt;PRE&gt;&lt;SPAN class="n"&gt;df&lt;/SPAN&gt; &lt;SPAN class="o"&gt;=&lt;/SPAN&gt; &lt;SPAN class="n"&gt;spark&lt;/SPAN&gt;&lt;SPAN class="o"&gt;.&lt;/SPAN&gt;&lt;SPAN class="n"&gt;read&lt;/SPAN&gt; \
  &lt;SPAN class="o"&gt;.&lt;/SPAN&gt;&lt;SPAN class="n"&gt;format&lt;/SPAN&gt;&lt;SPAN class="p"&gt;(&lt;/SPAN&gt;&lt;SPAN class="s2"&gt;"json"&lt;/SPAN&gt;&lt;SPAN class="p"&gt;)&lt;/SPAN&gt; \
  &lt;SPAN class="o"&gt;.&lt;/SPAN&gt;&lt;SPAN class="n"&gt;schema&lt;/SPAN&gt;&lt;SPAN class="p"&gt;(&lt;/SPAN&gt;&lt;SPAN class="n"&gt;schema&lt;/SPAN&gt;&lt;SPAN class="p"&gt;)&lt;/SPAN&gt; \
  &lt;SPAN class="o"&gt;.&lt;/SPAN&gt;&lt;SPAN class="n"&gt;load&lt;/SPAN&gt;&lt;SPAN class="p"&gt;(&lt;/SPAN&gt;&lt;SPAN class="s2"&gt;"dbfs:/tmp/*"&lt;/SPAN&gt;&lt;SPAN class="p"&gt;)&lt;/SPAN&gt; \
  &lt;SPAN class="o"&gt;.&lt;/SPAN&gt;&lt;SPAN class="n"&gt;select&lt;/SPAN&gt;&lt;SPAN class="p"&gt;(&lt;/SPAN&gt;&lt;SPAN class="s2"&gt;"*"&lt;/SPAN&gt;&lt;SPAN class="p"&gt;,&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;"_metadata"&lt;/SPAN&gt;&lt;SPAN class="p"&gt;)&lt;/SPAN&gt;

&lt;SPAN class="n"&gt;display&lt;/SPAN&gt;&lt;SPAN class="p"&gt;(&lt;/SPAN&gt;&lt;SPAN class="n"&gt;df&lt;/SPAN&gt;&lt;SPAN class="p"&gt;)&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You can enable the rescued data column by setting the option&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;rescuedDataColumn&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;to a column name, such as&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;_rescued_data&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN&gt;, for example with&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;spark.read.option("rescuedDataColumn",&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;"_rescued_data").format("json").load(&amp;lt;path&amp;gt;)&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://docs.databricks.com/en/query/formats/json.html#rescued-data-column" target="_blank"&gt;https://docs.databricks.com/en/query/formats/json.html#rescued-data-column&lt;/A&gt;&lt;/P&gt;
&lt;/DIV&gt;</description>
    <pubDate>Fri, 10 Jan 2025 07:26:36 GMT</pubDate>
    <dc:creator>NandiniN</dc:creator>
    <dc:date>2025-01-10T07:26:36Z</dc:date>
    <item>
      <title>CREATE view USING json and *include* _metadata, _rescued_data</title>
      <link>https://community.databricks.com/t5/data-engineering/create-view-using-json-and-include-metadata-rescued-data/m-p/99224#M39930</link>
      <description>&lt;P&gt;The title is mostly self-explanatory: is it possible (and if so, how) to add the `_metadata` and `_rescued_data` fields to a view created with "using json"?&lt;/P&gt;&lt;P&gt;For example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;%sql

CREATE OR REPLACE VIEW entity_view
USING json
OPTIONS (path="/.../.*json",multiline=true)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The screenshot below shows that this is possible with `read_files` in Spark SQL:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ChristianRRL_0-1731949214474.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/12966i3FD58D41CC7AA534/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ChristianRRL_0-1731949214474.png" alt="ChristianRRL_0-1731949214474.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The screenshot below shows the error I get when I try to query these fields. I'm not sure if I'm doing something wrong:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ChristianRRL_1-1731949348303.png" style="width: 549px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/12967iB6AF4C5DABB11CEF/image-dimensions/549x277?v=v2" width="549" height="277" role="button" title="ChristianRRL_1-1731949348303.png" alt="ChristianRRL_1-1731949348303.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Nov 2024 17:06:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-view-using-json-and-include-metadata-rescued-data/m-p/99224#M39930</guid>
      <dc:creator>ChristianRRL</dc:creator>
      <dc:date>2024-11-18T17:06:32Z</dc:date>
    </item>
    <item>
      <title>Re: CREATE view USING json and *include* _metadata, _rescued_data</title>
      <link>https://community.databricks.com/t5/data-engineering/create-view-using-json-and-include-metadata-rescued-data/m-p/99225#M39931</link>
      <description>&lt;P&gt;I forgot to add this as a reference:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;A href="https://spark.apache.org/docs/latest/sql-data-sources-json.html" target="_blank"&gt;JSON Files - Spark 3.5.3 Documentation&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;The JSON Files documentation doesn't seem to list any OPTIONS that would enable the functionality I'm looking for. Please feel free to correct me if there's a way to achieve this that I'm overlooking!&lt;/P&gt;</description>
      <pubDate>Mon, 18 Nov 2024 17:08:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-view-using-json-and-include-metadata-rescued-data/m-p/99225#M39931</guid>
      <dc:creator>ChristianRRL</dc:creator>
      <dc:date>2024-11-18T17:08:37Z</dc:date>
    </item>
    <item>
      <title>Re: CREATE view USING json and *include* _metadata, _rescued_data</title>
      <link>https://community.databricks.com/t5/data-engineering/create-view-using-json-and-include-metadata-rescued-data/m-p/99601#M40040</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/96188"&gt;@ChristianRRL&lt;/a&gt;, I'll look into this and get back to you with an answer.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Nov 2024 11:03:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-view-using-json-and-include-metadata-rescued-data/m-p/99601#M40040</guid>
      <dc:creator>Nam_Nguyen</dc:creator>
      <dc:date>2024-11-21T11:03:56Z</dc:date>
    </item>
    <item>
      <title>Re: CREATE view USING json and *include* _metadata, _rescued_data</title>
      <link>https://community.databricks.com/t5/data-engineering/create-view-using-json-and-include-metadata-rescued-data/m-p/99753#M40081</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/96188"&gt;@ChristianRRL&lt;/a&gt;, as a quick first step, could you please try creating a PySpark dataframe with the _metadata and _rescued_data columns, querying the dataframe to make sure you can see those columns, and then creating a view from this dataframe?&lt;/P&gt;</description>
      <pubDate>Fri, 22 Nov 2024 10:02:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-view-using-json-and-include-metadata-rescued-data/m-p/99753#M40081</guid>
      <dc:creator>Nam_Nguyen</dc:creator>
      <dc:date>2024-11-22T10:02:50Z</dc:date>
    </item>
    <item>
      <title>Re: CREATE view USING json and *include* _metadata, _rescued_data</title>
      <link>https://community.databricks.com/t5/data-engineering/create-view-using-json-and-include-metadata-rescued-data/m-p/99786#M40096</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/96188"&gt;@ChristianRRL&lt;/a&gt;, you can still use the same read_files method when creating the view. I see that you are using the classic Hive-style reader instead of read_files in the SQL view definition itself. You also don't need to use spark.sql; please see below.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="akhil393_0-1732290590854.jpeg" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/13078i7B98A2639917F328/image-size/large?v=v2&amp;amp;px=999" role="button" title="akhil393_0-1732290590854.jpeg" alt="akhil393_0-1732290590854.jpeg" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Example code:&amp;nbsp;&lt;/P&gt;
&lt;DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;create or replace&lt;/SPAN&gt; &lt;SPAN&gt;view&lt;/SPAN&gt; &lt;SPAN&gt;json_view&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;as&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;SELECT&lt;/SPAN&gt;&lt;SPAN&gt; _metadata, &lt;/SPAN&gt;&lt;SPAN&gt;*&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;FROM&lt;/SPAN&gt;&lt;SPAN&gt; read_files(&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;'s3://********/_delta_log/*.json'&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;format&lt;/SPAN&gt; &lt;SPAN&gt;=&amp;gt;&lt;/SPAN&gt; &lt;SPAN&gt;'json'&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
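&lt;P&gt;If the statement needs to be generated from a Python cell, the same pattern can be sketched as below. This is only an illustration: the helper name, the view name, and the path are hypothetical placeholders, and the helper merely assembles the statement text from trusted inputs. Actually running it assumes a Databricks runtime where read_files is available.&lt;/P&gt;

```python
def build_read_files_view_sql(view_name: str, path: str) -> str:
    """Return a CREATE VIEW statement that reads JSON files through read_files.

    Hypothetical helper: both arguments are assumed to come from trusted code,
    since they are placed into the statement text without any escaping.
    """
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        # _metadata is not part of *, so it is listed explicitly
        f"SELECT _metadata, *\n"
        f"FROM read_files('{path}', format => 'json')"
    )


# Placeholder view name and path, for illustration only
print(build_read_files_view_sql("json_view", "/hypothetical/path/*.json"))
```

&lt;P&gt;The printed statement can be pasted into a %sql cell as-is, which keeps the view definition in plain SQL as in the example above.&lt;/P&gt;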
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 22 Nov 2024 15:59:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-view-using-json-and-include-metadata-rescued-data/m-p/99786#M40096</guid>
      <dc:creator>akhil393</dc:creator>
      <dc:date>2024-11-22T15:59:11Z</dc:date>
    </item>
    <item>
      <title>Re: CREATE view USING json and *include* _metadata, _rescued_data</title>
      <link>https://community.databricks.com/t5/data-engineering/create-view-using-json-and-include-metadata-rescued-data/m-p/105120#M41999</link>
      <description>&lt;DIV&gt;I can run the query below against a Delta table:&lt;/DIV&gt;
&lt;DIV&gt;&lt;LI-CODE lang="markup"&gt;SELECT *, _metadata.file_name FROM anytable WHERE condition&lt;/LI-CODE&gt;
&lt;P&gt;&lt;A href="https://docs.databricks.com/en/ingestion/file-metadata-column.html" target="_blank"&gt;https://docs.databricks.com/en/ingestion/file-metadata-column.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;For JSON files, you can use something like:&lt;/P&gt;
&lt;PRE&gt;&lt;SPAN class="n"&gt;df&lt;/SPAN&gt; &lt;SPAN class="o"&gt;=&lt;/SPAN&gt; &lt;SPAN class="n"&gt;spark&lt;/SPAN&gt;&lt;SPAN class="o"&gt;.&lt;/SPAN&gt;&lt;SPAN class="n"&gt;read&lt;/SPAN&gt; \
  &lt;SPAN class="o"&gt;.&lt;/SPAN&gt;&lt;SPAN class="n"&gt;format&lt;/SPAN&gt;&lt;SPAN class="p"&gt;(&lt;/SPAN&gt;&lt;SPAN class="s2"&gt;"json"&lt;/SPAN&gt;&lt;SPAN class="p"&gt;)&lt;/SPAN&gt; \
  &lt;SPAN class="o"&gt;.&lt;/SPAN&gt;&lt;SPAN class="n"&gt;schema&lt;/SPAN&gt;&lt;SPAN class="p"&gt;(&lt;/SPAN&gt;&lt;SPAN class="n"&gt;schema&lt;/SPAN&gt;&lt;SPAN class="p"&gt;)&lt;/SPAN&gt; \
  &lt;SPAN class="o"&gt;.&lt;/SPAN&gt;&lt;SPAN class="n"&gt;load&lt;/SPAN&gt;&lt;SPAN class="p"&gt;(&lt;/SPAN&gt;&lt;SPAN class="s2"&gt;"dbfs:/tmp/*"&lt;/SPAN&gt;&lt;SPAN class="p"&gt;)&lt;/SPAN&gt; \
  &lt;SPAN class="o"&gt;.&lt;/SPAN&gt;&lt;SPAN class="n"&gt;select&lt;/SPAN&gt;&lt;SPAN class="p"&gt;(&lt;/SPAN&gt;&lt;SPAN class="s2"&gt;"*"&lt;/SPAN&gt;&lt;SPAN class="p"&gt;,&lt;/SPAN&gt; &lt;SPAN class="s2"&gt;"_metadata"&lt;/SPAN&gt;&lt;SPAN class="p"&gt;)&lt;/SPAN&gt;

&lt;SPAN class="n"&gt;display&lt;/SPAN&gt;&lt;SPAN class="p"&gt;(&lt;/SPAN&gt;&lt;SPAN class="n"&gt;df&lt;/SPAN&gt;&lt;SPAN class="p"&gt;)&lt;/SPAN&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You can enable the rescued data column by setting the option&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;rescuedDataColumn&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;to a column name, such as&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;_rescued_data&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN&gt;, for example with&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class="docutils literal notranslate"&gt;&lt;SPAN class="pre"&gt;spark.read.option("rescuedDataColumn",&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="pre"&gt;"_rescued_data").format("json").load(&amp;lt;path&amp;gt;)&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://docs.databricks.com/en/query/formats/json.html#rescued-data-column" target="_blank"&gt;https://docs.databricks.com/en/query/formats/json.html#rescued-data-column&lt;/A&gt;&lt;/P&gt;
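&lt;P&gt;Putting the two pieces together, a minimal sketch could look like the following. The function name and its parameters are hypothetical, it assumes a Databricks runtime where the rescuedDataColumn option is supported for JSON, and it registers a temporary view, since a DataFrame cannot be saved as a permanent view directly.&lt;/P&gt;

```python
def create_json_view(spark, path, view_name, rescued_col="_rescued_data"):
    """Read JSON files and register them, with file metadata, as a temp view.

    Hypothetical helper: `spark` is an existing SparkSession, and `path` and
    `view_name` are supplied by the caller.
    """
    df = (
        spark.read
        .format("json")
        # Collect data that does not fit the schema into a separate column
        .option("rescuedDataColumn", rescued_col)
        .load(path)
        # _metadata is a hidden column, so it must be selected explicitly
        .select("*", "_metadata")
    )
    # Temporary view: visible only within the current Spark session
    df.createOrReplaceTempView(view_name)
    return df
```

&lt;P&gt;After a call such as create_json_view(spark, "dbfs:/tmp/*", "entity_view"), the view can be queried from SQL in the same session, including nested fields like _metadata.file_name.&lt;/P&gt;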
&lt;/DIV&gt;</description>
      <pubDate>Fri, 10 Jan 2025 07:26:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-view-using-json-and-include-metadata-rescued-data/m-p/105120#M41999</guid>
      <dc:creator>NandiniN</dc:creator>
      <dc:date>2025-01-10T07:26:36Z</dc:date>
    </item>
  </channel>
</rss>

