<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Create table from json and flatten in the same SQL in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/create-table-from-json-and-flatten-in-the-same-sql/m-p/104372#M41719</link>
    <description>&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;To create a table directly from a JSON file and flatten it using SQL in Databricks, you can use the &lt;CODE&gt;CREATE TABLE&lt;/CODE&gt; statement with the &lt;CODE&gt;USING JSON&lt;/CODE&gt; clause. However, SQL alone does not provide a direct way to flatten nested JSON structures. You would typically need to use a combination of SQL and DataFrame operations to achieve this.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;Here is an example of how you can create a table from a JSON file using SQL:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="gb5fhw2"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql _1t7bu9hb hljs language-sql gb5fhw3"&gt;&lt;SPAN class="hljs-comment"&gt;-- Create a table from a JSON file&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;CREATE&lt;/SPAN&gt; &lt;SPAN class="hljs-keyword"&gt;TABLE&lt;/SPAN&gt; your_table_name
&lt;SPAN class="hljs-keyword"&gt;USING&lt;/SPAN&gt; JSON
OPTIONS (path &lt;SPAN class="hljs-string"&gt;'path/to/your/json/file'&lt;/SPAN&gt;);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="gb5fhw4"&gt;
&lt;DIV&gt;
&lt;DIV class=" iwpqfy0"&gt;&lt;SPAN&gt;In this example:&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;UL&gt;
&lt;LI&gt;Replace &lt;CODE&gt;'path/to/your/json/file'&lt;/CODE&gt; with the actual path to your JSON file.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;To flatten the nested JSON structure, you would need to use DataFrame operations in combination with SQL. Here is an example of how you can achieve this using PySpark:&lt;/P&gt;
&lt;DIV class="gb5fhw2"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python _1t7bu9hb hljs language-python gb5fhw3"&gt;
&lt;SPAN class="hljs-comment"&gt;# Read the JSON file into a DataFrame&lt;/SPAN&gt;
df = spark.read.json(&lt;SPAN class="hljs-string"&gt;"path/to/your/json/file"&lt;/SPAN&gt;)

&lt;SPAN class="hljs-comment"&gt;# Flatten the nested JSON structure&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;from&lt;/SPAN&gt; pyspark.sql.functions &lt;SPAN class="hljs-keyword"&gt;import&lt;/SPAN&gt; explode
df_flattened = df.withColumn(&lt;SPAN class="hljs-string"&gt;"Price"&lt;/SPAN&gt;, explode(df.Prices)).select(&lt;SPAN class="hljs-string"&gt;"Clnt"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"NaDt"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"PrCd"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"IFCd"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"GLId"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"Price.*"&lt;/SPAN&gt;)

&lt;SPAN class="hljs-comment"&gt;# Create a temporary view from the DataFrame&lt;/SPAN&gt;
df_flattened.createOrReplaceTempView(&lt;SPAN class="hljs-string"&gt;"temp_view"&lt;/SPAN&gt;)

&lt;SPAN class="hljs-comment"&gt;# Use SQL to create a table from the view&lt;/SPAN&gt;
spark.sql(&lt;SPAN class="hljs-string"&gt;"""
    CREATE TABLE your_table_name AS
    SELECT * FROM temp_view
"""&lt;/SPAN&gt;)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;</description>
    <pubDate>Mon, 06 Jan 2025 14:42:35 GMT</pubDate>
    <dc:creator>Walter_C</dc:creator>
    <dc:date>2025-01-06T14:42:35Z</dc:date>
    <item>
      <title>Create table from json and flatten in the same SQL</title>
      <link>https://community.databricks.com/t5/data-engineering/create-table-from-json-and-flatten-in-the-same-sql/m-p/104371#M41718</link>
      <description>&lt;P&gt;Any help in writing SQL to create the table using JSON file and flatten in the same step? As I'm new to JSON it would be nice if someone can give the heads up by referencing to any document or help provide the recommended the solution.&lt;/P&gt;&lt;P&gt;Sample JSON file is like,&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;{"Clnt":"95901","NaDt":"2023-05-04","PrCd":"ET","IFCd":"RDM121884","GLId":"1014765",
"Prices":[{"Ord":5,"Vndr":"Test","Prc":45850.00,"Cur":"RW","Typ":"Bidd","Dt":"2023-05-04","Cmnt":"Test"},
          {"Ord":6,"Expr":"Test","Prc":45950.00,"Cur":"RW","Typ":"NA","Dt":"2023-06-04","Cmnt":"New"}]}
{"Clnt":"95903","NaDt":"20223-05-04","PrCd":"ET","IFCd":"RDM121884","GLId":"1014765",
"Prices":[{"Ord":8,"Vndr":"Test","Prc":45750.00,"Cur":"RW","Typ":"Test","Dt":"2022-05-04","Cmnt":"others"}]}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Jan 2025 14:39:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-table-from-json-and-flatten-in-the-same-sql/m-p/104371#M41718</guid>
      <dc:creator>David_Billa</dc:creator>
      <dc:date>2025-01-06T14:39:33Z</dc:date>
    </item>
    <item>
      <title>Re: Create table from json and flatten in the same SQL</title>
      <link>https://community.databricks.com/t5/data-engineering/create-table-from-json-and-flatten-in-the-same-sql/m-p/104372#M41719</link>
      <description>&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;To create a table directly from a JSON file and flatten it using SQL in Databricks, you can use the &lt;CODE&gt;CREATE TABLE&lt;/CODE&gt; statement with the &lt;CODE&gt;USING JSON&lt;/CODE&gt; clause. However, SQL alone does not provide a direct way to flatten nested JSON structures. You would typically need to use a combination of SQL and DataFrame operations to achieve this.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;Here is an example of how you can create a table from a JSON file using SQL:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="gb5fhw2"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql _1t7bu9hb hljs language-sql gb5fhw3"&gt;&lt;SPAN class="hljs-comment"&gt;-- Create a table from a JSON file&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;CREATE&lt;/SPAN&gt; &lt;SPAN class="hljs-keyword"&gt;TABLE&lt;/SPAN&gt; your_table_name
&lt;SPAN class="hljs-keyword"&gt;USING&lt;/SPAN&gt; JSON
OPTIONS (path &lt;SPAN class="hljs-string"&gt;'path/to/your/json/file'&lt;/SPAN&gt;);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="gb5fhw4"&gt;
&lt;DIV&gt;
&lt;DIV class=" iwpqfy0"&gt;&lt;SPAN&gt;In this example:&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;UL&gt;
&lt;LI&gt;Replace &lt;CODE&gt;'path/to/your/json/file'&lt;/CODE&gt; with the actual path to your JSON file.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;To flatten the nested JSON structure, you would need to use DataFrame operations in combination with SQL. Here is an example of how you can achieve this using PySpark:&lt;/P&gt;
&lt;DIV class="gb5fhw2"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python _1t7bu9hb hljs language-python gb5fhw3"&gt;
&lt;SPAN class="hljs-comment"&gt;# Read the JSON file into a DataFrame&lt;/SPAN&gt;
df = spark.read.json(&lt;SPAN class="hljs-string"&gt;"path/to/your/json/file"&lt;/SPAN&gt;)

&lt;SPAN class="hljs-comment"&gt;# Flatten the nested JSON structure&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;from&lt;/SPAN&gt; pyspark.sql.functions &lt;SPAN class="hljs-keyword"&gt;import&lt;/SPAN&gt; explode
df_flattened = df.withColumn(&lt;SPAN class="hljs-string"&gt;"Price"&lt;/SPAN&gt;, explode(df.Prices)).select(&lt;SPAN class="hljs-string"&gt;"Clnt"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"NaDt"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"PrCd"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"IFCd"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"GLId"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"Price.*"&lt;/SPAN&gt;)

&lt;SPAN class="hljs-comment"&gt;# Create a temporary view from the DataFrame&lt;/SPAN&gt;
df_flattened.createOrReplaceTempView(&lt;SPAN class="hljs-string"&gt;"temp_view"&lt;/SPAN&gt;)

&lt;SPAN class="hljs-comment"&gt;# Use SQL to create a table from the view&lt;/SPAN&gt;
spark.sql(&lt;SPAN class="hljs-string"&gt;"""
    CREATE TABLE your_table_name AS
    SELECT * FROM temp_view
"""&lt;/SPAN&gt;)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;</description>
      <pubDate>Mon, 06 Jan 2025 14:42:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/create-table-from-json-and-flatten-in-the-same-sql/m-p/104372#M41719</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-06T14:42:35Z</dc:date>
    </item>
  </channel>
</rss>

