Create table from json and flatten in the same SQL
01-06-2025 06:39 AM
Can anyone help me write SQL to create a table from a JSON file and flatten it in the same step? I'm new to JSON, so it would be great if someone could point me to relevant documentation or provide a recommended solution.
The sample JSON file looks like this:
{"Clnt":"95901","NaDt":"2023-05-04","PrCd":"ET","IFCd":"RDM121884","GLId":"1014765",
"Prices":[{"Ord":5,"Vndr":"Test","Prc":45850.00,"Cur":"RW","Typ":"Bidd","Dt":"2023-05-04","Cmnt":"Test"},
{"Ord":6,"Expr":"Test","Prc":45950.00,"Cur":"RW","Typ":"NA","Dt":"2023-06-04","Cmnt":"New"}]}
{"Clnt":"95903","NaDt":"20223-05-04","PrCd":"ET","IFCd":"RDM121884","GLId":"1014765",
"Prices":[{"Ord":8,"Vndr":"Test","Prc":45750.00,"Cur":"RW","Typ":"Test","Dt":"2022-05-04","Cmnt":"others"}]}
01-06-2025 06:42 AM
To create a table directly from a JSON file using SQL in Databricks, you can use a CREATE TABLE statement with the USING JSON clause. However, that step alone does not flatten the nested Prices array; you need an explicit explode step, which is typically done with a combination of SQL and DataFrame operations.
Here is an example of how you can create a table from a JSON file using SQL:
-- Create a table from a JSON file
CREATE TABLE your_table_name
USING JSON
OPTIONS (path 'path/to/your/json/file');
Replace 'path/to/your/json/file' with the actual path to your JSON file.
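If you would rather stay entirely in SQL, Spark SQL can also expand the Prices array with LATERAL VIEW explode. Here is a minimal sketch, assuming the raw table created above is named your_table_name and that prices_flattened is just a placeholder name for the output table:
-- Flatten the Prices array in pure SQL (sketch)
CREATE TABLE prices_flattened AS
SELECT Clnt, NaDt, PrCd, IFCd, GLId, Price.*
FROM your_table_name
LATERAL VIEW explode(Prices) p AS Price;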
Alternatively, you can flatten the nested structure with DataFrame operations in PySpark and then create the table from the result:
from pyspark.sql.functions import explode

# Read the JSON file into a DataFrame
df = spark.read.json("path/to/your/json/file")

# Flatten the nested Prices array: one output row per price entry
df_flattened = df.withColumn("Price", explode(df.Prices)).select(
    "Clnt", "NaDt", "PrCd", "IFCd", "GLId", "Price.*"
)

# Create a temporary view from the DataFrame
df_flattened.createOrReplaceTempView("temp_view")

# Use SQL to create a table from the view
spark.sql("""
CREATE TABLE your_table_name AS
SELECT * FROM temp_view
""")

