07-12-2022 10:27 AM
Hi,
I am loading a JSON file into Databricks by simply doing the following:
from pyspark.sql.functions import *
from pyspark.sql.types import *
bronze_path="wasbs://....../140477.json"
df_incremental = spark.read.option("multiline","true").json(bronze_path)
display(df_incremental)
My JSON file has a complicated nested structure, and I want to be able to load this data into a Delta table.
My schema (expressed as a Go struct) is:
type AutoGenerated struct {
    Audit struct {
        Refno         string `json:"refno"`
        Formid        string `json:"formid"`
        AuditName     string `json:"audit_name"`
        AuditorName   string `json:"auditor_name"`
        Location      string `json:"location"`
        Fulllocation  string `json:"fulllocation"`
        Published     string `json:"published"`
        Date          string `json:"date"`
        Compliant     string `json:"compliant"`
        Archived      string `json:"archived"`
        Score         string `json:"score"`
        PossibleScore string `json:"possible_score"`
        Percentage    string `json:"percentage"`
        Answers       []struct {
            QuestionNumber   string        `json:"question_number"`
            Question         string        `json:"question"`
            Status           string        `json:"status"`
            Answerid         string        `json:"answerid"`
            Questionid       string        `json:"questionid"`
            Answer           string        `json:"answer"`
            Ansoptid         string        `json:"ansoptid,omitempty"`
            Observation      string        `json:"observation"`
            Compliant        string        `json:"compliant"`
            Score            string        `json:"score"`
            PossibleScore    string        `json:"possible_score"`
            DateResolved     string        `json:"date_resolved"`
            ResolvedByUser   string        `json:"resolved_by_user"`
            DateCompliant    string        `json:"date_compliant"`
            Compliancy       []interface{} `json:"compliancy"`
            HookName         string        `json:"hookName"`
            DateAuthorised   string        `json:"date_authorised,omitempty"`
            AuthorisedByUser string        `json:"authorised_by_user,omitempty"`
        } `json:"answers"`
    } `json:"audit"`
}
Any idea how to do this?
07-13-2022 06:45 AM
Delta can handle nested columns, so you can just write the DataFrame to Delta Lake.
Have you tried it already? It will probably just work.
If you are talking about new incoming data that you want to merge into an existing table, that will be a tad more complex. You need to define a merge key, which decides which operation is applied to each row (insert, update, or delete).
07-13-2022 06:55 AM
How would I write this to a delta table to see if it works?
07-13-2022 06:59 AM
OK, so you already have a DataFrame with your JSON data (with an inferred schema).
Next step is:
df.write \
.format("delta") \
.mode("overwrite") \
.save("<whereveryouwanttostoreyourdata>")
09-03-2022 10:26 PM
Hi @Lloyd Vickery
Does @Werner Stinckens' response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?
We'd love to hear from you.
Thanks!