Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-07-2022 01:11 AM
hello,
I have a non standard Json file with a nested file structure that I have issues with. Here is an example of the json file.
jsonfile= """[ {
"success":true,
"numRows":2,
"data":{
"58251":{
"invoiceno":"58251",
"name":"invoice1",
"companyId":"1000",
"departmentId":"1",
"lines":{
"40000":{
"invoiceline":"40000",
"productid":"1",
"amount":"10000",
"quantity":"7"
},
"40001":{
"invoiceline":"40001",
"productid":"2",
"amount":"9000",
"quantity":"7"
}
}
},
"58252":{
"invoiceno":"58252",
"name":"invoice34",
"companyId":"1001",
"departmentId":"2",
"lines":{
"40002":{
"invoiceline":"40002",
"productid":"3",
"amount":"7000",
"quantity":"6"
},
"40003":{
"invoiceline":"40003",
"productid":"2",
"amount":"9000",
"quantity":"7"
} } } } }]"""
import pandas as pd
df = pd.read_json(jsonfile)
display(df)The problem here is that some of the keys are used as "58252", instead of "id":"58252" or blank value. This is the case for both the "Data" array and the "lines" array. So I have tried using standard functions in spark with json_normalize or explode but it doesnt seem to work with this particular json format.
The endgoal is to get a dataframe looking like this:
I hope you can give me some advice on how to approach and handle these files in databricks.
Labels:
- Labels:
-
Dataframe