Facing a format issue while converting one nested JSON structure to a brand-new JSON schema
12-07-2021 09:50 PM
Hi,
We are writing our flattened JSON DataFrame to a user-defined nested JSON schema using PySpark in Databricks, but we are not getting the expected format.
Expected:
{"ID":"aaa","c_id":[{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"1"},{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"2"},{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"3"}]}
But we are getting:
{"ID":"aaa","c_id":{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"1"}}
{"ID":"aaa","c_id":{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"2"}}
{"ID":"aaa","c_id":{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"3"}}
We tried groupBy with collect_list, but we still did not get the expected format.
Could someone please help us find a way to achieve this?
Thank you in advance.
Labels: Azure Databricks, Format Issue, JSON, PySpark
12-08-2021 12:43 AM
I don't know what your code looks like, so you should probably share it, along with the input JSON.
12-08-2021 02:24 AM
As @wereners said, you need to share the code. If you are converting a DataFrame to JSON, you probably need an ArrayType of StructType to get that list, but without the code it is hard to help.
12-08-2021 04:17 AM
Hi,
Thank you for the reply.
Here is the code:
from pyspark.sql.functions import struct

# One "Definitions" struct per flattened row: ID plus a single nested c_id struct
df_global_op = df_global.withColumn(
    "Definitions",
    struct(
        df_global.id.alias("ID"),
        struct(
            df_global.a.alias("con"),
            df_global.b.alias("createdate"),
            df_global.c.alias("data"),
            df_global.d.alias("id"),
        ).alias("c_id"),
    ),
).drop(*global_fields)  # global_fields is defined elsewhere in the notebook

(df_global_op.select("Definitions.*")
    .distinct()
    .write.format("json")
    .option("ignoreNullFields", "false")
    .save("/mnt/test/op/12-08-2021"))
Please note that df_global is a flattened DataFrame of the input JSON. We derive the output JSON from this flattened DataFrame based on the requested schema.
Thank you
12-08-2021 10:20 PM

