Format issue when converting a nested JSON to a brand-new JSON schema

SailajaB
Valued Contributor III

Hi,

We are writing our flattened JSON DataFrame out as JSON with a user-defined nested schema, using PySpark in Databricks, but we are not getting the expected format.

Expecting:

{"ID":"aaa","c_id":[{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"1"},{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"2"},{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"3"}]}

But getting:

{"ID":"aaa","c_id":{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"1"}},

{"ID":"aaa","c_id":{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"2"}},

{"ID":"aaa","c_id":{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"3"}}

We tried groupBy with collect_list, but the output was still not in the expected format.

Could someone please tell us whether there is a way to achieve this?

Thank you in advance

4 REPLIES

-werners-
Esteemed Contributor III

I don't know what your code is, so you should probably share it.

And also the starting json

Hubert-Dudek
Esteemed Contributor III

As @werners said, you need to share the code. If you are going from a DataFrame to JSON, you probably need to use StructType and ArrayType to get that list, but without the code it is hard to help.

SailajaB
Valued Contributor III

Hi,

Thank you for the reply.

Here is the relevant piece of code:

df_global_op = df_global.withColumn("Definitions", struct((df_global.id).alias("ID"), \
        struct((df_global.a).alias("con"), \
               (df_global.b).alias("createdate"), \
               (df_global.c).alias("data"), \
               (df_global.d).alias("id")). \
        alias("c_id"))). \
    drop(*global_fields).select("Definitions.*").distinct().write. \
    format("json"). \
    option("ignoreNullFields", "false"). \
    save("/mnt/test/op/12-08-2021")

Please note that df_global is a flattened DataFrame of the input JSON. We derive the output JSON from the flattened DataFrame according to the requested schema.

Thank you

SailajaB
Valued Contributor III

@HubertDudek, @werners

Is there any way to resolve the issue above?

Thank you
