
Facing a format issue while converting one nested JSON type to a brand-new JSON schema

SailajaB
Valued Contributor III

Hi,

We are writing our flattened JSON DataFrame out as JSON with a user-defined nested schema, using PySpark in Databricks, but we are not getting the expected format.

Expecting :

{"ID":"aaa","c_id":[{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"1"},{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"2"},{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"3"}]}

But Getting :

{"ID":"aaa","c_id":{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"1"}},

{"ID":"aaa","c_id":{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"2"}},

{"ID":"aaa","c_id":{"con":null,"createdate":"2015-10-09T00:00:00Z","data":null,"id":"3"}}

We tried groupBy with collect_list, but the output still did not come out in the expected format.

Could someone please tell us whether there is a way to achieve this?

Thank you in advance

4 REPLIES

-werners-
Esteemed Contributor III

I don't know what your code is, so you should probably share it.

And also the starting json

Hubert-Dudek
Esteemed Contributor III

As @werners said, you need to share the code. If you are going from a DataFrame to JSON, you probably need a StructType containing an ArrayType to get that list, but without the code it is hard to help.

SailajaB
Valued Contributor III

Hi,

Thank you for the reply.

Here is the relevant piece of code:

from pyspark.sql.functions import struct

# Build the requested layout from the flattened DataFrame and write it as JSON.
# Note that c_id is a single struct per input row here.
df_global_op = (
    df_global.withColumn("Definitions", struct(
        df_global.id.alias("ID"),
        struct(
            df_global.a.alias("con"),
            df_global.b.alias("createdate"),
            df_global.c.alias("data"),
            df_global.d.alias("id"),
        ).alias("c_id"),
    ))
    .drop(*global_fields).select("Definitions.*").distinct()
    .write.format("json")
    .option("ignoreNullFields", "false")
    .save("/mnt/test/op/12-08-2021")
)

Please note that df_global is a flattened DataFrame of the input JSON. We derive the output JSON from this flattened DataFrame according to the requested schema.

Thank you

SailajaB
Valued Contributor III

@HubertDudek, @werners

Is there any way to resolve the above one?

Thank you
