10-17-2021 11:52 AM
I am converting a data frame to a nested dict/JSON. One of the columns, "Problematic__c", is of boolean type.
For some reason json does not accept this data type, raising the error: "Object of type bool_ is not JSON serializable"
I need this field to stay boolean, because the JSON is later pushed to Salesforce via its API. I could easily make it a string, but the destination object accepts booleans only.
Here is the Python code:
import json

all_rows = len(data)
y = []
for i in range(all_rows):
    # columns 0 and 1 are the account code and the EAN; the rest is the payload
    x = dict(data.iloc[i, 2:])
    x["Account__r"] = dict(data.iloc[i, :1])
    x["Product_Master__r"] = dict(data.iloc[i, 1:2])
    y.append(x)
y = json.dumps(y)
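The error comes from the column's NumPy `bool_` scalars, which the standard `json` encoder does not treat as Python `bool`. A minimal sketch of one way around it, passing a `default` handler to `json.dumps` that converts NumPy scalars to native Python types (the field name is just the one from this thread):

```python
import json
import numpy as np

def np_default(obj):
    # json.dumps calls this for any object it cannot serialize itself;
    # NumPy scalars (bool_, int64, float64, ...) all expose .item(),
    # which returns the equivalent native Python value
    if isinstance(obj, np.generic):
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

payload = json.dumps({"Problematic__c": np.bool_(True)}, default=np_default)
# payload == '{"Problematic__c": true}'
```

This keeps the field a real JSON boolean rather than a string, which is what the Salesforce object expects.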
and this is the expected output:
[
    {
        "Recommended_Action__c": "Take action Z",
        "Extra_Information_JSON__c": "[{\"name\":\"Action\",\"value\":\"Verifier remplissage\"},{\"name\":\"Stock Disponible\",\"value\":\"18\"}]",
        "Flag__c": "Rupture Ponctuel",
        "Problematic__c": true,
        "Value__c": 5800.0,
        "Source_Id__c": "538.0",
        "Batch__c": "a2e7Y110000dO6WQAU",
        "Account__r": {
            "Code__c": "00001-B"
        },
        "Product_Master__r": {
            "EAN_Code__c": "1111111111.0"
        }
    },
    {
        ....
    },
    .....
]
The data frame "data" has the structure below, with sample values:
"Code__c":"00001-B"
"EAN_Code__c":"1111111111.0"
"Recommended_Action__c":"Take action Z",
"Extra_Information_JSON__c":"[{\"name\":\"Action\",\"value\":\"Verifier remplissage\"},{\"name\":\"Stock Disponible\",\"value\":\"18\"}]",
"Flag__c":"Rupture Ponctuel",
"Problematic__c":True,
"Value__c":5800.0,
"Source_Id__c":"538.0",
"Batch__c":"a2e7Y110000dO6WQAU",
"Code__c":"00001-B"
"EAN_Code__c":"1111111111.0"
10-18-2021 11:01 AM
You can just use `to_json` to achieve this. Here is an example:
from pyspark.sql import Row
from pyspark.sql.types import *
from pyspark.sql.functions import to_json

data = [(1, Row(Code__c="00001-B",
                EAN_Code__c="1111111111.0",
                Extra_Information_JSON__c="[{\"name\":\"Action\",\"value\":\"Verifier remplissage\"},{\"name\":\"Stock Disponible\",\"value\":\"18\"}]",
                Flag__c="Rupture Ponctuel",
                Problematic__c=True))]
df = spark.createDataFrame(data, ("key", "value"))
display(df.select(to_json(df.value).alias("json")))
This is just an example to point you in the right direction, you may need to adapt it to your specific input format. This is meant to run in a Databricks notebook, otherwise the final `display` will not work.
10-18-2021 03:31 AM
Hi, I had a similar problem with booleans, but when exporting to a different data format.
df2 = df1.select(df1.Account__r, df1.Product_Master__r)
df2.coalesce(1).write.format('json').save('/path/file_name.json')
10-18-2021 05:51 AM
Thanks, but I am not sure how to "write JSON directly from the dataframe without dicts and looping".
df1.Account__r and df1.Product_Master__r simply won't work, as there are no such columns as "Account__r" or "Product_Master__r" in the dataframe. That's why I used dicts to create them.
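For reference, the nesting can also be built without an explicit index loop, e.g. with `iterrows` and a helper that casts NumPy scalars back to native Python types so `json.dumps` accepts the boolean. This is only a sketch: the frame below is a hypothetical one-row stand-in for the real "data", assuming the first two columns are the account code and the EAN, as described in the thread:

```python
import json
import pandas as pd

# Hypothetical stand-in for the real "data" frame (column layout assumed from the thread)
data = pd.DataFrame([{
    "Code__c": "00001-B",
    "EAN_Code__c": "1111111111.0",
    "Recommended_Action__c": "Take action Z",
    "Problematic__c": True,
    "Value__c": 5800.0,
}])

def to_record(row):
    # Flat payload from columns 2 onward; .item() turns NumPy scalars
    # (bool_, float64, ...) back into native Python values
    rec = {k: (v.item() if hasattr(v, "item") else v)
           for k, v in row.iloc[2:].items()}
    # Nest the lookup fields the way the Salesforce API expects them
    rec["Account__r"] = {"Code__c": row["Code__c"]}
    rec["Product_Master__r"] = {"EAN_Code__c": row["EAN_Code__c"]}
    return rec

payload = json.dumps([to_record(r) for _, r in data.iterrows()])
```

The `to_record` helper and column positions are assumptions; the point is that the nesting and the boolean fix can live in one pass over the rows.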
10-18-2021 07:39 AM
You can achieve it by transforming the dataframe using built-in Spark functions.
11-03-2021 06:51 PM
Thanks
10-22-2021 01:16 AM
Thanks Dan, that makes sense!