10-17-2021 11:52 AM
I am converting a data frame to a nested dict/JSON. One of the columns, "Problematic__c", is of boolean type.
For some reason json does not accept this data type, raising the error: "Object of type bool_ is not JSON serializable"
I need this field to stay boolean, because the JSON is later pushed to Salesforce via its API. I could easily make it a string, but the destination object accepts booleans only.
Here is the Python code:
import json

all_rows = len(data)
y = []
for i in range(all_rows):
    # columns 0 and 1 are the account code and the EAN; the rest is the payload
    x = dict(data.iloc[i, 2:])
    x["Account__r"] = dict(data.iloc[i, :1])
    x["Product_Master__r"] = dict(data.iloc[i, 1:2])
    y.append(x)
y = json.dumps(y)
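The error comes from the column's NumPy `bool_` scalars, which the standard `json` encoder does not treat as Python `bool`. A minimal sketch of one way around it, passing a `default` handler to `json.dumps` that converts NumPy scalars to native Python types (the field name is just the one from this thread):

```python
import json
import numpy as np

def np_default(obj):
    # json.dumps calls this for any object it cannot serialize itself;
    # NumPy scalars (bool_, int64, float64, ...) all expose .item(),
    # which returns the equivalent native Python value
    if isinstance(obj, np.generic):
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

payload = json.dumps({"Problematic__c": np.bool_(True)}, default=np_default)
# payload == '{"Problematic__c": true}'
```

This keeps the field a real JSON boolean rather than a string, which is what the Salesforce object expects.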
and this is the expected output:
[
    {
        "Recommended_Action__c": "Take action Z",
        "Extra_Information_JSON__c": "[{\"name\":\"Action\",\"value\":\"Verifier remplissage\"},{\"name\":\"Stock Disponible\",\"value\":\"18\"}]",
        "Flag__c": "Rupture Ponctuel",
        "Problematic__c": true,
        "Value__c": 5800.0,
        "Source_Id__c": "538.0",
        "Batch__c": "a2e7Y110000dO6WQAU",
        "Account__r": {
            "Code__c": "00001-B"
        },
        "Product_Master__r": {
            "EAN_Code__c": "1111111111.0"
        }
    },
    {
        ....
    },
    .....
]
The data frame "data" has the structure below, with sample values:
"Code__c":"00001-B"
"EAN_Code__c":"1111111111.0"
"Recommended_Action__c":"Take action Z",
"Extra_Information_JSON__c":"[{\"name\":\"Action\",\"value\":\"Verifier remplissage\"},{\"name\":\"Stock Disponible\",\"value\":\"18\"}]",
"Flag__c":"Rupture Ponctuel",
"Problematic__c":True,
"Value__c":5800.0,
"Source_Id__c":"538.0",
"Batch__c":"a2e7Y110000dO6WQAU",
"Code__c":"00001-B"
"EAN_Code__c":"1111111111.0"
10-18-2021 11:01 AM
You can just use `to_json` to achieve this. Here is an example:
from pyspark.sql import Row
from pyspark.sql.types import *
from pyspark.sql.functions import to_json

data = [(1, Row(Code__c="00001-B",
                EAN_Code__c="1111111111.0",
                Extra_Information_JSON__c="[{\"name\":\"Action\",\"value\":\"Verifier remplissage\"},{\"name\":\"Stock Disponible\",\"value\":\"18\"}]",
                Flag__c="Rupture Ponctuel",
                Problematic__c=True))]
df = spark.createDataFrame(data, ("key", "value"))
display(df.select(to_json(df.value).alias("json")))
This is just an example to point you in the right direction, you may need to adapt it to your specific input format. This is meant to run in a Databricks notebook, otherwise the final `display` will not work.
10-18-2021 03:31 AM
Hi, I had a similar problem with booleans, but when exporting to a different data format.
df2 = df1.select(df1.Account__r, df1.Product_Master__r)
df2.coalesce(1).write.format('json').save('/path/file_name.json')
10-18-2021 05:51 AM
Thanks, but I am not sure how to "write JSON directly from the dataframe without dicts and looping".
df1.Account__r and df1.Product_Master__r simply won't work, as there are no such columns as "Account__r" or "Product_Master__r" in the dataframe. That's why I used dicts to create them.
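For reference, the nesting can also be built without an explicit index loop, e.g. with `iterrows` and a helper that casts NumPy scalars back to native Python types so `json.dumps` accepts the boolean. This is only a sketch: the frame below is a hypothetical one-row stand-in for the real "data", assuming the first two columns are the account code and the EAN, as described in the thread:

```python
import json
import pandas as pd

# Hypothetical stand-in for the real "data" frame (column layout assumed from the thread)
data = pd.DataFrame([{
    "Code__c": "00001-B",
    "EAN_Code__c": "1111111111.0",
    "Recommended_Action__c": "Take action Z",
    "Problematic__c": True,
    "Value__c": 5800.0,
}])

def to_record(row):
    # Flat payload from columns 2 onward; .item() turns NumPy scalars
    # (bool_, float64, ...) back into native Python values
    rec = {k: (v.item() if hasattr(v, "item") else v)
           for k, v in row.iloc[2:].items()}
    # Nest the lookup fields the way the Salesforce API expects them
    rec["Account__r"] = {"Code__c": row["Code__c"]}
    rec["Product_Master__r"] = {"EAN_Code__c": row["EAN_Code__c"]}
    return rec

payload = json.dumps([to_record(r) for _, r in data.iterrows()])
```

The `to_record` helper and column positions are assumptions; the point is that the nesting and the boolean fix can live in one pass over the rows.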
10-18-2021 07:39 AM
You can achieve it by transforming the dataframe using built-in Spark functions.
11-03-2021 06:51 PM
Thanks
10-22-2021 01:16 AM
Thanks Dan, that makes sense!