03-05-2024 03:16 PM
Hi,
This is my sample JSON data which is generated from api response and it is all coming in a single row. I want to split this in multiple rows and store it in a dataframe.
[{"transaction_id":"F6001EC5-528196D1","corrects_transaction_id":null,"transaction_type":"Call","original_order_id":"F6001EC5-528196D1","advertiser_campaign_id":"9528415", {"transaction_id":"119F26CF-2304AAE6","corrects_transaction_id":null,"transaction_type":"Call","original_order_id":"119F26CF-2304AAE6","advertiser_campaign_id":"9528009"}]
I want to store it in multiple rows like below, both the key and the values. --
{"transaction_id":"F6001EC5-528196D1","corrects_transaction_id":null,"transaction_type":"Call","original_order_id":"F6001EC5-528196D1","advertiser_campaign_id":"9528415"}
{"transaction_id":"119F26CF-2304AAE6","corrects_transaction_id":null,"transaction_type":"Call","original_order_id":"119F26CF-2304AAE6","advertiser_campaign_id":"9528009"}
Appreciate your help.
03-05-2024 05:49 PM
json="""[{"transaction_id":"F6001EC5-528196D1","corrects_transaction_id":null,"transaction_type":"Call","original_order_id":"F6001EC5-528196D1","advertiser_campaign_id":"9528415"}, {"transaction_id":"119F26CF-2304AAE6","corrects_transaction_id":null,"transaction_type":"Call","original_order_id":"119F26CF-2304AAE6","advertiser_campaign_id":"9528009"}]"""
df=spark.createDataFrame([[json]],['json_str'])
from pyspark.sql.types import *
from pyspark.sql import functions as F
json_schema=ArrayType(StructType([ StructField("transaction_id", StringType())]))
df2 = df.withColumn("json",F.explode(F.from_json("json_str",json_schema)))
df2.display()
03-05-2024 05:49 PM
json="""[{"transaction_id":"F6001EC5-528196D1","corrects_transaction_id":null,"transaction_type":"Call","original_order_id":"F6001EC5-528196D1","advertiser_campaign_id":"9528415"}, {"transaction_id":"119F26CF-2304AAE6","corrects_transaction_id":null,"transaction_type":"Call","original_order_id":"119F26CF-2304AAE6","advertiser_campaign_id":"9528009"}]"""
df=spark.createDataFrame([[json]],['json_str'])
from pyspark.sql.types import *
from pyspark.sql import functions as F
json_schema=ArrayType(StructType([ StructField("transaction_id", StringType())]))
df2 = df.withColumn("json",F.explode(F.from_json("json_str",json_schema)))
df2.display()
03-06-2024 04:57 PM
Thanks a lot for your prompt response. The solution recommended worked and produced the desired results. Once again, I appreciate your help !!!
03-06-2024 05:58 PM
This solution is working fine on this sample data. But when I implement on the actual api response, the dataframe which gets produced has no data even though I am able to print the response.text and see the data.
This is how I am creating the dataframe
03-06-2024 06:44 PM
please give a sample of response_text
03-06-2024 07:26 PM - edited 03-06-2024 07:32 PM
Here' a sample row --
[{"transaction_id":"F6001EC5-528196D1","corrects_transaction_id":null,"transaction_type":"Call","original_order_id":"F6001EC5-528196D1","advertiser_campaign_id":"9528415","advertiser_campaign_id_from_network":"9528415","advertiser_campaign_name":"xxxxx","media_type":"Online: Other","call_source_description":"386-749-8349","syndicated_ident":"","promo_line_description":"DAY: OBE xxxx Business Listing","virtual_line_id":"22949255","call_result_description_detail_managed_advertiser":"Profile Promo Number","city":"xxxxx","region":"FL","qualified_regions":"any","repeat_calling_phone_number":"No","calling_phone_number":"xxx-289-xxxx","mobile":"Mobile","duration":1,"connect_duration":0,"ivr_duration":1,"keypresses":"","keypress_1":"","keypress_2":"","keypress_3":"","keypress_4":"","dynamic_number_pool_referrer_search_engine":null,"dynamic_number_pool_referrer_search_keywords_id":"","dynamic_number_pool_referrer_search_keywords":"","dynamic_number_pool_referrer_ad":"","dynamic_number_pool_referrer_ad_id":"","dynamic_number_pool_referrer_ad_group":"","dynamic_number_pool_referrer_ad_group_id":"","dynamic_number_pool_referrer_referrer_campaign":"","dynamic_number_pool_referrer_referrer_campaign_id":"","dynamic_number_pool_referrer_keyword_match_type":null,"dynamic_number_pool_referrer_param1_name":null,"dynamic_number_pool_referrer_param1_value":null,"dynamic_number_pool_referrer_param2_name":null,"dynamic_number_pool_referrer_param2_value":null,"dynamic_number_pool_referrer_param3_name":null,"dynamic_number_pool_referrer_param3_value":null,"dynamic_number_pool_referrer_param4_name":null,"dynamic_number_pool_referrer_param4_value":null,"dynamic_number_pool_referrer_param5_name":null,"dynamic_number_pool_referrer_param5_value":null,"dynamic_number_pool_referrer_param6_name":null,"dynamic_number_pool_referrer_param6_value":null,"dynamic_number_pool_referrer_param7_name":null,"dynamic_number_pool_referrer_param7_value":null,"dynamic_number_pool_referrer_param8_name":null,"dynamic_number_pool_referrer_param8_value":null,"dynamic_number_pool_referrer_param9_name":null,"dynamic_number_pool_referrer_param9_value":null,"dynamic_number_pool_referrer_param10_name":null,"dynamic_number_pool_referrer_param10_value":null,"dynamic_number_pool_referrer_search_type":null,"dynamic_number_pool_pool_type":null,"dynamic_number_pool_id":null,"start_time_local":"2024-02-13 22:21:18 -05:00","start_time_xml":"2024-02-13T22:21:18","start_time_utc":1707880878627,"start_time_network_timezone":"2024-02-13 22:21:18 -05:00","start_time_network_timezone_xml":"2024-02-13T22:21:18","recording":null,"corrected_at":null,"opt_in_SMS":0,"complete_call_id":"F300-1E5E5784C609","transfer_from_type":"Profile Direct","notes":"","verified_zip":"","hangup_cause":"Caller: Hang-up","Answered by Agent":0,"Not Answered by Agent":1,"Answered by Voicemail":0,"Voicemail Left":0,"English Existing Patient":null,"English Prospect":null,"Spanish Existing Patient":null,"Spanish Prospect":null,"Likely New Patient Appointment":0,"Repeat Caller":0,"CWHH Likely Referral":null,"CWHH Ortho Likely Referral":null,"Closed - Converted":null,"CWHH Ortho Call (web page calls)":null,"New Lead":null,"Outreach - Attempted":null,"Outreach - Engaged":null,"Tour Scheduled":null,"PCP Appointment Scheduled":null,"Closed - Not Converted":null,"Destination: Number Not in Service":null,"invoca_id":null,"invoca_detected_destination":null,"gclid":null,"g_cid":null,"wbraid":null,"msclkid":null,"mcid":null,"existing_destination_english":"xxx-506-xxxx","new_destination_spanish":"xxx-388-xxxx","invoca_campaign_name":"xxxxxx","existing_destination_spanish":"xxx-506-xxxx","new_destination_english":"xxx-388-xxxx","call_flow":"GENESYS Daytona: Ormond Beach ENG/SPA Generic Call Menu","direct_dial":null,"timing":"Evergreen","language":"English/Spanish","audience_type":"Prospects","asset_type":"Tracked Number","year":null,"hh_destination":null,"sfcid":null,"report_suite":null,"customer_id":null,"adobe_org_id":null,"landing_page":null,"utm_medium":"Business Listings","utm_source":"GMB","utm_campaign":null,"utm_content":null,"calling_page":null,"cm_mmc":null,"kc":null,"e_dins":null,"state":"xxxxxx","center_entity":"3638","center_code":"OBE","center_name":null,"market":"xxxxx, Daytona","funding_type":"Acquisition","sub_funding_type":"Engagement","brand":"xxxxx Care Center","signal_name":"Not Answered by Agent","signal_partner_unique_id":"","signal_occurred_at":"2024-02-13 22:21:18 -05:00","signal_source":"","revenue":null,"sale_amount":null,"destination_phone_number":""}]
If I pass this sample value and then createdataframe out of it, then the rest of the steps are working fine but then there is no data. Sharing the code snippet below
03-06-2024 07:38 PM
I test the sample is work fine, please check you your unparsed_df and parsed_df
03-06-2024 08:09 PM
Yeah, I am working on the same. It is not throwing any error but also not producing any data in the dataframe.
unparsed_df is created from the sample json data
unparsed_df = spark.createDataFrame([[json]],['json_str'])
03-07-2024 10:19 AM
Can the admin please delete the post where in I am sharing the sample data. Dont want to share it in the public domain. Admin please help with this.
03-06-2024 08:17 PM
Sharing the code snippet where I am just working on a part of the data with few fields. As you will see that display shows "query returned no results".
03-06-2024 08:26 PM
it's not a valid json
03-06-2024 08:45 PM
Yeah, that was an error while copying the data. I have fixed that and it seems to work but when I am working with the full JSON data and JSON schema as shared earlier , it is not producing any data. Thanks once again for your help.
03-06-2024 09:16 PM
your data is
03-07-2024 10:14 AM
Yes indeed, it was datatype issue. After changing it to Longtype in the schema definition, it is working now. Thanks once again for all your inputs and time. Much appreciated !!!
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group