Data Engineering

JSON string object with nested array and struct columns to DataFrame in PySpark

filipjankovic
New Contributor

I am trying to convert a JSON string stored in a variable into a Spark DataFrame without specifying a schema, because I have a large number of different tables, so it has to be done dynamically. I managed to do it with sc.parallelize, but since we are moving to Unity Catalog I had to create a Shared Compute cluster, and now sc.parallelize and some other APIs are no longer available.

I have prepared three different JSON strings stored in variables that look something like the hypothetical sample sketched below, but the original data has many more rows. The solution needs to work for all three examples.

OneDrive file: JSON conversion sample.dbc
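
Since the actual samples are in the .dbc file above, here is a purely hypothetical illustration of the shape value_json is assumed to have (all field names invented): a Python list of dicts mixing nested structs and arrays, OData-style.

# Hypothetical sample only -- the real data is in the .dbc file above.
value_json = [
    {
        "@odata.etag": 'W/"12345"',
        "id": 1,
        "customer": {"name": "Alice", "address": {"city": "Oslo"}},
        "items": [
            {"sku": "A-1", "qty": 2},
            {"sku": "B-7", "qty": 1},
        ],
    },
]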

Here is an example of the code that works on a Single User cluster but not on Shared Compute:

import json

# value_json holds the parsed JSON payload (a list of dicts).
# Distribute it as an RDD of JSON strings and let Spark infer the schema,
# including nested arrays and structs.
data_df = sc.parallelize(value_json).map(lambda x: json.dumps(x))
data_final_df = spark.read.json(data_df)

# Clean up column names: '@odata.' prefixes and embedded dots are not
# valid in Delta / Unity Catalog column names.
data_final_df = data_final_df.toDF(*(c.replace('@odata.', '_odata_').replace('.', '_') for c in data_final_df.columns))

display(data_final_df)
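
For comparison, here is a minimal sketch of a variant that avoids RDD APIs entirely, so it should also run on a Unity Catalog Shared Compute cluster. It assumes value_json is a list of dicts whose documents are JSON objects and whose first element is representative of the schema; note that schema_of_json infers the schema from that single sample, whereas spark.read.json merges schemas across all rows.

import json
from pyspark.sql import functions as F

# Assumption: value_json is a list of dicts, as above, and every document
# is a JSON object whose shape matches the first element.
json_strings = [json.dumps(row) for row in value_json]

# A plain single-column DataFrame of JSON strings -- no sc.parallelize,
# no RDDs, so this also runs on a Shared Compute cluster.
raw_df = spark.createDataFrame([(s,) for s in json_strings], ["json_str"])

# Infer the schema from one sample document, then parse every row with it.
schema = F.schema_of_json(F.lit(json_strings[0]))
parsed_df = (
    raw_df
    .select(F.from_json("json_str", schema).alias("data"))
    .select("data.*")
)

# Same column-name cleanup as in the original snippet.
parsed_df = parsed_df.toDF(*(c.replace('@odata.', '_odata_').replace('.', '_') for c in parsed_df.columns))

display(parsed_df)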

0 REPLIES