Data Engineering

JSON string object with nested Array and Struct column to dataframe in pyspark

filipjankovic
New Contributor

I am trying to convert a JSON string stored in a variable into a Spark DataFrame without specifying a schema, because I have a large number of different tables, so it has to be done dynamically. I managed to do it with sc.parallelize, but since we are moving to Unity Catalog I had to create a Shared Compute cluster, and now sc.parallelize and some other APIs no longer work.

I have prepared 3 different JSON strings, stored in a variable, that look something like the samples in the attached file, but the original data has many more rows. I need the solution to work for all 3 examples.

OneDrive file: JSON conversion sample.dbc
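
For reference, here is a hypothetical illustration of the assumed input shape (the real samples are in the attached .dbc file; all field names below are made up): a Python list of dictionaries with nested arrays and structs, as produced by an OData-style API.

# Hypothetical sample only; the actual payloads are in the attachment
value_json = [
    {"id": 1, "@odata.etag": "W/\"1\"", "items": [{"item.id": 10, "qty": 2}]},
    {"id": 2, "@odata.etag": "W/\"2\"", "items": []},
]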

Here is an example of code that works with a Single User cluster but not with Shared Compute:

import json

# Distribute the list of dictionaries as JSON strings so Spark can infer the schema
data_df = sc.parallelize(value_json).map(lambda x: json.dumps(x))
data_final_df = spark.read.json(data_df)

# Sanitize column names: '.' clashes with Spark's nested-field notation
data_final_df = data_final_df.toDF(*(c.replace('@odata.', '_odata_').replace('.', '_') for c in data_final_df.columns))

display(data_final_df)

1 REPLY

cgrant
Databricks Employee

Hi filipjankovic,

SparkContext (sc) is a Spark 1.0-era API and is deprecated on Standard and Serverless compute. However, your input data is a list of dictionaries, which is supported directly by spark.createDataFrame.

This should give you identical output without dropping down to the RDD API or using the deprecated SparkContext:

# Build the DataFrame straight from the list of dictionaries;
# Spark infers the schema, including nested arrays and structs
data_df = spark.createDataFrame(value_json)

# Same column-name sanitization as in the original snippet
data_final_df = data_df.toDF(*(c.replace('@odata.', '_odata_').replace('.', '_') for c in data_df.columns))
display(data_final_df)
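
If value_json arrives as a raw JSON string rather than an already-parsed list of dictionaries, a minimal sketch (assuming the string encodes a JSON array of objects) is to parse it with the standard json module first:

import json

# Assumption: value_json is a raw JSON string encoding an array of objects
records = json.loads(value_json)
data_df = spark.createDataFrame(records)

For deeply nested payloads you can also have Spark infer the schema from the string itself with schema_of_json and from_json, which avoids the RDD API entirely. Again a sketch, assuming value_json is a JSON array string:

from pyspark.sql import functions as F

# Infer a DDL schema string from the JSON text itself
ddl = spark.range(1).select(
    F.schema_of_json(F.lit(value_json)).alias("ddl")
).head()["ddl"]

# Wrap the string in a one-row DataFrame, parse it with the inferred schema,
# then explode the array of objects into one row per object
data_df = (
    spark.createDataFrame([(value_json,)], "value string")
    .select(F.from_json("value", ddl).alias("data"))
    .select(F.explode("data").alias("row"))
    .select("row.*")
)

The same column-name sanitization from the snippet above can then be applied to data_df.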
