cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

split character string in cell with sql

dulu
New Contributor III

I have the following input:

hinh22 

I am looking for a way to split the characters in the item_order_detail column into 2 columns itemID and itemName. As below output table uses SQL function in databricks with spark_sql version 3.2.1.

hinh223Can someone suggest a solution for me?

Thanks all

6 REPLIES 6

Wassim
New Contributor III

Hi, you got to extract itemID and itemName from single column item_order_detail - you can make use of from_json function for that. I hope following code helps,

SELECT order_id, from_json(item_order_details, 'itemid STRING, itemname STRING') as items
FROM orders
 
SELECT order_id, items.itemid, items.itemname
FROM orders
LATERAL VIEW from_json(item_order_details, 'itemid STRING, itemname STRING') items as items

Wassim
New Contributor III

and pyspark for the same-

df = spark.table("orders")
 
df = df.withColumn("items", from_json(df["item_order_details"], "itemid STRING, itemname STRING"))
 
df = df.select("order_id", "items.itemid", "items.itemname")

dulu
New Contributor III

Hi @Wasim Inamdar​ 

I tried it and have error bellow:

SELECT order_id, from_json(item_order_details, 'itemid STRING, itemname STRING') as items
FROM orders

result

224 

and

SELECT order_id, items.itemid, items.itemname
FROM orders
LATERAL VIEW from_json(item_order_details, 'itemid STRING, itemname STRING') items as items

Error in SQL statement: AnalysisException: from_json is expected to be a generator. However, its class is org.apache.spark.sql.catalyst.expressions.JsonToStructs, which is not a generator

Anonymous
Not applicable

If you have a struct type and not json, you can explode the column and then extract the parts. https://spark.apache.org/docs/latest/api/sql/index.html#explode will unpack or flatmap the array. Just select order and explode("item_order_details")

Kaniz
Community Manager
Community Manager

Hi @Du Lu​ , We haven’t heard from you since the last response from @Joseph Kambourakis​ and I was checking back to see if his suggestions helped you.

Or else, If you have any solution, please share it with the community, as it can be helpful to others.

Also, Please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.

sher
Valued Contributor II
Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.