How to deal with a column name containing a .(dot) in a PySpark DataFrame?

MithuWagh
New Contributor

  • We are streaming JSON data from a Kafka source, but some column names contain a .(dot).
  • Streaming JSON data:

df1 = df.selectExpr("CAST(value AS STRING)")

{"pNum":"A14","from":"telecom","payload":{"TARGET":"1","COUNTRY":"India","EMAIL.1":"test@test.com","PHONE.1":"1122334455"}}

  • In the above JSON, the columns EMAIL.1 and PHONE.1 have a .(dot) in their names.
  • We are extracting the JSON data with get_json_object as shown below, but the EMAIL and PHONE values come back null:

df2 = df1.select(
    get_json_object(df1["value"], '$.pNum').alias('pNum'),
    get_json_object(df1["value"], '$.from').alias('from'),
    get_json_object(df1["value"], '$.payload.TARGET').alias('TARGET'),
    get_json_object(df1["value"], '$.payload.COUNTRY').alias('COUNTRY'),
    get_json_object(df1["value"], '$.payload.EMAIL.1').alias('EMAIL'),
    get_json_object(df1["value"], '$.payload.PHONE.1').alias('PHONE'),
)

How should we deal with this type of column name?

1 REPLY

shyam_9
Valued Contributor

Hi @Mithu Wagh, you can use backticks to enclose the column name:

df.select("`col0.1`")
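For the nested JSON in your question, a more robust route is to parse the value with from_json and an explicit schema, then read the dotted fields off the resulting struct. The sketch below is illustrative only: the schema simply mirrors the sample payload from your post (all fields assumed to be strings), and getField (or a backtick-quoted path) is used so that EMAIL.1 and PHONE.1 are treated as literal field names rather than nested paths.

# Minimal sketch, assuming the sample JSON from the question and string-typed fields.
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType

payload_schema = StructType([
    StructField("TARGET", StringType()),
    StructField("COUNTRY", StringType()),
    StructField("EMAIL.1", StringType()),   # field names with a literal dot
    StructField("PHONE.1", StringType()),
])
schema = StructType([
    StructField("pNum", StringType()),
    StructField("from", StringType()),
    StructField("payload", payload_schema),
])

df2 = (
    df1.select(from_json(col("value"), schema).alias("j"))
       .select(
           col("j.pNum").alias("pNum"),
           col("j.from").alias("from"),
           col("j.payload.TARGET").alias("TARGET"),
           col("j.payload.COUNTRY").alias("COUNTRY"),
           # getField takes the name literally; col("j.payload.`EMAIL.1`") also works
           col("j.payload").getField("EMAIL.1").alias("EMAIL"),
           col("j.payload").getField("PHONE.1").alias("PHONE"),
       )
)

Depending on your Spark version, get_json_object may also accept a bracketed key such as $.payload['EMAIL.1'], which stops the dot from being read as a path separator, but parsing with an explicit schema keeps the field names and types unambiguous.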