cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Best way to parse Google Analytics data in Databricks notebook

AnaMocanu
New Contributor II

I managed to extract the Google Analytics data via lakehouse federation and the Big Query connection but the events table values are in a weird JSON format

{"v":[{"v":{"f":[{"v":"ga_session_number"},{"v":{"f":[{"v":null},{"v":"2"},{"v":null},{"v":null}]}}]}},{"v":{"f":[{"v":"blabla"},{"v":{"f":[{"v":null},{"v":"1"},{"v":null},{"v":null}]}}]}},{"v":{"f":[{"v":"ga_session_id"},{"v":{"f":[{"v":null},{"v":"XXXX"},{"v":null},{"v":null}]}}]}}]}

Does anyone have a good technique for parsing this data, or do I need to manually parse all these columns manually?

Many thanks!

Ana

2 ACCEPTED SOLUTIONS

Accepted Solutions

daniel_sahal
Esteemed Contributor

@AnaMocanu 
I was using this function, with a little modifications on my end:
https://gist.github.com/shreyasms17/96f74e45d862f8f1dce0532442cc95b2

Maybe this will be helpful for you 🙂

View solution in original post

Thank you @daniel_sahal 

I decided to go with parsing the data from the json format, as I don't need too many columns and the elements in the list that I need will stay the same.

For example, when you're picking the first element in the list

df = df.withColumn('device_category', get_json_object(col("device"), "$.v.f[0].v")).alias("device_category")

 

View solution in original post

2 REPLIES 2

daniel_sahal
Esteemed Contributor

@AnaMocanu 
I was using this function, with a little modifications on my end:
https://gist.github.com/shreyasms17/96f74e45d862f8f1dce0532442cc95b2

Maybe this will be helpful for you 🙂

Thank you @daniel_sahal 

I decided to go with parsing the data from the json format, as I don't need too many columns and the elements in the list that I need will stay the same.

For example, when you're picking the first element in the list

df = df.withColumn('device_category', get_json_object(col("device"), "$.v.f[0].v")).alias("device_category")

 

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.