<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Best way to parse Google Analytics data in Databricks notebook in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/best-way-to-parse-google-analytics-data-in-databricks-notebook/m-p/66990#M33255</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/103893"&gt;@AnaMocanu&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;I used this function, with a few modifications on my end:&lt;BR /&gt;&lt;A href="https://gist.github.com/shreyasms17/96f74e45d862f8f1dce0532442cc95b2" target="_blank"&gt;https://gist.github.com/shreyasms17/96f74e45d862f8f1dce0532442cc95b2&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Maybe this will be helpful for you &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 23 Apr 2024 05:05:47 GMT</pubDate>
    <dc:creator>daniel_sahal</dc:creator>
    <dc:date>2024-04-23T05:05:47Z</dc:date>
    <item>
      <title>Best way to parse Google Analytics data in Databricks notebook</title>
      <link>https://community.databricks.com/t5/data-engineering/best-way-to-parse-google-analytics-data-in-databricks-notebook/m-p/66976#M33253</link>
      <description>&lt;P&gt;I managed to extract the Google Analytics data via Lakehouse Federation and the BigQuery connection, but the values in the events table are in an odd JSON format:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;{"v":[{"v":{"f":[{"v":"ga_session_number"},{"v":{"f":[{"v":null},{"v":"2"},{"v":null},{"v":null}]}}]}},{"v":{"f":[{"v":"blabla"},{"v":{"f":[{"v":null},{"v":"1"},{"v":null},{"v":null}]}}]}},{"v":{"f":[{"v":"ga_session_id"},{"v":{"f":[{"v":null},{"v":"XXXX"},{"v":null},{"v":null}]}}]}}]}&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Does anyone have a good technique for parsing this data, or do I need to parse all these columns manually?&lt;/P&gt;&lt;P&gt;Many thanks!&lt;/P&gt;&lt;P&gt;Ana&lt;/P&gt;</description>
      <pubDate>Tue, 23 Apr 2024 02:27:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-way-to-parse-google-analytics-data-in-databricks-notebook/m-p/66976#M33253</guid>
      <dc:creator>AnaMocanu</dc:creator>
      <dc:date>2024-04-23T02:27:27Z</dc:date>
    </item>
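A minimal sketch in plain Python of parsing the value shown in the question instead of addressing every column by hand. It assumes each element is encoded in BigQuery's row format exactly as in the sample (`f` holds a list of fields, `v` holds each field's value), with the parameter name first and a nested record of typed value slots second, only one of which is populated. In the GA4 BigQuery export those slots are presumably `string_value`, `int_value`, `float_value` and `double_value`; check this against your own table schema. The function name `parse_event_params` is hypothetical.

```python
import json
from typing import Dict, Optional


def parse_event_params(raw: str) -> Dict[str, Optional[str]]:
    """Flatten a BigQuery f/v encoded event_params array into {key: value}.

    Assumed shape of each element (taken from the sample in the question):
        {"v": {"f": [{"v": KEY}, {"v": {"f": [SLOT, SLOT, SLOT, SLOT]}}]}}
    where each SLOT is {"v": value_or_null}. The first non-null slot wins.
    """
    params: Dict[str, Optional[str]] = {}
    for element in json.loads(raw)["v"]:
        # First field is the parameter name, second is the typed value record.
        key_cell, value_cell = element["v"]["f"]
        slots = value_cell["v"]["f"]
        # Values stay as strings, as they appear in the JSON; cast them later if needed.
        params[key_cell["v"]] = next(
            (slot["v"] for slot in slots if slot["v"] is not None), None
        )
    return params


sample = (
    '{"v":[{"v":{"f":[{"v":"ga_session_number"},{"v":{"f":[{"v":null},{"v":"2"},{"v":null},{"v":null}]}}]}},'
    '{"v":{"f":[{"v":"blabla"},{"v":{"f":[{"v":null},{"v":"1"},{"v":null},{"v":null}]}}]}},'
    '{"v":{"f":[{"v":"ga_session_id"},{"v":{"f":[{"v":null},{"v":"XXXX"},{"v":null},{"v":null}]}}]}}]}'
)
print(parse_event_params(sample))
# {'ga_session_number': '2', 'blabla': '1', 'ga_session_id': 'XXXX'}
```

Looking parameters up by key rather than by list position keeps the result stable if the order of the event parameters differs between events. In a notebook, a function like this could be wrapped in a PySpark UDF returning a map column, though this sketch does not exercise Spark itself.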
    <item>
      <title>Re: Best way to parse Google Analytics data in Databricks notebook</title>
      <link>https://community.databricks.com/t5/data-engineering/best-way-to-parse-google-analytics-data-in-databricks-notebook/m-p/66990#M33255</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/103893"&gt;@AnaMocanu&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;I used this function, with a few modifications on my end:&lt;BR /&gt;&lt;A href="https://gist.github.com/shreyasms17/96f74e45d862f8f1dce0532442cc95b2" target="_blank"&gt;https://gist.github.com/shreyasms17/96f74e45d862f8f1dce0532442cc95b2&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Maybe this will be helpful for you &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Apr 2024 05:05:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-way-to-parse-google-analytics-data-in-databricks-notebook/m-p/66990#M33255</guid>
      <dc:creator>daniel_sahal</dc:creator>
      <dc:date>2024-04-23T05:05:47Z</dc:date>
    </item>
    <item>
      <title>Re: Best way to parse Google Analytics data in Databricks notebook</title>
      <link>https://community.databricks.com/t5/data-engineering/best-way-to-parse-google-analytics-data-in-databricks-notebook/m-p/67122#M33286</link>
      <description>&lt;P&gt;Thank you&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/79106"&gt;@daniel_sahal&lt;/a&gt;&lt;/P&gt;&lt;P&gt;I decided to parse the values directly from the JSON, since I don't need many columns and the positions of the list elements I need will stay the same.&lt;/P&gt;&lt;P&gt;For example, to pick the first element in the list:&lt;/P&gt;&lt;PRE&gt;from pyspark.sql.functions import col, get_json_object

# Take the first entry of the "f" list in the "device" JSON and store it as a new column
df = df.withColumn("device_category", get_json_object(col("device"), "$.v.f[0].v"))&lt;/PRE&gt;</description>
      <pubDate>Tue, 23 Apr 2024 20:53:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/best-way-to-parse-google-analytics-data-in-databricks-notebook/m-p/67122#M33286</guid>
      <dc:creator>AnaMocanu</dc:creator>
      <dc:date>2024-04-23T20:53:10Z</dc:date>
    </item>
  </channel>
</rss>

