06-18-2021 12:02 PM
Accepted Solutions
06-22-2021 08:06 PM
Thanks @Digan Parikh. Credit to Tahir Fayyaz. I found a couple of different paths depending on whether you're looking to bring in raw GA data vs. aggregated GA data.
1) For raw data, you can bring data from GA Universal Analytics 360 (paid) or GA4 (free and 360/paid) into BigQuery first, and then either use the Spark BigQuery connector to load it into Delta Lake (on any cloud provider), or land the export in GCS and use Auto Loader on GCS to bring it into Delta (see the sketch after this list).
2) For aggregated GA data, you can use either the GA Reporting API or the GA Data API, and then bring the data into Delta via a scheduled Databricks Python notebook job or a partner ETL tool.
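A minimal sketch of path 1, assuming the GA export tables already live in BigQuery and the spark-bigquery connector is installed on the cluster; the project, dataset, table, bucket, and target table names are placeholders, not anything from this thread:

```python
# Path 1a: read a GA export table from BigQuery with the Spark BigQuery connector
ga_events = (
    spark.read.format("bigquery")
    .option("table", "my-gcp-project.analytics_123456.events_20210622")  # placeholder table
    .load()
)

# Write it out as a Delta table (bronze layer)
ga_events.write.format("delta").mode("append").saveAsTable("ga_events_bronze")

# Path 1b: alternatively, land the export as files in GCS and ingest incrementally with Auto Loader
ga_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "gs://my-ga-bucket/_schemas/ga_events")  # placeholder bucket
    .load("gs://my-ga-bucket/exports/")
)

(
    ga_stream.writeStream
    .option("checkpointLocation", "gs://my-ga-bucket/_checkpoints/ga_events")
    .trigger(availableNow=True)
    .toTable("ga_events_bronze")
)
```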
06-22-2021 11:01 AM
You can use an ETL tool that connects to GA and loads the data into Databricks. You can also do it yourself: call the GA APIs via Python and connect. This can be tough, though, as you need to set everything up yourself (OAuth 2.0, the different API endpoints, parsing the JSON, etc.).
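As a rough illustration of the do-it-yourself route for GA4, here is a minimal sketch using the google-analytics-data client with a service-account credential; the key path, property ID, dimensions, metrics, and target table are placeholder assumptions:

```python
import os
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import DateRange, Dimension, Metric, RunReportRequest

# Point the client at a service-account key (placeholder path)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/dbfs/FileStore/keys/ga-service-account.json"
client = BetaAnalyticsDataClient()

# Pull an aggregated report: sessions and active users by date and traffic source
request = RunReportRequest(
    property="properties/123456789",  # placeholder GA4 property ID
    dimensions=[Dimension(name="date"), Dimension(name="sessionSource")],
    metrics=[Metric(name="sessions"), Metric(name="activeUsers")],
    date_ranges=[DateRange(start_date="7daysAgo", end_date="yesterday")],
)
response = client.run_report(request)

# Flatten the API response into rows and land it as a Delta table
rows = [
    {
        "date": r.dimension_values[0].value,
        "source": r.dimension_values[1].value,
        "sessions": int(r.metric_values[0].value),
        "active_users": int(r.metric_values[1].value),
    }
    for r in response.rows
]
spark.createDataFrame(rows).write.format("delta").mode("overwrite").saveAsTable("ga_sessions_daily")
```

Scheduled as a Databricks job, a notebook like this covers the "scheduled Python notebook" path for aggregated data mentioned in the accepted answer.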

