Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

What is a good way to ingest Google Analytics data into Databricks

aladda
Databricks Employee
 

Accepted Solutions

aladda
Databricks Employee

Thanks @Digan Parikh. Credit to Tahir Fayyaz. There are a couple of different paths, depending on whether you want to bring in raw GA data or aggregated GA data.

1) For raw data, first export it into BigQuery. This is possible from GA Universal Analytics 360 (paid), GA4 (free), and GA4 360 (paid). From there, either use the Spark BigQuery connector to bring the data into Delta Lake (on any cloud provider), or land the data in GCS and use Auto Loader on GCS to bring it into Delta.

2) For aggregated GA data, use either the GA Reporting API or the GA Data API, then bring the data into Delta with either a scheduled Databricks Python notebook job or a partner ETL tool.
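
As a rough sketch of the raw-data path in option 1, the two ingestion routes could look something like this in a Databricks Python notebook. It assumes the Spark BigQuery connector and Auto Loader are available on the cluster and that BigQuery and GCS credentials are already configured (authentication is left out here). All function names, table names, and paths are hypothetical placeholders, not anything from the original answer.

```python
# Sketch only: helper names, table names, and paths are hypothetical.
# `spark` is the SparkSession that Databricks notebooks provide.


def bigquery_read_options(project: str, dataset: str, table: str) -> dict:
    """Options for the Spark BigQuery connector (`bigquery` data source)."""
    return {"table": f"{project}.{dataset}.{table}"}


def autoloader_read_options(file_format: str, schema_location: str) -> dict:
    """Options for Auto Loader (the `cloudFiles` streaming source)."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }


def copy_bigquery_table_to_delta(spark, project, dataset, table, target_table):
    """Route A: batch-read a BigQuery table and append it to a Delta table."""
    df = (
        spark.read.format("bigquery")
        .options(**bigquery_read_options(project, dataset, table))
        .load()
    )
    df.write.format("delta").mode("append").saveAsTable(target_table)


def stream_gcs_export_to_delta(
    spark, source_path, schema_location, checkpoint_path, target_table
):
    """Route B: incrementally pick up files landed in GCS with Auto Loader."""
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_read_options("parquet", schema_location))
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)  # process what is there, then stop
        .toTable(target_table)
    )
```

Route A is simpler for a one-off or daily batch copy, while route B keeps track of which files it has already loaded, which may suit a recurring export to GCS better. The Parquet format in route B is an assumption; use whatever format the data was exported in.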


2 REPLIES

Digan_Parikh
Valued Contributor

You can use an ETL tool that connects to GA and loads the data into Databricks. You can also do it yourself by calling the GA APIs from Python. This can be tough, though, as you need to set everything up yourself (OAuth 2.0 for the API, the different API endpoints, JSON parsing, etc.).
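
To give a feel for the "parse JSON" part of the do-it-yourself route, here is a minimal sketch that builds a request body for the GA4 Data API `runReport` method and flattens a response of that shape into plain rows. The OAuth 2.0 flow and the HTTP call itself are deliberately left out, the helper names are hypothetical, and the sample response is made-up data used only to show the structure.

```python
from typing import Any, Dict, List


def build_report_request(
    dimensions: List[str], metrics: List[str], start_date: str, end_date: str
) -> Dict[str, Any]:
    """Build a JSON body in the shape the GA4 Data API runReport method expects."""
    return {
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "dimensions": [{"name": d} for d in dimensions],
        "metrics": [{"name": m} for m in metrics],
    }


def flatten_report(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn a runReport-style response into one flat dict per row."""
    dim_names = [h["name"] for h in response.get("dimensionHeaders", [])]
    metric_names = [h["name"] for h in response.get("metricHeaders", [])]
    rows = []
    for row in response.get("rows", []):
        record = {}
        # Values arrive positionally, in the same order as the headers.
        for name, cell in zip(dim_names, row.get("dimensionValues", [])):
            record[name] = cell["value"]
        for name, cell in zip(metric_names, row.get("metricValues", [])):
            record[name] = cell["value"]  # metric values are strings in the JSON
        rows.append(record)
    return rows


# Made-up sample data, for illustration only.
SAMPLE_RESPONSE = {
    "dimensionHeaders": [{"name": "date"}],
    "metricHeaders": [{"name": "activeUsers", "type": "TYPE_INTEGER"}],
    "rows": [
        {"dimensionValues": [{"value": "20240101"}], "metricValues": [{"value": "120"}]},
        {"dimensionValues": [{"value": "20240102"}], "metricValues": [{"value": "135"}]},
    ],
}

if __name__ == "__main__":
    for record in flatten_report(SAMPLE_RESPONSE):
        print(record)
```

In a notebook, rows like these could then be handed to `spark.createDataFrame(rows)` and written to a Delta table, with the string metrics cast to numeric types along the way.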

