Query your Google Analytics data with BigQuery

Business Benefits

Use BigQuery to manipulate, change, and analyze your Google Analytics data.


Decide which type of Google Analytics connection to use in BigQuery: Google Analytics 360, Google Analytics 4 native, or a third-party option.

Third-party connectors include:

  • Supermetrics using flat tables in Batch mode.
  • Owox using tables with nested or repeated fields, similar to GA360 tables, in Streaming mode

For now, we’ll use the GA public dataset. The sample dataset provides an obfuscated Google Analytics 360 dataset that can be accessed via BigQuery.

Click + Add Data > Explore public datasets to access the GA360 sample data tables in BigQuery public datasets.

Search for Google Analytics Sample in the public datasets.

Click on blue View Dataset button to explore the added dataset.

You’ll see one table that contains multiple tables per day, with a table suffix as a date.

Some lines and fields are empty; this is caused by nested fields.

Differentiate between a sharded and partitioned table.

A partitioned table is divided into segments, called partitions, that make it easier to manage and query your data. An alternative to date/timestamp/datetime partitioned tables, date-sharded tables use a time-based naming approach such as [PREFIX]_YYYYMMDD.

GA360 uses date-sharded tables:

To query a sharded table you can add * to replace the date in the table name, to query all the data from all the sharded tables:

SELECT count(visitId)FROM bigquery-public-data.google_analytics_sample.ga_sessions_*

```Or add the date if you want to query a data for the specific day:

SELECT * FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`

SELECT date, SUM(totals.pageviews) AS pageviews                                                          FROM bigquery-public-data.google_analytics_sample.ga_sessions_*WHERE _TABLE_SUFFIX BETWEEN '20160729' and '20170801'GROUP by date
## Spot the nested fields in Google Analytics 360 data, and perform some queries on this data using the `UNNESTED` function.

Each row in GA360 data represents a single session and contains many fields, some of which can be repeated and nested. `Hits` contains a repeated set of fields, representing the page views and events during the session.

![](https://cxl.com/institute/wp-content/uploads/2021/03/image-151.png)

Use `UNNEST` function to query the nested fields. Try this query:

SELECT
hits.hitNumber AS hitNumber,
hits.page.pagePath AS pagePath
FROM
bigquery-public-data.google_analytics_sample.ga_sessions_*,
UNNEST(hits) as hits
where hits.hitNumber = 1
group by hitNumber, pagePath
## Query GA360 data using `UNNEST` and `TABLE SUFFIX`.

Try this query to get pages per session for every visitor and every visit:

SELECT
fullVisitorId,
visitId,
visitNumber,
hits.hitNumber AS hitNumber,
hits.page.pagePath AS pagePath
FROM
bigquery-public-data.google_analytics_sample.ga_sessions_*,
UNNEST(hits) as hits
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
AND
hits.type="PAGE"
ORDER BY
fullVisitorId,
visitId,
visitNumber,
hitNumber


## Continue to explore the Google Analytics data.

Explore the Google Analytics table schema to understand what are fields available for your queries. Find what are the nested fields to query them correctly; search for RECORD type in the schema, to find a sub-table with nested fields. Use the Standard SQL, and query your data as you would do with any other table, just take into consideration Table Suffix and Nested fields.


## Work on a real use case, and write a GA360 query using what you've learned in the previous steps.

Create a table with one line per user/date of visit with the following variables, only the last 90 days of the data:

- `fullVisitorId`
- `date`
- total number of page views on the last 30 days prior this date
- average time spent during a session on the last 30 days prior this date

It will give you the quality of visit per user, per date, and you can add any other variables to include more quality indicators in your analysis. Query:

DECLARE
maxDate DATE;
SET
maxDate = (
SELECT
PARSE_DATE('%Y%m%d',
MAX(date))
FROM
bigquery-public-data.google_analytics_sample.ga_sessions_*);
WITH
features_table AS (
SELECT
fullVisitorId,
date,
trafficSource.source,
SUM(totals.pageviews) AS pageviews,
IFNULL(SUM(totals.timeOnSite),
0) AS timeOnSite,
CASE
WHEN device.deviceCategory = "tablet" THEN '3'
WHEN device.deviceCategory = "mobile" THEN '2'
ELSE
'1'
END
AS deviceCategory,
CASE
WHEN SUM(totals.transactions) IS NULL THEN '0'
ELSE
'1'
END
AS TRANSACTION
FROM
bigquery-public-data.google_analytics_sample.ga_sessions_*
WHERE
PARSE_DATE('%Y%m%d',
date) >= DATE_ADD(maxDate, INTERVAL -90 DAY)
GROUP BY
fullVisitorId,
date,
device.deviceCategory,
trafficSource.source
ORDER BY
fullVisitorId,
date )
SELECT
*,
IFNULL(SUM(pageviews) OVER(PARTITION BY fullVisitorId ORDER BY UNIX_DATE(PARSE_DATE('%Y%m%d', date)) RANGE BETWEEN 30 PRECEDING
AND CURRENT ROW),
0) AS _30DaysSumPageviews,
IFNULL(ROUND(AVG(timeOnSite) OVER(PARTITION BY fullVisitorId ORDER BY UNIX_DATE(PARSE_DATE('%Y%m%d', date)) RANGE BETWEEN 30 PRECEDING
AND CURRENT ROW),1),
0) AS _30DaysAverageTimeOnSite
FROM
features_table

Last edited by @hesh_fekry 2023-11-14T11:15:31Z