Google Analytics and a game changer

For a long time, I found standard Google Analytics reports to be the best way to get useful information. Occasionally, I struggled with sampling, limitations and strange results, but there was nothing I could do about it, until I discovered Google Analytics 360 and raw data exports to Google BigQuery.

After a few hours playing with SQL, I could deliver information I could never have gotten from aggregated Google Analytics reports. Since that day, I've been exploring how raw data can be a web analyst's best friend.

Now that more and more tools allow you to access raw data (and cloud storage has become more powerful), web analysts should also change their approach.

In this article, I will:

  • Explain the differences between raw and aggregated data;
  • Show you what to expect from this new access;
  • Show you how to get this data (at a low cost).

What is the difference between raw and aggregated data in Google Analytics?

Google Analytics, in the free version, provides only aggregated data. That means you can't get all the information page view by page view, event by event. Admittedly, you can get a lot of information through the User Explorer report, but it is limited, not scalable and cannot be downloaded.

For regular use, aggregated data is usually sufficient. After all, most of the questions we answer are pretty basic.

Answering these questions does not require raw data. Default or custom reports in Google Analytics get the job done. So why dive deep into millions of rows of precise data?

The problem with aggregated data is that, well, it is aggregated: you're mixing up a bunch of user behaviors, sometimes hiding the most interesting facts.

Let's take a simple example with pages per session. Suppose you have two sources with six sessions each, and you record the number of pages viewed in each session.

By looking at the raw data, you can see that if you omit the statistical outlier of 10 pages, Source A has much lower engagement. Yet if you look only at the average, it is the same as Source B: 3 pages. (The median would be different.)
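To make the point concrete, here is a minimal sketch in Python. The session values are hypothetical (I made them up so that both sources have six sessions, average 3 pages, and Source A contains one 10-page outlier); they are not real measurements.

```python
from statistics import mean, median

# Hypothetical pages-per-session values, chosen only for illustration:
# both sources have six sessions and an average of 3 pages.
source_a = [1, 1, 2, 2, 2, 10]  # one outlier session of 10 pages
source_b = [3, 3, 3, 3, 3, 3]


def summarize(pages):
    """Return the mean and median number of pages per session."""
    return {"mean": mean(pages), "median": median(pages)}


summary_a = summarize(source_a)
summary_b = summarize(source_b)

# Source A without its outlier session.
summary_a_trimmed = summarize([p for p in source_a if p != 10])

print("Source A:", summary_a)
print("Source B:", summary_b)
print("Source A without the outlier:", summary_a_trimmed)
```

The averages match, but the median and the trimmed average show that a typical Source A session is shorter than a typical Source B session.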

So why is this more granular approach not the default for Google Analytics? Because of computation costs. When you provide only sampled, aggregated data, you don't need to go through millions of rows for each report. It makes sense that the free version of Google Analytics doesn't provide advanced calculations for free.

What can you get from the raw data?

Now that you see the limitations of aggregated data, let's look at some use cases for raw data. Keep in mind that Google may bill you based on actual data usage in BigQuery, so, as web analysts, we must be fully aware of exactly what we get for that investment.

Event timing

One thing seriously missing in Google Analytics? Timing. There is no simple way to know the actual time interval between an add to cart and a purchase, whether within the same session or not.

Of course, you can store timestamps in a cookie and do your own calculations. But this feels like reinventing the wheel: Google Analytics is supposed to have already collected this data!

With raw data analysis, you can easily get the precise time of an event for a given user and compare it to another event for the same user. For a broader analysis, you can aggregate the data in any way you decide: average, median, percentage distribution or some advanced statistical model.

Isn't it important to know that 20% of your users convert within 2 minutes and 10% take more than 7 days? Don't you think you should communicate differently with these two groups of visitors?
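As a rough sketch of this kind of timing analysis, the Python snippet below computes the delay between the first add to cart and the following purchase for each user. The event log, the user IDs and the event names (`add_to_cart`, `purchase`) are all hypothetical; a real export will have its own schema.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw event log: (user_id, event_name, ISO timestamp).
events = [
    ("u1", "add_to_cart", "2020-01-01T10:00:00"),
    ("u1", "purchase", "2020-01-01T10:01:30"),
    ("u2", "add_to_cart", "2020-01-01T11:00:00"),
    ("u2", "purchase", "2020-01-09T11:00:00"),
    ("u3", "add_to_cart", "2020-01-02T09:00:00"),
    ("u3", "purchase", "2020-01-02T09:30:00"),
    ("u4", "add_to_cart", "2020-01-03T08:00:00"),  # never purchased
]


def seconds_to_purchase(event_log):
    """Map each buyer to the seconds between their first add to cart
    and their first purchase after it."""
    first_cart = {}
    delays = {}
    # ISO timestamps in the same format sort chronologically as strings.
    for user, name, ts in sorted(event_log, key=lambda e: e[2]):
        when = datetime.fromisoformat(ts)
        if name == "add_to_cart":
            first_cart.setdefault(user, when)
        elif name == "purchase" and user in first_cart and user not in delays:
            delays[user] = (when - first_cart[user]).total_seconds()
    return delays


def share(delays, predicate):
    """Fraction of buyers whose delay satisfies the predicate."""
    values = list(delays.values())
    return sum(1 for v in values if predicate(v)) / len(values)


delays = seconds_to_purchase(events)
median_delay = median(delays.values())
fast_share = share(delays, lambda s: s <= 2 * 60)          # within 2 minutes
slow_share = share(delays, lambda s: s > 7 * 24 * 3600)    # more than 7 days

print(delays, median_delay, fast_share, slow_share)
```

Once the per-user delays exist, any aggregation (median, buckets, percentiles) is a one-liner, which is exactly what aggregated reports do not let you do.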

User scope analysis

In Google Analytics reports, even in Google Analytics 360, user segmentation is limited to 90 days. For some companies, especially those with a long decision process, this lookback window is not enough.

With raw data, you can answer questions like:

  • Are users acquired during the holiday season more likely to buy in September than other types of users?
  • What effect does watching a video have over the course of a year? Does it help with conversions?

If you store raw data, you can keep event logs for as long as you want. Simply check with your data protection officer if the duration fits the purpose of the processing.


Correlation analysis

A correlation coefficient shows the statistical relationship between two variables. With big data, it can be insightful to measure the relationship between two behaviors, such as:

  • The impact of pageviews by topic on purchases. Is there a correlation between the types of content a user reads and what they buy?
  • Related products. If I buy product A, which product category has a positive correlation with this product?
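For illustration, here is a small Python sketch of a Pearson correlation coefficient computed on per-user data. The two series (pageviews on a topic and purchases in a product category) are hypothetical numbers, and Pearson is only one of several coefficients you could choose.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-user values, one position per user.
recipe_pageviews = [0, 1, 2, 3, 4, 5]
cookware_purchases = [0, 0, 1, 1, 2, 2]

r = pearson(recipe_pageviews, cookware_purchases)
print(round(r, 3))
```

A value close to 1 suggests the two behaviors move together, a value close to -1 that they move in opposite directions. Keep in mind that a correlation alone does not tell you which behavior causes the other.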

Third-party data

Last but not least, raw data storage can be a total game changer if you join it with other data sources. Here are some examples:

E-commerce data. This is especially valuable if you store a Google Analytics Client ID along with any add-to-cart or checkout action.


You can calculate a more accurate conversion rate, as you can get transaction information even for users who did not trigger Google Analytics on the confirmation page (for example, ad blockers prevent tags from firing, banking services do not redirect, confirmation pages take too long to load, etc.).

Furthermore, as you are now using your own data, you can remove revenue from canceled transactions or returns. You can also calculate more advanced and sensitive metrics, such as margin instead of revenue.
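Here is a minimal Python sketch of such a join, under several assumptions: the back-office orders carry the Client ID, each order has a `status` and a `cost`, and the analytics export tells you the traffic source of each Client ID. All names and figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical analytics side: traffic source per Client ID,
# total sessions, and the order IDs that the tag actually recorded.
ga_source_by_client = {
    "111.aaa": "newsletter",
    "222.bbb": "paid_search",
    "333.ccc": "paid_search",
    "444.ddd": "newsletter",
}
ga_sessions = 100
ga_tracked_order_ids = {"A1", "A2"}

# Hypothetical back-office orders, stored with the Client ID.
backend_orders = [
    {"order_id": "A1", "client_id": "111.aaa", "revenue": 100.0, "cost": 60.0, "status": "completed"},
    {"order_id": "A2", "client_id": "222.bbb", "revenue": 50.0, "cost": 30.0, "status": "canceled"},
    {"order_id": "A3", "client_id": "333.ccc", "revenue": 80.0, "cost": 50.0, "status": "completed"},
    {"order_id": "A4", "client_id": "444.ddd", "revenue": 40.0, "cost": 25.0, "status": "returned"},
    {"order_id": "A5", "client_id": "333.ccc", "revenue": 30.0, "cost": 10.0, "status": "completed"},
]


def kept_orders(orders):
    """Drop canceled and returned orders."""
    return [o for o in orders if o["status"] == "completed"]


def margin_by_source(orders, source_by_client):
    """Sum margin (revenue minus cost) per traffic source, joined on Client ID."""
    totals = defaultdict(float)
    for order in kept_orders(orders):
        source = source_by_client.get(order["client_id"], "(unknown)")
        totals[source] += order["revenue"] - order["cost"]
    return dict(totals)


kept = kept_orders(backend_orders)
tracked_rate = len(ga_tracked_order_ids) / ga_sessions  # what the tag saw
backend_rate = len(kept) / ga_sessions                   # what really happened
net_revenue = sum(o["revenue"] for o in kept)
margins = margin_by_source(backend_orders, ga_source_by_client)

print(tracked_rate, backend_rate, net_revenue, margins)
```

In this made-up data set the tag missed three orders and counted a canceled one, so the tracked conversion rate and revenue both differ from the back-office figures.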

CRM data. What's more annoying than realizing that a campaign that generated a ton of leads actually generated a ton of irrelevant leads? This is a challenge for most B2B sites.

Obviously, for such analysis you would have to collect this information in a machine-readable format. But once that's done, you have a world of relevant information to share.

Ads, trackers, logs, whatever it is. Once you get used to storing all your data in the same warehouse and running join analyses with analytics data, you can make your wildest data dreams come true.

Google Analytics 360

If you're rich enough or lucky enough to have Google Analytics 360, you get a raw data export to Google BigQuery right away.

All the information, including enhanced e-commerce, is exported. Each row represents a session, and you can play with many dimensions and metrics.
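To give an idea of what "one row per session" means in practice, here is a simplified Python sketch. The field names (`fullVisitorId`, `visitId`, `visitStartTime`, `totals`, `hits`) follow the Google Analytics 360 BigQuery export schema as I understand it, but the row itself is a hypothetical, heavily trimmed example, and the flattening function only mimics in plain Python what you would do with `UNNEST(hits)` in BigQuery SQL.

```python
# Hypothetical, heavily simplified session row.
session_row = {
    "fullVisitorId": "1234567890",
    "visitId": 1577872800,
    "visitStartTime": 1577872800,  # seconds since the Unix epoch
    "totals": {"pageviews": 2},
    "hits": [
        {"hitNumber": 1, "time": 0, "type": "PAGE",
         "page": {"pagePath": "/home"}},
        {"hitNumber": 2, "time": 15000, "type": "EVENT",
         "page": {"pagePath": "/home"},
         "eventInfo": {"eventAction": "play_video"}},
        {"hitNumber": 3, "time": 42000, "type": "PAGE",
         "page": {"pagePath": "/product"}},
    ],
}


def flatten_hits(row):
    """Yield one flat dict per hit of a session row."""
    for hit in row["hits"]:
        yield {
            "fullVisitorId": row["fullVisitorId"],
            "visitId": row["visitId"],
            # hit "time" is in milliseconds after the session start
            "timestamp": row["visitStartTime"] + hit["time"] / 1000,
            "type": hit["type"],
            "pagePath": hit["page"]["pagePath"],
            "eventAction": hit.get("eventInfo", {}).get("eventAction"),
        }


hit_rows = list(flatten_hits(session_row))
page_paths = [h["pagePath"] for h in hit_rows if h["type"] == "PAGE"]

for hit in hit_rows:
    print(hit)
```

With the hits flattened like this, each page view or event has its own timestamp, which is what the timing and correlation analyses described earlier rely on.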


In the coming months, Google App + Web will become the new standard. This will come with more integrations between Google Marketing Platform and Google Cloud Platform, especially BigQuery. If your SQL skills are a bit rusty, I recommend you brush up on them and play around with some demo data.

With easy access to raw data, fast and efficient calculations and powerful data visualization, advanced digital analytics is becoming more and more mature. The future of all this is probably tighter integration with other business data.

Some have been telling us for years that business intelligence and digital analytics should work together. Slowly but surely, it is becoming a reality.

Yandex.Metrica and Matomo

I have not tried all the tools. Most paid tools offer a raw data export. Nevertheless, I would like to mention two free tools that also have this option.

  • Yandex.Metrica is 100% free and provides raw data through its Logs API.
  • Matomo is an open source analytics tool that you may need to install on your own server; you get raw data directly in your database.

