How Google Analytics Collects Data

Google Analytics (GA) is a Web analytics tool.  It collects Web traffic metrics using a piece of JavaScript code referred to as the Google Analytics Tracking Code (GATC).  The code sends Visitor session data to Google servers for processing.  Google servers are notified any time a page is viewed by a Web browser. This process of using embedded code to collect Web traffic metrics is called page tagging.  GA is one of many Web analytics tools that use page tagging.
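To make the idea of page tagging concrete, the sketch below shows the kind of request a tag might assemble when a page is viewed. It is an illustration only: the collector URL, the parameter names (`page`, `title`, `vid`), and the function `build_hit_url` are all hypothetical and are not the actual GATC or Google's collection protocol.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical collection endpoint; not a real Google address.
COLLECTOR_URL = "https://collector.example/collect"

def build_hit_url(page_path, page_title, visitor_id):
    """Build the URL a page tag could request to report one page view."""
    params = {"page": page_path, "title": page_title, "vid": visitor_id}
    return COLLECTOR_URL + "?" + urlencode(params)

# One page view reported by the tag.
hit_url = build_hit_url("/air/quality", "Air Quality", "v-42")

# What the collecting server could recover from that request.
decoded = parse_qs(urlsplit(hit_url).query)
print(hit_url)
print(decoded)
```

The point of the sketch is that the page itself, not the Web server's log, reports the view, which is why the collecting server hears about a page only when the embedded code actually runs.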

How Does it Work?

GA works by setting various cookies in Visitors’ Web browsers.  Cookies are small files that Web servers place in Web browsers, often for the purpose of tracking internet activity.  The file consists of a text message that is sent back to the server each time the browser requests a Web page.  Session cookies remain in a Web browser only until the browser is closed or has been inactive for a specified amount of time (“a session” or “Visit”).  Persistent cookies, on the other hand, remain after a browser session ends.  Some persistent cookies, including those set by GA, are set to expire after a specific amount of time passes (e.g., six months or two years).
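The difference between a session cookie and a persistent cookie comes down to whether the cookie carries an expiration. The following is a minimal sketch using Python's standard `http.cookies` module; the cookie names (`session_id`, `visitor_id`) and values are hypothetical and are not the names GA uses, and the two-year lifetime simply mirrors the example duration mentioned above.

```python
from http.cookies import SimpleCookie
from typing import Optional

TWO_YEARS = 60 * 60 * 24 * 365 * 2  # lifetime in seconds

def build_cookie(name: str, value: str, max_age: Optional[int] = None) -> str:
    """Return a Set-Cookie header value.

    With no max_age the browser treats it as a session cookie;
    with max_age it persists until that many seconds have passed.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    if max_age is not None:
        cookie[name]["max-age"] = max_age
    return cookie[name].OutputString()

# Hypothetical cookie names, for illustration only.
session_cookie = build_cookie("session_id", "abc123")
persistent_cookie = build_cookie("visitor_id", "v-42", max_age=TWO_YEARS)

print(session_cookie)
print(persistent_cookie)
```

Only the second header includes a `Max-Age` attribute, so only that cookie survives after the browser is closed.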


Cookies and Privacy

Neither EPA nor Google collects any personally identifiable information (PII) about Visitors to the EPA website using GA.

GA uses first-party persistent cookies, which the government classifies as Tier 2 persistent cookies. Tier 2 persistent cookies do not collect any PII and are permissible for use by federal agencies. For more information on how the Office of Management and Budget (OMB) defines persistent cookies, see OMB M-10-22, Guidance for Online Use of Web Measurement and Customization Technologies.

Unless you first opt out by blocking the cookies, the GATC will automatically set a persistent cookie in the browser of the computer or mobile device you are using to access the EPA website.  Visitors can choose not to allow GA to track their Web activity by changing their browser settings.  Modern browsers have options to block the kinds of cookies set by GA.

Cookie Deletion and Return Visitors

A small percentage of Visitors to EPA will delete their Web browser’s cookies prior to their next Visit. With the GA cookies deleted, these Visitors will be counted as New Visits upon their return to the EPA website. This is a known and accepted reality among Web analysts. Therefore, when you interpret the Return Visits metric in GA, treat it as the minimum number of Return Visits. Some Return Visits will inevitably be lost due to cookie deletion, but as is true of most Web traffic metrics, it is the trend over time that will provide the most insight. Log file analyzers do not offer a good alternative in this regard, as they rely on IP addresses to identify Return Visits, and large companies often have dynamic IP addresses that can change between Visits or even during a Visit.
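The undercount described above can be illustrated with a small sketch. It assumes, hypothetically, that each Visit arrives with a cookie ID and that a Visit is labeled "Return" only when that ID has been seen before; the IDs and the function `classify_visits` are made up for the example and do not reflect GA's internal logic.

```python
def classify_visits(cookie_ids):
    """Label each Visit New or Return based on whether its cookie ID was seen before."""
    seen = set()
    labels = []
    for cookie_id in cookie_ids:
        labels.append("Return" if cookie_id in seen else "New")
        seen.add(cookie_id)
    return labels

# One person makes three Visits but deletes cookies before the third,
# so the third Visit arrives with a fresh ID ("c2") that cannot be linked to "c1".
cookie_ids = ["c1", "c1", "c2"]
labels = classify_visits(cookie_ids)

reported_return_visits = labels.count("Return")
actual_return_visits = 2  # known only because we constructed the example

print(labels)
print(reported_return_visits, "reported vs.", actual_return_visits, "actual")
```

The reported count can fall short of the true count but never exceed it, which is why the metric is best read as a minimum.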


Page Tagging vs. Log File Analysis

An alternative approach to collecting Web traffic metrics through page tagging is log file analysis.  This method entails downloading server log files for processing in an analytics software program.  It does not use page tagging or cookies.

Since server logs record all server transactions, including activity from Web crawlers and bots, software is needed to filter out non-human activity.  While log file software does filter out known crawlers and bots and those that self-identify, a list must be continually maintained, making it difficult to filter all non-human activity. 
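A minimal sketch of this kind of filtering is shown below. It assumes the server writes the common "combined" log format and that bots are recognized by substrings in the user-agent field; the marker list, sample log lines, and function names are illustrative assumptions, not the behavior of any particular log analysis product.

```python
import re

# Pattern for the combined log format (an assumption about the server's configuration).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative list of known bot markers; a real list needs ongoing maintenance.
KNOWN_BOT_MARKERS = ("googlebot", "bingbot", "crawler", "spider")

def is_known_bot(user_agent):
    """Return True if the user agent contains any known bot marker."""
    agent = user_agent.lower()
    return any(marker in agent for marker in KNOWN_BOT_MARKERS)

def human_requests(lines):
    """Parse log lines and keep only those not matching a known bot."""
    kept = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and not is_known_bot(match.group("agent")):
            kept.append(match.groupdict())
    return kept

SAMPLE_LOG = [
    '192.0.2.10 - - [01/Mar/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '192.0.2.20 - - [01/Mar/2024:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.30 - - [01/Mar/2024:10:00:09 +0000] "GET /about.html HTTP/1.1" 200 2048 "-" "UnlistedFetcher/0.1"',
]

kept = human_requests(SAMPLE_LOG)
print([row["ip"] for row in kept])
```

The self-identifying crawler is removed, but the third request, from an automated client that is not on the list, is still counted as human. That gap is the maintenance problem described above.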

On the other hand, GA page tags have to be activated by JavaScript (JS), which the vast majority of Web spiders and bots do not process. However, Visitors who have JS disabled are not counted by most page tagging tools.  While this may represent a small amount of traffic, it should be considered as part of any Web traffic analysis. 

Log files may not be collecting all human activity either, since consecutive Visits to the same Web page can cause the page to be retrieved from the browser’s cache.  Web servers do not typically record such transactions.  With page tagging, even cached pages are recorded, since the JS is activated whenever the Web browser loads a page. 

Where page tagging holds a major advantage over log file analysis, however, is in the breadth of traffic metrics that can be collected and the ad hoc customizations that are available.

Page tagging solutions use cookies to track Return Visits and other Visit-based metrics, such as Pages per Visit and Visit Duration.  Log file software relies on IP addresses to calculate Visit-based metrics, which can be problematic since many large companies have dynamic IP addresses that can change after or even during a Visit.  Even though some Visitors delete their cookies prior to returning to the same website, page tagging is generally viewed as the more accurate way to calculate Visit-based metrics.
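The effect of a changing IP address can be seen in a short sketch that counts Visits from the same page views twice, once keyed on a cookie ID and once keyed on the IP address. The 30-minute inactivity cutoff, the field names, and the sample data are assumptions made for the example rather than the settings of any specific tool.

```python
SESSION_TIMEOUT = 30 * 60  # assumed inactivity cutoff, in seconds

def count_visits(hits, key):
    """Count Visits by grouping hits on `key` and splitting on inactivity gaps."""
    last_seen = {}
    visits = 0
    for hit in sorted(hits, key=lambda h: h["time"]):
        identity = hit[key]
        previous = last_seen.get(identity)
        # A new Visit starts for an unseen identity or after a long gap.
        if previous is None or hit["time"] - previous > SESSION_TIMEOUT:
            visits += 1
        last_seen[identity] = hit["time"]
    return visits

# One Visitor views three pages; the IP address changes before the third view.
hits = [
    {"time": 0,   "ip": "198.51.100.7", "cookie": "c1", "page": "/"},
    {"time": 120, "ip": "198.51.100.7", "cookie": "c1", "page": "/air"},
    {"time": 300, "ip": "198.51.100.9", "cookie": "c1", "page": "/water"},
]

visits_by_cookie = count_visits(hits, "cookie")
visits_by_ip = count_visits(hits, "ip")

pages_per_visit_by_cookie = len(hits) / visits_by_cookie
pages_per_visit_by_ip = len(hits) / visits_by_ip

print(visits_by_cookie, pages_per_visit_by_cookie)
print(visits_by_ip, pages_per_visit_by_ip)
```

Keyed on the cookie, the three page views form a single Visit. Keyed on the IP address, the same activity is split into two Visits, which also halves Pages per Visit.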

Page tagging tools also provide user-friendly segmentation and custom reporting options.  This allows you to quickly calculate the number of Visits from segments, such as:

  • Mobile devices
  • Various locations (down to the City level)
  • Social media referrals
  • Searches that included particular keywords

These calculations can be executed quickly in the interface.  In contrast, customizing log file reports may require reprocessing the raw log files or even reconfiguring the software itself, and in most cases such customizations are not possible at all with log file analysis.
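Conceptually, each of the segments listed above is a filter applied to Visit records. The sketch below shows that idea with a handful of made-up records; the field names (`device`, `city`, `source`, `keyword`) and the helper `count_visits_in_segment` are hypothetical and do not represent GA's data model or interface.

```python
# Hypothetical Visit records; field names are illustrative only.
VISITS = [
    {"device": "mobile",  "city": "Denver",  "source": "social", "keyword": None},
    {"device": "desktop", "city": "Denver",  "source": "search", "keyword": "air quality"},
    {"device": "mobile",  "city": "Atlanta", "source": "search", "keyword": "radon testing"},
    {"device": "desktop", "city": "Atlanta", "source": "direct", "keyword": None},
]

def count_visits_in_segment(visits, predicate):
    """Count the Visits for which predicate(visit) is true."""
    return sum(1 for visit in visits if predicate(visit))

mobile_visits = count_visits_in_segment(VISITS, lambda v: v["device"] == "mobile")
denver_visits = count_visits_in_segment(VISITS, lambda v: v["city"] == "Denver")
social_visits = count_visits_in_segment(VISITS, lambda v: v["source"] == "social")
radon_searches = count_visits_in_segment(
    VISITS, lambda v: v["keyword"] is not None and "radon" in v["keyword"]
)

print(mobile_visits, denver_visits, social_visits, radon_searches)
```

Because the records already carry these attributes, a new segment is just a new filter over data that has been collected, with no reprocessing of raw files.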

The main advantage of log file analysis is the internal control of data.  Whereas page tagging usually requires third-party hosting and processing of data, log file analysis enables organizations to process metrics without relying on outside parties.  Depending on the organization, this can be a major selling point. 

For analysis purposes, it is most important to find the tool that meets your needs and then stick with it as you compare metrics month over month and year over year, since different analytics tools calculate the same metrics differently.
