How Google Analytics Collects Data
GA works by setting various cookies in Visitors’ Web browsers. Cookies are small text files that Web servers place in Web browsers, often for the purpose of tracking internet activity. The contents of the cookie are sent back to the server each time the browser requests a Web page. Session cookies remain in a Web browser only until the browser is closed or remains inactive for a specified amount of time (“a session” or “Visit”). Persistent cookies, on the other hand, remain after a browser session ends. Some persistent cookies, including those set by GA, are set to expire after a specific amount of time (e.g., six months or two years).
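The session-versus-persistent distinction comes down to whether the cookie carries an expiry. A minimal sketch, using Python's standard `http.cookies` module and made-up cookie names (not GA's actual cookie names), shows the difference:

```python
from http.cookies import SimpleCookie

cookies = SimpleCookie()

# Session cookie: no Max-Age or Expires attribute, so the browser
# discards it when the session ends.
cookies["session_id"] = "abc123"

# Persistent cookie: Max-Age keeps it on disk until the deadline
# passes -- here roughly two years, comparable to GA's longest-lived cookie.
cookies["visitor_id"] = "xyz789"
cookies["visitor_id"]["max-age"] = 2 * 365 * 24 * 60 * 60  # ~two years, in seconds

for morsel in cookies.values():
    print(morsel.OutputString())
```

The server sends these as `Set-Cookie` headers; only the persistent one survives a browser restart.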
Neither EPA nor Google collects any personally identifiable information (PII) about Visitors to the EPA website using GA.
GA uses first-party persistent cookies, which the government classifies as Tier 2 persistent cookies. Tier 2 persistent cookies do not collect any PII and are permissible for use by federal agencies. For more information on how the Office of Management and Budget (OMB) defines persistent cookies, see OMB M-10-22, Guidance for Online Use of Web Measurement and Customization Technologies.
Unless you first opt out by blocking cookies, the Google Analytics Tracking Code (GATC) will automatically set a persistent cookie in the browser of the computer or mobile device you use to access the EPA website. Visitors can choose not to allow GA to track their Web activity by changing their browser settings; modern browsers include options to block the kinds of cookies GA sets.
Page Tagging vs. Log File Analysis
An alternative approach to collecting Web traffic metrics through page tagging is log file analysis. This method entails downloading server log files for processing in an analytics software program. It does not use page tagging or cookies.
Since server logs record all server transactions, including activity from Web crawlers and bots, software is needed to filter out non-human activity. While log file software does filter out known crawlers and bots and those that self-identify, a list must be continually maintained, making it difficult to filter all non-human activity.
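The bot-filtering problem described above can be sketched in a few lines. This is a minimal illustration with made-up log entries in the common Apache "combined" format; a real filter would need a continually updated crawler list, which is exactly the maintenance burden noted above:

```python
import re

# Hand-maintained (and deliberately tiny) list of known crawler signatures.
BOT_PATTERN = re.compile(r"googlebot|bingbot|crawler|spider", re.IGNORECASE)

# Two fabricated log lines: one human browser, one self-identifying bot.
log_lines = [
    '1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '5.6.7.8 - - [10/Oct/2023:13:55:40 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

def user_agent(line: str) -> str:
    # In the combined format, the User-Agent is the last quoted field.
    return line.rsplit('"', 2)[-2]

human_hits = [line for line in log_lines if not BOT_PATTERN.search(user_agent(line))]
print(len(human_hits))
```

Bots that do not self-identify in their User-Agent string slip past a filter like this, which is why fully excluding non-human traffic from server logs is so difficult.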
Log files may not capture all human activity either, since consecutive Visits to the same Web page can cause the page to be served from the browser’s cache, and Web servers do not typically record such transactions. With page tagging, even cached pages are recorded, since the JS is activated whenever the Web browser loads a page.
Where page tagging holds a major advantage over log file analysis, however, is in the breadth of traffic metrics that can be collected and the ad hoc customizations that are available.
Page tagging tools also provide user-friendly segmentation and custom reporting options. This allows you to quickly calculate the number of Visits from segments, such as:
- Mobile devices
- Various locations (down to the City level)
- Social media referrals
- Searches that included particular keywords
These calculations can be executed quickly in the interface. In contrast, customizing log file reports may require reprocessing the raw log files, or even custom configurations to the software itself; in many cases, such customizations are not possible at all.
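The segment counts listed above amount to simple filters over visit records. A minimal sketch, with entirely hypothetical visit data and field names (a real page-tagging tool computes these in its own interface):

```python
from collections import Counter

# Fabricated visit records for illustration; field names are assumptions.
visits = [
    {"device": "mobile", "city": "Denver", "referrer": "twitter.com", "keyword": "water quality"},
    {"device": "desktop", "city": "Boston", "referrer": "google.com", "keyword": "air permits"},
    {"device": "mobile", "city": "Denver", "referrer": "facebook.com", "keyword": "water quality"},
]

SOCIAL_DOMAINS = {"twitter.com", "facebook.com"}

mobile_visits = sum(1 for v in visits if v["device"] == "mobile")
visits_by_city = Counter(v["city"] for v in visits)          # down to the city level
social_referrals = sum(1 for v in visits if v["referrer"] in SOCIAL_DOMAINS)
keyword_visits = sum(1 for v in visits if "water" in v["keyword"])

print(mobile_visits, visits_by_city["Denver"], social_referrals, keyword_visits)
```

Reproducing even this with log file analysis would mean re-parsing the raw logs, since segments like device type and referral source must be derived from each recorded request.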
The main advantage of log file analysis is the internal control of data. Whereas page tagging usually requires third-party hosting and processing of data, log file analysis enables organizations to process metrics without relying on outside parties. Depending on the organization, this can be a major selling point.
For analysis purposes, the most important thing is to find a tool that meets your needs and stick with it as you compare metrics month over month and year over year, understanding that every analytics tool calculates metrics somewhat differently.