Google Analytics: How to Segment and Filter Robot Traffic
[We’ve updated and republished this blog, originally posted in March 2011, to reflect changes in the Google Analytics interface as well as changes to Yottaa]
Google Analytics (“GA”) is the most popular web analytics tool on the Web, largely because it is both free and excellent. Because performance monitoring systems like ours can impact GA reports, it feels right to spend a little time helping our users and the larger GA community with explicit instructions for filtering and/or creating Custom Reports to view traffic coming to your websites. The directions below are specific to GA, but the principles are easily applied to other web analytics systems such as Coremetrics Analytics or Omniture SiteCatalyst. If you’re using a system other than GA, scroll to the end of the post for more information on finding Yottaa’s bots.
First, a description of the problem: some types of traffic to your website should simply never be counted in your reports. Internal traffic (from developers and testers) is one such category. Traffic generated by search engine crawlers like Googlebot is another. Similarly, traffic from automated solutions for testing or monitoring your site, such as Yottaa Monitor, Keynote, Gomez and BrowserMob, should not appear in your metrics.
Google Analytics Filters and Custom Reports: Demystified
Before we dive into the step-by-step instructions, here’s a quick overview of the distinction between GA’s filters and Custom Reports features.
Custom Reports offer flexibility on how existing GA data is presented. Using Custom Reports will alter your view into your data, but will not change what’s actually being collected. Custom Reports apply retroactively, which is to say that when you define one, you can then view all your historical data through the lens of that Report.
Filters are more invasive than Custom Reports, in that they actually change what is collected and stored by GA. Filtering cannot be applied retroactively; it only affects data collected after the filter has been created. Some GA users feel more comfortable first testing and refining rules in Custom Reports to be sure the data looks right, then creating a filter that applies the same rules. Alternatively, you can create a duplicate profile for the same domain and apply your filter to only one of them, thus preserving collection of all “raw” data off to the side while leveraging the power of filters in your main reporting profile. (See https://www.google.com/support/analytics/bin/answer.py?answer=55494 for more detail on this approach.)
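To make the distinction concrete, here is a toy Python model of the two behaviors. It is only a sketch: the `Hit` type, the `from_bot` flag, and both function names are hypothetical and have nothing to do with GA's internals.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    day: int        # day on which the hit arrived
    from_bot: bool  # hypothetical flag marking monitoring traffic

def collect(incoming, filter_from_day=None):
    """Collection-time filter: once enabled, bot hits are never stored."""
    stored = []
    for hit in incoming:
        filter_active = filter_from_day is not None and hit.day >= filter_from_day
        if filter_active and hit.from_bot:
            continue  # dropped for good; cannot be recovered later
        stored.append(hit)
    return stored

def custom_report(stored):
    """Report-time view: hides bot hits but leaves stored data untouched."""
    return [hit for hit in stored if not hit.from_bot]

incoming = [Hit(1, False), Hit(1, True), Hit(2, False), Hit(2, True)]

# Filter created on day 2: the day-1 bot hit is already stored and stays there.
filtered_store = collect(incoming, filter_from_day=2)

# No filter: everything is stored, and the report hides bots across all days.
raw_store = collect(incoming)
report_view = custom_report(raw_store)
```

In this model the filter leaves the day-1 bot hit in storage because it arrived before the filter existed, while the report hides bot hits from every day without deleting anything.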
Ok, without further ado, here’s what to do.
Instructions for creating a custom report to hide Yottaa bot traffic
Here’s how to create a Custom Report. First, on the GA dashboard, click the “Custom Reporting” tab.
From here, click “+ New Custom Report” (visible above), then create a title (1) (below), and add Metric Groups (2). When deciding which Metric Groups to add, think about which ones you typically look at on your GA dashboard; you’ll want your report to be thorough enough that it can effectively take the place of the regular dashboard. Add as many as you wish.
For step (3), look under the “Filters – optional” heading and click “+ Add a filter”, open the “Visitors” sub-menu, and choose “Browser Version”.
This will bring up the menu shown below. Now choose “Exclude” (4), keep the default choice “Exact” (5), and type “99.0” into the open field (6). Why 99.0? This type of Yottaa bot uses real browsers to perform monitoring activities, which ensures the most accurate “real user experience” testing.
However, using real browsers makes it hard to tell hits from Yottaa bots apart from hits from real users. To solve this problem, we’ve set a fake version number for all of the real browsers we use. That number is 99.0. (Internet Explorer is currently at version 9 and Firefox at 15, so 99.0 shouldn’t cause any conflicts any time soon.) This means that when you filter by this exact version number, all of Yottaa’s browser bots will get filtered out and all other browsers will get through.
Now hit save. You’ve now created a Custom Report with all of Yottaa’s real-browser bots filtered out. Keep in mind that since this is a Custom Report, your GA account will continue to collect bot traffic, and your standard reports will still show it. To view your data with the filter applied, go back to the “Custom Reporting” tab, where this report will appear as a choice on the menu.
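If you ever need to apply the same exact-match rule to data outside GA, the logic is small enough to sketch in Python. The row layout and field names below (`browser`, `browser_version`, `visits`) and the sample numbers are hypothetical, chosen only for illustration; they are not GA's export format.

```python
# The fake version number reported by Yottaa's real-browser bots.
YOTTAA_BROWSER_VERSION = "99.0"

def exclude_yottaa_browsers(rows):
    """Keep only rows whose browser version is not exactly "99.0"."""
    return [row for row in rows if row["browser_version"] != YOTTAA_BROWSER_VERSION]

# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"browser": "Internet Explorer", "browser_version": "9.0", "visits": 120},
    {"browser": "Internet Explorer", "browser_version": "99.0", "visits": 48},
    {"browser": "Firefox", "browser_version": "15.0", "visits": 95},
    {"browser": "Firefox", "browser_version": "99.0", "visits": 48},
]

human_rows = exclude_yottaa_browsers(sample_rows)
human_visits = sum(row["visits"] for row in human_rows)
```

Because the comparison is an exact string match, versions such as “9.0” pass through untouched, which mirrors the “Exclude” plus “Exact” choice in the report filter.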
Instructions for filtering out Yottaa bot traffic
If you want to filter out bot traffic entirely (because, perhaps, you want to avoid having to view a Custom Report every time you access your GA account) then follow these steps. As previously stated, you can also create a new profile and keep your original profile intact, in case you want to easily be able to see your site’s traffic with the bot traffic included.
First click the “Admin” button on the top right side of the GA dashboard.
Next, create a new profile or choose an existing profile.
Choosing a profile will bring up the menu seen below. Click the “Filters” tab.
You will see any existing filters you’ve created (as in the image below) or none, if you’ve never used this feature. In either case, click “+ New Filter”.
This will bring up the options shown below. First, click the radio button for “Create new Filter” (1), then add a unique filter name (2), and click the radio button for “Custom filter” (3). Choosing this button will bring up some additional options.
In the additional options, “Exclude” will already be chosen. Do not change this selection. Continuing down the page, change the Filter Field to “Browser Version” (4), and type “99.0” (5) into the Filter Pattern field. Then hit save (6).
Your Google Analytics profile will now no longer log traffic from any Yottaa bots. Hooray!
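One detail worth checking: if GA interprets the Filter Pattern as a regular expression (which is how custom filter patterns are generally documented), then a plain 99.0 is a substring match in which the dot matches any character. The Python sketch below illustrates the difference between that pattern and an anchored one. Python's `re` module stands in for GA's matcher here, which is an assumption, and the stricter pattern `^99\.0$` is a suggestion rather than something from the steps above.

```python
import re

# Pattern as typed in the steps above: unanchored, "." matches any character.
LOOSE = re.compile(r"99.0")

# Stricter alternative (an assumption, not from the original steps):
# anchored at both ends, with the dot escaped.
STRICT = re.compile(r"^99\.0$")

# Hypothetical browser version strings for comparison.
versions = ["99.0", "9.0", "15.0", "199.0", "99.0.1"]

loose_matches = [v for v in versions if LOOSE.search(v)]
strict_matches = [v for v in versions if STRICT.search(v)]
```

With these sample values the loose pattern also catches “199.0” and “99.0.1”, while the anchored pattern catches only “99.0”. If your analytics tool supports anchors, the stricter form is the closer equivalent of the “Exact” match used in the Custom Report.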
What If I’m Not Using GA?
If you’re using a site analytics system other than GA, look at the User-Agent request header for the following strings. The Yottaa-specific components are the fake 99.0 version number and the “YottaaMonitor” token:
- IE 9: User-Agent – Mozilla/5.0 (compatible; MSIE 99.0; Windows NT 6.0; Trident/5.0 YottaaMonitor)
- IE 8: User-Agent – Mozilla/4.0 (compatible; MSIE 99.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E YottaaMonitor)
- Firefox 3.6: User-Agent – Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:188.8.131.52) Gecko/20120306 Firefox/99.0 YottaaMonitor
- Firefox 7: User-Agent – Mozilla/5.0 (Windows NT 6.0; rv:7.0.1) Gecko/20100101 Firefox/99.0 YottaaMonitor
- Firefox 13: User-Agent – Mozilla/5.0 (Windows NT 6.0; rv:13.0) Gecko/20100101 Firefox/99.0 YottaaMonitor
- Chrome (latest): User-Agent – Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/99.0 Safari/537.1YottaaMonitor
- HTTP agent (availability check): User-Agent – YottaaMonitor
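If your analytics system or log pipeline lets you run custom code, the strings above boil down to two checks. Here is a minimal Python sketch; the function names are hypothetical, and the regular expression is one possible way to spot the fake version number, not something Yottaa publishes.

```python
import re

# Token that appears in every User-Agent string listed above.
YOTTAA_TOKEN = "YottaaMonitor"

# Fake 99.0 version as it appears after the browser name. The lookahead
# rejects longer versions such as "99.0.1" or "99.05".
FAKE_VERSION = re.compile(r"(?:MSIE |Firefox/|Chrome/)99\.0(?![\d.])")

def is_yottaa_bot(user_agent: str) -> bool:
    """True if the User-Agent carries the YottaaMonitor token."""
    return YOTTAA_TOKEN in user_agent

def has_fake_browser_version(user_agent: str) -> bool:
    """True if a real-browser bot's fake 99.0 version is present."""
    return FAKE_VERSION.search(user_agent) is not None

ie9_bot = "Mozilla/5.0 (compatible; MSIE 99.0; Windows NT 6.0; Trident/5.0 YottaaMonitor)"
http_bot = "YottaaMonitor"
# Hypothetical real-user string for comparison.
real_user = "Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20100101 Firefox/15.0"
```

The token check is the broader of the two: it also catches the plain HTTP availability agent, which carries no browser version at all. It is a substring test, so it does not depend on what precedes the token in the string.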
Note: ROBOTS.TXT and Blocking Monitoring Bots
Finally, a note about the “robots.txt” Robots Exclusion Standard. Yottaa bots respect the rules of the road and will obey instructions found in robots.txt files. However, we strongly recommend against outright blocking of our bots, as doing so will prevent highly useful, free performance metrics from being collected. Filtering bot traffic out of your analytics tool of choice as we’ve outlined in this post is simple, and allows you to continue monitoring your site in Yottaa.com while keeping your analytics clean.
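To check whether an existing robots.txt would shut the bots out, something like the following Python sketch can help. It uses the standard library's `urllib.robotparser`. The helper name `blocks_agent` is hypothetical, and so is the assumption that the bots identify themselves to robots.txt rules as “YottaaMonitor” (the token from the User-Agent strings above).

```python
from urllib.robotparser import RobotFileParser

def blocks_agent(robots_txt: str, agent: str, url: str = "/") -> bool:
    """True if the given robots.txt text disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)

# Hypothetical robots.txt contents for illustration.
blocking_rules = "User-agent: YottaaMonitor\nDisallow: /\n"
permissive_rules = "User-agent: *\nDisallow: /private/\n"

# Assumed robots.txt token for Yottaa's bots.
AGENT = "YottaaMonitor"

blocked_everywhere = blocks_agent(blocking_rules, AGENT)
blocked_on_home = blocks_agent(permissive_rules, AGENT)
blocked_on_private = blocks_agent(permissive_rules, AGENT, "/private/page")
```

A wildcard rule applies to monitoring bots too, so a broad `Disallow` under `User-agent: *` can block them even when they are not named explicitly.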