Yottaa's Site Optimization & Web Performance Blog

Google Analytics: How to Segment and Filter Robot Traffic

Sep 25, 2012 11:23:00 AM

Posted by Alex Pinto



[We've updated and republished this blog, originally posted in March 2011, to reflect changes in the Google Analytics interface as well as changes to Yottaa]

 

Google Analytics ("GA") is the most popular web analytics tool on the Web, largely because it is both free and excellent. Because performance monitoring systems like ours can impact GA reports, it feels right to spend a little time helping our users and the larger GA community with explicit instructions for filtering and/or creating Custom Reports to view traffic coming to your websites.  The directions below are specific to GA, but the principles are easily applied to other web analytics systems such as Coremetrics Analytics or Omniture SiteCatalyst. If you're using a system other than GA, scroll to the end of the post for more information on finding Yottaa's bots.  

 

The Problem

 

First, a description of the problem: some types of traffic to your website should simply never be counted in your reports. Internal traffic (from developers and testers) is one such category. Traffic generated by search engine crawlers like Googlebot is another. Similarly, traffic from automated solutions for testing or monitoring your site, such as Yottaa Monitor, Keynote, Gomez and BrowserMob, should not appear in your metrics.

Because GA is implemented using JavaScript, it automatically ignores simple crawlers that don’t execute the JavaScript. However, there are increasingly sophisticated bots out there, including our own Yottaa Web Performance Monitoring robots, which can't win at Jeopardy just yet but do know how to do things like run JavaScript and accept cookies. These smart bots are tougher to distinguish from normal human users, and by default GA will include the traffic they generate. Hence the need to create Custom Reports or filters in GA.  To do this you simply need to teach GA how to identify the bots, by looking at their browser type or browser version.  

 

Below, we lay out explicit instructions on how to filter out Yottaa's bots, but the same principles can be applied to any other JavaScript-executing bots present in your GA reports.  To find out whether there are other bots in your GA reports, go to your Standard Reporting tab and, in the left sidebar navigation, go to Audience > Technology > Network.  Anywhere you see high numbers of visits coupled with bounce rates near 100 percent and minuscule average time spent on site, it's likely bot traffic.  
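The rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not a GA feature: the NetworkRow fields, the sample rows, and the thresholds (100 visits, 95 percent bounce rate, 5 seconds on site) are assumptions chosen for the example. You would substitute the numbers from your own Network report and tune the thresholds to your site.

```python
from dataclasses import dataclass


@dataclass
class NetworkRow:
    """One row of a Network report (field names are hypothetical)."""
    service_provider: str
    visits: int
    bounce_rate: float        # percent, 0-100
    avg_time_on_site: float   # seconds


def looks_like_bot(row, min_visits=100, min_bounce=95.0, max_seconds=5.0):
    """Flag rows with many visits, near-100% bounce rate and almost no time on site.

    The default thresholds are illustrative assumptions, not values from GA.
    """
    return (
        row.visits >= min_visits
        and row.bounce_rate >= min_bounce
        and row.avg_time_on_site <= max_seconds
    )


# Made-up sample data standing in for an exported report.
rows = [
    NetworkRow("residential-isp", 4200, 41.5, 187.0),
    NetworkRow("example-hosting-co", 960, 99.8, 0.4),
    NetworkRow("small-office-network", 12, 100.0, 0.0),  # too few visits to judge
]

suspects = [r.service_provider for r in rows if looks_like_bot(r)]
print(suspects)
```

A row that is flagged this way is a candidate for closer inspection, not proof of bot traffic; a landing page with a very high bounce rate can look similar.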

 

Google Analytics Filters and Custom Reports: Demystified

 

Before we dive into the step-by-step instructions, here's a quick overview of the distinction between GA's filters and Custom Reports features.

 

Custom Reports offer flexibility on how existing GA data is presented. Using Custom Reports will alter your view into your data, but will not change what's actually being collected. Custom Reports apply retroactively, which is to say that when you define one, you can then view all your historical data through the lens of that Report. 

 

Filters are more invasive than Custom Reports, in that they actually change what is collected and stored by GA. Filtering cannot be applied retroactively; it only affects data collected after the filter has been created. Some GA users feel more comfortable first testing and refining rules in Custom Reports to be sure the data looks right, then creating a filter that applies the same rules. Alternatively, you can create a duplicate profile for the same domain and apply your filter to only one of them, which preserves collection of all "raw" data off to the side while leveraging the power of filters in your main reporting profile. (See https://www.google.com/support/analytics/bin/answer.py?answer=55494 for more detail on this approach.)

Ok, without further ado, here's what to do.

 

Instructions for creating a custom report to hide Yottaa bot traffic

Here's how to create a Custom Report.  First, on the GA dashboard, click the "Custom Reporting" tab.


 

From here, click "+ New Custom Report" (visible above), then create a title (1) (below), and add Metric Groups (2).  When deciding which Metric Groups to add, think about which ones you typically look at on your GA dashboard; you'll want your report to be thorough enough that it can effectively take the place of the regular dashboard. Add as many as you wish. 

 


 

For step (3), look under the "Filters - optional" heading and click "+ Add a filter", open the "Visitors" sub-menu, and choose "Browser Version". 

 



This will bring up the menu shown below.  Now choose "Exclude" (4), keep the default choice "Exact" (5), and type "99.0" into the open field (6). Why 99.0?  This type of Yottaa bot uses real browsers to perform monitoring activities, which ensures the most accurate "real user experience" testing.

 

However, using real browsers makes it hard to tell hits from Yottaa bots apart from hits by real users.  To solve this problem, we've set a fake version number for all of the real browsers we use. That number is 99.0. (Internet Explorer is currently at version 9 and Firefox at 15, so 99.0 shouldn't cause any conflicts any time soon.)  This means that when you filter by this exact version number, all of Yottaa's browser bots will get filtered out and all other browsers will get through. 
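To make the exact-match idea concrete, here is a minimal Python sketch of what an "Exclude / Exact / 99.0" rule does to a list of visits. The dictionary keys and the sample visits are hypothetical and only stand in for GA's Browser and Browser Version dimensions; GA applies this logic for you, so this is purely an illustration.

```python
# The fake version number reported by Yottaa's real-browser bots.
YOTTAA_BROWSER_VERSION = "99.0"


def exclude_yottaa_browser_bots(visits):
    """Keep only visits whose browser version is not exactly "99.0"."""
    return [v for v in visits if v["browser_version"] != YOTTAA_BROWSER_VERSION]


# Made-up visits; key names are assumptions for this example.
visits = [
    {"browser": "Internet Explorer", "browser_version": "9.0"},
    {"browser": "Firefox", "browser_version": "99.0"},   # Yottaa bot
    {"browser": "Firefox", "browser_version": "15.0"},
    {"browser": "Chrome", "browser_version": "99.0"},    # Yottaa bot
]

kept = exclude_yottaa_browser_bots(visits)
print([(v["browser"], v["browser_version"]) for v in kept])
```

Because the comparison is exact, a real browser reporting "9.0" or "15.0" is never caught by the rule, whichever browser it is.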

 


Now hit save.  You've created a Custom Report with all of Yottaa's real-browser bots filtered out. Keep in mind that, since this is a Custom Report, your GA account will continue to collect bot traffic and show it in your standard reports.  To view your data through the filter you've just created, go back to the "Custom Reporting" tab, where this report will appear as a choice on the menu.  

 

Instructions for filtering out Yottaa bot traffic

 

If you want to filter out bot traffic entirely (because, perhaps, you want to avoid having to view a Custom Report every time you access your GA account) then follow these steps.  As previously stated, you can also create a new profile and keep your original profile intact, in case you want to easily be able to see your site's traffic with the bot traffic included. 

First click the "Admin" button on the top right side of the GA dashboard. 



 

Next, create a new profile or choose an existing profile. 

 

Choosing a profile will bring up the menu seen below.  Click the "Filters" tab.


 

You will see any existing filters you've created (as in the image below), or none if you've never used this feature.  In either case, click "+ New Filter".

 


 

This will bring up the options shown below.  First, click the radio button for "Create new Filter" (1), then add a unique filter name (2), and click the radio button for "Custom filter" (3). Choosing this button will bring up some additional options.

 

In the additional options, "Exclude" will already be chosen. Do not change this selection. Continuing down the page, change the Filter Field to "Browser Version" (4), and type "99.0" (5) into the Filter Pattern field.  Then hit save (6).  
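One detail worth checking when you type the pattern: to our understanding, GA treats the Filter Pattern field of a custom filter as a regular expression rather than a literal string. If that holds for your account, a bare "99.0" behaves as an unanchored pattern in which "." matches any character. The Python sketch below shows the difference between the loose pattern and an anchored, escaped one. Python's re module is used only for illustration and GA's regex dialect may differ in details; the stricter pattern is a suggestion, not part of the original instructions.

```python
import re

LOOSE_PATTERN = "99.0"        # as typed in the instructions
STRICT_PATTERN = r"^99\.0$"   # anchored, with the dot escaped (suggested variant)

# Hypothetical Browser Version values; only "99.0" is a Yottaa bot.
versions = ["99.0", "199.05", "9.0", "15.0"]

for version in versions:
    loose = bool(re.search(LOOSE_PATTERN, version))
    strict = bool(re.search(STRICT_PATTERN, version))
    print(f"{version!r}: loose={loose} strict={strict}")
```

With real-world browser versions as low as they are today, the loose pattern is unlikely to exclude genuine visitors, but the anchored form makes the intent explicit.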


Your Google Analytics profile will no longer log traffic from any Yottaa bots. Hooray!

 

What If I'm Not Using GA?

 

If you're using a site analytics system other than GA, look at the User-Agent in the Request Header for the following strings. The Yottaa-specific components are the version number 99.0 and the YottaaMonitor token:

 

  • IE 9: User-Agent - Mozilla/5.0 (compatible; MSIE 99.0; Windows NT 6.0; Trident/5.0 YottaaMonitor)
  • IE 8: User-Agent - Mozilla/4.0 (compatible; MSIE 99.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E YottaaMonitor)
  • Firefox 3.6: User-Agent - Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.28) Gecko/20120306 Firefox/99.0 YottaaMonitor
  • Firefox 7: User-Agent - Mozilla/5.0 (Windows NT 6.0; rv:7.0.1) Gecko/20100101 Firefox/99.0 YottaaMonitor
  • Firefox 13: User-Agent - Mozilla/5.0 (Windows NT 6.0; rv:13.0) Gecko/20100101 Firefox/99.0 YottaaMonitor
  • Chrome (latest): User-Agent - Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/99.0 Safari/537.1YottaaMonitor
  • HTTP agent (availability check): User-Agent - YottaaMonitor
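For systems that let you write your own exclusion rule, a check along these lines may be enough. It is a minimal sketch that relies only on the YottaaMonitor token present in every User-Agent string listed above; the function name is hypothetical, and how you hook it into your analytics or log-processing pipeline depends on the tool you use.

```python
YOTTAA_TOKEN = "YottaaMonitor"


def is_yottaa_bot(user_agent: str) -> bool:
    """Return True if the User-Agent header carries the YottaaMonitor token."""
    return YOTTAA_TOKEN in user_agent


# Two strings from the list above, one ordinary browser, and the bare HTTP agent.
samples = [
    "Mozilla/5.0 (compatible; MSIE 99.0; Windows NT 6.0; Trident/5.0 YottaaMonitor)",
    "Mozilla/5.0 (Windows NT 6.0; rv:13.0) Gecko/20100101 Firefox/99.0 YottaaMonitor",
    "Mozilla/5.0 (Windows NT 6.0; rv:15.0) Gecko/20100101 Firefox/15.0",
    "YottaaMonitor",
]

flags = [is_yottaa_bot(ua) for ua in samples]
print(flags)
```

A substring test is used instead of matching on the 99.0 version number because the availability-check agent sends only the token and no browser version at all.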

 

Note: ROBOTS.TXT and Blocking Monitoring Bots

 

Finally, a note about the "robots.txt" Robots Exclusion Standard. Yottaa bots respect the rules of the road and will obey instructions found in robots.txt files. However, we strongly recommend against outright blocking of our bots, as doing so will prevent highly useful, free performance metrics from being collected. Filtering bot traffic out of your analytics tool of choice as we've outlined in this post is simple, and allows you to continue monitoring your site in Yottaa.com while keeping your analytics clean.
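If you want to confirm that your existing robots.txt does not block the monitoring bots by accident, a quick check with Python's standard urllib.robotparser module might look like the sketch below. The agent name "YottaaMonitor" is taken from the User-Agent strings above; that it is also the name the bots match against robots.txt rules is an assumption. The two robots.txt bodies are made-up examples.

```python
from urllib.robotparser import RobotFileParser


def blocks_agent(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if the given robots.txt text disallows `path` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


# Hypothetical robots.txt that blocks the monitoring bot outright (not recommended).
blocking = """\
User-agent: YottaaMonitor
Disallow: /
"""

# Hypothetical robots.txt that only hides an admin area from all crawlers.
permissive = """\
User-agent: *
Disallow: /admin/
"""

print(blocks_agent(blocking, "YottaaMonitor"))
print(blocks_agent(permissive, "YottaaMonitor"))
```

If the check reports that your home page is blocked for the monitoring agent, removing that rule and filtering in your analytics tool instead keeps both your monitoring data and your reports intact.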

 

Further questions?  Visit our support site or email our support team.

 

Topics: Application Optimization
