
The Truth About Bot Traffic (And How to Prevent It from Killing Performance)

Google Analytics is not telling the whole story about your website’s traffic. Site owners are often shocked to discover that bots and scrapers can account for 50 percent or more of their server hits. Most of those hits go unreported by typical web analytics services. Google Analytics tells you about your human visitors, who are the most valuable. But bots consume server CPU just like any other visitor. Your processing capacity can be eaten up by these extra hits instead of serving real user requests.

Typical traffic numbers for small- to medium-sized websites.

Imagine you own an eCommerce site, and dozens of customers are trying to check out their shopping carts at 4:00 pm. Suppose your server CPU is busy serving thousands of bot hits at that moment, with bot requests taking up, say, 80% of your server’s processing capacity. Then your human visitors’ experience will suffer. Some of them might bounce away and never return, buying the product elsewhere. And it’s all because of bots!

Why Don’t I See Bots in My Web Analytics?

Like most web analytics services, Google Analytics relies on a small JavaScript snippet running inside your web page to track your site visitors. The snippet collects data on visitor behavior and sends it back to the Google Analytics backend, which crunches the numbers to produce your reports.

The key requirement for web analytics to work is that the client-side browser executes the JavaScript snippet. The browsers human visitors use are built to run JavaScript automatically. They also render images, apply CSS, and perform the many other tasks that make a web page look and behave as intended. Bots, on the other hand, typically crawl your site without running a real browser. They don’t need to execute JavaScript or render images: they can get all the information they need from the raw HTML documents. Since they never execute the Google Analytics script, the Google Analytics service never learns about this traffic, and the bots fly under the radar.
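To get a rough sense of the traffic your analytics can’t see, you can scan your raw server log yourself. The Python sketch below makes a few assumptions:

- Logs use the Apache/Nginx “combined” format, where the User-Agent is the last quoted field.
- `BOT_MARKERS` is a short, hypothetical list of marker strings.
- The sample log lines are made up.

It only catches bots that identify themselves in their User-Agent. Bots posing as ordinary browsers will slip past it, so treat the result as a lower bound.

```python
import re

# Hypothetical marker list: substrings that self-identifying crawlers often
# include in their User-Agent. Real bot detection needs a far longer list.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

# In the "combined" log format the User-Agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def bot_share(log_lines):
    """Return the fraction of requests whose User-Agent looks like a bot."""
    total = bots = 0
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that don't end with a quoted User-Agent
        total += 1
        user_agent = match.group(1).lower()
        if any(marker in user_agent for marker in BOT_MARKERS):
            bots += 1
    return bots / total if total else 0.0

# Made-up sample lines: one self-identified crawler, one ordinary browser.
sample = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:13:55:40 +0000] "GET /cart HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/118.0"',
]
print(bot_share(sample))  # 0.5
```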

The Good, The Bad, The?

It would be one thing if blocking all bots were a viable solution. But as you’re surely aware, the “good” bots from Google, Bing, and other companies drive your site’s SEO value. Smart site owners would sooner roll out the red carpet for these bots than block them. So you may want to let good bots crawl your site while saving your server processing capacity for your human visitors. In that case, the solution for dealing with bot traffic must lie somewhere between “carte blanche” and total blocking.

There are various methods for dealing with bots that fall into this middle ground. IP blocking keeps individual bots that you have identified as troublesome off your site. Throttling deals with bots in general (good, bad, and everything in between). It limits the number of requests any one client can make to your site in a given period.

 

How IP Blocking Can Help Your Site’s Safety and Performance

Some bots are bad. They might be scraping your site’s content to republish it (illegally) elsewhere on the web, posting spam comments on your blog, or showing advertisements to some of your visitors.

To deal with pernicious bots, there are tools that let you block them. First, you must find out which bots are causing trouble. There are a number of ways to tell whether your site is getting an unusually high number of hits from a particular bot. Your hosting provider, firewall provider, or other backend service provider may be able to provide that information. There’s also your site’s raw server log. Logs are usually so massive, though, that you need a tool like Deep Log Analyzer or AWStats to read them effectively.
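For a quick first look at the raw log before reaching for a full analyzer, a few lines of Python can tally requests per client IP. This is a minimal sketch assuming the combined (or common) log format, where the client IP is the first field. The function name `top_clients`, the file path in the comment, and the sample lines are all hypothetical.

```python
from collections import Counter

def top_clients(log_lines, n=5):
    """Return the n client IPs with the most requests, busiest first."""
    counts = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if fields:  # ignore blank lines
            counts[fields[0]] += 1  # client IP is the first field
    return counts.most_common(n)

# With a real log you would pass an open file, e.g.:
#   with open("access.log") as log:   # hypothetical path
#       print(top_clients(log))
sample = [
    '198.51.100.23 - - [10/Oct/2023:13:55:36 +0000] "GET /a HTTP/1.1" 200 512 "-" "ScraperX/1.0"',
    '198.51.100.23 - - [10/Oct/2023:13:55:37 +0000] "GET /b HTTP/1.1" 200 512 "-" "ScraperX/1.0"',
    '203.0.113.7 - - [10/Oct/2023:13:55:40 +0000] "GET /cart HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]
print(top_clients(sample))  # [('198.51.100.23', 2), ('203.0.113.7', 1)]
```

An IP near the top of this list that sends far more requests than any human plausibly would is a candidate for a closer look.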


Once you’ve identified one or more IP addresses to block, sign up for a firewall service (such as Yottaa’s) or use your existing one. Then enter as many IP addresses as you wish to block.
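Conceptually, an IP blocklist is a membership check run before a request is served. The sketch below uses Python’s standard `ipaddress` module to show that check for single addresses and whole ranges. The `BLOCKLIST` entries and `is_blocked` are illustrative assumptions. This is not how any particular firewall service, including Yottaa’s, implements blocking.

```python
import ipaddress

# Hypothetical blocklist: one address and one /24 range, both taken from the
# documentation-only ranges reserved by RFC 5737.
BLOCKLIST = [ipaddress.ip_network(entry) for entry in ("198.51.100.23", "192.0.2.0/24")]

def is_blocked(client_ip):
    """Return True if client_ip falls inside any blocklisted address or range."""
    address = ipaddress.ip_address(client_ip)
    # A single address parses as a one-host network (/32), so one check covers both.
    return any(address in network for network in BLOCKLIST)

print(is_blocked("198.51.100.23"))  # True
print(is_blocked("192.0.2.77"))     # True  (inside 192.0.2.0/24)
print(is_blocked("203.0.113.7"))    # False
```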

How Throttling Can Help Your Site’s Performance

Bots don’t have to be malicious to cause problems, however. If neutral or even friendly bots hit your site too often, they will slow down your site’s backend performance and cause more harm than good. If your prized human visitors are having a bad experience on your site, no bot is worth its weight in gold.

Throttling limits the number of times any one client can hit your site. You set the maximum number of requests allowed in a given time period. Once a bot or any other client reaches that number, its further requests are rejected. The idea is to set your throttling limit high enough that friendly bots have room to roam your pages. It should also be low enough that bots hitting your site excessively are cut off before they can seriously impact performance.
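One simple way to implement throttling is a fixed-window counter. It counts each client’s requests in the current window and rejects anything past the limit until the window resets. The Python sketch below is illustrative only. The class name, the 60-second default window, and keying clients by IP address are assumptions. A production setup would usually rely on the built-in rate limiting of your firewall, CDN, or web server rather than application code.

```python
import time

class FixedWindowThrottle:
    """Allow at most `limit` requests per client in each `window`-second window."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock   # injectable so the example below is deterministic
        self.state = {}      # client id -> (window start time, requests in window)

    def allow(self, client_id):
        now = self.clock()
        start, count = self.state.get(client_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # previous window expired: start a fresh one
        allowed = count < self.limit
        if allowed:
            count += 1
        self.state[client_id] = (start, count)
        return allowed  # False means reject, e.g. with HTTP 429 Too Many Requests

fake_now = [0.0]
throttle = FixedWindowThrottle(limit=3, clock=lambda: fake_now[0])
print([throttle.allow("198.51.100.23") for _ in range(5)])  # [True, True, True, False, False]
print(throttle.allow("203.0.113.7"))    # True: each client has its own counter
fake_now[0] = 61.0
print(throttle.allow("198.51.100.23"))  # True: a new window has started
```

A fixed window is easy to reason about, but it can let a client send up to twice the limit across a window boundary. This sketch also never evicts old clients from `state`, which a long-running server would need to do.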


To find your ideal throttle limit, examine your traffic patterns and develop an idea of how many requests per minute from each client are normal for your site. A rule of thumb for setting the throttle figure might go something like this:

  • Find the peak number of hits your server receives per day on its busiest days (call this number “A”)
  • Find the typical number of visitors to your site per day (number “B”)
  • Divide A by B (call the result “C”)
  • Convert C to a per-minute figure by dividing it by 1440, the number of minutes in a day (call the result “D”)
  • Multiply D by 10 (number “E”)

“E” is the per-minute throttle limit to set for each client. It should let all normal activity on your site go on undisturbed, cutting off only a totally wayward bot that is threatening your site’s performance.
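The arithmetic in steps A through E can be written as a small function. The traffic figures in the example are hypothetical. Rounding the result up to a whole number of requests is an added assumption, since throttle limits are usually set as integers.

```python
import math

MINUTES_PER_DAY = 1440

def throttle_limit(peak_daily_hits, typical_daily_visitors, multiplier=10):
    """Rule-of-thumb per-client, per-minute throttle limit (steps A through E)."""
    c = peak_daily_hits / typical_daily_visitors  # C = A / B
    d = c / MINUTES_PER_DAY                        # D = C expressed per minute
    e = d * multiplier                             # E = 10 * D
    # Assumption: round up so the limit is a whole number of requests.
    return math.ceil(e)

# Hypothetical site: 2,880,000 hits on its busiest day, 1,000 visitors on a typical day.
# C = 2,880 and D = 2, so E = 20 requests per minute per client.
print(throttle_limit(2_880_000, 1_000))  # 20
```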

(If you’re a Yottaa customer, you can go to the “Optimizer Overview” page under the Optimizer tab on the Yottaa Dashboard to see a graph of your site’s requests. It includes all hits to your server, including bot traffic invisible to Google Analytics.)

How Much Bot Traffic Do YOU Have?
Take a look at your site traffic and see if bots are in danger of impacting your site’s performance.  Don’t let bot traffic drag you down!

Photo courtesy of extranoise on Flickr

