The Truth About Bot Traffic (And How to Prevent it from Killing Performance)
Typical traffic numbers for small- to medium-sized web sites.
Imagine you own an eCommerce site, and dozens of customers are trying to check out their shopping carts at 4:00 pm. If your server CPU is busy serving thousands of hits from bots at that moment – with bot requests taking up, say, 80% of your server processing capacity – then the human visitors’ experiences will suffer. Some of them might bounce away and never return, finding the product elsewhere. And it’s all because of bots!
Why Don’t I See Bots From Web Analytics?
The Good, The Bad, The…
It would be one thing if blocking all bots were a viable solution. But as you’re surely aware, the “good” bots from sites like Google, Bing, and other companies drive your site’s SEO value. Smart site owners would sooner roll out a red carpet and usher these bots onto their pages than block them. So if you want to allow good bots to crawl your site while saving your server processing capacity for your human visitors, then the solution for dealing with bot traffic must lie somewhere between “carte blanche” and total blocking.
There are various methods for dealing with bots that fall into this middle space. IP blocking is a way to deal with individual bots that you have identified as troublesome by keeping them off your site. Throttling deals with bots generally (good, bad, and everywhere between) by placing a limit on the number of times one can hit your site.
How IP Blocking Can Help Your Site’s Safety and Performance
Some bots are bad. They might be scraping your site’s content to republish it (illegally) elsewhere on the web, posting spam comments on your blog, or showing advertisements to some of your visitors.
To deal with pernicious bots, there are tools that allow you to block them. First, you must find out which bots are causing trouble. There are a number of ways to find out if your site is getting an unusually high number of hits from a certain bot: your hosting provider, firewall provider, or other backend service provider can provide that information. There’s also the raw server log for your site, but logs are usually so massive that you need a tool like Deep Log Analyzer or AWStats to read them effectively.
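As a rough illustration of what such log tools do under the hood, here is a minimal Python sketch that counts requests per client address. It assumes the log follows the common Apache/Nginx access log layout, in which the client address is the first whitespace-separated field on each line. The sample lines, the addresses (taken from ranges reserved for documentation), and the function name `top_clients` are all invented for this example.

```python
from collections import Counter


def top_clients(log_lines, n=5):
    """Return the n clients with the most requests as (address, hits) pairs."""
    hits = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # ignore blank lines
            # Assumption: the client address is the first field on each line.
            hits[fields[0]] += 1
    return hits.most_common(n)


# Made-up sample lines; in practice you would iterate over your real log file.
sample_log = [
    '203.0.113.7 - - [10/Oct/2013:16:00:01 +0000] "GET /cart HTTP/1.1" 200 512',
    '203.0.113.7 - - [10/Oct/2013:16:00:01 +0000] "GET /products HTTP/1.1" 200 2048',
    '198.51.100.23 - - [10/Oct/2013:16:00:02 +0000] "GET /checkout HTTP/1.1" 200 1024',
    '203.0.113.7 - - [10/Oct/2013:16:00:02 +0000] "GET /products?page=2 HTTP/1.1" 200 2048',
]

for address, count in top_clients(sample_log):
    print(address, count)
```

A client that sits far above the rest of the list is a candidate for closer inspection, though a high count alone does not tell you whether it is a good bot, a bad bot, or a busy office behind a shared address.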
Once you’ve identified one or more IP addresses to block, enlist a firewall service (Yottaa’s is pictured above) or use your existing one. Enter as many IP addresses as you wish to block.
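A firewall service handles the blocking for you, but the check it performs is conceptually simple. The sketch below uses Python's standard `ipaddress` module to test whether a client falls inside a blocklist. The blocklist entries and the function name `is_blocked` are hypothetical, and this is only one way such a check might look, not a description of how any particular firewall works.

```python
import ipaddress

# Hypothetical blocklist: one single address and one whole range.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.7/32"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip):
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


print(is_blocked("203.0.113.7"))   # listed individually
print(is_blocked("192.0.2.10"))    # not on the blocklist
```

Storing entries as networks rather than plain strings lets the same list hold both individual addresses and whole ranges.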
How Throttling Can Help Your Site’s Performance
Bots don’t have to be malicious to cause problems, however. If neutral or even friendly bots hit your site too many times they will slow down your site’s backend performance, causing more harm than good. If your prized human visitors are having a bad experience on your site, no bot is worth its weight in gold.
Throttling limits the number of times any one client can hit your site. You set the maximum number of requests allowed in a given time period, and if a bot or any other type of client hits that number, it will be rejected. The idea is to set your throttling limit high enough that friendly bots have room to roam your pages, but low enough that bots hitting your site excessively will be cut off before they can seriously impact performance.
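To make the mechanism concrete, here is a minimal Python sketch of a per-client throttle. It uses a fixed-window counter, which is one simple way to enforce "a maximum number of requests in a given time period"; real firewalls and web servers may use other schemes. The class name `FixedWindowThrottle` and its parameters are invented for this example.

```python
import time


class FixedWindowThrottle:
    """Allow each client at most max_requests per window of window_seconds."""

    def __init__(self, max_requests, window_seconds=60, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the class can be tested without waiting
        self._windows = {}      # client id -> (window start time, requests counted)

    def allow(self, client_id):
        """Record a request and return True if it is within the limit."""
        now = self.clock()
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0   # the old window expired, so start a new one
        if count >= self.max_requests:
            return False            # over the limit: reject this request
        self._windows[client_id] = (start, count + 1)
        return True


# Example: a limit of 3 requests per minute, hit 5 times in a row by one client.
throttle = FixedWindowThrottle(max_requests=3)
print([throttle.allow("203.0.113.7") for _ in range(5)])
```

Each client is counted separately, so one excessive bot being rejected does not affect anyone else's requests.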
To find your ideal throttle limit, examine your traffic patterns. Develop an idea of how many requests per minute from each client are normal for your site. A rule of thumb for setting the throttle figure might go something like this:
- Find out the peak number of hits your server receives per day on its busiest days (let’s call this number “A”)
- Find out the typical number of visitors to your site per day (number “B”)
- Divide A by B (call the result “C”)
- Convert C to a “per minute” figure by dividing by 1440 (call the result “D”)
- Multiply D by 10 (number “E”)
“E” would be the threshold to set as the throttle limit on a per-minute basis. It ensures that all activity on your site will go on undisturbed except for that of a totally wayward bot that is threatening your site’s performance.
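The rule of thumb above can be written out as a short Python function. The function name `throttle_limit` and the input figures are made up purely to show the arithmetic; they are not measurements from any real site.

```python
MINUTES_PER_DAY = 1440


def throttle_limit(peak_hits_per_day, visitors_per_day, safety_factor=10):
    """Apply the rule of thumb: E = (A / B) / 1440 * 10."""
    if visitors_per_day <= 0:
        raise ValueError("visitors_per_day must be positive")
    hits_per_visitor = peak_hits_per_day / visitors_per_day   # C = A / B
    hits_per_minute = hits_per_visitor / MINUTES_PER_DAY      # D = C / 1440
    return hits_per_minute * safety_factor                    # E = D * 10


# Illustrative inputs only: A = 1,440,000 peak hits per day, B = 1,000 visitors per day.
print(throttle_limit(1_440_000, 1_000))  # 10.0 requests per minute per client
```

With these inputs, C is 1,440 hits per visitor per day, D is 1 per minute, and E is 10 per minute. Since the result depends heavily on how many hits each visitor generates, it is worth comparing it against the per-client request rates you actually observe before applying it.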
(If you’re a Yottaa customer, you can go to the “Optimizer Overview” page under the Optimizer tab on the Yottaa Dashboard, where you’ll see a graph of your site’s requests. This includes all hits to your server, including bot traffic invisible to Google Analytics.)
How Much Bot Traffic Do YOU Have?
Take a look at your site traffic and see if bots are in danger of impacting your site’s performance. Don’t let bot traffic drag you down!
Photo courtesy of extranoise on Flickr.