Seldom will you hear people involved with
managing a Web site say they have too much traffic. More is always better,
right? More traffic leads to more impressive traffic reports, which lead to more
impressions, which lead to higher ad rates, which lead to everybody being happy.
Nearly all of the time.
It was a Friday afternoon when the Web server
died. A quiet Friday. Too quiet, as it turned out. I wasn’t looking for
trouble. Trouble found me.
The Milwaukee-area news and information Web
portal I manage started seeing slow response times. I began receiving e-mails
from people citing the site’s sluggish performance.
You always ignore the first one or two of those
e-mails due to the variable nature of people’s Internet connections; you wait
for a pattern to emerge.
It didn’t take long, as it happened, to confirm
why those questions were coming in.
I began noticing the same poor performance over
our internal network connection. It rapidly got a lot worse and pages were
taking up to two minutes to load.
Plot thickens
But not all pages. I quickly tried another test
site and discovered it was working just fine. That site happened to use only
simple .htm pages. I quickly came to the realization that pages with the .asp
extension were the culprit.
Investigating the server, I found little hard drive
activity but very high processor utilization. A reboot cleared the problem
temporarily — for about a minute. Then it slugged down again. And again. I
tried shutting down the .asp service and restarting. Still nothing.
If you’ve ever had a Web site that was
effectively down, you know how it feels in the pit of your stomach. It’s that
same feeling I imagine you get as a publisher if the paper is going to be late.
It’s the feeling that comes after realizing you’ve tried every trick in the
book to fix the problem without success.
The dread and the drone
Next comes the dread of calling some Microsoft
tech support line only to hear some drone on the other end: “OK, first we’re
going to try rebooting the server. OK, have you stopped and restarted the
service? OK, have you made any changes to your settings lately that you can
remember? OK, I think we’re going to have to go ahead and reinstall the OS.”
Then, triage. Changing pages to .htm from .asp
stripped out some functionality, but people could still get their news online. This
went on for hours, and the idea of leaving work on Friday evening at anything
resembling a reasonable hour was fast becoming a memory.
It ends, but why?
Then, nearly as abruptly as it started, it was
over. The .asp functionality started to reappear on pages I was monitoring,
improving minute by minute until performance was totally back to normal.
I’m ambivalent about the miracle cure, however.
The only thing worse than a major problem is a major problem that magically
cures itself and then stays cured. Ever have a car that would only act up for
you and never for the mechanic? From that point on you might be willing to drive
around town, but you’re sure not going to take it on a long trip!
About two weeks later I was running the monthly Web traffic reports and
immediately noticed an enormous spike in traffic on one particular date: Feb.
21. It didn’t take long for my mind to begin screaming “That’s it! That
was the day!”
The culprit uncovered
The report soon revealed that the entire spike
was centered around one page and coming from one referrer: drudgereport.com.
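For readers who want to run the same check against their own logs, here is a minimal sketch of a referrer tally. It assumes the log has already been parsed into (date, page, referrer) rows; the function name, the page paths and the sample rows are made up for illustration and are not taken from our actual traffic reports.

```python
from collections import Counter

def top_referrers(hits, date, limit=3):
    """Count hits per referrer for one date, most common first."""
    counts = Counter(referrer for day, _page, referrer in hits if day == date)
    return counts.most_common(limit)

# Made-up sample rows, purely for illustration.
sample_hits = [
    ("02-20", "/news/story.asp", "(direct)"),
    ("02-21", "/news/story.asp", "drudgereport.com"),
    ("02-21", "/news/story.asp", "drudgereport.com"),
    ("02-21", "/news/story.asp", "drudgereport.com"),
    ("02-21", "/index.asp", "(direct)"),
]

print(top_referrers(sample_hits, "02-21"))
```

A single referrer dominating the count for one date is the kind of pattern that pointed to the culprit here.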
If you’re in the news business you’re
probably familiar with the site. It’s a news and compilation site that samples
headlines deemed of interest to one particular person, Matt Drudge, who runs the
site as a one-man operation and receives enormous amounts of traffic (a lot more
detail on how he runs his site can be found in a recent issue of Business 2.0
located at http://www.business2.com/articles/mag/0,1640,47762,00.html).
Having a news link go live on the site can spike
traffic upwards in an instant.
The Drudge Report maintains archives and it was a
simple task to verify that a story from our site had been linked on Feb. 21.
The volume of requests for that page had exceeded
the available process threads for the ASP service, and it took minutes to clear
enough threads for a new request to start. In other words, a logjam.
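To see why a logjam like this builds so quickly, consider a rough sketch of a fixed pool of worker threads. Everything in it is an assumption chosen for illustration: the 10 workers, the half-second per page and the arrival rates are not the real ASP settings or our real traffic figures, and the model ignores everything except waiting for a free thread.

```python
import heapq

def simulate_waits(arrival_times, workers, service_time):
    """Return how long each request waits before a worker picks it up."""
    # free_at holds the time at which each worker next becomes available.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    waits = []
    for arrival in arrival_times:
        earliest = heapq.heappop(free_at)   # soonest-available worker
        start = max(arrival, earliest)      # cannot start before arriving
        waits.append(start - arrival)
        heapq.heappush(free_at, start + service_time)
    return waits

# Hypothetical numbers: 10 workers at 0.5 s per page can serve 20 pages/second.
normal = [i / 5 for i in range(50)]      # 5 requests/second
flood = [i / 100 for i in range(1000)]   # 100 requests/second for 10 seconds

print("normal load, worst wait:", max(simulate_waits(normal, 10, 0.5)))
print("flood, last request waits:", simulate_waits(flood, 10, 0.5)[-1])
```

Under the normal load nobody waits at all. Under the flood, requests arrive five times faster than the pool can clear them, so each new visitor waits longer than the one before, and the backlog keeps draining long after the surge itself has ended.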
News links from The Washington Post, The
New York Times and other similarly large papers are posted daily on the Drudge
Report. These cause few problems for the originating newspapers because they have
the infrastructure necessary to deal with the flood of subsequent traffic.
But sites with less technological horsepower can
indeed run into problems. The good news is that poor site performance occurs
much less often than it used to.
It was only four or five years ago that Web sites
such as Amazon.com and eBay were crashing on almost a daily basis.
Site reliability has improved dramatically,
thanks to clustering and massively redundant arrays of servers. Most sites today
boast uptime statistics that compare favorably to gas or electric utilities.
However, as site managers we all deal with the
budgets our masters (can you tell I’ve watched “Ronin” a few too many
times?) have assigned us. Web architecture is a lot like building a race car:
Going fast costs money. How fast do you want to go? It’s not technically
difficult to build a site that can handle a Drudge-like flood, but for most
organizations, a site with that much processing power would be akin to smashing
a mosquito with a sledgehammer.
Fortunately, most news sites build traffic in a
fairly linear and predictable fashion. That allows publishers to budget wisely
as they expand bandwidth and server capacity.
But when you’re Drudged, look out. Linear and
predictable fly out the window, as I found out the hard way on a quiet Friday
afternoon.
Hays Goodman is the webmaster for Newspapers
& Technology and GMToday, a Milwaukee-area portal. He has been involved in
professional Internet development for five years and welcomes your comments,
feedback and suggestions for future Tips & Tricks columns. Write to him at webmaster@conleynet.com
and include your contact information.