If you haven’t yet, you should sign up for a Pingdom account. I don’t care how good your Nagios setup is, just remember: “who watches the watchers?” I can’t count the number of times that Pingdom has noticed and alerted on our downtime first, BEFORE the monitoring system.
Yep, it’s $10 a month, but it’s dead-simple to set up and provides an automated and much-needed “down for everyone or just me?” sanity check on your servers. That’s not an affiliate link above, by the way. I don’t care whether or not you use Pingdom, but I’ve found it to be a very helpful tool.
OK, so I promised increased customer satisfaction. What gives? Well, Pingdom has another trick up its sleeve: hosted status pages, which easily justify the price of the service.
With the addition of a DNS entry, suddenly you have status.myapp.com available for public consumption. As I’ve written about before, your customers don’t care about your uptime… until you’re down. Suddenly, you’ll find that they care a lot about what the %#@$ is going on!
Downtime Visibility is the name of the game. If your app goes down and your customers don’t know why, or when it might be back, then you can kiss them goodbye — either to the immediate “SCREW THIS” factor or the long slow death of a thousand frustrations.
It’s simple: when your app’s down, tell your customers right away. Executed correctly, it looks like this:
@myapp (3 minutes ago) The site’s currently down. We’re working to diagnose and fix the problem, updates here and at status.myapp.com.
This communicates three key things:
- You know that there’s a problem.
- You care that your customers know about it, enough to have a status page and tweet about it right away.
- You’re working to fix it.
If your downtime lasts longer than 30 minutes, then tweet every 30 minutes, even if just to say “Still working on it.” In the absence of communication, people will assume that you don’t care. Obviously, you need to fix the problem, but it’s worth the 20 seconds (MAX!) it takes to send a quick tweet.
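If you’d rather not rely on memory in the middle of an outage, the 30-minute rule is easy to turn into a reminder. Here’s a minimal sketch, assuming you record the time of your last public update somewhere; the names `update_due` and `next_update_at` are made up for illustration, and nothing here talks to Twitter.

```python
from datetime import datetime, timedelta, timezone

# Cadence suggested above: post at least one public update every 30 minutes.
UPDATE_INTERVAL = timedelta(minutes=30)


def update_due(last_update, now, interval=UPDATE_INTERVAL):
    """Return True once at least `interval` has passed since the last update."""
    return now - last_update >= interval


def next_update_at(last_update, interval=UPDATE_INTERVAL):
    """Return the time by which the next public update should go out."""
    return last_update + interval


# Example: the last tweet went out at 12:00 UTC and it is now 12:31 UTC.
last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
current = datetime(2024, 1, 1, 12, 31, tzinfo=timezone.utc)
if update_due(last, current):
    print("Time to post an update, even if it's just 'Still working on it.'")
```

Wire it to whatever nags you best (a chat bot, a terminal bell, a colleague); the point is only that the clock, not your stress level, decides when the next update goes out.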
OK, I hear you, Pingdom’s status page is a bit spartan and you’d like something more. Here are some ideas:
- S3 static site hosting. Throw up a static HTML or text file.
- GitHub Pages.
- A cheap hosting account. All you need is index.html, no dynamic scripting necessary.
- Blogger, WordPress, etc.
- A redirect to your (@myapp_ops) Twitter feed (make sure that the redirect’s hosted by your DNS company).
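For the static options above, the page itself can be generated by a few lines of script. This is a rough sketch, not a finished tool: the function names, the page layout, and the status wording are all my own assumptions, and uploading the file to S3, GitHub Pages, or your cheap host is left to whatever tool you already use.

```python
import html
from datetime import datetime, timezone


def render_status_page(app_name, state, message, updated_at):
    """Build a complete static HTML status page as a string.

    `updated_at` should be a timezone-aware datetime; it is shown in UTC.
    All text is HTML-escaped so a hurried message can't break the page.
    """
    stamp = updated_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(app_name)} status</title></head>\n"
        "<body>\n"
        f"<h1>{html.escape(app_name)}: {html.escape(state)}</h1>\n"
        f"<p>{html.escape(message)}</p>\n"
        f"<p>Last updated: {stamp}</p>\n"
        "</body></html>\n"
    )


def write_status_page(path, app_name, state, message):
    """Render the page with the current time and write it to `path`."""
    page = render_status_page(app_name, state, message, datetime.now(timezone.utc))
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(page)
```

Calling `write_status_page("index.html", "MyApp", "down", "We're working to diagnose and fix the problem.")` gives you a file you can push anywhere that serves static content.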
I wouldn’t recommend:
- Putting your status page on your app server, or even your primary web server — this defeats the purpose.
- Using a name other than status.myapp.com. People know to check there, and it’s easy to remember. See http://status.foxycart.com, http://status.linode.com, https://status.github.com, etc.
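A quick way to catch the first mistake is to compare where the two names actually point. The sketch below is a sanity check under stated assumptions: `myapp.com` and `status.myapp.com` are placeholder hostnames, the helper names are mine, and different IP addresses don’t prove the two sites are truly independent (they could still share a datacenter or a provider). Matching addresses, though, are a clear warning sign.

```python
import socket


def resolve_ips(hostname):
    """Return the set of IP addresses a hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return {info[4][0] for info in infos}


def shared_ips(app_ips, status_ips):
    """Return the addresses that serve both the app and the status page."""
    return set(app_ips) & set(status_ips)


def check_independence(app_host, status_host):
    """Print a warning if both hostnames resolve to a common address."""
    overlap = shared_ips(resolve_ips(app_host), resolve_ips(status_host))
    if overlap:
        print(f"WARNING: {status_host} shares {sorted(overlap)} with {app_host}")
    else:
        print(f"OK: {status_host} and {app_host} resolve to different addresses")
```

Run `check_independence("myapp.com", "status.myapp.com")` with your own names after any DNS or hosting change.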
Now, if you want to get fancy, you can use a static site generator, a blog engine, etc. Style that page, brand it, go hog wild.
Just BE SURE to get VERY comfortable with your tools before the crap hits the fan. If your site goes down, and you’re having trouble updating your status page, or your status page gets overloaded… What’s the old saw? “Now you have two problems.”
Keep it simple.
- Plan for downtime.
- Know how to use your status tools AHEAD of time. Make sure you can quickly and easily post to Twitter and update your status page.
Patrick has an internal status page that exercises all of the parts of his system, and has Pingdom check THAT page. Much more complete than just a “hey, is