Robert, our Technical Services Director, sent the following note to our tech team the other day. While it was aimed at an internal audience, I think there's value here for everyone.
Uptime is very important, both from our side and the customer’s side. It’s a way to quantify reliability and it’s one of the most important things for someone in the IT industry to understand.
While a 99% uptime guarantee sounds good on paper, what does that really mean? In reality it means 14 minutes and 24 seconds of downtime a day, and when you add it up it’s over 87 hours per year.
Typically downtime doesn’t happen in small 14-minute blocks, but in stretches of hours at a time… hopefully not all at once.
What’s tricky is when an issue happens that doesn’t affect reliability numbers but severely affects service, such as Gmail’s issues back in September. Emails were delayed for up to half a day, but technically they were delivered. Google's systems weren’t completely down, so it didn’t count against their uptime numbers (not picking on Gmail, this is just an example).
So you can’t just go off of uptime as the only way to quantify quality.
Both Microsoft and Google guarantee 99.9% uptime, which allows for almost 9 hours of downtime a year (about 8 hours and 46 minutes). Putting yourself in our customers' shoes, 9 hours of downtime is a big deal. Suddenly the idea of 99.9%, which might sound like it's just a rounding error away from perfection, doesn't sound so perfect.
It's our job to understand this reality and ensure that our customers are armed and ready to deal with the inevitable downtime that will come their way, no matter how many 9s of uptime are guaranteed.
Pingdom is an example of a third-party service that lets customers measure uptime and reliability for themselves.
They also provide this uptime cheat sheet for a quick reference.
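If you'd rather work the numbers out yourself, the arithmetic behind a cheat sheet like that is simple: allowed downtime is just the period multiplied by the percentage that isn't uptime. Here is a minimal Python sketch of that calculation. The function names are made up for illustration, and it assumes a 365-day year and a 24-hour day, so it isn't tied to how any particular provider measures or reports its SLA.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: non-leap year


def allowed_downtime_seconds(uptime_percent, period_seconds):
    """Return the downtime (in seconds) that an uptime guarantee permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_seconds * (100 - uptime_percent) / 100


def format_duration(seconds):
    """Format a number of seconds as 'Xh Ym Zs', rounded to the nearest second."""
    total = round(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


# Print a small table for a few common guarantees.
for percent in (99, 99.9, 99.99):
    per_day = format_duration(allowed_downtime_seconds(percent, SECONDS_PER_DAY))
    per_year = format_duration(allowed_downtime_seconds(percent, SECONDS_PER_YEAR))
    print(f"{percent}% uptime: {per_day} per day, {per_year} per year")
```

For 99% this gives the 14 minutes and 24 seconds per day mentioned earlier, and for 99.9% it gives 8 hours, 45 minutes and 36 seconds per year. Keep in mind this only tells you what the guarantee permits on paper, not how degraded service (like delayed email) would be counted.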