Amazon and 11 nines

Amazon has claimed 11 nines of availability. That is a very hard feat to accomplish, and if they have done it (I would love to know how they arrived at that number), it is a groundbreaking achievement.

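To see why, let's look at what it means. Availability is the fraction of time a system is up: MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR is the mean time to recovery, usually expressed as a percentage. The remaining fraction, unavailability, is approximately MTTR / MTTF when recovery is much shorter than the time between failures. In other words, it is the time to recover after a failure divided by the mean time until such a failure happens. Availability is usually quoted as a number of nines. With 11 nines, unavailability is 10^-11, so Amazon S3 would be down for only one second in every 10^11 seconds, or 10^11/(365*24*60*60) ≈ 3,171 years!!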
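To make the arithmetic concrete, here is a minimal sketch in Python. It assumes the standard steady-state definition, availability = MTTF / (MTTF + MTTR), and a 365-day year. The function names and the sample MTTF/MTTR values are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; ignores leap years


def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability: the fraction of time the system is up.

    mttf and mttr must be in the same unit (hours, seconds, ...).
    """
    return mttf / (mttf + mttr)


def years_per_second_of_downtime(nines: int) -> float:
    """Years of operation over which one second of downtime accumulates."""
    unavailability = 10.0 ** -nines          # e.g. 11 nines -> 1e-11
    seconds_of_operation = 1.0 / unavailability
    return seconds_of_operation / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical system: fails every 1,000 hours, takes 1 hour to recover.
    print(f"availability: {availability(1000, 1):.4%}")

    for n in (5, 9, 11):
        years = years_per_second_of_downtime(n)
        print(f"{n} nines: one second of downtime per {years:,.2f} years")
```

Nine nines works out to roughly 32 years per second of downtime, and 11 nines to roughly 3,171 years.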
In their seminal paper “High Availability Computer Systems”, Jim Gray and Daniel Siewiorek defined availability classes as follows:
unmanaged 90% – 50,000 mins/year downtime
managed 99% – 5,000 mins/year downtime
well-managed 99.9% – 500 mins/year downtime
fault-tolerant 99.99% – 50 mins/year downtime
high-availability 99.999% – 5 mins/year downtime
very-high-availability 99.9999% – 0.5 mins/year downtime
ultra-availability 99.99999% – 0.05 mins/year downtime
As you will notice, even they defined classes only up to 7 nines, so we do not have a name for what Amazon has claimed.
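The downtime figures in the table are rounded. As a rough check, the sketch below recomputes them from the number of nines, assuming a 365-day year (525,600 minutes), and extends the same calculation to 11 nines. The helper name and the class list layout are my own, not from the paper.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Expected downtime per year, in minutes, at the given number of nines."""
    return MINUTES_PER_YEAR * 10.0 ** -nines


# (class name, number of nines); the last entry is the unnamed 11-nines case.
CLASSES = [
    ("unmanaged", 1),
    ("managed", 2),
    ("well-managed", 3),
    ("fault-tolerant", 4),
    ("high-availability", 5),
    ("very-high-availability", 6),
    ("ultra-availability", 7),
    ("(no name)", 11),
]

if __name__ == "__main__":
    for name, nines in CLASSES:
        minutes = downtime_minutes_per_year(nines)
        print(f"{name:24s} {nines:2d} nines  {minutes:.6g} mins/year")
```

At 11 nines this comes to about 0.0000053 minutes per year, or roughly a third of a millisecond.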


