Intuit and the Tyranny of the Uptime Clock
Those of you following my Twittering and blog posts must think I have become obsessed with the Intuit outage. At CSG we operate a hosted enterprise software service and face the tyranny of the uptime clock -- just like Intuit. As the technology industry moves to adopt cloud computing, we all suffer a credibility loss when a major player like Intuit has a long-term outage like this one. The lack of an explanation and the generally poor level of communication from Intuit during this episode do not help. Sure, they could not post on their own websites while down, but they have official blogs outside of their control that were up, so they did have the ability to communicate. Here is a short list of the communications:
Intuit on Twitter @Intuit: First post was 11.5 hours into the outage; at this writing there are 8 posts, including a gap of 16 hours before the latest post saying they are now on the way back up. The posts pointed to their community page, which had 4 updates without dates or time stamps, and included 2 references to the small business blog, where they posted an update 12 hours into the outage.
Quicken on Twitter @Quicken: First post was 13 hours into the outage, at this writing only 4 posts -- saying they are working on it.
Official Quicken Blog: No posts, last post was April 26th.
Quickbooks on Twitter @Quickbooks: No posts, last post was May 21st.
The main site just now came back online -- making the outage approximately 34 hours in duration. Current explanation:
Our preliminary investigation indicates the outage occurred during a routine maintenance procedure Tuesday night. An accidental power failure during that procedure affected both our primary and backup systems, taking a number of Intuit websites and services offline. While power was quickly restored, we're working diligently to validate our systems and bring them back into full operation.
Intuit reported 300,000 online customers in May of this year -- many of whom use accounting and merchant services applications that require near universal uptime. In the industry this is often referred to as "four nines" or "five nines" uptime, for 99.99% and 99.999% uptime.
A few basics about uptime: Scheduled outages are usually not included in the calculation, so the 0.001% downtime permitted in five nines uptime buys only 5.26 minutes of unscheduled downtime in a year. Four nines gets you almost an hour, three nines gets you almost nine hours, and two nines gets you a little over three and a half days. Fortunately nobody died in this outage, so even a 34 hour outage is not a catastrophe on the BP scale. But it will take 388 years of otherwise perfect uptime before Intuit can claim five nines of uptime.
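For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes a 365.25-day year and counts only unscheduled downtime; the function names are my own, made up for illustration, not part of any tool.

```python
# Downtime budgets for "N nines" of uptime.
# Assumption: a year is 365.25 days, i.e. 525,960 minutes.
MINUTES_PER_YEAR = 365.25 * 24 * 60
HOURS_PER_YEAR = 365.25 * 24


def downtime_budget_minutes(nines: int) -> float:
    """Unscheduled downtime allowed per year, in minutes, at N nines."""
    # N nines means an allowed downtime fraction of 1 / 10**N.
    return MINUTES_PER_YEAR / 10 ** nines


def years_to_absorb(outage_hours: float, nines: int) -> float:
    """Total span, in years, over which one outage averages out to N nines."""
    # The outage must be no more than 1 / 10**N of the total elapsed time.
    total_hours = outage_hours * 10 ** nines
    return total_hours / HOURS_PER_YEAR


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"{n} nines: {downtime_budget_minutes(n):.2f} minutes per year")
    print(f"34 hour outage at five nines: {years_to_absorb(34, 5):.0f} years")
```

Run as a script, this prints the yearly budget for two through five nines and the number of years a single 34 hour outage needs to average out at five nines.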
All of us are relieved that they are back online. This event will undoubtedly slow migrations to the cloud, and should give all of us reason to check and recheck our redundancy and uptime plans. In addition, we should be checking and rechecking our communication plans associated with any downtime. We are certainly capable of turning a bad situation worse by failing to communicate well with customers.