Yesterday, I had one of those nasty experiences that teach us a lot about good business practices. The story I’m about to tell you doesn’t contain much in the way of news but it does serve to remind us all about a few areas of best practices.
I use Spry to host my web site and my company mail. I access my mail via Thunderbird, Gmail, or my Blackberry depending on where I am and on what computer. When things became, as they say in the movies, “quiet…too quiet”, I dug a little deeper.
“Surely,” I said to myself, “if there was a problem there would have been some sort of notice sent to me. This isn’t a free service, after all.” Lesson number one – if you’re a paid option in a field where free ones exist, what people are paying for, in part, is excellent service and support.
It turned out Spry had suffered a power outage. I gather that they had voltage loss issues with Seattle City Light and things went downhill from there. While it seems they had some fail-over measures in place, they didn’t quickly recognize that the original voltage loss also prevented their battery backups from recharging from available power. Lesson number two – which Spry actually seems to have learned – is that the first question one should ask on the first day on the job is “what can go wrong?” Then put a WRITTEN plan in place to handle the crisis while doing everything one can upfront to prevent it in the first place. Obviously Spry did nothing wrong here in terms of prevention – when the power company fails you, you’re screwed. They had battery backups in place and diesel generators that they switched to (after a 3-hour outage), which allowed them to restore parts of their service. One service that remained down (besides my mail) was their forums, which are the main channel through which they communicate.
On to lesson three, and this is the big one. No news here – we’ve discussed it before. However – one more time – when you have a problem, the single most important thing you can do is COMMUNICATE with the people who are inconvenienced (and “inconvenienced” is probably an understatement for those of us who were unable to receive communications from business partners). Since their forums weren’t working, what could Spry have done to let customers know there was a problem?
- Every account has a backup email address, one which Spry doesn’t host. Send a mail to us to let us know what’s going on. That database is on an unavailable Spry server? Not good enough – keep a backup someplace else, preferably burned each night to a disc or flash memory card. Yes, there are data security risks that way, but encrypt it, password-protect it, and make only one copy.
- Spry’s home page was working. Post something there, even if it’s a link to another, less visible page. Does this conflict with your “100% up-time guarantee” which is on the home page? Yes, but it also shows you’re honest, communicative, and customer-focused, even at your own expense.
- The forums did come back up well before email did. In fact, there was one thread started by Spry that said they were having a problem. That’s it – it was not updated hourly (which is what I would have done). In fact, it wasn’t updated again after about 10 hours into the crisis. Users, of course, started another thread which contained such sweet thoughts as “So, 21 hours. Nothing?” and “the low level of communication is unbelievable and unacceptable.” No kidding.
- Did they monitor Twitter and other widely available communication tools for comments on what was going on? Had they planned better (and they have now done this), they could have encouraged users to follow them on Twitter and direct-messaged everyone when the crisis hit. Find tools that aren’t dependent on your own service and which can’t fail when you do!
A full explanation and an apology were posted this morning, 30+ hours after the problem began. Too little too late, no? Stuff happens. Power fails, people don’t show up for work, your plane is delayed by weather. The issue isn’t what went wrong – it’s what you did to fix it and how you communicated with everyone the problem affected. Did you acknowledge the problem and explain what you were doing to fix it? Did you file status reports on a regular basis?
Spry failed. What are you doing to make sure you don’t?