Incidents have always been a fact of life for people in IT and Ops. Today, it’s web developers, cloud service providers, and DevOps practitioners who are getting a crash course in incident communication.
Web scale incident communication is more complex than simply sending a bulk email. There are different audiences to consider, each with its own thresholds for messaging and its own response expectations.
Since downtime is inevitable, it’s best to plan ahead and make sure your team is ready.
If you’re hosting on AWS, you can expect some pretty excellent reliability and availability.
If your service isn’t responding, it’s likely an issue with your own code. On the other hand, system outages do happen. They’re usually pretty minor.
Sometimes, they’re not.
While AWS is the largest cloud provider and boasts excellent reliability, the service still experiences downtime. On Feb. 28, 2017, a US region of Amazon’s heavily used S3 storage service went dark for the better part of four hours. As a result, major organizations across the web experienced total or partial outages, including Quora, Medium, Imgur, Twilio, MailChimp, and many more.
This is our best attempt at a guide on actively keeping yourself in the loop for AWS system status. When incidents like this happen, it’s helpful to know how AWS thinks about reporting system-wide status, and what you can do to keep yourself updated.
Whether you’re a developer building on AWS, or a journalist or analyst keeping in the know on the web’s largest cloud provider, this guide aims to help you understand AWS status a little better.
When teams at Facebook, Twitter, Netflix, and Airbnb turn to a service for design collaboration, they fire up InVision. With millions of users worldwide, InVision is a powerful platform for product design collaboration.
As a cloud service serving so many end users, it’s critical for InVision to keep users updated about service status. The team brought on StatusPage to help communicate with its community around incidents, downtime, and scheduled maintenance.
“We viewed StatusPage as the industry leader and was the obvious choice by our support and engineering leadership,” said Brandon Wolf, Vice President of User Enablement at InVision. “Personally, I was enamored with the extensibility and stock integration with other best-of-breed services, affording a quick-to-implement convenient web of statuses. Knowing StatusPage resides within the greater Atlassian family only made our selection and continued relationship easier.”
At StatusPage, we’ve come across this question a lot.
“I’ve got my users on all these different deployments. How do I let one group know about an outage without alarming all the users on different servers who aren’t affected?”
It’s a good question. We talked with the team at Duo Security and learned about how they’re solving this problem.
Duo Security provides two-factor authentication and other security services for thousands of companies and millions of end users. Teams at Facebook, NASA, Yelp, and many more top companies count on Duo to keep their IT secure.
Launched in 2010, Duo puts a lot of effort into what security means for teams using cloud tools and working remotely. As a security service hosted in the cloud, Duo’s system status is extremely critical to their customers. When incidents occur, their customers need clear, correct, and immediate updates.
Just over three years ago, we embarked on a journey with a simple goal in mind. The software world was moving quickly in the direction of rented servers, hosted solutions, and outsourced vendors, all in service of allowing teams and companies to move quicker and to be more nimble than ever before. What used to be built and maintained internally was now delegated to other services or vendors.
And although every service provider and vendor strives for perfect uptime and operations, the data around availability tell us what we already know. Unexpected problems happen, and they happen to everyone. From Amazon Web Services, to Salesforce, to Comcast phone service this week, nobody is safe from things going wrong (even Pokemon Go!).
This new world was great, but it was missing a core component in the relationship between companies and their service vendors. That component, of course, was status communication.
Before we got started, status communication was very costly to build and maintain, and in most cases just didn’t exist. Our simple goal was to provide the ability for every software company in the world to build and maintain their own custom status page. Having felt this pain ourselves, we were as equipped as anyone to build the right solution, and the timing couldn’t have been better. From a handful of customers in early 2013 to thousands of customers today, it’s been amazing to watch all different types of companies build trust with their customers and their colleagues, saving everyone lots of time and money in the process.
Today, we’re super excited to announce that we’re joining forces with Atlassian to accelerate our progress and our continued march toward transparency across the web.
This is a guest post from Alistair Mclachlan. Alistair is Head of Support at FiveStars Loyalty, a San Francisco-based startup that helps businesses and communities thrive by turning every transaction into a relationship.
To hear ‘Server Down’ is to hear two words that instill fear into the heart of any Support Leader. System outages are unavoidable, but while Engineering is scrambling to fix the issue, there are certain things we can do in Customer Support to mitigate the impact on paying customers.
During normal circumstances, FiveStars Support carefully treads the tightrope, balancing call deflection with issue resolution in a way that doesn’t negatively impact Customer Experience. But if you have a Support Team staffed to take 20 calls per hour, a sustained increase to ten times that volume is going to sink the ship unless you have effective outage planning.
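To put rough numbers on that scenario, here is a minimal sketch of how quickly a queue could build up. It uses the 20 calls per hour and the tenfold increase from the paragraph above; the function name, the steady arrival rate, and the assumption that no caller hangs up are simplifications made for illustration, not figures from FiveStars.

```python
def backlog_after(hours, arrivals_per_hour, capacity_per_hour):
    """Calls still waiting after `hours`.

    Assumes steady arrival and handling rates and that no caller
    gives up, which is a simplification for illustration only.
    """
    return max(0, (arrivals_per_hour - capacity_per_hour) * hours)


capacity = 20           # calls per hour the team is staffed for (from the text)
surge = capacity * 10   # ten times the normal volume (from the text)

for hour in (1, 2, 4):
    waiting = backlog_after(hour, surge, capacity)
    print(f"After {hour} hour(s): {waiting} calls waiting")
```

Under these assumptions the queue grows by 180 calls every hour the outage lasts, which is why deflecting calls early matters more than answering them faster.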
We’re halfway through 2016. It’s hard to believe. We wanted to mark the occasion by celebrating the best reads we’ve seen this year so far. Today we’re looking at the best writing on startups and entrepreneurship. Enjoy.
We’re halfway through 2016. It’s hard to believe. We wanted to mark the occasion by celebrating the best reads we’ve seen this year so far. Today we’re looking at the best writing on customer support. Enjoy.
Let’s meet at 8 a.m.
Yes, that’s Okay.
With writing, it’s the one thing that’s not debatable. Especially when you’re writing on behalf of an organization.
There’s been a lot of focus on “self service” in customer support over the last several years. Support teams are seeing fewer “easy” support questions and more complicated problems because customers are solving the easy stuff on their own.
I have a few thoughts on why this might be. First, design and product teams continue to improve their craft. Products are simply better than they used to be. Navigation paths are clearer, and experiences are more intuitive. Second, users are more sophisticated and skilled at interacting with technology tools. People are spending a lot more time with software and interfaces than ever before. Our brains are drawing connections between all the different apps we use every day.
Designers call these UX patterns. The idea is that most software users have come to recognize and expect an interface to behave in certain ways. So even brand new users have some subconscious familiarity with your product.
In other words: most people can figure out how to change their password.