Dispatch #10/23
Hey there, welcome to our Dispatch!
We know that incident management can be a bit of a buzzkill, but trust us, it's front and center in SRE and DevOps for a reason. We get it: no one likes to be on call and work under pressure, especially late at night or on weekends. But fear not; we're here to help! Many companies need help adapting their incident management to modern methods. And you know what that means - time to optimize!
Corporations still believe that optimization only means automating workflows and performing proper problem management. But that alone does not prevent those late-night headaches. Unsurprisingly, they've been using the same old techniques for the past ten years and wondering why their employees are quitting and their customers are complaining.
But now, they've finally realized that they must deal with a growing number of production changes and rely more heavily on third-party vendors like cloud providers. The world is changing, and so must operations. So it's time to take a closer look at incident management!
Here are some links we found interesting regarding this topic:
Report: As always, it's interesting to see what Google is doing: https://sre.google/resources/practices-and-processes/anatomy-of-an-incident/
Article: Here is a second link on Google's approach: https://cloud.google.com/docs/security/incident-response
Article: And have you heard of PagerDuty? Small teams love their approach to managing on-call and incident response. It's fascinating stuff: https://response.pagerduty.com/
Podcast: If you prefer something to listen to, this podcast episode dives into the human aspect of incidents with Nora Jones of Jeli.io: https://oncallmemaybe.com/episodes/finding-humanity-in-incidents-with-nora-jones-of-jeliio
Videos: If you want to see how incident management is transforming in the age of DevOps and cloud, these conference and presentation videos will do the trick: https://www.irconf.io/#session-videos
Video: Damon Edwards from Rundeck explains the transformation of incident management in the age of DevOps and cloud: https://www.infoq.com/presentations/incident-management-devops-sre/
Now, we want to know your thoughts on incident management:
What tools are essential, and where do you record and manage incidents?
How do change, problem, and incident management work together in your company?
Are you using the Incident Command System? Have you combined your build and run teams?
Let's get the discussion flowing and make incident management fun again!
Regards, Florian
www.TechAccelerationAndResilience.com
Consecutive weeks published: 10