Operations in DevOps: Command & Conquer
This was the best strategy game of 1995. Its gameplay could be the godfather of modern Ops teams, the kind that are looking beyond ITIL. It’s time to move into the future, and if you can play this kind of strategy game, you can make that move.
In Command & Conquer, the player has what we call the god view: the gamer sees the world from above. With a clear mind, the player deploys troops and leads the battles. It's a real-time strategy game. This was my world back in the day, when I had time for gaming.
The next generation after ITIL uses a similar model to command, control and coordinate emergency response. It's called the Incident Command System (ICS). It goes beyond ITIL’s traditional structure of 1st-, 2nd- and 3rd-level support. ICS was developed to respond to wildfires in California and Arizona. Tech has wildfires too: incidents that jump and spread from one system to the next.
The idea is simple and requires three roles:
Operations Lead: Analyzes, investigates and fixes the incident
Incident Commander: Coordinates the incident response and resolution
Communication Lead: Manages the communication between teams and stakeholders
The same person can handle all three roles in a standard scenario, for example when only low-priority tickets are coming in. But for medium-priority and especially high-priority incidents, or for a high number of tickets, it’s necessary to split these tasks between three people:
The operations lead has to work without interruption. When your house is burning, you don’t want the firefighter holding the hose to stop the water and answer questions. Each interruption slows the solution down.
If the operations lead needs something, another hose for instance, they contact the incident commander. The incident commander coordinates additional support and is ultimately responsible for providing a solution.
In the meantime, the communication lead collects the information and informs the users and other stakeholders about the work on the incident.
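The staffing rule above (one person for the standard scenario, three people once the priority or the ticket volume goes up) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Priority, RoleAssignment, assign_roles and TICKET_THRESHOLD are hypothetical, and the cutoff of 10 tickets is made up, since the rule itself only speaks of "a high number of tickets".

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class RoleAssignment:
    operations_lead: str
    incident_commander: str
    communication_lead: str


# Hypothetical cutoff for "a high number of tickets"; pick your own value.
TICKET_THRESHOLD = 10


def assign_roles(priority, open_tickets, responders):
    """Decide who fills the three ICS roles for one incident."""
    if not responders:
        raise ValueError("at least one responder is required")

    # Split the roles for medium/high priority or a high ticket volume.
    needs_split = priority is not Priority.LOW or open_tickets >= TICKET_THRESHOLD

    if not needs_split:
        # Standard scenario: one person carries all three roles.
        person = responders[0]
        return RoleAssignment(person, person, person)

    if len(responders) < 3:
        raise ValueError("splitting the roles needs three responders")

    # Assumption: the first three available responders take the roles in order.
    ops, commander, comms = responders[:3]
    return RoleAssignment(ops, commander, comms)
```

Calling assign_roles(Priority.LOW, 2, ["ana"]) puts "ana" in all three roles, while assign_roles(Priority.HIGH, 1, ["ana", "ben", "cho"]) gives each role to a different person. In practice you would pick the operations lead by knowledge of the affected system rather than by list order.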
Why do we need to change to a new model? Because ITIL optimizes for cost with its 1st-, 2nd- and 3rd-level structure. But when your system is burning, you don’t have the time to work through each level until you get a result. You need the best strategy to act fast and command and conquer the situation.
With DevOps, we optimize for speed - we want to get the incident solved and keep it from spreading. And we have the flexibility to search for the person with the best knowledge for the job.
A structure that can handle wildfires can handle incident tickets in an IT department. And if you have played Command & Conquer, you are optimally trained for the Incident Commander role already.
To accelerate your incident response, use the structure: command, communicate and operate.