By Dominic Wellington, Director of Marketing, EMEA at Moogsoft Inc
It’s important to learn from experience, but focusing too much on the past can blind you to the future. AIOps offers the promise of real-time insights to help experts work more efficiently and solve problems faster.
It is said that armies are always ready to fight the last war. There are quite enough military metaphors in IT already, but I think this one is interesting to explore briefly. One reason for that fascination with matters military is that both IT and the armed forces deal with the problem of creating standardised responses to any conceivable problem, and making preparations to put those plans into action at scale and at speed.
At root, this is a worthwhile activity; after all, in the heat of the moment, there might not be time to spare to refer to experts and work out a course of action from first principles. The typical IT incident response process does not involve bullets (the private fantasies of burnt-out sysadmins notwithstanding), but the principle holds: When the manure impacts the air-conditioning unit is no time to be wondering what to do. You want to have a well-defined and documented go-to response that you can set into motion right away — ideally from a position of shelter.
There is an unstated assumption at the heart of all this, though, and here it is: that it is possible to document contingencies and their responses exhaustively.
Militaries around the world might actually have plans for what to do in case of invasion by Canada, or the advent of a zombie apocalypse (no, really). However, none of these plans has actually been practised or drilled, so they remain exercises for planners rather than real, practical contingency plans.
Even these somewhat silly examples, though, are still in the domain of what Donald Rumsfeld famously called the “known unknowns”: situations that we can imagine and plan for. Militaries are by and large supremely hard-headed organisations, and do not venture very far beyond these areas.
How Can IT Operations Deal With The Unknown?
IT also used to have the luxury of operating in this relatively predictable way. Every piece of infrastructure and its configuration could be documented in the CMDB. Incident response could be mapped out in an ITSM system, and over time, known good responses could be documented in a knowledge base. Standard responses to recurring situations could be written into run books, which operators would refer to when something went wrong.
These days, every part of that comforting assumption has been ripped away. Infrastructure comes and goes with user demand, modifying itself without human input, growing and shrinking faster than it can be documented. The response that was good yesterday is obsolete today, as underlying assumptions have changed beyond recognition. Referring to a months-old run book is as likely to make the problem worse as it is to solve it. We are operating more and more in the domain of Rumsfeld’s “unknown unknowns.”
The shell-shocked veterans of this new world tend to have one of two reactions: either to give in to what looks like chaos and dismiss the entire notion and possibility of process, or to cling ever more obstinately to that process, forgetting what the structure of the process is supposed to enable.
Here Comes IT Operations
There is a middle way. Going back one last time to the military analogy, armed forces do not throw out their old doctrines, but they do update them. These days, “cavalry” does not mean horses, but it does still refer to rapidly deployed forces, generally vehicle- or even helicopter-borne. There are any number of differences between a helicopter and a horse, but the core point is that both allow a commander to move fresh troops quickly onto the battlefield, fulfilling much the same general function as horse-mounted troops did before the walls of Troy.
IT needs to go through the same transition. The idea we need to give up is that everything can be planned for, and that humans can fill whatever gaps remain. The new world of IT is far too unpredictable for that, and the business is pushing for ever faster changes. Meanwhile, human operators are burning out on tasks that should rightly be beneath their notice.
The scale and rate of change of modern IT infrastructure mean that trying to document every possible contingency is an unwinnable Red Queen’s Race. No matter how much effort you put into filtering events, documenting the environment, and mapping out responses, you will never achieve full coverage — and meanwhile you have no energy to spare for what business users are asking you to do.
Fortunately, the new field of AIOps offers an alternative. Algorithmic and data-science techniques can sift the stupendous volumes of event data generated by modern IT infrastructures, and deliver useful insights without needing to be spoon-fed by humans. Meanwhile, those humans can focus on more useful work, not just frantically stuffing fingers into leaks.
Human Expertise Is Too Valuable To Waste
Human expertise is too valuable — and, quite frankly, too expensive — to be wasted on routine issues. Machine analysis offers us the opportunity to preserve human intervention for genuine exceptions.
In this scenario, there is still a service desk, and the service it offers is actually significantly better, because humans and machines are working together, each to their strengths. Services still need to be mapped, but the human input is at a high level, defining the business relationships. The actual ever-changing infrastructure can be filled in by automated systems, and algorithms can then use that information in real time to tie any issues back to the business service they are affecting. There is still a need for human expertise, but it can be applied intelligently and captured algorithmically, avoiding unnecessary distraction and disruption.
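As a rough illustration of that division of labour, here is a minimal Python sketch. Every name in it (the services, the host names, the `affected_services` function) is invented for this example and does not correspond to any particular product's API. It assumes a person maintains a small, high-level service map, an automated discovery feed supplies the hosts that exist right now, and a simple lookup ties an event on one host back to the business services it touches.

```python
# Hypothetical, human-maintained map: business service -> component roles it relies on.
# This is the slow-changing, high-level part that people define.
SERVICE_MAP = {
    "online-checkout": {"web", "payments-db"},
    "reporting": {"payments-db", "batch"},
}

# Hypothetical output of an automated discovery process: host -> role.
# In a real environment this would be refreshed continuously as
# infrastructure comes and goes; here it is a plain dictionary.
discovered_hosts = {
    "web-01a9": "web",
    "web-77c2": "web",
    "db-7f3a": "payments-db",
    "batch-0042": "batch",
}


def affected_services(event_host, discovered, service_map):
    """Return the sorted business services that depend on the host raising an event.

    An unknown host yields an empty list, which is itself a useful signal:
    discovery has not caught up, and a human may need to take a look.
    """
    role = discovered.get(event_host)
    if role is None:
        return []
    return sorted(name for name, roles in service_map.items() if role in roles)


if __name__ == "__main__":
    for host in ("db-7f3a", "web-01a9", "mystery-host"):
        print(host, "->", affected_services(host, discovered_hosts, SERVICE_MAP))
```

The point of the sketch is the split in responsibilities rather than the lookup itself: the service map changes only when the business does, while the host inventory can churn constantly without anyone having to edit documentation by hand.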
If this sounds like science fiction, it really is not. However, it does mean accepting that the terrain has changed substantially since the maps were drawn up, and dead reckoning is only going to get us so far. New approaches are needed, but they do not require burning down everything that went before. AIOps is about applying new techniques to the same old problems of IT Operations, in combination with the existing practices that will remain relevant in the future.
Learn more from Dominic about AIOps at The Conference For Service Desk Leaders 2018 (#SDI18) in March, where he will be presenting: IT Operations, AI & Machine Learning.