
Key Pillars of Incident Resolution and Understanding Their Inter-Dependencies

December 1, 2014 • Resolve Staffer

Large network and IT operations centers handle hundreds to thousands of incidents daily. Many of these incidents degrade the quality of service offered to customers, which in turn affects revenue and customer loyalty. These operations centers need a robust software tools strategy to resolve incidents quickly and cost-effectively. Yet despite trying numerous approaches, such as knowledge management and automation, they still struggle to find a satisfactory approach to incident resolution.

L1 agents on the frontline resolving these incidents don’t have access to the precise, context-specific guidance that’s needed to solve the problem effectively. They are forced to search for “hits” in knowledge repositories that are designed to provide directional guidance at best, not the precise, tested, prescriptive procedures for the specific context of the incident that L1 agents need. As a result, L1 agents have become little more than escalation points to Level 2 engineers or field dispatch, which significantly increases costs and time to repair. In addition, these knowledge management systems have no mechanism to cut tedious, time-consuming steps or to reduce the scope for manual errors.

Automation software, such as IT Process Automation tools from vendors like BMC, CA, HP and IPsoft, can at best be applied to scenarios where a closed-loop automation can be defined and developed to diagnose and repair a problem. However, it is estimated that only 15% of incidents can be handled using this closed-loop approach. The remaining 85% need some manual intervention, which these automation tools do not support.

These limitations of knowledge management and closed-loop automation tools mean that operations centers are forced into a highly inefficient and costly incident resolution strategy, with an extremely large army of L1 agents to manage the scale. What is needed is a solution whose capabilities work together, making both automated and manual resolutions effective. What should these capabilities look like? Let’s take a look.

1. Automation

Automation is the most powerful tool for accelerating the resolution process and reducing cost and error rates. Traditional approaches have treated automation as all or nothing: either automation is applied to the entire resolution process, from validation through diagnostics to repair, or it plays no role in incident resolution. This does not have to be true. Automation can be used to automate parts of a manual process. The results of these automated sub-steps can then be fed, in context, into manual-resolution guidance, so that automated and manual sub-processes work together. This approach not only cuts time and error rates in manual processes, it also allows businesses to invest in automation for any resolution process. As a plus, they can roll out automation at the pace they see fit and not be blocked by development resources and budgets. For example, the validation of the incident can be automated in phase 1 and the diagnostics and repair can be automated in later phases.
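The interplay between automated and manual sub-steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: the `Step` class, the `validate_incident` check, the `LOS` alarm code and the device name are all hypothetical. An automated step writes its result into a shared context, and the next manual step renders its guidance from that context.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Context = Dict[str, str]


@dataclass
class Step:
    name: str
    # Automated steps carry a function; manual steps leave this as None.
    action: Optional[Callable[[Context], Context]] = None
    # Guidance template shown to the agent for manual steps.
    guidance: str = ""


def validate_incident(ctx: Context) -> Context:
    # Hypothetical check: a real one would query a monitoring system.
    return {"link_status": "down" if ctx["alarm"] == "LOS" else "up"}


def run_procedure(steps: List[Step], ctx: Context) -> List[str]:
    """Run automated steps, and collect guidance text for manual ones."""
    instructions = []
    for step in steps:
        if step.action is not None:
            # Automated sub-step: merge its results into the context.
            ctx.update(step.action(ctx))
        else:
            # Manual sub-step: fill the guidance with what automation found.
            instructions.append(step.guidance.format(**ctx))
    return instructions


# Phase 1: only validation is automated; diagnosis stays manual.
steps = [
    Step("validate", action=validate_incident),
    Step("diagnose", guidance="Link on {device} is {link_status}; check the physical port."),
]
result = run_procedure(steps, {"device": "router-7", "alarm": "LOS"})
```

In a later phase, the manual `diagnose` step could be given an `action` of its own without changing the rest of the procedure, which is the phased rollout described above.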

Most automation vendors have failed to make automation truly easy to develop, and have not taken into account that operations centers have few dedicated developers. Non-professional developers, such as L2 agents or Subject Matter Experts (SMEs), should be able to codify procedures into automated steps without advanced coding skills. This matters all the more in incident resolution, because any bottleneck in automation roll-out makes the solution ineffective.

2. Process Guidance

The first line of incident resolution when a human is required is the L1 agent. For resolution strategies to be effective, these L1 agents need to be able to quickly validate, diagnose and repair the problem. Most of the time, these L1 agents are less experienced and do not have the skill or knowledge to close the incident without precise, context-based guidance and support. The resolution strategy needs to accommodate this reality. Tools need to give L1 agents prescriptive, step-by-step guidance that hides the underlying complexities and simplifies the interaction with the tool. More importantly, the guidance tool needs to use automation at every possible opportunity (e.g. creating and updating tickets, validating the incident, gathering diagnostic information) to accelerate the resolution process and remove the scope for errors that L1 agents can inadvertently introduce.

3. Collaboration

The lifecycle of incident resolution is a team activity rather than an individual one. L1 agents need access to the right subject matter experts and contextual guidance to resolve each incident. When gaps in procedures or automations are found by L1 agents, they need to be able to seamlessly escalate to procedure / automation development teams to fill the gaps. When new automations / procedures are developed, key stakeholders and users need to be notified. L1 agents and SMEs need to have the ability to rate or comment on procedures and automations, or start a discussion thread on a specific ticket. All these collaborative activities need to happen seamlessly within the lifecycle of an incident resolution.

4. Insight

The sheer scale of incidents has created the need for deeper insight into their nature in order to devise a pragmatic strategy for resolution. Some key questions are: Which incidents occur most frequently or have the highest business impact? What resolution steps are agents following to solve an incident, and which steps are causing the most delay? Who are the most productive agents, and are they collaborating effectively? These questions need to be easy to answer in order to devise a strategy spanning automation and manual processes. The incident resolution tool needs to make such insight part and parcel of the core resolution capabilities. Automations can then be developed for the incidents with the strongest business impact, and guided procedures can be tightened in cases where L1 agents are consistently getting stuck.
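Two of these questions can be answered from even a very simple incident log. The following is a minimal sketch, assuming a hypothetical log format in which each incident records its type and the minutes spent on each resolution step; the field names and sample values are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical incident log: type plus minutes spent per resolution step.
incidents = [
    {"type": "link_down", "steps": {"validate": 2, "diagnose": 15, "repair": 10}},
    {"type": "disk_full", "steps": {"validate": 1, "diagnose": 4, "repair": 5}},
    {"type": "link_down", "steps": {"validate": 3, "diagnose": 25, "repair": 8}},
]


def most_frequent_type(incidents):
    """Return the incident type that occurs most often."""
    counts = Counter(incident["type"] for incident in incidents)
    return counts.most_common(1)[0][0]


def slowest_step(incidents):
    """Return the step with the highest average time across incidents."""
    minutes_by_step = defaultdict(list)
    for incident in incidents:
        for step, minutes in incident["steps"].items():
            minutes_by_step[step].append(minutes)
    averages = {step: sum(m) / len(m) for step, m in minutes_by_step.items()}
    return max(averages, key=averages.get)


top_type = most_frequent_type(incidents)  # candidate for new automation
bottleneck = slowest_step(incidents)      # candidate for tighter guidance
```

With this sample data, the most frequent type is `link_down` and the slowest step on average is `diagnose`, so those would be the first places to look when deciding what to automate or which guided procedure to tighten.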

5. Process Improvement

One of the most important reasons incumbent approaches to incident resolution fail is that they cannot be maintained over the long haul. Procedures and automations are not kept up to date and go stale. Gaps begin to emerge in the automation and procedure library, and there is no clear way to quickly identify these gaps and add new procedures and automations in an agile fashion. A major contributor to this situation is that the consumers of the procedures and automations are not empowered to create and maintain content. Dedicated content and automation development teams, separate from the users, are created, and over time a big chasm develops between the processes of the two groups. Without a clear content maintenance strategy, these systems lose credibility and agents simply stop using them, leading to their ultimate demise. Any successful incident resolution system needs a maintenance model that is seamless and integrated with the use of the tool.

Resolve is the only software tool in the market that fully integrates all the core capabilities – automation, process guidance, collaboration, insight and process improvement – to provide the market-leading resolution solution. It is not just the individual capabilities but the way these key pillars support and complement each other that magnifies the value to customers. Some highlights of Resolve:

  • Resolve addresses 100% of incidents through its support for both automations and guided manual resolutions.
  • Resolve takes the power of automation even deeper inside manual processes by automating sub-steps (e.g. incident validation, ticket creation and update). In addition, Resolve provides tools that empower even non-professional developers like L2 agents or SMEs to build automations with our RESOLVE Automation Builder.
  • Resolve supports a phased rollout approach so that customers can get started quickly and see immediate ROI. This is critical for the success of any IT project and its sustained funding. Over time, and as the business case demands, manual processes can be progressively transformed into automated processes.
  • Resolve provides the best tools to analyze and draw insights from the incident log, as well as from the use of resolution procedures, to rapidly fine-tune those procedures.
  • Resolve uniquely supports methodologies like Knowledge Centered Support (KCS), where the users of the procedures and automations are empowered to maintain them. This greatly helps the sustainability of the solution.

