New Paradigm for Network Operations Incident Resolution with Human-Guided Automation

September 25, 2018 • Resolve Staffer

Enterprises continue to struggle with the scale and impact of network incidents every day. When Network Operations cannot respond to these incidents fast enough, its ability to deliver mission-critical business services suffers. This exposes the business to significant risks, including lower customer satisfaction, customer attrition, lost revenue, financial penalties, and damage to the brand.

Network Incidents & Outages Impact All Industries

Financial & Banking

Today, transactions in this industry are largely digital. A network outage can be paralyzing to customers and catastrophic to the business. Financial institutions and banks depend on reliable and time-bound execution of financial transactions. In June 2018, Visa’s payment systems suffered a serious network outage before the weekend, leaving customers unable to make card payments. ATMs soon ran out of cash in several locations, further exacerbating the problem.

Retail

Millions of business transactions take place daily online or through a vast number of outlets and their point-of-sale (PoS) systems. A network outage impacting these outlets or the ecommerce system can hurt revenue, and businesses can even lose customers’ patronage forever. Macy’s payment processing system crashed on a critical shopping day, Black Friday 2017—a nightmare scenario for the business. Both in-store and online customers could not complete transactions, costing the chain millions of dollars.

Transportation

In 2017 a network and system outage caused the entire British Airways fleet to be grounded, leaving thousands of passengers stranded over one of the busiest weekends for travel in the UK. This had a serious financial impact via lost revenue and compensation, and it also damaged the perceived trustworthiness of the airline.

Network Issues are Frequent and Growing, While Remediation Methods are Inefficient and Slow

According to a global survey of hundreds of network ops professionals by Dimensional Research, 74% of respondents said network-related outages occur several times a year in their organization, and 59% reported that network complexity is only growing and network outages are more frequent than ever. Over 60% of the network-related issues were reported by end users, indicating that network incidents materially affect employees and customers. As for remediation, 79% reported that network issues take anywhere from hours to two days to remediate on average.

Businesses and Network Operations teams are not blind to these challenges, but current methods are inadequate. Diagnosing and resolving network incidents are largely manual processes that are prone to human error and extremely time-consuming. According to another study of network ops professionals, 71% of respondents reported primarily using command-line interface (CLI) tools for troubleshooting, and only 4% of teams indicated that they apply automation to network diagnosis and troubleshooting in a satisfactory way.

Frontline agents have to switch back and forth ("swivel chair") between multiple command-line tools and poorly maintained scripts. They also lack contextual, prescriptive procedural guidance to quickly validate, diagnose, and resolve network incidents. This leads to a high rate of escalations to expensive and scarce Level 2/3 agents, Subject Matter Experts (SMEs), and engineers, who then have to repeat steps already taken to address the incident, causing further delays.

The other significant challenge is that key operational knowledge lives in the minds of SMEs (aka “tribal knowledge”), who are spread across the organization. No satisfactory tools are available to capture this tribal knowledge and make it available to frontline agents so they can resolve incidents quickly. In the same survey cited above, 57% of respondents stated that the inability to codify and share best practices was hampering effective troubleshooting, and 45% stated that problems collaborating and coordinating across teams were a top challenge for network troubleshooting.

Automation and Leveraging Human Expertise are Essential to a Robust Network Incident Resolution Strategy

Automation accelerates the resolution process, reduces human errors, and helps Network Operations teams handle a growing network footprint and associated incidents.

Current automation tools fail to live up to their promise for many reasons:

  • They are hard to use and hard to maintain
  • They lack essential capabilities for incident resolution
  • They require professional software development skills, which are hard to find in operations teams and expensive to hire

Meanwhile, existing Knowledge Management tools also fail to deliver on their promise to capture tribal knowledge, provide context-specific operational guidance to frontline agents, and work in tandem with automation.

How can Human-Guided Automation help Network Operations teams in their approach to incident resolution?

Understanding Human-Guided Automation

Combining automation with human actions dramatically improves incident resolution capabilities for Network Operations. Human-Guided Automation brings together these two core ingredients of an effective and scalable incident resolution strategy, giving users the ability to:

  • Fully automate validation, diagnosis, and resolution of network incidents, with human approvals built into the automated workflow at key points for oversight
  • Use the results of automation to lead humans, including less-experienced Level 1 agents, to precise, context-specific guidance for diagnosing and resolving an incident
  • Automate component tasks within a larger manual process, with the results of automation available on demand to other automation and to human agents throughout the resolution process
  • Seamlessly embed automation within SME-approved manual procedures, so that results of automation can be consumed inline within manual actions to maximize agents’ productivity
  • Capture tribal knowledge without coding skills, preserving SME knowledge as automation for use by other individuals and teams

With this approach:

  • Incident types from the simplest to the most complex can be addressed by the right mix of automation and human expertise and judgment
  • The enterprise can implement an effective “left shift” strategy for the fastest response at the lowest cost. Incidents can be resolved with no human involvement; when a human does need to be involved, frontline agents or self-service users are able to resolve the incident. Costly and time-consuming escalations become a last resort.
  • Organizational knowledge no longer resides only in the minds of experts. It can be captured as procedures and automation and rapidly put to use for resolving incidents. This packaged “safe” automation can be shared with frontline teams like Service Desk and Level 1, letting them execute specific diagnostic and remediation actions on the network infrastructure without escalating, even though these frontline agents would not normally have permission to access those systems.
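
To make the idea of human approvals built into an automated workflow more concrete, here is a minimal Python sketch. It is an illustration only, not Resolve's product or API: the `Step` and `Runbook` classes, the step functions, the thresholds, and the context keys such as `packet_loss_pct` are all hypothetical, and the "automation" just reads values from a dictionary instead of touching real devices.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One automated task in a runbook (hypothetical structure)."""
    name: str
    action: Callable[[Dict], str]   # reads the shared context, returns a result
    needs_approval: bool = False    # pause for a human decision before running


@dataclass
class Runbook:
    """Runs steps in order and stops to escalate if a human rejects a step."""
    steps: List[Step]
    log: List[str] = field(default_factory=list)

    def run(self, context: Dict, approve: Callable[[Step, Dict], bool]) -> str:
        for step in self.steps:
            # Human oversight point: the approver sees everything gathered so far.
            if step.needs_approval and not approve(step, context):
                self.log.append(f"{step.name}: rejected, escalating")
                return "escalated"
            result = step.action(context)
            # Results stay in the context for later steps and for human agents.
            context[step.name] = result
            self.log.append(f"{step.name}: {result}")
        return "resolved"


# Simulated tasks; the thresholds are made up for illustration.
def validate(ctx: Dict) -> str:
    return "confirmed" if ctx["packet_loss_pct"] > 5 else "not reproduced"


def diagnose(ctx: Dict) -> str:
    return "interface errors" if ctx["crc_errors"] > 0 else "unknown"


def remediate(ctx: Dict) -> str:
    return f"reset {ctx['interface']}"


def build_runbook() -> Runbook:
    return Runbook([
        Step("validate", validate),
        Step("diagnose", diagnose),
        Step("remediate", remediate, needs_approval=True),  # risky, so gated
    ])


if __name__ == "__main__":
    incident = {"interface": "Gi0/1", "packet_loss_pct": 12, "crc_errors": 340}
    runbook = build_runbook()
    # In practice the approver would be a person; here it always says yes.
    outcome = runbook.run(incident, approve=lambda step, ctx: True)
    print(outcome)
    print("\n".join(runbook.log))
```

In this sketch, validation and diagnosis run unattended, while the remediation step waits for a yes or no from whoever is supplied as `approve`. If the approver declines, the run ends as an escalation, and the diagnostic results already collected remain in the context for the next person to use.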

Resolve Systems is the market leader and pioneer in the concept of Human-Guided Automation. Resolve has been deployed by the largest Enterprises, Communications Service Providers, and Managed Service Providers across the world and is proven to rapidly transform network incident resolution with this powerful concept. Explore Human-Guided Automation to transform incident resolution for network operations teams.
