Enterprises continue to struggle with the scale and impact of network incidents every day. An inability to respond quickly enough to these incidents impedes Network Operations' ability to deliver mission-critical business services. This exposes the business to significant risks, including lower customer satisfaction, customer attrition, lost revenue, financial penalties, and damage to the brand.
Today, transactions in the financial services industry are largely digital. A network outage can be paralyzing to customers and catastrophic to the business. Financial institutions and banks depend on reliable and time-bound execution of financial transactions. In June 2018, Visa’s payment systems suffered a serious network outage just before the weekend, leaving customers unable to make card payments. ATMs soon ran out of cash in several locations, further exacerbating the problem.
Millions of business transactions take place daily online or through a vast number of outlets and their point-of-sale (PoS) systems. A network outage affecting these outlets or the ecommerce system can hurt revenue and can even cost a business its customers permanently. Macy’s payment processing system crashed on Black Friday 2017, a critical shopping day and a nightmare scenario for the business. Neither in-store nor online customers could complete transactions, costing the chain millions of dollars.
In 2017 a network and system outage caused the entire British Airways fleet to be grounded, leaving thousands of passengers stranded over one of the busiest weekends for travel in the UK. This had a serious financial impact via lost revenue and compensation, and it also damaged the perceived trustworthiness of the airline.
According to a global survey of hundreds of network ops professionals by Dimensional Research, 74% of respondents said network-related outages occur several times a year in their organization, and 59% reported that network complexity is only growing and network outages are more frequent than ever. Over 60% of the network-related issues were reported by end users, indicating that network incidents materially affect employees and customers. As for remediation, 79% reported that, on average, network issues take anywhere from hours to two days to resolve.
Businesses and Network Operations teams are not blind to these challenges, but current methods are inadequate. Diagnosing and resolving network incidents are largely manual processes, prone to human error and extremely time-consuming. According to another study of network ops professionals, 71% of respondents reported primarily using command line interface (CLI)-based tools for troubleshooting, and only 4% of teams indicated they apply automation for network diagnosis and troubleshooting in a satisfactory way.
Frontline agents have to swivel-chair between multiple command-line tools and poorly maintained scripts. They also lack the contextual, prescriptive procedural guidance needed to quickly validate, diagnose, and resolve network incidents. The result is a high rate of escalations to expensive and scarce level 2/3 agents, Subject Matter Experts (SMEs), and engineers, who then have to repeat steps already taken, delaying resolution further.
The other significant challenge is that key operational knowledge lives in the minds of SMEs (so-called “tribal knowledge”), who are spread across the organization. No satisfactory tools are available to capture this tribal knowledge and make it available to frontline agents so they can resolve incidents quickly. In the same survey cited above, 57% of respondents stated that the inability to codify and share best practices was hampering effective troubleshooting, and 45% stated that problems collaborating and coordinating across teams were a top challenge for network troubleshooting.
Automation accelerates the resolution process, reduces human errors, and helps Network Operations teams handle a growing network footprint and associated incidents.
Current automation tools fail to live up to their promise for many reasons:
Meanwhile, existing Knowledge Management tools also fall short of their promise: they neither capture tribal knowledge and deliver context-specific operational guidance to frontline agents, nor work in tandem with automation.
How can Human-Guided Automation help Network Operations teams in their approach to incident resolution?
Combining automation with human actions can dramatically improve incident resolution capabilities for Network Operations. Human-Guided Automation brings together these two core ingredients of an effective and scalable incident resolution strategy.
Resolve Systems is the market leader and pioneer in the concept of Human-Guided Automation. Resolve has been deployed by the largest enterprises, Communications Service Providers, and Managed Service Providers across the world and is proven to rapidly transform network incident resolution with this powerful concept. Explore Human-Guided Automation to transform incident resolution for network operations teams.