A leading provider of advanced network communications and technology solutions for consumers, small businesses, enterprise organizations, and carrier partners across the U.S. wanted to use automation to better understand how bad weather affects its customers and to proactively improve their experience.
The Fortune 500 company would need to overcome a handful of pain points:
- Alarm volume was too great for technicians to handle manually.
- Alarm rationalization only went so far. Even after rationalization, too many alarms remained to investigate, correlate, and trace to a root cause.
- Mean Time to Ticket (MTTT) for outages was too high to achieve a mean time to resolution (MTTR) goal of less than four hours.
- Seventy percent of outages were first identified by customers and customer-facing centers.
- Process documents for alarms and event handling were outdated.
The company also had its sights set on two big business goals: advancing its scripting capabilities and reducing alert volume to 10,000 per week.
Network Operations Center: 4 Automation Use Cases for Transformation
The company chose its Network Operations Center (NOC) as the ideal place to start automating and to begin what would become a transformative change. Company leaders mapped their automation needs to their business goals and identified the four use cases that made the most sense for the organization.
Building on its existing automations, the company successfully transformed the NOC. Its alarm triage processes became so much more efficient that the need for a dedicated NOC surveillance organization was eliminated entirely. With the help of automation, IT professionals who had been bogged down triaging alarms could focus on remediation and on higher-value tasks that supported business goals.
1. Circuit Enrichment: Looking up Circuit ID numbers had been a manual process that took up an astounding amount of time. It limited the response team’s efficiency and productivity, especially considering the overwhelming quantity of alarms coming in.
Automation took over this task from the IT team, automatically looking up Circuit IDs and adding them to the Netcool alarm.
2. Maintenance Correlation: As expected, regularly scheduled maintenance windows generated hundreds of alarms, and the IT team had to spend time and effort on each one.
To remove this tedious busywork from IT's workload, automation was implemented to tag each alarm appropriately and then clear the alarms once the window closed.
3. Power Alarm Processing: Whenever a power outage occurred, the company relied on its IT team to recognize the resulting alarms from different locations. Because staff had to examine each and every alarm, the risk of mistakes was high.
With automation, staff no longer carried out this process by hand. Automation verified the alarms and escalated them to a technician, which allowed technicians to be dispatched immediately.
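The tag-then-clear logic behind maintenance correlation can be sketched in a few lines. This is a minimal illustration only, assuming hypothetical, simplified `Alarm` and `MaintenanceWindow` records and an illustrative `"maintenance"` tag; none of these names come from Netcool or Resolve, and the sketch is not the company's actual workflow.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class MaintenanceWindow:
    # Hypothetical window record: devices under maintenance and the time span.
    devices: set
    start: datetime
    end: datetime

@dataclass
class Alarm:
    # Hypothetical, simplified alarm record (not the Netcool schema).
    alarm_id: str
    device: str
    raised_at: datetime
    tags: set = field(default_factory=set)
    cleared: bool = False

MAINT_TAG = "maintenance"  # illustrative tag value

def _window_for(alarm, windows):
    """Return the first window covering this alarm's device and time, or None."""
    for w in windows:
        if alarm.device in w.devices and w.start <= alarm.raised_at <= w.end:
            return w
    return None

def tag_maintenance_alarms(alarms, windows):
    """Tag every alarm raised on a device while it was in a maintenance window."""
    for alarm in alarms:
        if _window_for(alarm, windows) is not None:
            alarm.tags.add(MAINT_TAG)

def clear_closed_windows(alarms, windows, now):
    """Clear tagged alarms whose window has closed; return the cleared IDs."""
    cleared = []
    for alarm in alarms:
        if MAINT_TAG not in alarm.tags or alarm.cleared:
            continue
        w = _window_for(alarm, windows)
        if w is not None and now > w.end:
            alarm.cleared = True
            cleared.append(alarm.alarm_id)
    return cleared

# Usage: one alarm inside a window on sw-01, one unrelated alarm on sw-02.
win = MaintenanceWindow({"sw-01"}, datetime(2017, 9, 1, 1, 0), datetime(2017, 9, 1, 3, 0))
alarms = [Alarm("a1", "sw-01", datetime(2017, 9, 1, 2, 0)),
          Alarm("a2", "sw-02", datetime(2017, 9, 1, 2, 0))]
tag_maintenance_alarms(alarms, [win])
print(clear_closed_windows(alarms, [win], datetime(2017, 9, 1, 4, 0)))  # ['a1']
```

Keeping tagging and clearing as separate passes mirrors the two steps described above: alarms are tagged as they arrive, and a later sweep clears them only after the window has ended.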
4. TDM Switch Diagnostics: The company’s IT team was also responsible for running diagnostics on the switches to identify fault packs.
Once this case was automated, fault packs were identified automatically and key details were escalated to technicians, so technicians could be dispatched to the right locations right away for faster remediation.
Weathering Heavy Alarm Volumes: 5 Phases to Reach More Automations and More ROI
Severe weather was the unstoppable force that struck the organization's operations and created a relentless storm of alerts. With what it called "Storm Mode Automation," the company was able to learn more about the impact on customers and get a handle on its weekly alarm volume.
The company also correlated network events and found that they were all related to a single site. Automation made the events relatively easy to verify, provided a straightforward process to follow with minimal remote work, and gave the team control by standardizing how the events were handled.
Phase 1: The company started using automation for alarm acknowledgement, verification, ticket creation, and routing to the right person for follow-up.
Phase 2: They increased the scale of automation to cover additional devices, so that automation touched 85 percent of the total alarm volume.
Phase 3: The IT team increased the scope of automation with additional functionality for technician dispatch. Right after dispatch, automation followed up and canceled or closed the ticket.
Phase 4: "Storm Mode Automation" was correlated with Digital Subscriber Line Access Multiplexer (DSLAM) monitoring to assess the alarms' impact on customers. Automation also verified customer services as alarms were received.
Phase 5: The company added in lower volume devices until 100 percent of the alarm volume was touched by automation.
Hurricane Harvey and Hurricane Irma: True Tests of Automation Strength
It was Aug. 25, 2017, when Hurricane Harvey made landfall along the Texas coast near Port Aransas, according to the National Weather Service (NWS). The Category 4 storm had a devastating impact and continued its damaging path inland to Victoria, Texas. The hurricane then slowed its forward motion and dropped tremendous rainfall as it crept forward for five more days.
Fearing Harvey's catastrophic damage and its effect on customers, the company was skeptical about relying on automation during a storm that would produce such a surge in alarm volume.
Hurricane Irma followed closely behind Harvey. After forming in the Atlantic Ocean on Aug. 30, it wreaked havoc in the Caribbean. Upgraded to a Category 5 storm on Sept. 5, Irma reached a rare wind speed of 185 mph, making it only the fifth hurricane ever to reach that speed. By the time Irma reached the Florida Keys on Sept. 10, its winds had slowed to 130 mph, and it had weakened to Category 3 intensity when it made landfall near Marco Island. As Irma hit Florida, tropical-storm-force winds extended up to 400 miles from the center, and hurricane-force winds extended out to 80 miles.
Seeing Irma's unsettling spike in alarms as an opportunity for automation, the company changed its operations philosophy. For four days after Irma hit the Florida Keys, the company leaned on automation, testing whether it could hold up and still provide benefits when alarm volume was unimaginable.
It was a test worth taking. The company saw a drastic change for the better, including 2,244 clear events, 1,207 correlations, and 710 tickets created.
Resolve Automation in Action: A Fiber Cut’s 4 Key Indicators
Fiber cuts were another use case for the organization. Resolve worked with the organization's L-3 engineers to identify key indicators, as well as the types of alarms that came in when a fiber was cut. These indicators were codified into a Resolve Workflow, which then began successfully correlating alarms and identifying the issue. The four key indicators include:
- A Netcool alarm is generated when a fiber is cut.
- Resolve automation correlated all alarms and events related to the fiber cut at the location.
- Resolve automation notified the parties involved, as well as external customers, which dramatically reduced mean time to resolution (MTTR).
- Resolve automation dispatched a technician to remediate the issue at the location where it was reported.
In this use case's automated remediation, an alarm was triggered for a customer port. The Resolve engine identified alarms from customers as they came in, and the alarms were then automatically triaged and diagnosed. Finally, automation ran diagnostics to check for a loop, then attempted to drop the loop and rebuild the circuit to see whether it came back up.
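The correlation step in the list above, grouping many alarms into one probable fiber-cut event per location, can be illustrated with a sliding time window. This is a minimal sketch under stated assumptions: alarms are simplified to hypothetical `(site, timestamp)` pairs, and the five-minute window and three-alarm threshold are illustrative values, not the indicators the L-3 engineers actually codified into the Resolve Workflow.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def correlate_by_site(alarms, window=timedelta(minutes=5), min_alarms=3):
    """Group (site, timestamp) alarms by site and flag sites where at least
    `min_alarms` alarms fall within any span of length `window`.

    Returns {site: largest alarm count seen inside one window}."""
    by_site = defaultdict(list)
    for site, ts in alarms:
        by_site[site].append(ts)

    suspects = {}
    for site, stamps in by_site.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # Advance the left edge until the span fits inside `window`.
            while stamps[end] - stamps[start] > window:
                start += 1
            count = end - start + 1
            if count >= min_alarms:
                suspects[site] = max(suspects.get(site, 0), count)
    return suspects

# Usage: a burst of three alarms at site-A, two spread-out alarms at site-B.
t0 = datetime(2017, 8, 26, 3, 0)
alarms = [("site-A", t0),
          ("site-A", t0 + timedelta(minutes=1)),
          ("site-A", t0 + timedelta(minutes=2)),
          ("site-B", t0),
          ("site-B", t0 + timedelta(hours=1))]
print(correlate_by_site(alarms))  # {'site-A': 3}
```

A burst of alarms at one site within a short span is a common signature of a single upstream failure; in practice, the grouping key and thresholds would come from the engineers' own indicators.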
The Calm After the Storm: The Benefits of Resolve Automation
Using automation at the height of severe weather proved to be of great value to the company, starting with a jump in return on investment (ROI) and direct and indirect cost savings. As with many automation cases, the company started with the tasks that took too long to complete and were too numerous for humans to manage. Not only can the company accomplish more with fewer resources, but it also benefits from having its IT engineers available for higher-value work that supports the business and moves the needle.
After all, the company set out to better understand its customers' experiences during severe weather, and its staff was now free to contribute to goals like this one that require human insight and analysis.
The company also saw marked improvement in meeting its service level agreements (SLAs) by restoring service to a normal state quickly enough.
Alerts can never be prevented entirely, and customers will inevitably experience power outages when thunderstorms, heavy precipitation, and high winds roll through. However, with too many alarms to manage, the company could not address outages quickly enough to meet its SLA terms. Failing to stay within the acceptable outage range defined in the contract puts a lot at risk: an SLA breach can trigger financial penalties and lead to damaged relationships and legal trouble.
Resolve's automation enhanced the entire alarm management process by adding consistency at every step. As a result, the company improved and maintained data integrity, refined its ITSM records and resolutions, and built detailed audit trails to support root cause analysis.
Resolve's solutions were remarkably easy for the company to use, and developers were trained quickly: it took only four weeks to get the company's automation fully up and running and functioning without flaw.
No matter your industry, Resolve can help you weather any storm caused by inefficiencies and drawn-out, unproductive processes. Reach out to Resolve to start automating according to your company's needs.