
When Incident Resolution Meets “Play Next” on Netflix: Practical IT Automation

August 10, 2018 • Resolve Staffer

After a long day of resolving too many incidents, you may go home to binge-watch Netflix and fall asleep by 8 p.m. After seeing the “Are you still watching?” prompt one too many times (trust us, we’ve been there too), you eventually give up. But when you’re trying to put the day’s incidents and unnecessary escalations behind you, zoning out to the latest Stranger Things is sometimes necessary. Hopefully you’re not having nightmares about the next critical incident or tomorrow’s workload. After all, keeping up with the volume of events generated by IT monitoring tools is a real challenge: only 52% find it manageable, while 13% struggle and 1% are completely overwhelmed. “Manageable” isn’t exactly positive, but at least it’s not escalated phone calls at 3 a.m.!

Working on an enterprise IT Operations team means you’re handling hundreds, even thousands, of incidents every week. Incidents flood in from end users and from monitoring tools watching numerous applications, IT infrastructure systems, devices, and network elements. We’re not saying give up binge-watching, but we are recommending that you give up burdensome manual incident resolution.

Resolving incidents quickly protects operational efficiency, employee productivity, customer satisfaction, service level agreements, revenue, and more. So, what’s the answer? Say it with me now: automation. Don’t get scared.

Consider the incidents you experience: they span all kinds of use cases, severity levels, and degrees of criticality. Automation has a significant role to play in validating, diagnosing, and resolving these incidents. It can reduce Mean Time to Acknowledge (MTTA) and Mean Time to Resolve (MTTR), improve service delivery, eliminate human error, and support the business better without adding headcount.

But there is a trick to automation, and this is where Netflix comes in. You’re enjoying a program and the next episode autoplays. When a show is over, Netflix suggests another one you may like based on your viewing history. Is that the dream? Can incident resolution automation work the same way?

Automation projects often turn out to be time-consuming, expensive endeavors that fail to justify further investment. Automation is not as easy as watching television, but your methodology can be.

Resolve & Chill: Use Incremental IT Automation

Beyond “play next,” and without the judgment of “yes, I’m still watching, please stop asking,” Netflix tells you where to go next. When it comes to incident resolution automation, so does Resolve. Building your automation around our use cases keeps it moving you forward. It’s the age-old crawl, walk, run approach. To succeed, you need:

  • A multi-phase roll-out of automation
  • Clearly defined use cases for each phase
  • Targeted outcomes for each phase

And never forget the people side of the solution. At the end of the day, people make up your team, and with automation in play they will have the capacity to work on other projects.

Episode 1: Press Play on IT Automation for Quick Wins

Get started. Press play. What is the #1 use case that can be handled with end-to-end automation? Fully automate recurring, simple use cases that burden your Service Desk or get escalated. Automating these tasks in the incident resolution process can yield immediate value. Good candidates include:

  • Password resets
  • Disk space cleanup
  • Server health checks & reboots
  • Wi-Fi access point checks

With minimal investment of time and development resources, the incidents above can be validated, diagnosed, and resolved automatically, giving you more time to chill (or, really, to work on more valuable projects) and removing the alert noise generated by simple tasks.
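
To make “validated, diagnosed, and resolved automatically” concrete, here is a minimal Python sketch of a disk space cleanup flow. It is an illustration, not Resolve’s implementation: the function names, the 90% usage threshold, the one-week age cutoff, and the assumption that a given scratch directory is safe to purge are all hypothetical choices.

```python
import shutil
import time
from pathlib import Path

# Hypothetical policy values, chosen only for illustration.
USAGE_THRESHOLD = 0.90            # treat more than 90% used as a real incident
MAX_AGE_SECONDS = 7 * 24 * 3600   # files untouched for a week are cleanup candidates


def validate(mount: str, threshold: float) -> bool:
    """Validate: is the filesystem really above the usage threshold right now?"""
    usage = shutil.disk_usage(mount)
    return usage.used / usage.total > threshold


def diagnose(scratch_dir: Path, max_age: float = MAX_AGE_SECONDS) -> list[Path]:
    """Diagnose: find stale files in a directory assumed to be safe to purge."""
    cutoff = time.time() - max_age
    return [p for p in scratch_dir.rglob("*")
            if p.is_file() and p.stat().st_mtime < cutoff]


def resolve(stale_files: list[Path]) -> int:
    """Resolve: delete the stale files and return the number of bytes reclaimed."""
    reclaimed = 0
    for path in stale_files:
        reclaimed += path.stat().st_size
        path.unlink()
    return reclaimed


def handle_disk_alert(mount: str, scratch_dir: Path,
                      threshold: float = USAGE_THRESHOLD) -> str:
    """Run validate -> diagnose -> resolve and report the outcome."""
    if not validate(mount, threshold):
        return "closed: usage below threshold (false alarm)"
    stale = diagnose(scratch_dir)
    if not stale:
        return "escalate: disk is full but nothing is safe to clean"
    freed = resolve(stale)
    return f"resolved: removed {len(stale)} file(s), {freed} bytes"
```

Note the three possible outcomes: a false alarm closes itself, a fixable problem is resolved, and anything the automation can’t safely handle is escalated to a person.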

When you’ve conquered episode 1, tell the automation you’re still “watching” and press play for the next episode.


Episode 2: Press Mute on Alert Noise

Validating whether an event is truly a problem incident, and not just a duplicate or something already taken care of, is often a time-consuming task that can sit in an agent’s queue for far too long. Your L1s probably also lack the permissions to handle these events on their own! Mute the noise by automating alert validation, particularly for:

  • CPU Usage
  • Application Performance
  • Host Down
  • URL Availability
  • Link Down
  • Device Fan Failure
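
As an illustration of what automating alert validation can mean, the sketch below suppresses an alert when the same host and check already produced an actionable alert within a time window, or when an open incident already covers it. The Alert fields, the 300-second window, and the open_incidents set are hypothetical assumptions; in practice these would come from your monitoring and ITSM tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    host: str         # hypothetical field names for illustration
    check: str        # e.g. "CPU Usage", "Host Down"
    timestamp: float  # seconds since the epoch


def validate_alerts(alerts, open_incidents, window=300.0):
    """Split alerts into actionable ones and ones to suppress.

    alerts: Alert objects, assumed sorted by timestamp.
    open_incidents: set of (host, check) pairs someone is already working on.
    window: seconds after an actionable alert during which repeats are duplicates.
    """
    last_actionable = {}  # (host, check) -> timestamp of the last actionable alert
    actionable, suppressed = [], []
    for alert in alerts:
        key = (alert.host, alert.check)
        if key in open_incidents:
            suppressed.append((alert, "already being handled"))
            continue
        previous = last_actionable.get(key)
        if previous is not None and alert.timestamp - previous <= window:
            suppressed.append((alert, "duplicate"))
            continue
        last_actionable[key] = alert.timestamp
        actionable.append(alert)
    return actionable, suppressed
```

Measuring the window from the last actionable alert, rather than from the last repeat, means a flapping check that never goes quiet still raises a fresh alert once the window has passed.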

Play Next Series: When You’ve Automated Simple Use Cases, Handle the Complex

When tackling simple use cases with IT automation, it’s easy to see the benefits quickly. In fact, with end-to-end automation, you may not even notice how many incidents Resolve is handling in the background.

As with Netflix, once you’ve rolled out automation to handle password resets, server health checks, and application performance issues, it’s time to play the next series: more complex use cases for more impact.

Incident resolution automation that validates, diagnoses, and resolves incidents empowers everyone to tackle more complex use cases through human-guided automation. Where human intervention is required, frontline operators should receive context-specific guidance and steps within a larger incident resolution runbook. Escalations to Level 2 and Level 3 become less frequent, and knowledge shifts left to the frontline. This approach handles frequent and more challenging use cases such as:

  • Web Service Incidents
  • Directory Service Incidents
  • DB Service Performance Degradation
  • Host CPU, Memory, Storage Utilization
  • VPN Tunnel Failure
  • Proactive health checks of Linux servers, Windows servers, and SQL Server instances and databases
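
Human-guided automation can be sketched as a runbook whose automated checks run first, with the operator receiving context-specific guidance only at the step that fails. The following Python sketch assumes a hypothetical Step structure and a plain dictionary of incident context; it illustrates the idea and is not Resolve’s runbook format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    check: Callable[[dict], bool]  # automated diagnostic; True means this step passed
    guidance: str = ""             # operator instructions if the check fails ("" = escalate)


def run_runbook(steps: list[Step], context: dict) -> tuple[str, list[str]]:
    """Run diagnostic steps in order and decide who handles the failure.

    Returns ("all_checks_passed" | "operator_action" | "escalate", log lines).
    """
    log: list[str] = []
    for step in steps:
        if step.check(context):
            log.append(f"PASS {step.name}")
            continue
        log.append(f"FAIL {step.name}")
        if step.guidance:
            # Human-guided path: give the frontline operator context-specific next steps.
            log.append(step.guidance.format(**context))
            return "operator_action", log
        # No known fix at this level: hand off to Level 2/3 with the log attached.
        return "escalate", log
    return "all_checks_passed", log
```

For example, with a hypothetical VPN context where the peer gateway answers but the tunnel is down, the runbook passes the first check and returns "operator_action" along with an instruction filled in for that specific tunnel and device. If the peer is unreachable and that step carries no guidance, the incident escalates instead.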

While addressing the more complex use cases above, take care of the supporting tasks that ensure this phase succeeds and that lay a foundation for potential self-service, for issues ranging from customer portal or ERP outages to firewall and VPN throughput saturation.

As improvements continue, Resolve also provides analytics and dashboards for insight into the entire resolution lifecycle, and it enables cross-department collaboration for faster remediation. Empowering end users to solve frequent incidents without calling the Service Desk reduces false alerts, and it lets SMEs capture their tribal knowledge in complex automations that are rolled out without developers.

Resolve & Solve: Plug and Play the Right Automation with the Right Incident Resolution Platform

Although it may sound too simple, the first step is just getting started with automation. Outlining the next steps, and the next use cases to tackle, is how IT automation frees up your operations team. Resolve provides the capabilities to start small, with simple use cases like password resets, and scale out to core services and other use cases.

For more information on Resolve’s IT Automation Use Cases, visit the solutions pages...


About the Author, Resolve Staffer:

This post was written by one of the awesome contributors on the Resolve team.
