At the risk of sounding like a broken record, we all know that Network Operations Centers (NOCs) and IT Ops are under constant pressure to deliver greater efficiency and productivity while managing an infrastructure that keeps growing in scale and complexity. Add new DevOps processes and new technologies such as containers, network and storage virtualization, and hybrid cloud, and the picture can look intimidating.
There is no secret sauce – you need to understand the strategic importance of various aspects of your environment and processes and make the necessary investments to make those areas as efficient and streamlined as possible.
In a typical IT and NOC environment, the central system that drives human activity and workflow is an event management system such as IBM Netcool, CA Spectrum, or HP Operations Manager. These systems consolidate and normalize an extremely large volume of events into a much smaller number, in the hope that the operations team has sufficient capacity, knowledge and experience to handle them. Over time, tools like Splunk Enterprise, performance management and other monitoring tools are added. These feed additional alerts into the event management system and provide deeper insight and analytics into the applications, systems and networks that they monitor.
The fundamental challenge for IT and Network Ops is how to scale the capacity of the operations team so that it can respond to the increasing volume and complexity of alerts.
To meet these challenges, you need to do the following things well:
Splunk IT Service Intelligence (Splunk ITSI) gives your 20-year-old event management tools a fresh makeover. What if, instead of “normalizing” the rich event data into a common data model, you were able to keep all that information, drive business analytics and visual dashboards in real time, and streamline your workflow on a single platform?
Many organizations already use Splunk Enterprise to:
Splunk ITSI makes a strong case that this should all be a single integrated platform, avoiding the unnecessary friction of going back and forth. Below we’ll explore the proposition in more detail, especially when we add Resolve into the mix: Splunk ITSI monitors and manages the alerts, while Resolve automates and accelerates their resolution.
The obvious key to scaling operations is automation. How you approach and adopt automation, however, often determines whether you spend a lot of cycles for limited results or quickly see significant ROI for your efforts. A good place to start is by examining your event and ticket data. Many IT operations teams already feed their event and ticket data into Splunk Enterprise, which makes it easy to visualize the data and get answers to these types of questions:
The analysis provides a starting point for identifying potential candidates for automation and for discussion with subject matter experts. In our experience, only 10-15% of alerts can be fully automated end to end. The remaining alerts call for a human-guided automation approach, which integrates automation with process guidance and knowledge instructions as described in the following section.
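As a rough illustration of this kind of analysis, the sketch below ranks alert types by volume and reports how often each one cleared without human action. It is a minimal example under stated assumptions, not how Splunk Enterprise or Resolve performs the analysis: the `Event` record, its `alert_type` and `auto_cleared` fields, and the sample data are all hypothetical stand-ins for whatever your exported event and ticket data contains.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    alert_type: str     # hypothetical label, e.g. "link_down"
    auto_cleared: bool  # True if the alert cleared without human action


def rank_automation_candidates(events: Iterable[Event], top_n: int = 3):
    """Return (alert_type, count, self_cleared_ratio) for the noisiest alert types."""
    totals = Counter()
    cleared = Counter()
    for event in events:
        totals[event.alert_type] += 1
        if event.auto_cleared:
            cleared[event.alert_type] += 1
    # most_common() orders alert types from highest to lowest volume
    return [
        (alert_type, count, cleared[alert_type] / count)
        for alert_type, count in totals.most_common(top_n)
    ]


# Hypothetical sample data, for illustration only
sample = [
    Event("link_down", True), Event("link_down", True), Event("link_down", False),
    Event("disk_full", False), Event("disk_full", False),
    Event("cpu_high", True),
]

for alert_type, count, ratio in rank_automation_candidates(sample):
    print(f"{alert_type}: {count} events, {ratio:.0%} self-cleared")
```

High-volume alert types with a high self-cleared ratio are the ones worth raising first with subject matter experts.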
The full automation candidates, however, can significantly reduce the human workload and often fall into a category of automations we refer to as “event validation.” Example use cases:
A typical NOC at a Communications Service Provider (CSP) may receive several million events per day. Event validation automations are a good place to start: they can significantly reduce the number of actionable alerts and deliver ROI.
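One way to picture an event validation automation is a re-check that runs before an alert reaches a person. The sketch below is a generic illustration, not Resolve's implementation: `validate_event`, the `probe` callable, and the `attempts` and `delay_seconds` parameters are hypothetical names, and the probe would in practice be whatever diagnostic confirms that the fault is still present.

```python
import time
from typing import Callable


def validate_event(
    probe: Callable[[], bool],
    attempts: int = 3,
    delay_seconds: float = 0.0,
) -> str:
    """Re-check an alert condition before a person is asked to act on it.

    probe() returns True while the fault is still present. The event is
    "actionable" only if every re-check still sees the fault; otherwise it
    is "cleared" and can be closed without human work.
    """
    for attempt in range(attempts):
        if not probe():
            return "cleared"
        if delay_seconds and attempt < attempts - 1:
            time.sleep(delay_seconds)  # wait before the next re-check
    return "actionable"


# A transient fault: present on the first check, gone on the second
transient = iter([True, False])
print(validate_event(lambda: next(transient)))

# A persistent fault: still present on every check
print(validate_event(lambda: True))
```

Only the events that come back as actionable need a technician, which is where the reduction in workload comes from.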
The last element of scaling the operations team is enabling lower tiers or broader teams to take on more of the work of triaging and fixing repetitive problems, which reduces escalation to higher tiers with limited resources. This involves providing clear process guidance, instructions and decision support, along with automation that lets technicians carry out diagnostic and repair tasks without needing direct access to the systems or the training required to run the commands themselves.
Unlike a traditional knowledge management system or wiki, Resolve’s approach to process guidance embeds the automation directly within the process instructions rather than requiring users to switch to a separate automation tool. This concept, human-guided automation, transforms how automation can be used to accelerate complex processes: a technician keeps decision and control, which avoids the risk of developing a full, end-to-end automation with its many possible unknowns.
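The idea of embedding automation inside process instructions can be sketched as a procedure whose steps optionally carry an action that runs only when the technician approves it. This is a conceptual illustration only and does not represent Resolve's product or API: `Step`, `run_guided`, the `approve` callback, and the example steps are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    instruction: str                            # guidance shown to the technician
    action: Optional[Callable[[], str]] = None  # embedded automation, if any


def run_guided(steps: List[Step], approve: Callable[[Step], bool]) -> List[Tuple[str, str]]:
    """Walk the procedure; run an embedded action only when the technician approves."""
    log = []
    for step in steps:
        if step.action is None:
            log.append((step.instruction, "manual"))       # guidance only
        elif approve(step):
            log.append((step.instruction, step.action()))  # technician said yes
        else:
            log.append((step.instruction, "skipped"))      # technician declined
    return log


# Hypothetical procedure for illustration
procedure = [
    Step("Confirm the affected service with the customer"),
    Step("Run the interface diagnostic", action=lambda: "interface up, no errors"),
    Step("Restart the service", action=lambda: "service restarted"),
]

# The technician approves diagnostics but declines the restart
for instruction, outcome in run_guided(procedure, lambda s: "diagnostic" in s.instruction):
    print(f"{instruction}: {outcome}")
```

The decision stays with the technician at every step, while the commands themselves run without the technician needing direct access to the target systems.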
Resolve Human-Guided Automation provides the following benefits:
Unlike traditional knowledge management systems or wikis, Resolve process guidance is “actionable” rather than optional reference material that goes unused. Resolve engages technicians because the embedded automation saves them time and enables them to carry out tasks that previously would have had to be escalated, while ensuring that a consistent and streamlined process is followed.
Resolve and Splunk ITSI Workflow
Watch the video and see Resolve in action with Splunk ITSI.
Splunk ITSI Demo