So, you’ve spent some money and you’re the proud owner of a shiny new AIOps tool that helps improve your Network Operations.
Network alarms are now usable, but with all the constant monitoring, supervision, and incident management, your Network Operations Center (NOC) is still overwhelmed.
It’s time to pull out another stop.
Tapping into automation gives today’s NOCs the efficiency and productivity that business growth, if not survival, demands. Automation also lets organizations get the full value of their AIOps investment.
Automatically Troubleshooting and Responding to RAN Hanged Cells
What might seem like a fantasy is actually a reality: a world where AIOps quiets the noise and automation runs to remediate issues. For telecommunications companies, troubleshooting and responding to hanged cells in a Radio Access Network (RAN) is an ideal use case.
A hanged cell can occur for a variety of reasons so it’s important to troubleshoot it correctly and log the details for future reference.
Troubleshooting and resolution can be completed automatically. In the Resolve workflow, a hanged cell alarm carries a node name and a corresponding ticket identification number. Resolve automatically creates an incident in an incident management solution like BMC Remedy and summarizes its actions in web notes. The Resolve Diagnostics and Troubleshooting dashboard, within the BMC Remedy user interface, keeps everything together so IT staff don’t have to switch back and forth between applications. Resolve displays which steps in the automation failed, completed successfully, or require further assistance.
The Resolve dashboard tells users that it’s received and acknowledged the event, and it displays information on related historic activity.
Resolve can also integrate with many IT Service Management tools, such as ServiceNow, and with fault management systems to display the number of tickets recorded for a specific Configuration Item (CI). It also raises a visible warning when a device experiences enough faults to cause concern.
Resolve then connects into an Operations Support System (OSS), such as the Ericsson platform, via a Secure Shell (SSH) session, and loads the objects it needs to interact with the relevant network elements. Resolve initiates the remote restart of the affected Radio Base Station (RBS), with the required restart reason attached to the command.
Once the RBS has restarted, Resolve connects back into it to confirm that the cell status is correct and the radio links are enabled, for instance by looking for non-zero results in its validation checks. Next, Resolve validates that the original event has cleared following the restart, then updates the ticket and closes it.
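The restart, verify, and close sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FakeNetwork`, `remediate_hanged_cell`, the node name, and the ticket number are all hypothetical stand-ins, not Resolve or Ericsson APIs, and the toy restart always succeeds.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNetwork:
    """Hypothetical in-memory stand-in for the OSS session (not a real API)."""
    enabled_links: dict = field(default_factory=dict)  # node name -> enabled radio links
    active_alarms: set = field(default_factory=set)    # node names with an open alarm

    def restart(self, node: str, reason: str) -> None:
        # In this toy model a restart always restores the links and clears the alarm.
        self.enabled_links[node] = 3
        self.active_alarms.discard(node)


def remediate_hanged_cell(net: FakeNetwork, node: str, ticket_id: str) -> dict:
    """Restart the node, verify it, confirm the alarm cleared, and summarize."""
    notes = [f"{ticket_id}: hanged cell alarm received for {node}"]

    net.restart(node, reason="hanged cell")
    notes.append("restart issued")

    links_ok = net.enabled_links.get(node, 0) > 0   # the non-zero validation check
    alarm_cleared = node not in net.active_alarms   # original event no longer active
    notes.append(f"links enabled: {links_ok}, alarm cleared: {alarm_cleared}")

    # Close the ticket only when both checks pass; otherwise hand over to staff.
    status = "closed" if links_ok and alarm_cleared else "escalated"
    return {"ticket": ticket_id, "status": status, "notes": notes}


net = FakeNetwork(enabled_links={"RBS-001": 0}, active_alarms={"RBS-001"})
result = remediate_hanged_cell(net, "RBS-001", "INC0001")
print(result["status"])
```

Keeping the notes as a list mirrors the idea of writing every action back to the ticket, so the ticket stays the record of what the automation did.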
Achieving Zero-touch Maintenance with an AI-enabled Cognitive NOC
“Self-this, self-that” … is the theme of today’s NOC visions, which aim to become totally autonomous and to minimize, if not eliminate, the need for human help and intervention.
Zero-touch networks can self-heal, self-configure, self-optimize, and self-protect, which frees up IT staff and reduces operating costs. They allow people to focus on critical tasks and enable businesses to generate more revenue.
In the journey from manual to autonomous operations, today’s NOCs sit at roughly the halfway point: automated workflows and automatic task execution. Typically, automated workflows replace repetitive tasks, but IT staff still need to determine how each workflow is designed and how its tasks should be performed. Zero-touch maintenance hasn’t been achieved … yet.
Eventually, as more processes become automated, humans can move away from handling things like unexpected events and telling the system what to do in unpredictable situations.
Enter artificial intelligence (AI) – the enabler of a fully autonomous NOC, also known as the Cognitive NOC. The Cognitive NOC employs AI and machine learning (ML) to improve network management and operations. Working with IT process automation (ITPA), the Cognitive NOC sends the efficiency and effectiveness of network operations through the roof: it reduces downtime, significantly lowers operational costs, and helps overcome the challenges of optimizing network performance.
AI and ML can transform Cognitive NOCs when used to their full potential. ITPA is the essential element for making the Cognitive NOC the best it can be, because it automates and accelerates routine jobs like data collection and analysis. For example, ITPA helps ensure that AI and ML models are trained on high-quality, consistent information. At the same time, automation helps keep the models that rely on this data accurate and relevant.
In another instance, ITPA automates the deployment of ML models in the Cognitive NOC and shortens their time to value.
Auto-remediating Network Issues with Tailor-made IT Incident Response
It only gets better from here. While the NOC might no longer need any hand-holding, no one is completely immune from outages and unexpected events, and the immediate rush to get back to normal when they happen.
The way alarms are triaged and remediated can make or break a business that relies on fast mean time to resolution (MTTR) and addressing high-priority alerts. Automation seals the deal when it comes to focusing on the alerts that matter. It’s all about automated incident response to reduce the amount of time it takes to diagnose, respond to, and remediate incidents.
Tailor-made IT incident response allows organizations to meet MTTR and other goals in four steps using Resolve Actions:
1. Automatic Trigger
Automations can be triggered from observability or AIOps tools, or from many different ITSM platforms, using Resolve Actions. Resolve offers integrations with most common tools, and these are already available for download.
2. Triage and Validation
The alert data is parsed for key information that’s needed for triaging, like node name and IP address. The alert is validated to rule out a false positive through predefined tests. Once complete, an IT ticket and its priority are then created automatically.
3. Diagnostics
Valid alerts kick off comprehensive diagnostics across all related IT systems and application components. Automation is triggered within minutes of the alert, so diagnostics finish much sooner than they would with a manual method. At every step along the way, this information is updated in real time in the IT ticket, which serves as the system of record and a detailed audit log.
4. Remediation
Based on the information uncovered in the previous step, a corrective action is then taken to remediate the issue. Remediation steps for common, recurring incidents are codified so that automation can perform them. Tasks such as restarts or low-disk-space cleanup can be automated to allow for self-healing infrastructure.
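The steps above can be sketched as a small Python pipeline: parse and validate the alert, open a prioritized ticket, then look up a codified remediation. Everything here is an assumption made for illustration, including the alert fields (`node`, `ip`, `severity`, `type`), the priority mapping, and the two remediation functions. None of it is the Resolve Actions API.

```python
import ipaddress
from typing import Optional

# Hypothetical severity-to-priority mapping; real values depend on the ITSM setup.
PRIORITY_BY_SEVERITY = {"critical": "P1", "major": "P2", "minor": "P3"}


def restart_service(node: str) -> str:
    return f"restarted service on {node}"


def clean_disk(node: str) -> str:
    return f"freed disk space on {node}"


# Codified remediations for common, recurring incidents (illustrative only).
REMEDIATIONS = {"service_down": restart_service, "low_disk_space": clean_disk}


def triage(alert: dict) -> Optional[dict]:
    """Return a ticket draft for a valid alert, or None for a suspected false positive."""
    node = alert.get("node", "").strip()
    try:
        ip = ipaddress.ip_address(alert.get("ip", ""))
    except ValueError:
        return None  # malformed or missing address: treat the alert as invalid
    if not node:
        return None  # no node name: nothing to act on
    priority = PRIORITY_BY_SEVERITY.get(alert.get("severity", ""), "P4")
    return {"node": node, "ip": str(ip), "priority": priority}


def respond(alert: dict) -> dict:
    """Triage the alert, then remediate it or escalate, logging each step."""
    ticket = triage(alert)
    if ticket is None:
        return {"status": "discarded", "log": ["alert failed validation"]}
    log = [f"ticket opened for {ticket['node']} at {ticket['priority']}"]
    action = REMEDIATIONS.get(alert.get("type", ""))
    if action is None:
        log.append("no codified remediation; escalating to staff")
        return {**ticket, "status": "escalated", "log": log}
    log.append(action(ticket["node"]))
    return {**ticket, "status": "resolved", "log": log}


print(respond({"node": "db-01", "ip": "10.0.0.5",
               "severity": "critical", "type": "low_disk_space"}))
```

A lookup table keeps the codified remediations separate from the triage logic, so a new recurring incident type only needs a new entry, and anything without an entry is escalated to staff instead of guessed at.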