
Episode #30: How AIOps Will Hasten The Digital Transformation Of Data Centers

In today’s podcast, we interview Gabby Menachem – CEO of Loom Systems.

In just about every data center worldwide, there’s a lot of experience, expertise, and instinct stored in the brains of the people keeping that data center running. However, the surge in demand for data center services, combined with the increasing complexity of their IT infrastructures, is putting a tremendous burden on their staff. It turns out that while hardware and software may be scalable, people are not. Further exacerbating matters is that previous concerns about the age of the equipment have given way to concerns about data center staff aging faster than the equipment. These converging predicaments are fueling the search for a solution. Enter AIOps, which at its essence is a way to bottle a data center’s tribal knowledge and supercharge the speed at which it’s applied.

To learn more about how an enterprise can benefit from AIOps, we turn to Gabby Menachem, CEO of Loom Systems, which publishes an AI solution that predicts IT incidents before they impact operations. Gabby shares with us many specific use cases where AIOps can add both value and competitive advantage to an enterprise. Along the way, we’ll learn how AIOps might ultimately allow organizations to reduce the skillset required for some IT positions, how rapid determination of incident root cause can unexpectedly save an organization millions of dollars, and how AIOps can not only cut costs but surprisingly help grow a business.


Guy Nadivi: Welcome everyone. My name is Guy Nadivi and I’m the host of Intelligent Automation Radio. Our guest on today’s episode is Gabby Menachem, CEO of Loom Systems, publishers of an artificial intelligence solution that predicts IT incidents before they impact operations. This type of solution falls under a new product category that Gartner has termed AIOps, which has the potential to seriously disrupt the way IT operations are managed. MarketsandMarkets, an industry analyst firm, has forecasted that between 2018 and 2023 the AIOps market will grow from two and a half billion to over 11 billion annually. Since we’re focused on disruptive, fast-growing technologies like automation and AI, we brought Gabby in to talk with us specifically about AIOps and get a better understanding of the direction this space is going in. Gabby, welcome to Intelligent Automation Radio.

Gabby Menachem: Thanks for having me, Guy. This is really exciting.

Guy Nadivi: Gabby, AIOps is still somewhat ambiguous to a lot of people. Can you please tell our listeners how you define AIOps?

Gabby Menachem: Sure, I think it’s a good question because a lot of people ask it these days. And the way I would answer it is I think the industry terms speak of many different things, whereas the majority of people capture this term as being a way to take all these different parts of IT operations today and have artificial intelligence help you in doing your daily job.

So different verticals like digital experience monitoring, network performance, application performance, and IT infrastructure monitoring are all different places in the organization where AI is being commissioned to really help solve these problems with automation and with intelligence built around that, especially using AI but in other means as well.

Guy Nadivi: Okay. So for context, can you please talk about some particularly interesting use cases where AIOps made predictions that mitigated big problems?

Gabby Menachem: Sure. I think the term prediction sometimes throws people off, because using data to find predictions, building that into something prescriptive, and really solving a problem is what everyone envisions when you talk about predictions. In many cases, the way to really predict things in IT has a lot more to do with catching things when they’re small, or when the system bends before it breaks. And I’ll give two examples. One is of a customer-facing issue with one of our biggest clients, a CPG company that has a lot of websites. One of the problems that Loom found initially when we installed was that they came up with a new product on their website, an engagement platform for people to interact with their product and really build a schedule, something like that, on how they are going to use the product. And after marketing and IT signed off on it and made all the necessary tests, it just went to production, and a lot of TV commercials and other channels were used to promote it.

But when Loom was used to look at the results of that website, we saw that some backend servers were talking about missing resources and it turns out that the experience over time was degraded significantly because there was some configuration error at the server. And the problem is that user experience, things like that are typically not reported and especially in the Fortune 2000 organization you can see that this sign-off mechanism is more of a onetime thing and systems are not continuously tested for the customer experience.

So this kind of prediction, finding things that are broken or giving a bad experience in a way that a system that doesn’t know anything about your server can really find just because it has artificial intelligence, is a form of proactively finding out customer experience issues and how they relate to your servers and how to fix them, actually. The second example I can give is really on internal issues. And again in the Fortune 2000 organization, most of them report to the SEC on a quarterly basis and obviously the last day of the quarter is something that they all dread in terms of whether the ERP system is going to fail or if email is going to fail, and obviously all the salespeople are putting in all the orders at the last minute because that’s how people buy, and it puts a lot of stress on the systems. In these cases, looking at the primary site and the secondary site continuously (and obviously they all use high availability for their systems) can find issues before any kind of failover would happen. The case I’m talking about involved an SAP environment where the secondary site was not configured right, and just by putting Loom on that secondary site’s logs, we were able to figure out that there were these configuration errors and told the team ahead of time, so they could fix it before the failover actually happened at the end of the quarter. So in that way we basically mitigated and prevented a problem that would have happened had they not known about the secondary site problems.

Guy Nadivi: I’ve heard you talk elsewhere about how the biggest ROI realized by companies deploying AIOps is in allowing them to reduce the skillset required for some IT positions. And I’m curious, given the difficulty a lot of enterprises are having in finding skilled IT staff, can you please elaborate on how AIOps can alleviate that pressure?

Gabby Menachem: Of course, I think the entire industry is driven by one big vision, which is the vision for self-healing. And if you work backwards from this vision of systems that work and operate themselves, really figure out what the problem is, and then can remediate it with some sort of automation, in this case intelligent automation, you see when you take it into reality that the majority of the high-value problems in IT are really solved by people. And that solution space is governed by people with a lot of skillset. The subject matter expertise for different systems is something that is hard to proliferate within an organization. And I think that was initially the idea behind ticketing systems alongside workflow management. Thinking about this problem actually led us at Loom to build a tribal knowledge base that is associated with our predictions. So whenever we find a problem we can really talk about it, give a resolution, and connect it to the specific log line that is responsible for the problem. This sort of root cause that leads to resolution, that leads to intelligent automation that can go solve the problem with a product like Ayehu, is something that I think a lot of the customers are looking for. And if you think about it, it’s also a good solution to the skillset problem, because when you build those automations and resolutions into your tribal knowledge base, you’re actually proliferating that knowledge throughout the organization. And in the case of Loom, it’s even across customers in a crowdsourced sort of way, and can give you this risk reduction about how to solve an issue, how to get to the root cause faster, and really take both these high-skillset problems and solve them with software instead of people. It’s the true realization of what AIOps can do in an organization. I think we’re still at the early days of customers using that kind of intelligent automation. But it’s definitely exciting and definitely what I see customers are interested in.

Guy Nadivi: So in thinking about AIOps, I’m curious how much data and what kind, does an AIOps tool need to ingest and analyze before it can start churning out meaningful predictions?

Gabby Menachem: That is a great question! I think the way that people perceive AI today, and especially with reference to machine learning and the methods behind it, is that they think you have to have a lot of data before you can extract any kind of value from that data. Whereas the reality in my view, and especially from my background, is that you can still get a lot of value by using an unsupervised method of machine learning, where you don’t really need to come in and have a ton of data before any value would be extracted. A lot of the things that you do as a person, and obviously this solution space is being addressed with people today, are not done by looking at a ton of data before anyone makes a decision. So I really think that the AI focus in developing these algorithms should not be concerned with more data equals more precision or better solutions, but with building intelligent algorithms that actually mimic the way that people make those decisions. And in that way you can really start extracting value as fast as a person can do that. You can also make use of the fact that computers can go over a lot of information very fast. So if you come to an organization that holds information from, say, the last month (and most of them do), then there’s an opportunity, if your tool is fast enough and easy to integrate, to learn from existing information very fast and get started right away. So in that scenario, even though the tool needs data to ingest and analyze before it goes live (in quotes), you can still get a lot of that value out of the box because it will analyze historical data and can predict based on that historical data what the future looks like.
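
To make the idea of learning from existing, unlabeled history concrete, here is a minimal sketch. It is not Loom's algorithm, which the interview does not describe; it assumes a simple robust baseline (median and median absolute deviation) over hypothetical hourly counts of one kind of log line, and the function names and threshold are illustrative only.

```python
from statistics import median


def fit_baseline(history):
    """Learn a robust baseline (median and MAD) from unlabeled historical counts."""
    center = median(history)
    mad = median(abs(x - center) for x in history)
    return center, mad


def is_anomalous(value, baseline, threshold=3.5):
    """Flag a new count whose modified z-score exceeds the threshold."""
    center, mad = baseline
    if mad == 0:
        # Perfectly flat history: any deviation at all is unusual.
        return value != center
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    score = 0.6745 * abs(value - center) / mad
    return score > threshold


# Hypothetical hourly counts of "missing resource" log lines from past data.
history = [2, 3, 1, 2, 4, 3, 2, 3, 2, 1, 3, 2]
baseline = fit_baseline(history)

for count in (3, 40):
    print(count, is_anomalous(count, baseline))
```

No labels or training phase are needed here: a month of stored counts is enough to start flagging hours that bend away from normal, which is the "bends before it breaks" style of prediction discussed earlier.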

Guy Nadivi: Does the combination then of those capabilities, do you think that… could AIOps mean the beginning of the end for data centers?

Gabby Menachem: So I actually love data centers. I think I tell everyone that I meet that one of the best stickers I’ve ever seen was the one that says that there is no cloud and it’s just someone else’s computer. I think data centers are here to stay. The fact that we use the cloud or any kind of abstraction does not mean that we’re getting rid of data centers. They still exist, they’re just being operated by other people. And I actually think it brings a very interesting question for those operators because now there’s more economies of scale. The tools like AIOps are actually much more valuable for these kind of players because they have a lot more servers and they can use the wisdom of the crowd or the wisdom of having all this data into making their operation more efficient and effective. So as I see it, data centers are here to stay, and AIOps is actually a great solution for that complexity that is rising within the places where we concentrate these data centers.

Guy Nadivi: What kinds of benefits then should an organization expect in the first six months of an AIOps deployment? For instance, how can a CIO, CTO or other IT executive cut costs by using AIOps?

Gabby Menachem: So we actually built an ROI calculation and we have a white paper on our website that talks about five different areas where you can see and realize a benefit, one that you can talk to your CFO about because we find that the new IT leaders are concerned with being business enablers, having a seat at the table really at the business regime. And we want to support that. So when we work with a customer to build any kind of ROI, we look at different places where this either cuts costs or enables more business to be done. And out of those five pillars, I have to say that two really stand out, like the exponential nature of everything around us. One of them is really solving and mitigating P1 issues, where using Loom in many cases we see that we can predict and prevent over 40% of those, which really creates high value by reducing costs from lost revenue or anything else that is considered a P1. The other thing that really creates millions of dollars of savings is looking at the time you can save by getting to the root cause faster. And in this case, I’m not talking about just the time to resolution, I’m also talking about the fact that now the way you learn about a problem through Loom or through any other AIOps solution is connected to something that is happening on your server, really at the detail level of a granular log line. That also means that once you learn about the problem, you’re already at least halfway to a solution, which typically reduces the MTTR, mean time to resolution, by about 45%. And if you look at the layer two people, tier two as they are sometimes called, that solve these issues, a lot of these people are basically developers that could have done a tremendous amount of work and enablement for future products for that enterprise, and now they’re being used for support. If you take that time and quantify it, it becomes millions of dollars over three years and basically that’s the biggest pillar in our calculation of where ROI comes from.
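
The two pillars described above can be turned into back-of-the-envelope arithmetic. The sketch below is an assumption-laden illustration, not the calculation from Loom's white paper: only the 40% and 45% figures come from the interview, while the function name and every input (incident counts, costs, hours, rates) are hypothetical placeholders to be replaced with an organization's own ticket data.

```python
def estimate_annual_savings(
    p1_per_year,
    cost_per_p1,
    incidents_per_year,
    mttr_hours,
    engineers_per_incident,
    hourly_rate,
    p1_prevented=0.40,    # "over 40%" of P1s predicted and prevented (from the interview)
    mttr_reduction=0.45,  # "about 45%" MTTR reduction (from the interview)
):
    """Return (savings from prevented P1s, savings from faster resolution)."""
    p1_savings = p1_per_year * p1_prevented * cost_per_p1
    hours_saved = (
        incidents_per_year * mttr_hours * mttr_reduction * engineers_per_incident
    )
    time_savings = hours_saved * hourly_rate
    return p1_savings, time_savings


# Hypothetical inputs, for illustration only.
p1_savings, time_savings = estimate_annual_savings(
    p1_per_year=10,
    cost_per_p1=100_000,
    incidents_per_year=1_000,
    mttr_hours=4,
    engineers_per_incident=2,
    hourly_rate=100,
)
print(f"Prevented P1s: ${p1_savings:,.0f} per year")
print(f"Faster root cause: ${time_savings:,.0f} per year")
```

Multiplying the annual totals by three gives the three-year horizon mentioned above; whether the result reaches millions depends entirely on the inputs.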

Guy Nadivi: Okay, so we know about cutting costs, but you also just mentioned that AIOps can contribute to business growth. And I’ve also heard you speak in a number of your presentations about how AIOps can help grow a business. Can you give some real world examples of how it can do that?

Gabby Menachem: Certainly, I think when you build or when you go through a digital transformation, the majority of IT leaders are concerned with actually getting to some kind of a business outcome. Usually any kind of cloud migration or building new systems, making them digital, has to do with some business objective of getting more customers, growing the business, becoming more efficient. All these different initiatives are being hindered by IT growth. When you look at the way they make decisions, in the past it’s been business leaders making the decisions and then IT working out the details. I think now the CIOs are being consulted a lot more because they’re a big enabler, not just to getting started but also to making that experience for customers seamless and exciting, and actually that brings more business as these systems improve. Looking at AIOps as part of that, we are actually seeing that IT operations is starting to be viewed not just as people that solve problems but also as people that enable the business to run faster. And as much as business schools and books are talking about how to build the right hierarchy, how to communicate better, all these different business tips of how to grow a business faster, one of the biggest problems that I see across organizations that we work with is how do you grow faster in IT operations, because so much of what we do today is digital and has to do with IT operations at the back end. In these cases, AIOps is not just the way to do more, it’s also a way to survive, because if your business is growing really fast and your stock is skyrocketing, it means that customers are looking for what you have, but they get their experience not by talking to people anymore; it’s through their interaction with your systems. And if you want to be on top of everything and do this 24/7 with no barriers to how granular you can be, getting to a single person experience and really making sure that every one of your customers is getting a consistent good experience, that can only be done with a system, and that is really where I see AIOps going in the next few years.

Guy Nadivi: Gabby, the entire value proposition of AI & machine learning, being able to get you to the point where you can predict failures before they happen and mitigate them in advance, is all predicated on one particular skill – data science. And if you don’t have the data scientist to build the algorithms to generate the predictions, then you can’t really leverage AIOps. And right now there is a very big shortage of data scientists. I think that the August 2018 LinkedIn workforce report stated that there was a nationwide deficit of over 150,000 data scientists. How will companies like Loom Systems overcome this staggering talent shortage?

Gabby Menachem: I think this is a great question for the business leaders, and I think it’s actually the reason AIOps systems will create or bridge that shortfall. Basically, the idea behind an AIOps system is to build that sort of data science project into a product. And I think vendors that create an opportunity to build a project with data scientists are really solving a problem that would come back again the next year. The focus of IT leaders in buying any kind of solution today should be focused on buying a product that brings not just the data science tooling, but also the opinionated view of how data science can be applied unsupervised like I said, and with a methodology that is built into the product. I think that’s the biggest movement we’ve seen within AI and even with all the hype around us, the companies that really manage to bring AI to the organizations and create value. The ones you hear about are the ones that can make their tool be self-sufficient, prescriptive, and provide value out of the box. In the cases where data scientists are needed in order to use a tool, I think those tools still have a way ahead of them and it talks about the maturity of the market, in AI generally and in the AIOps space specifically. Talking about Loom Systems, we took that approach of being totally data science free from the customer point of view. What it means is that the operators of Loom Systems in an organization are the same people that solve IT problems today. And there’s no math or AI or data science needed in order to operate Loom Systems. The only way that you really need to change or do a paradigm shift, is between responding and being reactive to tickets, and then moving to the new way of predicting those tickets and acting on them ahead of time.

Guy Nadivi: In deployments where AIOps has succeeded, what do you see as the next step those organizations should take to get to the next level of digital transformation?

Gabby Menachem: In our experience, and we have a couple or even more than a couple of CIOs advising us on the way that CIOs think of and plan for the future, we see that they are focused on being a business person within the organization much more, and they want to grow and improve the customer experience of that enterprise. In that context, what I see as the next move for AIOps, especially within a digital transformation, is to enable more of that business. So right now when we talk to CIOs that have to do anything like a cloud migration, their thoughts are on how to get to the same level of operations as they had before that migration. I think the conversation should change. Once you use an AIOps tool, you need to be able to leverage that wisdom of the crowd like we do at Loom Systems, and be able to make that migration seamless. Also, think about why you are making that migration. So if you’re doing this to get more scale or if you’re doing this to enable more or better solutions to your business, you need AIOps to be there to help you solve those minute situations and small stuff that you really don’t want to focus on while you’re creating that vision for your business. I think AIOps as we move forward into the future is going to become a necessity, something that we can’t live without because there’s just going to be so much knowledge and intelligence in how those systems help people, that we’re only going to be interested in working with people on how to build our business and to grow it.

Guy Nadivi: Interesting. Looking into the future, how do you see AIOps and automation bridging together to increase the value of an AIOps deployment?

Gabby Menachem: So I think early on in the conversation I talked about all of this being a brainchild of thinking of self-healing. And self-healing has been with us in thoughts I think for the last 30 years already. The only reason we’re seeing such an upsurge in this space has to do both with the advancement of AI and big data, but also the fact that everyone is having a digital transformation. When you think about how people are using these systems today, what I see in front of me is a great opportunity to think of how to automate more things that really require a person today and should not require a person in the future, just because the way to mitigate this issue is well-known or because we don’t really want to have a lot of skillset in people’s heads instead of where we actually have domain expertise in something that grows our specific business. So when I look at automation, I’m thinking everyone bought an RPA or an ITPA in the last couple of years, and they’re looking for more things to automate. AIOps is actually bridging that opportunity into incident management. It’s not just low-level operations that people used to do like punching in things from scanned PDFs. Now, you can actually proactively solve things that are happening on your servers, issues that customers are experiencing at the granularity of a single person, and you can do all of these things by predicting them and then running a runbook that is pre-configured, and actually get rid of the risk of certain people making mistakes along the incident response and remediation process.
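
The pattern of predicting an incident and then running a pre-configured runbook can be sketched as a simple dispatch table. This is a hypothetical illustration only: the incident type, host name, and functions below are invented for the example and do not represent the API of Loom, Ayehu, or any other product.

```python
# Registry mapping a predicted incident type to its pre-configured runbook.
RUNBOOKS = {}


def runbook(incident_type):
    """Decorator that registers a function as the runbook for an incident type."""
    def register(func):
        RUNBOOKS[incident_type] = func
        return func
    return register


@runbook("secondary_site_misconfigured")
def fix_secondary_site(prediction):
    # A real runbook would call an automation platform here; this one only
    # reports what it would do.
    return f"re-applied configuration on {prediction['host']}"


def remediate(prediction):
    """Run the matching runbook, or hand off to a person if none exists."""
    handler = RUNBOOKS.get(prediction["type"])
    if handler is None:
        return "escalated to on-call engineer"
    return handler(prediction)


print(remediate({"type": "secondary_site_misconfigured", "host": "sap-dr-01"}))
print(remediate({"type": "unknown_pattern", "host": "web-07"}))
```

Keeping an explicit fallback to a person reflects the point above that only well-known mitigations are good candidates for automation.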

Guy Nadivi: That would be an interesting future. Gabby, for the CIOs, CTOs and other IT executives that are listening in, what is the one big must-have piece of advice you would like them to take away from our discussion with regards to implementing AIOps for their operation?

Gabby Menachem: I think the advice I would give anyone who’s a good friend of mine in implementing any kind of AIOps solution is really focusing on three things. First, you need to start with a data-driven approach of looking at your tickets. Where is the value, what the ROI could be. And the good vendors can really work with you to give you those assumptions ahead of time, before even installing any kind of agent or server within your environment. And together with what I know about this market, I think those ROI benefits should amount to millions, otherwise they’re not worth pursuing at this point. And after you see this, you move to the next level where you have to get to a very specific method of operation and plan that doesn’t just talk about technology or that project, but also talks about people and process. As we discussed earlier today, this is a paradigm shift in the way that people actually interact with systems. Now, they’ll be able to predict failures and they need to have the proper method and process to get rewarded for it. Incentives need to be in the right place, and good vendors that have seen successful AIOps implementations can walk you through that playbook of how to be successful with their software. So I think learning from your vendor and really taking the advice on how to implement something like this is something that makes you much more successful. And last, I’d say you need to connect your AIOps plan and strategy with how you do automation, because AIOps right now, with being able to predict and show resolutions with a product like Loom, is only a step in the actual vision of getting to self-healing. For that, you’d need to be able to integrate with your automation and be able to really get the tribal knowledge that your subject matter experts have into a system and into an intelligent automation future.

Guy Nadivi: The journey to self-healing is definitely very exciting for anybody working in IT and I think it’s going to be a very interesting future going forward. All right, looks like that’s all the time we have for this episode of Intelligent Automation Radio. Gabby, I know you’re a very busy man because it wasn’t easy getting you scheduled today, but I thank you for coming on the show and sharing your insights with us. It’s been great having you on.

Gabby Menachem: Thanks very much, Guy. I was enjoying it as well.

Guy Nadivi: Gabby Menachem, CEO of Loom Systems, publishers of an artificial intelligence solution that predicts IT incidents before they impact operations. Thank you for listening everyone and remember, don’t hesitate, automate.

Gabby Menachem

CEO of Loom Systems

Gabby brings over 15 years of technology innovation and entrepreneurship experience to Loom Systems. Gabby was previously co-founder and CTO of Voyager Analytics, a product that analyzes social network data with a range of customers that include leading financial institutions. Prior to that, Gabby served as GM and VP R&D in a microwave engineering startup.


