The famous management consultant W. Edwards Deming once said, “In God we trust, all others bring data.” Enterprises embracing digital transformation are bringing that data in droves, and cultivating it in ways unimaginable just a few short years ago. Thanks to new tools, techniques, and technological advances, massive volumes of historical and real-time data can be analyzed to make forecasts about future events. This field of “predictive analytics” has enormous implications for everyone. For IT professionals, it offers a chance not only to anticipate problems, but also to automate their mitigation before they materialize.
Predictive analytics seems like the modern-day equivalent of a crystal ball, but with data scientists and statisticians divining the future instead of fortune tellers. To get us up to speed on the state of predictive analytics today, we turn to one of its most notable thought leaders, Theresa Kushner, a recent inductee into the Analytics Hall of Fame. Having managed huge analytics projects for Dell, Cisco, VMware, and IBM, Theresa knows what it takes to convert data into ROI-generating predictions. In this episode, she’ll share some of the unexpected ways predictive analytics can be used, how a new occupation called “Business Scientist” might be the key to alleviating the drastic shortage of data scientists, and the single most unrealistic expectation plaguing the field of predictive analytics.
Guy Nadivi: Welcome, everyone. My name is Guy Nadivi, and I’m the host of Intelligent Automation Radio. Our guest on today’s episode is Theresa Kushner, Partner in Business Data Leadership. Theresa was previously Senior Vice President of Dell’s Performance Analytics Group, and prior to that she held senior executive positions with Cisco, VMware, and IBM. But probably the coolest thing about Theresa is that she was recently inducted into the Analytics Hall of Fame, which is quite a testament to her accomplishments and thought leadership, so we’re thrilled she’s joining us today. Theresa, welcome to Intelligent Automation Radio.
Theresa Kushner: Thank you very much, Guy. So happy to be here.
Guy Nadivi: Theresa, we’re living in an increasingly data-driven world, and a lot of that data is fed into machine learning and artificial intelligence systems to make predictions about all kinds of things. This is still viewed as something of a black art by many people, so to start, could you please break down for our audience, from a high-level perspective, the steps between putting data into one end of an analytics black box, so to speak, and getting predictions out the other end?
Theresa Kushner: Okay. To begin with, let’s start with the end in mind. You always need to know where you are going before you figure out how to get there. So let’s assume that you want to predict which of your customers will buy a product from you this quarter. That means you must first understand where you are today: which customers are buying, what they are buying, from whom they are buying, how much they are paying, and what profit you are making from each individual customer. This beginning can be done with just ordinary reporting. That’s why, when I build an analytics team or an enterprise information team, I insist on having data management, business intelligence, and advanced analytics as part of the same team. Although the predictive part may be the tip of the arrow that predicts which customer will buy, that prediction can’t be done without good data and a clear description, through reporting, of what has happened up to this point.
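The “where are you today” reporting described above is ordinary aggregation. As a minimal sketch, assuming sale records with hypothetical field names (`customer`, `channel`, `price`, `cost`) and invented figures, profit per customer or per sales channel could be totaled like this:

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def profit_by(key: str, sales: Iterable[Mapping]) -> Dict[str, float]:
    """Total profit (price minus cost) per value of `key`, e.g. per customer."""
    totals: Dict[str, float] = defaultdict(float)
    for sale in sales:
        totals[sale[key]] += sale["price"] - sale["cost"]
    return dict(totals)


# Hypothetical sales records; field names and amounts are illustrative only.
sales = [
    {"customer": "C1", "channel": "partner", "price": 1200.0, "cost": 900.0},
    {"customer": "C2", "channel": "direct", "price": 800.0, "cost": 700.0},
    {"customer": "C1", "channel": "partner", "price": 500.0, "cost": 350.0},
    {"customer": "C3", "channel": "direct", "price": 1100.0, "cost": 950.0},
]

print(profit_by("customer", sales))  # {'C1': 450.0, 'C2': 100.0, 'C3': 150.0}
print(profit_by("channel", sales))   # {'partner': 450.0, 'direct': 250.0}
```

Changing the grouping key is all it takes to move between views such as per customer, per channel, or per product; in practice this would usually be a `GROUP BY` in the reporting database rather than application code.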
Theresa Kushner: The reporting that you have can easily point out areas that might help you in predicting the next customer who will purchase. For example, if your report shows that your most profitable segments are customers who buy from your partners, then you might consider a program that helps you predict which partner would be best to deliver to which customer, not which customer is going to buy from you. Once you’ve done some experimenting around what you want to predict, to ensure that you get a good, solid return for your effort, you’ve got to collect the right data. Often this task takes more time than anyone anticipates, and several things can happen. First, the data’s just not available. For example, let’s say you want to predict which partner is best to deliver to which customer, but you’ve never captured that information when a sale is recorded.
Theresa Kushner: You’ve never captured which partners sold to which customers. That makes it very difficult to predict which partner’s the best one to engage. Second, the data you have is incomplete. This is usually the case. You may need a data set that includes partner, customer, date of sale, date of delivery, product, et cetera, and although all of these elements are available, they may be populated at varying levels of completeness and accuracy. Since most prediction is built on looking back and extrapolating to the future, you could have some issues if your view is incomplete or inaccurate. Once you have a good, solid data set, you can develop your algorithms, and most data scientists build a predictive model that looks for patterns in the data. That’s all predictive analytics really is. The model provides a predictive score, or the probability that something will happen: that a customer will buy, a patient will improve, a system will fail.
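A quick completeness profile is one way to see how far a data set falls short before modeling starts. The sketch below assumes hypothetical field names for the partner-sales data set described above and a handful of invented rows; it treats a missing key, `None`, or an empty string as unpopulated, and the 80% flag is an arbitrary illustrative threshold. It measures completeness only, not accuracy.

```python
from typing import Any, Dict, Mapping, Sequence

# Hypothetical field names for a partner-sales data set.
FIELDS = ("partner", "customer", "sale_date", "delivery_date", "product")


def completeness(rows: Sequence[Mapping[str, Any]], fields: Sequence[str] = FIELDS) -> Dict[str, float]:
    """Fraction of rows in which each field is populated (absent, None, or "" counts as missing)."""
    if not rows:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in rows if r.get(f) not in (None, "")) / len(rows)
        for f in fields
    }


# Invented example rows with deliberate gaps.
rows = [
    {"partner": "P1", "customer": "C1", "sale_date": "2019-01-05", "delivery_date": None, "product": "server"},
    {"partner": "", "customer": "C2", "sale_date": "2019-01-09", "delivery_date": "2019-01-20", "product": "storage"},
    {"customer": "C3", "sale_date": "2019-02-01", "delivery_date": "2019-02-11", "product": "server"},
    {"partner": "P2", "customer": "C4", "sale_date": None, "delivery_date": None, "product": "laptop"},
]

for field, share in completeness(rows).items():
    flag = "  <- needs work" if share < 0.8 else ""
    print(f"{field:<14}{share:.0%}{flag}")
```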
Theresa Kushner: The most important part of this process is the refinement of that model, and to refine a model, you usually start with two data sets: a training data set and a testing data set. The training data set is usually larger, and it’s randomly selected from your actual data. The testing data set is smaller, and it’s used to test the model, to make sure that the model works. The training data set is what the data scientists use to develop and redevelop the model. It’s not called “data science” by accident. If you do it right, you follow the scientific method: you observe, you question, you hypothesize, you experiment, you analyze, and you conclude. Since most predictive models create a score that provides the probability of something happening, for example, a customer buying, or whatever else you’re predicting, the business must find a way to use that score to its benefit. That means teaching people how to incorporate the score into their processes or, more importantly today, creating processes that embed that score.
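To make the training/testing workflow concrete, here is a minimal, dependency-free sketch. It assumes a binary outcome (bought or not) and uses logistic regression fit by gradient descent as the model; Theresa does not name a specific algorithm, so the choice of logistic regression, the 75/25 split, and the synthetic data are all illustrative assumptions. The model’s output for a record is the “predictive score,” a probability between 0 and 1.

```python
import math
import random
from typing import List, Sequence, Tuple

# (feature vector, outcome) where outcome is 1 if the event happened, else 0.
Example = Tuple[List[float], int]


def train_test_split(data: Sequence[Example], test_fraction: float = 0.25, seed: int = 42):
    """Shuffle randomly, then hold out a smaller testing set; the rest is for training."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (training set, testing set)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: List[float], b: float, x: List[float]) -> float:
    """Predictive score: the modeled probability that the outcome happens."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logistic(train: Sequence[Example], lr: float = 0.5, epochs: int = 300):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(train)
    w = [0.0] * len(train[0][0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in train:
            err = score(w, b, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w: List[float], b: float, test: Sequence[Example], threshold: float = 0.5) -> float:
    """Share of held-out examples the model classifies correctly at `threshold`."""
    hits = sum((score(w, b, x) >= threshold) == (y == 1) for x, y in test)
    return hits / len(test)


# Synthetic, hypothetical data: the outcome is 1 when the two features sum above 0.
rng = random.Random(0)
data = []
for _ in range(200):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    data.append((x, 1 if x[0] + x[1] > 0 else 0))

train, test = train_test_split(data)
w, b = fit_logistic(train)
print(f"train={len(train)} test={len(test)} accuracy={accuracy(w, b, test):.2f}")
```

The model is fit only on the training set, and the held-out accuracy is the check the testing set exists for. Real projects would more likely use a library such as scikit-learn than hand-rolled gradient descent.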
Theresa Kushner: This is one of the biggest issues in predictive analytics: getting that score embedded into the process. So from the beginning, you start with where you want to end up, which is with the score embedded in your process, and that’s the process you go through.
Guy Nadivi: You’ve worked with so many data sets, I’m sure, that now I’m curious: what is the biggest data set you’ve ever processed?
Theresa Kushner: That question, I’m not so sure I really know, but I can give you an idea. At VMware, we were on a Greenplum database, which was approximately 4 terabytes, but that data was growing rapidly, and we were doing a lot to manage it into the process. Dell’s business management system was 10 to 20 petabytes. It was huge, but not all of that data is ever used in one algorithm. That was the storage requirement for the data we needed access to, but for analytics, the data was managed down to sizes, test data for example, that could run on desktops. You don’t need massive amounts of data to build a predictive algorithm, but you do need it to find patterns in the data. The larger the data set, the more patterns are potentially visible, and that’s why big data’s so important to data analysts.
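One common way to manage a very large table down to a desktop-sized sample is reservoir sampling (Algorithm R), which draws a uniform random sample in a single pass without knowing the table’s size in advance. Theresa doesn’t say which method her teams used, so treat this as one illustrative technique; the row generator and sizes below are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniform random sample of k items from a stream of unknown length (Algorithm R).

    Memory use is O(k) no matter how large the stream is.
    """
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)    # uniform over 0..i inclusive
            if j < k:
                sample[j] = item     # keep this item with probability k / (i + 1)
    return sample


# Example: pull a 1,000-row sample out of a simulated 100,000-row table.
rows = (f"row-{n}" for n in range(100_000))
desktop_sample = reservoir_sample(rows, k=1_000, seed=7)
print(len(desktop_sample))
```

Because every row has the same chance of ending up in the sample, patterns that are common in the full table tend to show up in the sample too; rare patterns may not, which is one reason larger data sets reveal more.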
Guy Nadivi: Theresa, tell us about some of the most interesting results and predictions you’ve seen come out of predictive analytics projects you’ve worked on.
Theresa Kushner: The one that comes to mind immediately, and it was very difficult to do, was a project my team at Dell worked on that helped a part of the business we hadn’t traditionally supported: quality control. The problem was a very complicated one. When you have a server that can be configured a hundred different ways, and you change a component on that server, you have to check whether the new configuration with the new component will fail or not. Now imagine that you have a hundred different components that can be configured a hundred different ways, and have to be integrated into a hundred different servers. How do you test that many configurations? That was the problem. The QC team couldn’t afford the time or people to test every configuration or every component that went into every configuration, so they brought us the problem.
Theresa Kushner: Our team worked with the QC team and developed an algorithm that enabled them to predict which configurations showed the greatest propensity to fail. It’s not exactly one of the most common predictions you would find, but it enabled them to narrow down this huge number of configurations to something very manageable.
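The arithmetic behind the QC problem is what makes exhaustive testing impractical: 100 components × 100 ways × 100 servers is already 1,000,000 combinations. A minimal sketch of the “narrow it down” step, assuming some trained model exposes a failure score per combination, is to rank combinations lazily and keep only as many as the test budget allows. `riskiest_configs` and `toy_score` are hypothetical names, and `toy_score` is a placeholder, not a real failure model.

```python
import heapq
import itertools
from typing import Callable, List, Tuple

Config = Tuple[int, int, int]  # (component, setting, server) indices


def riskiest_configs(
    failure_score: Callable[[int, int, int], float],
    budget: int,
    n_components: int = 100,
    n_settings: int = 100,
    n_servers: int = 100,
) -> List[Config]:
    """Return the `budget` combinations the model scores as most likely to fail.

    Combinations are generated lazily, so only the top `budget` are held in memory.
    """
    combos = itertools.product(range(n_components), range(n_settings), range(n_servers))
    return heapq.nlargest(budget, combos, key=lambda c: failure_score(*c))


print(100 * 100 * 100)  # 1000000 combinations to test exhaustively


def toy_score(component: int, setting: int, server: int) -> float:
    # Hypothetical stand-in for a trained model's failure propensity.
    return float(component + setting + server)


shortlist = riskiest_configs(toy_score, budget=5, n_components=10, n_settings=10, n_servers=10)
print(shortlist)
```

The QC team would then test only the shortlist, and real test outcomes could be fed back to retrain the scoring model.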
Guy Nadivi: Very interesting, I’m sure, for the IT executives listening to today’s episode.
Theresa Kushner: Yeah. Something they should consider, for sure.
Guy Nadivi: Yes. Yes. Your field is heavily dependent on data scientists, and the August 2018 LinkedIn Workforce Report stated that there was a nationwide deficit of over 150,000 data scientists. How do you think your profession can overcome this staggering talent shortage?
Theresa Kushner: I think you’re going to find this rather controversial, but I don’t believe we have as great a shortage as they are predicting. Here’s one example. Just last week, I met with an organization in the Northeast called U of Next, whose charter is to build programs for the most underserved populations in the Northeast. I don’t know if you know this, but if you put together all the GDP for Boston, New York, and the whole university area of the Northeast, it would be the sixth-largest economy in the world. Yet a lot of people in those areas are not being served. What U of Next is building is a unique program that takes people who have been laboring away for years in the data management profession and converts them into data scientists, through a curriculum developed specifically for this type of individual.
Theresa Kushner: Data management professionals know data. That’s the premise they’re working from. With some instruction in statistics, not every one of them will become a data scientist, but you have the beginnings of a good data scientist, because they already know the data. The other thing that’s going to happen is that the tools will become easier to use, and with each new generation of tools, we’ll be able to do the programming and statistics that data science has required just by clicking and moving icons around on our desktops. Trust me, we’ll have enough people. We just need to prepare them right.
Guy Nadivi: Given the high rate of pay that data scientists earn today, I think what you just said really piqued the interest of the data management people out there.
Theresa Kushner: Yeah. Unfortunately, I think that, as in all industries, you see a high rate of pay for data scientists now, but over time that’s going to come down to something more manageable, and I think companies are going to realize that they don’t need a lot of data scientists. You don’t need a stable of data scientists. You need one or two good data scientists surrounded by data engineers, storytellers, and business analysts. In fact, there’s a new term coming up, “business scientist,” for the people who are really using information and facts within the business. They are the people who are going to be filling these new kinds of requirements.
Guy Nadivi: Would business scientists be sort of a parallel to a term I’m starting to hear, “citizen data scientist”?
Theresa Kushner: Exactly.
Guy Nadivi: Okay.
Theresa Kushner: Exactly.
Guy Nadivi: Interesting. Your expertise is in predictive analytics for marketing, but I want you to put on your IT cap for a moment, and consider your marketing target to be the end users of an organization’s IT infrastructure. Where would you start using predictive analytics to improve the IT department’s outcomes in servicing its end users?
Theresa Kushner: Yeah. One of the biggest issues in marketing, especially when a real-time environment is required, is making sure the systems, the websites, and everything else that supports marketing stay up. I can’t count the number of times that a down system has put a campaign or a marketing program at risk, because a lot of what we do in marketing nowadays is real-time. It reacts to the customer immediately, so you want to make sure the systems are always there. I think the one area where IT could apply predictive analytics immediately is predicting when a disk, network, or storage failure is about to happen, and correcting it before it does. IT could also help us by applying predictive analytics to the flow of data from one system to the next, especially if the business has a seasonal component, like retail on Black Friday.
Theresa Kushner: IT should be able to help us predict large loads on the systems. We shouldn’t be surprised by them, so I think there are a lot of things IT could be doing to help the business predict what’s going to happen.
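As a deliberately simple illustration of the capacity side of this (forecasting when storage fills up, rather than predicting hardware faults), a least-squares trend line over daily usage can estimate how many days remain. The readings and the 200 GB limit are hypothetical, and a straight line will not capture seasonal spikes like Black Friday; that would call for a model with a seasonal component.

```python
from typing import Optional, Sequence


def days_until_full(usage_gb: Sequence[float], capacity_gb: float) -> Optional[float]:
    """Extrapolate a least-squares line through daily usage to estimate days left.

    Returns days from the last observation until the trend reaches capacity,
    0.0 if the trend line has already reached it, or None if usage is not growing.
    """
    n = len(usage_gb)
    if n < 2:
        raise ValueError("need at least two daily observations")
    mean_x = (n - 1) / 2
    mean_y = sum(usage_gb) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(usage_gb))
    slope = sxy / sxx  # GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope
    return max(0.0, day_full - (n - 1))


# Hypothetical readings: a volume growing about 10 GB/day toward a 200 GB limit.
print(days_until_full([100, 110, 120, 130, 140], capacity_gb=200))  # 6.0
```

An alert could fire when the estimate drops below the lead time needed to add capacity, which is the “correct it before it does” part.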
Guy Nadivi: I think those predictive failure use cases you just cited are very much why the emerging field of AIOps is gaining attention, because that’s exactly what they want to try and do.
Theresa Kushner: Exactly what they’re doing.
Guy Nadivi: Theresa, what are some of the most unrealistic expectations currently plaguing the field of predictive analytics?
Theresa Kushner: There are a few. Most people believe that if you have a lot of data, you can predict anything, and then they pick the one thing to predict that they don’t have any data for. For example, if you want to predict which of your employees might be on the verge of leaving the company, you should probably have good data on employee engagement, their past performance, their attendance records, and even their email and how they’re communicating with outside sources, and that data needs to be collected before you can predict this well. Here’s an example. Ginni Rometty from IBM announced recently that IBM had done just this: they had created an algorithm that could predict with 95% accuracy which of their employees were getting ready to jump ship. She mentioned in the article I read that she had a somewhat difficult time convincing management to use the results, and therein lies the rub.
Theresa Kushner: This is the most unrealistic expectation: that whatever we predict will be used to make decisions or drive the business. Putting predictive models to use is what helps us get better, but it can be difficult to get that usage started with the business, especially if there’s no track record of trusting the data you have.
Guy Nadivi: Is a lack of trust what’s preventing businesses from embracing and adopting the predictions?
Theresa Kushner: I think in some cases it is. In other cases, business leaders who’ve done business a particular way for a very long time feel in their gut how it should be done, and so they don’t always rely on data to help guide them. Nowadays, with business moving as fast as it is, it’s very difficult to get information that people trust immediately. We as data professionals need to work hand in hand with the business and IT to make sure everybody understands what the expectations are before we start.
Guy Nadivi: You just got me curious. Have you seen any programs, incentive programs or anything of the like, that have been effective in getting managers to embrace, adopt, and deploy the results of the analytics?
Theresa Kushner: The best I’ve seen has to do with sales, when you actually commit to it in a sales environment. Propensity models are usually best used in a telesales environment, where you can order the calls you have to make that day and call on the top 10. Those work very well when sales management is as involved with the modeling and the use of those models as the individuals on the phone making the calls. I’ve seen them work incredibly well when there’s a concerted effort, when people are rewarded for completing their list, and when management makes a big deal of the effort that’s put forth. I’ve seen a lot of sales guys go from the very last place in their sales hierarchies up to the top, just by putting predictive modeling in place.
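Ordering the day’s calls by propensity score is a plain sort on the model’s output. A minimal sketch, assuming the model has already produced a score per account (the account names and scores here are hypothetical):

```python
import heapq
from typing import Dict, List


def todays_call_list(propensity: Dict[str, float], n: int = 10) -> List[str]:
    """Accounts ordered by propensity-to-buy score, highest first, cut to the top n."""
    top = heapq.nlargest(n, propensity.items(), key=lambda item: item[1])
    return [account for account, _ in top]


# Hypothetical scores produced by a propensity model.
scores = {"acme": 0.91, "globex": 0.35, "initech": 0.78, "umbrella": 0.62}
print(todays_call_list(scores, n=2))  # ['acme', 'initech']
```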
Guy Nadivi: Speaking of managers using predictive analytics, what do you think enterprise executives who’ve never dealt with predictive analytics should know before deploying it?
Theresa Kushner: The one thing I always make sure that executives know is that predictive analytics is not a silver bullet. You can’t predict your way to revenue growth, or profitable businesses without involving the executives and getting their support, and that means their time, their money, and more importantly, their attention to this. There has to be a buy-in to use the predictions that get generated. The one thing that every executive should know before deploying a team to do predictive analytics is that the team must have business guidance and support. There’s no other choice.
Guy Nadivi: For the CIOs, CTOs, and other IT executives listening in, what is the one big must-have piece of advice you’d like them to take away from our discussion with regard to implementing predictive analytics?
Theresa Kushner: I’ve seen this a lot of times, and this is my one mantra, my one word of advice: don’t build a predictive analytics team in a silo. Make sure that whatever you do, you have collaboration across IT and the business, that you tackle the projects together, and that you fail or succeed together.
Guy Nadivi: Excellent advice. All right. Looks like that’s all the time we have for this episode of Intelligent Automation Radio. Theresa, thank you very much for joining us today and sharing your-
Theresa Kushner: Thank you, Guy, for having me. I really enjoyed it.
Guy Nadivi: Oh, your perspective was fascinating, and especially with predictive analytics for IT, I think a lot of people will really enjoy this discussion.
Theresa Kushner: Good.
Guy Nadivi: Theresa Kushner, Partner in Business Data Leadership, formerly Senior Vice President of Dell’s Performance Analytics Group, and recent inductee into the Analytics Hall of Fame. Thank you for listening, everyone, and remember, don’t hesitate, automate.