I have been consulting on what used to be called data mining or predictive analytics, and has more recently morphed into data science, for ten years. In the first five, I only did a handful of data mining projects a year, but recently this has grown a lot. No surprise, then, that software vendors, including my good friends at Microsoft, have been outdoing each other in giving us even easier (and cheaper!) technologies for advanced analytics: cue the newest kid on the block, Cortana Analytics Suite, the Office-like umbrella, with a gallery of ready-to-run templates, for all Microsoft analytical technologies, notably centred around the not-so-young-anymore Azure Machine Learning, old-by-IT-standards R, and the still-a-bit-teeny Hadoop ecosystem.
If there is one thing that I have noticed across all the projects, workshops, courses (please check out my Live Tutored Online Courses Practical Data Science classes) and countless questions asked and answers given, it would be that analysts, IT people, and even data scientists constantly feel like they have to justify their work to their business sponsors. The most common question I get asked by budding data scientists is: how can I explain the value of data science to my customers? Invariably, each had to think about the reasons they do what they like to do, and this article is a summary of my interactions in the real world of practical data science, that is, in the context of the business use of data science.
But first things first: let me briefly explain what I mean by all those keywords you read in the title of this article, because they are still being redefined and reinterpreted. When I talk about advanced analytics, I usually think of a mix of data science supplemented with modern, exploratory data visualisation (think Power BI, Tableau, QlikView, even ggplot2 or ggvis), perhaps some traditional BI (think cubes or tabular models), and a good dose of data acquisition. That acquisition may include ETL, but it usually means hunting for the data you need, à la Power Query, and usually not in an orderly catalogue, or even generating data through experimentation.
Today, advanced analytics is something that you do: it is human-intensive, and most of the time it is not quite a finished, deployed system in line with traditional business intelligence. Not yet, at least. But in time, it could be a deployed system, nicely buzzing through some interaction of web services like those Azure ML helps you deploy, or Azure Data Factory workflows, and neatly integrating into your operations or BI-style analytics. Hey, you could even plug live data into it using your favourite Internet-of-Things data collector or Stream Analytics. All of that, however, is still in the (near) future for many of my customers, while explorative advanced analytics has been a very real thing for them for well over a decade.
I would also need to explain what I mean by data science, I suppose. I strongly believe that data science is the application of the scientific method of reasoning (more about it later) to decision-making based on data representing facts. Data science spans four classes of activities: statistics, machine learning, perhaps a small use of some big data technology, and an inordinate amount of data wrangling. I know I am not the only one who feels that as a data scientist I am most often playing the role of a janitor cleaning and prepping data for analysis. Believe me: as odd as it may sound, the janitorial part is sometimes the most valuable one to the customer!
Anyway, those four, if you put them together, would amount to no more than just some statistically- and buzzword-enhanced data engineering. What makes it into data science is that we apply that scientific method of reasoning when we say that this data suggests that something is significant. Or, perhaps, that that other data does not support a hypothesis that your latest spectacular-spectacular promotion was responsible for what you saw as an increase in sales. Sounds like statistics? Yes, but extended to the world of patterns discovered by the unruly machine learning algorithms, or as we just used to call it, data mining.
While I am at it: data mining, for most business users, is the same thing as machine learning. For the nitpickers and all duly-respected academic discourse, there is a difference. Data mining, more strictly, is the application of machine learning algorithms for the detection of patterns usually hidden in flattened data sets. Oh well, call it what you prefer, but after a drink I usually revert to calling it data mining. It is the marketeers who tell me (not sure if with any significance) that the machine learning moniker is more appealing nowadays, so I have edited this piece accordingly.
So now that I have explained all those terms (did I mention you can learn it all in depth by attending one of my 4-day hands-on classes?) I should come back to where we started: what is the value of doing this advanced analytics thing to my customers, or their customers if they are consultants? There are five common reasons my customers do it, I have found, and in this order:
I admit that very few of my customers do number 5, but they seem to be aiming for it, especially the younger, more start-upy ones, who perhaps don’t have as much intuition to drive their decision making as the leaders of more traditional businesses have developed over time.
This leads to an interesting point. Is the value of advanced analytics mainly in trying to remove our reliance on human intuition, that gut feeling that a good CEO has when they are making a decision? When I was younger, I thought that could be the case. A few years ago I realised I was wrong. Intuition is, actually, the generator of the hypotheses that a decision maker considers. Without intuition, or some other form of creativity, businesses can only copy what others have done, but they cannot lead. With perfect intuition you would instinctively know which ideas are worth risking and which are not, but nobody’s intuition is perfect. Sure, having a great team helps to moderate the vagaries of youthful intuition, but this is precisely where advanced analytics can help, in my opinion, an awful lot.
Go ahead, use your own intuition, and that of your colleagues, to generate ideas, but don’t rely on intuition alone to decide whether to risk your inheritance on them. State those ideas as testable hypotheses, find or generate data to validate them, and use plenty of advanced analytics, perhaps more machine learning here, perhaps only statistics there, to figure out if your hypotheses should be rejected or not. By the way, if you do it the statistics way, you will usually want to state your optimistic goals in reverse, sounding pessimistic by default (this is the null hypothesis, in statistical terms), and therefore always hoping to reject the hypothesis. It is an oddity that kind of makes sense, and which you can pick up very quickly when you start doing it (did I mention I teach this?).
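To make that "stated in reverse" idea concrete, here is a minimal sketch in Python of a permutation test, which is one of several ways to test such a hypothesis and not necessarily the one I would pick for your data. The pessimistic default is "the promotion made no difference", and we hope the data lets us reject it. The function name, the sales figures, and the 0.05 threshold are all assumptions invented for this illustration.

```python
import random
from statistics import mean


def permutation_p_value(control, treated, n_permutations=5000, seed=0):
    """One-sided permutation test of H0: 'the treatment made no difference'.

    Returns the estimated probability of seeing a difference in means at
    least as large as the observed one if the group labels were arbitrary.
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(control) + list(treated)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under H0 the labels are interchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical daily sales, invented for illustration only.
sales_without_promo = [102, 98, 110, 95, 101, 99, 104, 97]
sales_with_promo = [108, 112, 105, 115, 109, 111, 107, 113]

p = permutation_p_value(sales_without_promo, sales_with_promo)
reject_null = p < 0.05  # assumed significance threshold
print(f"p-value: {p:.4f}, reject 'no difference': {reject_null}")
```

A small p-value means the data would be surprising if the promotion had done nothing, so you reject the pessimistic default. A large one means only that this data does not support your idea, not that the idea is false.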
So what if you have found significance in your hypothesis? Perhaps you would design an experiment to test it in the real world, to make sure that you are not relying on the same old data all the time. Perhaps you could skip this step and just go ahead and risk it. Either way, you would generate new data, which, by using data science (probably more the statistical part this time, less the data mining bit), would get you that warm fuzzy feeling that what started as intuition is something significant, meaningful, and not only because you had that feeling in your tummy, but more so because the scientific method of reasoning gave you the confidence you lacked before.
Of course, this does not mean that you have certainty at this stage. Oh no! You will be wrong from time to time, even though everything seems to suggest you should have been right. However, you will know how often your decisions could be wrong and you will be able to predict, and measure, the likelihood of committing such errors. You will learn to live with the uncertainty, that knowledge that nothing is sure, but you will have the power to measure how much less sure this is than that. Indeed, this is the bread-and-butter of that old-fashioned statistical bit of advanced analytics. And in case you are wondering if anyone else is doing things this way, rest assured this is how the working parts of the modern world have operated for a long time. All of modern medicine, physics, biology, not to mention technology, social sciences, economics, game theory…new software and apps that have built-in intelligence…in-game AI… You are not alone, but you would be among the first businesspeople to apply it to all of your decision-making. I think we are still a decade or so away from the time that Data Driven Decisions become the norm.
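If you would like to see what "knowing how often you could be wrong" looks like, here is a small simulation in Python, offered as a sketch under simplifying assumptions and not as a recipe. It runs many imaginary experiments in which nothing interesting is going on at all, applies a simple z-test with a known standard deviation to each, and counts how often we would wrongly declare a significant result. The function name, sample size, and threshold are assumptions made up for this example.

```python
import random
from statistics import NormalDist


def false_alarm_rate(alpha=0.05, n_experiments=2000, sample_size=30, seed=1):
    """Fraction of experiments that wrongly reject a null that is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    false_alarms = 0
    for _ in range(n_experiments):
        # The null is true by construction: the real mean is exactly 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        sample_mean = sum(sample) / sample_size
        # z statistic for a known standard deviation of 1.
        z = sample_mean * sample_size ** 0.5
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            false_alarms += 1
    return false_alarms / n_experiments


observed_rate = false_alarm_rate(alpha=0.05)
print(f"wrongly 'significant' in {observed_rate:.1%} of experiments")
```

The observed rate should land close to the threshold you chose: with a 0.05 threshold you sign up to being fooled by chance about one time in twenty when there is nothing to find. That is the sense in which the error is measurable, even though any single decision can still turn out wrong.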
Data Driven Decisions is a cool concept that summarises this application of advanced analytics to decision making. There is some serious academic research showing that companies that apply data driven decisions outperform their peers by 5–6% in terms of output and productivity. There is much understanding of why this is the case, and if you need the ultimate stick with which to beat your business decision maker while trying to convince them about DDD, make sure to read "Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?", a 2011 paper by Brynjolfsson, Hitt, and Kim, from MIT and Penn’s Wharton School.
Ultimately, that is the value of advanced analytics most expect me to lay out in front of them, I suppose: if you apply advanced analytics, you will perform better in business. But there is much more than just some ever-increasing figure. From an organisational perspective, data science introduces a pattern of human interaction between people who make decisions, those who maintain the data, and those who can do data science—creating an intersection where significant organisational intelligence can be found. It makes the entire decision making process more precise, less onerous, seems to make people less worried about decisions, and perhaps even happier, and, may I say it, it makes experimenting with new ideas more fun.
Go on, check out those courses (or learn with me online). I promise I will teach you everything you need to know to get started with Practical Data Science in just 4 very intensive days, made easier thanks to Microsoft Cortana Analytics and its fresh, easy-to-use approach. Or, if you are ready to take the plunge, hire us for a 3-day on-site consulting-based workshop to get jump-started doing advanced analytics using your own data, and your own analytical and data teams.
Rafal