Descriptive and predictive analytics with Big Data are becoming more prevalent in a wide range of industries. Gah-Yi Vahn makes sense of what is truly a 21st-century industry.
Big Data is here to stay. It is not a passing fad but a fundamental technological change to the business landscape, comparable to the arrival of computers in the office. The term refers to the abundance of cheap and easily accessible information to support decision-making, and Big Data is at the core of the daily operations of companies such as Google, Facebook and Amazon. Others are eager to follow suit.
Using data to support decision-making is not new, and falls under the umbrella of business analytics. The difference now is that one can collect much more information about any element relevant to a decision, thanks to the ever-decreasing costs of data collection, storage and processing. For example, an online retailer today can collect a diverse range of information such as customer demographics (gender, location, age), weather, real-time inventory information from RFID (radio frequency identification) chips, and even blog posts and video reviews of products. The recorded dataset thus grows quickly as you record more of these features repeatedly over time, sometimes as often as once every few seconds.
Traditional business analytics can be classified as descriptive, predictive and prescriptive analytics. Descriptive analytics takes available data to describe what is happening. An important part of doing descriptive analytics well is the presentation of information. Google Trends is an excellent example, visualising search term popularity by region and by time. Predictive analytics consists of using past data to forecast the future, and is routinely used in all aspects of a business. Prescriptive analytics, on the other hand, uses past data and a decision (optimisation) model to reach actionable recommendations. Whereas descriptive and predictive analytics require a human manager to interpret the results, prescriptive analytics allows for automated decision-making, as long as the decision model is specified in advance.
The future is not so uncertain
Of course, traditional business analytics has always used data. For example, suppose you were a bookseller and needed to order stock from the publisher in advance. Under small data analytics, you would collect historical sales data, observe any trends in the data (e.g. higher sales during the holiday season) and perform a time-series forecast of future demand. These are examples of descriptive and predictive analytics. You can also perform prescriptive analytics on the dataset by computing an order quantity that maximises the total estimated future revenue.
On the other hand, in the new era of Big Data analytics, you can collect not just historical sales data, but data on other features also associated with the demand. The underlying premise for doing so is that the future is not so uncertain. In a much simpler system – an apple falling from a tree, for example – it suffices to record a few key elements (the current location and velocity of the apple) to predict the future precisely (e.g. the location of the apple one second later). This is possible because the system is simple enough and we know the underlying physical laws that relate the present to the future.
Yet predicting the demand for a product, or the spread of a virus, is much more complex because the underlying relationship between the present and the future is unknown for such systems. Nevertheless, as the future is always a function of the present (e.g. you catch flu after being sneezed on during a bus journey), recording as much information as possible about the present can only help you resolve the future better. And this is the justification for collecting Big Data – lots of relevant information (volume) that comes in many different formats (variety), as accurately (veracity) and as frequently (velocity) as possible.
Letting the data speak
To extract value from Big Data, you still need to perform descriptive, predictive and/or prescriptive analytics. However, traditional analytics tools may no longer work because of the size of the dataset: computations will be slower and more memory will be required. And if you are not careful, it is possible to find artificial relationships between the numerous features and the quantity you are trying to predict, and so to predict the future with overconfidence – a phenomenon known as overfitting.
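To see how easily such artificial relationships arise, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than anything from a real dataset: the "demand" and all 500 "features" are pure random noise, and the function names and sample sizes are made up for the example. It picks the feature that correlates best with the target on 20 past observations, then checks how that same feature performs on 200 fresh observations.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spurious_fit(n_train=20, n_test=200, n_features=500, seed=0):
    """Select the best-looking noise feature in-sample, then test it out-of-sample.

    Returns (in-sample |correlation|, out-of-sample |correlation|).
    """
    rng = random.Random(seed)
    n = n_train + n_test
    # Target and features are independent noise: there is nothing real to learn.
    target = [rng.gauss(0, 1) for _ in range(n)]
    features = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_features)]

    def train_corr(feature):
        return abs(pearson(feature[:n_train], target[:n_train]))

    # With many features and few observations, some feature will look predictive.
    best = max(features, key=train_corr)
    in_sample = train_corr(best)
    out_of_sample = abs(pearson(best[n_train:], target[n_train:]))
    return in_sample, out_of_sample


if __name__ == "__main__":
    in_sample, out_of_sample = spurious_fit()
    print(f"in-sample |corr|:     {in_sample:.2f}")
    print(f"out-of-sample |corr|: {out_of_sample:.2f}")
```

The selected feature looks strongly related to the target on the data used to choose it, yet the relationship largely vanishes on new data. This is why results should be judged out of sample.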
Nevertheless, descriptive and predictive analytics with Big Data are becoming more and more prevalent in a wide range of industries. Hospitals are using electronic medical records to identify patients with higher readmission risks, retailers such as Walmart perform targeted marketing by mining customer purchase data, and the Internal Revenue Service (IRS) of the US uses sophisticated predictive algorithms to identify fraud. The many different types of prediction algorithms are commonly referred to as ‘machine learning’ algorithms, to emphasise the fact that the relationship between the past and the future is learned agnostically from data.
Whereas descriptive and predictive analytics with fairly large datasets have already been successfully deployed, prescriptive analytics with Big Data is at an early stage of development and is mostly confined to academic research. For instance, Professor Cynthia Rudin of MIT and I have recently written the first paper on how to use Big Data for the newsvendor problem, a fundamental building block of many operations management problems. First devised as a banking problem in 1888 by economist-statistician Francis Edgeworth, it underlies applications ranging from inventory management to personnel staffing.
In our paper, we extended the classical newsvendor problem to incorporate Big Data. In particular, we constructed a prescriptive machine-learning algorithm for this setting. As a case study, we used our custom-made algorithm to prescribe nurse staffing in a hospital emergency room. As the number of patients coming to the emergency room is uncertain, the hospital incurs an overage cost if too many nurses are staffed and an underage cost if too few are staffed. We showed that doing Big Data analytics carefully improves upon small data analytics (analytics without using any relevant features) by up to 46 per cent in terms of the out-of-sample cost, with statistical significance at the 1 per cent level. As personnel staffing is a major source of expenditure in a hospital, the significance of these results cannot be ignored.
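For readers unfamiliar with the newsvendor problem, the following Python sketch shows the classical, feature-free version, that is, the small data benchmark rather than the Big Data algorithm from our paper. It relies on the standard textbook result that the cost-minimising level is the quantile of demand at the ratio underage cost / (underage cost + overage cost). The patient counts, the cost values and the function names are all hypothetical, chosen only for illustration.

```python
def newsvendor_cost(level, demand, underage_cost, overage_cost):
    """Cost of committing to `level` when realised demand turns out to be `demand`."""
    shortfall = max(demand - level, 0)  # too few staffed: underage
    surplus = max(level - demand, 0)    # too many staffed: overage
    return underage_cost * shortfall + overage_cost * surplus


def average_cost(level, demands, underage_cost, overage_cost):
    """Average cost of `level` over a list of past demand observations."""
    total = sum(newsvendor_cost(level, d, underage_cost, overage_cost) for d in demands)
    return total / len(demands)


def sample_quantile_level(demands, underage_cost, overage_cost):
    """Smallest observed demand whose empirical CDF reaches the critical ratio.

    The critical ratio is underage_cost / (underage_cost + overage_cost); the
    comparison is written without division.
    """
    ordered = sorted(demands)
    n = len(ordered)
    for k, level in enumerate(ordered, start=1):
        if k * (underage_cost + overage_cost) >= n * underage_cost:
            return level
    return ordered[-1]


# Hypothetical history: patients needing care on ten past shifts.
past_patients = [8, 12, 10, 15, 9, 11, 14, 13, 10, 12]

# Hypothetical costs: being short by one unit is three times as costly as
# having one unit too many, so the critical ratio is 3 / (3 + 1) = 0.75.
staffing = sample_quantile_level(past_patients, underage_cost=3, overage_cost=1)

if __name__ == "__main__":
    print("staffing level:", staffing)
    print("average cost:", average_cost(staffing, past_patients, 3, 1))
```

Because understaffing is assumed to be the costlier mistake, the prescribed level sits above the typical demand. A feature-based approach would go further by letting the prescribed level vary with information such as the day of the week or the time of day.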
Bringing Big Data on board
So how should you bring Big Data to the daily operations of your organisation? First and foremost, there needs to be a diagnosis of the current analytics capability. If currently there is very little systematic data collection and analysis, then even doing small data analytics of the type that we teach in our MBA programmes can generate a great deal of value. Surprisingly, this is the case for many organisations.
As an example, the City of New York has recently shown that simple, small data analytics can have a big impact. With only 200 housing inspectors on board, inspecting complaints about illegal conversions (residences illegally cut up into many smaller units) was becoming an unmanageable problem for the city. To address it, Mike Flowers, the director of analytics for the city, hired a couple of college graduates to perform common-sense, descriptive analytics on the available data. By combining a listing of every property lot in the city with datasets containing information such as foreclosure proceedings, anomalies in utilities usage, crime rates and rodent complaints, they identified that only 13 per cent of a total of 25,000 complaints made in a year were severe enough to warrant a vacate order, thereby containing a problem that was getting out of hand.
Going from small data analytics to Big Data analytics, or from descriptive to predictive and prescriptive analytics, is trickier. Expanding in either dimension is human-capital intensive, requiring talented data scientists. A McKinsey report (2011) estimates that by 2018 there will be a shortage in the US of 140,000 to 190,000 workers with “deep analytical” experience and a further 1.5 million data-literate managers. Technology giants such as Google, Facebook and Amazon, as well as large investment banks and top hedge funds, can afford such employees, but even now the competition is fierce, as the ongoing talent war in Silicon Valley shows. Data scientist is indeed a sexy job of the 21st century.
The current buzz around Big Data will result in a universal appreciation and practice of business analytics, regardless of the size of the data. We have already seen many success stories, from e-commerce and healthcare to government, and there will be many more to come.