Application to Business Analytics and
Consistent with the broader societal trend toward using more information
and greater computing power, we have seen profound change in
analytical work and research. Data and information have become
widely available across companies, organizations, universities, and
individuals. The data sets are rich and much easier
to access. Data has become democratized, with almost any private
individual or employee having access to large data sets.
Along with more data, we now have the computing power to
engage in increasingly sophisticated analysis of that data. With
the days of the IBM punch cards long gone, we can now show
correlations between an increasing number of variables with
relative ease using massive data sets.
With the continued improvement in data and computing
power, companies and governments have started to raise
their expectations of what can be done with this data. Managers
are demanding improved customer targeting programs and more
accurate predictions of the future. With greater
computing power and more data, governments are creating
increasingly complex public programs. University researchers
are using data to analyze complex government policies and the
actions of companies and individuals.
Stories about the power of data abound in the media and in popular
books. Companies have been able to better segment
customers with website views and promotional coupons; baseball
players are more accurately valued; and voters are more effectively
targeted. The power of data is seemingly limitless.
As more data and more sophisticated modeling have supplied the building
blocks for more extensive research, the competition to meet
these demands has also increased. As with many other advances that have
taken advantage of more data and greater computing
power, this competition has produced meaningful results, particularly
when it yields useful real-time data or information
that can be used to predict the outcome of a simple system.
Information that is immediately useful: In this particular
case, the data gathering incorporates disparate sources of data and
then highlights the opportunity with no forecasting required. A
simple example would be a website that aggregates price data from
several sources and then highlights the best possible price. GPS
location information is another such example. In these cases, the
output of the data is simple and immediately useful to the end-user.
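The price-aggregation case above can be illustrated with a minimal sketch. The store names and prices are hypothetical, invented purely for illustration; the point is only that the useful output comes from gathering and comparing, with no forecasting step.

```python
from typing import Dict, Tuple


def best_price(quotes: Dict[str, float]) -> Tuple[str, float]:
    """Return the (source, price) pair with the lowest quoted price."""
    if not quotes:
        raise ValueError("no quotes to compare")
    # Pick the source whose quoted price is smallest.
    source = min(quotes, key=quotes.get)
    return source, quotes[source]


# Hypothetical quotes for a single product, gathered from several sources.
quotes = {"store_a": 24.99, "store_b": 22.50, "store_c": 23.75}
source, price = best_price(quotes)
print(f"Best price: {price:.2f} at {source}")
```

The whole analysis is a single comparison across sources, which is why the result is immediately useful to the end-user.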
A special and well-known case study would be the oft-cited book
Moneyball. In the book, the author highlights how many teams
overvalued batting average and undervalued on-base percentage
(a metric that includes walks). Using this arbitrage opportunity, the
Oakland A’s were able to sign and draft players who were effectively
undervalued by the rest of the market. While this data analysis story
is often cited in the Big Data literature, the success and insight were
largely driven by a simple and obvious oversight in how baseball
teams evaluated the importance of walks—an oversight that was
discussed in sabermetrics circles well before the strategy was used
by the A’s. As a result, the actual Big Data analysis involved simply
gathering information on a player’s on-base percentage and then
developing a strategy to exploit the arbitrage opportunity.
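To make the gap between the two metrics concrete, here is a small sketch using the standard definitions of batting average (hits divided by at-bats) and on-base percentage (which also credits walks and hit-by-pitches). The two players and their season totals are hypothetical and are not taken from the book.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits per at-bat; walks do not count toward this metric."""
    return hits / at_bats


def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sacrifice_flies: int) -> float:
    """Times on base per plate appearance counted by the standard OBP formula."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    return times_on_base / opportunities


# Hypothetical season totals for two players (illustrative numbers only).
player_a = {"hits": 150, "walks": 20, "hit_by_pitch": 0,
            "at_bats": 500, "sacrifice_flies": 0}
player_b = {"hits": 135, "walks": 90, "hit_by_pitch": 0,
            "at_bats": 500, "sacrifice_flies": 0}

for name, p in (("A", player_a), ("B", player_b)):
    avg = batting_average(p["hits"], p["at_bats"])
    obp = on_base_percentage(**p)
    print(f"Player {name}: AVG {avg:.3f}, OBP {obp:.3f}")
```

Under these assumed numbers, player A has the higher batting average while player B reaches base more often, which is the kind of mismatch a team focused on batting average would overlook.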
Predicting simple systems: Beyond real-time information that is
immediately useful, with the advances in data and modeling, analysts
have developed more sophisticated models and have used more extensive
data sets to predict the future of simple systems—usually the buying
or voting behavior of individuals with a simple choice to purchase a
product or vote for a candidate. In the classic cases, a data scientist will
optimize the presentation of a webpage or promotion to better target an
individual’s taste based on a wide range of factors. Beyond the use of
large data sets to predict simple systems, the cost of
any mistake in this approach is small: an incremental reduction in the
likelihood of any one individual purchasing a product or voting for a candidate. As
a result, data analysts can usually rely on mere correlation as a driver of
a decision rather than consider whether one variable caused another.
In the above cases, the analytic competition to build more
sophisticated models and gather more data has the potential to produce
better outcomes. These cases also often serve as the case studies
highlighting the power of Big Data. That said, while the success
stories often emphasize the accuracy of these new approaches, they
rarely highlight the incremental cost of adding more data into the
analysis, nor the risk management and IT costs associated with
obtaining and analyzing this data.
The Downside to Analytic Competition:
Predicting and Explaining Complex Systems
The problem with analytical competition occurs when analysts
attempt to predict the future or explain the result of a complex
system that goes well beyond simple systems or valuable real-time
information. These complex systems could include the economy,
stock prices, housing prices, and health care costs in a rapidly
changing insurance environment.
In these cases, additional information and computing power
have the potential to improve strategic decision-making, but the
process is not consistent and will likely require combining
technical skill with many qualitative considerations. This inherent
subjectivity in how to conduct an analysis allows individuals to
develop conclusions using large data sets and computing power
that benefit their preferred narrative—and improve their career
prospects. These narratives could include stories that take credit or
ascribe blame to past decisions, highlight preferred future strategies,
or emphasize an intellectually appealing modeling technique. No
matter the rationale, the internal competition among individuals to
differentiate themselves often leads to biased analysis that
accomplishes little to improve an organization’s decision-making process.