WILLIAM OF OCKHAM said that preference should be given to explanations with
fewer elements. Claude Shannon said that any methodology that is an effective compressor
is also an effective gambler. Andrey Kolmogorov said that the shortest possible program
that expresses a piece of data gives the most useful assessment of its information content.
Admiration for brevity, and for the capacity to do more with less,
runs across many professions. Poets celebrate the value of compression, lauding poems that say more than their words denote.
Mathematicians save the greatest honors for the shortest proofs
of deep results.
As actuaries, we are accustomed to working with complex
models and predictive tools. In order to improve our business unit’s
profits or productivity, we are often asked to refine those models to
increase the efficacy of the output. But sometimes the incremental
gain, whether in increased profits or predictive power, does not
justify the additional cost in time or opportunity.
This article introduces concepts from Kolmogorov complexity,
a framework that joins the study of resource complexity
in computer science with information theory and statistics. This
approach can help in applying an actuary’s intuitive understanding
of statistical inference and credibility theory to other domains.
As an actuary, I might say that efficient use of data means not
reading more from the data than the data itself has to say. Taking
cues from computer programming and audio compression gives
us a means to pursue the same kind of efficiency in modeling and analysis.
There is an information metric called the Kolmogorov complexity
that gives a unifying framework for questions relating to complexity and simplicity. Kolmogorov complexity, also referred to
as descriptive complexity, is defined as the length of the shortest
computer program (in some computer language that is sufficiently
developed to implement any other computer language in a
“universal” manner) that outputs a given piece of data. The sense
is that a program that reproduces a piece of data encodes all of
the essential details of that data. The smallest program encodes
all of the essential details in the most compact manner possible.
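As a small illustration, consider a highly repetitive string. The raw data is a million characters long, but a short program reproduces it exactly, so its Kolmogorov complexity is bounded by the length of that description. (This Python sketch uses an arbitrarily chosen pattern for illustration.)

```python
# A million-character string of repeated "ab" is large as raw data...
data = "ab" * 500_000            # 1,000,000 characters

# ...but this short "program" is a complete description of the same data.
program = 'print("ab" * 500_000)'

print(len(data))     # 1000000 characters of raw data
print(len(program))  # a description only 21 characters long
```

The Kolmogorov complexity of `data` can be no greater than the 21-character program that generates it, while a string of a million coin flips would admit no description much shorter than the string itself.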
The precise computer language used matters significantly only for relatively small pieces
of data. A piece of data may be especially easy to express in a particular computer language, but
because we are measuring Kolmogorov complexity on a “universal”
computer, we can always include code that translates from that language
into our current working language, at a fixed cost that does not grow with the data.
It is not possible to directly calculate the precise Kolmogorov
complexity of a piece of data. However, it is possible to use any
particular approach to making data smaller as an approximation.
The most direct approach to approximating Kolmogorov complexity
is to use compression algorithms. File sizes after compressing
to a ZIP file or other formats can serve as a very
rough gauge of the relative Kolmogorov complexity of data sets.
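The idea can be sketched with Python's standard zlib module (the same DEFLATE algorithm used in ZIP files). Compressed size separates redundant data from incompressible data; the particular inputs below are illustrative.

```python
import os
import zlib

# Highly redundant data: one pattern repeated many times.
redundant = b"actuarial" * 10_000      # 90,000 bytes
# Essentially incompressible data: uniformly random bytes.
random_bytes = os.urandom(90_000)

print(len(zlib.compress(redundant)))     # a few hundred bytes at most
print(len(zlib.compress(random_bytes)))  # barely shrinks; typically slightly larger
```

The compressed sizes act as (loose) upper bounds on the Kolmogorov complexity of each input: low for the repetitive data, near the raw length for the random data.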
Statistical analysis is also an avenue for compressing data.
Estimates of the likelihood of certain sequences occurring in
data provide a basis for more frequently occurring elements to
be mapped onto shorter codes. General-purpose compression
algorithms essentially perform statistical analysis of data based
on some assumptions about the structure of the data that work
reasonably well for common data formats but that will not necessarily identify deep structural relationships in data.
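Mapping frequent elements onto shorter codes is exactly what Huffman coding does. The following is a minimal Python sketch (the sample text and function name are illustrative, not part of any particular compression library):

```python
import heapq
import itertools
from collections import Counter

def huffman_codes(text):
    """Map each symbol to a bit string; frequent symbols get shorter codes."""
    tiebreak = itertools.count()  # heap tie-breaker so trees are never compared
    heap = [(freq, next(tiebreak), sym) for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least-frequent subtrees.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):   # internal node: descend into both children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                         # leaf: record this symbol's code
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("aaaaaaaabbbbccd")
# 'a' (8 occurrences) receives the shortest code; 'c' and 'd' the longest.
```

With frequencies 8, 4, 2 and 1, the codes come out one, two, three and three bits long, so encoding the whole string takes fewer bits than a fixed two-bit code per symbol would.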
Data that is easy to compress has some form of redundancy
in it—elements repeat in fixed patterns, there are patterns with
predictable frequencies, or there are structural relationships
where the values of certain elements can be used to predict other
elements. Any form of compression that we apply identifies and
exploits some subset of these redundancies.
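The simplest of these redundancies, a character repeated in runs, can be exploited with run-length encoding. A minimal Python sketch (the input string is arbitrary):

```python
import itertools

def run_length_encode(s):
    """Replace each run of a repeated character with a (character, count) pair."""
    return [(ch, len(list(group))) for ch, group in itertools.groupby(s)]

print(run_length_encode("wwwwwwbbbwwww"))
# [('w', 6), ('b', 3), ('w', 4)]
```

Thirteen characters collapse to three pairs; on data without runs, the same scheme would expand the input, which is why practical compressors combine several such techniques.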
WHAT MIGHT HAVE BEEN
Kolmogorov initially wanted to be an historian. He
looked at the question as to whether taxes in Russia in
the middle-ages were collected at the house level or
at the village level. He analyzed tax data and showed
that that data had a much simpler description if taxes
were collected at the village level. (You can see the
seeds of Kolmogorov complexity here.) He presented
these results to the history faculty to great applause.
Asking them whether he could publish such a paper he
was told, ‘You only have one proof. You cannot publish
in a history journal without at least two more proofs of
your hypothesis.’ And so Kolmogorov left history for a
field where one proof suffices.
From the blog “Computational Complexity”
(It should be noted, however, that the redundancy required
by history journals serves to compensate for
accepting methods of proof with less than 100 percent
accuracy. Compression makes data more efficient and
resolves it to its underlying explanatory value, while
adding redundancy to data or processes makes them
more robust against corruption or failure.)