data can be viewed as the simplest explanation of that data
without any use of prior beliefs, knowledge, or assumptions.
This observation, combined with the equivalence of maximum
likelihood estimation and the Huffman coding compression
algorithm [2], can be used to formulate inferential principles such
as the Minimum Description Length and Minimum Message Length principles.
■ When Kolmogorov complexity is applied to programs that
accomplish specific tasks, such as performing a Fourier transform or calculating the result of an actuarial model, the
shortest program is necessarily organized in a manner that
reflects the structure of the task itself. Computer programmers
call the elements of a program that allow a problem
to be tackled with less code “abstraction.” The shortest program
uses the most effective abstractions for the task; the first sketch after this list illustrates the idea.
■ Search and optimization algorithms that are efficient in their
use of computation time can be viewed as steadily accumulating
nonredundant information, which either helps locate
the result or helps prove that the solution is correct. Inefficient
algorithms recalculate earlier results or make evaluations whose
outcomes could already be determined from past operations; the second sketch after this list makes this concrete.
■ The laws of physics can be treated as a computer because all
practical computers are based on physical principles. Kolmogorov
complexity could be used to model the energy efficiency of physical
processes by asking for the shortest sequence of incremental energy inputs into a system that results
in a task being completed. This example is counterintuitive in
some respects because common tasks such as moving an object
between two points would seem to be highly compressible, but
there is no mechanism for removing this redundancy with the
kinds of looping constructs available in computer languages.
The resolution of this apparent paradox involves recognizing
that while it is possible to compress information and computational processes that contain redundancy, it is not possible
to compress redundant mechanical processes.
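To make the abstraction point concrete, here is a small illustration of my own (the task and the character counts are arbitrary): two Python program texts for the same task, where the version built on a looping abstraction is far shorter because it captures the task's repetitive structure.

```python
# Toy illustration: two program texts that sum the squares of 1..100.
# Neither string is executed here; we only compare how long each description is.

program_without_abstraction = "\n".join(
    f"total += {i} * {i}" for i in range(1, 101)
)  # 100 nearly identical statements

program_with_abstraction = "total = sum(i * i for i in range(1, 101))"

print(len(program_without_abstraction))  # roughly 1,700 characters
print(len(program_with_abstraction))     # roughly 40 characters
```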
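Similarly, the second point can be illustrated with memoization, a standard programming technique used here purely as a rough example of avoiding redundant recalculation:

```python
from functools import lru_cache

# Naive recursion re-derives the same subproblems over and over,
# recomputing information it has already produced.
def fib_naive(n: int) -> int:
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

# The memoized version stores each intermediate result once, so every
# new call contributes only nonredundant information toward the answer.
@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

print(fib_memo(90))  # returns almost instantly
# fib_naive(90) would make exponentially many redundant calls.
```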
The problems above all involve some notion of “efficiency.”
The fact that Kolmogorov complexity is not computable, and can only be approximated through upper
bounds, relates directly to the fact that none of these problems
can be solved exactly in the general case. The fields
these problems come from all require empirically grounded
compromises and simplifications for real-world work.
For actuaries, a primary aspect of our work is making
the best and most practical use of the evidence available
to us for the benefit of a principal. This usually takes the form of
application of statistical inference or informal principles collected
from practice and deemed to be reasonable. Application of concepts from algorithmic probability can make it clearer what
compromises we are making and what bodies of prior knowledge
we are drawing on in our work. It can also reveal possible paths
for borrowing from other disciplines.
As an example, it would be possible to formulate a compression-based version of credibility theory based on Minimum
Description Length principles. An event frequency assumption
that is set based on a particular large data set can be used to
compress that data set using Huffman coding, producing a set of
codes for events with lengths proportional to the negative log-likelihood
of each event under the assumption. Improving the fit improves
the compression, but any gains created by adding finer-grained
breakdowns to the data also increase the size of the code table, offsetting
the compression. This potentially gives you a credibility framework
that can be adjusted more fluidly to the structure of the problem.
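As a rough sketch of what this trade-off looks like (the claim counts and the 16-bit cost per parameter below are invented assumptions, not a worked actuarial example), the description length of a block of claim experience under a frequency assumption is the coded data length plus the cost of the code table:

```python
import math

def data_bits(claims, exposures, p):
    """Ideal code length in bits for claim indicators under frequency p:
    each claim costs -log2(p) bits and each non-claim -log2(1 - p) bits,
    which Huffman or arithmetic coding approaches in practice."""
    return -(claims * math.log2(p) + (exposures - claims) * math.log2(1 - p))

def model_bits(n_params, bits_per_param=16):
    """Crude 'code table' cost: a fixed number of bits per fitted parameter."""
    return n_params * bits_per_param

# Hypothetical experience for two segments: (claims, exposures).
segments = [(120, 1_000), (90, 1_000)]

# One pooled frequency versus a separate frequency for each segment.
pooled_p = sum(c for c, _ in segments) / sum(e for _, e in segments)
pooled_len = sum(data_bits(c, e, pooled_p) for c, e in segments) + model_bits(1)
split_len = sum(data_bits(c, e, c / e) for c, e in segments) + model_bits(2)

print(f"pooled: {pooled_len:,.0f} bits   split: {split_len:,.0f} bits")
# The finer breakdown wins only if its better fit saves more bits than the
# extra code-table entry costs -- the offsetting effect described above.
```

Because each code length is a negative log-likelihood, minimizing the total description length amounts to maximum likelihood estimation with an explicit penalty for model complexity.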
If you append a smaller data set to this data set, the optimal
assumption (the one yielding the best compression) for the combined
data will range from using the same assumption for the
smaller data set as for the large one to setting the
assumption for the smaller data set entirely independently. Where
in this range the new optimal assumption falls depends
on the size of the smaller data set and on how different it
is from the larger one. This does not necessarily
have to take a predetermined form in the manner that Bayesian
revisions typically do. As you scale these parameters, the solution
might shift from varying the assumption on the smaller data set by
applying modifiers to the existing assumption (which would not
require as much information as a new code table) to developing
more complex criteria as the data set grows.
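Continuing the same invented setup, a quick scan shows how large, or how different, the appended data set has to be before a separate assumption pays for its extra code-table entry:

```python
import math

def bernoulli_bits(claims, exposures, p):
    """Ideal code length in bits for claim indicators under frequency p."""
    return -(claims * math.log2(p) + (exposures - claims) * math.log2(1 - p))

PARAM_BITS = 16              # assumed cost of one extra code-table entry

large = (10_000, 100_000)    # large data set with a 10% observed frequency
small_freq = 0.14            # the smaller data set runs noticeably higher

for small_exposures in (200, 1_000, 5_000, 25_000):
    small = (round(small_freq * small_exposures), small_exposures)

    # Option 1: one pooled frequency covers both data sets.
    pooled_p = (large[0] + small[0]) / (large[1] + small[1])
    pooled = (bernoulli_bits(*large, pooled_p)
              + bernoulli_bits(*small, pooled_p)
              + PARAM_BITS)

    # Option 2: the smaller data set gets its own frequency.
    split = (bernoulli_bits(*large, large[0] / large[1])
             + bernoulli_bits(*small, small[0] / small[1])
             + 2 * PARAM_BITS)

    winner = "separate" if split < pooled else "pooled"
    print(f"{small_exposures:>6} exposures: {winner} assumption compresses best")
```

The crossover moves with both the size of the smaller data set and how far its frequency sits from the larger set's, without imposing a particular functional form on the transition.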
This type of framework may or may not be more useful than
existing credibility frameworks. In any case, examining this kind
of concept gives you a more intuitive way of explaining
credibility methods. A fully credible source of information is
demonstrably statistically distinct from other candidate sources of
information; to the extent that fully credible sources of information
are not available, it is preferable to choose assumptions that can
explain (effectively compress) the available sources of information.
Algorithmic Statistics and Incremental Refinements
So far we have discussed things in absolute terms, seeking the
simplest or most efficient representation of a