Computing the Probability of Cancer

The eight out of 28 physicians who answered that the patient has a 90 percent chance of having cancer made a common mistake. Recall that 90 percent is the probability of a true positive: the patient testing positive for cancer, given that she actually has cancer. We are interested in a different quantity, however: the probability that she has cancer given that she tested positive. Pr(cancer | positive) ≠ Pr(positive | cancer).

Bayes’ rule for updating subjective probabilities helps us avoid such confusions. It is the inductive logic analog of the syllogism from elementary logic. Before you learn of the positive test result, you ascribe probability 0.008 to the proposition that your patient has cancer. This is your “prior probability” Pr(cancer). This probability changes, of course, in the light of the positive test result. Bayes’ rule states that your “updated” or “posterior” probability takes the form of a conditional probability:

$$\Pr(\text{cancer}) \xrightarrow{\ \text{evidence}\ } \Pr(\text{cancer} \mid \text{positive}).$$

Computing the posterior probability is an exercise in elementary probability theory:

$$\Pr(\text{cancer} \mid \text{positive}) = \frac{\Pr(\text{positive} \mid \text{cancer}) \, \Pr(\text{cancer})}{\Pr(\text{positive})}.$$

Equivalently:

$$\Pr(\text{cancer} \mid \text{positive}) = \frac{\Pr(\text{positive} \mid \text{cancer}) \, \Pr(\text{cancer})}{\Pr(\text{positive} \mid \text{cancer}) \, \Pr(\text{cancer}) + \Pr(\text{positive} \mid \text{nocancer}) \, \Pr(\text{nocancer})}$$

All of the ingredients on the right side of the equal sign were given in the statement of the problem.
Prior probability: Pr(cancer) = 0.008
True positive rate: Pr(positive|cancer) = 0.90
False positive rate: Pr(positive|nocancer) = 0.07
Therefore:
$$\Pr(\text{cancer} \mid \text{positive}) = \frac{(0.90)(0.008)}{(0.90)(0.008) + (0.07)(0.992)} \approx 9.4\%.$$
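
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `posterior` and its parameter names are illustrative choices, not part of the original problem statement. The three input values are the ones given in the problem.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Return Pr(condition | positive test) using Bayes' rule."""
    # Denominator: total probability of a positive test, summed over
    # the two cases "has the condition" and "does not have it".
    p_positive = (true_positive_rate * prior
                  + false_positive_rate * (1.0 - prior))
    return true_positive_rate * prior / p_positive


# Values from the problem statement.
p = posterior(prior=0.008, true_positive_rate=0.90, false_positive_rate=0.07)
print(f"Pr(cancer | positive) = {p:.3f}")  # prints 0.094
```

Raising `prior` while holding the two test rates fixed shows how strongly the answer depends on the base rate of the disease, which is the quantity the physicians overlooked.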
becomes even more sharply peaked close to one-quarter. The
process can continue indefinitely, the succeeding probability
distributions becoming ever more sharply peaked about the
observed frequency of heads.
So there we have it: Using only the data as summarized by
the likelihood function, the frequentist easily estimates the
probability of heads to be one-quarter. On the other hand, the
Bayesian tempers this likelihood-based estimate with the information encoded in the prior probability distribution. In this
scenario, no prior information was used, resulting in a Bayesian answer that is virtually identical to the frequentist answer.
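
The sharpening of the distribution as data accumulates can be illustrated numerically. The sketch below rests on two assumptions that the text does not state: that "no prior information" is encoded as a uniform Beta(1, 1) prior, and that the observed frequency of heads stays at exactly one-quarter as the number of tosses grows. The helper name `beta_mean_sd` is hypothetical.

```python
import math


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


results = []
for tosses in (12, 120, 1200):
    heads = tosses // 4  # assumed: observed frequency of heads is one-quarter
    # Assumed uniform Beta(1, 1) prior; the posterior after binomial data
    # is Beta(1 + heads, 1 + tails).
    mean, sd = beta_mean_sd(1 + heads, 1 + tosses - heads)
    results.append((tosses, mean, sd))
    print(f"{tosses:5d} tosses: posterior mean = {mean:.3f}, sd = {sd:.3f}")
```

Under these assumptions the posterior standard deviation shrinks with each tenfold increase in data while the posterior mean moves toward one-quarter. A different choice of uninformative prior would change the numbers slightly but not the pattern.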
Scenario 2 (qualitative background information): Next,
suppose we had slightly more information to begin with. For
example, suppose you were told that the statistician owns a
collection of biased coins (each with one of the edges sanded
down), most of which produce frequencies of heads between
0.35 and 0.65. In this case, a distribution somewhat peaked
around one-half seems a reasonable choice. Mavens: here we
use the Beta(20, 20) prior distribution. This time, the Bayesian
estimate of the probability of heads on the 13th toss is approximately 0.44. Because we encoded prior information about the professor’s coin
collection in the prior probability distribution, the Bayesian updating process effectively “shrinks” (or “credibility weights”) the
purely data-driven maximum likelihood estimate (one-quarter)
toward the prior estimate of one-half.
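
The shrinkage can be reproduced with the standard Beta-binomial update. The sketch below assumes the data were 3 heads in 12 tosses, which is inferred from the one-quarter frequency and the 12 tosses mentioned in the text rather than stated outright here. The function and variable names are illustrative.

```python
from fractions import Fraction


def beta_posterior_mean(a, b, heads, tosses):
    """Posterior mean of Pr(heads) under a Beta(a, b) prior and binomial data."""
    return Fraction(a + heads, a + b + tosses)


a, b = 20, 20          # Beta(20, 20) prior from the text
heads, tosses = 3, 12  # assumed data: 3 heads in 12 tosses

post_mean = beta_posterior_mean(a, b, heads, tosses)

# The same number written as a credibility-weighted average of the
# maximum likelihood estimate and the prior mean.
z = Fraction(tosses, tosses + a + b)  # weight given to the data
mle = Fraction(heads, tosses)         # one-quarter
prior_mean = Fraction(a, a + b)       # one-half
blended = z * mle + (1 - z) * prior_mean

print(post_mean, float(post_mean))    # 23/52, about 0.44
print(z, blended == post_mean)        # 3/13 True
```

With these inputs the data receive a weight of 3/13 and the prior mean receives the remaining 10/13, which is why the estimate lands closer to one-half than to one-quarter. More tosses, or a less concentrated prior, would shift the weight toward the data.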
This illustrates a key advantage of the Bayesian approach.
In real life, few people would be inclined to bet 3 to 1 in favor of
the coin coming up tails on the basis of a mere 12 tosses. Rather,
their decision about how to bet would be influenced by their
prior experiences of coin tosses and any available information
about the current coin and the person tossing the coin. Bayesian
statistics, unlike frequentist statistics, provides a mechanism for
incorporating prior information along with the data to make
predictions. This prior information can come in the form of
expert opinion (“actuarial judgment”), intuition, theoretical
knowledge, or other data sets.
This scenario is analogous to the classic actuarial problem
of estimating future losses for a small cohort of insurance policies (perhaps the policies from a small state) based on a limited
volume of historical data. Credibility theory—a form of Bayesian statistics—is used to balance the frequentist estimate that
comes from the data with complementary information. In this
simplified example, the complementary source of information
is some qualitative knowledge about the professor’s collection
of biased coins. In a real-life application of credibility theory, it
would be data from a similar cohort of policies (perhaps policies from a neighboring state).