# Bayesian probability

Bayesianism is the philosophical tenet that the mathematical theory of probability applies to degrees of plausibility of statements, or, equivalently, to the degrees of belief that rational agents hold in the truth of those statements. Updating such a degree of belief in the light of new evidence by means of Bayes' theorem is called Bayesian inference.

This is in contrast to frequentism, which rejects degree-of-belief interpretations of mathematical probability, and assigns probabilities only to random events according to their relative frequencies of occurrence. The Bayesian interpretation of probability allows probabilities to be assigned to random events, but also allows the assignment of probabilities to any other kind of statement.

Whereas a frequentist and a Bayesian might both assign probability 1/2 to the event of getting a head when a coin is tossed, only a Bayesian might assign probability 1/1000 to a personal belief in the proposition that there was life on Mars a billion years ago, without intending to assert anything about a relative frequency.

## History of Bayesian probability

"Bayesian" probability or "Bayesian" theory is named after Thomas Bayes, who proved a special case of what is now called Bayes' theorem. The term "Bayesian", however, came into use only around 1950, and it is not clear that Bayes himself would have endorsed the very broad interpretation of probability now associated with his name. Laplace independently proved a more general version of Bayes' theorem and put it to good use in solving problems in celestial mechanics, medical statistics and, by some accounts, even jurisprudence. Laplace, however, did not consider this theorem to be of fundamental philosophical importance for probability theory; he endorsed the classical interpretation of probability, as did everyone else at the time.

The subjective interpretation of probability theory (later called 'Bayesian') was first proposed by the philosopher Frank P. Ramsey in his book The Foundations of Mathematics (1931). Ramsey himself saw this interpretation merely as a complement to a frequency interpretation of probability. The first to take the interpretation seriously in its own right was the statistician Bruno de Finetti, in 1937. The first detailed analysis came in 1954 in the book The Foundations of Statistics by the statistician L. J. Savage.

The general outlook of Bayesian probability has been that the laws of probability apply equally to propositions of all kinds. For a Bayesian, probabilities are merely a measure of the degree of belief a (rational) person has in the proposition in question. Several attempts have been made to ground this intuitive notion in formal demonstrations. One line of argument, expressed by Bruno de Finetti and others, is based on betting. Another treats probability as an extension of ordinary logic to degrees of belief other than 0 and 1; this argument has been expounded by Harold Jeffreys, Richard T. Cox, Edwin Jaynes and I. J. Good. Other well-known proponents of Bayesian probability have included John Maynard Keynes and B. O. Koopman.

The frequentist interpretation of probability was preferred by some of the most influential figures in statistics during the first half of the twentieth century, including R. A. Fisher, Egon Pearson, and Jerzy Neyman. The mathematical foundation of probability in measure theory via the Lebesgue integral was elucidated by A. N. Kolmogorov in the book Foundations of the Theory of Probability in 1933. Beginning about 1950 and continuing into the present day, the work of Savage, Koopman, Abraham Wald, and others has led to broader acceptance of Bayesian methods. Nevertheless, the rift between the "frequentists" and "Bayesians" continues to this day, with mathematicians working on probability theory and empirical statisticians for the most part not talking to each other, not attending each other's conferences, etc.

## Varieties of Bayesian probability

The terms subjective probability, personal probability, epistemic probability and logical probability describe some of the schools of thought which are customarily called "Bayesian". These overlap but there are differences of emphasis.

Subjective probability is supposed to measure the degree of belief an individual has in an uncertain proposition.

Some Bayesians do not accept this subjectivity. The chief exponents of the objectivist school were Edwin Thompson Jaynes and Harold Jeffreys. Perhaps the main objectivist Bayesian now living is James Berger of Duke University. José Bernardo and others accept some degree of subjectivity but believe that "reference priors" are needed in many practical situations.

Advocates of logical (or objective epistemic) probability, such as Harold Jeffreys, Richard Threlkeld Cox, and Edwin Jaynes, hope to codify techniques that would enable any two persons having the same information relevant to the truth of an uncertain proposition to independently calculate the same probability. Except in simple cases, the methods proposed are controversial. Critics challenge the suggestion that it is possible or necessary, in the absence of information, to start with an objective prior belief which would be acceptable to any two persons with identical information.

## Bayesian and frequentist probability

The Bayesian approach contrasts with the concept of frequency probability, where probability is held to be derived from observed or imagined frequency distributions or from proportions of populations. The difference has many implications for the methods by which statistics is practiced under each model, and for the way in which conclusions are expressed. When comparing two hypotheses in the light of some data, frequency methods typically result in the rejection or non-rejection of the original hypothesis with a particular degree of confidence, while Bayesian methods suggest that one hypothesis is more probable than the other, or that the expected loss associated with one is less than the expected loss of the other.

Bayes' theorem is often used to update the plausibility of a given statement in light of new evidence. For example, Laplace estimated the mass of Saturn in this way. According to the frequency probability definition, however, the laws of probability are not applicable to this problem, because the mass of Saturn is not the outcome of a well-defined random experiment. From what population is the mass of Saturn taken? In what sense is Saturn picked at random from that population? Unless these questions are answered satisfactorily, frequentism says the laws of probability cannot be used.
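The updating step itself is mechanical. The following is a minimal numerical sketch in Python; the hypotheses and likelihood values are hypothetical illustrations, not Laplace's actual calculation:

```python
# Discrete Bayesian update: P(h | e) is proportional to P(e | h) * P(h).
# Three hypothetical hypotheses start with equal prior belief; an observation
# is more likely under h2 than under the others, so belief shifts toward h2.

priors = {"h1": 1/3, "h2": 1/3, "h3": 1/3}        # prior degrees of belief
likelihoods = {"h1": 0.1, "h2": 0.6, "h3": 0.3}   # P(evidence | hypothesis)

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
evidence = sum(unnormalized.values())              # P(evidence), the normalizer
posteriors = {h: p / evidence for h, p in unnormalized.items()}

print(posteriors)  # belief concentrates on h2
```

Because the priors are equal here, the posteriors are simply the normalized likelihoods; with unequal priors the prior weighting would shift the result.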

## Applications of Bayesian probability

Today, there are a variety of applications of personal probability that have gained wide acceptance. Some schools of thought emphasise Cox's theorem and Jaynes' principle of maximum entropy as cornerstones of the theory, while others may claim that Bayesian methods are more general and give better results in practice than frequency probability. See Bayesian inference for applications and Bayes' Theorem for the mathematics.

Bayesian inference has been proposed as a model of the scientific method, in that updating probabilities via Bayes' theorem resembles how a scientist starts with an initial set of beliefs about the relative plausibility of various hypotheses, collects new information (for example by conducting an experiment), and adjusts those beliefs in the light of the new information to produce a more refined assessment of the plausibility of the different hypotheses. Similarly, the use of Bayes factors has been put forward as a justification for Occam's razor.

Bayesian techniques have recently been applied to filtering e-mail spam with good success. After a selection of known spam is submitted to the filter, it uses the word occurrences in those messages to help discriminate between spam and legitimate e-mail.
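The core idea can be sketched as a naive Bayes classifier. The training corpus below is hypothetical, and the word-independence assumption is a simplification; real filters are considerably more elaborate:

```python
# Minimal naive-Bayes spam-scoring sketch on a tiny hypothetical corpus.
from collections import Counter

spam_docs = [["cheap", "pills", "buy"], ["buy", "now", "cheap"]]
ham_docs = [["meeting", "agenda", "notes"], ["notes", "buy", "lunch"]]
vocab = {w for d in spam_docs + ham_docs for w in d}

def word_probs(docs):
    """P(word | class) with Laplace smoothing to avoid zero probabilities."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}

p_word_spam = word_probs(spam_docs)
p_word_ham = word_probs(ham_docs)
p_spam = len(spam_docs) / (len(spam_docs) + len(ham_docs))  # prior = 0.5

def spam_probability(words):
    """Posterior P(spam | words) via Bayes' theorem with two hypotheses."""
    ps, ph = p_spam, 1 - p_spam
    for w in words:
        if w in vocab:               # ignore words never seen in training
            ps *= p_word_spam[w]
            ph *= p_word_ham[w]
    return ps / (ps + ph)

print(spam_probability(["buy", "cheap", "pills"]))  # → 0.9
```

A message dominated by words from the spam corpus scores near 1, and one dominated by legitimate vocabulary scores near 0; the filter's judgement is a degree of belief, in keeping with the Bayesian interpretation.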

See Bayesian inference and Bayesian filtering for more information.

## Bayesian data analysis

One criticism levelled at the Bayesian probability interpretation by frequentists is that a single probability cannot convey how much evidence one has. Consider the following situations:

1. You have a box with white and black balls, but no knowledge as to the quantities
2. You have a box from which you have drawn N balls, half black and the rest white
3. You have a box and you know that there are the same number of white and black balls

If a Bayesian were to assign a probability to the event "the next drawn ball is black", they would choose probability 1/2 in all three cases. However, frequentists will claim that this single number does not adequately model the above situations.

The confusion lies in the fact that frequentists assign probabilities only to random events, not to fixed but unknown quantities such as the proportion of black balls in the box. Bayesians, by contrast, can assign a probability to a probability (a so-called metaprobability). A Bayesian would model the three situations as follows:

1. You have a box with white and black balls, but no knowledge as to the quantities
Letting $\theta =p$ represent the statement that the probability that the next ball is black is $p$ , a Bayesian might assign a uniform Beta prior distribution:
$\forall \theta \in [0,1]$: $P(\theta )=\mathrm {B} (\alpha _{B}=1,\alpha _{W}=1)={\frac {\Gamma (\alpha _{B}+\alpha _{W})}{\Gamma (\alpha _{B})\Gamma (\alpha _{W})}}\theta ^{\alpha _{B}-1}(1-\theta )^{\alpha _{W}-1}={\frac {\Gamma (2)}{\Gamma (1)\Gamma (1)}}\theta ^{0}(1-\theta )^{0}=1.$ Assuming that the ball drawing is modelled as a binomial sampling distribution, the posterior distribution $P(\theta |m,n)$, after drawing $m$ additional black balls and $n$ white balls, is still a Beta distribution, with parameters $\alpha _{B}=1+m$, $\alpha _{W}=1+n$. An intuitive interpretation of the parameters of a Beta distribution is that of imagined counts for the two events. For more information, see Beta distribution.
2. You have a box from which you have drawn N balls, half black and the rest white
Letting $\theta =p$ represent the statement that the probability that the next ball is black is $p$, a Bayesian might assign a Beta prior distribution, $\mathrm {B} (N/2+1,N/2+1)$. The mean of this distribution, which gives the predictive probability that the next ball is black, is ${\frac {N/2+1}{N+2}}={\frac {1}{2}}$, precisely Laplace's rule of succession.
3. You have a box and you know that there are the same number of white and black balls
In this case a Bayesian would define the prior to be a point mass (Dirac delta) at $\theta ={\tfrac {1}{2}}$, i.e. $P(\theta )=\delta (\theta -{\tfrac {1}{2}})$.
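The Beta-binomial updating described in case 1 can be sketched in a few lines of Python, a minimal illustration of the update rule rather than a full analysis:

```python
# Beta-binomial update for case 1 (uniform Beta(1,1) prior).
# After drawing m black and n white balls, the posterior is Beta(1+m, 1+n);
# the predictive probability that the next ball is black is its mean.

def beta_posterior(m, n, alpha_b=1, alpha_w=1):
    """Return posterior Beta parameters after m black and n white draws."""
    return alpha_b + m, alpha_w + n

def prob_next_black(m, n):
    a, b = beta_posterior(m, n)
    return a / (a + b)  # mean of Beta(a, b): Laplace's rule of succession

print(prob_next_black(0, 0))  # no data yet → 0.5
print(prob_next_black(3, 1))  # 3 black, 1 white seen → (3+1)/(4+2)
```

With no draws the predictive probability is 1/2, matching the uniform prior; as counts accumulate, the prediction tracks the observed proportion, which is the metaprobability behaviour the text describes.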

Because frequentist statistics disallows metaprobabilities, frequentists have had to propose new solutions. Cedric Smith and Arthur Dempster each developed a theory of upper and lower probabilities. Glenn Shafer developed Dempster's theory further, and it is now known as Dempster–Shafer theory.