Bayesianism is the philosophical tenet that the mathematical theory of probability applies to the degree of plausibility of a statement, or equivalently to the degree of belief that a rational agent holds in the truth of that statement. When Bayes' theorem is used to update such degrees of belief in light of evidence, the procedure is called Bayesian inference.
This is in contrast to frequentism, which rejects degree-of-belief interpretations of mathematical probability, and assigns probabilities only to random events according to their relative frequencies of occurrence. The Bayesian interpretation of probability allows probabilities to be assigned to random events, but also allows the assignment of probabilities to any other kind of statement.
Whereas a frequentist and a Bayesian might both assign a 1/2 probability to the event of getting a head when a coin is tossed, only a Bayesian might assign 1/1000 probability to a personal belief in the proposition that there was life on Mars a billion years ago. This assertion is made without intending to assert anything about relative frequency.
History of Bayesian probability
"Bayesian" probability or "Bayesian" theory is named after Thomas Bayes, who proved a special case of what is now called Bayes' theorem. The term Bayesian, however, came into use only around 1950, and it is not clear that Bayes would have endorsed the very broad interpretation of probability now called "Bayesian". Laplace independently proved a more general version of Bayes' theorem and put it to good use in solving problems in celestial mechanics, medical statistics and, by some accounts, even jurisprudence. Laplace, however, did not consider this theorem to be of fundamental philosophical importance for probability theory. He endorsed the classical interpretation of probability, as did everyone else in his time.
The subjective interpretation of probability theory (later called 'Bayesian') was proposed for the first time by the philosopher Frank P. Ramsey in his book The Foundations of Mathematics from 1931. Ramsey himself saw this interpretation as merely a complement to a frequency interpretation of probability. The first to take this interpretation seriously was the statistician Bruno de Finetti in 1937. The first detailed analysis came in 1954 in the book The Foundations of Statistics by the statistician L. J. Savage.
The general outlook of Bayesian probability has been that the laws of probability apply equally to propositions of all kinds. For a Bayesian, probabilities are merely a measure of the degree of belief a (rational) person has in the proposition in question. Several attempts have been made to ground this intuitive notion in formal demonstrations. One line of argument is based on betting, as expressed by Bruno de Finetti and others. Another line of argument is based on probability as an extension of ordinary logic to degrees of belief other than 0 and 1. This argument has been expounded by Harold Jeffreys, Richard T. Cox, Edwin Jaynes and I. J. Good. Other well-known proponents of Bayesian probability have included John Maynard Keynes and B.O. Koopman.
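The betting argument can be illustrated with a small sketch. It assumes an agent who announces a price for a bet on a proposition A and a price for a bet on its negation, each bet paying one stake if it wins, and a bookmaker who may either sell both bets to the agent or buy both from the agent. The function name `agent_net_payoffs` and the numbers are hypothetical and chosen only for illustration; this is not a formalisation taken from de Finetti's work.

```python
def agent_net_payoffs(price_a, price_not_a, stake=1.0):
    """Net payoff to the agent in each outcome when a bookmaker exploits the prices.

    price_a and price_not_a are the prices (as fractions of the stake) that the
    agent regards as fair for a bet on A and a bet on not-A respectively.
    Exactly one of the two bets pays out `stake`, whichever outcome occurs.
    """
    total = price_a + price_not_a
    if total > 1.0:
        # The bookmaker sells both bets: the agent pays both prices
        # and collects the stake on exactly one bet.
        net = stake - total * stake
    elif total < 1.0:
        # The bookmaker buys both bets: the agent receives both prices
        # and pays out the stake on exactly one bet.
        net = total * stake - stake
    else:
        # Prices that sum to one cannot be exploited in this way.
        net = 0.0
    return {"A": net, "not A": net}


incoherent = agent_net_payoffs(0.6, 0.6)   # prices sum to more than 1
coherent = agent_net_payoffs(0.5, 0.5)     # prices sum to exactly 1
```

With the incoherent prices the agent loses money whichever way A turns out, while prices that obey the addition rule of probability leave no such guaranteed loss.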
The frequentist interpretation of probability was preferred by some of the most influential figures in statistics during the first half of the twentieth century, including R.A. Fisher, Egon Pearson, and Jerzy Neyman. The mathematical foundation of probability in measure theory via the Lebesgue integral was elucidated by A. N. Kolmogorov in the book Foundations of the Theory of Probability in 1933. Beginning about 1950 and continuing into the present day, the work of Savage, Koopman, Abraham Wald, and others has led to broader acceptance of the Bayesian interpretation. Nevertheless, the rift between the "frequentists" and "Bayesians" continues to this day, with mathematicians working on probability theory and empirical statisticians for the most part not talking to each other and not attending each other's conferences.
Varieties of Bayesian probability
The terms subjective probability, personal probability, epistemic probability and logical probability describe some of the schools of thought which are customarily called "Bayesian". These overlap but there are differences of emphasis.
Subjective probability is supposed to measure the degree of belief an individual has in an uncertain proposition.
Some Bayesians do not accept the subjectivity. The chief exponents of this objectivist school were Edwin Thompson Jaynes and Harold Jeffreys. Perhaps the main objectivist Bayesian now living is James Berger of Duke University. Jose Bernardo and others accept some degree of subjectivity but believe a need exists for "reference priors" in many practical situations.
Advocates of logical (or objective epistemic) probability, such as Harold Jeffreys, Richard Threlkeld Cox, and Edwin Jaynes, hope to codify techniques that would enable any two persons having the same information relevant to the truth of an uncertain proposition to independently calculate the same probability. Except for simple cases, the methods proposed are controversial. Critics challenge the suggestion that it is possible or necessary, in the absence of information, to start with an objective prior belief which would be acceptable to any two persons who have identical information.
Bayesian and frequentist probability
The Bayesian approach is in contrast to the concept of frequency probability where probability is held to be derived from observed or imagined frequency distributions or proportions of populations. The difference has many implications for the methods by which statistics is practiced when following one model or the other, and also for the way in which conclusions are expressed. When comparing two hypotheses and using some information, frequency methods would typically result in the rejection or non-rejection of the original hypothesis with a particular degree of confidence, while Bayesian methods would suggest that one hypothesis was more probable than the other or that the expected loss associated with one was less than the expected loss of the other.
Bayes' theorem is often used to update the plausibility of a given statement in light of new evidence. For example, Laplace estimated the mass of Saturn in this way. According to the frequency probability definition, however, the laws of probability are not applicable to this problem, because the mass of Saturn is not the outcome of a well-defined random experiment. From what population is the mass of Saturn taken? In what sense is Saturn picked at random from that population? Unless these questions are answered satisfactorily, frequentism says the laws of probability cannot be used.
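The updating step described above can be sketched in a few lines. The function name `bayes_update` and all of the numbers are hypothetical: the prior of 1/1000 echoes the life-on-Mars example from the introduction, and the two likelihoods are invented purely to show the arithmetic.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior probability of a statement after observing one piece of evidence.

    prior               -- P(H), the degree of belief before seeing the evidence
    likelihood_if_true  -- P(E | H)
    likelihood_if_false -- P(E | not H)
    """
    # Total probability of the evidence, summed over H and not-H.
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    # Bayes' theorem: P(H | E) = P(E | H) P(H) / P(E).
    return prior * likelihood_if_true / evidence


# Hypothetical numbers: evidence that is 20 times more likely if H is true.
posterior = bayes_update(prior=0.001, likelihood_if_true=0.2, likelihood_if_false=0.01)
```

With these assumed inputs the degree of belief rises from 0.001 to roughly 0.02. Nothing in the calculation requires the statement to describe a repeatable random experiment, which is the point of disagreement with the frequentist position.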
Applications of Bayesian probability
Today, there are a variety of applications of personal probability that have gained wide acceptance. Some schools of thought emphasise Cox's theorem and Jaynes' principle of maximum entropy as cornerstones of the theory, while others may claim that Bayesian methods are more general and give better results in practice than frequency probability. See Bayesian inference for applications and Bayes' Theorem for the mathematics.
Bayesian inference has been proposed as a model of the scientific method, because updating probabilities via Bayes' theorem resembles the way science proceeds: one starts with an initial set of beliefs about the relative plausibility of various hypotheses, collects new information (for example by conducting an experiment), and adjusts the original beliefs in the light of the new information to produce a more refined set of beliefs about the plausibility of the different hypotheses. Similarly, the use of Bayes factors has been put forward as a justification for Occam's Razor.
Bayesian techniques have recently been applied to filter out e-mail spam with good success. After a selection of known spam has been submitted to the filter, the filter uses the word occurrences in those messages to help it discriminate between spam and legitimate e-mail.
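One way to see the connection between Bayes factors and Occam's Razor is to compare a simple hypothesis with a more flexible one on the same data. The sketch below is an illustration under stated assumptions, not an example from the literature cited here: the data are a particular sequence of coin tosses, the simple hypothesis says the coin is fair, and the flexible hypothesis gives the unknown bias a uniform prior on [0, 1]. The function names are hypothetical.

```python
from math import factorial


def marginal_likelihood_fair(heads, tails):
    """P(data | coin is fair) for one particular sequence of tosses."""
    return 0.5 ** (heads + tails)


def marginal_likelihood_unknown_bias(heads, tails):
    """P(data | bias unknown, uniform prior) for one particular sequence.

    Integrating theta**heads * (1 - theta)**tails over [0, 1] gives
    heads! * tails! / (heads + tails + 1)!.
    """
    return factorial(heads) * factorial(tails) / factorial(heads + tails + 1)


def bayes_factor(heads, tails):
    """Bayes factor in favour of the fair-coin hypothesis."""
    return (marginal_likelihood_fair(heads, tails)
            / marginal_likelihood_unknown_bias(heads, tails))


balanced = bayes_factor(5, 5)   # data that a fair coin explains well
lopsided = bayes_factor(9, 1)   # data that a fair coin explains poorly
```

For balanced data the factor exceeds 1, so the simpler hypothesis is favoured even though the flexible one can fit the data equally well; the flexible hypothesis is penalised for spreading its predictions over many possible outcomes. For lopsided data the factor drops below 1 and the flexible hypothesis wins.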
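The idea of learning from word occurrences can be sketched with a naive Bayes classifier. This is a minimal illustration, not the method of any particular filter: it assumes that words are independent given the class, splits text on whitespace, applies add-one smoothing, and ignores words never seen in training. The class name `NaiveBayesSpamFilter` and the training messages are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy spam filter based on per-class word counts (illustrative only)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record the words of one message known to be 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text):
        """Posterior probability that `text` is spam, given the training data."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in self.LABELS:
            # Smoothed class prior, so an untrained class never gives log(0).
            prior = (self.message_counts[label] + 1) / (total_messages + 2)
            log_score = math.log(prior)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in vocabulary:
                    continue  # unseen words carry no information here
                # Add-one smoothed estimate of P(word | label).
                count = self.word_counts[label][word] + 1
                log_score += math.log(count / (total_words + len(vocabulary)))
            log_scores[label] = log_score
        # Normalise in log space to avoid underflow on long messages.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("win money now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday", "ham")

likely_spam = spam_filter.spam_probability("cheap money now")
likely_ham = spam_filter.spam_probability("monday meeting")
```

Working in log probabilities keeps the product of many small word probabilities from underflowing; a practical filter would also need better tokenisation and far more training data than this toy example.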
Bayesian data analysis
One criticism levelled at the Bayesian probability interpretation by frequentists is that a single probability cannot convey how much evidence one has. Consider the following situations:
- You have a box with white and black balls, but no knowledge as to the quantities
- You have a box from which you have drawn N balls, half black and the rest white
- You have a box and you know that there are the same number of white and black balls
If a Bayesian were to assign a probability to the event "the next drawn ball is black", they would choose probability 1/2 in all three cases. However, frequentists will claim that this single number does not adequately model the above situations.
The confusion lies in the fact that frequentists assign probabilities only to random events, not fixed constants like the probability a drawn ball will be black. Bayesians can easily assign a probability to a probability (a so-called metaprobability). The above events would be modeled in the following way by a Bayesian:
- 1. You have a box with white and black balls, but no knowledge as to the quantities
- Letting $\theta = p$ represent the statement that the probability that the next ball is black is $p$, a Bayesian might assign a uniform Beta prior distribution: $P(\theta) = \mathrm{Beta}(\alpha_B = 1, \alpha_W = 1) = \frac{\Gamma(\alpha_B + \alpha_W)}{\Gamma(\alpha_B)\,\Gamma(\alpha_W)}\,\theta^{\alpha_B - 1}(1 - \theta)^{\alpha_W - 1} = 1$.
- Assuming that the ball drawing is modelled as a binomial sampling distribution, the posterior distribution, $P(\theta \mid m, n)$, after drawing m additional black balls and n white balls is still a Beta distribution, with parameters $\alpha_B = 1 + m$, $\alpha_W = 1 + n$. An intuitive interpretation of the parameters of a Beta distribution is that of imagined counts for the two events. For more information, see Beta distribution.
- 2. You have a box from which you have drawn N balls, half black and the rest white
- Letting $\theta = p$ represent the statement that the probability that the next ball is black is $p$, a Bayesian might assign a Beta prior distribution, $\mathrm{Beta}(\alpha_B = N/2 + 1, \alpha_W = N/2 + 1)$. The expected value of $\theta$ under this distribution is $\frac{N/2 + 1}{N + 2} = \frac{1}{2}$, which is precisely Laplace's rule of succession; by symmetry, the maximum a posteriori (MAP) estimate of $\theta$ is also $\frac{1}{2}$ when $N > 0$.
- 3. You have a box and you know that there are the same number of white and black balls
- In this case a Bayesian would define the prior probability $P(\theta) = \delta(\theta - \frac{1}{2})$.
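The difference between the first two cases can be made concrete with a short sketch that treats the Beta parameters as imagined counts of black and white balls. It assumes a uniform Beta(1, 1) starting point and, for the second case, a hypothetical N = 10 draws; the function names are illustrative. Exact fractions are used so that the results can be read off directly.

```python
from fractions import Fraction


def beta_posterior(alpha_black, alpha_white, black_drawn, white_drawn):
    """Conjugate update: add the observed counts to the Beta parameters."""
    return alpha_black + black_drawn, alpha_white + white_drawn


def beta_mean(alpha_black, alpha_white):
    """Expected probability that the next ball is black."""
    return Fraction(alpha_black, alpha_black + alpha_white)


def beta_variance(alpha_black, alpha_white):
    """Spread of belief about the probability that the next ball is black."""
    total = alpha_black + alpha_white
    return Fraction(alpha_black * alpha_white, total ** 2 * (total + 1))


# Case 1: no knowledge of the quantities, uniform Beta(1, 1).
case1 = (1, 1)

# Case 2: N = 10 balls drawn (an assumed value), half black and half white.
case2 = beta_posterior(1, 1, black_drawn=5, white_drawn=5)

mean1, var1 = beta_mean(*case1), beta_variance(*case1)
mean2, var2 = beta_mean(*case2), beta_variance(*case2)
```

Both cases give the same probability of 1/2 for the next ball being black, but the variance shrinks from 1/12 to 1/52 once ten balls have been observed. In the third case the belief is concentrated entirely at 1/2, so its variance is zero. The distribution over the probability, rather than the single number 1/2, is what carries the amount of evidence.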
Because frequentist statistics disallows metaprobabilities, frequentists have had to propose new solutions. Cedric Smith and Arthur Dempster each developed a theory of upper and lower probabilities. Glenn Shafer developed Dempster's theory further, and it is now known as Dempster-Shafer theory.