Infinite monkey theorem

File:Monkey-typing.jpg
According to the second Borel-Cantelli lemma, given enough time, a chimpanzee like this one typing at random will almost certainly eventually type out a copy of one of Shakespeare's plays.

The infinite monkey theorem states that a monkey hitting keys at random on a typewriter keyboard will almost surely eventually type every book in France's Bibliothèque nationale de France (National Library). In the restatement of the theorem most popular among English speakers, the monkeys eventually type out the collected works of William Shakespeare.

The original image was presented in Émile Borel's 1913 book "Mécanique Statistique et Irréversibilité," J. Phys. 5e série, vol. 3, 1913, pp.189-196. These "monkeys" are not actual monkeys; rather, they were a vivid metaphor for an imaginary way to produce a large, random sequence of letters. Borel said that if a million monkeys typed ten hours a day, it was extremely, extremely unlikely that their output would exactly equal all the books of the richest libraries of the world; and yet, in comparison, it was even more unlikely that the laws of statistical mechanics would ever be violated, even briefly. For Borel, the purpose of the monkey image was to illustrate the magnitude of an extraordinarily unlikely event.

After about 1970, popular statements of the monkey image draw upon infinity, saying that an infinite number of monkeys typing for an infinite amount of time will produce a given text. To insist on both infinities, however, is excessive. A single immortal monkey who executes infinitely many keystrokes will almost surely eventually type out any given text, and an infinite number of monkeys will almost surely begin producing all possible texts immediately, with no wait. In fact, in both cases, the text will almost surely be produced an infinite number of times.

Intuitive proof sketch

The infinite monkey theorem is straightforward to prove, even without appealing to more advanced results. If two events are statistically independent, meaning neither affects the outcome of the other, then the probability of both happening equals the product of the probabilities of each one happening on its own. For example, if the chance of rain in Sydney on a particular day is 0.3 and the chance of an earthquake in San Francisco on that day is 0.8, the chance of both happening on that same day is 0.3 × 0.8 = 0.24.

Now, suppose the typewriter has 50 keys, and the word to be typed is "banana". Typing at random, the chance that the first letter typed is b is 1/50, as is the chance that the second letter typed is a, and so on. These events are independent, so the chance of the first six letters matching "banana" is 1/506. For the same reason, the chance that the next 6 letters match "banana" is also 1/506, and so on.

Now, the chance of not typing "banana" in each block of 6 letters is 1 − 1/506. Because each block is typed independently, the chance, X, of not typing "banana" in any of the first n blocks of 6 letters is X = (1 − 1/506)n. As n grows, X gets smaller. For an n of a million, X is 99.99%, but for an n of 10 billion it is 53% and for an n of 100 billion it is 0.17%. As n approaches infinity, the probability X approaches zero; that is, by making n large enough, X can be made as small as one likes. If we were to count occurrences of "banana" that crossed blocks, X would approach zero even more quickly. The same argument applies if the monkey were typing any other string of characters of any length.

The same argument shows why infinitely many monkeys (almost surely) produce a text as quickly as it would be produced by a perfectly accurate human typist copying it from the original. In this case X = (1 − 1/506)n where X represents the probability that none of the first n monkeys types "banana" correctly on their first try. When we consider 100 billion monkeys, the probability falls to 0.17%, and as the number of monkeys, n increases to infinity the value of X (the probability of all the monkeys failing to reproduce the given text) decreases to zero. This is equivalent to stating that the probability that one or more of an infinite number of monkeys will produce a given text on the first try is 100%, or that it is almost certain they will do so. The only difficulty remaining is in locating a successful monkey.

Formal statements

Although the infinite monkey theorem is usually stated informally, a precise formal statement clarifies its exact meaning. It is easiest to state in the computer science theory of strings, which are sequences of characters chosen from some finite alphabet. In this setting, the two statements above would be stated formally as follows:

• Given an infinite string where each character is chosen uniformly at random, any given finite string almost surely (with probability 1) occurs as a substring at some position (and indeed, infinitely many positions).
• Given an infinite sequence of such infinite strings, where each character of each string is chosen uniformly at random, any given finite string almost surely occurs as a prefix of one these infinite strings (and indeed, as a prefix of infinitely many of these strings in the sequence).

Both follow easily from the second Borel-Cantelli lemma. Suppose our desired text has length n. For the second theorem, let Ek be the event that the kth string begins with the given text. Because this has some fixed nonzero probability p of occurring, the Ek are independent, and the below sum diverges, the probability that infinitely many of the Ek occur is 1. The first theorem is shown the same way, except that we divide the random string into nonoverlapping blocks of n characters each, and make Ek the event where the kth block equals the desired string.

$\displaystyle \sum_{i=1}^\infty P(E_k) = \sum_{i=1}^\infty p = \infty.$

In fact, even going to infinity may be excessive. If the alphabet has size a, then it can be shown that the probability that one of the first an events occurs is at least 1/2. Thus, 20an tries would suffice to type the given text with probability very close to 1. The problem also parallelizes well; k monkeys can type the text k times as quickly (linear speedup). For small n this isn't too bad; for example, a thousand monkeys typing random letters at 100 characters per minute would very likely type the word "banana" within 6 weeks.

The theorem is an instance of Kolmogorov's zero-one law.

Because we need to be precise about whether it is the number of monkeys or the time available which is infinite, the name "infinite monkey theorem" is a misnomer; each individual monkey is finite. Mathematicians would say "infinite monkeys" only if they mean each monkey alone is infinite, and "infinitely many monkeys" if they mean that.

Probabilities

Ignoring punctuation, spacing, and capitalization, and assuming a uniform distribution of letters, a monkey has one chance in 26 of correctly typing the first letter of Hamlet. It has one chance in 676 (26 times 26) of typing the first two letters. Because the probability shrinks exponentially, at 20 letters it already has only one chance in 2620 = 19,928,148,895,209,409,152,340,197,376, roughly equivalent to the probability of buying 4 lottery tickets consecutively and winning the jackpot each time. In the case of the entire text of Hamlet, the probabilities are so vanishingly small they can barely be conceived in human terms. The text of Hamlet, even stripped of all punctuation, contains well over 130,000 letters.

The mere fact that there is a chance, however unlikely, is the key to the "infinite monkey theorem", because Kolmogorov's zero-one law says that such an infinite series of independent events must have a probability of zero or one. Since we have shown above that the chance is not zero, it must be one. To consider that an event this unlikely is almost guaranteed to occur given infinite time can give a sense of the vastness of infinity.

Gian-Carlo Rota wrote in a textbook on probability (unfinished when he died):

"If the monkey could type one keystroke every nanosecond, the expected waiting time until the monkey types out Hamlet is so long that the estimated age of the universe is insignificant by comparison ... this is not a practical method for writing plays. (We cannot resist the temptation to quote from A.N. Whitehead, "I will not go to infinity".)"

In The Nature of the Physical World: The Gifford Lectures (Macmillan, New York, 1929, page 72) the physicist Arthur Eddington wrote:

"If I let my fingers wander idly over the keys of a typewriter it might happen that my screed made an intelligible sentence. If an army of monkeys were strumming on typewriters they might write all the books in the British Museum. The chance of their doing so is decidedly more favourable than the chance of the molecules returning to one half of the vessel."

In physics, then, the force of the "monkeys argument" lies not in the probability that the monkeys will "eventually" produce something intelligible, but in the practical reality that they will not. Any physical process that is even less likely than such monkeys' success is effectively impossible; this is the statistical basis of the second law of thermodynamics.

It is sometimes reported, though nearly impossible, that Borel's use of monkeys and typewriters in his theorem was inspired by an argument used by Thomas Henry Huxley on June 30, 1860. Huxley was involved in a debate with the Anglican Bishop of Oxford, Samuel Wilberforce, held at a meeting of the British Association for the Advancement of Science at Oxford, of which Wilberforce was a vice-president, and was sparked by the publication of Charles Darwin’s Origin of Species seven months earlier, in November 1859. No transcript of the debate exists, but neither contemporary accounts of it nor Huxley's later recollections include any reference to the infinite monkey theorem. The association of the debate with the infinite monkey theorem is probably an urban myth triggered by the fact that the debate certainly did include some byplay about apes: the bishop asked whether Huxley was descended from an ape on his grandmother's or his grandfather's side, and Huxley responded that he would rather be descended from an ape than from someone who argued as dishonestly as the bishop. It is most unlikely that Huxley would have referred to a typewriter. Although patents for machines resembling modern typewriters were granted as early as 1714, commercial production of typewriters did not begin until 1870, and a skilled debater like Huxley would hardly have let his point depend on a device whose existence would have been unknown to most of his audience.

Literature and popular culture

Jonathan Swift's Gulliver's Travels (1782) anticipates the central idea of the theorem, depicting a professor of the Grand Academy of Lagado who attempts to create a complete list of all knowledge of science by having his students constantly create random strings of letters by turning cranks on a mechanism (Part three, Chapter five).

In "Inflexible Logic" by Russell Maloney, a short story that appeared in The New Yorker in 1940, the protagonist felt that his wealth put him under an obligation to support the sciences, and so he tested the theory. His monkeys immediately set to work typing, without error, classics of fiction and nonfiction. The rich man was amused to see unexpurgated versions of Samuel Pepys' diaries, of which he owned only a copy of a bowdlerized edition.

A similar theme was struck in the story "The Library of Babel" by Jorge Luis Borges, which contains a potentially unlimited number of volumes filled with random strings of characters. The narrator notes that every great work of literature is contained in the library; but these are outnumbered by the flawed works (which are vastly outnumbered by the gibberish).

Popular culture references to this theorem include The Simpsons (in one episode, Montgomery Burns has his own room with 1000 monkeys at typewriters, one of which is chastised for mistyping a word in the opening sentence of A Tale of Two Cities — "Blurst of times? You stupid monkey!"), Family Guy (a group of monkeys is shown collaborating on a line from Shakespeare's Romeo and Juliet in a cut scene) and The Hitchhiker's Guide to the Galaxy (Ford Prefect and Arthur Dent, under the influence of the Infinite Improbability Drive, are ambushed by an infinite number of monkeys who want their opinion on the monkeys' script for Hamlet). In the comic strip Dilbert, Dogbert tells Dilbert that his report would take "three monkeys, ten minutes". The debut album by Leeds punk rock band the Mekons is called "The Quality of Mercy is Not Strnen." Originally released on Virgin Records in the UK in 1979, its cover features a photo of a typing chimp (which, of course, is not a monkey at all).

The theorem is also the basis of a one-act play by David Ives called "Words, Words, Words", which appears in his collection All in the Timing. In the one-act, three monkeys named Milton, Swift, and Kafka have been confined to a cage by a scientist until they can write Hamlet. There is a humorous short story by R.A. Lafferty entitled "Been a Long, Long Time", in which an angel is punished by having to proofread all the output text until some future time (after trillions of Universes have been created and died) when the monkeys produce a perfect copy of Shakespeare's works.

In Tom Stoppard's play Rosencrantz & Guildenstern are Dead, one character says, "If a million monkeys..." and then cannot continue, as the characters are actually within Hamlet, one possible topic of this rule. He then finishes the sentence on a different topic.

In 2000, the IETF Internet standards committee's April 1st RFC proposed an "Infinite Monkey Protocol Suite (IMPS)", a method of directing a farm of infinitely many monkeys over the Internet.

WWDN, the blog of author and actor Wil Wheaton, uses the slogan, "50,000 monkeys at 50,000 typewriters can't be wrong." His witticism won him a Bloggie in 2002 for the category "Best Tagline of a Weblog."

Robert Wilensky once jocularly remarked "We've all heard that a million monkeys banging on a million typewriters will eventually reproduce the entire works of Shakespeare. Now, thanks to the Internet, we know this is not true."

Comedian Bob Newhart has a stand-up routine in which a lab technician monitoring an "infinitely many monkeys" experiment discovered that one of the monkeys has typed "To be, or not to be; that is the gezortenblatt."

Goats, a popular webcomic illustrated by Jonathan Rosenberg, featured a story line named infinite typewriters where several characters accidently teleport to an alternate dimension. There they find that this dimension is populated by monkeys with typewriters, presumable typing the scripts of many other dimensions.

Infinite monkey experiments

This is a thought experiment which clearly cannot be carried out in practice, since it calls either for infinite time or infinite resources. Nonetheless, it has inspired efforts in finite random text generation.

"The Monkey Shakespeare Simulator" website, launched on July 1, 2003, contains a Java applet that simulates a large population of monkeys typing randomly, with the stated intention of seeing how long it takes the virtual monkeys to produce a complete Shakespearean play from beginning to end. Though it does not update its records anymore, the generator has generated sequences up to 30 letters long (Example: "Suffolke. As by your high ImpeKnCmV"WLC3s"IvWzSCLviA2:tjbh[,s..." from Henry VI, Part II.) Due to processing power limitations, the program uses a probabilistic model (by using a random number generator) instead of actually generating random text and comparing it to Shakespeare. When the simulator "detects a match" (that is, the RNG generates a certain value or a value within a certain range), the simulator simulates the match by generating matched text.

In 2003, scientists at Paignton Zoo and the University of Plymouth, in Devon in England reported that they had left a computer keyboard in the enclosure of six Sulawesi Crested Macaques for a month; not only did the monkeys produce nothing but five pages (PDF) consisting largely of the letter S, they started by attacking the keyboard with a stone, and continued by urinating and defecating on it.