Randomized algorithm

From Example Problems
Jump to navigation Jump to search

A randomized algorithm is an algorithm which is allowed to flip a truly random coin. In common practice, this means that the machine implementing the algorithm has access to a pseudo-random number generator. The algorithm typically uses the random bits as an auxiliary input to guide its behavior, in the hope of achieving good performance in the "average case". Formally, the algorithm's performance will be a random variable determined by the random bits, with (hopefully) good expected value. The "worst case" is typically so unlikely to occur that it can be ignored.

Motivation

As a motivating example, consider the problem of finding an 'a' in an array of n elements, given that half are 'a's and the other half are 'b's. The obvious approach is to look at each element of the array, but this would take very long (n/2 operations) if the array were ordered as 'b's first followed by 'a's. There is a similar drawback with checking in the reverse order, or checking every second element. In fact, with any strategy at all in which the order in which the elements will be checked is fixed, i.e, a deterministic algorithm, we cannot guarantee that the algorithm will complete quickly for all possible inputs. On the other hand, if we were to check array elements at random, then we will quickly find an 'a' with high probability, whatever be the input.

Randomized algorithms are particularly useful when faced with a malicious "adversary" or attacker who deliberately tries to feed a bad input to the algorithm (see competitive analysis). It is for this reason that randomness is ubiquitous in cryptography. In cryptographic applications, pseudo-random numbers cannot be used, since the adversary can predict them, making the algorithm effectively deterministic. Therefore either a source of truly random numbers or a cryptographically secure pseudo-random number generator is required. Another area in which randomness is inherent is quantum computing.

In the example above, the randomized algorithm always outputs the correct answer, it is just that there is a small probability of taking long to execute. Sometimes we want the algorithm to always complete quickly, but allow a small probability of error. The former type are called Las Vegas algorithms, and the latter are Monte Carlo algorithms (related to the Monte Carlo method for simulation). Observe that any Las Vegas algorithm can be converted into a Monte Carlo algorithm, by having it output an arbitrary, possibly incorrect answer if it fails to complete within a specified time.

Also refer to Probabilistic analysis, which is based on assuming something about the set of all possible inputs. This assumption is then used to design an efficient algorithm. If the characteristic of the input to an algorithm is unknown, e.g. we do not know the distribution of the inputs, probabilistic analysis cannot be used. Usually in a randomized algorithm, some pseudo-random number generator is used inside of the program. An algorithm is randomized if its output is determined by the input as well as the values produced by a random-number generator.

Complexity

Computational complexity theory, the study of the computational resources required to solve a given problem, models randomized algorithms as probabilistic Turing machines. Both Las Vegas and Monte Carlo algorithms are considered, and several "complexity classes" are studied. The most basic randomized complexity class is RP, which is the class of decision problems for which there is an efficient (polynomial time) randomized algorithm (or probabilistic Turing machine) which recognizes NO-instances with absolute certainty and recognizes YES-instances with a probability of at least 1/2. The complement class is Co-RP, and the class of problems for which both YES and NO answers are allowed to be probabilistic is BPP. Problem classes having (possibly nonterminating) algorithms with polynomial time average case running time whose output is always correct are said to be in ZPP. For problems (believed to lie) outside these classes, such as NP-hard problems, even randomized algorithms do not suffice, and it becomes necessary to resort to approximation algorithms.

Historically, the study of randomized algorithms was spurred by the discovery by Miller and Rabin in 1976 that the problem of determining the primality of a number can be solved efficiently by a randomized algorithm. At that time, no practical deterministic algorithm for primality was known.

The Miller-Rabin primality test relies on a binary relation between two positive integers k and n that can be expressed by saying that k "is a witness to the compositeness of" n. It can be shown that

  • If there is a witness to the compositeness of n, then n is composite (i.e., n is not prime), and
  • If n is composite then at least three-fourths of the natural numbers less than n are witnesses to its compositeness, and
  • There is a fast algorithm that, given k and n, ascertains whether k is a witness to the compositeness of n.

Observe that this implies that the primality problem is in Co-RP. If one randomly chooses 100 numbers less than a composite number n, then the probability of failing to find such a "witness" is (1/4)100 so that for most practical purposes, this is a good primality test. If n is big, there may be no other test that is practical. The probability of error can be reduced to an arbitrary degree by performing enough independent tests.

Therefore, in practice, there is no penalty associated with accepting a small probability of error, since with a little care the probability of error can be made astronomically small. Indeed, even though a deterministic polynomial-time primality test has since been found, it is has not replaced the older probabilistic tests in cryptographic software nor is it expected to do so for the foreseeable future.

If, using a randomized method, the probability of error is 2−1000, the philosophical question arises: is this a mathematical proof? After all, the probability of the algorithm's error is distinctly smaller than the probability of an error in the computer hardware executing it, or the reader making an error in reading a proof - what does it mean in real terms to consider this small a probability?

Applications

Quicksort is probably the most familiar "real-world" algorithm in which randomness is very useful. The deterministic version of this algorithm requires O(n2) time to sort n numbers for some degenerate inputs -- such as the array elements being already in sorted order! However, if the algorithm randomly permutes elements before starting, it finishes in O(n log n) time with a high probability, for any input. The difference between the two is crucial when sorting large datasets.

A more complex example, representative of the use of randomized algorithms to solve graph theoretic problems, is the following randomized minimum cut algorithm:

   find_min_cut(undirected graph G) {
       while there are more than 2 nodes in G do {
           pick an edge (u,v) at random in G
           contract the edge, while preserving multi-edges
           remove all loops
       }
       output the remaining edges
   }

Here, contracting an edge (u,v) means adding a new vertex w, replacing any edge (u,x) or (v,x) with (w,x), and then deleting u and v from G.

Let n = |V[G]|. It can be shown that this algorithm outputs a minimum cut with probability at least n-2, thus running it n2log(n) times and taking the smallest output cut, we find a minimum cut with high probability.

References

  1. R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, New York (NY), 1995.
  2. M. Mitzenmacher and E. Upfal. Probability and Computing : Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, New York (NY), 2005.

cs:Pravděpodobnostní algoritmy de:randomisierter Algorithmus he:אלגוריתם אקראי th:อัลกอริทึมแบบสุ่ม zh:随机化算法