# Kolmogorov complexity

In computer science, Kolmogorov complexity, (also known as descriptive complexity, Kolmogorov-Chaitin complexity, or algorithmic entropy) of an object such as a piece of text is a measure of how much information is needed to specify the object. For example consider the following two strings of length 100

0101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101

1100100001100001110111101110110011111010010000100101011110010110001101111111010001100011011001110111


The first string admits a short English language description namely "50 repetitions of "01"". The second one has no obvious simple description other than writing down the string itself.

More formally, the complexity of a string is the length of the string's shortest description in some fixed description language. The sensitivity of the complexity to the choice of description language is discussed in the article. It can be shown that that the Kolmogorov complexity of any string cannot be too much larger than the length of the string itself. Strings whose Kolmogorov complexity is small relative to the string's size are considered to be not complex. The notion of Kolmogorov complexity is surprisingly deep and can be used to state and prove impossibility results akin to Gödel's incompleteness theorem and Turing's halting problem.

Algorithmic information theory is the area of computer science that studies Kolmogorov complexity and other complexity measures on strings (or other data structures). The field was developed by Andrey Kolmogorov, Ray Solomonoff and Gregory Chaitin starting in the late 1960s. There are several variants of Kolmogorov complexity or algorithmic information. The most widely used one is based on self-delimiting programs and is mainly due to Leonid Levin (1974).

## Definition

To define Kolmogorov complexity, we must first specify a description language for strings. Such a description language can be based on a programming language such as Lisp, Pascal, or Java virtual machine bytecode. If P is a program which outputs a string x, then P is a description of x. The length of the description is just the length of P as a character string. In determining the length of P, the lengths of any subroutines used in P must be accounted for. The length of any integer constant n which occurs in the program P is the number of bits required to represent n, that is (roughly) log2n.

We could alternatively choose an encoding for Turing machines (TM), where an encoding is a function which associates to each TM M a bitstring <M>. If M is a TM which on input w outputs string x, then the concatenated string <M> w is a description of x. For theoretical analysis, this approach is more suited for constructing detailed formal proofs and is generally prefered in the research literature. In this article we will use an informal approach.

Fix a description language. Any string s has at least one description, namely the program

 function GenerateFixedString()
return s


Among all the descriptions of s, there is one with shortest length denoted d(s). In case there is more than one program of the same minimal length, choose one arbitrarily, for example selecting the lexicographically first among them. d(s) is the minimal description of s. The Kolmogorov complexity of s, written K(s), is

$K(s) = |d(s)|. \quad$

In the other words, K(s) is the length of the minimal description of s.

We now consider how the choice of description language affects the value of K and show that the effect of changing the description language is bounded.

Theorem. If K1 and K2 are the complexity functions relative to description languages L1 and L2, then there is a constant c (which depends only on the languages L1 and L2) such that

$|K_1(s) - K_2(s)| \leq c, \quad \forall s$

By symmetry, it suffices to prove that there is some constant c such that for all bitstrings s,

$K_1(s) \leq K_2(s) + c.$

To see why this is so, there is a program in the language L1 which acts as an interpreter for L2:

  function InterpretLanguage(string p)


where p is a program in L2. The interpreter is characterized by the following property:

Running InterpretLanguage on input p returns the result of running p.

Thus if P is a program in L2 which is a minimal description of s, then InterpretLanguage(P) returns the string s. The length of this description of s is the sum of

1. The length of the program InterpretLanguage, which we can take to be the constant c.
2. The length of P which by definition is K2(s).

This proves the desired upper bound.

## Basic results

In the following, we will fix one definition and simply write K(s) for the complexity of the string s.

It is not hard to see that the minimal description of a string cannot be too much larger than the string itself: the program GenerateFixedString above that outputs s is a fixed amount larger than s

Theorem. There is a constant c such that

$K(s) \leq |s| + c, \quad \forall s$

The first surprising result is that there is no way to effectively compute K.

Theorem. K is not a computable function.

In other words, there is no program which takes a string s as input and produces the integer K(s) as output. We show this by contradiction. Suppose there is a program

  function KolmogorovComplexity(string s)


that takes as input a string s and returns K(s). Now consider the program

  function GenerateComplexString(int n)
for i = 1 to infinity:
for each string s of length exactly i
if KolmogorovComplexity(s) >= n
return s
quit


This program calls KolmogorovComplexity as a subroutine. This program tries every string, starting with the shortest, until it finds a string with complexity at least n, then returns that string. Therefore, given any positive integer n, it produces a string with Kolmogorov complexity at least as great as n. The program itself has a fixed length U. The input to the program GenerateComplexString is an integer n; here, the size of n is measured by the number of bits required to represent n which is log2(n). Now consider the following program:

  function GenerateParadoxicalString ()
return GenerateComplexString(n0)


This program calls GenerateComplexString as a subroutine and also has a free parameter n0. This program outputs a string s whose complexity is at least n0. By an auspicious choice of the parameter n0 we will arrive at a contradiction. To choose this value, note s is described by the program GenerateParadoxicalString whose length is at most

$U + \log_2(n_0) + L \quad$

where L is the "overhead" of GenerateParadoxicalString. Since n grows faster than log2(n), there exists a value n0 such that

$U + \log_2(n_0) + L < n_0. \quad$

But this contradicts the definition of having a complexity at least n0. Thus the program named "KolmogorovComplexity" cannot actually generate strings with the desired Kolmogorov complexity.

This is proof by contradiction where the contradiction is similar to the Berry paradox: "Let n be the smallest positive integer that cannot be defined in fewer than twenty English words. Well, I just defined it in fewer than twenty English words."

## Compression

It is however straightforward to compute upper bounds for K(s): simply compress the string s with some method, implement the corresponding decompressor in the chosen language, concatenate the decompressor to the compressed string, and measure the resulting string's length.

A string s is compressible by c if it has a description whose length does not exceed |s| − c. This is equivalent to saying K(s) ≤ |s| − c. Otherwise s is incompressible by c.

The next important result is about the randomness of strings. Most strings are complex in the sense that they cannot be significantly compressed: K(s) is not much smaller than |s|, the length of s in bits. Fix a value of n. There are 2n bitstrings of length n. The uniform probability distribution on the space is of these bitstrings assigns to each string of length exactly n equal weight 2n.

The precise statement that most strings are complex is as follows:

Theorem. With the uniform probability distribution on the space of bitstrings of length n, the probability that a string is incompressible by c is at least 1 − 2c+1 + 2n.

To prove the theorem, note that the number of descriptions of length not exceeding nc is given by the geometric series:

$1 + 2 + 2^2 + \cdots + 2^{n-c} = 2^{n-c+1}-1.\quad$

There remain at least

$2^n-2^{n-c+1}+1 \quad$

many bitstrings of length n that are incompressible by c. To determine the probability divide by 2n.

This theorem is the justification for various challenges in comp.compression FAQ.

## Chaitin's incompleteness theorem

Though we know that most strings are complex in the above sense, the fact that a specific string is complex can never be proven (if the string's length is above a certain threshold). The precise formalization is as follows. First fix a particular axiomatic system S for the natural numbers. The axiomatic system has to be powerful enough to allow various encodings. See Gödel numbering.

Theorem. There exists a constant L (which only depends on the particular axiomatic system and the choice of description language) such that there is no string s for which the statement

$K(s) \geq L \quad$

can be proven within the axiomatic system (even though, as we know, the vast majority of those statements must be true).

The proof of this result is a variant of the proof of Berry's paradox. The proof is by contradiction. If the theorem were false, then

Assumption (X): For any integer n there exists a string s for which there is a proof in S of the formula "K(s) ≥ n"(which we assume can be formalized in S).

We can find an effective enumeration of all the formal proofs in S by some procedure

  function NthProof(int n)


which takes as input n and outputs some proof. This function enumerates all proofs. Some of these are proofs for formulas we do not care about here (examples of proofs which will be listed by the procedure NthProof are the various known proofs of the law of quadratic reciprocity, those of Fermat's little theorem or the proof of Fermat's last theorem all translated into the formal language of S). A small fraction are complexity formulas of the form K(s) ≥ n where s and n constants in the language of S. There is a program

  function NthProofProvesComplexityFormula(int n)


which determines whether the nth proof actually proves a complexity formula K(s) ≥ L. The strings s and the integer L in turn are computable by programs:

  function StringNthProof(int n)

  function ComplexityLowerBoundNthProof(int n)


Consider the folowing program

  function GenerateProvablyComplexString(int n)
for i = 1 to infinity:
if  NthProofProvesComplexityFormula(i) and
ComplexityLowerBoundNthProof(i) >= n
return StringNthProof(i)
quit


Given an n, this program tries every proof until it finds a string and a proof in the formal system S of the formula K(s) ≥ n. The program terminates by our Assumption (X). Now this program has a length U. There is an integer n0 such that U+log2(n0) <n0. Now

   function GenerateProvablyParadoxicalString()
return GenerateProvablyComplexString(n0)
quit


outputs a string s which ostensibly has complexity provably ≥ n0. However, s is also described by a program of length U+log2(n0) so its complexity is less than n0. This proves Assumption (X) cannot hold.

Similar ideas are used to prove the properties of Chaitin's constant.

The minimum message length principle of statistical and inductive inference and machine learning was developed by C.S. Wallace and D.M. Boulton in 1968. MML is Bayesian (it incorporates prior beliefs) and information-theoretic. It has the desirable properties of statistical invariance (the inference transforms with a re-parametrisation, such as from polar coordinates to Cartesian coordinates), statistical consistency (even for very hard problems, MML will converge to any underlying model) and efficiency (the MML model will converge to any true underlying model about as quickly as is possible). C.S. Wallace and D.L. Dowe showed a formal connection between MML and algorithmic information theory (or Kolmogorov complexity) in 1999.

## References

• Ming Li and Paul Vitányi, An Introduction to Kolmogorv Complexity and Its Applications, Springer, 1997
• Yu Manin, A Course in Mathematical Logic, Springer-Verlag, 1977
• Michael Sipser, Introduction to the Theory of Computation, PWS Publishing Company, 1997.