# Cantor set

The Cantor set, introduced by German mathematician Georg Cantor, is a remarkable construction involving only the real numbers between zero and one.

The Cantor set is defined by repeatedly removing the middle thirds of line segments. One starts by removing the middle third from the unit interval [0, 1], leaving [0, 1/3] ∪ [2/3, 1]. Next, the "middle thirds" of all of the remaining intervals are removed. This process is continued ad infinitum. The Cantor set consists of all points in the interval [0, 1] that are not removed at any step in this infinite process.

The first six steps of this process are illustrated below.
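The construction can also be sketched in a few lines of Python. This is a minimal illustration, not a canonical implementation: the function names `cantor_step` and `cantor_intervals` are chosen here for the example, and exact fractions are used so that the endpoints are not blurred by floating-point rounding.

```python
from fractions import Fraction

def cantor_step(intervals):
    """Remove the open middle third of each closed interval (a, b)."""
    result = []
    for a, b in intervals:
        third = (b - a) / 3
        result.append((a, a + third))   # lower third is kept
        result.append((b - third, b))   # upper third is kept
    return result

def cantor_intervals(n):
    """Closed intervals remaining after n removal steps, starting from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = cantor_step(intervals)
    return intervals

# After two steps, four intervals of length 1/9 remain.
for a, b in cantor_intervals(2):
    print(f"[{a}, {b}]")
# [0, 1/9]
# [2/9, 1/3]
# [2/3, 7/9]
# [8/9, 1]
```

After n steps there are 2^n intervals, each of length 3^(-n).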

## What's in the Cantor set?

Since the Cantor set is defined as the set of points not removed, the proportion of the unit interval remaining can be found by computing the total length removed. This total is the geometric series

$\displaystyle \frac{1}{3} + \frac{2}{9} + \frac{4}{27} + \frac{8}{81} + \cdots = \sum_{n=0}^\infty \frac{2^n}{3^{n+1}} = \frac{1}{3}\left(\frac{1}{1-\frac{2}{3}}\right) = 1.$

So the proportion left is 1 – 1 = 0. Alternatively, it can be observed that each step leaves 2/3 of the length remaining at the previous stage, so that the amount remaining is 2/3 × 2/3 × 2/3 × ..., an infinite product which equals 0 in the limit.
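Both calculations can be checked numerically for finitely many steps. The sketch below uses helper names (`removed_length`, `remaining_length`) that are introduced only for this example. It confirms that the partial sum of the series and the power (2/3)^n always add up to 1.

```python
from fractions import Fraction

def removed_length(n):
    """Total length removed in the first n steps: sum of 2^k / 3^(k+1) for k < n."""
    return sum(Fraction(2**k, 3**(k + 1)) for k in range(n))

def remaining_length(n):
    """Length remaining after n steps: each step keeps 2/3 of what was left."""
    return Fraction(2, 3) ** n

for n in (1, 2, 5, 20):
    # The two viewpoints agree exactly at every finite stage.
    assert removed_length(n) + remaining_length(n) == 1
    print(n, float(remaining_length(n)))
```

The printed values shrink towards 0, in line with the limit computed above.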

From the calculation, it may seem surprising that there would be anything left. After all, the sum of the lengths of the removed intervals is equal to the length of the original interval. However, a closer look at the process reveals that there must be something left, since removing the "middle third" of each interval involves removing open sets (sets that do not include their endpoints). So removing the line segment (1/3, 2/3) from the original interval [0, 1] leaves behind the points 1/3 and 2/3. Subsequent steps do not remove these (or other) endpoints, since the intervals removed are always internal to the intervals remaining. So we know for certain that the Cantor set is not empty.

### Non-endpoints in the Cantor set

It may appear that only the endpoints are left, but that is not the case either. The number 1/4, for example, is in the bottom third, so it is not removed at the first step. It is then in the top third of the bottom third, in the bottom third of that, in the top third of that, and so on ad infinitum, alternating between top third and bottom third. Since it is never in one of the middle thirds, it is never removed, and yet it is also not one of the endpoints of any middle third.
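The "top third, bottom third" bookkeeping can be automated for rational numbers. The following is a sketch under stated assumptions: the function name `in_cantor_set` is hypothetical, and the test relies on the self-similarity of the set (a point in the bottom or top third belongs to the Cantor set exactly when its rescaled image in [0, 1] does). For a rational input the rescaled values keep the same denominator, so they must eventually repeat and the loop terminates.

```python
from fractions import Fraction

def in_cantor_set(x):
    """Decide whether a rational number x belongs to the Cantor set."""
    x = Fraction(x)
    if not 0 <= x <= 1:
        return False
    seen = set()
    while x not in seen:
        seen.add(x)
        if x <= Fraction(1, 3):
            x = 3 * x          # bottom third: rescale to [0, 1]
        elif x >= Fraction(2, 3):
            x = 3 * x - 2      # top third: rescale to [0, 1]
        else:
            return False       # x lies in an open middle third and is removed
    return True                # the orbit repeats without ever being removed

print(in_cantor_set(Fraction(1, 4)))  # True: 1/4 -> 3/4 -> 1/4 -> ...
print(in_cantor_set(Fraction(1, 2)))  # False: removed at the first step
```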

## Properties

### The Cantor set is uncountable

It can be shown that there are as many points left behind in this process as there were to begin with. To see this, we show that there is a function f from the Cantor set C to the closed interval [0,1] that is surjective (i.e. f maps C onto [0,1]), so that the cardinality of C is no less than that of [0,1]. Since C is a subset of [0,1], its cardinality is also no greater, so the two must in fact be equal.

To construct this function, consider the points in the [0, 1] interval in terms of base 3 (or ternary) notation. In this notation, 1/3 can be written as 0.1₃ and 2/3 can be written as 0.2₃, so the middle third (to be removed) contains the numbers with ternary numerals of the form 0.1xxxxx...₃ where xxxxx...₃ is strictly between 00000...₃ and 22222...₃. So the numbers remaining after the first step consist of

• Numbers of the form 0.0xxxxx...₃
• 1/3 = 0.1₃ = 0.022222...₃ (This alternative "recurring" representation of a number with a terminating numeral occurs in any positional system.)
• 2/3 = 0.122222...₃ = 0.2₃
• Numbers of the form 0.2xxxxx...₃

All of these can be summarized as the numbers that have a ternary numeral of the form 0.0xxxxx...₃ or 0.2xxxxx...₃.

The second step removes numbers of the form 0.01xxxx...₃ and 0.21xxxx...₃, and (with appropriate care for the endpoints) it can be concluded that the remaining numbers are those with a ternary numeral in which neither of the first two digits is 1. Continuing in this way, for a number not to be excluded at step n, it must have a ternary representation whose nth digit is not 1. For a number to be in the Cantor set, it must not be excluded at any step, so it must have a numeral consisting entirely of 0's and 2's. It is worth emphasising that numbers like 1, 1/3 = 0.1₃ and 7/9 = 0.21₃ are in the Cantor set, as they have ternary numerals consisting entirely of 0's and 2's: 1 = 0.2222...₃, 1/3 = 0.022222...₃ and 7/9 = 0.2022222...₃. So while a number in C may have either a terminating or a recurring ternary numeral, exactly one of its numerals consists entirely of 0's and 2's.

The function from C to [0,1] is defined by taking the numeral that does consist entirely of 0's and 2's, replacing all the 2's by 1's, and reading the result as a binary numeral. In a formula, where each digit a_k is 0 or 2,

$\displaystyle f \left ( \sum_{k=1}^\infty a_k 3^{-k} \right ) = \sum_{k=1}^\infty (a_k/2) 2^{-k}$

For any number y in [0,1], its binary representation can be translated into a ternary representation of a number x in C by replacing all the 1's by 2's. With this, f(x) = y, so that y is in the range of f. For instance, if y = 3/5 = 0.100110011001...₂, we write x = 0.200220022002...₃ = 7/10. Consequently f is surjective. However, f is not injective: the pairs of points at which f takes the same value are exactly the two endpoints of one of the removed middle thirds. For instance, 7/9 = 0.2022222...₃ and 8/9 = 0.2200000...₃, so f(7/9) = 0.101111...₂ = 0.11₂ = f(8/9).
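The digit replacement can be tried out on finite digit strings. This is only a sketch: the helper names (`ternary_value`, `binary_value`, `f_digits`) are introduced for this example, and an infinite numeral such as 0.2022222...₃ is approximated by a finite prefix, so the result for 7/9 approaches 3/4 rather than equalling it exactly.

```python
from fractions import Fraction

def ternary_value(digits):
    """Value of the ternary numeral 0.d1 d2 d3 ... (finitely many digits)."""
    return sum(Fraction(d, 3**k) for k, d in enumerate(digits, start=1))

def binary_value(digits):
    """Value of the binary numeral 0.d1 d2 d3 ... (finitely many digits)."""
    return sum(Fraction(d, 2**k) for k, d in enumerate(digits, start=1))

def f_digits(ternary_digits):
    """Replace every 2 by 1, turning a 0/2 ternary numeral into a binary one."""
    if any(d not in (0, 2) for d in ternary_digits):
        raise ValueError("expected a numeral made only of 0's and 2's")
    return [d // 2 for d in ternary_digits]

# 8/9 = 0.22 in ternary maps to 0.11 in binary, which is 3/4.
assert ternary_value([2, 2]) == Fraction(8, 9)
assert binary_value(f_digits([2, 2])) == Fraction(3, 4)

# A 32-digit prefix of 7/9 = 0.2022222... maps to just below 3/4.
prefix = [2, 0] + [2] * 30
print(Fraction(3, 4) - binary_value(f_digits(prefix)))  # 1/4294967296
```

The shrinking gap for longer prefixes is consistent with f(7/9) = f(8/9) = 3/4.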

So there are as many points in the Cantor set as there are in [0, 1], and the Cantor set is uncountable (see Cantor's diagonal argument). However, the set of endpoints of the removed intervals is countable, so there must be uncountably many numbers in the Cantor set which are not interval endpoints. As noted above, one example of such a number is 1/4, which can be written as 0.02020202020...₃ in ternary notation.

### The Cantor set is a fractal

The Cantor set is the prototype of a fractal. It is self-similar, because it is equal to two copies of itself, if each copy is shrunk by a factor of 3 and translated. Its Hausdorff dimension is equal to ln(2)/ln(3). It can be formed by intersecting a Sierpinski carpet with any of its lines of reflectional symmetry (such as the horizontal line through its center).
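The value ln(2)/ln(3) can be motivated by a short heuristic calculation. This is a sketch of the similarity dimension, which agrees with the Hausdorff dimension for the Cantor set, and not a full proof. The symbol d for the dimension is introduced here for the example. If a set consists of 2 copies of itself, each scaled by 1/3, then d is the exponent for which the scaled copies "add up" to the whole:

$\displaystyle 2 \left(\frac{1}{3}\right)^d = 1 \quad\Longleftrightarrow\quad 3^d = 2 \quad\Longleftrightarrow\quad d = \frac{\ln 2}{\ln 3} \approx 0.6309.$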

### Topological and analytical properties

The Cantor set is uncountable but, as the above summation argument shows, it has Lebesgue measure 0. Since the Cantor set is the complement in [0, 1] of a union of open sets, it is itself a closed subset of the reals, and therefore a complete metric space. Since it is also bounded, the Heine-Borel theorem says that it must be compact.

For any point in the Cantor set and any arbitrarily small neighborhood of the point, there is some other number with a ternary numeral of only 0's and 2's, as well as numbers whose ternary numerals contain 1's. Hence, every point in the Cantor set is an accumulation point, but none is an interior point. A closed set in which every point is an accumulation point is also called a perfect set in topology, while a closed subset of the interval with no interior points is nowhere dense in the interval.

For any two distinct points in the Cantor set, there will be some ternary digit where they differ: one will have 0 and the other 2. By splitting the Cantor set into "halves" depending on the value of this digit, one obtains a partition of the Cantor set into two closed sets that separate the original two points. In the relative topology on the Cantor set, the points have been separated by a clopen set. Consequently the Cantor set is totally disconnected. As a compact totally disconnected Hausdorff space, the Cantor set is an example of a Stone space.

It is worth noting that as a topological space, the Cantor set is homeomorphic to the product of countably many copies of the space {0, 1}, where each copy carries the discrete topology, as can easily be shown using the ternary expansion used to prove its uncountability. This can be used to show that the Cantor set is homogeneous in the sense that for any two points x and y in the Cantor set C, there exists a homeomorphism f : C → C with f(x) = y.

The Cantor set is also homeomorphic to the p-adic integers, and, if one point is removed from it, to the p-adic numbers.

The Cantor set can be characterized by these properties: every nonempty totally-disconnected perfect compact metric space is homeomorphic to the Cantor set. See Cantor space for more on spaces homeomorphic to the Cantor set.

The Cantor set is "universal in the category of compact metric spaces". This means that any compact metric space is a continuous image of the Cantor set. This fact has important applications in functional analysis.

## Variants of the Cantor set

Instead of repeatedly removing the middle third of every piece as in the Cantor set, we could also keep removing any other fixed percentage (other than 0% and 100%) from the middle. The resulting sets are all homeomorphic to the Cantor set and also have Lebesgue measure 0. The case where the middle 8/10 of each interval is removed is remarkably accessible: the set consists of all numbers in [0,1] that can be written as a decimal consisting entirely of 0's and 9's.
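The 0's-and-9's description can be observed directly on the first few steps. The sketch below generalizes the middle-third step to an arbitrary fraction. The function name `remove_middle` and its parameter `fraction` are assumptions made for this example.

```python
from fractions import Fraction

def remove_middle(intervals, fraction):
    """Remove the open middle `fraction` of each closed interval (a, b)."""
    fraction = Fraction(fraction)
    kept = (1 - fraction) / 2            # share kept on each side
    result = []
    for a, b in intervals:
        width = (b - a) * kept
        result.append((a, a + width))
        result.append((b - width, b))
    return result

intervals = [(Fraction(0), Fraction(1))]
for _ in range(2):
    intervals = remove_middle(intervals, Fraction(8, 10))

# Left endpoints after two steps: 0.00, 0.09, 0.90, 0.99
print([float(a) for a, _ in intervals])

# Each step keeps 2/10 of the length, so the total length tends to 0.
assert sum(b - a for a, b in intervals) == Fraction(2, 10) ** 2
```

Calling `remove_middle(intervals, Fraction(1, 3))` reproduces the usual middle-third step.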

By removing progressively smaller percentages of the remaining pieces in every step, one can also construct sets homeomorphic to the Cantor set that have positive Lebesgue measure, while still being nowhere dense.
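One concrete choice, taken here purely as an illustration, is to remove an open interval of length 1/4^n from the middle of each remaining piece at step n; this is the construction usually called the Smith–Volterra–Cantor set. The total length removed is then 1/4 + 2/16 + 4/64 + ... = 1/2, so the limiting set has measure 1/2. The function name `fat_cantor_intervals` below is hypothetical.

```python
from fractions import Fraction

def fat_cantor_intervals(n):
    """Intervals left after n steps, removing a middle gap of length 4^-k at step k."""
    intervals = [(Fraction(0), Fraction(1))]
    for step in range(1, n + 1):
        gap = Fraction(1, 4**step)
        next_intervals = []
        for a, b in intervals:
            mid = (a + b) / 2
            next_intervals.append((a, mid - gap / 2))
            next_intervals.append((mid + gap / 2, b))
        intervals = next_intervals
    return intervals

for n in (1, 2, 10):
    total = sum(b - a for a, b in fat_cantor_intervals(n))
    # The remaining length is 1/2 + 1/2^(n+1), which stays above 1/2.
    assert total == Fraction(1, 2) + Fraction(1, 2 ** (n + 1))
    print(n, float(total))
```

Unlike the middle-third case, the remaining length here decreases towards 1/2 instead of 0.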

## Historical remarks

This set would have been considered abstract at the time when Cantor devised it. Cantor himself was led to it by practical concerns about the set of points where a trigonometric series might fail to converge. The discovery did much to set him on the course for developing an abstract, general theory of infinite sets.