Trie

From Exampleproblems

Jump to: navigation, search
Image:Trie example.png
A trie for keys 'to', 'tea', 'ten', 'in', 'inn'.

In computer science, a trie is an ordered tree data structure that is used to store an associative array where the keys are strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree shows what key it is associated with. All the descendants of any one node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that happen to correspond to keys of interest.

The term trie comes from "retrieval". Due to its etymology some sources say it should be pronounced as "tree", while others encourage the use of "try" in order to distinguish it from the more general tree.

In the shown example, keys are listed in the nodes and values below them. Each complete English word has an integer value associated with it. A trie can be seen as a deterministic finite automaton, although the symbol on each edge is often implicit in the order of the branches.

Note that it is not necessary for keys to be explicitly stored in nodes. (In the figure, words are shown only to illustrate how the trie works). Additionally, a trie does not need to have nodes with a single child. The i node in the figure, for example, could point directly to the word inn in a table of keys without having the trie lose any of its capabilities. Trimming trie branches this way results in considerable space savings.

Contents

Advantages and drawbacks

The following are the main advantages of tries over binary search trees (BSTs):

  • Looking up keys is faster. Looking up a key of length m takes worst case O(m) = O(1) time; c.f. BST where O(ln(n)) time is required, because initial characters are examined repeatedly during multiple comparisons. Also, the simple operations tries use during lookup, such as array indexing using a character, are fast on real machines.
  • Tries (arguably) require less space because the keys are not stored explicitly and nodes are shared between keys.
  • Tries help with longest-prefix matching, where we wish to find the key sharing the longest possible prefix with a given key, efficient. They also allow one to associate a value with an entire group of keys that have a common prefix.
  • There is no need to keep a trie balanced.

Its disadvantages are the following:

  • Tries can give an ordering of the keys, but it must correspond to some alphabetical ordering.
  • Tries can be bigger in certain circumstances.
  • Trie algorithms are arguably more difficult to write.
  • It is not always easy to represent data as strings, e.g. complex objects.

Although it seems restrictive to say a trie's key type must be a string, many common data types can be seen as strings; for example, an integer can be seen as a string of bits. Integers with common bit prefixes occur as map keys in many applications such as routing tables and address translation tables.

Tries are most useful when the keys are of varying lengths and we expect some key lookups to fail, because the key is not present. If we have fixed-length keys, and expect all lookups to succeed, then we can improve key lookup by combining every node with a single child (such as "i" and "in" above) with its child, producing a Patricia trie. This saves space in maps where long paths down the trie do not have branches fanning out, for example in maps where many keys have a long common prefix or where many portions of keys are composed of characters all unique.

Clarification about performance

It is acceptable to consider trie search time O(1). However, this is not entirely correct because it assumes that the length of the keys is constant. Given N distinct keys, the lower bound of the length of the longest key is actually logkN where k is the size of the alphabet. It can therefore be demonstrated that trie search time is O(log N) strictly speaking, which would appear to be the same as that of BST search.

This observation nonetheless does not take away from the benefits of tries. The same could be applied to BSTs or binary search, i.e. key comparison time also depends on the length of the keys. Therefore, BST and binary search time is actually O(log2N). In practice, the second log N factor has a base so large that it would not be possible to notice it experimentally and is generally irrelevant.

A similar argument could be made in regards to radix sort, which is generally considered to be O(N).

Applications

As replacement of other data structures

As mentioned, a trie has a number of advantages over binary search trees. A trie can also be used to replace a hash table, over which it has the following advantages:

  • Average lookup speed is theoretically the same, but a trie is faster in practice.
  • Worst-case lookup speed in hash tables is O(N).
  • There are no key collisions.
  • Buckets are only necessary if a single key is mapped to more than one value.
  • There is no need to provide a hash function.
  • A trie can provide an alphabetical ordering of the entries by key.

Tries do have some drawbacks as well:

  • It is not easy to represent all keys as strings.
  • They are frequently less space-efficient than hash tables.
  • Unlike hash tables, they are generally not already available in programming language toolkits.

Dictionary representation

A seemingly obvious application of tries is that of storing dictionary words. Tries allow for very fast word lookup, insertion and deletion. Words also share nodes, so one might expect some space savings (which is not the case in practice due to space required by nodes). They are also well suited for the implementation of approximate matching algorithms in spell checking software, for example. If only the dictionary words need to be stored, however, without any additional information required to be stored along with each word, minimal acyclic deterministic finite automata are much more compact than tries.

Sorting

Alphabetic sorting of a set of keys can be accomplished with a simple trie-based algorithm as follows:

Theoretically this algorithm is just as fast as radix sort, but in practice it is likely slower due to the need to allocate tree nodes.

A parallel algorithm for sorting N keys based on tries is O(1) if there are N processors.

Full text search

A special kind of trie, called a suffix tree, can be used to index all suffixes in a text in order to carry out fast full text searches.

See also

External links

de:Trie

Argan Oil
Natural Skin Care
Organic Skin Care
visitor stats