Ternary search trees added to TODO

Mon Apr 23 12:26:32 UTC 2001

Forrest J Cavalier <mibsoft at epix.net> writes:

> I wasn't part of the discussion on the gcc mailing list.  But the
> advantages you list are a little too optimistic.

> Data structure may be simpler, algorithms are not.  Performance is
> worse, unless your hash table is too small with a bucket size of 1.

That's not clear; I'd like to at least try it.  A lookup in a hash table
is linear in the length of the key, so it's O(k) vs. O(lg n + k), which
makes the performance difference a little harder to judge.  Plus
unsuccessful lookup is frequently much faster, and the operation at each
character of the key can be quite a bit faster; it depends on whether
pointer chasing is faster than the mathematical operations the hash
function is performing.

> In a ternary search tree, the scalar comparison is much more costly than
> a tight strcmp loop.

But a hell of a lot faster than the hash function.

> Sizing and resizing will still be an issue. The order of entries in the
> active file must be preserved so that it can be MMAPed, and the active
> file is not necessarily sorted.  So the "sorting" that you get from a
> ternary search tree isn't actually useful.

The only thing we can really store in the search tree is a pointer to the
appropriate line in the memory-mapped copy of the active file; I don't see
any way of fundamentally changing the structure of the active file without
a lot more pain.

> Again, I don't object to that note in TODO, but it sure isn't clear to
> me that ternary search trees are "the" solution to use.

I'll clarify.  Thanks!

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>