[bind10-dev] Using AVL tree instead of red-black tree in DomainTree

Wed Mar 6 07:12:22 UTC 2013

At Sun, 3 Mar 2013 11:57:33 +0530,
Mukund Sivaraman <muks at isc.org> wrote:

> * There doesn't seem to be a lot of difference between red-black tree
>   and AVL implementations in query performance. A guess is that other
>   factors count more than the height of the tree during query
>   processing.
> 
> * Red-black trees performed slightly better than AVL trees when data was
>   given in random order. I am not sure if this was due to the actual
>   data itself being very suitable for a red-black tree, but this seems
>   strange. Can someone else run such tests?

I don't have any theory about the second point, but I wouldn't be
surprised that the difference between these two cases are generally
marginal.  After all both algorithms have the same order of lookup
complexity; the difference is just the constant factor for the worst
(or best) cases.  So, in average it's reasonable both basically run
equally fast.

But in any event, if we are still interested in the difference on the
lookup performance between these two algorithms, we should first
perform micro benchmarks.  As pointed out in the first bullet the
entire query processing includes other expensive operations such as
name compression, so comparison on the whole qps will make the results
obscure for the main focus in this context.

> * This is just a query performance test. We'd also need to measure
>   insert performance, though it's not as important as query
>   performance. AVL trees are expected to be slower than red-black trees
>   in this case, but it may not make a major difference in our case.

As usual actual measurement will be necessary, but in my gut feeling
full load from a master file or a DB-based data source won't show a
big difference because parsing a master file or DB access would be
even slower.

> * These are performance differences, but the major advantage AVL tree
>   brings us is in code.

Although I'm not necessarily opposed to switching to AVL trees per se,
if there's specific reason for that, I personally do not necessarily
buy this argument either.  It's true the deletion in RB tree is very
complicated, but implementing an AVL tree isn't like implementing a
linear search either, and in either case I'd generally avoid to try to
implement it from the scratch, but port existing proven implementation
(with tests to be very sure about that).  In that sense we already
have a RB tree implementation in BIND 9, and the internal data
structures of domain tree are very similar to the BIND 9
implementation (e.g. the concept of a null node is the same).  So,
personally, I'd rather port the BIND 9's RB tree deletion
implementation than trying to implement the entire AVL tree logic from
the scratch.

---
JINMEI, Tatuya