[bind10-dev] Using AVL tree instead of red-black tree in DomainTree

Wed Mar 6 09:44:25 UTC 2013

Hi Jinmei

Thank you for replying.

On Tue, Mar 05, 2013 at 11:12:22PM -0800, JINMEI Tatuya / 神明達哉 wrote:
> At Sun, 3 Mar 2013 11:57:33 +0530,
> Mukund Sivaraman <muks at isc.org> wrote:
> 
> > * There doesn't seem to be a lot of difference between red-black tree
> >   and AVL implementations in query performance. A guess is that other
> >   factors count more than the height of the tree during query
> >   processing.
> > 
> > * Red-black trees performed slightly better than AVL trees when data was
> >   given in random order. I am not sure if this was due to the actual
> >   data itself being very suitable for a red-black tree, but this seems
> >   strange. Can someone else run such tests?
> 
> I don't have any theory about the second point, but I wouldn't be
> surprised that the difference between these two cases are generally
> marginal.  After all both algorithms have the same order of lookup
> complexity; the difference is just the constant factor for the worst
> (or best) cases.  So, in average it's reasonable both basically run
> equally fast.

Though there is no difference in time complexity (maximum O(log N)), AVL
trees are better balanced and should be faster to lookup in general when
compared to red-black trees.  Unless some other code influences query
performance, AVL implementation should show better performance. This is
the case in the branch, so the results are suspicious (could it be a bug
in the AVL implementation?)

> But in any event, if we are still interested in the difference on the
> lookup performance between these two algorithms, we should first
> perform micro benchmarks.  As pointed out in the first bullet the
> entire query processing includes other expensive operations such as
> name compression, so comparison on the whole qps will make the results
> obscure for the main focus in this context.

The difference between AVL and RB tree implementations may not be
measurable, if other query code obscure performance as you say.

> > * This is just a query performance test. We'd also need to measure
> >   insert performance, though it's not as important as query
> >   performance. AVL trees are expected to be slower than red-black trees
> >   in this case, but it may not make a major difference in our case.
> 
> As usual actual measurement will be necessary, but in my gut feeling
> full load from a master file or a DB-based data source won't show a
> big difference because parsing a master file or DB access would be
> even slower.

Nod. Again, there's no difference in insert complexity between AVL and
RB trees, but RB insertions are faster in general due to the number of
operations performed compared to AVL. We will know how this influences
load performance when we measure it.

> > * These are performance differences, but the major advantage AVL tree
> >   brings us is in code.
> 
> Although I'm not necessarily opposed to switching to AVL trees per se,
> if there's specific reason for that, I personally do not necessarily
> buy this argument either.  It's true the deletion in RB tree is very
> complicated, but implementing an AVL tree isn't like implementing a
> linear search either, and in either case I'd generally avoid to try to
> implement it from the scratch, but port existing proven implementation
> (with tests to be very sure about that).  In that sense we already
> have a RB tree implementation in BIND 9, and the internal data
> structures of domain tree are very similar to the BIND 9
> implementation (e.g. the concept of a null node is the same).  So,
> personally, I'd rather port the BIND 9's RB tree deletion
> implementation than trying to implement the entire AVL tree logic from
> the scratch.

Here, I disagree. I have a reasonable argument for it.

If we were to use a black-box, such as a utility library which
implements a BST, we don't have to maintain that code. Somebody else is
responsible, and we blindly accept it. An example is our use of C++
standard libaries, and Boost.

But we have to understand and maintain any code in the BIND 10 tree
itself. Even if we migrate the BIND 9 code to the BIND 10 tree as-is, we
are still responsible for this code. To fix any bugs or even performance
issues, we will need to be able to work with it reasonably quickly.

The fact that the RB tree implementation has been tested in the BIND 9
codebase is a strong point in support of it. But that itself is just one
point among many that we should consider.

I spent some time cleaning up the insert code in the current RB tree
implementation last weekend. A few months back, I rewrote the code to
delete all nodes. Both implementations (the latter especially) were poor
code for a fresh developer to randomly pick it up and understand (even
if they have experience implementing RB trees).

It's not that all of us don't know what RB trees are, or how to
implement them. I created this AVL branch after implementing RB node
deletion last weekend and realizing how it would be cumbersome to
_maintain_ over time.

It has to be straightforward, and even the best C/C++ RB tree deletion
code is not.  There are many corner cases to consider. Imagine that we
port this code to our tree today. A year from now, assume a developer is
debugging a problem in this codepath. They'll have to analyze this code
after learning all the corner cases again (not simple) and working
through complicated code. Why pick this route, when a better choice is
available?

Do we want maintainable code in BIND 10? If we do, we should consider
whether another BST may offer a simpler solution.

		Mukund
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 801 bytes
Desc: not available
URL: <https://lists.isc.org/pipermail/bind10-dev/attachments/20130306/a27c8ba4/attachment.bin>