[bind10-dev] experimental results on the scalable in-memory approaches

Wed Jun 27 07:26:40 UTC 2012

I've been experimenting with prototype implementations of the
"scalable" (more memory-efficient) version of the in-memory data
source described in http://bind10.isc.org/wiki/InMemoryZoneDesign,
and benchmarking them.

Overall, the results look promising.  As for which approach we
should take (at least initially), we need to consider the tradeoffs
between implementation cost, memory efficiency and response
performance.

I've developed three variants of the prototype:

- Codename "pine": this basically implements everything described in
  the wiki page
- Codename "bamboo": this is similar to pine, but doesn't optimize
  name compression
- Codename "plum": this optimizes neither additional section
  processing nor name compression, and it doesn't store names in
  RDATA as pointers.
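For reference, here is the baseline operation that pine optimizes and bamboo/plum leave alone: plain DNS name compression as defined in RFC 1035 (section 4.1.4), where a repeated name suffix is replaced by a two-octet pointer (top two bits set, 14-bit offset from the start of the message).  This is a minimal sketch, not code from any of the branches; `NameCompressor`, `split_labels` and the std::map suffix table are hypothetical choices made for illustration, and pine's optimization itself is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Splits "www.example.com" (or "www.example.com.") into its labels.
// Assumes presentation format without escapes; "" yields no labels (the root).
std::vector<std::string> split_labels(const std::string& name) {
    std::vector<std::string> labels;
    std::string::size_type start = 0;
    while (start < name.size()) {
        std::string::size_type dot = name.find('.', start);
        if (dot == std::string::npos) {
            dot = name.size();
        }
        labels.push_back(name.substr(start, dot - start));
        start = dot + 1;
    }
    return labels;
}

// Hypothetical RFC 1035 compressor. `msg` holds the message from its first
// octet, because compression pointers are offsets from the message start.
class NameCompressor {
public:
    void write_name(std::vector<std::uint8_t>& msg, const std::string& name) {
        const std::vector<std::string> labels = split_labels(name);
        for (std::size_t i = 0; i < labels.size(); ++i) {
            const std::string suffix = join_from(labels, i);
            const auto it = offsets_.find(suffix);
            if (it != offsets_.end()) {
                // Two-octet pointer: top two bits 11, then a 14-bit offset.
                msg.push_back(static_cast<std::uint8_t>(0xC0 | (it->second >> 8)));
                msg.push_back(static_cast<std::uint8_t>(it->second & 0xFF));
                return;
            }
            // Only offsets that fit in 14 bits can be pointed to later.
            if (msg.size() < 0x4000) {
                offsets_[suffix] = static_cast<std::uint16_t>(msg.size());
            }
            assert(!labels[i].empty() && labels[i].size() <= 63);
            msg.push_back(static_cast<std::uint8_t>(labels[i].size()));
            msg.insert(msg.end(), labels[i].begin(), labels[i].end());
        }
        msg.push_back(0);  // root label ends an uncompressed name
    }

private:
    // Rebuilds the suffix starting at label `from`, used as the lookup key.
    static std::string join_from(const std::vector<std::string>& labels,
                                 std::size_t from) {
        std::string key;
        for (std::size_t j = from; j < labels.size(); ++j) {
            if (j > from) {
                key += '.';
            }
            key += labels[j];
        }
        return key;
    }

    // Suffix -> offset of its first occurrence.  Case-sensitive for
    // simplicity, although DNS names compare case-insensitively.
    std::map<std::string, std::uint16_t> offsets_;
};
```

Writing "www.example.com" and then "mail.example.com" emits the second name as the "mail" label followed by a pointer to the earlier "example.com"; a response like the artificial "www.example.com/A" one, with many related names in the additional section, runs this suffix lookup for every name it writes.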

All versions use offset pointers internally, so they are
"shared-memory ready".
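To illustrate why offset pointers make the data "shared-memory ready", here is a minimal sketch of a self-relative offset pointer, similar in spirit to Boost.Interprocess's `offset_ptr`.  It is not the type used in the branches; `OffsetPtr`, `Node`, `build_list` and `sum_list` are hypothetical names.  The pointer stores the distance from its own address to its target instead of an absolute address, so a structure built from such pointers has the same byte image wherever the memory region happens to be mapped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical self-relative pointer: stores (target address - own address).
template <typename T>
class OffsetPtr {
public:
    OffsetPtr() : offset_(0) {}  // offset 0 encodes a null pointer
    explicit OffsetPtr(T* p) { set(p); }
    // Copies must recompute the offset, because it is relative to `this`.
    OffsetPtr(const OffsetPtr& other) { set(other.get()); }
    OffsetPtr& operator=(const OffsetPtr& other) {
        set(other.get());
        return *this;
    }

    void set(T* p) {
        offset_ = (p == nullptr)
            ? 0
            : reinterpret_cast<std::intptr_t>(p) -
                  reinterpret_cast<std::intptr_t>(this);
    }
    T* get() const {
        return (offset_ == 0)
            ? nullptr
            : reinterpret_cast<T*>(reinterpret_cast<std::intptr_t>(this) + offset_);
    }
    T* operator->() const { return get(); }
    std::intptr_t raw() const { return offset_; }  // stored distance, for inspection

private:
    std::intptr_t offset_;
};

// A toy list node that lives inside a memory region.
struct Node {
    int value;
    OffsetPtr<Node> next;
};

constexpr std::size_t kNodes = 3;

// Builds the list 10 -> 20 -> 30 in `region`, which must be aligned for Node
// and at least kNodes * sizeof(Node) bytes long.
Node* build_list(unsigned char* region) {
    assert(region != nullptr);
    Node* head = nullptr;
    Node* prev = nullptr;
    for (std::size_t i = 0; i < kNodes; ++i) {
        Node* n = new (region + i * sizeof(Node)) Node();
        n->value = static_cast<int>((i + 1) * 10);
        if (prev != nullptr) {
            prev->next.set(n);
        } else {
            head = n;
        }
        prev = n;
    }
    return head;
}

// Walks the list purely through offset pointers.
int sum_list(const Node* head) {
    int sum = 0;
    for (const Node* n = head; n != nullptr; n = n->next.get()) {
        sum += n->value;
    }
    return sum;
}
```

Building the same list in two buffers at different addresses yields identical `raw()` values in both, which is the property that lets one process build the zone data and another map it at a different base address.  The cost is an extra addition on every dereference.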

These are available in the public repo in the branches named
jinmei-inmem-ng3/ng2/ng, respectively.

I did some quick benchmarks:
- Memory footprint after loading a pretty large zone containing about
  8.5 million records (an old snapshot of the .net zone)
- Response performance measured with a queryperf-like tool, using a
  snapshot of the real root zone, both with an (old) sample of real
  queries and with an artificial "www.example.com/A" query (whose
  response requires a lot of additional records and a lot of name
  compression).

The result is summarized in graphs available at:
http://bind10.isc.org/~jinmei/memory.png
http://bind10.isc.org/~jinmei/qps.png

In some sense, the results are as expected: bamboo and pine are more
memory efficient than plum; bamboo is faster than plum, and pine is
faster than bamboo.

Compared with BIND 9 and the current version of BIND 10, they are
more or less equivalent to BIND 9 (or in some cases much better), and
in terms of response performance generally comparable to the current
BIND 10 implementation (and in some cases a bit better).  Of course,
in terms of memory footprint all of the new versions are far better
than the current one (it would require several GB of memory to load
this zone, which would make my laptop so unstable that I didn't
bother to measure it this time).

I actually expected "pine" to have much better response performance
than the current version, and in that sense it was a bit
disappointing.  But I guess that with DNSSEC (which wasn't supported
in these experimental versions) the new versions will compare more
favorably than they did in this experiment.  We can also consider
other performance tuning when we implement it for real (although we
will probably also run into necessary overhead that was simply
skipped in the prototype).

Now, which approach should we take?  Maybe the biggest question is
whether we initially try only a "plum" equivalent or try to achieve
better results from the beginning.  "plum" is certainly the easiest
to implement, and its memory footprint should be acceptable, though
not ideal.  I thought its response performance would be much worse,
but it was surprisingly better than I expected (internally it should
be closer to BIND 9 without "acache", but the results showed plum was
much better than that).  So, if we can accept the degraded response
performance, it may be our first choice.  On the other hand, if we
eventually want to migrate to the better versions, implementing the
slower one first might just be a redundant detour (implementing
bamboo/pine shouldn't be a project of several months, though it
won't be a single-sprint feature either).  We need to discuss this.

---
JINMEI, Tatuya