[bind10-dev] new in-memory zone data design

Thu Jun 21 15:39:20 UTC 2012

JINMEI,

On Tuesday, 2012-06-19 18:31:57 -0700,
JINMEI Tatuya / 神明達哉 <jinmei at isc.org> wrote:
> I've described a proposed design of the memory-efficient version of
> in-memory zone data in the following wiki page:
> http://bind10.isc.org/wiki/InMemoryZoneDesign
> (there's a link from http://bind10.isc.org/wiki/SystemComponents)
> 
> It's long, but there's an overview summary section at the top of the
> page.  I also drew several diagrams to help understand it.
> 
> For an executive summary, it will be:
> - very much more memory-efficient than the current version.  at least
>   BIND 9-equivalent, and probably even more.
> - shared-memory friendly
> - (hopefully) even more efficient in lookups and rendering
> 
> In some sense, I guess the proposal is straightforward.  But I also
> guess there are many open or controversial points to discuss before we
> start development tasks.

First, let me say that this is awesome. :)

I tend to think that you are correct and that it's not *that*
complicated. Certainly not more complicated than NSD, or (IMHO) some of
the update structures/algorithms used in BIND 9. Nobody ever said that
straightforward implementations would be efficient, or conversely that
you could make efficient implementations using only straightforward
techniques. :(

Okay, onwards to specific issues:

R/B Tree
----
Like Michal, I also wonder about the effectiveness of using red/black
trees. My understanding was that R/B trees are popular because they are
comprehensible by normal coders, rather than any other reason. :) A bit
of Wikipedia reading showed me the lookup/update trade-off already
discussed on this thread. I guess I'm happy if we start with R/B and
leave open the possibility of trying AVL at some point.

RdataEncode
----
This is a minor variant of Michal's concerns about complication. The
RdataEncodeSpec is a bit hard to read to me. I don't have any real
proposals to improve it, but it seems a bit unfriendly to coders:

GENERIC_NS_ENCODE_SPEC = {1, 1, {{NAME, COMPRESSIBLE | ADDITIONAL, 0}}};

I miss Python's named parameters:

GENERIC_NS_ENCODE_SPEC = { num_fields=1, num_names=1,
    {{type=NAME, attributes=COMPRESSIBLE | ADDITIONAL, len=0}}};

:)

RdataSet optimizations?
----
I find the idea of using the "many_rrs" interesting. In fact, I wonder
if we can use uint8_t rather than uint16_t, with the observation that
very few real-world RRSET have more than 63 RR (I've never seen one).

Another possible optimization is to take advantage of the fact that
almost every zone has only a very few number of TTLs that it uses. I
was thinking we could perhaps introduce the concept of a "zone TTL",
possibly stealing another bit from the num_rrs field:

struct RdataSet {
    offset_ptr<RdataSet> next; // for linked list
    uint16_t type; // RR type
    uint8_t is_signed : 1; // If this set contains RRSIGs
    uint8_t custom_ttl : 1; // TTL different from zone default 
    uint8_t many_rrs : 1;  // If the number RRs >= 2^5 = 32
    uint8_t num_rrs  : 5; // # of RRs, up to 31
};
// Data that follows:
// [uint32_t // TTL, if custom_ttl == 1]
// [uint16_t // # of RRs, if many_rrs == 1]

Again, I don't think many RRSET have more than 31 entries. The downside
is that you can't tell the TTL just by looking at the RdataSet. But,
since the actual RR consist of pointers to a tree, you can't really
know this anyway.

The upside is that for large zones with mostly the same TTL, you save 4
bytes per RdataSet. The downsides are that you have to figure out what
that TTL is, and also for zones with multiple TTL you don't get any
benefit. (Possibly you could try to optimize for zones with a small
number of TTL, which is probably most zones, but that probably goes too
far on the complexity level.)

Optimizing this way will probably only save a few percentage of memory
footprint though, so perhaps not worth the effort?

Wildcards, DNAME
----
Ah, wildcards. I'm quite happy if zones using wildcards are less
efficient than "normal" zones. I don't think we need to kill ourselves
to optimize for a feature not used *so* commonly, and not at all for
performance-critical zones, AFAIK.

DNAME... well, okay. DNAME in principle can be awesomely useful, for
people who need it. :(

--
Shane