[bind10-dev] Profiling of b10-auth
JINMEI Tatuya / 神明達哉
jinmei at isc.org
Thu Jan 6 22:18:39 UTC 2011
At Thu, 6 Jan 2011 17:25:08 +0100,
Michal 'vorner' Vaner <michal.vaner at nic.cz> wrote:
> I learned a few things. First, the huge zone turned out to be usable
> only up to half a million RRs, since that already took 2GB of
> memory. That is way too much IMO and we should do something about it
> (with ~250 million records for the .com zone, it would approach a
> TB of RAM).
That's a predictable issue with the current implementation. It uses
memory very lavishly (to allow faster development of the workable
"feature-review" version). To name a few examples, each RBT node contains a
dns::Name object, and each RRset in the tree is represented as a
dns::RRset object. Both are very fat, and some information is
redundant (the RBT node essentially already holds the name of the
node, so, technically, the name object contained in each RRset is
redundant). We'll eventually need a memory-conscious
representation as tried in #404, but that's currently a lower-priority
task.
> MessageRenderer::writeName()
> Am I right in guessing that this one does the name compression?
I suspect so. To fully optimize performance, we'll eventually need
a customized renderer that is tightly coupled with a specific in-memory
data structure (like the one in my earlier experiment).
> Name::compare
> This one has 50% of the cache misses, and nearly all of them come from reading
> the „other“ argument. However, this one was a lot higher in the large zone
> case than in the small one. What I gather from that is that there are a lot of name
> compares called from the RBTree, with the „other“ argument being the names
> stored in the tree (therefore not cached).
>
> It also accounted for about 15% of the mispredicted branches of the whole program.
As noted above, we'll eventually have to revisit the representation of
node keys (currently a dns::Name object, which consumes a lot of
memory), and we will need customized comparison logic. At that
point we can also think about how to lay out the data for CPU
cache efficiency, like the approach you previously proposed.
---
JINMEI, Tatuya
Internet Systems Consortium, Inc.