[bind10-dev] Profiling of b10-auth
Michal 'vorner' Vaner
michal.vaner at nic.cz
Thu Jan 6 16:25:08 UTC 2011
I have been curious about this for some time, and I had only a little time to
work today, so starting a new task didn't make much sense. So I profiled b10-auth.
I used oprofile (a profiler that uses HW counters/events to gather samples
statistically ‒ a list of events is specified, and each time an event has
happened a given number of times, the culprit instruction is reported and noted
down ‒ so events that happen often have a high probability of being reported
many times) and asked it to provide statistics for all instruction fetches,
all instruction executes, cache misses, mispredicted branches and locks.
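
The sampling idea can be illustrated with a toy simulation. This is only a sketch of the principle, not how oprofile is implemented: the function names, the 90/10 split and the period of 100 are made-up values for illustration.

```python
import random
from collections import Counter


def sample_events(event_stream, period):
    """Record the culprit of every period-th event, like a counter overflow."""
    samples = Counter()
    countdown = period
    for culprit in event_stream:
        countdown -= 1
        if countdown == 0:
            samples[culprit] += 1   # this culprit gets "reported"
            countdown = period      # re-arm the counter
    return samples


rng = random.Random(2011)  # fixed seed so runs are repeatable
# Hypothetical workload: about 90% of the events come from "hot_function".
stream = rng.choices(["hot_function", "cold_function"], weights=[90, 10], k=100_000)
samples = sample_events(stream, period=100)
print(samples)
```

Only one event in a hundred is recorded, yet the share of samples per function should stay close to its share of the events, which is why the reports are useful even though they are statistical.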
I ran three tests. One was queries against a small zone (under 10 records), one
was loading a huge zone into b10-auth and one was queries against the huge zone.
All of them used the in-memory data source. I haven't examined the load test yet.
I learned a few things. First, the huge zone turned out to be usable only with
half a million RRs, since that already took 2GB of memory. That is way too much
IMO and we should do something about it (with ~250 million records for the .com
zone, it would come near a TB of RAM).
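
The extrapolation is simple arithmetic. A quick check, assuming that "2GB" means 2 GiB and that memory use scales linearly with the number of RRs (both are assumptions; real zones will differ):

```python
GIB = 1024 ** 3

measured_bytes = 2 * GIB      # memory taken by the test zone
measured_rrs = 500_000        # half a million RRs
com_rrs = 250_000_000         # rough size of the .com zone

bytes_per_rr = measured_bytes / measured_rrs
projected_bytes = bytes_per_rr * com_rrs

print(f"{bytes_per_rr:.0f} bytes per RR")
print(f"{projected_bytes / GIB:.0f} GiB for the .com zone")
```

That works out to roughly 4.3 kB per RR and about 1000 GiB in total, which is where the "near a TB" figure comes from.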
Second, I let oprofile create some reports for me, you can have a look at them
I found a few „hot“ functions (by number of instructions; that doesn't
necessarily mean they are slow, though. It depends on what the instructions do,
and the difference in how long an instruction takes might be around 100×):
This one had quite a few mispredicted branches.
Do I guess right that this one does the name compression?
This one is somewhat odd. It has a lot of instructions, but nearly no cache
misses or mispredicts, and it seems to be nicely inlinable.
This one has 50% of the cache misses, and nearly all of them come from reading
the „other“ argument. However, it ranked a lot higher in the large zone
case than in the small one. What I gather from that is that there are a lot of
name compares called from the RBTree, with the „other“ argument being the names
stored in the tree (therefore not cached).
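
To see why the large zone makes this worse, here is a rough sketch that counts name comparisons per lookup. It is not the b10-auth code: a binary search over a sorted list stands in for the balanced tree, plain string ordering stands in for DNS name ordering, and the host names are made up.

```python
import math


def lookup(sorted_names, wanted):
    """Binary search; returns (found, number of three-way name comparisons)."""
    lo, hi = 0, len(sorted_names)
    compares = 0
    while lo < hi:
        mid = (lo + hi) // 2
        stored = sorted_names[mid]  # a different stored name at every step
        compares += 1               # counted as one compare(wanted, stored)
        if stored == wanted:
            return True, compares
        if stored < wanted:
            lo = mid + 1
        else:
            hi = mid
    return False, compares


# Hypothetical zone of half a million names; zero padding keeps them sorted.
names = [f"host{i:06d}.example." for i in range(500_000)]
found, compares = lookup(names, "host499999.example.")
missing, compares_missing = lookup(names, "nothere.example.")
bound = math.ceil(math.log2(len(names) + 1))
print(found, compares, missing, compares_missing, bound)
```

Each lookup needs on the order of log2(n) comparisons, and every one of them reads a different stored name from a different place in memory, so with a big tree most of those reads are likely to miss the cache.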
About 15% of the mispredicted branches of the whole program were there.
Quite few cache misses.
So, there are two major conclusions from that. One is that we need to identify
what's eating so much memory and do something about it (I have an idea, but I
won't put it in this email). The other is that Name::compare is probably the
current biggest bottleneck (while it doesn't have the biggest number of
instructions executed, it has the biggest number of cache misses, so it waits
for memory a lot, and it has many mispredicts), so we either need to speed it up
or think of a way not to call it so often.
I will think about some ideas and, when I have cleared my mind, I'll write
another mail about what to do with it.
Have a nice day
This message is encrypted by double rot-13
Michal 'vorner' Vaner