[bind10-dev] Profiling & performance suggestions
Michal 'vorner' Vaner
michal.vaner at nic.cz
Mon Jan 24 17:29:26 UTC 2011
On Mon, Jan 24, 2011 at 01:30:07PM +0100, Shane Kerr wrote:
> > I have no clue what the ZNK3isc… function is. I didn't find it in our code
> > (OutputBuffer does not seem to have a clone method). Does anyone have an idea
> > what it might be? We spend a quarter of the time in it. From the name I would
> > guess it copies some buffer, but I guess we don't need it, so it would be
> > nice to get rid of it completely.
> $ echo _ZNK3isc3dns12OutputBufferixEm | c++filt -n
> isc::dns::OutputBuffer::operator[](unsigned long) const
> There you go. :)
I was still missing the name of the function after the Buffer :-). OK, this
makes some sense.
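The same demangling can be done from inside a program. This is only a minimal sketch, assuming a GCC- or Clang-style toolchain that ships <cxxabi.h>; the helper name demangle is made up for illustration and is not part of BIND 10.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang specific header)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so the caller must release it with free.
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);
    return result;
}
```

Passing "_ZNK3isc3dns12OutputBufferixEm" to this helper gives the same operator[] signature that c++filt prints.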
In the meantime, I talked with a gcc developer, and I now know where the
.clone.87 comes from and why it wasn't demangled by itself: .clone functions
are copies of functions that are often called with the same argument, so the
argument is propagated into the copy as a constant.
But what is bothering me is that a wrapper function around array indexing is not
inlined and takes so long. It should be small. Maybe we should drop the
vector there, because it might be preventing it from being fast. And, actually,
returning a uint8_t should be faster than returning a reference to it.
That one seems easy enough probably, so I'd say that is a candidate for easy
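To make the idea concrete, here is a rough sketch of an indexing wrapper that is defined in the class body (so the compiler may inline it), works on a raw array instead of a vector, and returns the byte by value. SketchBuffer is a made-up stand-in, not the real OutputBuffer, and the bounds check is an assumption about what the wrapper should do.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a read-only byte buffer; not the BIND 10 class.
class SketchBuffer {
public:
    // Non-owning view: the caller keeps the bytes alive.
    SketchBuffer(const std::uint8_t* data, std::size_t size)
        : data_(data), size_(size) {}

    // Defined inside the class, hence implicitly inline. Returns the byte
    // by value rather than a reference to it.
    std::uint8_t operator[](std::size_t pos) const {
        if (pos >= size_) {
            throw std::out_of_range("SketchBuffer: index out of range");
        }
        return data_[pos];
    }

    std::size_t size() const { return size_; }

private:
    const std::uint8_t* data_;
    std::size_t size_;
};
```

Whether this is actually faster would have to be confirmed by profiling again; the sketch only shows the shape of the change.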
> #include <new>
> Timer * ptimer = new (raw_mem) Timer; // #1
> That (raw_mem) is the pre-allocated memory.
Yes, but then we can't just call new Name. We need to call new (memory)
Name(memory) (because it needs to place its guts there too), which might be
worth a wrapper function or something. I wanted something that would work from
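A wrapper of the kind mentioned above might look roughly like this. It is only a sketch: Arena, create and this toy Name are hypothetical names, not BIND 10 code, and the arena assumes the block handed to it is already aligned for any type.

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <utility>

// Hypothetical bump allocator over a caller-supplied block of memory.
// Assumes the block itself is aligned for std::max_align_t.
class Arena {
public:
    Arena(void* block, std::size_t size)
        : base_(static_cast<unsigned char*>(block)), size_(size), used_(0) {}

    void* allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t)) {
        // Round the current offset up to the requested alignment.
        std::size_t start = (used_ + align - 1) / align * align;
        if (start > size_ || bytes > size_ - start) {
            throw std::bad_alloc();
        }
        used_ = start + bytes;
        return base_ + start;
    }

    std::size_t used() const { return used_; }

private:
    unsigned char* base_;
    std::size_t size_;
    std::size_t used_;
};

// Toy Name: the object and its "guts" (the text) both live in the arena.
class Name {
public:
    Name(Arena& arena, const char* text) : length_(std::strlen(text)) {
        char* copy = static_cast<char*>(arena.allocate(length_ + 1, 1));
        std::memcpy(copy, text, length_ + 1);
        text_ = copy;
    }
    const char* c_str() const { return text_; }
    std::size_t length() const { return length_; }

private:
    const char* text_;
    std::size_t length_;
};

// The wrapper: hides "new (memory) T(memory, ...)" behind one call.
template <typename T, typename... Args>
T* create(Arena& arena, Args&&... args) {
    void* memory = arena.allocate(sizeof(T), alignof(T));
    return new (memory) T(arena, std::forward<Args>(args)...);
}
```

With this, create<Name>(arena, "example.org") replaces the two-step placement-new call at each call site.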
> Luckily trees grow in depth with the log of their size, so larger trees
> should result in small performance hits.
Yes, well, they do, but I put only half a million domains there. In practice
we expect about 1.5 times the current size.
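As a rough check of the logarithmic-growth argument, the following sketch computes the textbook upper bound on red-black tree height, 2 * log2(n + 1), and the number of extra levels a perfectly balanced binary tree gains when its size grows by some factor. The function names are made up for illustration, and the numbers say nothing about cache behaviour.

```cpp
#include <cmath>

// Textbook upper bound on the height of a red-black tree with n nodes.
double rb_height_bound(double n) {
    return 2.0 * std::log2(n + 1.0);
}

// Extra levels a perfectly balanced binary tree gains when its node
// count is multiplied by `factor`.
double extra_balanced_levels(double factor) {
    return std::log2(factor);
}
```

Growing by a factor of 1.5 adds log2(1.5), a bit under 0.6 of a level, to a balanced tree, and a bit under 1.2 to the red-black worst-case bound.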
> Anything is possible with C++. :) We could use a custom allocator that
> works in the manner you describe - allocating at the beginning of a
> query and then dumping the entire block at the end. It is also possible
> to explicitly invoke destructors if you use this technique. (That link
> above includes an example.)
Well, calling the destructors is something I explicitly do _not_ want to do.
Most things only return memory in destructors. I want to avoid the overhead for
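The allocate-per-query, drop-everything-at-the-end scheme can be sketched with std::pmr::monotonic_buffer_resource. Note the assumptions: this is a C++17 facility, QueryScratch and Label are hypothetical names, and skipping destructors is only safe here because Label is trivially destructible and owns nothing outside the pool.

```cpp
#include <cstddef>
#include <memory_resource>
#include <new>

// Trivially destructible, so never running ~Label() loses nothing.
struct Label {
    const char* text;
    std::size_t length;
};

// Hypothetical per-query scratch space backed by a preallocated block.
class QueryScratch {
public:
    // null_memory_resource() as upstream means the pool throws
    // std::bad_alloc instead of falling back to the heap when full.
    QueryScratch(void* block, std::size_t size)
        : pool_(block, size, std::pmr::null_memory_resource()) {}

    Label* make_label(const char* text, std::size_t length) {
        void* memory = pool_.allocate(sizeof(Label), alignof(Label));
        return new (memory) Label{text, length};
    }

    // End of query: forget every allocation at once, without calling
    // any destructors. The pool starts again from the original block.
    void finish_query() { pool_.release(); }

private:
    std::pmr::monotonic_buffer_resource pool_;
};
```

Objects that hold resources outside the pool (files, sockets, heap memory) would still need their destructors, so this only fits types designed for it.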
> > • Represent the trick with name compression (might reuse precomputed hashes as
> > well, possibly.)
> > • Do some interlinking inside the tree.
> You mean to avoid extra traversals?
Yes, so that when we find the zone, it would have a pointer to the node with
NS and SOA, the node itself would have hints for the IP addresses of the NS,
and MX would have pointers to the IP addresses…
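One possible shape for such interlinking, purely as an illustration: each node carries direct pointers to related nodes, so building the additional section follows pointers instead of searching the tree again. TreeNode, its fields and additional_addresses are all hypothetical; the real tree nodes look different.

```cpp
#include <string>
#include <vector>

// Hypothetical tree node with shortcut links to related nodes.
struct TreeNode {
    std::string name;
    std::vector<std::string> addresses;          // address data held at this node
    const TreeNode* soa_node = nullptr;          // shortcut to the node with SOA
    const TreeNode* ns_node = nullptr;           // shortcut to the node with NS
    std::vector<const TreeNode*> address_hints;  // nodes holding addresses of NS/MX targets
};

// Gather the addresses of all hinted targets without another tree lookup.
std::vector<std::string> additional_addresses(const TreeNode& node) {
    std::vector<std::string> out;
    for (const TreeNode* target : node.address_hints) {
        out.insert(out.end(), target->addresses.begin(), target->addresses.end());
    }
    return out;
}
```

The cost is that these pointers must be kept consistent whenever the tree changes, which is part of why this is listed as a task to estimate.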
> > • [Find a better data structure than red-black tree.]
> Not unreasonable although perhaps risky and time consuming?
That's why it's in the square brackets.
> > Do we want to explore multiprocessor utilization right now, or do we leave
> > it out until the release?
> I'd prefer to explore the techniques in improving the single-processor
> performance for now, unless you think there are multiprocessor
I don't think there are, except that memory will be slightly more significant
than the number of instructions and cache misses, because more processors
access it.
I was only afraid that we need to discuss a lot (and I mean really a lot) around
> I hope we have some time on Wednesday's call to discuss this issue. If
> not, perhaps it would make sense to set up a separate meeting for this
> topic (either in voice or chat, I'm fine with anything)?
It should be fine with me. I just wanted some people to go through the tasks,
ACK them, maybe estimate them, and put the most important/promising of them
into the sprint.
Have a nice day
echo '*' > 'rm -rf ~'; . *
Michal 'vorner' Vaner