[bind10-dev] Performance of Various Receptionist Designs

Stephen Morris stephen at isc.org
Thu Aug 5 10:05:09 UTC 2010


On 5 Aug 2010, at 00:51, David W. Hankins wrote:

> I think in the 'Contractor' you have modeled, that the 'Intermediary'
> does not perform a significant amount of work before simply
> transmitting the reply?  So it is really not all that surprising that
> its only effect is to slightly increase RTT; it is just an extra step.

You are right: the intermediary just receives the reply and returns it to the client.


> But I am curious how much the advantage/disadvantage would change if
> both the 'Intermediary' and 'Receptionist' (just for RX packets) were
> to perform some packet decoding/encoding ("DECCO?") service.  For
> example, the 'Receptionist' could discard received packets with a
> bogus encoding, and otherwise pass on the decoded markup.
> 
> I'm not sure if there's a way to model that, but I'm curious.

I could certainly add some processing to the receptionist and intermediary processes, although for a fair comparison the same processing would have to be added to the server process as well.
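As an illustration of the kind of check being suggested, the receptionist could reject queries whose DNS header is obviously malformed before dispatching them. A rough sketch (the header layout follows RFC 1035; the function name and exact policy are made up here):

```python
import struct

def plausible_query(packet):
    """Cheap sanity check a receptionist might apply before dispatching.

    Rejects packets too short to hold a DNS header, packets marked as
    responses (QR bit set), and packets carrying no question.
    """
    if len(packet) < 12:          # DNS header is 12 octets
        return False
    _ident, flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", packet[:12])
    if flags & 0x8000:            # QR bit set: a response, not a query
        return False
    if qdcount == 0:              # a query should ask at least one question
        return False
    return True
```

Adding a check like this to receptionist, intermediary, and server alike would keep the comparison fair while modelling a realistic per-packet decode cost.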


> I'm also very happy to hear that there doesn't seem to be a
> significant penalty in the receptionist model, because I think we
> are going to need/want a "packet sprinkler system" to direct stateful
> operations to specific nodes.

Although the overhead might be small on a per-query basis, the one penalty we will incur is a reduction in the maximum query rate that any one box can handle.  Do we have any requirements for the performance of BIND10?

If we go to a packet-sprinkler system, I think I should try testing the performance with a larger number of processes, each using a realistic amount of virtual memory.  I've been looking at lmbench(8), a benchmarking suite, in particular lat_ctx(8), which measures context-switching times.  (Additional documentation can be found at http://www.bitmover.com/lmbench.)  I'm a bit hesitant to place too much weight on this because, when I've played about with it, the numbers have been all over the place; however, it does seem to indicate that on my dual-core system the cost of a context switch rises sharply when there are three or more processes, and that the size of the process is significant.

Perhaps I could extend the receptionist test to multiple workers and, like the lat_ctx test, have each worker process allocate an amount of virtual memory and scan through it each time it becomes active (to simulate the effect of exercising the caches).
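The memory-scan idea could look something like this (a sketch only; the buffer size and stride are arbitrary choices, not taken from lat_ctx):

```python
def touch_memory(buf, stride=64):
    """Walk a pre-allocated buffer at a cache-line-ish stride.

    Run by a worker each time it becomes active, this pulls the
    process's working set back through the caches, mimicking what
    lat_ctx does with its per-process arrays.
    """
    total = 0
    for i in range(0, len(buf), stride):
        total += buf[i]
    return total

# Each worker would allocate its buffer once at start-up...
working_set = bytearray(4 * 1024 * 1024)   # 4 MB, arbitrary
# ...and call touch_memory(working_set) on every activation.
```

Varying the buffer size across runs would then show how much of the context-switch cost is really cache pollution rather than scheduler overhead.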

Stephen