[bind10-dev] Performance of Various Receptionist Designs

Tue Jul 13 13:56:40 UTC 2010

On 13 Jul 2010, at 12:50, Shane Kerr wrote:

> All,
> 
> On Mon, 2010-07-12 at 11:49 +0100, Stephen Morris wrote:
>> I've made the first set of measurements with both client and server
>> processes running on the same machine.  The results can be found in
>> the attachment to ticket 245 (http://bind10.isc.org/ticket/245);
>> comments on both methodology and results are invited.
> 
> Interesting!
> 
> Thanks for the code, Stephen.
> 
> I ran the benchmarks on my laptop here (eventually remembering to turn
> off CPU scaling), and the results were different from what you came up
> with. I didn't do fully rigorous tests, but they seem to be more-or-less
> consistent across multiple runs.
> 
> I used packets of 256 and a count of 65536.
> :
> :
> 
> So, what we end up with is something like:
> 
> server       @ about 13 usec/query
> receptionist @ about 35 usec/query
> intermediary @ about 41 usec/query

Well, your results are certainly more consistent than mine; it makes me wonder whether something was running in the background when I did the runs.

> 
> Now, this is on a dual-core machine. We might see different results on a
> quad-core machine; depending on the exact way data is flowing between
> processes it may be possible to keep 3 cores busy (client, receptionist,
> and worker for example). In this case the receptionist may look more
> like the server in terms of delay, although overall system CPU usage
> will be up.

Bear in mind that each of the processes is single-threaded and synchronous, and the arrangement is set up to have minimal overlap between them (i.e. the client sends all its packets to the receptionist, which then sends all of them to the worker, which then sends all of them back to the client).  So the advantage of multiple CPUs is small here; what you see is a measure of the overhead added by the extra processing.  Asynchronous operation would make use of the multiple CPUs, and the difference in query times might then not be so large.
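
As a rough illustration of the two extremes, here is a minimal sketch.  The per-stage costs are made-up numbers, not measurements from the ticket, and the function names are hypothetical; only the count of 65536 comes from Shane's run.  The overlapped case assumes each stage has a core to itself and ignores contention, so it is a best case rather than a prediction.

```python
# Hypothetical per-packet cost of each stage, in microseconds.
# These are illustrative values only, not measured figures.
STAGE_USEC = [10, 12, 13]   # e.g. client, receptionist, worker
COUNT = 65536               # packet count used in Shane's run


def elapsed_no_overlap(stage_usec, count):
    """Each stage handles every packet before the next stage starts."""
    return count * sum(stage_usec)


def elapsed_full_overlap(stage_usec, count):
    """Ideal pipeline: after the first packet has passed through every
    stage, one packet completes per slowest-stage interval."""
    return sum(stage_usec) + (count - 1) * max(stage_usec)


serial = elapsed_no_overlap(STAGE_USEC, COUNT)
overlapped = elapsed_full_overlap(STAGE_USEC, COUNT)
print(f"no overlap:   {serial / COUNT:.1f} usec/packet")
print(f"full overlap: {overlapped / COUNT:.1f} usec/packet")
```

With no overlap the per-packet time is the sum of the stage costs; with ideal overlap it tends towards the cost of the slowest stage alone.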

> 
> Now, in terms of absolute numbers this is the difference between 77k
> operations per second and 29k operations per second on my laptop. This
> is exactly the kind of performance difference that I was expecting, and
> what made me nervous about the receptionist model.
> 
> A different implementation probably would have different absolute
> numbers, but the Boost IPC thingy is a shared-memory based communication
> library that is likely to be reasonably efficient.
> 
> Let's discuss further, but I am leaning away from the receptionist model.
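
For reference, the conversion from the per-query times quoted above to operations per second is just the reciprocal.  A small sketch (the names are mine, not from the benchmark code):

```python
# Per-query times quoted above, in microseconds.
USEC_PER_QUERY = {"server": 13, "receptionist": 35, "intermediary": 41}


def queries_per_second(usec_per_query):
    """Convert a per-query time in microseconds to queries per second."""
    return 1_000_000 / usec_per_query


RATES = {name: queries_per_second(t) for name, t in USEC_PER_QUERY.items()}

for name, rate in RATES.items():
    print(f"{name:12s} {rate / 1000:5.1f}k queries/sec")
```

Rounded to the nearest thousand, this gives the 77k (server) and 29k (receptionist) figures mentioned above.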

I think that the receptionist-worker model is out, but I can see a use for the intermediary-contractor model in the case of a recursive resolver.  The intermediary receives the queries and holds the cache.  If it can answer the query from cache it does so; if not, it passes it to the contractor along with any information from the cache that might be useful.  The contractor makes the query/queries and returns the result to the intermediary, which updates the cache and returns the response to the caller.
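
The flow described above could be sketched roughly as follows.  This is only an outline under assumptions: the class and method names are hypothetical, the contractor is modelled as a plain callable rather than a separate process, and "useful information from the cache" is taken to mean cached entries for parent names, which is just one possible policy.

```python
class Intermediary:
    """Receives queries and holds the cache (hypothetical sketch)."""

    def __init__(self, contractor):
        # contractor: callable(name, hints) -> answer.  In a real system
        # this would be a separate process reached over IPC.
        self.contractor = contractor
        self.cache = {}

    def hints_for(self, name):
        """Cached entries for parent names of `name` (assumed policy)."""
        labels = name.split(".")
        parents = [".".join(labels[i:]) for i in range(1, len(labels))]
        return {p: self.cache[p] for p in parents if p in self.cache}

    def handle(self, name):
        # Cache hit: answer directly, no contractor involved.
        if name in self.cache:
            return self.cache[name]
        # Cache miss: hand the query and any useful hints to the
        # contractor, then store and return its result.
        answer = self.contractor(name, self.hints_for(name))
        self.cache[name] = answer
        return answer


def demo_contractor(name, hints):
    """Stand-in for the contractor; a real one would query
    authoritative servers."""
    return f"answer for {name} (given {len(hints)} hint(s))"


resolver = Intermediary(demo_contractor)
print(resolver.handle("example.com"))      # miss: goes to the contractor
print(resolver.handle("example.com"))      # hit: served from the cache
print(resolver.handle("www.example.com"))  # miss: parent entry sent as hint
```

The point of the split is that only the miss path crosses the process boundary.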

Although there is overhead associated with the intermediary model, that is only incurred when external queries are being made, in which case the response time should be dominated by the time taken to receive responses from authoritative servers. The benefit is that a complex program is broken down into two simpler parts.
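
To put a rough number on "dominated", here is a back-of-envelope sketch.  It assumes the difference between Shane's intermediary and server figures (41 and 13 usec) is a fair estimate of the extra cost per external query, which may not hold for a real resolver.  The 20 ms upstream time is a purely hypothetical value chosen for illustration, and the function name is mine.

```python
# Extra per-query cost of the intermediary model, taken as the difference
# between Shane's intermediary and server figures (an assumption).
IPC_OVERHEAD_USEC = 41 - 13

# Hypothetical time to get an answer from authoritative servers.
UPSTREAM_USEC = 20_000


def miss_overhead_fraction(upstream_usec, ipc_usec=IPC_OVERHEAD_USEC):
    """Share of a cache-miss response time spent on the extra hop."""
    return ipc_usec / (upstream_usec + ipc_usec)


fraction = miss_overhead_fraction(UPSTREAM_USEC)
print(f"overhead share on a cache miss: {fraction:.2%}")
```

Under these assumptions the extra hop is a small fraction of one percent of the response time on a miss, and cache hits do not pay it at all.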

Stephen