[bind10-dev] Receptionist experiment

Michal 'vorner' Vaner michal.vaner at nic.cz
Fri Mar 1 07:48:02 UTC 2013


Hello

On Thu, Feb 28, 2013 at 10:24:02AM -0800, JINMEI Tatuya / 神明達哉 wrote:
> - I'd like to keep each auth and resolver workable without
>   receptionist; ignoring the counter-intuitive benchmark results, the
>   additional overhead regarding the receptionist inevitably makes the
>   entire throughput lower, and the introduction of the additional
>   process and IPC makes the system less stable.  Since the hybrid mode
>   of auth and resolver is generally a discouraged operation, I want to
>   ensure others can run without it.

I didn't make any deep changes in auth; I modified one file, the TCP server in
asiodns. I guess we could just write another „receptionist server“ class and
enable or disable it in addition to the normal listening ports, without many
changes. There probably isn't much need to throw the current implementations
away.

> - I wonder how it would work for xfrout and DDNS.  auth cannot forward
>   the FD to them any more, so this should be done by the receptionist.
>   This means the receptionist needs to validate/inspect the incoming
>   messages more intensively and also needs to have the ability of
>   forwarding the FD.

I don't think it needs to validate them, but it needs to inspect a little more
than a single bit of them. AFAIK, the messages differ by their opcode, which
can be masked out in a similarly lightweight fashion as the RD bit.
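
As a rough illustration of how cheap such a check can be, here is a minimal
sketch that reads only the opcode and the RD bit from the fixed DNS header
(layout per RFC 1035). The function names and the backend labels "ddns",
"resolver" and "auth" are hypothetical and not taken from the experiment's
code; the routing policy is only one possible reading of this thread.

```python
import struct

# Bit layout of the 16-bit flags word in the DNS header (RFC 1035, 4.1.1).
OPCODE_SHIFT = 11
OPCODE_MASK = 0xF
RD_MASK = 0x0100

# Standard opcode values (RFC 1035 for QUERY, RFC 2136 for UPDATE).
OPCODE_QUERY = 0
OPCODE_UPDATE = 5


def peek_header(wire):
    """Return (opcode, rd) using only the first 4 bytes of a DNS message."""
    if len(wire) < 4:
        raise ValueError("too short to contain the DNS flags word")
    _msg_id, flags = struct.unpack_from("!HH", wire, 0)
    opcode = (flags >> OPCODE_SHIFT) & OPCODE_MASK
    rd = bool(flags & RD_MASK)
    return opcode, rd


def pick_backend(wire):
    """Hypothetical routing policy; the backend labels are illustrative."""
    opcode, rd = peek_header(wire)
    if opcode == OPCODE_UPDATE:
        return "ddns"
    if opcode == OPCODE_QUERY and rd:
        return "resolver"
    return "auth"
```

Nothing past the fourth byte is parsed, so the receptionist would not need to
understand names, sections or record types to make this decision.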

Further, DDNS doesn't need any kind of fd forwarding, so it could act much
like auth or resolver in that regard. I'd like to unify the protocol used
between auth and DDNS with the one used between the receptionist and whatever
server, so we wouldn't have so many implementations.

The xfrout is slightly more tricky. I see two approaches. One is that we really
teach the receptionist to forward file descriptors.

The other possible approach: I don't think the protocol between the
receptionist and the server must be query-answer oriented. The receptionist
just sends messages with some additional information to the server, and the
server sends messages to the receptionist to be delivered to clients. If the
TCP connection isn't closed too soon (e.g. we keep it open in the receptionist
for some time), xfrout can simply produce several messages to go to that
connection.
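
To make the "several messages to one connection" idea concrete, here is a
minimal sketch of how a receptionist could write a batch of server-produced
messages onto a client's TCP stream. It relies only on the standard DNS-over-TCP
framing (each message prefixed by a 2-byte length, RFC 1035 section 4.2.2). The
function names are hypothetical, and how the messages travel between server and
receptionist is left open.

```python
import struct


def frame_for_tcp(messages):
    """Concatenate DNS messages, each prefixed with its 2-byte length."""
    out = bytearray()
    for msg in messages:
        if len(msg) > 0xFFFF:
            raise ValueError("DNS message too large for TCP framing")
        out += struct.pack("!H", len(msg)) + msg
    return bytes(out)


def split_tcp_stream(data):
    """Inverse operation: return (complete messages, leftover bytes)."""
    messages = []
    pos = 0
    while len(data) - pos >= 2:
        (length,) = struct.unpack_from("!H", data, pos)
        if len(data) - pos - 2 < length:
            break  # the last message has not fully arrived yet
        messages.append(data[pos + 2:pos + 2 + length])
        pos += 2 + length
    return messages, data[pos:]
```

With this, one incoming request can be followed by any number of outgoing
messages on the same connection, as long as the receptionist keeps it open.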

> - What should happen if the query has the RD bit but matches an
>   authoritative zone?

In my understanding, it must go to the resolver anyway. I have a zone vorner.cz.
Now, let's say I create a delegation to sub.vorner.cz. A query for
x.y.z.sub.vorner.cz comes. Passing it to the auth server is wrong, because it
would just answer „look at that server over there“. The client wants it to be
resolved completely.

On the other hand, if it is sent to the resolver, the resolver will just produce
an upstream query for vorner.cz during the resolution and send it to the
address of the authoritative nameserver for it (which is, by coincidence, the
same one where the receptionist is sitting). That query will not have the RD
bit set, so it ends up at the auth server.

> - I've not looked into the code, but how does the receptionist
>   determine the client addresses to return the answer?  Does it keep a
>   state?

Currently, it assumes a single client and remembers a single address. This is,
of course, unusable for a real receptionist.

I believe I need to send the source and destination port and address to the
server too, so the server can use them for logging and ACLs. The server would
then just send the addresses back together with the response, and the
receptionist wouldn't need to keep any state.
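
A minimal sketch of such a stateless envelope follows, assuming a made-up
layout: one byte for the IP version, then source address and port, destination
address and port, and finally the DNS payload. The layout and the names wrap
and unwrap are hypothetical, not the format used in the experiment.

```python
import ipaddress
import struct

_ADDR_TYPES = {4: ipaddress.IPv4Address, 6: ipaddress.IPv6Address}


def wrap(src, dst, payload):
    """Prefix payload with addressing info; src and dst are (ip, port)."""
    src_ip = ipaddress.ip_address(src[0])
    dst_ip = ipaddress.ip_address(dst[0])
    if src_ip.version != dst_ip.version:
        raise ValueError("source and destination address families differ")
    return (struct.pack("!B", src_ip.version)
            + src_ip.packed + struct.pack("!H", src[1])
            + dst_ip.packed + struct.pack("!H", dst[1])
            + payload)


def unwrap(data):
    """Return ((src_ip, src_port), (dst_ip, dst_port), payload)."""
    if not data or data[0] not in _ADDR_TYPES:
        raise ValueError("unknown address family in envelope")
    addr_type = _ADDR_TYPES[data[0]]
    alen = 4 if data[0] == 4 else 16
    if len(data) < 1 + 2 * (alen + 2):
        raise ValueError("truncated envelope")
    pos = 1
    endpoints = []
    for _ in range(2):
        ip = addr_type(data[pos:pos + alen])
        (port,) = struct.unpack_from("!H", data, pos + alen)
        endpoints.append((str(ip), port))
        pos += alen + 2
    return endpoints[0], endpoints[1], data[pos:]
```

The server would call unwrap on each incoming block, use the endpoints for
logging and ACLs, and call wrap again with the same endpoints around the
response, so the receptionist can send the answer without remembering anything.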

> - Some of the above seem to suggest the receptionist would have to
>   share a non negligible amount of features of auth, so I wonder we
>   might want to extract some part of request handling of auth into an
>   internal library and share it between them, or even provide the
>   receptionist feature as a hook for auth.

I'd like it to be just a very thin and simple wrapper around network sockets,
not really understanding DNS. I believe we can do that and I'd hate to have to
put some auth logic into it.

> - I don't know what the double-m version of recvmmsg and sendmmsg do,
>   but if the goal is to exchange multiple memory regions between
>   processes at once, normal recvmsg and sendmsg seem to suffice.

I use ordinary send/recv for communication with auth; I just send/receive a
block of data. The recvmmsg and sendmmsg are used for talking to the clients.
The sendmmsg allows me to send multiple UDP messages (over the same socket, but
with possibly different recipient addresses) in a single call. Similarly,
recvmmsg can be used to receive multiple UDP messages at once.

Obviously, this can be done with multiple calls to sendmsg and recvmsg. I just
wanted to try them out when I had the opportunity, because they looked fancy
O:-).

> - We might want to be more careful about measuring the deviation of RTTs.

What do you mean by that?

> - as for the benchmark, I'd make sure the auth server is really busy
>   utilizing near-100% CPU time, while queryperf runs with a safety
>   margin.  Since queryperf can only use a single core, sometimes that
>   can be the bottleneck, not the measurement target.  (Depending on
>   the actual machine power) that's often the case at a very high query
>   rate like over 100Kqps.  Also, in that sense, I'd rather test a
>   single instance of b10-auth; in this context I don't think the total
>   performance using multi-core is not of the main concern because
>   contention overhead is less likely.

I'll retest the single-auth today (the results were similar, I just didn't save
them). I thought the multi-auth version is more telling, because the
receptionist takes some CPU too, possibly slowing down the auths, so I wanted to
account for that.

Anyway, all 4 of my cores were 100% busy during the measurement. Yes, part of
that was taken by queryperf, but if queryperf weren't keeping up, the auths
would be idle and some cores wouldn't be fully used. To make sure queryperf
keeps up, I ran the auths and the receptionist with lower priority. queryperf
stayed at around 60% of CPU, so I believe there was a reserve.

With regards

-- 
No one is to look like a sock, understand?
			-- Archchancellor Ridcully

Michal 'vorner' Vaner