[bind10-dev] Receptionist experiment
Michal 'vorner' Vaner
michal.vaner at nic.cz
Mon Mar 4 08:28:12 UTC 2013
Hello
On Fri, Mar 01, 2013 at 09:25:30AM -0800, JINMEI Tatuya / 神明達哉 wrote:
> At Fri, 1 Mar 2013 08:48:02 +0100,
> Michal 'vorner' Vaner <michal.vaner at nic.cz> wrote:
>
> > > - I'd like to keep each auth and resolver workable without
> > > receptionist; ignoring the counter-intuitive benchmark results, the
> > > additional overhead regarding the receptionist inevitably makes the
> > > entire throughput lower, and the introduction of the additional
> > > process and IPC makes the system less stable. Since the hybrid mode
> > > of auth and resolver is generally a discouraged operation, I want to
> > > ensure others can run without it.
> >
> > I didn't do any deep changes in auth, I modified one file - the TCP server in
> > asiodns. I guess we could just write another „receptionist server“ class, and
> > enable/disable it in addition to normal listening ports, without many changes.
>
> So auth (or resolver) can still work without the receptionist?
Yes. My idea is to have 3 kinds of asiodns server:
• UDP server (both sync and async version)
• TCP server
• Receptionist receiver (both sync and async)
> > Further, DDNS doesn't need any kind of fd forwarding,
>
> Why not? We in fact use FD forwarding for it currently. Also
> remember DDNS can use either UDP or TCP, so at least if you think
> xfrout is tricky, the DDNS case should be equally tricky.
Well, we need to solve TCP anyway, because even auth and resolver need to
handle queries on TCP. And if we can accept a query over TCP on the receptionist,
forward it to auth, and send the answer back over TCP, then the same can work
with DDNS.
The only difference with XfrOut is that for one query, it sends back multiple
messages (so it's not a simple 1 query, 1 answer). But I expect it could work
like this:
• On UDP, we have an (unconnected) socket. When a query arrives, it is bundled
with the client's address and sent over to whichever server handles it.
• When the handling is done, the handler sends the answer back to the
receptionist, bundled with the address.
• The receptionist uses the address to send the answer back.
• On TCP, a connection is made. The receptionist assigns some unique ID to it
(let's say a timestamp+FD+sequence number). The address and this ID are bundled
with the query when sending it to the handler.
• When the answer is sent from the handler, the ID is used to find the correct
FD and check it is still the same connection (so that if we closed it and then
accepted a new connection on the same FD, we don't send the answer to the wrong
location).
• The TCP connection is closed if the client closes it, or if it is inactive for
some 30s (for some value of 30).
XfrOut could work with this as well; it would just send multiple packets
with the same connection ID. Now, there are some issues, like what to do when
the client doesn't read fast enough, so that we either need to queue the answers
or stop reading from that given handler (which is probably a problem only with
XfrOut), but nothing that would make it not work.
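As a rough sketch of the connection ID bookkeeping described above (this is not
BIND 10 code; the class name ConnectionTable, its methods, and the injectable
clock are all made up for illustration), the receptionist side could look
something like this:

```python
import itertools
import time


class ConnectionTable:
    """Hypothetical table mapping connection IDs to live TCP FDs."""

    def __init__(self, idle_timeout=30.0, clock=time.monotonic):
        self._by_fd = {}                 # fd -> (conn_id, last_activity)
        self._seq = itertools.count()    # sequence number part of the ID
        self._idle_timeout = idle_timeout
        self._clock = clock              # injectable so it can be tested

    def register(self, fd):
        """Called on accept(); returns the ID to bundle with each query."""
        now = self._clock()
        conn_id = (now, fd, next(self._seq))   # timestamp + FD + sequence
        self._by_fd[fd] = (conn_id, now)
        return conn_id

    def lookup(self, conn_id):
        """Return the FD if conn_id still names the live connection.

        Returns None if the connection was closed, even when the same FD
        number has since been reused by a newer connection.
        """
        fd = conn_id[1]
        entry = self._by_fd.get(fd)
        if entry is None or entry[0] != conn_id:
            return None
        self._by_fd[fd] = (conn_id, self._clock())  # count as activity
        return fd

    def close(self, fd):
        """Forget a connection the client closed."""
        self._by_fd.pop(fd, None)

    def expire_idle(self):
        """Drop connections idle for idle_timeout or longer; return FDs."""
        now = self._clock()
        stale = [fd for fd, (_, last) in self._by_fd.items()
                 if now - last >= self._idle_timeout]
        for fd in stale:
            del self._by_fd[fd]
        return stale
```

An answer from XfrOut would simply call lookup() once per message with the same
ID. The sketch leaves out the actual sockets and the queueing for slow readers.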
> Also, if the receptionist doesn't do TSIG validation, it's changing
> the validation assumption: we currently assume auth ensures the opcode
> is valid in terms of TSIG when it's TSIG-signed. The receptionist
> would break that assumption. That may not necessarily be incorrect as
> long as the end recipient validates it, but we should recognize we are
> going to change it, and consider its implication.
Yes, well, I would not like the receptionist to actually parse the packet at
all. I'd like it just to look at specific bytes (the opcode, RD bit, …) so it is
fast.
Therefore all this validation would be done in the handling server.
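To illustrate what "looking at specific bytes" could mean: in the DNS header
(RFC 1035), the third byte carries QR, the 4-bit opcode, AA, TC and RD. A
minimal sketch follows; the routing rule and the handler names ("ddns",
"resolver", "auth", "drop") are assumptions for illustration only, not a
decided design:

```python
OPCODE_QUERY = 0    # standard query (RFC 1035)
OPCODE_UPDATE = 5   # dynamic update (RFC 2136)

DNS_HEADER_LEN = 12


def peek_header(packet):
    """Return (opcode, rd) from a raw DNS message without parsing it.

    Returns None if the packet is too short to hold a DNS header.
    """
    if len(packet) < DNS_HEADER_LEN:
        return None
    flags_hi = packet[2]              # QR | Opcode(4) | AA | TC | RD
    opcode = (flags_hi >> 3) & 0x0F
    rd = bool(flags_hi & 0x01)
    return opcode, rd


def choose_handler(packet):
    """Hypothetical routing decision based only on opcode and RD."""
    peeked = peek_header(packet)
    if peeked is None:
        return "drop"
    opcode, rd = peeked
    if opcode == OPCODE_UPDATE:
        return "ddns"
    if opcode == OPCODE_QUERY and rd:
        return "resolver"
    return "auth"
```

Telling an AXFR/IXFR request apart from a normal query needs the query type,
which lives in the question section, so it is not covered by this sketch.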
We could still do it in a very similar way to what we do now ‒ send the DDNS
query to auth and forward it from there (and we may start by doing that). But I
would think that would be a pity, because that wouldn't allow us to run DDNS &
XfrOut without the auth server, which I consider to be kind of a design bug of
the current version.
> Things like these seem to suggest the receptionist won't be able to be
> as super simple as we might hope. I'm not necessarily oppose to the
> approach yet, but if the "simplicity" is the reason you prefer it, we
> should be careful - the actual production-ready version often cannot
> be really simple because of many such details.
Actually, I believe the only way to make it flexible enough (if we want to use
it for other things like views as well, and possibly for future modules for
future DNS extensions) is for it to be just a very thin wrapper around recv
and send, with all the work being done in the handlers (including validation).
I believe this can be done and that we need to solve only communication-related
problems in the receptionist (what to do when the client on TCP sends queries
but doesn't read, various timeouts, etc.), but I'd like it to be completely
DNS agnostic. Actually, I believe it could be made to handle other protocols as
well without much change.
> The worst case RTT can be much larger with the receptionist due to the
> additional buffering. And there can be many more queries that have
> larger RTTs.
Ah, yes. This is mostly a problem of the current experimental version. What
I assumed was that we send the buffer to auth/whatever whenever either of these
happens:
• There are at least 100 queries.
• The oldest query in the buffer is 2ms old (or whatever value we tune it for).
We add at most the 2ms (or 1ms, or half, whatever is needed) to the RTT, which
shouldn't be a problem. When the server is busy, we send the queries sooner,
because the buffers are full, and we get a throughput improvement. If the server
is not busy, we just send after the 2ms, even if the buffer is not full, which
has worse throughput, but that doesn't matter, because we don't need it at the
moment.
We could also make it switch buffering off when there are only a few queries
(they would not fill the buffers, so buffering would only add to the RTT).
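The flush rule above (at least 100 queries, or the oldest query being 2ms old)
can be sketched like this. The class QueryBatcher, the send callback and the
poll() method are hypothetical; a real version would hang poll() on a timer in
the event loop:

```python
import time


class QueryBatcher:
    """Hypothetical buffer that flushes on size or on age of oldest query."""

    def __init__(self, send, max_queries=100, max_age=0.002,
                 clock=time.monotonic):
        self._send = send                # callable taking a list of queries
        self._max_queries = max_queries
        self._max_age = max_age          # seconds; 0.002 is the 2ms above
        self._clock = clock
        self._pending = []
        self._oldest = None              # arrival time of the oldest query

    def add(self, query):
        """Buffer a query; flush at once if the buffer is full."""
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(query)
        if len(self._pending) >= self._max_queries:
            self.flush()

    def poll(self):
        """Flush if the oldest buffered query has waited max_age or more."""
        if self._pending and self._clock() - self._oldest >= self._max_age:
            self.flush()

    def flush(self):
        """Send everything buffered so far as one batch."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._send(batch)
```

With max_queries=1 every query is sent immediately, which is one simple way to
get the "buffering off" behaviour for a lightly loaded server.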
With regards
--
Next sleep is scheduled after 1k lines of code
Michal 'vorner' Vaner