[bind10-dev] Resolver skeleton, location of logic, and plans/discussionstarter for that

Jelte Jansen jelte at isc.org
Thu Jan 20 17:02:08 UTC 2011

Hash: SHA1

(warning, bit of a long mail, sorry)

Ok, so in branch experiments/resolver I have tidied up my little experiment from
last week a bit, and since people have started working on related code, we might
want to get some progress on this. I'm not sure if we should review, merge, then
refactor, or the other way around.

I had a brief discussion with Stephen yesterday, and we seem to disagree on
where this code should actually be (though I think we both agree on the 'not
where it is now' part, as well as on that the current general layout is
somewhat... misfortunate)

Let me first describe how I read what it does, and why the resolver logic is in
the location it is now. That might provide a good starting point for either a
discussion or a refactor (or probably both).

The resolver binary (main.cc) sets up a Resolver object (from resolver.cc),
which in turn creates DNSServers (tcpdns.cc and udpdns.cc) and a DNSService
(asiolink.cc), as well as a RecursiveQuery object (asiolink.cc).

The servers handle the accepting of queries, the service does the processing, by
passing events on to a lookup callback or an answer callback, both from
resolver.cc again. The RecursiveQuery object is used to start recursive lookups.

The answer callback does some final polishing of the answer and renders it, then
control falls back to the DNSServer in question which sends it back to the client.

The lookup callback calls another function in the Resolver object
(processMessage() in resolver.cc) which does some sanity checks, and either
fails on those or starts a recursive lookup. In the failure case, it'll create
an error dns packet and let control fall back to the DNSServer object, similar
to the answer handler. In the case a lookup is started, the existing
RecursiveQuery object (one fixed object, this holds the actual settings for
queries) creates a RunningQuery object, which sends out the actual query, and
gets an event if there is either an answer or a timeout.

Now in the case of the forwarder mode, this answer is what it wants, and it'll
copy that, send a resume() to the DNSServer (which makes it continue into
calling the answer callback), and then deletes itself. In the case of a timeout,
it'll resend, until it runs out of retries, in which case it'll also tell the
DNSServer to resume, but to do nothing further regarding the query in progress
(and then it deletes itself).

The above describes how I understand the code to work.

So here's what I added/changed:

Since RunningQuery is the object doing the actual work for a lookup (i.e.
deciding where to send it, sending it, handling events regarding an answer or a
timeout), this seemed to me the most logical place to put the logic for now.

I've added a handleRecursiveAnswer() method, which operator() calls if it is in
recursive mode (i.e. upstream_ is empty). That method looks at the answer it got
back, and either causes a new send() (if it was a delegation; it updates the
list of servers to try, and then calls send again), or decides it is done.

Depending on the return value of that function, operator() will call resume() on
the server (if it's done), or simply do nothing until the next event comes in
(which should be cause by the new send() in handleRecursiveAnswer(), and can be
either a timeout or a new response).

I have added a MessagePtr answer_message to the mix, where the answer is to be
built. This ptr is created by the DNSServer (udpdns.cc), and propagated into
RunningQuery, that could for instance add CNAME records while it it looking for
the final answer. The buffer, though less useful for this specific purpose, is
still used in sending out queries and parsing responses (and left in at a few
places where it may not be essential anymore).

Soooo. That's the short of what it does right now in experiments/resolver.

There are a few problems, one of which is that is seems somewhat... complicated.

An obvious conclusion is that logic and control is kind of flying around in some
very different places right now. Part of the handling is done in resolver.cc,
part in tcp/udpdns.cc and part in asiolink.cc.

A second one is that RunningQuery does the actual recursive handling, which
seems to me the most logical place (right now, since that's the part getting and
creating events), but not everyone seems to agree :)

I do think we should get it out of asiolink though. And rename RecursiveQuery to
RecursiveQueryStarter (or Factory or something), or perhaps even remove it
completely. It really doesn't do much besides storing data and starting
RunningQueries. (another option, if possible would be to make RecursiveQuery do
the actual logic, but then RunningQuery would be reduced to send+timeout,
without even retries, in which case i'd probably argue for removing that one :))

We should also move all the handling to a common place, whether or not that is
in one or more files, and certainly out of asiolink. The 'hidden' tcpdns.h and
udpdns.h seem strange to me as well. I want the full resolver process to be
callable as a library call so some things that are now in resolver.cc may need
to move to that common place as well (and then resolver.cc should call that).

There's probably more, but this mail has been long enough for now.

I shall go hide under my desk until the chairs have stopped flying.


Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/


More information about the bind10-dev mailing list