BIND 10 #1198: Split DatabaseClient::Finder::find
BIND 10 Development
do-not-reply at isc.org
Fri Dec 2 05:04:32 UTC 2011
#1198: Split DatabaseClient::Finder::find
-------------------------------------+-------------------------------------
Reporter: | Owner: jinmei
vorner | Status: reviewing
Type: | Milestone:
enhancement | Sprint-20111206
Priority: minor | Resolution:
Component: data | Sensitive: 0
source | Sub-Project: DNS
Keywords: | Estimated Difficulty: 4
Defect Severity: High | Total Hours: 0
Feature Depending on Ticket: |
Datasrc refactoring |
Add Hours to Ticket: 0 |
Internal?: 0 |
-------------------------------------+-------------------------------------
Comment (by jinmei):
'''General'''
> > What would really help is to have a general description of the
algorithm somewhere. I found it difficult to follow the logic in some
places - I was reverse engineering the documentation from the code.
> I've now done that and added more detailed comments to the code. At
this point I'm hoping that the reading the comments + code should allow
someone to understand what's happing without too much effort on their
part.
>
> >> (In case we agree on the further refactoring idea) I'd also like to
simplify the for loop of findWildcardMatch() using the same/similar
technique as find(). But this method may not be so unreadable in its
current form, so I didn't touch it.
> > I'll have a look at it.
> With the added comments I think it's now understandable enough without
further decomposition into separate functions.
I agree the added comments help understand what's happening (and in
some specific cases why), and they are very good for that purpose.
But at the same time I'm afraid the massive amount of in-method
comments rather reduces readability at a glance and would make it more
difficult to extend the code later with confidence. For example, if
we see the following organization:
{{{#!c++
if (some_condition) {
// this is one possible case.
return (something);
} else if (some_other_condition) {
// this is another case.
return (other_thing);
} else if (yet_another_condition) {
// this is yet another case.
return (yet_another_thing);
}
// A last resort case
return (last_resort_result);
}}}
and need to extend the "last resort" case, we can immediately be sure
that added code doesn't affect any of the first three cases because
it's clear at a glance that all cases have been done with return
before the flow reaches the "last resort" part.
Now, imagine that each comment provides very detailed explanation of
what is going on each case and why we do that, etc, and consumes the
half of "vertical screen size". Even if the essential code logic is
trivial, we'd need to scroll up and down to check whether it's really
safe to add code at the very end part of the function. I'm afraid
that's happening in the revised code of findWildcardMatch() and the
main find() method.
So, I'd first like to propose one thing: move the detailed explanation
to the head of the method, and keep the body of the method concise.
In order to make what I mean clearer, I experimentally did it for
findWildcardMatch(), and looked into the resulting organization of the
method...then I noticed the code logic is now almost duplicate of that
of find() and we could combine these two cases. So I extended the
experiment further and did it also. This additional refactoring was
not only for reducing duplicate code; we can now adopt the consistent
policy on what to do if we encounter multiple CNAMEs.
After these changes, the main find() has become unbelievably small
(considering its original length...with over 600 lines of code). So,
while it still contains some detailed in-method comments, it seems to
me sufficiently concise, so I'm okay with that. The extracted method,
findOnNameResult(), also contains in-method comments. The amount of
the comments is "borderline" in terms of readability to me, but for
now I didn't touch them.
And, I then noticed that findWildcardMatch() could be simplified more,
eliminating the need for some temporary mutable variables. So in my
experiment I did it also.
This time I'm attaching a diff to implement the idea (rather than
committing it). Please let me know if you also think it improves the
code.
'''reader_message_file.py'''
- removeEmptyLeadingTrailing and processFile should be named
remove_empty_leading_trailing and process_file?
'''findDelegationPoint'''
- About this:
{{{
// TODO: where does it say we can't return wildcard glue?
}}}
The rationale is this part of Section 4.3.3 of RFC1034:
{{{
- When the query is in another zone. That is, delegation cancels
the wildcard defaults.
}}}
'''findWildcardMatch'''
- About this TODO, rfc2672bis-dname-25 Section 3.3 basically
invalidates the case. So I think it's okay simply to say the
behavior of such a case is undefined (it would be a nice idea to
explicitly test it nevertheless).
{{{
// TODO What do we do about DNAME here?
}}}
- Personally, I'd use assert for the "sanity check" for FIND_GLUE_OK
because should this happen it's a pure internal bug within
database.cc (even a broken database table cannot make this happen).
- Why are there two blank lines here?
{{{
result_rrset = dresult.first_ns;
} else if (!hasSubdomains(name.split(i - 1).toText())) {
}}}
'''findNoNameResult'''
- I'm not sure "not exist in the zone" is correct, because a formal
definition of a "zone" could be the tree. I'd say "not in the
database".
{{{#!c++
// If so, return NXRRSET instead of NXDOMAIN (as although the name
does
// not exist in the zone, it does exist in the DNS tree).
}}}
- I'd remove this comment. I believe the detailed comment before this
provides sufficient explanation.
{{{
// pretend something is here as well.
}}}
- I suggest revising the !NO_WILDCARD case as follows:
{{{#!c++
// It's not an empty non-terminal and wildcard matching is not
// disabled, so check for wildcards. If there's some wildcard
match
// (i.e., all results except NXDOMAIN) return it; otherwise fall
// through to the NXDOMAIN case below.
const ZoneFinder::FindResult wresult =
findWildcardMatch(name, type, options, dresult);
if (wresult.code != NXDOMAIN) {
return (FindResult(wresult.code, wresult.rrset));
}
}
}}}
This makes the code flow a bit more complicated, but we can unify the
NXDOMAIN cases, and (more important) make sure it's logged.
'''find()'''
- DNAME applies only to subdomains of the DNAME's owner name.
(Remember the whole chaos of BNAME, etc discussion. One reason for
that is this property of DNAME).
{{{#!c++
// TODO: Why is DNAME ignored?
}}}
Or, if we want to refer to an authentic reference, see Section 2.3 of
rfc2672bis-dname-25:
{{{
Unlike a CNAME RR, a DNAME RR redirects DNS names subordinate to its
owner name; the owner name of a DNAME is not redirected itself.
}}}
- Regarding the note about the "resolver":
{{{#!c++
// We are not searching for a CNAME but nevertheless we have found
one
// at the name we are searching so we return it. (A resolver could
have
// originated the query that caues this result. If so, it will
restart
// the resolution process with the name that is the target of this
// CNAME.)
}}}
I don't think it a matter of this level of lookup (a single search,
rather than how we respond to query from a resolver). That's a
discussion for, e.g., auth::Query::process(). So I suggest removing
this note. Or, if you actually meant the caller of the find()
method by "a resolver", I'd revised it to, e.g.:
{{{#!c++
// We are not searching for a CNAME but nevertheless we have found
one
// at the name we are searching so we return it. (The caller
// may want to continue the lookup by replacing the query name
// to the canonical name with the original RR type).
}}}
- As for this "TODO":
{{{#!c++
// TODO: Check that throwing an exception here is correct.
}}}
In general, throwing an exception from find() is okay. The basic
design assumption is that while we generally expect the underlying
data source is sane in terms of the DNS protocol (e.g., there should
normally be no duplicate CNAMEs, and tools like b10-loadzone is
expected to ensure that), it's not 100% guaranteed, and the data
source implementation is allowed to react to such
unexpected-but-possible case with an exception. In the case of
a database backend-ed data source, it's also possible that a
connection to the database fails in the middle of a lookup. Such a
case can also be signaled with an exception. (As part of the API
contract) Applications are expected to be thrown DataSourceError for
these events and to handle it appropriately in their application
context. In the case of b10-auth, it would be to return SERVFAIL to
the client, and it indeed behaves that way:
{{{#!c++
try {
const RRType& qtype = question->getType();
const Name& qname = question->getName();
auth::Query(*memory_client_, qname, qtype, *message).process();
} catch (const Exception& ex) {
LOG_ERROR(auth_logger, AUTH_PROCESS_FAIL).arg(ex.what());
makeErrorMessage(message, buffer, Rcode::SERVFAIL());
return (true);
}
}}}
Note also that this is not the only place we throw from find() and
its helpers. If we worry about the exception in this specific case,
we'll need to revisit many other parts, too (but as explained above we
shouldn't worry about it)..
- A related note: in answering the previous point I noticed the
description of the base class API (ZoneFinder::find())is stale:
{{{#!c++
/// A derived version of this method may involve internal resource
/// allocation, especially for constructing the resulting RRset, and
may
/// throw an exception if it fails.
/// It throws DuplicateRRset exception if there are duplicate rrsets
under
/// the same domain.
/// It should not throw other types of exceptions.
}}}
Most important, the last sentence is now false. I'd suggest
revising this part within this ticket too:
{{{#!c++
/// \exception std::bad_alloc Memory allocation such as for
constructing
/// the resulting RRset fails
/// \exception DataSourceError Derived class specific exception, e.g.
/// when encountering a bad zone configuration or database connection
/// failure. Although these are considered rare, exceptional events,
/// it can happen under relatively usual conditions (unlike memory
/// allocation failure). So, in general, the application is expected
/// to catch this exception, either specifically or as a result of
/// catching a base exception class, and handle it gracefully.
}}}
'''datasrc_messages.mes'''
- DATASRC_DATABASE_FOUND_CNAME: the same comment about the "resolver"
above applies.
- DATASRC_DATABASE_FOUND_NXRRSET_NSEC: the longer description doesn't
seem to be correct:
{{{
DNSSEC information
has been requested, but there is no NSEC record corresponding to the node.
}}}
Actually, in this case NSEC was found and returned.
- Note: I've only checked these two messages, assuming the others were
just reordered.
'''Other points'''
- I made a few trivial cleanups, and directly committed the changes to
the branch.
- About blank lines:
> The idea is to make it easier to read - divide the code into
"paragraphs" that do some particular task. I find that easier to
comprehend than one solid block of code and comments (even with syntax
colouring). I admit it's a bit inconsistent - for example, a two lines
each containing a brace already gives enough white space separating the
"paragraph" above from the one below. They are separated more on visual
effect than any particular policy.
Hmm, I'd like to respect your policy, but I guess we need some
consistent guideline. To me, these lines just seem to make the code
(vertically) longer without improving readability, so I'd tend to
remove them as part of cleanup when I work on code including them.
If and when you work on that part of the code next time, you'll then
notice the need for blank lines and will insert them; and next time
I work on it....so, I'd suggest discussing it project-wide to
have some consistent policy (admittedly the same thing could happen
to other blank lines for readability, but in practice I believe
we share a quite consistent sense of which is readable and which is
redundant in most of other cases).
--
Ticket URL: <http://bind10.isc.org/ticket/1198#comment:19>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list