Good BIND design. was RE: PLEASE READ: BIND 8.2.2 problem

Fri Apr 28 20:10:57 UTC 2000

If you intend to respond to this post, I would politely ask you to read the
entire post carefully, then think about it for a few moments, before
responding.

Mr. Reid and the rest admittedly have more experience than I do, which I
respect a great deal. But I think what Mr. Keves suggests is not a bad idea,
and I think the reaction was a bit too strong.

I regularly monitor the logs of my servers and errors still get through.
Even the most vigilant people make mistakes; it just happens. The
latency from the time I reload a zone to the time I see my error scroll by
to the time I correct the problem can be minutes or more. And that is time
that bad data (meaning data that is out of date or inaccurate because it is
the previous zone data, or even no data at all) is being propagated (granted
it is with a non-authoritative flag in this case, but the point stands even
so). So anything that can mitigate the damage done by flawed zones is a good
thing.

It is not unreasonable to have, as Mr. Keves suggests, a settable option
that would indicate how strictly the zone data should be interpreted. I
think an improvement on his idea would be allowing the point at which the
zone is completely rejected to be settable. For example, "reject-on-warning
3;" would reject the zone on 3 recoverable errors, and "reject-on-warning
0;" would allow as many errors as could be recovered. Those who want very
strict checking would simply not use the option, much like how check-names
behaves.
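
To make the proposed semantics concrete, here is a rough Python sketch of
how such a threshold might be applied. It is only an illustration:
reject-on-warning is my hypothetical option and does not exist in BIND, and
the names apply_threshold and check_owner, along with the one-string-per-
record format, are made up for the example.

```python
from typing import Callable, List, Optional, Tuple


def check_owner(record: str) -> Optional[str]:
    """Hypothetical check: describe the problem, or return None if the record is fine."""
    owner = record.split()[0]
    if owner.startswith("#"):
        return f'owner name "{owner}" is invalid'
    return None


def apply_threshold(
    records: List[str],
    check: Callable[[str], Optional[str]],
    reject_on_warning: Optional[int] = None,
) -> Tuple[bool, List[str], List[str]]:
    """Return (zone_accepted, kept_records, warnings).

    reject_on_warning=None  option not set: strict, the first bad record rejects the zone
    reject_on_warning=0     skip bad records, never reject for recoverable errors
    reject_on_warning=N     reject the zone once N recoverable errors have been seen
    """
    kept: List[str] = []
    warnings: List[str] = []
    for lineno, record in enumerate(records, start=1):
        problem = check(record)
        if problem is None:
            kept.append(record)
            continue
        warnings.append(f"line {lineno}: {problem} - rejecting record")
        if reject_on_warning is None:
            return False, [], warnings
        if reject_on_warning > 0 and len(warnings) >= reject_on_warning:
            return False, [], warnings
    return True, kept, warnings
```

With a zone containing one bad owner name, calling apply_threshold(records,
check_owner) rejects the whole zone, as happens today, while passing 0 or 3
keeps the good records and reports a single warning.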

It may be that there are more important things to implement than this kind
of sanity checking. If I had to choose I would rather have the "views"
feature implemented than this one, and I am sure there are many other
features/fixes that need to be done that would take precedence. But the
*idea* still has merit. Anything that makes the system more robust and our
jobs a little less stressful is A Good Thing.

Regardless of the likelihood of it being implemented, here are my personal
thoughts on how it should work. I would love to hear useful comments and
good technical criticism. Flames can be directed to /dev/null.

The checking should be similar to compiler warnings and errors. Log files
would show something like:

28-Apr-2000 08:57:43.072 owner name "#spaghetti.example.com" IN (primary) is
invalid - rejecting record
28-Apr-2000 08:57:43.072 example.com.zone:23: owner name warning
28-Apr-2000 08:57:43.072 example.com.zone:23: Database warning near (A)
28-Apr-2000 08:57:43.185 master zone "example.com" (IN) has errors (serial
2000042800). Continuing with remaining records.

This shows how a simple commenting mistake (one I would be surprised if any
of us have *not* made, given the different commenting requirements in the
BIND configuration files) can be handled gracefully, with the server still
providing authoritative name service for the zone while we go back and fix
the mistake.

This is already done for other types of errors, as Mr. Keves has pointed
out, and it is not unreasonable to request that it be done on easily
recoverable mistakes as well. Of course a zone would be rejected outright if
the SOA or primary NS records were invalid. A zone has certain requirements
to function at all, and these would be handled in the same manner as they
are currently. But mistakes like my comment error above, or Mr. Keves's CNAME
problem can be handled much more elegantly than by rejecting an entire zone
outright.
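
A rough Python sketch of that distinction follows. It is an illustration
under my own assumptions, not how named is written: the (owner, type, rdata)
tuples, the REQUIRED_TYPES set, and the function names are all hypothetical.
The only rule taken from the paragraph above is that an invalid SOA or NS
record still rejects the zone.

```python
from typing import Callable, List, Optional, Tuple

Record = Tuple[str, str, str]  # (owner, type, rdata)

# Assumption: record types the zone cannot function without.
REQUIRED_TYPES = {"SOA", "NS"}


def can_continue_without(record: Record) -> bool:
    """Is the zone still usable if this record is dropped?"""
    return record[1] not in REQUIRED_TYPES


def load_zone(
    records: List[Record],
    is_valid: Callable[[Record], bool],
) -> Tuple[Optional[List[Record]], List[Record]]:
    """Return (kept, dropped); kept is None when the whole zone is rejected."""
    kept: List[Record] = []
    dropped: List[Record] = []
    for record in records:
        if is_valid(record):
            kept.append(record)
        elif can_continue_without(record):
            # Recoverable: drop this record (a real server would log a warning).
            dropped.append(record)
        else:
            # Fatal: a required record is bad, so reject the zone as today.
            dropped.append(record)
            return None, dropped
    return kept, dropped
```

The caller supplies is_valid, so whatever checks are in force today stay in
force; the sketch only changes what happens after a check fails.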

Where do you draw the line? Well, wherever it makes sense. The questions you
ask ("...stupid refresh/expire intervals? Or broken/missing SIG, NXT and KEY
records?...") had to be answered before. Until now the question was "Is this
record valid?" If the answer was "No", the entire zone was rejected. Mr.
Keves is simply suggesting that we ask another question after the first,
"Can I continue without this record?" Many times the answer will be "No" as
Mr. Reid points out. But there exists a class of errors (which are very
common, judging from many of the messages seen on the list) where the answer
is "Yes". It has been done before, it is done now for different classes of
"errors", and it can be done for this class of errors.

Yes, it is true that it is not hard to make good DNS data, but as stated
before, humans make mistakes (computers do too, but that is usually easier
to fix). Do we currently reject zones for not properly setting the $TTL? No.
We gracefully recover by making a sane assumption about how the directive
should be set. Is it better that we reject them completely for this kind of
error? Maybe, but probably not.
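
As a rough illustration of that kind of graceful recovery, here is a Python
sketch of a default-TTL fallback. It assumes the fallback value is the SOA
minimum field, which is my understanding of what BIND 8.2 does when $TTL is
missing. The function name and the warning text are hypothetical and are not
BIND's own.

```python
from typing import List, Optional, Tuple


def effective_default_ttl(
    ttl_directive: Optional[int],
    soa_minimum: int,
) -> Tuple[int, List[str]]:
    """Pick the zone's default TTL, returning (ttl, warnings).

    Assumption: when no $TTL directive is present, fall back to the SOA
    minimum field and warn, rather than rejecting the zone.
    """
    if ttl_directive is not None:
        return ttl_directive, []
    return soa_minimum, ["no default TTL set; using SOA minimum instead"]
```

The zone loads either way; the only difference is whether a warning ends up
in the log for someone to act on.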

Do we currently allow stupid values for expires and refreshes? Well, yeah we
do, kind of (depending on your current value for "stupid" and depending on
what you are trying to accomplish).

Do we reject the zone for underscores in hostnames? Well, that is up to the
administrator. We trust the administrator to decide whether he wants to
break standards or not, because sometimes it makes good business sense to do
so. Here again, we make it possible for the server to handle human error,
and give the humans time to clean up their mistakes while still allowing the
services we provide to function, albeit in part.

Going through "The Options Statement" section of the man page and just
counting the number of settable options that are there simply to support
"broken", out of date, or otherwise "bad" things, I see auth-nxdomain,
fake-iquery, has-old-clients, multiple-cnames, rfc2308-type1, use-id-pool,
check-names, treat-cr-as-space, and transfer-format.

Why do we "...kludge the name server to tolerate practices that are illegal
or downright broken..." in these instances? Because we try to help ease the
burden on humans when we can, and mitigate the damage when they make
mistakes or even do things The Wrong Way. We do it because it is The Right
Thing.

Good design (be it software, hardware, or anything else) makes life better for
everyone. BIND is an excellent example of software that gently helps people
fix mistakes.

If it is hard to implement, why not put this on the feature request list for
v10.1.2p4? If it is easy, do it sooner.

But don't reject the idea completely.

Thanks for listening,
	Adam Augustine
	Global Network Manager
	Morinda, Inc.

Jim Reid [mailto:jim at rfc1035.com] wrote:

>>>>> "Brian" == Brian Keves <- NCS UAI Contractor <keves at synopsys.com>>
writes:

    >> Well there's a message in the log saying that the zone has been
    >> rejected so the name server hasn't exactly "quietly become
    >> non-authoritative".

    Brian> It is if no one looks at it. Strictly an internal problem I
    Brian> know, we just don't have resources to do this stuff
    Brian> manually.

Well if you can't/won't monitor your DNS logs you *really* have
problems. How many other error and trouble reports are you ignoring?

    Brian> Will need to put in something to monitor this and
    Brian> page someone.

swatch is your friend. You can even make the name server log the
severity of each message it generates so that swatch arranges for the
serious error reports to get acted on immediately.

    Brian> There are circumstances in large companies with large
    Brian> domains that we don't always control. Illegal data is one
    Brian> of those.

Fine. So leave those zones to wallow in their own self-inflicted
cesspit. That's the beauty of delegation. :-)

    Brian> Why not something like reject-on-errors no;?

Where do you draw the line? Would the above proposal allow zones that
had no SOA or NS records? Or stupid refresh/expire intervals? Or
broken/missing SIG, NXT and KEY records? Or names that belong in
another zone? Or syntax errors? Or how about missing RR data?

The bottom line is that it is not difficult to generate correct DNS
data. [And check the name server logs.] So why not do things right
instead of kludging the name server to tolerate practices that are
either illegal or downright broken?