administering 1,000 zone files

Fri Dec 31 12:07:48 UTC 2004

>>>>> "Michael" == Michael van Elst <mlelstv at serpens.de> writes:

    Michael> The named.conf on the slaves is split into a general
    Michael> section and many include files.
    >> This is not a good idea. Having "many include files" is a
    >> recipe for needless complexity and brittle DNS administration.

    Michael> Why ?  Do you think anyone with a mind would deal with
    Michael> 200000 zones manually ?

I didn't say anything about whether the files were maintained by hand
or not. That's beside the point. Splitting a server's configuration
across lots of files does create needless complexity and makes things
much more brittle. When the number of components in a system increases,
the probability of a breakage increases. It also makes subtle and
unexpected interactions between the components much more likely. This
is simple common sense.

    Michael> There is hardly any difference between reading one single
    Michael> large file or a few dozen small files. Not in complexity
    Michael> and not in time consumed. After all the server reads
    Michael> hundreds of thousands of zone files, why should it have
    Michael> problems with a few configuration files ?

You miss the point completely. It's the fact that the server's
configuration is split across many files that's the problem.

    Michael> If one file gets lost or corrupted then the same can
    Michael> happen with the one and single configuration file,
    Michael> destroying everything instead of just a part.

IIUC the number of ways a system can go wrong grows much faster than
linearly as the number of components increases. Counting only the
pairwise interactions, it's at best an n-squared problem.
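
To put a rough number on that: if any two pieces of the configuration
can interact (a duplicated zone statement, conflicting options), the
pairs to worry about number n(n-1)/2. A throwaway Python sketch of the
arithmetic follows. The function name is mine, and counting only
pairwise interactions is a simplifying assumption, not a measurement
of anyone's setup.

```python
def pairwise_interactions(n: int) -> int:
    """Number of distinct pairs among n components: n*(n-1)/2."""
    return n * (n - 1) // 2


# One file has nothing to interact with; the count climbs quickly after that.
for files in (1, 20, 100):
    print(files, pairwise_interactions(files))
```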

    Michael> However, it is unlikely that a file gets lost or
    Michael> corrupted.  There are backups and the files are
    Michael> automatically checksummed.

This is getting more baroque by the minute. I pity your successor.

    >> -- and check the referenced zone files are current. So you
    >> needlessly make more work for the name server and yourself
    >> ploughing through a named.conf file split across say 20 1 Mbyte
    >> include files than there would be parsing a single named.conf
    >> file of 20 Mbytes.

    Michael> But the program dealing with the files had the advantage
    Michael> that instead of ploughing through a few ten MBytes it
    Michael> only had to work on a few hundred kilobytes, making it
    Michael> 10-20 times as fast.

That's a curious design choice. IMO it's the wrong trade-off. It
should be more important to have a simple, stable DNS configuration
than a faster way of generating fragments of named.conf. Which doesn't
appear to be "faster" BTW: it only generates around 1% of named.conf
in 5-10% of the time to generate the whole file. What's the point of
generating snippets of named.conf "faster" when that makes it harder
to debug the server's configuration and troubleshoot problems?

    Michael> The slave servers (running BIND8 at that time!) use
    Michael> configuration and zone files stored on a local disk. Even
    Michael> BIND9 cannot do much better (unless you add the DLZ
    Michael> patches), but why should I rely on this, really huge,
    Michael> complexity ? 

BIND-DLZ isn't needed to solve this. Neither is BIND for that matter.

    >>  Yuk! What if that small program fails?

    Michael> What if a large big program fails ? I don't see a point
    Michael> here.  If any program fails that modifies the nameserver
    Michael> configuration then the result can be disastrous.

You have many more components in your setup. Some of these appear to
be autonomous, independent agents on the slave servers. This means
there's a much higher probability of a failure and subtle interactions
between these components. And let's not forget the complicated
synchronisation issues that have been introduced.

    Michael> So how do you suggest to modify the configuration files?
    Michael> Manually?  Do you suggest to run a database replica on
    Michael> each slave server and believe that this will never fail,
    Michael> but insist that the small program does?

Generate the 2 config files centrally: one for the stealth master and
one for the slaves. I don't care how you do this. Though with 200K
zones, some tooling is advisable. Once they have been checked, copy
the files to their destination. Then activate them during the
designated service window(s).
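
As a rough illustration of the "generate centrally" step, here is a
minimal Python sketch that renders the slaves' zone statements from a
single list of zone names. Everything in it is an assumption made for
the example: the function name, the slave/ file path and the master
address are hypothetical, and where the zone list comes from (a
database, a flat file) is left open.

```python
def render_slave_conf(zones, master_ip):
    """Return named.conf zone statements for a slave, one per zone.

    Duplicates are dropped and the zones are sorted, so the same input
    always yields the same file and two generations can be diffed.
    """
    stanzas = []
    for zone in sorted(set(zones)):
        stanzas.append(
            f'zone "{zone}" {{\n'
            f'    type slave;\n'
            f'    masters {{ {master_ip}; }};\n'
            f'    file "slave/{zone}.db";\n'  # hypothetical path layout
            f'}};\n'
        )
    return "\n".join(stanzas)


if __name__ == "__main__":
    # 192.0.2.1 is a documentation address standing in for the stealth master.
    print(render_slave_conf(["serpens.de", "example.org"], "192.0.2.1"))
```

The output would be written to one file, checked before it goes
anywhere (with named-checkconf, say), and only then copied out to the
slaves.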

    >> Wouldn't it be better to generate the configuration data in one
    >> place?

    Michael> The configuration data can be easily regenerated (let's
    Michael> say if the slave servers had both system disks fail at
    Michael> the same time).  In fact, this is absolutely
    Michael> necessary. How would you (== some automated system) know
    Michael> that a slave server needs to be configured when there is
    Michael> no data available ?

Name servers should do what they are told. IMO they have no business
fetching or generating their configurations. This should be a push
operation, not a pull.

    Michael> BIND doesn't store the zones in a single file
    Michael> either, why do you insist on storing the configuration in
    Michael> a single file?

Looking after 1 file is a whole lot easier than looking after 10 or
100. BTW, all zones are unique (by definition) so they should all have
their own zone files to define their content.

    Michael> Be assured that the configurations do drift, for some
    Michael> time, because the slave servers aren't always
    Michael> reachable. Of course this hardly matters, DNS zone data
    Michael> also "drifts" because zone transfers do not happen
    Michael> instantly or servers are not reachable. The only thing
    Michael> that matters is that the differences are automatically
    Michael> corrected.

Eventually. Maybe. There's also a big difference between zone
coherency and server configuration coherency. Please don't confuse
these. Or assume that because it's tolerable for zone contents to be
inconsistent between servers for the zone refresh interval, that it's
OK for a set of servers that should have the same configuration to be
inconsistent. 

    Michael> The on-disk configuration is always more up to
    Michael> date. Whenever you modify it, the nameserver doesn't know
    Michael> it until reloaded. So where is the problem ?

You're making this interval longer as a deliberate design decision.

    >> It can also mean lame delegations because the master server
    >> (say) knows about a newly-added zone while one of that zone's
    >> slaves knows nothing about it.

    Michael> Lame delegations happen when a nameserver is not
    Michael> configured for a zone but which is delegated to it. I
    Michael> don't understand how this is related.

Suppose this setup of yours provides DNS service for the newly created
serpens.de zone. It's served by ns[123].serpens.de. The master server
loads this zone with its 3 NS records. The slave servers ns2 and ns3
are still waiting for some program of yours to wake up and give their
name servers a kick. Until they re-read their configuration files and
load the zone, they don't know anything about serpens.de. They're lame
for the zone. If this program on the slaves dies or fails, they could
be lame for a long time.
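
The window is easy to spot if you compare what each server is
configured for. A minimal Python sketch of that comparison follows.
The function name and the input shapes are hypothetical, and it
assumes you already have the list of zones each server is configured
to serve from somewhere.

```python
def lame_servers(master_zones, slave_zones):
    """Map each slave to the sorted zones the master serves but the slave lacks.

    master_zones: iterable of zone names loaded on the master.
    slave_zones:  dict of slave name -> iterable of zone names it is
                  configured for.
    Slaves that have every zone are left out of the result.
    """
    master = set(master_zones)
    report = {}
    for server, zones in slave_zones.items():
        missing = master - set(zones)
        if missing:
            report[server] = sorted(missing)
    return report


if __name__ == "__main__":
    master = ["example.org", "serpens.de"]
    slaves = {
        "ns2": ["example.org"],                # not yet kicked
        "ns3": ["example.org", "serpens.de"],  # up to date
    }
    print(lame_servers(master, slaves))
```

Any slave that shows up in the result is lame for the zones listed
against it until it reloads its configuration.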