possible interoperability issue with Win2K3 name-server

Tue Sep 19 03:42:52 UTC 2006

One word of advice when dealing with Microsoft: try to reproduce the 
problem in an environment that is as Microsoft-y as possible. If you go 
to them with this problem and say that the primary master for the zone 
data in question is BIND, then as moths are drawn to a flame, they'll 
immediately latch onto that as the source of the problem, their 
reasoning being that if you ran all of your DNS infrastructure on MS-DNS 
then the problem wouldn't exist. If it's not feasible to set up a test 
zone on a MS-DNS server and reproduce the problem that way, then 
_at_least_ get some network-level traces to bolster your case -- they're 
never going to take the output of dig as gospel, especially if you've 
hacked it ever-so-slightly from the original BIND default, in order to 
get some reasonable output. Ideally, you should capture both the 
original query/response transaction between your BIND box and the MS-DNS 
resolver, and then the bogus query/response transaction between an 
ordinary DNS client (Wintel of course) and the MS-DNS resolver.

Interoperability issues: hmmm... I don't see any particular prohibition 
in the RFCs against "chaining" compression pointers, and therefore no 
limitation on how many "links" such a chain may have. It does seem to 
me, however, that implementations which chain compression pointers 
deeply are programmed for failure, since RFC 1035's description 
of how resolvers are implemented clearly anticipates that they will 
"limit the amount of work which will be performed for [any given] 
request" (Section 7.1) and deeply-chained compression pointers obviously 
represent per-request workload. So implementations cannot expect to 
chain to an arbitrary depth without running into interoperability 
problems. In that respect, I think BIND's limit of 16 is reasonable. Of 
course, all of this is more-or-less moot anyway, since, regardless of 
compression pointers or the chaining thereof, duplicates of the same SOA 
RR in a given response are flagrantly RFC-non-compliant (see, for 
example, RFC 2181, Section 5.5). *Illegal* responses are likely to cause 
interoperability problems, sure.
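
To make the hop-limit point concrete, here is a minimal sketch in Python of a name decoder that follows compression pointers and gives up after a fixed number of jumps. This is not BIND's code: read_name, build_chain and TooManyHops are names invented for illustration, and the only thing borrowed from the thread is the limit of 16. The build_chain helper fabricates a message in which each name points at the previous one, which is just one way a deep chain could arise; I have not checked that the Win2K3 response is laid out this way.

```python
MAX_HOPS = 16  # same value as the DNS_POINTER_MAXHOPS mentioned in this thread


class TooManyHops(Exception):
    """Raised when a name needs more pointer jumps than the limit allows."""


def read_name(msg: bytes, offset: int, max_hops: int = MAX_HOPS) -> str:
    """Decode a possibly compressed domain name starting at offset."""
    labels = []
    hops = 0
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:      # compression pointer (RFC 1035, 4.1.4)
            hops += 1
            if hops > max_hops:        # bounds the work done per name
                raise TooManyHops(f"more than {max_hops} pointer hops")
            offset = ((length & 0x3F) << 8) | msg[offset + 1]
        elif length == 0:              # root label terminates the name
            return ".".join(labels) + "."
        else:                          # ordinary label: length byte, then data
            start = offset + 1
            labels.append(msg[start:start + length].decode("ascii"))
            offset = start + length


def build_chain(depth: int):
    """Hypothetical test data: 'a.' followed by `depth` names, each one a
    single label 'b' plus a pointer to the name written just before it.
    Returns the message bytes and the offset of the last name."""
    msg = bytearray(b"\x01a\x00")
    prev = 0
    for _ in range(depth):
        here = len(msg)
        msg += b"\x01b" + bytes([0xC0 | (prev >> 8), prev & 0xFF])
        prev = here
    return bytes(msg), prev
```

With this sketch, a chain of 16 pointers decodes, a chain of 17 raises TooManyHops, and passing max_hops=64 (the value Danny tried in dig) lets the deeper chain through. The same counter also stops a pointer that loops back on itself, which is the other reason a decoder wants some limit.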

Likelihood of a patch: hahaha. It took them -- what? -- 3 years to 
finally fix the "inconsistent serial numbers" problem for AD-integrated 
zones. I wouldn't hold my breath waiting for them to fix this latest 
problem.

- Kevin

Danny Thomas wrote:
> While this message describes an apparently bogus response from the
> Microsoft Windows 2003 DNS server, there are two points relevant
> to bind
>   1) bind9's dig refuses to print the response (more a curiosity)
>   2) while I've only seen such responses from cached records, without
>      knowing the full scope of the problem there exists the potential for
>      interoperability issues with bind
>
> I'd be grateful if anyone else can shed light on this behaviour or
> knows an effective way to raise the issue with Microsoft, e.g. to
> identify
>   1) that it is a problem
>   2) whether the scope of the problem might extend beyond cached
>      records, i.e. possible interoperability issues if bind
>      ignores records with more than 16-odd copies of the SOA record
>      in the authority section
>   3) the likelihood of a patch
>
>
>
>
> BACKGROUND
> =========================================================================
> I've written a script to survey name-servers running on our network,
> which include many from a default install of ActiveDirectory.
> Unfortunately these often have their own separate version of zones,
> though I was pleasantly surprised to find nearly all forwarding
> through our central name-servers (mainly by checking whether rfc1918
> reverse zones come from the IANA blackholes).
> NB one motivation for the survey was to identify MS name-servers
>    so they can be shut down. But it's not as simple as disabling the
>    name-server, as that can result in domain logins taking 10 minutes.
>    We'll need to get our MS sysadmins to resolve the slow logins
>    before we can start shutting them down en masse.
>
> Part of the survey uses fpdns (http://www.rfc.se/fpdns/) to fingerprint
> the name-server software, but fpdns fails for all name-servers
> exhibiting the following problem (NB fingerprinting fails for quite
> a few non-Microsoft name-servers too). While a few Microsoft systems
> seem to be successfully fingerprinted, only NT and Win2K versions
> are reported. The apparent problem fingerprinting Win2K3's name-server
> is something I'll take up on the fpdns list, but nmap OS fingerprinting
> indicates the following problem happens on Win2K3 systems.
>
>
> THE PROBLEM
> =========================================================================
> An SOA query is done for the zones in the master named.conf, and many
> of the MS servers return a truncated response for most of the 1,400
> odd zones. Curiously, doing an ANY query works fine. While bind-8.3
> has no problem printing the response, the bind9 dig reports:
>   ;; Truncated, retrying in TCP mode.
>   ;; Got bad packet: too many hops
>   1884 bytes
> followed by a hex dump of the response. Using bind-9.4.0b1's dig
> after increasing DNS_POINTER_MAXHOPS in lib/dns/include/dns/name.h
> from 16 -> 64 prints out similarly to bind8's dig:
>
> bin/dig/dig @130.102.198.22 awmc.uq.edu.au soa
> ;; Truncated, retrying in TCP mode.
>
> ; <<>> DiG 9.4.0b1 <<>> @130.102.198.22 awmc.uq.edu.au soa
> ; (1 server found)
> ;; global options:  printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26092
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 50, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;awmc.uq.edu.au.                        IN      SOA
>
> ;; ANSWER SECTION:
> awmc.uq.edu.au.         2282    IN      SOA
>   noddns.cc.uq.edu.au. hostmaster.uq.edu.au. 2006091502 10800 1800 3600000 3600
>
> ;; AUTHORITY SECTION:
> cc.uq.edu.au.           2256    IN      SOA
>   noddns.cc.uq.edu.au. hostmaster.uq.edu.au. 2006091501 10800 1800 3600000 3600
> cc.uq.edu.au.           2256    IN      SOA 
>   noddns.cc.uq.edu.au. hostmaster.uq.edu.au. 2006091501 10800 1800 3600000 3600
> <48 more copies of this SOA record>
>
> ;; Query time: 4 msec
> ;; SERVER: 130.102.198.22#53(130.102.198.22)
> ;; WHEN: Sun Sep 17 08:30:20 2006
> ;; MSG SIZE  rcvd: 1892
>
> I'm not suggesting DNS_POINTER_MAXHOPS should be increased as I expect
> there were reasons/experience to suggest 16 was adequate.
>
> NB the SOA query seems to behave properly when un-cached (response
> has aa and full TTL), and (sometimes?) another SOA query works
> properly with the result coming from the cache (no aa and reduced
> TTL) before subsequent responses have these 50 SOA records in the
> authority section
>
> Danny
>
>