Warning: ID mismatch:

Ladislav Vobr lvobr at ies.etisalat.ae
Tue Sep 14 03:51:16 UTC 2004


You will definitely experience problems when you have 100% utilization on
the inside forwarding servers. The forwarding might be the reason. The other
thing you mentioned might be reachability problems: BIND gets very
busy when some domains are completely unreachable, and it also has trouble
responding, since its queue fills up while it retries all these
unreachable servers several times for each such request. There is a tool
called dnstop (I think from caida.org) that can show you, in real time, the
traffic going out of and coming into your DNS server, sorted by volume. It
might give you a hint about your top talkers or the top destinations your
name server is trying to reach, and you might be surprised what your
name server is trying to do in the background.
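
If you cannot install dnstop right away, the same idea can be roughed out
from a capture you already have. The following is only a minimal sketch in
Python, not dnstop itself: it assumes one-line packet summaries in the shape
that tethereal prints in the capture quoted below (time, source, "->",
destination, "DNS Standard query ..."), and the function name top_talkers is
made up for illustration. It counts queries per source/destination pair and
ignores responses and non-DNS lines.

```python
from collections import Counter


def top_talkers(lines, n=5):
    """Count DNS queries per (source, destination) pair.

    Assumes tethereal-style summary lines such as:
      0.000000 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.blue.com
    """
    pairs = Counter()
    for line in lines:
        fields = line.split()
        # Expected shape: <time> <src> -> <dst> DNS Standard query ...
        if len(fields) < 7 or fields[2] != "->" or fields[4] != "DNS":
            continue
        if fields[5:7] != ["Standard", "query"]:
            continue
        if len(fields) > 7 and fields[7] == "response":
            continue  # count only queries, not the replies to them
        pairs[(fields[1], fields[3])] += 1
    return pairs.most_common(n)


if __name__ == "__main__":
    import sys

    # Read summary lines from standard input and print the busiest pairs.
    for (src, dst), count in top_talkers(sys.stdin):
        print(f"{count:6d}  {src} -> {dst}")
```

Run it on the server with the capture text piped to standard input; pairs
with unexpectedly high counts are the ones worth a closer look.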

How difficult would it be to remove the forwarding and perhaps try a
stateful firewall, letting even the internal servers resolve names
directly? There are really not many advantages to forwarding, but it
can be a source of a lot of confusion. Generally, IMHO, people try to avoid
it if there is some other choice.

Ladislav

Maria Iano wrote:
> I agree with you on this - the ID mismatch error was a red herring. I'm
> wondering now if there is some issue with unexpectedly high memory use.
> 
> I am still experiencing this issue, now almost daily during the work week.
> At the times of day when we get the most lookups (lunchtime when everyone
> starts surfing) one or the other of the servers stops responding to queries.
> In the debugging it looks like when this happens, it receives queries, and
> forwards them to the outside resolver, but doesn't recognize the reply from
> the outside server. I can see the packets returning from the outside
> server. The broken piece seems to occur at that point.
> 
> One thing I have noticed about these inside resolvers is that they are
> running at about 100% memory use (1 GB of RAM on each) at all times.
> Everything else in the system is fine. The load reports as 0. Things like
> UDP socket use, and all sorts of data from sar, are all fine. The outside
> resolvers that they forward to are identical builds on identical hardware,
> yet they run at about 50% memory use. The outside resolvers are also used
> by about 180 other locations, and get at least 10 times the number of
> queries, yet they are the ones doing fine.
> 
> There are really two differences between the servers that are fine (the
> outside ones), and the servers that keep ceasing to resolve (the inside
> ones). The outside ones resolve queries in the usual iterative way. The
> inside ones resolve queries by forwarding to other servers. The other
> difference is that the inside servers get a lot of reverse RFC 1918 queries
> which forward to other internal (Windows) servers, and those servers
> sometimes don't answer. In fact those servers sometimes forward the queries
> back out, but thankfully I don't see a loop occurring, so the inside
> resolvers seem smart enough to drop things there. I am about to get this
> issue fixed, in that the Windows servers are about to be told they own all
> of that space, it has just taken a week to get the process accomplished for
> this to happen. Last week I also created a lot of dummy zones for the
> reverse space on our inside resolvers, so the servers could answer right
> away. I'm not convinced that will fix the issue anyway.
> 
> I am trying to determine why the inside servers run at 100% while the
> outside servers run at about 50% memory usage. I'm also building an updated
> replacement for one of the inside resolvers to use Fedora in place of RH8,
> and to no longer use the grsecurity patch, to see if that helps.
> 
> Thanks,
> Maria
> 
> -----Original Message-----
> From: bind-users-bounce at isc.org [mailto:bind-users-bounce at isc.org] On
> Behalf Of Ladislav Vobr
> Sent: Friday, September 10, 2004 7:39 PM
> Cc: BIND Users Mailing List
> Subject: Re: Warning: ID mismatch:
> 
> Sometimes, when you query unreachable domains, your recursive
> servers retry several times against all of the remote name servers,
> and most of the time there is no reply from your caching servers before
> dig times out; sometimes a SERVFAIL reply arrives later than the
> time-out.
> 
> So if you repeat the dig command several times for the same domain, the
> second dig you issued might receive the reply meant for the first one, and
> so you see this message (ID mismatch). The reply is perfectly valid, it just
> came at the wrong time :-). Nothing is wrong with your firewall or the
> server itself.
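
To make the timing argument above concrete, here is a small sketch in
Python. It is a toy model, not dig's actual code: the function name
classify_replies is hypothetical, and it only assumes that the client
compares each arriving reply's ID with the ID of the query it is currently
waiting on. The IDs are the ones from the www.silver.com run quoted further
down.

```python
def classify_replies(outstanding_id, arriving_ids):
    """Return one verdict per reply seen while a single query is outstanding.

    A reply whose ID differs from the outstanding query's ID is reported as
    a mismatch (a late answer to an earlier query); the first matching ID is
    accepted and reading stops.
    """
    verdicts = []
    for got in arriving_ids:
        if got == outstanding_id:
            verdicts.append(f"answer accepted for ID {got}")
            break  # the query is answered, stop reading replies
        verdicts.append(f"ID mismatch: expected ID {outstanding_id}, got {got}")
    return verdicts


if __name__ == "__main__":
    # Two late replies to an earlier query arrive before the real answer.
    for verdict in classify_replies(56696, [10590, 10590, 56696]):
        print(verdict)
```

The late replies are harmless in themselves; they only show that the server
answered the earlier queries after the client had already given up on them.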
> 
> So you have to think a little bit about the situation :-) I remember using
> nslookup once, and it is so stupid that it doesn't even check the source IP
> address in the reply packets. I was troubleshooting it through a
> firewall with misconfigured NAT, and nslookup kept working even when
> the reply came from a different IP :-) than the one the query was sent to.
> (The server, obviously, does not :-) Somebody did a really poor job with
> nslookup. But that is a different story :-)
> 
> Ladislav
> 
> 
> Maria Iano wrote:
> 
>>This same issue is recurring! This time it is on res1 again. res1 has
>>address 172.21.0.100 and res2 has address 172.21.0.200. Below I have
>>pasted in the series of dig commands I ran on res2 sending queries to
>>res1. Below that I have pasted in the tethereal output during those
>>commands.
>>
>>Since this issue seems to only be a problem for data which isn't
>>cached, I wonder if there is any connection with the thread with subject
>>'Weird named act!'. So I also issued this command suggested in that
>>thread:
>>
>>res1 in:  bind$ ps -flp 24708
>>Warning: /boot/System.map has an incorrect kernel version.
>>  F S UID        PID  PPID  C PRI  NI ADDR    SZ  WCHAN STIME TTY          TIME CMD
>>140 S bind     24708     1  0  74   0    -  3596 14372d Sep07 ?        00:00:55 [named]
>>
>>This server has a non-modular kernel with the grsecurity patch. In
>>case it's relevant here is the output of uname -a:
>>res1 in:  bind$ uname -a
>>Linux ent-mocux15.moc.gci 2.4.20-grsec #3 Tue Mar 25 09:21:41 EST 2003 i686 i686 i386 GNU/Linux
>>
>>Thanks in advance for any help!
>>Maria
>>
>>###################################################
>>Commands issued on res2
>>###################################################
>>
>>res2 in:  bind$ dig @res1.moc.gci www.silver.com
>>
>>; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.silver.com
>>;; global options:  printcmd
>>;; connection timed out; no servers could be reached
>>res2 in:  bind$ dig @res1.moc.gci www.silver.com
>>;; Warning: ID mismatch: expected ID 56696, got 10590
>>;; Warning: ID mismatch: expected ID 56696, got 10590
>>
>>; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.silver.com
>>;; global options:  printcmd
>>;; Got answer:
>>;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56696
>>;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0
>>
>>;; QUESTION SECTION:
>>;www.silver.com.                        IN      A
>>
>>;; ANSWER SECTION:
>>www.silver.com.         86400   IN      A       205.150.176.184
>>
>>;; AUTHORITY SECTION:
>>silver.com.             259200  IN      NS      ns1.ktrafic.com.
>>silver.com.             259200  IN      NS      ns2.ktrafic.com.
>>
>>;; Query time: 2716 msec
>>;; SERVER: 172.21.0.100#53(res1.moc.gci)
>>;; WHEN: Wed Sep  8 12:19:43 2004
>>;; MSG SIZE  rcvd: 92
>>
>>res2 in:  bind$ dig @res1.moc.gci www.gold.com
>>
>>; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.gold.com
>>;; global options:  printcmd
>>;; connection timed out; no servers could be reached
>>res2 in:  bind$ dig @res1.moc.gci www.gold.com
>>
>>; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.gold.com
>>;; global options:  printcmd
>>;; connection timed out; no servers could be reached
>>res2 in:  bind$ dig @res1.moc.gci www.gold.com
>>
>>; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.gold.com
>>;; global options:  printcmd
>>;; connection timed out; no servers could be reached
>>res2 in:  bind$ dig @res1.moc.gci www.purple.com
>>;; Warning: ID mismatch: expected ID 58216, got 51960
>>;; Warning: ID mismatch: expected ID 58216, got 51960
>>;; Warning: ID mismatch: expected ID 58216, got 36737
>>;; Warning: ID mismatch: expected ID 58216, got 36737
>>;; Warning: ID mismatch: expected ID 58216, got 20208
>>;; Warning: ID mismatch: expected ID 58216, got 20208
>>
>>; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.purple.com
>>;; global options:  printcmd
>>;; connection timed out; no servers could be reached
>>res2 in:  bind$ dig @res1.moc.gci www.gold.com
>>
>>; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.gold.com
>>;; global options:  printcmd
>>;; Got answer:
>>;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46790
>>;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 2, ADDITIONAL: 0
>>
>>;; QUESTION SECTION:
>>;www.gold.com.                  IN      A
>>
>>;; ANSWER SECTION:
>>www.gold.com.           86313   IN      CNAME   gold.com.
>>gold.com.               86311   IN      A       198.70.201.51
>>
>>;; AUTHORITY SECTION:
>>gold.com.               86311   IN      NS      extns1.jewels.com.
>>gold.com.               86311   IN      NS      extns2.jewels.com.
>>
>>;; Query time: 1 msec
>>;; SERVER: 172.21.0.100#53(res1.moc.gci)
>>;; WHEN: Wed Sep  8 12:21:41 2004
>>;; MSG SIZE  rcvd: 109
>>
>><performed rndc flush on res1>
>>
>>res2 in:  bind$ dig @res1.moc.gci www.gold.com
>>
>>; <<>> DiG 9.2.3 <<>> @res1.moc.gci www.gold.com
>>;; global options:  printcmd
>>;; connection timed out; no servers could be reached
>>
>>###################################################
>>Output of tethereal during those commands
>>###################################################
>>
>>  0.000000 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.blue.com
>>  0.000124 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME blue.com A 216.91.187.86
>>  4.991126 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP Who has 172.21.0.200? Tell 172.21.0.100
>>  4.991493 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP 172.21.0.200 is at 00:02:55:7b:a4:a3
>>  6.320441 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.silver.com
>> 11.318427 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP Who has 172.21.0.100? Tell 172.21.0.200
>> 11.318438 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP 172.21.0.100 is at 00:02:55:7b:a6:69
>> 11.328548 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.silver.com
>> 24.820791 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.silver.com
>> 27.536065 172.21.0.100 -> 172.21.0.200 DNS Standard query response A 205.150.176.184
>> 27.536121 172.21.0.100 -> 172.21.0.200 DNS Standard query response A 205.150.176.184
>> 27.536184 172.21.0.100 -> 172.21.0.200 DNS Standard query response A 205.150.176.184
>> 36.446784 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
>> 41.449517 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
>> 49.777125 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
>> 54.769991 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP Who has 172.21.0.100? Tell 172.21.0.200
>> 54.770002 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP 172.21.0.100 is at 00:02:55:7b:a6:69
>> 54.779985 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
>> 61.418983 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
>> 66.420344 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
>> 76.502267 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.purple.com
>> 77.687081 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
>> 77.687142 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
>> 77.687208 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
>> 77.687263 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
>> 77.687328 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
>> 77.687382 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
>> 81.510874 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.purple.com
>> 82.684071 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP Who has 172.21.0.200? Tell 172.21.0.100
>> 82.684293 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP 172.21.0.200 is at 00:02:55:7b:a4:a3
>> 96.508164 172.21.0.100 -> 172.21.0.200 DNS Standard query response A 153.104.63.227
>> 96.508232 172.21.0.100 -> 172.21.0.200 DNS Standard query response A 153.104.63.227
>> 96.508587 172.21.0.200 -> 172.21.0.100 ICMP Destination unreachable
>> 96.508589 172.21.0.200 -> 172.21.0.100 ICMP Destination unreachable
>>101.501576 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP Who has 172.21.0.100? Tell 172.21.0.200
>>101.501587 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP 172.21.0.100 is at 00:02:55:7b:a6:69
>>145.126659 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
>>145.127129 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
>>150.123148 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP Who has 172.21.0.100? Tell 172.21.0.200
>>150.123159 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP 172.21.0.100 is at 00:02:55:7b:a6:69
>>
>><performed rndc flush on res1>
>>
>>229.285189 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
>>234.276056 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP Who has 172.21.0.100? Tell 172.21.0.200
>>234.276067 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP 172.21.0.100 is at 00:02:55:7b:a6:69
>>234.286050 172.21.0.200 -> 172.21.0.100 DNS Standard query A www.gold.com
>>269.304469 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
>>269.304526 172.21.0.100 -> 172.21.0.200 DNS Standard query response CNAME gold.com A 198.70.201.51
>>269.304821 172.21.0.200 -> 172.21.0.100 ICMP Destination unreachable
>>269.304822 172.21.0.200 -> 172.21.0.100 ICMP Destination unreachable
>>274.297311 Ibm_7b:a4:a3 -> Ibm_7b:a6:69 ARP Who has 172.21.0.100? Tell 172.21.0.200
>>274.297324 Ibm_7b:a6:69 -> Ibm_7b:a4:a3 ARP 172.21.0.100 is at 00:02:55:7b:a6:69
> 
>>On Wed, Sep 08, at 10:58%P so wrote Ladislav Vobr (lvobr at ies.etisalat.ae):
>>
>>>Maria Iano wrote:
>>>
>>>>I have two caching servers, res1 and res2, running BIND 9.2.3 on Red
>>>>Hat Linux release 8.0 (Psyche). They sit inside a firewall, and forward
>>>>queries to four different caching servers on the outside, as well as
>>>>some internal servers authoritative for internal zones.
>>>>
>>>>Last week res2 started being slow and failing resolution
>>>>intermittently. Dig queries sent from res2 to the outside resolvers
>>>>worked correctly. Dig queries sent from res2 to res1 worked correctly.
>>>>However, dig queries from res1 to res2 produced error messages like
>>>>this:
>>>>
>>>>;; Warning: ID mismatch: expected ID 3325, got 34596
>>>>
>>>>with various different IDs produced from different queries. It was
>>>>late at night (I had been paged) so I went ahead and rebooted res2. This
>>>>cleared up the issue.
>>>>
>>>>Now, a week later, this same issue is occurring on res1. res1 is slow
>>>>to respond to queries and intermittently failing to resolve names. Digs
>>>>issued on res1 pointing to the outside resolvers work fine. Digs issued
>>>>on res1 pointing to res2 work fine. Digs issued on res2 pointing to res1
>>>>produce the ID mismatch errors again.
>>>>
>>>>I suspect that if I reboot it the error will clear up again, but
>>>>before I do that I want to try and work out what is going on.
>>>>
>>>>Any advice?
>>>
>>>You might use a packet sniffer to see what you send and what the
>>>other side receives, and similarly for the reply. On Linux you can use
>>>tcpdump or ethereal, for example. I once ran into these messages when I
>>>was using query-source port 53 on my recursive nameserver and had patched
>>>dig to use port 53 as a source port as well. Then I got a lot of these
>>>every time I issued such a command from the recursive server prompt, but
>>>it was understandable, since regular replies coming to my nameserver
>>>confused dig.
>
> ----- End forwarded message -----
> 


