SERVFAIL for some domains on some servers

Oliver Henriot Oliver.Henriot at imag.fr
Tue Mar 2 20:17:32 UTC 2010


Dear Kevin,

In his great wisdom, Kevin Darcy wrote, on 02/03/2010 at 20:08:
> 1. DNS uses UDP primarily.

My mistake, telnet is not sufficient as a test.
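
For anyone repeating the test, here is a minimal sketch of a probe that exercises both transports, since a successful telnet only proves that TCP port 53 is reachable. It is an illustration under assumptions, not something I ran for this thread: the function names (build_query, rcode, probe_udp, probe_tcp) are made up for the example, and it sends a bare query with no EDNS, so it does not reproduce exactly what named sends.

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query with the RD bit set; qtype 1 is an A record."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class 1 = IN


def rcode(response):
    """Return the RCODE (low 4 bits of the flags word): 0 NOERROR, 2 SERVFAIL."""
    flags = struct.unpack("!H", response[2:4])[0]
    return flags & 0x000F


def _recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def probe_udp(server, name, timeout=5.0):
    """Send the query over UDP; raises socket.timeout if no reply arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_query(name), (server, 53))
        data, _ = s.recvfrom(4096)
        return rcode(data)


def probe_tcp(server, name, timeout=5.0):
    """Send the same query over TCP, which uses a 2-byte length prefix."""
    query = build_query(name)
    with socket.create_connection((server, 53), timeout=timeout) as s:
        s.sendall(struct.pack("!H", len(query)) + query)
        length = struct.unpack("!H", _recv_exact(s, 2))[0]
        return rcode(_recv_exact(s, length))


# Example use (the address is a documentation placeholder, not a real server):
#   probe_udp("192.0.2.1", "www.labanquepostale.fr")
#   probe_tcp("192.0.2.1", "www.labanquepostale.fr")
```

If probe_tcp returns while probe_udp times out from the same host, that would point at UDP filtering somewhere on the path rather than at the remote name servers.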

> 2. The nameservers for labanquepostale.fr appear to be on the same 
> subnet. If true, then there are likely to be one or more points of 
> failure (e.g. switch, router, WAN link) for the whole zone. What you 
> might be seeing are relatively-normal short, temporary outages, which 
> become fatal because of poor planning/design. Diversity is good.

Yeah, but that's odd nonetheless as some of my servers always fail (and 
have been doing so for weeks now... oops) whereas others never fail. If 
it were a short temporary outage on their side I'd be getting 
simultaneous failures on all of my servers. The observed failures seem 
more likely to indicate a problem on my network.
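
To make that reasoning explicit, here is a small sketch of the rule of thumb I am applying. It is only an illustration: the function name locate_fault and the input format (resolver name mapped to one success flag per probe round) are assumptions made for the example, and the result is a hint, not a diagnosis.

```python
def locate_fault(results):
    """Guess where a fault lies from per-resolver probe outcomes.

    `results` maps a resolver name to a list of booleans, one per probe
    round (True = the query succeeded), with the same rounds for every
    resolver. Returns "local", "remote" or "inconclusive".
    """
    always_fail = [r for r, runs in results.items() if runs and not any(runs)]
    never_fail = [r for r, runs in results.items() if runs and all(runs)]

    # Some resolvers always fail while others never do: the remote zone
    # is reachable, so the problem is more likely on our side.
    if always_fail and never_fail:
        return "local"

    # Otherwise look at each round across all resolvers. If every round is
    # either all-success or all-failure, and at least one is all-failure,
    # the failures are simultaneous, which fits an outage on the remote side.
    rounds = list(zip(*results.values()))
    uniform = all(all(rnd) or not any(rnd) for rnd in rounds)
    had_outage = any(not any(rnd) for rnd in rounds)
    if uniform and had_outage:
        return "remote"

    return "inconclusive"
```

With brahma and cosmos failing in every round and isis succeeding in every round, this returns "local", which matches what I observe.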

Best regards,

Oliver

>
>     - Kevin
>
> On 3/2/2010 4:57 AM, Oliver Henriot wrote:
>> Dear Sten,
>>
>> I didn't give the domain I'm encountering problems with because it 
>> seemed irrelevant to me.
>>
>> As Stéphane Bortzmeyer says in his message of 01/03/10 11:44, it's 
>> best to give names, so here goes :
>> x.fr is labanquepostale.fr
>> "1" is imag.imag.fr
>> "2" is brahma.imag.fr
>> "3" is isis.imag.fr
>> "4" is cosmos.imag.fr
>>
>> As to a possible firewall problem, how could this be if the servers 
>> encountering problems don't have any access problems on TCP port 53?
>>
>> Thanks.
>>
>> Oliver
>>
>> In his great wisdom, Sten Carlsen wrote, on 27/02/10 at 19:06:
>>> Since you don't tell which domain is the problem and at least I get
>>> perfect answers for imag.fr (my only possible guess) from all listed
>>> servers, I can have no clue.
>>>
>>> Best guess is still some firewall doing something stupid.
>>>
>>>
>>> Oliver Henriot wrote:
>>>> Dear list users,
>>>>
>>>> Maybe you can help me out here. Please bear with me if I'm stating the
>>>> obvious, but my computing skills are scarce and I still have a lot to
>>>> learn.
>>>>
>>>> I have a series of name servers, some of which fail to resolve hosts
>>>> in other domains whereas others don't have any problem.
>>>>
>>>> My setup is as follows :
>>>> - server "1" : master for my domain, recursion disabled for all except
>>>> localhost. Setup is BIND 9.5.1-P2 on SunOS 5.9.
>>>> - servers "2", "3" and "4" : slaves for my domain, recursion allowed
>>>> for all, official resolvers for my clients, same configuration on all
>>>> 3. Setup is BIND 9.3.6-P1 on CentOS 5.4.
>>>>
>>>> Servers "2" and "4" fail to resolve domain x.fr whereas "1" and "3"
>>>> have no problem (if interrogated locally for "1" of course). The error
>>>> I get is :
>>>>
>>>>
>>>> dig -t A @"2" www.x.fr
>>>>
>>>> ;<<>>  DiG 9.3.6-P1-RedHat-9.3.6-4.P1.el5_4.2<<>>  -t A @"2" www.x.fr
>>>> ; (1 server found)
>>>> ;; global options:  printcmd
>>>> ;; Got answer:
>>>> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 37397
>>>> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>>>>
>>>> ;; QUESTION SECTION:
>>>> ;www.x.fr.                IN      A
>>>>
>>>> ;; Query time: 4622 msec
>>>> ;; SERVER: "2"#53("2")
>>>> ;; WHEN: Sat Feb 27 18:20:07 2010
>>>> ;; MSG SIZE  rcvd: 40
>>>>
>>>>
>>>> The behavior is the same for "4" and for any host in domain x.fr (and
>>>> the domain itself).
>>>>
>>>> It's not a network problem: I can telnet to port 53 of the name
>>>> servers for domain x.fr from "2" (obviously using the IP address, as
>>>> the name can't be resolved by the server).
>>>>
>>>> Also, reverse queries for hosts in domain x.fr from "2" do not fail.
>>>>
>>>> Finally, even more strange, if I use dig's +trace option servers "2"
>>>> and "4" do not fail any more and can resolve www.x.fr (although the
>>>> query lags quite a bit when doing the last bit of resolving, from x.fr
>>>> to www.x.fr).
>>>>
>>>> Here's the output :
>>>>
>>>> dig www.x.fr @"2" +trace
>>>>
>>>> ;<<>>  DiG 9.5.1-P3<<>> www.x.fr @"2" +trace
>>>> ;; global options:  printcmd
>>>> .                       518400  IN      NS      F.ROOT-SERVERS.NET.
>>>> .                       518400  IN      NS      G.ROOT-SERVERS.NET.
>>>> .                       518400  IN      NS      H.ROOT-SERVERS.NET.
>>>> .                       518400  IN      NS      I.ROOT-SERVERS.NET.
>>>> .                       518400  IN      NS      J.ROOT-SERVERS.NET.
>>>> .                       518400  IN      NS      K.ROOT-SERVERS.NET.
>>>> .                       518400  IN      NS      L.ROOT-SERVERS.NET.
>>>> .                       518400  IN      NS      M.ROOT-SERVERS.NET.
>>>> .                       518400  IN      NS      A.ROOT-SERVERS.NET.
>>>> .                       518400  IN      NS      B.ROOT-SERVERS.NET.
>>>> .                       518400  IN      NS      C.ROOT-SERVERS.NET.
>>>> .                       518400  IN      NS      D.ROOT-SERVERS.NET.
>>>> .                       518400  IN      NS      E.ROOT-SERVERS.NET.
>>>> ;; Received 500 bytes from "2"#53("2") in 2 ms
>>>>
>>>> fr.                     172800  IN      NS      E.EXT.NIC.fr.
>>>> fr.                     172800  IN      NS      B.EXT.NIC.fr.
>>>> fr.                     172800  IN      NS      F.EXT.NIC.fr.
>>>> fr.                     172800  IN      NS      A.NIC.fr.
>>>> fr.                     172800  IN      NS      C.NIC.fr.
>>>> fr.                     172800  IN      NS      G.EXT.NIC.fr.
>>>> fr.                     172800  IN      NS      D.NIC.fr.
>>>> fr.                     172800  IN      NS      D.EXT.NIC.fr.
>>>> ;; Received 444 bytes from 192.58.128.30#53(J.ROOT-SERVERS.NET) in 44 ms
>>>>
>>>> x.fr.     172800  IN      NS      ns1.x.fr.
>>>> x.fr.     172800  IN      NS      ns2.x.fr.
>>>> ;; Received 108 bytes from 193.176.144.6#53(E.EXT.NIC.fr) in 33 ms
>>>>
>>>> www.x.fr. 300     IN      A       xxx.xxx.xxx.xxx
>>>> x.fr.     300     IN      NS      ns2.x.fr.
>>>> x.fr.     300     IN      NS      ns1.x.fr.
>>>> ;; Received 124 bytes from xxx.xxx.xxx.xxx#53(ns1.x.fr) in 0 ms
>>>>
>>>>
>>>> I'm at a loss as to what's going on (or wrong) here and what I can
>>>> do to solve the problem. Any help would be greatly appreciated.
>>>>
>>>> Thanks in advance.
>>>>
>>>> Oliver
>>>>
>>>>
>>>> _______________________________________________
>>>> bind-users mailing list
>>>> bind-users at lists.isc.org
>>>> https://lists.isc.org/mailman/listinfo/bind-users
