<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<title>Re: Bind9 stops responding for some clients</title>
<style><!--
/* Font Definitions */
@font-face
{font-family:"MS Mincho";
panose-1:2 2 6 9 4 2 5 8 3 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:"\@MS Mincho";
panose-1:2 2 6 9 4 2 5 8 3 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#8064A2;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-AU" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#8064A2">Congratulations on finding the cause.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#8064A2"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#8064A2">Sometimes, it's the simplest of things.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#8064A2"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#8064A2">Stuart<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#8064A2"><o:p> </o:p></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> bind-users [mailto:bind-users-bounces@lists.isc.org]
<b>On Behalf Of </b>Gregory Sloop<br>
<b>Sent:</b> Thursday, 6 June 2019 12:37 PM<br>
<b>To:</b> bind-users@lists.isc.org<br>
<b>Subject:</b> Re: Bind9 stops responding for some clients<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:9.0pt;font-family:"Courier New"">Thanks for the idea.<br>
I did resolve this a day or two ago.<br>
<br>
The story is:<br>
This server was a fairly recent replacement for an older Ubuntu setup. Both the new server and the old one are/were VMs, though on different VM platforms. The old VM was turned off and marked never to start unless manually started. [There were
a few other things on the VM host that had yet to be migrated - so we didn't want it entirely off quite yet.]<br>
<br>
The problem happened again in the last day or two - and packet captures showed that no packets were even arriving at the new VM.<br>
Since there really wasn't anything that should be blocking that traffic, I checked the ARP table on a problem client.
<br>
The ARP table showed an "incorrect" MAC address for the current BIND server. [The MAC in the ARP table didn't match the MAC for the new VM.]<br>
<br>
While I didn't have the MAC address for the "old" deactivated server handy, that server was the first obvious suspect to check.<br>
Sure enough, after connecting to the VM hypervisor, I could see that the "old" BIND VM was active.<br>
<br>
I killed it, and service returned to normal.<br>
<br>
So, the "solution" was pretty routine.<br>
What made it more "interesting" and perhaps odd is how seemingly randomly the problem would crop up.
<br>
And it would only impact some clients, not all. There was no pattern that seemed to explain why some got the current/correct BIND server and others didn't. [The ARP poisoning certainly wasn't anywhere near universal.]<br>
And why was it so infrequent? It would go many days between issues.<br>
I have to assume the bad VM had been up for some time, at least since the problems started.<br>
There are quite a number of odd-ish other things too, but not worth detailing.<br>
<br>
Probably it's just one of those "undefined" situations where you can't anticipate some predictable order to what happens when you screw it up. Rather than burn additional time trying to grok what was going on - it's simply best to say "don't do that - bad things
happen, though I can't say what bad things will happen and in which logical order. They just will - so DON'T DO THAT!"<br>
<br>
[And yeah, I obviously knew all about not doing that. But it happened anyway, in spite of specific steps to prevent it. I'm still not sure why.]<br>
<br>
In the end, it's a somewhat complicated story with a very obvious cause - but it wasn't so clear at the outset.<br>
<br>
TL;DR version:<br>
Don't run your old and new BIND servers on the same IP address - either by accident or intentionally. Bad stuff will happen!
<br>
It might be really odd, or it might be plain as day - but in either case it won't be good! :)<br>
<br>
Thanks all for the suggestions! <br>
Here's hoping I don't need to ask for BIND assistance for another 20 years! :)<br>
<br>
-Greg<br>
<br>
</span><o:p></o:p></p>
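<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Courier New"">For anyone chasing something similar, the ARP check described above can be roughly scripted. This is only a minimal sketch, assuming neighbor-table dumps in the format printed by Linux's "ip neigh show" have already been collected from a few clients. The client names, IP addresses and MAC addresses below are made-up sample data, and the helper names parse_neighbors and find_conflicts are hypothetical, not part of any tool used in this thread.</span><o:p></o:p></p>
<pre>
```python
from collections import defaultdict


def parse_neighbors(text):
    """Parse `ip neigh show`-style lines into {ip: mac}, with lower-cased MACs."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Entries with no resolved MAC (e.g. FAILED) carry no "lladdr" field.
        if "lladdr" not in fields:
            continue
        idx = fields.index("lladdr")
        if idx + 1 == len(fields):
            continue  # malformed line: "lladdr" with nothing after it
        table[fields[0]] = fields[idx + 1].lower()
    return table


def find_conflicts(tables_by_client, server_ip):
    """Group clients by the MAC address they currently hold for server_ip."""
    seen = defaultdict(list)
    for client, table in tables_by_client.items():
        mac = table.get(server_ip)
        if mac is not None:
            seen[mac].append(client)
    return dict(seen)


# Hypothetical sample dumps (assumed data, not taken from the real network).
SAMPLE_GOOD = (
    "10.0.0.53 dev eth0 lladdr 52:54:00:aa:bb:01 REACHABLE\n"
    "10.0.0.1 dev eth0 lladdr 52:54:00:aa:bb:fe STALE\n"
)
SAMPLE_BAD = (
    "10.0.0.53 dev eth0 lladdr 00:0C:29:11:22:33 STALE\n"
    "10.0.0.99 dev eth0  FAILED\n"
)

tables = {
    "client-a": parse_neighbors(SAMPLE_GOOD),
    "client-b": parse_neighbors(SAMPLE_BAD),
}
conflicts = find_conflicts(tables, "10.0.0.53")

# More than one MAC for the same IP suggests two hosts are answering for it.
if len(conflicts) > 1:
    for mac, clients in sorted(conflicts.items()):
        print(mac, "seen by", ", ".join(clients))
```
</pre>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Courier New"">If the clients disagree about the MAC for the server's IP, that points at a second host answering ARP for the same address, which is what the old VM was doing here.</span><o:p></o:p></p>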
<table class="MsoNormalTable" border="0" cellpadding="0">
<tbody>
<tr>
<td width="2" style="width:1.5pt;background:blue;padding:.75pt .75pt .75pt .75pt">
</td>
<td style="padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Courier New"">I just randomly spotted this post, and thought I would toss in my 2¢<br>
<br>
How many NICs and how many IPs are on the servers? Are the failing clients on the same subnet as the server?<br>
<br>
--</span><o:p></o:p></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</body>
</html>