Esoteric question

Gregory Sloop gregs at sloop.net
Tue Sep 17 15:56:48 UTC 2019


Top posting

I don't have captures on Eth1 - though that's probably a good idea. Hard though, because it's a site that is in production like 7x12+ - so a PITA to go onsite (for the fourth time now) to grab some more data...

The potential of an interface with an overlapping subnet on Eth1 was raised and that's a good idea, I think.
But I certainly can't see anything in my config that would do that. I've stripped the config down to the very basics: essentially just defining the two Eth interfaces, the NAT/MASQ, DNS & NTP - in an effort to make sure there wasn't something somewhere in the config that was inadvertently causing the issue.
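
For what it's worth, the overlap idea is easy to sanity-check offline. This is just a rough sketch using Python's standard ipaddress module; the interface names and addresses are the example values from my original message (not the production config), and find_overlaps is a helper name I made up for illustration.

```python
import ipaddress

# Example values from this thread, NOT the real production addresses.
interfaces = {
    "eth0": ipaddress.ip_interface("10.0.0.1/24"),  # LAN
    "eth1": ipaddress.ip_interface("1.2.3.5/30"),   # WAN
}


def find_overlaps(ifaces):
    """Return (name, name) pairs whose configured networks overlap."""
    names = sorted(ifaces)
    overlaps = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # .network drops the host bits, so 10.0.0.1/24 becomes 10.0.0.0/24
            if ifaces[a].network.overlaps(ifaces[b].network):
                overlaps.append((a, b))
    return overlaps


print(find_overlaps(interfaces))
```

With the example addresses the list comes back empty, which matches what I see by eye. Feeding in every address the router actually reports (not just the two I configured) would be the more useful test.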

A question, if anyone knows the answer.
If it's doing a full handshake on Eth0 currently, doesn't that indicate that it believes that Eth0 is the proper interface for that subnet declaration - and so, why would it also be doing it on another interface too? [I get why it would be good to verify by doing some packet-caps - but asking for my own knowledge/education.]
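
When I do get more captures, something along these lines would let me put a number on the REQUEST-to-ACK gap described in my original message (quoted below). It's only a sketch: the event list is made-up illustrative data (only the roughly 90 s pause comes from what I actually saw), and request_to_ack_delay is a hypothetical helper, not anything from dhcpd or the router.

```python
# Illustrative events only: (seconds since capture start, DHCP message type).
# The ~90 s pause mirrors what I described; the other timestamps are invented.
events = [
    (0.00, "DISCOVER"),
    (0.05, "OFFER"),
    (0.10, "REQUEST"),
    (90.10, "ACK"),
    (92.10, "ACK"),
]


def request_to_ack_delay(evts):
    """Seconds from the first REQUEST to the first ACK after it, or None."""
    request_time = None
    for ts, kind in evts:
        if kind == "REQUEST" and request_time is None:
            request_time = ts
        elif kind == "ACK" and request_time is not None:
            return ts - request_time
    return None  # no REQUEST seen, or no ACK followed it


delay = request_to_ack_delay(events)
print(f"REQUEST -> ACK delay: {delay:.1f} s")
```

Running the same thing against captures from both Eth0 and Eth1 would show whether the stall lines up with anything happening on the WAN side.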

As for cloud-mgmt/call-home - no, there's none of that.

Thanks for the thoughts so far.

-Greg

gsuca> Hi Greg,

gsuca> A very interesting problem... I've heard good reports about both those
gsuca> vendors' hardware, so sounds like a reasonable choice.

gsuca> What do you get if you snoop eth1 while connected to the different WAN
gsuca> devices? I wonder if dhcpd is trying to talk to something else upstream
gsuca> (no idea why it would do that).

gsuca> Does the Ubiquiti have some form of cloud management or call home setup?

gsuca> Best of luck.

gsuca> regards,
gsuca> -glenn

gsuca> On 2019-09-17 09:20, Gregory Sloop wrote:
>> So, this is kind of a wild goose-chase for some direction - but
>> thought there might be some useful answers here.

>> [But I know it's way out there and I'm not going to get direct help on
>> solving the issue on the platform I'm having issues with - just bear
>> with me and see if you have any helpful ideas.]

>> Let me set the background.

>> I'm using specific device hardware - in this case, a Mikrotik RB450G
>> [currently in place] and moving to a Ubiquiti EdgeRouter Lite.
>> They're multi-ethernet interface routers - based on Linux.
>> The RB450G works fine and simply needs replacement. [The two devices
>> are configured as identically as I can. They're very different, so
>> we're talking "functionally" identical, not literally with the same
>> conf files.]

>> I'm having issues with DHCPd on the new device. [And queries at
>> Ubiquiti are going nowhere fast. It IS an unusual problem, so I'm not
>> terribly surprised.]

>> Let's assume Eth0/LAN is 10.0.0.1/24
>> DHCPD is set up to hand out addresses for 10.0.0.20-100, say.
>> 14440 second leases.
>> Clients are connected directly to a switch that's directly connected
>> to ETH0. [No DHCP relay etc.]

>> Eth1/WAN is a static /30 - connected directly to a Comcast Modem/BSG.
>> Let's say 1.2.3.5/30
>> The gateway [not that it matters] is 1.2.3.6

>> We're masquerading traffic [NAT] from the local RFC1918 [10.0.0.0/24]
>> network to the static public IP on the WAN.

>> ---
>> So, here's what happens/happened.

>> I went in to swap out the 'Tik box for the new hardware.
>> Plug it in, and none of the clients on the LAN get DHCP addresses. All
>> the DHCP clients time out.
>> After several passes at testing here's what I find.

>> I can't find any configuration problems on the replacement hardware.
>> The *old* 'Tik hardware/software works perfectly.

>> If we connect the WAN port on the *new hardware* [EdgeRouter] to a
>> simple live ethernet port, DHCP works fine for the LAN side. Totally
>> fine.
>> Only when we plug in the Comcast gateway/modem into the WAN port on
>> the new hardware does DHCP fail/timeout. [Remember just plugging it
>> into a regular ethernet switch works fine. It won't pass traffic,
>> because the static IP assignment isn't right - but the LAN side DHCP
>> server works perfectly.]

>> If we take a client on the LAN and plug in a static IP [rather than
>> DHCP], traffic flows out to the internet perfectly fine.

>> Packet caps from the new router show that the router/DHCP server IS
>> seeing all the DHCP protocol handshake. [When it's having the
>> "problem."]
>> The client does a DISCOVER
>> Server responds with OFFER
>> The client responds with REQUEST
>> Then there's a LONG pause. [like 90s+ worth.]
>> The Server responds with ACK. [It actually appears to send several
>> ACKS. I probably cut my captures too short, so I only have about 2m of
>> capture in my largest one. But that's what I see in what I have.]
>> However, the client [Windows in this case] has timed out, and never
>> gets the ACK.
>> And while I'm not 100% certain, the times I've looked, the device
>> believes it's handed out a lease. [I believe it's in the leases file.]
>> But because of the long delay, the client never actually got the
>> lease.

>> Again,
>> -Simply unplugging the Comcast modem from the router makes DHCP
>> immediately start working again.
>> -Plugging Eth1 into a live ethernet port [so that interface is seen as
>> up] also works fine.
>> -It's only when connected to the Comcast gateway/modem that it fails.

>> On the LAN side of the network, we've tinkered with replacing the switches
>> - dumb, identically configured managed switches, a different managed
>> switch, or no switch at all - simply plugged directly into a single
>> client. No changes on the LAN side make the slightest difference
>> either.

>> Since we're doing NAT/MASQ from LAN->WAN no WAN traffic should leak
>> into the LAN - but I've also explicitly defined rules that prevent
>> anything from the WAN getting to the LOCAL or LAN interfaces - other
>> than established/related traffic.

>> So, I'm not asking for you to solve the issue on this particular
>> hardware. What I'm asking for is some plausible explanation that might
>> have these symptoms. I'm completely at my wits' end. I've spent a lot of
>> hours trying a whole host of troubleshooting things - but I can't
>> think of any possible way this could be happening. But clearly it is.

>> IMO, either we have some very weird hardware physical layer problem
>> that only impacts DHCP [and not traffic routing] or there's something
>> I'm missing. I'd normally imagine that I'm missing something - but
>> can't figure out what, if anything.

>> I've tried to closely define the setup, but I'm sure I've forgotten
>> something - perhaps lots of somethings - just ask and I'll try to
>> clarify any missing pieces.

>> Given how awesome people on this list are, I'm hopeful someone will
>> have something that might jiggle loose something useful!

>> TIA
>> -Greg
>> _______________________________________________
>> dhcp-users mailing list
>> dhcp-users at lists.isc.org
>> https://lists.isc.org/mailman/listinfo/dhcp-users

-- 
Gregory Sloop, Principal: Sloop Network & Computer Consulting
Voice: 503.251.0452 x82
EMail: gregs at sloop.net
http://www.sloop.net
---