Esoteric question

Patrick Trapp ptrapp at nex-tech.com
Tue Sep 17 14:24:45 UTC 2019


This is way over my head, but with your thorough description, a question comes to mind - did you by chance take a network capture of the WAN side, just to verify that the new device isn't mistakenly sending the requests out that port when it is available?

Patrick
________________________________
From: dhcp-users <dhcp-users-bounces at lists.isc.org> on behalf of Gregory Sloop <gregs at sloop.net>
Sent: Monday, September 16, 2019 6:20 PM
To: dhcp-users at lists.isc.org <dhcp-users at lists.isc.org>
Subject: Esoteric question


So, this is kind of a wild goose chase for some direction - but I thought there might be some useful answers here.

[But I know it's way out there and I'm not going to get direct help on solving the issue on the platform I'm having issues with - just bear with me and see if you have any helpful ideas.]

Let me set the background.

I'm using specific device hardware - in this case, a Mikrotik RB450G [currently in place] and moving to a Ubiquiti EdgeRouter lite.
They're multi-ethernet interface routers - based on Linux.
The RB450G works fine and simply needs replacement. [The two devices are configured as identically as I can. They're very different, so we're talking "functionally" identical, not literally with the same conf files.]

I'm having issues with DHCPd on the new device. [And queries at Ubiquiti are going nowhere fast. It IS an unusual problem, so I'm not terribly surprised.]

Let's assume Eth0/LAN is 10.0.0.1/24
DHCPD is set up to hand out addresses for 10.0.0.20-100, say.
14440 second leases.
Clients are connected directly to a switch that's directly connected to ETH0. [No DHCP relay etc.]

Eth1/WAN is a static /30 - connected directly to a Comcast Modem/BSG.
Let's say 1.2.3.5/30
The gateway [not that it matters] is 1.2.3.6

We're masquerading traffic [NAT] from the local RFC1918 [10.0.0.0/24] network to the static public IP on the WAN.
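
As a quick sanity check of the addressing described above, here is a minimal Python sketch using the standard `ipaddress` module. The addresses are the example values from this message; the variable names are made up for illustration, and this only checks the plan on paper, not what the router actually has configured.

```python
import ipaddress

# Example values taken from the description above (placeholders, not real addresses).
lan = ipaddress.ip_interface("10.0.0.1/24")
wan = ipaddress.ip_interface("1.2.3.5/30")
wan_gateway = ipaddress.ip_address("1.2.3.6")
pool_start = ipaddress.ip_address("10.0.0.20")
pool_end = ipaddress.ip_address("10.0.0.100")

# The DHCP pool must sit inside the LAN subnet and must not contain the router's own address.
pool_ok = (
    pool_start in lan.network
    and pool_end in lan.network
    and not (pool_start <= lan.ip <= pool_end)
)

# A /30 has exactly two usable hosts; the router and its gateway should be those two.
wan_hosts = list(wan.network.hosts())
wan_ok = wan.ip in wan_hosts and wan_gateway in wan_hosts and wan.ip != wan_gateway

# The LAN and WAN subnets must not overlap, or routing and NAT decisions become ambiguous.
no_overlap = not lan.network.overlaps(wan.network)

print(pool_ok, wan_ok, no_overlap)
```

If any of the three values prints False, the addressing plan itself is suspect before the hardware is even considered.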

---
So, here's what happens/happened.

I went in to swap out the 'Tik box for the new hardware.
Plug it in, and none of the clients on the LAN get DHCP addresses. All the DHCP clients time out.
After several passes at testing here's what I find.

I can't find any configuration problems on the replacement hardware.
The *old* 'Tik hardware/software works perfectly.

If we have the WAN connected to a simple live ethernet port on the *new hardware,* [EdgeRouter] DHCP works fine for the LAN side. Totally fine.
Only when we plug in the Comcast gateway/modem into the WAN port on the new hardware does DHCP fail/timeout. [Remember just plugging it into a regular ethernet switch works fine. It won't pass traffic, because the static IP assignment isn't right - but the LAN side DHCP server works perfectly.]

If we take a client on the LAN and plug in a static IP [rather than DHCP], traffic flows out to the internet perfectly fine.

Packet caps from the new router show that the router/DHCP server IS seeing all the DHCP protocol handshake. [When it's having the "problem."]
The client does a DISCOVER
Server responds with OFFER
The client responds with REQUEST
Then there's a LONG pause. [like 90s+ worth.]
The Server responds with ACK. [It actually appears to send several ACKs. I probably cut my captures too short, so I only have about 2m of capture in my largest one. But that's what I see in what I have.]
However, the client [Windows in this case] has timed out, and never gets the ACK.
And while I'm not 100% certain, the times I've looked, the device believes it's handed out a lease. [I believe it's in the leases file.] But because of the long delay, the client never actually got the lease.
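
To put a number on that pause, a capture can be reduced to a list of (timestamp, message type) pairs and the REQUEST-to-ACK gap measured. The following is a minimal Python sketch under that assumption; the function name and the timestamps are invented for illustration and are not taken from my actual captures.

```python
def request_to_ack_delay(events):
    """Return seconds from the first REQUEST to the first ACK after it, or None."""
    request_time = None
    for timestamp, message in events:
        if message == "REQUEST" and request_time is None:
            request_time = timestamp
        elif message == "ACK" and request_time is not None:
            return timestamp - request_time
    return None


# Hypothetical timeline shaped like the handshake described above (seconds).
events = [
    (0.0, "DISCOVER"),
    (1.0, "OFFER"),
    (2.0, "REQUEST"),
    (95.0, "ACK"),
    (96.0, "ACK"),
]

delay = request_to_ack_delay(events)
print(delay)
```

Running the same measurement on captures taken with and without the Comcast modem attached would show whether the delay appears only in the failing case.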

Again,
-Simply unplug the Comcast modem from the router, and DHCP immediately starts working again.
-Plugging Eth1 into a live ethernet port [so that interface is seen as up] also works fine.
-It's only when connected to the Comcast gateway/modem that it fails.

On the LAN side of the network, we've tinkered with replacing the switches - dumb switches, identically configured managed switches, a different managed switch, or no switch at all - simply plugged directly into a single client. No changes on the LAN side make the slightest difference either.

Since we're doing NAT/MASQ from LAN->WAN no WAN traffic should leak into the LAN - but I've also explicitly defined rules that prevent anything from the WAN getting to the LOCAL or LAN interfaces - other than established/related traffic.

So, I'm not asking you to solve the issue on this particular hardware. What I'm asking for is some plausible explanation that might produce these symptoms. I'm completely at my wits' end. I've spent a lot of hours trying a whole host of troubleshooting things - but I can't think of any possible way this could be happening. But clearly it is.

IMO, either we have some very weird hardware physical layer problem that only impacts DHCP [and not traffic routing] or there's something I'm missing. I'd normally imagine that I'm missing something - but can't figure out what, if anything.

I've tried to closely define the setup, but I'm sure I've forgotten something - perhaps lots of somethings - just ask and I'll try to clarify any missing pieces.

Given how awesome people on this list are, I'm hopeful someone will have something that might jiggle loose something useful!

TIA
-Greg