HP xw6400 Network Boot DHCP failure - no client DHCPREQUEST after server DHCPOFFER

Mon Sep 12 13:16:19 UTC 2011

So, you're looking at two completely different DHCP clients. One is part
of the operating system (Windows, Linux, etc.) and the other is built into
the BIOS of the computer (usually PXE).

Seeing that the broadcast address is not sent, I would guess that the
PXE client is not requesting it, which is arguably a bug. I bet other DHCP
servers always include this option, even when it is not explicitly
requested.

Fortunately, it is easy to add this to the list of requested parameters on
the server. Most PXE boot settings are usually handled by matching on the
vendor class identifier; e.g. this class can include any specific settings
you want for PXE booting:

# PXE boots for jumpstarting x86 boxes
class "PXE" {
   match if substring(option vendor-class-identifier, 0, 9) = "PXEClient";
   next-server drill.example.com.au;
   filename "pxegrub.I86PC.Solaris_10-1";
   # 10 minutes should be long enough for PXE
   max-lease-time 600;

   # append option 28 (1c in hex) to the requested list
   option dhcp-parameter-request-list =
     concat ( option dhcp-parameter-request-list, 1c );
}
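
To make the effect of that concat line concrete, here is a deliberately
simplified Python model. It assumes the server returns a configured option
only if its code appears in the parameter request list; per dhcp-options(5),
setting dhcp-parameter-request-list on the server makes it return the
options listed there instead of the client's list. This is not dhcpd's real
option-selection code, and the CONFIGURED set and function names are
hypothetical. The request list is the FAIL-case value from the capture
quoted below.

```python
# Simplified model (assumption): the server only returns configured options
# whose codes appear in the Parameter Request List (option 55).
# Real dhcpd also always sends some options (e.g. message type, server id).

# Hypothetical option codes configured for the subnet:
# 1 = subnet mask, 3 = routers, 28 = broadcast address.
CONFIGURED = {1: "subnet-mask", 3: "routers", 28: "broadcast-address"}

# FAIL-case (PXE) Parameter Request List, copied from the Wireshark capture.
PXE_PRL = bytes.fromhex("01020305060B0C0D0F1011122B363C438081828384858687")


def options_sent(prl: bytes) -> list[str]:
    """Return the configured options the (possibly rewritten) list asks for."""
    return [CONFIGURED[code] for code in prl if code in CONFIGURED]


def concat(prl: bytes, *codes: int) -> bytes:
    """Mimic concat(option dhcp-parameter-request-list, 1c) in dhcpd.conf."""
    return prl + bytes(codes)


print(options_sent(PXE_PRL))                # ['subnet-mask', 'routers']
print(options_sent(concat(PXE_PRL, 0x1C)))  # [..., 'broadcast-address']
```

In this model, appending 0x1C is enough for option 28 to pass the filter,
which is what the class above is meant to achieve for PXE clients.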

I have no idea why this would have worked earlier.

regards,
-glenn

On 09/11/11 20:15, Masao Kitamura wrote:
> Hello everyone,
>
> I'm having the same problem this person (Howard Wang) had back in 2006:
>
> https://lists.isc.org/pipermail/dhcp-users/2006-May/000763.html
>
> I will try to answer all of Simon's questions based on that thread.
>
> Here is some background:
>
> Our network had a DNS hardware failure recently, which caused us to
> reorganize the network slightly, but everything has been restored
> (Internet access, DHCP, DNS, and a bunch of other services) except for
> workstations' ability to network boot.
>
> Before the failure (a few weeks ago), the DNS and DHCP servers were
> working fine, giving out IPs for normal release/renewals and also
> allowing successful network boots (leading to fully-automatic
> Kickstart installations).
>
> Some background on our network setup:
>
> - There are basically two relevant subnets: 70.0/28 and 71.0/24
> - Running ISC DHCP (isc-dhcpd-V3.0.6) on Ubuntu Server 8.04.4 at
> ###.###.71.254 (this server is also the router)
> - Running BIND 9.7.3 on Ubuntu Server 11.04 at ###.###.70.1
> - TFTPD-HPA server (aka web101) is at ###.###.70.3 (we changed the
> IP after the network failure, but the TFTPD server config has not changed)
> - All workstations and servers (both subnets) currently have IPs from
> the DHCP server and are able to release/renew normally
> - All workstations and servers (both subnets) are able to query the
> DNS server and access the Internet normally
> - No dynamic DNS, only static: DHCP MAC-to-host, and DNS host-to-IP
>
> This is the current network booting process (and failure point):
>
> - User instructs workstation to network boot (at bootup), which then
> asks for an IP using DHCP
> - Workstation (client) sends a broadcast DHCPDISCOVER packet
> - DHCP server (70.14 or 71.254, depending on the workstation subnet)
> sends a broadcast DHCPOFFER packet
> - Workstation does not respond, but instead, loops back and sends
> another DHCPDISCOVER packet
>
> The strange part:  when the same workstation is fully booted into
> Ubuntu Desktop (manual install), it can release/renew just fine
> (discover, offer, request, ack).
>
> Here are the subnet declarations from dhcpd.conf:
>
>    subnet ###.###.70.0 netmask 255.255.255.240 {
>      option routers ###.###.70.14;
>      default-lease-time 4400;
>    }
>
>    subnet ###.###.71.0 netmask 255.255.255.0 {
>      always-broadcast true;
>      default-lease-time 4400;
>      option routers ###.###.71.254;
>      option broadcast-address ###.###.71.255;
>
>      pool {
>        max-lease-time 300;
>        range ###.###.71.230 ###.###.71.253;
>        allow unknown-clients;
>      }
>    }
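
A quick sanity check of the subnet arithmetic in those declarations, as a
Python sketch. The real prefix is masked, so it assumes a hypothetical
10.0 prefix; only the last two octets and the prefix lengths come from the
config. It confirms that 255.255.255.240 is a /28 whose broadcast address
would be .70.15 (the 70.0/28 declaration sets no option broadcast-address),
that the router .70.14 is inside it, and that the 71.0/24 pool stops short
of the router at .71.254.

```python
import ipaddress

# Hypothetical stand-in for the masked "###.###" prefix.
PREFIX = "10.0"

net70 = ipaddress.ip_network(f"{PREFIX}.70.0/28")
net71 = ipaddress.ip_network(f"{PREFIX}.71.0/24")

# Netmasks should match the "netmask" clauses in the subnet declarations.
print(net70.netmask, net71.netmask)  # 255.255.255.240 255.255.255.0
# Broadcast addresses a client would receive in option 28.
print(net70.broadcast_address, net71.broadcast_address)  # 10.0.70.15 10.0.71.255

router70 = ipaddress.ip_address(f"{PREFIX}.70.14")
router71 = ipaddress.ip_address(f"{PREFIX}.71.254")
pool_start = ipaddress.ip_address(f"{PREFIX}.71.230")
pool_end = ipaddress.ip_address(f"{PREFIX}.71.253")

# Routers must sit inside their subnets; the pool must not hand out the router.
print(router70 in net70, router71 in net71)  # True True
print(pool_start <= router71 <= pool_end)    # False
```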
>
> We use "use-host-decl-names" so DHCP maps MAC-to-hostname, and DNS
> maps hostname-to-IP:
>
>    use-host-decl-names on;
>
> The syntax used in all of these config excerpts should be OK since it
> was working fine like this a few weeks ago.
>
> A couple of workstations on the 70.0/28 network (one for Wireshark, one
> for the network boot and release/renewal tests):
>
>    host linux301 { hardware ethernet 00:1b:78:a9:4a:ae; fixed-address linux301; next-server web101; }
>    host linux303 { hardware ethernet 00:1b:78:a9:49:35; fixed-address linux303; next-server web101; }
>
> A couple of workstations on the 71.0/24 network (again, one for
> Wireshark, one for the network boot and release/renewal tests):
>
>    host linux107 { hardware ethernet 00:1b:78:a9:4b:5a; fixed-address linux107; next-server web101; }
>    host linux204 { hardware ethernet 00:1b:78:a9:4b:44; fixed-address linux204; next-server web101; }
>
> "web101" (a web server) is also the TFTPD-HPA server, which was
> working fine with network booting before the network failure.
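
The host declarations give both fixed-address and next-server as hostnames,
so dhcpd relies on DNS for those addresses, and web101's IP changed after
the failure. It may be worth checking what those names resolve to from the
DHCP server itself. A minimal Python sketch, assuming the one-line host
syntax shown above; check_hosts and HOST_RE are hypothetical helpers, and
the resolver is a parameter so a plain dict can stand in for DNS.

```python
import re
import socket
from typing import Callable, Optional

# Hypothetical pattern for one-line host declarations like those above.
HOST_RE = re.compile(
    r"host\s+(\S+)\s*\{\s*hardware\s+ethernet\s+([0-9a-fA-F:]{17});\s*"
    r"fixed-address\s+([^;\s]+);\s*next-server\s+([^;\s]+);\s*\}"
)


def check_hosts(
    conf_text: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> list[dict[str, Optional[str]]]:
    """Resolve each host's fixed-address and next-server names."""
    rows = []
    for name, mac, fixed, next_server in HOST_RE.findall(conf_text):
        row: dict[str, Optional[str]] = {"host": name, "mac": mac.lower()}
        for field, target in (("fixed-address", fixed), ("next-server", next_server)):
            try:
                row[field] = resolve(target)
            # socket.gaierror subclasses OSError; KeyError lets a dict's
            # __getitem__ act as a fake resolver.
            except (OSError, KeyError):
                row[field] = None  # name did not resolve
        rows.append(row)
    return rows
```

On the server it would be called with the real file contents, e.g.
check_hosts(open(path).read()), where path is wherever dhcpd.conf lives
on that system; a None in the output means a name did not resolve.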
>
> So, I ran Wireshark and compared packets from the "OK case" against
> the "FAIL case", explained here:
>
> OK case = fully-booted workstation release/renew: discover, offer,
> request, and ack packets
> FAIL case = network boot attempt: discover, offer, (no response, then
> looping), discover, offer...
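
The two cases can be told apart mechanically from the sequence of message
types one client exchanged (for example, transcribed by hand from the
Wireshark packet list). A hypothetical Python helper; the function name and
result labels are illustrative, and it looks only at message order, not at
transaction IDs, so treat it as a rough sketch.

```python
from typing import Sequence

# Message types of a complete exchange, in order (RFC 2131).
DORA = ("DHCPDISCOVER", "DHCPOFFER", "DHCPREQUEST", "DHCPACK")


def classify(exchange: Sequence[str]) -> str:
    """Classify a captured sequence of DHCP message types for one client."""
    # OK case: a full discover/offer/request/ack run appears somewhere.
    for i in range(len(exchange) - len(DORA) + 1):
        if tuple(exchange[i:i + len(DORA)]) == DORA:
            return "OK"
    # FAIL case: an OFFER followed directly by a fresh DISCOVER means the
    # client ignored the offer and restarted discovery.
    for prev, cur in zip(exchange, exchange[1:]):
        if prev == "DHCPOFFER" and cur == "DHCPDISCOVER":
            return "FAIL: offer ignored, client restarted discovery"
    return "INCOMPLETE"


print(classify(["DHCPDISCOVER", "DHCPOFFER", "DHCPREQUEST", "DHCPACK"]))  # OK
print(classify(["DHCPDISCOVER", "DHCPOFFER", "DHCPDISCOVER", "DHCPOFFER"]))
```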
>
> Specifically, I diff'ed the DHCPOFFER packets in both cases (full text
> exports from Wireshark of each packet):
>
> These six bytes appear only in the DHCPOFFER packet in the OK case:
>
> +    Option: (t=28,l=4) Broadcast Address = ###.###.71.255
> +        Option: (28) Broadcast Address
> +        Length: 4
> +        Value: 9DF247FF
>
>   (1 byte for code, 1 byte for length, plus 4 address bytes = 6 bytes
> total, according to RFC 2132)
>
> ...which explains this other part of the same diff:
>
> -    Length: 316
> -    Checksum: 0x1a7c [validation disabled]
> +    Length: 322
> +    Checksum: 0xa5b7 [validation disabled]
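
Both halves of that reasoning can be checked in a few lines of Python. The
option is encoded as RFC 2132 lays it out (code, length, value), using the
documentation address 192.0.2.255 as a placeholder since the real one is
masked. The Length fields in the diff appear to be UDP header lengths,
which count the 8-byte header plus payload; the ports (67 to 68 for a
server-to-client OFFER) are an assumption, while lengths and checksums
are copied from the diff.

```python
import ipaddress
import struct

BROADCAST_ADDRESS = 28  # RFC 2132 option code (0x1C)
UDP_HEADER_LEN = 8


def encode_option_28(broadcast: str) -> bytes:
    """Encode the Broadcast Address option: code, length, 4-byte address."""
    value = ipaddress.IPv4Address(broadcast).packed
    return struct.pack("!BB", BROADCAST_ADDRESS, len(value)) + value


def udp_payload_length(header: bytes) -> int:
    """Return the payload length carried in an 8-byte UDP header."""
    _src, _dst, length, _checksum = struct.unpack("!HHHH", header[:UDP_HEADER_LEN])
    return length - UDP_HEADER_LEN


opt = encode_option_28("192.0.2.255")  # placeholder for the masked address
print(opt.hex(), len(opt))  # 1c04c00002ff 6

# Server (67) -> client (68); lengths and checksums from the diff.
fail_hdr = struct.pack("!HHHH", 67, 68, 316, 0x1A7C)
ok_hdr = struct.pack("!HHHH", 67, 68, 322, 0xA5B7)
print(udp_payload_length(ok_hdr) - udp_payload_length(fail_hdr))  # 6
```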
>
> So, I figured maybe differences in the DISCOVER packets were causing
> the differences in the OFFER packets.
>
> So, I diff'ed the DHCPDISCOVER packets (in both cases sent from the
> same client, with Wireshark running on the DHCP server):
>
> Notable differences (maybe irrelevant, but they stood out): the broadcast
> flag is set in the FAIL case and clear (unicast) in the OK case:
>
> @@ -56,10 +56,10 @@
>       Hardware type: Ethernet
>       Hardware address length: 6
>       Hops: 0
> -    Transaction ID: 0x7aa94480
> -    Seconds elapsed: 8
> -    Bootp flags: 0x8000 (Broadcast)
> -        1... .... .... .... = Broadcast flag: Broadcast
> +    Transaction ID: 0x6d9ad047
> +    Seconds elapsed: 6
> +    Bootp flags: 0x0000 (Unicast)
> +        0... .... .... .... = Broadcast flag: Unicast
>           .000 0000 0000 0000 = Reserved flags: 0x0000
>       Client IP address: 0.0.0.0 (0.0.0.0)
>       Your (client) IP address: 0.0.0.0 (0.0.0.0)
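
For reference, the flags field is the 16-bit word at offset 10 of the fixed
BOOTP header. RFC 2131 defines only its leftmost bit, which asks the server
to broadcast its replies; the remaining bits are reserved and must be zero.
That flag governs how the reply is delivered and is separate from option 28
(Broadcast Address). A Python sketch that packs the values from the two
DISCOVERs above and decodes them again; parse_fixed_header is a
hypothetical helper.

```python
import struct

BROADCAST_FLAG = 0x8000  # leftmost bit of the 16-bit flags field (RFC 2131)
FIXED_FMT = "!BBBBIHH"   # op, htype, hlen, hops, xid, secs, flags = 12 bytes


def parse_fixed_header(pkt: bytes) -> dict:
    """Decode the first 12 bytes of a BOOTP/DHCP message."""
    _op, _htype, _hlen, _hops, xid, secs, flags = struct.unpack(FIXED_FMT, pkt[:12])
    return {
        "xid": hex(xid),
        "secs": secs,
        "broadcast": bool(flags & BROADCAST_FLAG),
        "reserved": flags & 0x7FFF,  # must be zero per RFC 2131
    }


# Values from the diff: op=1 (BOOTREQUEST), htype=1 (Ethernet), hlen=6, hops=0.
fail = struct.pack(FIXED_FMT, 1, 1, 6, 0, 0x7AA94480, 8, 0x8000)
ok = struct.pack(FIXED_FMT, 1, 1, 6, 0, 0x6D9AD047, 6, 0x0000)
print(parse_fixed_header(fail))
print(parse_fixed_header(ok))
```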
>
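> Also, the parameter request lists differ. The FAIL case packet omits the
> Requested IP Address and Host Name options, and although its Parameter
> Request List is longer (24 entries versus 13), only the OK case requests
> "Broadcast Address" (28):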
>
>           Value: 01
> -    Option: (t=55,l=24) Parameter Request List
> +    Option: (t=50,l=4) Requested IP Address = ###.###.71.123
> +        Option: (50) Requested IP Address
> +        Length: 4
> +        Value: 9DF2477B
> +    Option: (t=12,l=8) Host Name = "kickseed"
> +        Option: (12) Host Name
> +        Length: 8
> +        Value: 6B69636B73656564
> +    Option: (t=55,l=13) Parameter Request List
>           Option: (55) Parameter Request List
> -        Length: 24
> -        Value: 01020305060B0C0D0F1011122B363C438081828384858687
> +        Length: 13
> +        Value: 011C02030F06770C2C2F1A792A
>           1 = Subnet Mask
> +        28 = Broadcast Address
>           2 = Time Offset
>           3 = Router
> -        5 = Name Server
> +        15 = Domain Name
>           6 = Domain Name Server
> -        11 = Resource Location Server
> +        119 = Domain Search [TODO]
>           12 = Host Name
> -        13 = Boot File Size
> (16 more FAIL case requested parameters)
> ....
> (and now the final five OK case requested parameters)...
> +        44 = NetBIOS over TCP/IP Name Server
> +        47 = NetBIOS over TCP/IP Scope
> +        26 = Interface MTU
> +        121 = Classless Static Route
> +        42 = Network Time Protocol Servers
>       End Option
>       Padding
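
Decoding the two hex values supports this: each byte of option 55 is one
requested option code, and 0x1C (28) appears only in the OK-case list. A
Python sketch using only the values from the capture above.

```python
# Parameter Request List values (option 55) copied from the two DHCPDISCOVERs.
FAIL_PRL = bytes.fromhex("01020305060B0C0D0F1011122B363C438081828384858687")
OK_PRL = bytes.fromhex("011C02030F06770C2C2F1A792A")

BROADCAST_ADDRESS = 28

fail_codes = list(FAIL_PRL)  # each byte is one requested option code
ok_codes = list(OK_PRL)

print(len(fail_codes), BROADCAST_ADDRESS in fail_codes)  # 24 False
print(len(ok_codes), BROADCAST_ADDRESS in ok_codes)      # 13 True
# Codes requested only by the OS (OK case) client:
print(sorted(set(ok_codes) - set(fail_codes)))  # [26, 28, 42, 44, 47, 119, 121]
```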
>
> I still strongly believe the problem is the missing 6 bytes from the
> DHCPOFFER packet in the FAIL (network boot DHCP) case.
>
> So, one of the following probably needs to happen:
>
> 1. The DHCP client needs to specifically request a broadcast DHCPOFFER, or...
> 2. The DHCP server needs to be forced to add those 6 bytes (Option:
> (28) Broadcast Address) somehow...
>
> #1 is probably not the case since the clients were network booting
> just fine (a few weeks ago) before the network failure.
>
> Does anyone know how to accomplish #2?
>
> I've also tried (from the DHCP handbook):
>
> - forcing the DHCP server to always-broadcast, no effect
> - setting up a DHCP relay agent on Router/DHCP server (not needed, but
> just tried it anyway)
>
> I've also checked the ARP cache tables on both the server and client
> sides (both sides successfully cache the correct IP->MAC mapping).
>
> Just to clarify, I ran these tests on both subnets (70.0/28 and
> 71.0/24) to make sure crossing subnets wasn't the problem.
>
> Neither syslog nor /var/log/messages shows anything of interest
> beyond the basic DHCP log lines.
>
> Any help would be greatly appreciated,
>
> Masao