BIND 10 #1264: Design document for DHCP benchmarking utility

Fri Oct 21 10:51:43 UTC 2011

#1264: Design document for DHCP benchmarking utility
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  johnd
  stephen                            |                Status:  reviewing
                       Type:  task   |             Milestone:  Sprint-
                   Priority:  major  |  DHCP-20111026
                  Component:  dhcp   |            Resolution:
                   Keywords:         |             Sensitive:  0
            Defect Severity:  N/A    |           Sub-Project:  DHCP
Feature Depending on Ticket:         |  Estimated Difficulty:  0
        Add Hours to Ticket:  0      |           Total Hours:  0
                  Internal?:  0      |
-------------------------------------+-------------------------------------
Changes (by stephen):

 * owner:  stephen => johnd

Comment:

 Additional comments can be found in [ticket:1263#comment:2 comment 2] and
 [ticket:1263#comment:3 comment 3] of #1263.

 Replying to [ticket:1263#comment:3 comment 3 johnd]

 > I'm thinking, for the initial version, from these suggestions adding:
 > :
 > * Explicit description of what the exit status means. Perhaps just
 > status 1 if any dropped packets, unless other pass criteria options are
 > added.
 For now, I think that would be sufficient.

 > Should any of the other suggested options be in the initial version?
 See below.

 Replying to [comment:5 johnd]

 > once free-formatted output is settled.
 My initial thought was date/time but given the difficulty of parsing
 various formats, I think that for each event, a simple time (in seconds)
 since the start of the test expressed as a floating-point number would be
 sufficient.  So the output format would be:
 {{{
 perfdhcp version
 Command line options
 Date/time test started
 send_time,receive_time[,send_time_packet_2,receive_time_packet_2]
 send_time,receive_time[,send_time_packet_2,receive_time_packet_2]
 :
 }}}
 (with the date in yyyy-mm-dd format to avoid confusion between dd/mm/yy
 and mm/dd/yy.)

 The last two columns are optional, being absent if only the initial packet
 exchange is measured.  And if a packet is lost, put -1.0 in the receive
 field (and in the other two fields if a full 4-way packet exchange is
 being measured).
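
 A minimal sketch of how one record line might be written and read back,
 assuming six decimal places and a value of None for a lost packet inside
 the program.  The names format_record and parse_record are hypothetical,
 not part of perfdhcp:

```python
LOST = -1.0  # sentinel proposed above for a packet that never arrived


def format_record(times, four_way=False):
    """Format one exchange as a comma-separated line.

    `times` lists the known timestamps (seconds since test start) in order:
    send, receive, and for a 4-way exchange the second send and receive.
    Use None, or simply stop early, for packets that were lost.
    """
    width = 4 if four_way else 2
    # Pad with None so that every field after a lost packet is also "lost".
    padded = list(times[:width]) + [None] * (width - len(times))
    values = [LOST if t is None else t for t in padded]
    return ",".join(f"{v:.6f}" for v in values)


def parse_record(line):
    """Inverse of format_record: lost packets come back as None."""
    values = [float(field) for field in line.strip().split(",")]
    return [None if v < 0 else v for v in values]
```

 A negative value can only mean "lost" because all real times are measured
 from the start of the test.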

 I've included the version and other data in the output as the first few
 lines; Tomek's point about missing information when reproducing a problem
 is very pertinent.

 Replying to [ticket:1263#comment:2 tomek]

 > I have couple of comments.
 Hmm... this is a definition of "couple" of which I was previously unaware
 :-)

 > In DHCPv6 we need 3 transmission modes:
 That sounds reasonable.  Re-reading the perfdhcp command line, the use of
 the command-line argument is inconsistent: for IPv4, it is the address
 ''to'' which packets are sent, in IPv6 it is the interface ''from'' which
 packets are sent.

 I suggest that the interface be specified with the -l option.  This
 already sets the local hostname/address for an IPv4 packet exchange - it
 could specify the local interface for an IPv6 exchange.  The target to
 which packets are sent should be the argument to the command line.  To
 simplify things for V6 use, as well as allowing an IPv6 address, the
 program should also recognise the strings "all" (for
 All_DHCP_Relay_Agents_and_Servers) and "servers" (for All_DHCP_Servers).
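
 As an illustration of the proposed argument handling, the sketch below
 maps the two keywords to the multicast addresses defined in RFC 3315 and
 otherwise expects a literal IPv6 address.  The function name
 resolve_v6_target is hypothetical:

```python
import ipaddress

# Well-known DHCPv6 multicast addresses (RFC 3315, section 5.1).
ALIASES = {
    "all": "ff02::1:2",      # All_DHCP_Relay_Agents_and_Servers
    "servers": "ff05::1:3",  # All_DHCP_Servers
}


def resolve_v6_target(arg):
    """Turn the command-line argument into an IPv6 destination address.

    Raises ValueError (ipaddress.AddressValueError) if `arg` is neither a
    known keyword nor a valid IPv6 address.
    """
    return ipaddress.IPv6Address(ALIASES.get(arg, arg))
```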

 > There is also rapid-commit option that, when supported by both server
 > and client, will cause SOLICIT to be answered immediately with REPLY. That
 > is not needed in first version, but it is something that we should plan to
 > implement later.
 Agreed.  Ticket #1334 has been raised for it and put on the general
 backlog.

 > Regarding the -r option, it is useful, but it is not enough.
 > :
 > To meet those usages, -time (or -t) option should be added that
 > specifies duration...
 Agreed.  Since r * t = n, the command parser should accept any two and
 calculate the third (objecting if all three are given).  I would suggest
 that the default for r be something like 10/second; if neither t nor n is
 specified, assume a value of n equal to 2^32^ - 1, i.e. essentially
 unlimited.
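
 The rule could be sketched roughly as follows.  The function name and the
 choice to reject non-positive values are assumptions; the default rate and
 the "unlimited" count are the ones suggested above:

```python
DEFAULT_RATE = 10        # packets/second, the default suggested above
UNLIMITED = 2**32 - 1    # "essentially unlimited" packet count


def resolve(rate=None, duration=None, count=None):
    """Return (rate, duration, count) with the missing values filled in."""
    given = [v for v in (rate, duration, count) if v is not None]
    if len(given) == 3:
        raise ValueError("give at most two of rate, duration and count")
    if any(v <= 0 for v in given):
        raise ValueError("rate, duration and count must be positive")

    if rate is None:
        if duration is not None and count is not None:
            # r = n / t
            return count / duration, duration, count
        rate = DEFAULT_RATE
    if duration is None and count is None:
        count = UNLIMITED
    if count is None:
        count = int(rate * duration)   # n = r * t
    else:
        duration = count / rate        # t = n / r
    return rate, duration, count
```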

 (As an aside, allowing very large values for n complicates the mapping of
 packet ID to information about the exchange as a simple pre-allocated
 array cannot be used.  However some form of double-buffer - where a buffer
 can be reused once a time equal to (time last packet using this buffer
 was sent + packet drop time) has passed - should work.)
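
 One possible reading of the double-buffer idea, as a sketch only.  It
 assumes packet IDs are allocated sequentially from 0, so the buffer and
 slot follow directly from the ID; the class and method names are
 hypothetical:

```python
class DoubleBuffer:
    """Two fixed-size arrays of send times, used alternately."""

    def __init__(self, size, drop_time):
        self.size = size
        self.drop_time = drop_time          # seconds before a packet counts as lost
        self.send_times = [[None] * size for _ in range(2)]
        self.last_sent = [None, None]       # time of the newest packet in each buffer

    def _locate(self, packet_id):
        # Sequential IDs fill buffer 0, then buffer 1, then buffer 0 again...
        return (packet_id // self.size) % 2, packet_id % self.size

    def record_send(self, packet_id, now):
        buf, idx = self._locate(packet_id)
        if idx == 0 and self.last_sent[buf] is not None:
            # About to reuse this buffer: only safe once its newest packet
            # has either been answered or timed out.
            if now < self.last_sent[buf] + self.drop_time:
                raise RuntimeError("buffer still holds packets in flight")
            self.send_times[buf] = [None] * self.size
        self.send_times[buf][idx] = now
        self.last_sent[buf] = now

    def send_time(self, packet_id):
        buf, idx = self._locate(packet_id)
        return self.send_times[buf][idx]
```

 The buffer size then only has to cover the packets sent in one drop
 interval, independent of n.  This sketch does not detect a lookup for an
 ID old enough that its slot has since been reused.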

 > Other things that we should consider at a later date is turning this
 > into stress testing. Let's call it --torture or similar. It starts sending
 > data at some rate and increases it slowly until server starts dropping.
 > That is the maximum rate the server can handle.
 Agreed.  Ticket #1335 has been raised for it and put on the general
 backlog.

 > There should be option to conclude (fail) the test if there is a single
 > drop. We don't want to wait 12 hours to see that 5 seconds after test
 > started something broke. Not sure how to implement this in the most
 > convenient way. Maybe --drop-threshold that specified acceptable amount of
 > dropped traffic? It seems useful to have it specified in both percentage
 > and absolute numbers.
 I suggest that it be specified as a simple packet count for now, with
 something like "-t<lost-packets>".  If not specified, there is no limit to
 the number of dropped packets.
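
 A sketch of the two checks involved: the early-abort test for the drop
 limit, and the exit status for the initial version (1 if any packet was
 dropped).  The function names are hypothetical, and whether the test
 stops when the limit is reached or only when it is exceeded is an
 assumption here (exceeded, so that a limit of 0 fails on the first drop):

```python
def should_abort(dropped, max_dropped=None):
    """True once the number of dropped packets exceeds the limit.

    max_dropped=None means no limit was given, so the test never aborts early.
    """
    return max_dropped is not None and dropped > max_dropped


def exit_status(dropped):
    """Exit status for the initial version: 1 if any packet was dropped."""
    return 1 if dropped > 0 else 0
```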

 > Besides of using dhcperf as manual tool, it will also be used as
 > automated test. In that case it should have clearly state if specified
 > pass criteria are met or not. Something that could be easily parsed by
 > automated environments.
 >
 > Make sure that return code will specify status.
 This sounds useful, although I'm not clear what you mean here.  In any
 case, I think it is something that can be added later.  Could you raise a
 ticket for it?

 For now, as suggested above, the return code should be 1 if any packets
 were dropped.

 > For automated test tools it is very convenient to print out command-line
 > parameters. That's a practical experience. I received many logs that were
 > useless because it was not possible to reproduce the problem due to
 > missing information about used parameters.
 See above: they are written to the output when the "-o" option is
 specified.  Do you think there is a need to echo them if used
 interactively, at a terminal?

 > There is no --version parameter. Tool should also print out its version
 > when started. See above comment about reproduction concerns.
 Agreed.  I would make the "-v" switch do this.

 As to the verbose option, I suggest this be merged with the debug option.
 In a small program such as this, you are more likely to want debugging
 information for different areas of the code than different levels of debug
 information.  If, instead of a "debug-level", the argument to the -x
 switch were a "debug-mask", different pieces of debug information could be
 output by setting bits in the mask value.  Displaying the packet contents
 and communication would then require using -x with a value that has the
 appropriate bits set.
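
 To make the debug-mask idea concrete, here is a sketch in which each area
 of the code gets one bit.  The area names and bit values are purely
 illustrative; nothing above fixes them:

```python
from enum import IntFlag


class Debug(IntFlag):
    # Hypothetical areas; the real assignment of bits is still to be decided.
    PACKETS = 0x01   # dump packet contents
    COMMS = 0x02     # trace sends and receives
    TIMING = 0x04    # trace rate control and timeouts


def parse_mask(text):
    """Parse the -x argument; base 0 accepts decimal, 0x.. hex and 0o.. octal."""
    return int(text, 0)


def enabled(mask, area):
    """True if debugging for `area` was requested in `mask`."""
    return bool(mask & area)
```

 With these values, "-x 3" (or "-x 0x3") would display both the packet
 contents and the communication, and nothing else.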

 > Another feature that could make this tool much more powerful is the
 > ability to specify additional options. While it would be great to have
 > custom option definition framework, for now we can do something much
 > simpler. A command-line option that specifies extra payload that is
 > appended to the message. A proper warning "it is user's responsibility to
 > take care a proper format". For example, to specify that I want to send
 > option type 100 with length 2 containing 0xabcd, I could do: --extra-data
 > 00:64:00:02:ab:cd. This seems simple enough (parse command-line + a single
 > memcpy will do the trick)
 >
 > Another useful thing would to be to specify which options client should
 > request. That is also not too difficult. This is just adding 8 bit(v4) or
 > 16bit (v6) integers to PRL or ORO, respectively. Usage could be simple:
 > --option 45 --option 5.
 Added #1336 to the backlog.
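
 For whoever picks up #1336, a rough sketch of the two conversions Tomek
 describes: turning the colon-separated hex string into raw bytes, and
 packing requested option codes as 8-bit (v4 PRL) or 16-bit network-order
 (v6 ORO) integers.  The function names are hypothetical, and only the
 option payload is built here, not the enclosing option header:

```python
import struct


def parse_extra_data(text):
    """Convert e.g. "00:64:00:02:ab:cd" into the bytes to append.

    No validation of the DHCP option format is attempted; as noted above,
    that is the user's responsibility.
    """
    return bytes.fromhex(text.replace(":", ""))


def encode_requested_options(codes, v6):
    """Pack option codes as 16-bit (v6) or 8-bit (v4) big-endian integers."""
    fmt = "!H" if v6 else "!B"
    return b"".join(struct.pack(fmt, code) for code in codes)
```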

 > It would be useful to elaborate on reply verification. V4 server
 > responding with NACK is ok or not? What about v6 server sending REPLY with
 > status-code=no-addrs-avail? That is another thing we could eventually add
 > as a feature. In some scenarios negative response as considered a proper
 > one (test passed) and in others it is not (test failed). Make sure that
 > the verification could be tuneable. For now it can be simple, but it will
 > be more complex later.
 Added #1337 to the backlog.

 The options suggested for immediate inclusion do not seem too much work,
 and tickets have been raised for the more complicated stuff.  I suggest
 that the design document be updated with these suggestions, and that we
 then close the ticket and start work on implementation.  Given our time
 constraints for the work, we need to do this ASAP.

-- 
Ticket URL: <http://bind10.isc.org/ticket/1264#comment:7>
BIND 10 Development <http://bind10.isc.org>