BIND 10 #2974: Implement Hooks Framework (part 1) - all but library loading and unloading

Thu Jun 6 14:50:09 UTC 2013

#2974: Implement Hooks Framework (part 1) - all but library loading and unloading
-------------------------------------+-------------------------------------
            Reporter:  stephen       |                        Owner:
                Type:  task          |  stephen
            Priority:  medium        |                       Status:
           Component:  Unclassified  |  reviewing
            Keywords:                |                    Milestone:
           Sensitive:  0             |  Sprint-DHCP-20130606
         Sub-Project:  DHCP          |                   Resolution:
Estimated Difficulty:  0             |                 CVSS Scoring:
         Total Hours:  0             |              Defect Severity:  N/A
                                     |  Feature Depending on Ticket:
                                     |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------
Changes (by tomek):

 * owner:  tomek => stephen

Comment:

 Reviewing all changes on brach up to commit
 64bc2cfab72d67bf9e4f3cd19fd6ebb586208e67.

 This is a bit of unusual review. I did review the code and then had a
 discussion with Stephen on jabber. Many of the points below were
 agreed on jabber and they will cause the code to be refactored. This
 means that in many cases tests review is not needed at this stage as
 the tests will be refactored as well.

 == Alternative design proposal ==

 My idea for hooks was as discussed yesterday on jabber: For each hook
 point (around 25 defined for DHCPv4 and DHCPv6) there should be a list
 (or any other container that can be iterated over cheaply) of
 installed callouts. The hook execution code on the server-side should
 look like this:

 {{{
 if (list for a given hook point is not empty) {

   CalloutHandle handle;
   handle.setArgument("packet", pkt6);
   handle.setArgument(..., ...);

   for (each callout on the list) {
      status = execute_callout(handle);
      if (status != SUCCESS) {
        // log issue or skip status code here;
        break;
      }
   }

   clean up parameters in callouthandle; // if needed
 }
 }}}

 !CalloutHandle is the only structure that is exposed to callouts. All
 other structures are kept on the server-side and are never seen or
 accessible on user side.

 That looks like an optimal solution from the performance
 perspective. Note several aspects: arguments are prepared only if
 there are callouts that will use them. Arguments are prepared only
 once for each hook point. Having 1000 libraries that do not install
 anything does not introduce any performance penalty. Having 1000
 libraries that install callouts for the same hook point slows down only
 that particular hook point.

 The callouts container is never exposed to the callouts, so callout
 separation is maintained (one library cannot unistall another's
 library callout). This design allows passing modified data between
 callouts. For example you have 2 libraries: one handles client
 classification and second handles logging. The first library looks at
 the packet and decides to assign it to class foo (using setArgument()
 or modifying passed Pkt6Ptr). The second library will be able to use
 that extra argument and log it.

 The code under review implements many features that are not needed in
 my opinion. Here's my recommendation for the user library designer:
 - if you need to store something for the whole lifetime of the
 library, allocate it in load() and deallocate in unload()
 - if you need to store something for packet lifetime, allocate it in
 constext_create callout and deallocate in context_destroy callout.
 - if you need to share something with other libraries, use
 setArgument(). All callouts called after yours will have access to
 that information

 == Other general comments ==

 Please move all code to the src/lib/hooks directory. This is an equal
 feature compared to log or exceptions, so it warrants a separate
 directory. Also, the hooks code being at the same directory level as
 tests directory is confusing.

 There is no Developer's Guide documentation. Please
 add .dox file that explains the big picture. It doesn't
 have to be long. This may be a separate ticket if you like.

 We did discuss the hook point names registration procedure yesterday.
 I'd prefer to have it as a separate ticket, but it is also ok to have it
 as part of this (already large) ticket.

 == Specific file comments ==

 '''src/lib/util/hooks/server_hooks.h'''

 !ServerHooks class description: "points at in the server" (please remove
 'at').

 registerHook() - this method should be renamed to avoid confusion with
 the hooks registered by the user library. I would call it
 registerHookType(), registerHookPoint() or registerHookName(). The
 comment should also explain that this method introduces a hook type,
 e.g. "PACKET_PARSED" or similar, not the callout for a given hook
 (that was the first thing that came to my mind when I saw
 "registerHook". It would be helpful to have an illustrative
 example. If there are 10 user libraries that want to do something with
 received packet, registerHook() will be called only once. In fact, I
 think it would be great if the Developer's Guide could describe the
 process. Please pick one simple hook, and walk through the 3 things.
 First is about initialzing hook names. The second is about registering
 callouts from libraries. The third one is about actual packet
 processing.

 ServerHooks() constructor: Do we have a naming convention for hook
 names? From what you used, I assume that it is lower case with
 underscores and digits. I would prefer capital letters (due to
 resemblance to C/C++ constants), but I can live with lowercase.

 Since !ServerHooks is created just once and is kept around until the
 server is shut down, it should be a singleton. I don't like the idea
 of using shared_ptr (e.g. in !LibraryHandle constructor). This is
 against the philosophy of shared_ptr that is expected to a garbage
 collector. No such thing is needed for !ServerHooks.

 getIndex()
 You are mixing exceptions and negative values for reporting errors.
 I'm ok with that, but is there a reason for that?

 '''src/lib/util/hooks/server_hooks.cc'''

 We need to have a list of defined hooks. It should be part of the
 Developer's guide documentation. In my opinion it should be done as
 soon as possible ("now"), otherwise we'll start adding hooks and they
 will not be documented properly. Filling in the gaps later will be
 much more difficult and we'll never be sure if we covered
 everything. That list should be updated every time a new hook is
 added. For now, we have only two hooks, but the list will grow fast
 very soon.

 I would prefer to have a separate header file with hook names. That
 header file should have little more than a list of static consts with
 explanations. It would be useful for at least 2 purposes. First,
 it would detect any misspellings at compilation time. Second, it could
 be heavily extended with comments for each hook point. I plan to
 leverage that when writing documentation.

 For each hook, we need to specify at least the following:
 - hook name (e.g. context_create)
 - hook description (what does the hook do, what callout are and are
   not allowed to do. This is very important. In some cases various
   operation will not be allowed or at least will not make sense, e.g.
   when processing options it doesn't make sense to modify incoming
   packet buffer - it was already parsed and is really useless at this
   stage)
 - hook parameters (what parameters are passed)
 - allowed hook return parameters (some hooks will allow different set
   of return codes, e.g. SKIP will sometimes not be allowed)

 I don't have any specific recommendation, but the format we define
 here should be consistent and used in all hook types. I was thinking
 about a macro around ServerHooks::registerHook() with those parameters
 that would be ignored by the compiler, but could be harvested with
 clever grep or a short perl script.

 I don't like the CONTEXT_CREATE and CONTEXT_DESTROY names. They
 are too vague. Which context you have in mind? I would rename them
 to DHCP6_PKT_CONTEXT_CREATE (if you meant DHCPv6 only),
 DHCP4_PKT_CONTEXT_CREATE (if you meant DHCPv4 only),
 DHCP_PKT_CONTEXT_CREATE (if you want them to be shared between both
 DHCPv4 families, which is not a good idea in my opinion).

 Finally, this part of the code is supposed to be shared between DNS
 and DHCP, so introducing DHCP at this stage is not a good idea.  If
 you decide to rename it to PKT_CONTEXT_CREATE (called before receiving
 any packet, DHCPv4, DHCPv6 or DNS), it would be somewhat ok to keep it
 here, but I have doubts if such callout would be useful as you don't
 really know at this stage what type of packet you're about to get, so
 you user library can't really prepare for it.

 '''src/lib/util/tests/server_hooks_unittest.cc'''
 DuplicateHooks test.
 "duplcate" => "duplicate"

 There is no test that checks that index returned for missing hook is -1.

 '''src/lib/util/hooks/library_handle.h'''

 LibraryHandle description: "asssociated" - please remove one 's'.

 I have 2 comments about setContext(), getContext(), deleteContext()
 and similar. First, what type of information can be stored here? This
 data will be stored per-library, right? So that info in stored/shared
 by the whole library. Why do we need it? Each library can handle this
 on its own. Do you expect the server code to call it every time before
 every callout to set up packet pointers, interface names, options and
 other objects that should be passed? If that is so, I strongly object
 to their existence. If we have 10 user libraries, such a thing should
 be called once, not 10 times. And I'm not talking about a wrapper
 method - I'm concerned about performance.

 If this is for library developer to keep some data, then my
 recommendation is to get rid of it. It is not needed. It's a bit like
 saying "here's vector<int> for you. You may use it if you need to
 store integers". The library implementer can store his own data on his
 own. He doesn't need such mechanism.

 deregisterCallout() should return bool - a status if the
 deregistration was successful. Why does it take callout parameter? Do
 we allow one library to install more than one hook for a given hook
 point?

 deregisterAll() - the description should be clear that it deregisters
 all hooks from *this* library. Hooks intalled by other libraries will
 not be affected. I thought that deregisterAll() would deregister all
 callouts from all hook points.

 getHookIndex() - why do we need this method here? Is it because you
 want to allow the library to register its own hooks? If there isn't
 any strong reason for it, the method should be moved to ServerHooks
 class and in my understanding is that it will always return the same
 response for all libraries.

 If move to the design I proposed, registerCallout() and
 deregisterCallout() methods should be moved to ServerHooks and be
 declared static. deregisterAll(), calloutsPresent() and callCallouts()
 should be removed.

 More details why I don't like this current approach: In general, I
 don't like the data layout. Each library has its own set of callouts
 that need to be inspected during every callout and that operation will
 need to be repeated many, many thousand times per second. Let me give
 you some estimates. We currently are able to support 8000 leases/sec
 or 16000 transactions. We had 26 hooks defined, but we know there will
 be more. Some of the hooks are v4, but most are v6. Let's say there
 will be 20 v6 hooks. With 16000 transactions and 20 hooks, this code
 will be exercized 320000 times per second. This is non-trivial
 performance hit.

 If we remove the (not needed in my opinion) capabity for one library
 to have multiple hooks, the whole structure will be a lot simpler. I
 also think we should revisit the way LibraryHandle stores the
 callouts. Right now it imposes performance penalty even when there is
 no callout installed! That is not acceptable.

 Here's how I imagine the callouts would work. For each hook point,
 there is a list of installed callouts (from all libraries). The server
 prepares data (couple calls to setArgument()) once and then calls
 registered callouts by iterating over the list. This is the best
 solution as far as I'm aware from performance perspective and I have 2
 very strong examples to back my claim up. The first one is how
 packet handlers are implemented in Linux kernel. The other example is
 protocol handlers in ANVL. Although ANVL runs on many platforms
 (e.g. Linux), it does not use Linux stack. It has a concept of
 protocol handlers. You can install protocol handler on any layer
 (Ethernet, IP, UDP, DHCP etc.) and do additional checks or
 modifications.

 My comments are often just suggestions. This one is not. If you force
 this design, then you'll get major performance hit and once you tell
 me to start improving performance, the first thing I'll do is to
 rewrite it as explained above.

 Comment for CalloutPtr: I understand why the return type is int (there
 are no enums in C), but we need well defined return types (SUCCESS,
 ERROR, SKIP, etc.). Please add them to some header file. I'm not sure
 which header would be best for it. I would argue that keeping
 CalloutPtr in server_hooks.h is something to consider. User library
 developer
 could then import just one header. But I don't insist on it - keeping
 it in several different headers is fine as well. On the other hand,
 we don't want the library developer to start calling ServerHooks
 methods, so perhaps it is not that good idea after all.

 '''src/lib/util/hooks/library_handle.cc'''

 What is the reason to split checkHookIndex() and getHookIndex()? It is
 inefficient to call checkHookIndex() if you need to call
 getHookIndex() later anyway.

 callCallouts() calls all callouts, stopping if the "skip" blag is set.
 Skip flag was supposed to mean "skip the next operation to be
 conducted by the server". The code uses it as "skip any other callouts
 registered by this library". We never even agreed to have more than one
 callout registered by the same library. And I strong object to it.
 If user wants to have multiple functions called, let him handle this
 in his callout function. We discussed COMPLETE status code (or a
 flag), which means that no other callbacks are called after this
 callback. Is it implemented anywhere?

 I believe callCallouts() method has a bug in it. I thought "skip"
 meant "callout did the work, server will skip the next p rocessing
 step". If I understand the code correctly, currently skip means "skip
 other call outs installed by this library".

 If you agree with the design I proposed, this method could be moved to
 ServerHooks and it would be called by the DHCP server code.

 '''src/lib/util/hooks/callout_handle.h'''

 NoSuchArgument comment:
 "attempt is made access an argument" =>
 "attempt is made to access an argument"

 I have strong objection to the lifetime of the CalloutHandle object
 from the performance perspective. The description says that it is
 create per packet and per library. How are libraries are supposed to
 pass data between them? If I understand the life span of all the
 objects, they can't. This is a must-have capability.

 How/when do you envision this handle to be created and later
 destroyed? According to the description, CalloutHandle would have
 exactly the same lifetime as Pkt4 or Pkt6 objects, but will be
 multiplied by the number of libraries installed. This has serious
 issues. Can you describe CalloutHandle object life-cycle?
 In particular, if you have 10 libraries and 20 hook points. How many
 CalloutHandle objects (including initialization) you need? 200?
 If that is true, that's a perfromance killer.

 CalloutHandle class description explains Arguments, per-packet context
 and per-library context. If you agree with my suggestion in my
 previous paragraph, most of it is no longer needed.

 I think the CalloutHandle should have modified lifespan. It is per
 packet, per hook point, shared among all libraries that installed a
 callout for a given hook point.

 I do think that ContextCollection, context_collection_ and all
 associated methods that operate on it are not needed. The library can
 store such information on its own if it is needed. See my comments
 about life span above.

 I did not review unit-tests for callout_handle and library_handle as
 it is not clear at this point which part of the code will be kept
 after refactoring and which will be discarded.

 I'm sorry if the review is harsh, but it is essential that there are
 no uncecessary performance hits here. Currently performance for
 memfile is 16000 transactions/sec. There are 27 hook points planned
 for DHCPv6 now, but we know that we are missing some (e.g. select
 subnet). In the typical deployment, there will be at least two
 libraries installed (one for client classification and second for host
 reservation). We are talking about code that is being exersized at
 least 864 thousand times per second. This is really performance
 critical section. Every little "if" statement counts. Otherwise we
 could spend a lot of time removing the bottlenecks from the code next
 year.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2974#comment:4>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development