BIND 10 #2974: Implement Hooks Framework (part 1) - all but library loading and unloading

Mon Jun 10 19:43:54 UTC 2013

#2974: Implement Hooks Framework (part 1) - all but library loading and unloading
-------------------------------------+-------------------------------------
            Reporter:  stephen       |                        Owner:  tomek
                Type:  task          |                       Status:
            Priority:  medium        |  reviewing
           Component:  Unclassified  |                    Milestone:
            Keywords:                |  Sprint-DHCP-20130606
           Sensitive:  0             |                   Resolution:
         Sub-Project:  DHCP          |                 CVSS Scoring:
Estimated Difficulty:  0             |              Defect Severity:  N/A
         Total Hours:  0             |  Feature Depending on Ticket:
                                     |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------
Changes (by stephen):

 * owner:  stephen => tomek

Comment:

 Wow, where to start? I'll take this a bit at a time, but first I'd like to
 clear up some terminology:

 * !Hook/Hook Point - used interchageably, this is a point in the code at
 which a call to user-written functions is made.  Each hook has a name and
 each hook can have any number (including 0) of user-written functions
 attached to it.

 * Callout - a user-written function.  This is so-named because the server
 "calls out" to the library to execute a user-written function.

 > My idea for hooks was as discussed yesterday on jabber: For each hook
 point (around 25 defined for DHCPv4 and DHCPv6) there should be a list (or
 any other container that can be iterated over cheaply) of installed
 callouts. The hook execution code on the server-side should look like
 this:

 As agreed on Jabber, the list of callouts attached to each hook is a
 single place (the !CalloutManager), instead of being in per-library
 objects.

 > The code under review implements many features that are not needed in my
 opinion. Here's my recommendation for the user library designer:
 >
 > * if you need to store something for the whole lifetime of the  library,
 allocate it in load() and deallocate in unload()
 >
 > * if you need to store something for packet lifetime, allocate it in
 context_create callout and deallocate in context_destroy callout.
 >
 > * if you need to share something with other libraries, use
 setArgument(). All callouts called after yours will have access to that
 information

 Yes, that's the philosophy behind it.

 > Please move all code to the src/lib/hooks directory.

 I'd come to that conclusion myself when considering the hook code, mainly
 because it will need to include some logging calls.  I'll do that as the
 first task of #2980.

 > There is no Developer's Guide documentation. Please add .dox file that
 explains the big picture. It doesn't have to be long.
 I'll split #2982 into user-documentation (how to write a hook) and
 maintainer documentation (how the hooks framework works).  Regarding the
 former, I'll check with Jeremy as to whether this should be a .dox file or
 an Docbook guide.

 > !ServerHooks class description: "points at in the server" (please remove
 'at').
 Done.

 > registerHook() - this method should be renamed to avoid confusion with
 the hooks registered by the user library.
 The current naming is that !ServerHooks contains "registerHook", which
 makes a hook known to the hooks system.  Libraries use "registerCallout"
 which takes as argument the name of a hook.

 > !ServerHooks() constructor: Do we have a naming convention for hook
 names?
 No - they are just strings.  We can name them as needed.  However, we may
 want to take into account the fact that we plan for "standard callouts" to
 have the same name as the hooks to which they will be attached, and that
 the standard for C/C++ function names is to use lower0case.

 > Since !ServerHooks is created just once and is kept around until the
 server is shut down, it should be a singleton. I don't like the idea of
 using shared_ptr (e.g. in !LibraryHandle constructor). This is against the
 philosophy of shared_ptr that is expected to a garbage collector. No such
 thing is needed for !ServerHooks.
 I'll probably do that, but I'd like to defer it to #2980 to avoid making
 any more changes than absolutely needed at the moment.

 > getIndex()
 > You are mixing exceptions and negative values for reporting errors. I'm
 ok with that, but is there a reason for that?
 No you're right, it should be reported via an exception and this has been
 changed.

 '''src/lib/util/hooks/server_hooks.cc'''[[BR]]
 > We need to have a list of defined hooks.
 The hooks will be different for the different modules (resolver, auth,
 dhcp4 etc.)  Regarding DHCP, I thought we did have a [wiki:DhcpHooks list
 of DHCP hooks].

 > I would prefer to have a separate header file with hook names. That
 header file should have little more than a list of static consts
 withexplanations.
 Hooks are registered by the server, so it is up to the server author how
 they do that.  The updated version of !ServerHooks offers the possibility
 of distributed registration functions.  However, if you want to register
 all the hooks up fron in one module, that's OK as well.

 > It would be useful for at least 2 purposes. First,
 > it would detect any misspellings at compilation time.
 As hook names are strings, I'm not sure how misspellings would be
 detected.

 > For each hook, we need to specify at least the following:
 Yes, we need documentation, agreed.  But this documentation could start
 getting very complex and may be better in a !DocBook.  In part, there are
 three audiences for documentation:

 1. Administrators
 2. User-library programmers
 3. BIND 10 maintainers

 The question is whether the documentation for (2) should be mixed with
 that for (3).

 > I don't like the CONTEXT_CREATE and CONTEXT_DESTROY names. They are too
 vague. Which context you have in mind? I would rename them to
 DHCP6_PKT_CONTEXT_CREATE ...
 These hooks are general hooks managed by the hooks system.  The idea is
 that when a !CalloutHandle is created, CONTEXT_CREATE is automatically
 called: when the handle is destroyed, CONTEXT_DESTROY is called.  A
 !CalloutHandle is supposed to be associated with a "request" that the
 server is processing.

 (I say "request" and not "packet" because in the DNS code, a packet only
 exists when it is received, and when a packet is created before being
 sent.  In the meantime, information about the request is passed round the
 system in a different class.)

 > I have doubts if such callout would be useful as you don't really know
 at this stage what type of packet you're about to get, so you user library
 can't really prepare for it.
 Possibly in DHCP, but for packets received by the DNS resolver (say), all
 packets are requests (except for those that are rejected as being
 invalid).

 '''src/lib/util/tests/server_hooks_unittest.cc'''[[BR]]
 > !DuplicateHooks test.
 > "duplcate" => "duplicate"
 Done.

 > There is no test that checks that index returned for missing hook is -1.
 Well spotted.  Test for missing hook now added.

 src/lib/util/hooks/library_handle.h

 > !LibraryHandle description: "asssociated" - please remove one 's'.
 Done.

 > I have 2 comments about setContext(), getContext(), deleteContext() and
 similar. First, what type of information can be stored here? This data
 will be stored per-library, right? So that info in stored/shared by the
 whole library. Why do we need it? Each library can handle this on its own.
 That's a fair point, removed.  I got carried away by the usefulness of
 boost::any.

 > deregisterCallout() should return bool - a status if the deregistration
 was successful.
 Done.

 > Why does it take callout parameter? Do we allow one library to install
 more than one hook for a given hook point?
 Yes, it is allowed for a library to install multiple callouts on a hook.

 > deregisterAll() - the description should be clear that it deregisters
 all hooks from *this* library. Hooks intalled by other libraries will not
 be affected. I thought that deregisterAll() would deregister all callouts
 from all hook points.
 Fair point, documentation updated.

 > getHookIndex() - why do we need this method here? Is it because you want
 to allow the library to register its own hooks?
 Internally it was used to convert the -1 returned by the
 ServerHooks::getIndex() to an exception.  Since that object will now throw
 an exception, it is not needed and has been removed.

 > If move to the design I proposed, registerCallout() and
 deregisterCallout() methods should be moved to !ServerHooks and be
 declared static. deregisterAll(), calloutsPresent() and callCallouts()
 should be removed.
 !ServerHooks is an object only associated with hook names.  A modified
 !LibraryHandle stays to restrict the registration functions to modifying
 only callouts associated with that library.

 > If we remove the (not needed in my opinion) capabity for one library to
 have multiple hooks, the whole structure will be a lot simpler.
 It will, but the current system allows flexibility.  As we don't know what
 user libraries will be required to do in the future, we should not close
 off options.  For example, a user library may itself dynamically load
 another library and want to install some of those functions as callouts.

 > Here's how I imagine the callouts would work. For each hook point, there
 is a list of installed callouts (from all libraries). The server prepares
 data (couple call  to setArgument()) once and then calls registered
 callouts by iterating over the list.
 That's what is now done.

 > Comment for !CalloutPtr: I understand why the return type is int (there
 are no enums in C),
 Actually there are enums in C.

 > but we need well defined return types (SUCCESS, ERROR, SKIP, etc.).
 SUCCESS and COMPLETE are now public values in !CalloutHandle.  I'll leave
 all positive values as errors - we may decide to log a returned error
 value, so don't want callouts to feel constrained as to what error codes
 they return.

 '''src/lib/util/hooks/library_handle.cc'''[[BR]]
 > What is the reason to split checkHookIndex() and getHookIndex()? It is
 inefficient to call checkHookIndex() if you need to call getHookIndex()
 later anyway.
 This should now be fixed.

 > callCallouts() calls all callouts, stopping if the "skip" blag is set.
 Skip flag was supposed to mean "skip the next operation to be conducted by
 the server".
 I'm afraid that I confused myself here :-(

 '''src/lib/util/hooks/callout_handle.h'''
 > !NoSuchArgument comment:
 > "attempt is made access an argument" =>
 > "attempt is made to access an argument"
 Done.

 > I have strong objection to the lifetime of the !CalloutHandle object
 from the performance perspective. The description says that it is create
 per packet and per library.
 Where did it say that? (See below for an explanation.)

 > How are libraries are supposed to pass data between them? If I
 understand the life span of all the objects, they can't. This is a must-
 have capability.
 As you wrote above, libraries can pass information via arguments.
 However, if two libraries are so tightly-coupled that they pass private
 (i.e. nothing to do with the server) information between them, they
 perhaps should be one library.

 > How/when do you envision this handle to be created and later destroyed?
 According to the description, !CalloutHandle would have exactly the same
 lifetime as Pkt4 or Pkt6 objects, but will be multiplied by the number of
 libraries installed. This has serious issues. Can you describe
 !CalloutHandle object life-cycle? In particular, if you have 10 libraries
 and 20 hook points. How many !CalloutHandle objects (including
 initialization) you need? 200? If that is true, that's a perfromance
 killer.
 I think there's been a misunderstanding.

 There is one !CalloutHandle per request processed by a server.  When
 created, internally it creates a number of "context" objects capable of
 storing name/value pairs.  The number of these objects is equal to the
 number of libraries loaded. (Actually it is lazy creation - they are only
 created when a callout in a particular library creates packet-specific
 context.)

 When the !CalloutHandle is passed to a callout in the first library - any
 callout registered by that library, whatever the hook - the
 getContext/SetContext methods will set and retrieve data from the first
 context object.  When a callout from the second library is called - again,
 any callout registered by the library, whatever the hook -
 getContext/setContext will access the second context object.

 In this way each library has its own context "namespace".  The first
 library can pass per-request information between callouts by setting value
 named "foobar", secure in the knowledge that even if a second library
 (possibly written elsewhere) also uses "foobar", that use will not affect
 the first library's operation.

 > I think the !CalloutHandle should have modified lifespan. It is per
 packet, per hook point, shared among all libraries that installed a
 callout for a given hook point.
 I strongly disagree.  We need a way to pass information between callouts
 in the same library. As I note above, if two libraries coupled enough need
 to pass information between one another, the functions should perhaps be
 in the same library. (Alternatively, they can use {set,set}Argument().)

 > I do think that !ContextCollection, context_collection_ and all
 associated methods that operate on it are not needed. The library can
 store such information on its own if it is needed. See my comments about
 life span above.
 This would make it much more complicated to pass information between
 callouts in the same library.  Yes, a library could do it - but it would
 have to store information keyed by some identification and every library
 requiring such context would have to provide its own mechanism.  That
 identification would still have to be passed from hook to hook throughout
 the server. (A packet address might not be enough - for example, a DNS
 resolver can generate multiple separate queries when processing a request.
 Added to that, unlike the DHCP server, the resolver can be processing many
 separate requests at the same time.  The DNS team have already identified
 a need to track what downstream requests are made when a given request
 appears: the hooks mechanism - with per-request context - provides a
 ready-made way of doing this.)

 Regarding the DHCP server itself, it processes one packet at a time.  If
 this approach is kept, it would be entirely possible for it to use a
 singleton !CalloutHandle and just modify it at every call. (This may mean
 adding some methods to clear out all data, but that is easily done.)

-- 
Ticket URL: <http://bind10.isc.org/ticket/2974#comment:5>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development