BIND 10 #2974: Implement Hooks Framework (part 1) - all but library loading and unloading

Thu Jun 13 11:57:48 UTC 2013

#2974: Implement Hooks Framework (part 1) - all but library loading and unloading
-------------------------------------+-------------------------------------
            Reporter:  stephen       |                        Owner:
                Type:  task          |  stephen
            Priority:  medium        |                       Status:
           Component:  Unclassified  |  reviewing
            Keywords:                |                    Milestone:
           Sensitive:  0             |  Sprint-DHCP-20130620
         Sub-Project:  DHCP          |                   Resolution:
Estimated Difficulty:  0             |                 CVSS Scoring:
         Total Hours:  0             |              Defect Severity:  N/A
                                     |  Feature Depending on Ticket:
                                     |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------
Changes (by tomek):

 * owner:  tomek => stephen

Comment:

 Replying to [comment:8 stephen]:
 > > Not really. Two real life examples:
 > > 1) library A does client classification, library B does logging
 > Taking (1) as an example, the point I was trying to make is that if the
 server doesn't provide classification information, how does library B know
 where to look for the information provided by library A?
 >
 > In any case, they can do it through the argument list.  We may want to
 set a convention that BIND 10 guarantees that no argument name will start
 with a special character (e.g. a "$"), leaving a namespace that inter-
 library communication can use.
 We haven't been talking about the classification, but I have some
 experience from Dibbler that I'd like to apply here. My idea was that the
 server will set the packet class by default as "default" or
 "unclassified". The libraries could then update that value (e.g. to
 "voipphone"). The server would allow to specify that certain subnets (or
 pools within that subnet) are allowed/forbidden to certain class.

 Also, the logging library could also use getArgumentName(). It is still
 not perfect (because the types are not known), but doable.

 > From comment:7
 >
 > > The design does not allow for extra libraries to be loaded at runtime.
 This is something we need to be ready for. You start a server, then
 configure it to use one library, then configure it to use another one. You
 can't assume that the number of libraries is constant.
 > That is handled via reconfiguration (not yet written).  The intention is
 for the configuration to specify an array of libraries: these are read and
 the libraries loaded one at a time.  On a reconfiguration, libraries are
 unloaded and reloaded.
 I don't like that approach. It is equally flawed as the current
 reconfiguration. When you add a new subnet, we toss all existing subnets
 and the put them back in CfgMgr. Anyway, we'll get back to that discussion
 once we start thinking about hooks reconfiguration. I consider this a very
 advanced use case, so it is still far away.

 > > Who is intended user of the public methods in !CallOutManager: server-
 side code or user library? I assume it is server code, right? Again, I
 have trouble understanding how the intended usage model looks like.
 > It's described in the [wiki:Bind10HooksFramework design document].
 !CalloutManager is a server-side construct.  !LibraryHandle - which is a
 restricted interface into !CalloutManager - is made available to callouts.
 Ok.

 > > Who is going to call setLibraryIndex()? I'm concerned with the fact
 that the registerCallout() may be called by the user library, right
 > As described in the [wiki:Bind10HooksFramework design], the library
 index is set by the server-side code when calling the callouts - each
 callout has the library index associated with it.  However, because only
 the !LibraryHandle is made available to the callout, the callout cannot
 see or access that index.
 Ok. That solves my concerns.

 > LibraryHandle::registerCallout() can be called by user-written code in
 one of two places:
 >
 > 1. in the library's load() function (if registering non-standard
 callouts).  In this case, the index is set by the library loading code.
 > 1. in a callout.  User callouts are called by
 CalloutManager::callCallouts(), and this sets the index before calling a
 user callout.
 >
 > > The functions in .h file are getting bigger and bigger.
 > They're not that big - typically where the code is little more than an
 "if" statement.  If you really want, I can move them to the .cc file.
 Let's keep the existing code as it is now, but please write the new code
 in .cc.

 > > CalloutManager::registerCallout() Isn't the
 checkLibraryIndex(current_library_) check redundant? It was already
 checked when setting up current_library_.
 > As noted, it's a sanity check to guard against a programming error.  The
 idea is to get a meaningful message if (for example), this call is made
 after all libraries have been removed.
 Fair enough. Let it stay there then.

 > > Why do you require to have the callouts from the same library be
 grouped together? Michal gave specific example when he wants them to be
 out of order: the encryption + signing
 > The order of callouts is the order that the libraries are specified in
 the configuration.  The extra check is for dynamic addition of callouts -
 if the order is specified as library A then library B, and library A
 dynamically adds a callout on a hook, we guarantee that it will be called
 before any callouts registered by library B for that hook.
 Ok. I would have done it differently (execute them in registration order),
 but I can live with the solution you implemented.

 > > There's 27 hook points for DHCPv6, multiplied by 16000 (current number
 of transactions per second) gives you 432 thousand copies per second.
 > I think you're being overly dramatic.  First off, I doubt if we ever
 will have all 27 hooks with callouts attached to them: if you are checking
 whether there is a callout present before calling it (to avoid overhead
 attached to adding argument information to the !ContextHandle), the copy
 will never be called in those cases.  Secondly, does every packet pass
 through all 27 hook points?  Finally, it's a copy of a few bytes (8 bytes
 * number of callouts on a 64-bit system); that overhead is likely to be
 dwarfed by the code involved any callout called (and the more callouts
 that are called on a hook, the more that overhead is amortised between
 then).
 But I still think there will be logging library that will install a
 callout to every single hook point. I suppose it won't be used every time
 - perhaps as a debugging tool. The solution I have proposed is faster. We
 don't need to implement it now, but let's write it down somewhere. We can
 get back to it once you ask the question "how can we improve performance".
 Sooner or later you'll ask that question :)

 > I suggest we leave it until we do a performance measurement. We'll load
 a library with null callouts on every hook point and measure the overhead
 with the copy in and with it out.  If there is a significant difference,
 I'll remove it.
 Agree. On a related note, I think we should create a wiki page called
 Performance improvement ideas and start adding things there.

 > > deregisterCallout() - that's a tricky removal code. I think I
 understand how it works, but isn't there a simpler way to remove something
 from a vector?
 > I can't think of one other than effectively doing the same:
 > 1. Working through the the vector comparing elements against my test
 element (the equals_to())
 > 1. Moving non-matching elements to lower vector indexes (the
 remove_if())
 > 1. Truncating the vector (erase()).
 > I could use a std::list instead, but I don't think it's worth it for the
 simplification of one function - especially as this particular construct
 is documented in a book for which the reference is supplied.
 Ok. It was just a comment, not really a strong suggestion to rewrite this
 function.

 > '''src/lib/util/tests/callout_manager_unittest.cc'''[[BR]]
 > There are some valid points here that need code modification.  These
 will be addressed when all reviews have ended.  However...
 >
 > > !CalloutManagerTest.!CallCalloutsError: I'm somewhat uneasy with the
 fact that the first library that returns an error will cause other
 libraries to not be called. What if the first library is logger and it
 reports error, because partition for logs is full? Then the code will stop
 calling libraries for access control, host reservations etc. That would
 essentially disable hooks and the server would continue to run in its
 "vanilla" form. This would make a very fragile system, with easily
 triggered domino effect. The first library that falls over will trip all
 others as well.
 > That is a pertinent observation and I'm in two minds here.
 >
 > On the one hand, you are correct with your example in that a failing
 callout could effectively disable the hooks system.  On the other, perhaps
 a callout started modifying the data passed to it, failed and left the
 data in an inconsistent state?  That inconsistency could cause a crash in
 another callout.
 >
 > Allied to this is the question of whether we should have a COMPLETE
 status at all? The original idea was to allow a callout to have the last
 word in whether processing was complete.  But a poorly-written callout -
 for example, returning an unitialised status - could cause random skips.
 >
 > Also, it struck me that the callout manager needs some form of logging
 precisely to record what library (if any) returns an error.  That will be
 added in the next ticket (which adds code to load the libraries); the
 reason it is being delayed is because any error message should include the
 name of the library, not the index.
 >
 > Perhaps the easiest way forwards is to get rid of COMPLETE and to log
 errors returned by callouts but otherwise ignore them.
 Agree.

 The idea that you can have access control hooked into the customer data
 that suddenly stops working and default to basic operation (i.e. server
 happily assigns leases to whoever asks), because some other minor library
 tripped over, is scary.

 The ticket is back with you.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2974#comment:10>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development