BIND 10 #192: Data source hotspot cache

BIND 10 Development do-not-reply at isc.org
Wed Jun 16 04:15:55 UTC 2010


#192: Data source hotspot cache
-------------------------+--------------------------------------------------
 Reporter:  each         |        Owner:  jinmei                                        
     Type:  enhancement  |       Status:  reviewing                                     
 Priority:  major        |    Milestone:  05. 3rd Incremental Release: Serious Secondary
Component:  b10-auth     |   Resolution:                                                
 Keywords:               |    Sensitive:  0                                             
-------------------------+--------------------------------------------------

Comment(by jinmei):

 Review is ongoing, but I'll make my comments cache.{cc,h} with a couple of
 general comments first.


 general:
  - I've made some trivial (I think) cleanups directly on the branch.

  - documentation is much better than other datasrc files (btw we need it
 for them also), but IMO it's still not sufficient.  See specific points
 below.  Also, I suggest you actually convert the doxygen markups into HTML
 files and browse the results.

 cache.h

  - CacheEntry class
    -  why is this defined in the header file?  This seems to be an
 internal type only used within the CacheNode class.  If so, this should be
 hidden from others (move it to .cc).  Plus, although this largely a matter
 of taste, this looks more like a "struct" rather than a "class".  If this
 must be defined in a public header file, I strongly recommend hiding
 member variables as private.  This should be pretty obvious, but to be
 redundant: as we can see in my experimental rbt patch, hiding the details
 will help us when we want to customize it further, without worrying about
 breaking compatibility.
    - documentation: commenting about the actual implementation would also
 be useful, but I'd first like to see a conceptual description of "what
 this is".

  - Cachenode class
    - I normally expect more general documentation about a class of this
 size (overall design, any non trivial design decision, etc).  I'd also
 expect exception considerations for each public member function.
    - isValid()/getRRset/getFlags() aren't explicitly tested.  they should.
    - I'm not sure if copying/assignment is expected to be allowed for
 CacheNode objects, but if not, they should be explicitly prohibited. (it
 seems to be CacheNodes are generally passed as (shared) pointers, and can
 be non-copyable)
    - especially if this can be non copyable, I suggest you consider using
 pimpl for this class to reduce dependency on the DNS related class
 implementations in the header file.  I know it's a common controversial
 argument of ours.  And I also understand this may be a case where the
 benefit of pimpl may not outweigh its overhead because this class is for
 performance optimization.  But since this is cached information and the
 overhead of creation time should be relatively minor, I still personally
 think we should use pimpl here.  See also the related comment on HotCache.
 Note also that you've already partially adopted pimpl for this class via
 the entry_ member variable.
    - We should really, really, seriously reconsider the current design of
 making some of the class member variables public (from prev to qtype).
 This is something we should always avoid unless there's a very strong
 reason to do so, and I don't see it in this case.  Actually, this is
 mostly private information only used by HotCache, so it's much better to
 encapsulate such implementation specific information in CacheEntryPtr,
 define it within .cc, thereby hiding the existence of such notion from
 others in the first place.

  - QTuple class
   - AFAICS, this is only used within the cache implementation. it should
 be hidden in the implementation file.  I noted the TODO, but unless/until
 we really do this it's better to hide it.
   - On the other hand, if we really need a public class like QTuple,
 dns::Question is almost exactly what's needed here.  Why not reusing it?
 (we could add operator< if that's the problem).
   - Constructor: why passing object copies (which is expensive), rather
 than references?
   - document operator<.  why we need this etc.  note also that
 Name::operator< is quite expensive (it calls the generic compare()
 method).
   - (maybe infeasible if we accept the first comment but) we need tests
 for this class.
   - is this (intended to be) copyable?  If not, make it non copyable
 explicitly.

  - HotCache class
   - We tend to abbreviate things, but isn't it better to name it more
 clearly, e.g., HotSpotCache?
   - Class description: describing internal implementation (using
 std::map/double linked list, etc) is good, but I'd separate the
 documentation for the high level interface from that for the actual
 implementation.  For example, entries are managed using a LRU-based
 algorithm is high level, while it's implemented via a doubly linked list
 is an implementation detail.  I personally separate the latter into a
 different paragraph, beginning with "note to developers".
   - Class description: like Cachenode, I'd add more design decisions, non
 trivial points, etc.  That would include why the default lifetime is 30
 seconds, cache and negative cache, and the rationale of using std::map,
 what are positive and negative caches, etc.
   - As for design decision, I guess we need to consider multi-thread
 considerations.  Although we are currently not using threads, and I'm not
 sure if we'll do in our auth server in future versions, our implementation
 so far is (in my understanding) generally thread safe, except, perhaps
 within sqlite3/ASIO library implementations.  So the access to this cache
 is going to the first place that is not thread safe.  We should at least
 note that, and describe our future plans on that point.  Note also that
 the data source library is public, and some external developers may want
 to use it with threads.
   - Class description: I'm not sure if it's a good idea to mention
 DataSrc::doQueryTask() here, because the cache may be used in other
 contexts.  I'd rather make the description generic, while noting
 doQueryTask() uses the HotCache in its own description.
   - Design: I don't think making this as a static is a good idea.  One
 practical reason for this is that we may actually want to have multiple
 different caches if/when we support views.  And, of course, use of
 static/global objects should be avoided at all costs.  In this specific
 case, I don't see an inevitable reason why we must define it as a non
 local static object.  The cache isn't needed unless query processing
 happens, so it should be reasonable to assume the application can prepare
 it by the time it needs the cache.  I strongly suggest you reconsider this
 design.
   - A related point: I saw comments about the initialization of
 Query::cache(0).  Trying to avoid the static initialization order fiasco
 just by noting it in a comment isn't a good practice.  There are many non-
 trivial pitfalls around this beast, so non-trivial that we can't avoid
 them just by hoping the developers read the comment and do the right
 thing.  If we really, really, really need it as a static object, we must
 avoid the fiasco more explicitly using language features.  But again,
 please reconsider the design in the first place before trying to make it
 work with complicated tricks.
   - Design: now the controversial point:-) I strongly suggest pimpl be
 used for this class.  This class is a perfect example of the case where
 the advantages of pimpl outweigh its overhead: having an external
 definition (std::map) inside the implementation, and the overhead of
 constructing/destructing the object is negligible.  I now hope we've all
 learned how advantageous it is if we minimize dependencies on other
 implementations in our public header files via the turmoil with the non
 boost ASIO, and I hope my argument is now more convincing.
   - I believe the HotCache class is intended to be non-copyable.  Please
 explicitly make it so.
   - description for member functions in general: same comments as that for
 Cachenode.
   - naming: if we follow our method name convention, cache() is better
 named setCacheEntry(), s/ncache/setNegativeCacheEntry()/,
 s/retrieve()/getCache, etc?  Maybe a matter of taste, just a comment.
   - description about cache(): it should describe what happens if there's
 already an entry that matches the key.  Also describe what if it's a
 negative cache.  These case should also be tested.
   - description about ncache().  Likewise.
   - methods like retrieve() should normally be definable as a const member
 function.  Hmm, okay the implementation explains why this can't be a
 const.  The method description should explain that.
   - retrieve() signature: why passing object copies?
   - setSlots description: it states "up to the number of RRsets in the
 data source".  I don't understand this.  Which data source is it?
 Whatever it means, since there can be negative cache entries, this
 description still doesn't make sense to me.
   - constant of 30: I'd define it as a static const class member with
 doxygen description, and use it other places consistently, so that the
 magic number appears only once (and we make sure it's described).
   - count_ might be redundant because it should be identical to
 map_.size().
   - LRU list: why using in-house list, rather than std::list? (see also
 the comment about the implementation).

 cache.cc
  - CacheNode::~CacheNode(): if you use the default destructor you can
    omit the declaration and the definition.
  - HotCache::~HotCache(): I don't understand what the for loop does.
    Does this do something correct?  Also recall my other comment about
    why using in-house list.  If we use std::list, we don't even have
    to bother to clean up these things.
  - HotCache::retrieve(): map::operator[] creates a new entry if it
    doesn't have one for the key.  It's probably not necessarily
    incorrect, but I suspect it's error-prone behavior.  For example,
    consider the case where the caller continuously calls retrieve()
    for different non-existent keys.  I suggest you use map::find()
    instead.
  - HotCache::insert(): why do we need to have a while loop?
    if count_ > slots_ > 0, count_ must be slots_ + 1, and we should
    only need a single remove operation.  if this doesn't hold it's an
    internal bug, and I'd rather let it fail with assert (which I
    prefer) or exception than preparing for the "impossible case" with
    a complicated logic.
  - HotCache::ncache(): the condition of ANY()s isn't documented (and
    not intuitive).

-- 
Ticket URL: <http://bind10.isc.org/ticket/192#comment:6>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development


More information about the bind10-tickets mailing list