BIND 10 #2850: Define and implement ZoneTableSegmentMapped

Mon May 6 08:28:52 UTC 2013

#2850: Define and implement ZoneTableSegmentMapped
-------------------------------------+-------------------------------------
            Reporter:  jinmei        |                        Owner:
                Type:  task          |  jinmei
            Priority:  medium        |                       Status:
           Component:  data source   |  reviewing
            Keywords:                |                    Milestone:
           Sensitive:  0             |  Sprint-20130514
         Sub-Project:  DNS           |                   Resolution:
Estimated Difficulty:  5             |                 CVSS Scoring:
         Total Hours:  0             |              Defect Severity:  N/A
                                     |  Feature Depending on Ticket:
                                     |  shared memory data source
                                     |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------
Changes (by muks):

 * owner:  muks => jinmei

Comment:

 Hi Jinmei

 Replying to [comment:12 jinmei]:
 > '''zone_table_segment_mapped_unittest.cc'''
 >
 > - I guess some comments on the main code would require more tests
 >   (checking exception guarantee, invalid mode, etc)

 These have been added as far as possible. See the end of this comment
 for comments about `READ_WRITE` mode.

 I have not added tests for `ResetFailedAndSegmentCleared`, as it's not
 possible to hotwire (and corrupt) an already open segment for the same
 underlying mapped file.

 > - reset test: it generally seems to lack (some) necessary checks for
 >   each mode:
 >   - For CREATE: we need to confirm it really creates a new file or at
 >     least removes any existing content, and the resulting segment is
 >     writable.
 >   - For READ_WRITE: in case the segment already exists we need to
 >     confirm it doesn't change existing content (except for the
 >     checksum), and the resulting segment is writable.
 >   - For READ_ONLY: we need to confirm it doesn't change any existing
 >     content, and the resulting segment is not writable.

 These are now implemented.

 > - I believe the previous cases can be tested by inspecting/tweaking the
 >   underlying memory segment.  Also, I'd separate these three cases.
 >   Each case is complicated enough, so mixing them would tend to cause
 >   missing cases, and even if it's comprehensive it's not clear for the
 >   reader of the test code (= me).

 Nod. These have been implemented separately above.

 > - As for corrupted data, I think it's better to test it at some
 >   realistic level.  Data corruption will certainly happen some time,
 >   while quite rare, so it's better we can be sure it works as we
 >   expect when it finally happens.

 This is tested for checksums by corrupting it. Also generally, written
 data is validated for equality with what was written before, when the
 same mapped file is opened in read/read+writex mode again.

 Replying to [comment:14 jinmei]:
 > Some comments I've noticed for the code, but this is far from
 > comprehensive or detailed review:
 >
 > - this what() message doesn't seem to be very helpful:
 > {{{#!cpp
 >                         "Error in resetting zone table segment to use "
 > }}}
 >   and repeated multiple times.

 In these cases, the specific error (returned by helper methods) is
 included in the `what()` messages. They are thrown separately due to the
 different exception types.

 > - openReadOnly: this doesn't match the content of the method:
 > {{{#!cpp
 >     // In case there is a checksum mismatch, we throw. We want the
 >     // segment to be automatically destroyed then.
 > }}}
 >   As it doesn't compare checksums.

 This has been updated.

 > - The "load" is probably misleading for `reset()` because it sounds
 >    like loading zone data into memory (parsing zone files or iterating
 >    in some data source, etc).  `reset()` is basically expected to be
 >    non time consuming operation.
 > {{{#!cpp
 >     /// \brief Unload the current memory store (if loaded) and load the
 >     /// specified one.
 > }}}

 The API documentation has been updated to use open and close.

 > I'm not sure I should continue the rest of the review.  For now, I'm
 > only answering the bigger questions (below) and giving the ticket
 > back to you.
 >
 > > I am still not entirely satisfied with the branch. Some points:
 > >
 > > * I think the checksum should be stored separately from the segment,
 and
 > >   there should be no possibility that the memory area for the checksum
 > >   could itself influence its computation. The current code of patching
 0
 > >   during checksum computation is hackish and opens the possibility of
 > >   broken code. Why do this when we can cleanly separate the checksum
 > >   storage?
 >
 > Such an idea simply didn't occurred to me before.  I'm open to this
 > point based on pros and cons considerations, but where else would we
 > store it?  In a separate file?  In that case there's another types of
 > issues such as maintaining two sets of files in a consistent way.  I'm
 > not necessarily saying that's obviously worse than the current
 > approach, but at least it doesn't seem to be obviously better.  Or do
 > you have something different in your mind?

 I had the same idea of storing the checksum in a separate file
 (`".checksum"` appended to the mapped filename). It should be possible
 to maintain consistency using the same advisory lock on the main file,
 within `MemorySegmentMapped`.

 > >   Another option is to remove checksums altogether. It only checks one
 > >   byte of each page for corruption, and I wonder what value there is
 to
 > >   do this, esp. now that we have code to mutually exclude writer and
 > >   reader instances. If it is to check for memory corruption, we don't
 > >   checksum many other things that may get corrupted in the auth
 server.
 >
 > Actually, other important intent of this checksum stuff is to make
 > sure that all pages are really "touched" once (in practice) the memory
 > manager opens a mapped file, so b10-auth won't have to worry about
 > severe page fault overhead in the case of cold start like a restart
 > from rebooting the host.  The intent is documented for the method,
 > although it may be easily overlooked.

 I had read this during review but forgot about it. I am sorry.

 > For this purpose, however, it doesn't have to be a checksum, so one
 > other option is to just do a simple scan of the pages
 > (add a new `MemorySegmentMapped::scan()` method instead of
 > getCheckSum() and use it in `ZoneTableSegmentMapped::reset()`, maybe
 > if specified so in params).

 In this case, are you thinking of removing the checksum functionality
 when adding the `scan()` method?

 If we sequential scan the mapped file in `scan()`, it may be useful to
 use `posix_fadvise()` on the fd (if we have access to the fd).

 > > * It seems that most of the functionality of
 > >   `ZoneTableSegmentMapped::reset()` can be abstracted out to the base
 > >   class so it works for all implementations of
 > >   `ZoneTableSegment`. Methods like `isWritable()`, `getHeader()`,
 > >   etc. also need not be implemented outside the base
 `ZoneTableSegment`
 > >   then. Without this, a caller would have to know about specialization
 > >   details, such as not to call `reset()` on a
 > >   `ZoneTableSegmentLocal`, etc.
 >
 > Like by pulling up the main part of reset() to the base class,
 > delegating `openReadWrite()`, etc, to derived classes as a protected
 > method?  Hmm, possibly.  I'm not necessarily opposed to that idea, and
 > I see the possibility that it will help future extensions, such as
 > supporting other types of shared memory.

 Shall I implement this, or create another ticket for it? I think another
 ticket would be better for it as `ZoneTableSegmentLocal` tests can also
 reuse common tests, and this would involve a bit of refactoring.

 > I don't necessarily think, however, "a caller would have to know about
 > specialization details", at least in our expected usage.
 > `ConfigurableClientList::resetMemorySegment()` (#2852) wouldn't have
 > to care about it; if the call to reset() throws, it can simply let it
 > be propagated.  b10-auth won't have to care about it either: it only
 > has to call `resetMemorySegment()` when it was told to do so by
 > memmgr, and memmgr ensures `reset()` will be called only for the right
 > type of segments.  The memmgr has to be aware of these types of
 > segments, but that's exactly what it's expected to do.
 >
 > > * A question about the design: why support `reset()`?  Is it for
 > >   performance that we re-use a `ZoneTableSegment` instead of
 > >   `reset()`ing during construction and simply creating another object
 in
 > >   case the underlying storage changes?
 >
 > I don't understand what you mean here.  At least it has nothing to do
 > with "performance" in my mind.  If you mean we don't need the
 > `reset()` API, specifically what would you be suggesting?

 I was thinking that we could simply create a new `ZoneTableSegment` with
 `create()` taking a `ConstElementPtr` that would contain all the details
 (type of backend: local/mapped, the filename, etc.). If we need to
 reset, we destroy the old one and create a new `ZoneTableSegment` with
 `create()`.

 I missed talking about two more issues:

 * The `setNamedAddress()` interface could be problematic in practice as
   it may relocate addresses in the segment. When we `allocate()`, the
   addresses are anonymous with no associated name and when we pass it to
   `setNamedAddress()` there is a possibility the memory could be "lost"
   with no pointer to it if the address is relocated. To workaround this
   issue, we reserve space first in `ZoneTableSegmentMapped` with `NULL`,
   but such code is error-prone and may not be so straightforward to do
   in other use-cases. In general, a better `setNamedAddress()` interface
   will guarantee that the address being passed will not be lost. Such
   things may seem trivial at this layer, but could cause problems when
   we mis-use these interfaces in higher layers, and in this case, hard
   to debug too. It's better to have a robust interface here.

   However, I don't have any good ideas on how to fix it. :)

 * If there is a missing checksum or header (by corruption or programming
   mistake), `READ_WRITE` mode will not warn (it will think the file was
   freshly created). Is there a reason why we'd want to add `CREATE`
   functionality to this mode too?

 Some other unrelated but necessary commits were made. Please check these
 too.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2850#comment:16>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development