[bind10-dev] recover from memmgr restart

Thu Jun 13 06:03:09 UTC 2013

While working on #2853, I realized I was naive about how to recover
from a restart of memmgr (whether it's a scheduled restart or an
automatic restart after a crash, but especially the latter):
http://bind10.isc.org/wiki/SharedMemoryIPC#a5.5WhentheMemmgrDies

The above URL refers to "the 3rd diagram of Section 5.1", where it's
implicitly assumed that "auth-1" either uses no mapped file at all or
uses zone-mapped.1, but that's not guaranteed.  It may have been using
zone-mapped.0, in which case the memmgr wouldn't even make initial
updates to it in the first diagram.  Also, depending on the timing of
the previous termination of memmgr, there can even be multiple reader
processes using different mapped files (e.g., if the memmgr died in
the middle of sending update messages).  So we'll need more explicit
synchronization.

There can be several ways to address this issue, but I'd suggest a
possibly suboptimal but simpler solution: when memmgr starts up, it
tells all pre-existing readers to release all segments they are using
(if any).  Once memmgr gets acknowledgments from all such readers, it
can start making initial updates.
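
To make the proposed handshake concrete, here is a minimal sketch in
Python.  It's only a model of the idea, not actual BIND 10 code: the
Reader and Memmgr classes, their method names, and the use of a return
value as the "acknowledgment" are all hypothetical, and real message
passing between processes is replaced with direct method calls.

```python
class Reader:
    """Hypothetical stand-in for a reader process such as auth."""

    def __init__(self, name, mapped_file=None):
        self.name = name
        self.mapped_file = mapped_file  # e.g. "zone-mapped.0", or None

    def release_segments(self):
        # Stop using whatever mapped file is in use (if any).  Until a
        # new segment is announced, queries would get SERVFAIL.
        self.mapped_file = None
        return self.name  # the return value plays the role of the ack

    def rcode(self):
        return "SERVFAIL" if self.mapped_file is None else "NOERROR"


class Memmgr:
    """Hypothetical model of the memmgr startup handshake."""

    def __init__(self, readers):
        self.readers = list(readers)
        self.pending_acks = set()
        self.updating = False

    def start(self):
        # Remember which pre-existing readers must acknowledge.
        self.pending_acks = {r.name for r in self.readers}
        self._start_updates_if_ready()

    def handle_ack(self, reader_name):
        self.pending_acks.discard(reader_name)
        self._start_updates_if_ready()

    def _start_updates_if_ready(self):
        # Initial updates begin only after every reader has acknowledged.
        if not self.pending_acks:
            self.updating = True


readers = [Reader("auth-1", "zone-mapped.0"),
           Reader("auth-2", "zone-mapped.1")]
memmgr = Memmgr(readers)
memmgr.start()
for reader in readers:
    memmgr.handle_ack(reader.release_segments())
```

The point of the sketch is that memmgr doesn't need to know which file
each reader was using before the restart; it only waits until the set
of pending acknowledgments is empty.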

The major downside of this approach is that readers would return
SERVFAIL while the memmgr is preparing the mapped file image.  But the
memmgr shouldn't die easily and shouldn't normally need to be
restarted by hand, so at least in principle this should be a pretty
rare event and would be acceptable.

If this makes sense, I'll create corresponding development tickets.
Whatever approach we take, I think this can be deferred until a later
stage of development (initially we'll complete the implementation for
the case of no memmgr restart, and extend it later).

---
JINMEI, Tatuya