[bind10-dev] proposed design of scalable in-memory zone loading/updating

Wed Jun 27 18:40:46 UTC 2012

At Mon, 25 Jun 2012 15:02:16 +0200,
Michal 'vorner' Vaner <michal.vaner at nic.cz> wrote:

> > > However:
> > > • The „MemoryEventManager“ seems to suggest there's a main-loopish thing hidden
> > >   inside and that it'll dispatch events somehow.
> > 
> > First, it's "MemoryEventHandler", not manager.  And the event loop is
> > supposed to be outside of the handler.  So, I'm not really sure about
> > your concern, but if you mean the naming is confusing (even after
> > clarifying it's not named "manager"), I'm open to suggestions.  I'm
> > not very good at naming things.
> 
> Yes, I meant the naming confusion. Maybe something like SegmentManager?

I intended to separate the concept of the actual memory management
(which may happen in the same process or in another process) from
the application frontend to the memory management service (which is
generally considered lightweight, like a single call to shm_open or
mmap).  That's why I called it an "(event) handler".  I have no
problem with renaming it if it sounds confusing, but I'd at least like
to keep the two concepts separate.  In that sense, "SegmentManager"
doesn't seem like a good name for the frontend, unless we have a
clearly different name for the instance that actually manages memory.

> I was thinking the auth would still call the manager. But each
> in-memory would register itself within the manager and when the
> manager has a new segment or something, a callback of the in-memory
> is called instead of returning it to auth and auth pushing it to the
> in-memory.

If it can be designed that way cleanly, I agree it's a better design.
I don't quite remember why I ended up with the indirect way in the
initial proposal, but it's maybe because I was not sure about the
layering relationship between auth, Segment(handler/manager), and
datasource-client.
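
To make sure we're talking about the same thing, here's a rough sketch
of the callback style in Python.  It's purely illustrative: the names
SegmentManager, register, segment_ready, InMemoryClient and
on_new_segment are all made up for this example and don't come from
the proposal.

```python
class SegmentManager:
    """Hypothetical frontend that only tracks who wants new segments."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        # Each in-memory data source registers itself once.
        self._callbacks.append(callback)

    def segment_ready(self, segment):
        # Called once a new segment is fully built; auth is not involved
        # in handing the segment over.
        for callback in self._callbacks:
            callback(segment)


class InMemoryClient:
    """Hypothetical in-memory data source client."""

    def __init__(self, manager):
        self.segment = None
        manager.register(self.on_new_segment)

    def on_new_segment(self, segment):
        # Switch to the new segment; queries after this point see it.
        self.segment = segment
```

With this shape auth only needs to tell the manager to load or update
something; it never touches the segment itself.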

> Also, the manager, would it be possible to have some kind of copy-on-write
> memory pages, so we don't have to keep two full copies?

Now I understand you are assuming we use a dedicated memory segment
via mmap or shmem for the single-process case.  That's probably
possible (when the OS supports copy-on-write).  The main question is
whether we use such a segment or the process-wide global heap as an
emulation of a "segment".  I actually thought the local version would
use the latter.
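
For reference, this is the kind of copy-on-write behavior I have in
mind, shown with Python's mmap module and its ACCESS_COPY mode (a
private mapping whose writes don't reach the underlying file).  It's
only a toy under my own assumptions: a temporary file stands in for
the zone image, and cow_demo is a name I invented.

```python
import mmap
import os
import tempfile


def cow_demo():
    # A temporary file stands in for the current zone image.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"version-1")
        # ACCESS_COPY requests a private, copy-on-write mapping:
        # writes change our view of the data but not the file.
        view = mmap.mmap(fd, 0, access=mmap.ACCESS_COPY)
        view[8:9] = b"2"
        modified = bytes(view[:])
        view.close()
        # Re-read the file to show that it was left untouched.
        os.lseek(fd, 0, os.SEEK_SET)
        on_disk = os.read(fd, 9)
        return modified, on_disk
    finally:
        os.close(fd)
        os.unlink(path)
```

Only the pages that are actually written get copied, so in principle
an update touching a small part of a zone wouldn't need a second full
copy.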

> > Aside from these difficulties, however, always doing it asynchronously
> > may make sense; then the caller side of code can be unified.
> 
> It is maybe not worth it, when you pointed out the problem with
> partial update.

We could still say it's the segment manager/handler's responsibility to
ensure the version integrity of underlying in-memory zones, i.e.,
whenever an increment step is completed, the data source client can
safely assume the version of the in-memory zone it's currently
accessing is complete.  But this is just a comment - I don't have a
particular preference on this point.
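
In case it helps, the guarantee I'm describing could look roughly like
the following Python sketch.  VersionedZone, snapshot and apply_step
are hypothetical names, and a plain dict stands in for the real zone
data.

```python
class VersionedZone:
    """Hypothetical holder that only ever exposes complete versions."""

    def __init__(self, records, serial):
        self._current = (serial, dict(records))

    def snapshot(self):
        # What the data source client reads: always a finished version.
        return self._current

    def apply_step(self, new_serial, changes):
        # Build the next version off to the side ...
        _, records = self._current
        draft = dict(records)
        for name, value in changes.items():
            if value is None:
                draft.pop(name, None)   # None marks a deletion
            else:
                draft[name] = value
        # ... and publish it with a single reference assignment, so a
        # reader never observes a half-applied step.
        self._current = (new_serial, draft)
```

This toy copies the whole dict on every step, which is exactly the
cost we'd want to avoid in practice; it only illustrates where the
responsibility would sit.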

> > > • Would releasing really be time consuming? Wouldn't just dropping the memory
> > >   segment be enough? That would be fast.
> > 
> > I'm afraid it can be time consuming at a non-negligible level if it's
> > built locally as we'll need to go through the entire tree and the node
> > data, releasing corresponding memory chunk one by one.  Applying an
> > analogy with experience with BIND 9, that could be 10sec-ish task.
> > 
> > ...or, perhaps you are assuming we use process-local, exclusive memory
> > segments preallocated by mmap or something?
> 
> Yes. In that case, we could just munmap it and not worry about what
> was inside.

Hmm, that's cool, although we'll then need more careful memory
management (it's quite likely we'll need multiple zone segments, and
we'll need to handle the situation where the originally mapped area
becomes full while there's still available memory).  And, if we do it
that way for the "local manager", the end result will really be pretty
close to the shared memory (managed by a separate process) version.
So, whether we want to try to realize that one first may be more of a
concern.
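
To illustrate both the cheap release and the extra bookkeeping, here's
a minimal sketch in Python using anonymous mmap regions.  SegmentPool
and its methods are names I made up, and the bump-pointer allocation
is just the simplest thing that shows the "mapped area becomes full"
case; a real allocator would need much more.

```python
import mmap


class SegmentPool:
    """Hypothetical pool of anonymous mmap'ed segments for one zone."""

    def __init__(self, segment_size=mmap.PAGESIZE):
        self._segment_size = segment_size
        self._segments = []
        self._used = segment_size   # forces a mapping on first allocate

    @property
    def segment_count(self):
        return len(self._segments)

    def allocate(self, size):
        if size > self._segment_size:
            raise ValueError("request larger than a segment")
        if self._used + size > self._segment_size:
            # Current segment is full: map another one.
            self._segments.append(mmap.mmap(-1, self._segment_size))
            self._used = 0
        offset = self._used
        self._used += size
        return len(self._segments) - 1, offset

    def write(self, handle, data):
        index, offset = handle
        self._segments[index][offset:offset + len(data)] = data

    def read(self, handle, size):
        index, offset = handle
        return bytes(self._segments[index][offset:offset + size])

    def release(self):
        # Unmap whole segments; no walk over the tree or node data.
        for segment in self._segments:
            segment.close()
        self._segments.clear()
        self._used = self._segment_size
```

Note that release() costs one unmap per segment regardless of how many
nodes were stored, which is the attraction; the handles (segment
index, offset) are the price, since ordinary pointers no longer work.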

> > One obvious concern is the "update" case (see the IXFR-case discussion
> > above).  If the updating thread modifies the zone data used by the
> > main auth thread (responding to queries) the contention will cause a
> > disaster.  We should (probably) also think about the case where
> > multiple replacement requests happen at the same time (may or may not
> > be an issue).
> 
> These don't have the continuation object, so they would run synchronously,
> without a thread.

So, are you assuming in this model we use a separate thread for the
time-consuming operations and a different thread (which would actually
be the main thread) for updates?  That's one possible way.  I'm not
sure about the complexity it adds (managing two different updaters),
though.
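
If I understand the model correctly, it would be something like this
Python sketch.  Updater, request and _install are hypothetical names,
and the lock is my own addition to keep the two paths from installing
at the same time.

```python
import threading


class Updater:
    """Hypothetical updater with a synchronous and a threaded path."""

    def __init__(self):
        self._lock = threading.Lock()
        self.zone = {}

    def request(self, build, continuation=None):
        if continuation is None:
            # No continuation object: run in the caller's (main) thread.
            self._install(build())
            return None

        def worker():
            # Time-consuming case: build in a separate thread, then
            # signal completion through the continuation.
            self._install(build())
            continuation()

        thread = threading.Thread(target=worker)
        thread.start()
        return thread

    def _install(self, new_zone):
        # Serialize installs coming from either path.
        with self._lock:
            self.zone = new_zone
```

Even in this toy there are two code paths to keep consistent, and it
ignores how the continuation gets back to the main event loop, which
is the part I'm worried about.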

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.