[bind10-dev] shared memory data source: file-based (mmap) vs memory-only

JINMEI Tatuya / 神明達哉 jinmei at isc.org
Thu Feb 28 19:03:39 UTC 2013


At Thu, 28 Feb 2013 08:36:08 +0100,
Michal 'vorner' Vaner <michal.vaner at nic.cz> wrote:

> > I personally prefer the file-based approach, because this can work as
> > a persistent on-disk copy that enables nearly-zero (hot start case) or
> > a few-to-10-ish seconds (cold start case, with some initial work) of
> > start up time.  The major disadvantage is management overhead of the
> > files and the risk of having corrupt on-disk images.
> > 
> > Do others have opinions/preferences on this?
> 
> The file-based one does seem to have some advantages here. So I think we could
> use them. Only one thing ‒ I believe the memmgr (or how we'll call the process
> managing the memory segments) should mostly auto-configure. I don't want to
> force the user to specify a filename for each zone being cached. I'd see that as
> setting the directory where the images should reside and examining the list of
> files per zone, optionally with some commands (re-create data for this zone,
> remove this no-longer-used zone, etc).

Yeah, that makes sense.
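
As a rough sketch of that kind of auto-configuration (the "<zone>.img"
naming convention and the function name are hypothetical, not something
we've agreed on), the memmgr could derive the zone list from the
contents of the configured directory:

```python
import os

IMAGE_SUFFIX = ".img"  # hypothetical naming convention: "<zone name>.img"

def discover_zone_images(image_dir):
    """Map each zone name to the image file found for it in image_dir."""
    images = {}
    for entry in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, entry)
        # Ignore subdirectories and files that don't follow the convention.
        if entry.endswith(IMAGE_SUFFIX) and os.path.isfile(path):
            zone = entry[:-len(IMAGE_SUFFIX)]
            images[zone] = path
    return images
```

The optional commands (re-create, remove) would then operate on
entries of this mapping rather than on user-supplied filenames.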

> But, I'm slightly worried about some embedded devices. They might have more RAM
> than disk space, and frequently rewriting the disk doesn't seem right either.

I suspect in such cases they are generally okay with the non-shared
memory mode (I assume we'll keep it).  Although RAM can also be
relatively scarce on embedded systems, the memory overhead won't
become a real issue unless they run multiple instances of auth.

> Also, I don't know if the file-based approach might slow things down (imagine
> you load the zone from DB to the image, then you are waiting both for reads
> from the DB and for writes to the file).

This is probably something we need to keep in mind as we design and
implement the details.  I also think it's related to how we make
updates smooth.  See also below.

> Would it be very problematic to support both ways? Or, at least, to design it
> so that most of the code could be reused when adding the second one?

It wouldn't be that hard, as the Boost managed_xxx classes
(managed_mapped_file, managed_shared_memory) have largely consistent
interfaces.
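
To illustrate the idea only: the sketch below uses Python's mmap
module as a stand-in for the Boost classes, and an anonymous mapping
as a stand-in for a memory-only segment (a real one would have to be
a named shared memory object so that other processes can attach to
it).  The function names are made up for this example.  The point is
that only the code creating the segment needs to know which backing
is used:

```python
import mmap
import os

def open_segment(size, path=None):
    """Return a writable mapping of `size` bytes.

    File-backed if `path` is given, anonymous (memory-only) otherwise.
    """
    if path is None:
        return mmap.mmap(-1, size)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)      # make the file exactly `size` bytes
        return mmap.mmap(fd, size)  # mmap keeps its own duplicate of fd
    finally:
        os.close(fd)

# Everything below is independent of the backing store.
def store_record(segment, offset, payload):
    segment[offset:offset + len(payload)] = payload

def load_record(segment, offset, length):
    return bytes(segment[offset:offset + length])
```

With this split, adding the second backing later would only touch
the equivalent of open_segment().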

> > On a related note: I remember there was a mention of the use of
> > copy-on-write when updating the current version of the shared memory
> > image.  As far as I can see, we can't benefit from it, at least not if
> > we use the above Boost utility classes.  We'll first need to make a
> > complete copy of the current image, either on disk or in memory, and
> > then update the copied version.
> 
> That looks very unfortunate. Imagine a large zone, let's say the com. Doing a
> copy when one small change comes is too slow (there are probably many changes
> per minute and doing a complete copy would take more than a minute). Also, that
> would take twice as much space (on disk or in memory).
> 
> I'm wondering if anything of this would work:
>  • Open in auths in read-only or copy on write and hope the changes will not
>    propagate until we map again.

The tricky point is how to remap the new version.  I couldn't find a
way to implement it without dumping the entire new version to a file
(in the case of the file-based approach) or copying it to a newly
created in-memory segment of that size.  Either way, while
copy-on-write itself may work, we'd lose its major advantage.
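
A small sketch of the problem, again using Python's mmap rather than
the Boost classes (the function name and the file layout are
hypothetical).  A private (copy-on-write) mapping lets the updater
change a few bytes cheaply, but those changes are visible to no one
else; to publish them, the whole image has to be written out:

```python
import mmap

def update_via_private_copy(image_path, new_path, offset, data):
    """Patch a copy-on-write mapping of image_path, then publish it.

    Returns the number of bytes written to new_path.
    """
    with open(image_path, "rb") as f:
        # ACCESS_COPY: writes touch only this process's private pages,
        # never the underlying file.
        private = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
    try:
        private[offset:offset + len(data)] = data  # the cheap part
        with open(new_path, "wb") as out:
            out.write(private[:])                  # the full dump
            return out.tell()
    finally:
        private.close()
```

The cost of the final write is proportional to the image size, not
to the size of the change, which is exactly what we wanted to avoid.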

>  • Use some FS that uses copy-on-write internally.
>  • Keep two copies all the time, one active and one inactive. Apply changes to
>    the inactive, then switch. Then catch up (apply all the same changes to the now
>    inactive one) and start applying new changes to it. This solves the speed,
>    but not the size.

I'm assuming we'll use this last option.  Assuming also that very
large zones are usually updated incrementally (and that we
(eventually) use separate segments on systems managing a large number
of zones), this will address the concern about the file-based
approach as well: an incremental update shouldn't require a full copy
of the segment, so it should be reasonably lightweight.
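
A minimal sketch of that two-copy scheme, with plain dicts standing
in for the two segments (the class and method names are made up for
illustration, and a change with a value of None means "delete"):

```python
class DoubleBufferedZone:
    """Keep two copies of the zone data; readers only use the active one."""

    def __init__(self, initial):
        self._copies = [dict(initial), dict(initial)]
        self._active = 0

    @property
    def active(self):
        return self._copies[self._active]

    @property
    def inactive(self):
        return self._copies[1 - self._active]

    @staticmethod
    def _apply(copy, changes):
        for name, value in changes:
            if value is None:
                copy.pop(name, None)
            else:
                copy[name] = value

    def update(self, changes):
        changes = list(changes)
        # 1. Apply the changes to the copy nobody is reading.
        self._apply(self.inactive, changes)
        # 2. Switch: readers pick up the updated copy from now on.
        self._active = 1 - self._active
        # 3. Catch up the copy that just became inactive.  A real
        #    implementation must first wait until no reader still
        #    has it mapped.
        self._apply(self.inactive, changes)
```

Each update costs two applications of the change set, regardless of
the zone size; the price is keeping two full copies around.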

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.

