[bind10-dev] Multicore Auth

Wed Jul 13 10:33:03 UTC 2011

On 12/07/2011 22:52, Michal 'vorner' Vaner wrote:
> • Multiple independent processes with shared in-memory data source. It is the
>   above variant, but the in-memory data source contains some internal logic to
>   have the data in shared memory. This needs some locking in the in-memory data
>   source and allocating data in some manner, some arbiter process updating it,
>   etc. There's some added complexity, but solves the problem of loading the same
>   data multiple times. All the complexity is located in one place ‒ the
>   in-memory data source, leaving the rest of the program as is and simple. In
>   addition, this can be gradually created from the above version (just start
>   multiple processes for now and care about only one instance of data later).
> 
>   The disadvantage here is probably just the complexity, but having to take care
>   of shared memory the correct way is said to be hard (and there might be
>   problems when the system is not shut down correctly, it might stay allocated).

A variant of this would be to use two copies of the data:

The server processes map one copy and answer queries from it.  There is
no locking as all access is read-only.

When an update comes in, the updater process updates the second copy.
It then signals the server processes which, in sequence, stop accepting
queries and map the new data source when all in-progress queries have
finished. (A variant would be to start a new copy of the server using
the new data and to kill the old copy when all in-progress queries have
finished.)

When all server processes have switched to the second copy of the data,
the update process updates the first copy and the cycle begins again.

It is a compromise - it requires two copies of the data and requires
that updates be batched.  But it does avoid the need for locking data
between readers and writers.
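
A rough sketch of the two-copy cycle, for illustration only.  It assumes
POSIX mmap() and that each copy is an ordinary file; the names writeCopy
and ZoneView are made up for this example and are not existing BIND 10
code.  The signalling between updater and servers is left out.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Updater side (hypothetical helper): rewrite the copy that no server
// process currently has mapped.  Truncating a file that is still mapped
// elsewhere would be an error in the protocol, not something handled here.
void writeCopy(const std::string& path, const std::string& data) {
    int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) throw std::runtime_error("cannot open " + path);
    std::size_t done = 0;
    while (done < data.size()) {
        ssize_t n = write(fd, data.data() + done, data.size() - done);
        if (n <= 0) { close(fd); throw std::runtime_error("write failed"); }
        done += static_cast<std::size_t>(n);
    }
    close(fd);
}

// Server side (hypothetical class): a read-only view of one copy.
class ZoneView {
public:
    ZoneView() : data_(nullptr), size_(0) {}
    ~ZoneView() { unmap(); }
    ZoneView(const ZoneView&) = delete;
    ZoneView& operator=(const ZoneView&) = delete;

    // Map the given copy read-only and drop the previous mapping.
    // The caller must do this only when no query is in progress.
    void switchTo(const std::string& path) {
        int fd = open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("cannot open " + path);
        struct stat st;
        if (fstat(fd, &st) != 0 || st.st_size == 0) {
            close(fd);
            throw std::runtime_error("cannot size " + path);
        }
        std::size_t len = static_cast<std::size_t>(st.st_size);
        void* p = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);  // the mapping stays valid after the descriptor is closed
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        unmap();
        data_ = static_cast<const char*>(p);
        size_ = len;
    }

    // Stand-in for answering a query from the mapped data.
    std::string contents() const {
        assert(data_ != nullptr);
        return std::string(data_, size_);
    }

private:
    void unmap() {
        if (data_ != nullptr) munmap(const_cast<char*>(data_), size_);
        data_ = nullptr;
        size_ = 0;
    }
    const char* data_;
    std::size_t size_;
};
```

A server would call switchTo() on the second copy when signalled, and
the updater would call writeCopy() on the first copy only after every
server had switched.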

> 
> • Multiple processes forked from some kind of parent. The parent loads all the
>   data and forks itself multiple times. The children have a (read only) copy of
>   the in-memory data source, the other things stay the same as now and happily
>   serve answers from there. The OS takes care of the copy-on-writing the pages,
>   making the in-memory data effectively shared memory. If the data change, the
>   parent just modifies its copy, kills all the children and starts them again
>   with new copy. No need to lock or synchronise anything.
> 
>   The disadvantages are two. One of them is, the C++ allocator kind of likes to
>   put data wherever it feels fit, which will fragment the in-memory data and mix
>   it with other variables over time. We either need to provide our own allocator
>   for this to make sure the data stay together (same as with the shared memory
>   version).

There are pitfalls with C++ allocators, e.g. the STL assumes that all
allocators of a particular type are equivalent.  So if we (using the
example given in Meyers's book "Effective STL") splice elements from
list L1 (which uses allocator A) into L2 (which uses allocator B), then
when L2 is destroyed the elements that came from L1 will be destroyed
using L2's allocator, not the one that allocated them.

We can do it, but we do have to be careful.
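
One way of being careful, as a sketch only: compare the allocators before
splicing and fall back to copying when they differ.  This assumes a
C++11 or later compiler (where allocators may carry state and splicing
between lists with unequal allocators is undefined behaviour).
RegionAllocator and moveAll are hypothetical names, and the allocator
here just tags a region id and calls operator new; a real one would
allocate from the shared region.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <new>

// Hypothetical allocator tagged with the id of the memory region it
// would draw from.  For the sketch it simply uses the global heap.
template <typename T>
struct RegionAllocator {
    typedef T value_type;
    int region;

    explicit RegionAllocator(int r = 0) : region(r) {}

    // Needed so the list can rebind the allocator to its node type.
    template <typename U>
    RegionAllocator(const RegionAllocator<U>& other) : region(other.region) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

// Two allocators are interchangeable only if they use the same region.
template <typename T, typename U>
bool operator==(const RegionAllocator<T>& a, const RegionAllocator<U>& b) {
    return a.region == b.region;
}
template <typename T, typename U>
bool operator!=(const RegionAllocator<T>& a, const RegionAllocator<U>& b) {
    return !(a == b);
}

typedef std::list<int, RegionAllocator<int> > IntList;

// Move every element of src to the end of dst.  Nodes are spliced only
// when both lists share an allocator; otherwise the values are copied
// into nodes owned by dst and the originals are freed by src.
// Returns true if the nodes were spliced.
bool moveAll(IntList& dst, IntList& src) {
    if (dst.get_allocator() == src.get_allocator()) {
        dst.splice(dst.end(), src);
        return true;
    }
    dst.insert(dst.end(), src.begin(), src.end());
    src.clear();
    return false;
}
```

The copying path costs an allocation per element, but each node is then
freed by the allocator that created it.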

> 
>   The other one is the glorious Windows, which doesn't support fork (as some
>   people point out, because it's too sharp and users might get hurt and they
>   should use spoon instead).

I think it was because, more than 99% of the time, the first thing the
child does after a fork is to exec a new image. So why go through the
process of duplicating the address space only to throw it away? Instead
they adopted a different approach.  As the man page for vfork() says:

"Under Linux, fork() is implemented using copy-on-write pages, so the
only penalty incurred by fork() is the time and memory required to
duplicate the parent's page tables, and to create a unique task
structure for the child. However, in the bad old days a fork() would
require making a complete copy of the caller's data space, often
needlessly, since usually immediately afterwards an exec() is done.
Thus, for greater efficiency, BSD introduced the vfork() system call,
that did not fully copy the address space of the parent process, but
borrowed the parent's memory and thread of control until a call to
execve() or an exit occurred. The parent process was suspended while the
child was using its resources. The use of vfork() was tricky: for
example, not modifying data in the parent process depended on knowing
which variables are held in a register."

But in answer to how to do it on Windows, Windows does support
copy-on-write - e.g. see http://support.microsoft.com/kb/103858 and
http://msdn.microsoft.com/en-us/library/aa366761(v=vs.85).aspx

The difference is that instead of forking a process to duplicate the
address space, you explicitly map a shared memory object (which could be
backed by a file or by the paging file).  All processes mapping the
object map to the same physical pages.  If the object is mapped with the
FILE_MAP_COPY flag, any write by a process to that region of memory
causes a process-private copy of the affected page to be created.
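
For illustration only, the same copy-on-write behaviour can be seen with
the POSIX analogue, mmap() with MAP_PRIVATE.  This is not the Windows
API: it is a sketch that assumes a POSIX system, and the function name
privateWriteStaysPrivate is made up for the example.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Creates a small file at 'path', maps it twice (once private, once
// shared), writes through the private mapping and reports whether the
// shared view of the file was left untouched.  The file is removed
// before returning.
bool privateWriteStaysPrivate(const std::string& path) {
    const char text[] = "original";
    const std::size_t len = sizeof(text) - 1;

    int fd = open(path.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) throw std::runtime_error("cannot open " + path);
    if (write(fd, text, len) != static_cast<ssize_t>(len)) {
        close(fd);
        unlink(path.c_str());
        throw std::runtime_error("write failed");
    }

    // Private mapping: writes go to a process-private copy of the page.
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    // Shared mapping: always reflects the contents of the file itself.
    void* s = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED || s == MAP_FAILED) {
        if (p != MAP_FAILED) munmap(p, len);
        if (s != MAP_FAILED) munmap(s, len);
        unlink(path.c_str());
        throw std::runtime_error("mmap failed");
    }

    char* priv = static_cast<char*>(p);
    const char* shared = static_cast<const char*>(s);

    priv[0] = 'X';  // triggers the copy; the file is not modified

    bool isolated = (priv[0] == 'X') && (shared[0] == 'o');

    munmap(p, len);
    munmap(s, len);
    unlink(path.c_str());
    return isolated;
}
```

On Windows the corresponding step would be mapping a view of the object
with FILE_MAP_COPY, as described above.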

This suggests a similar mode of operation to that described above - have
an updater process modify the data using copy-on-write, then write its
data to a new shared memory object.  The server processes are signalled
to map the new object once all their current queries are complete.

Stephen