[bind10-dev] Multicore Auth

Michal 'vorner' Vaner michal.vaner at nic.cz
Tue Jul 12 21:52:06 UTC 2011


Hello

I think I promised to bring this up during the previous planning call, but I
forgot about it, so here it is at last. One of the goals of the next release is
using multiple cores for Auth. But we haven't decided how to do that. There was
an old discussion here:

https://lists.isc.org/pipermail/bind10-dev/2010-December/001738.html

To sum things up a little bit, AFAIK these were the ideas:
• Threads. A quite common way to use multiple cores; every developer knows what
  is going on there. There could be multiple worker threads looking up the
  answers in some backend and sending them out. There are some questions about
  whether there's an "updater" thread or some other kind of "housekeeping"
  thread, but that's probably a detail.

  This has a disadvantage: it needs locking, and working with threads is
  generally error prone. None of our current code in Auth was written with
  threads in mind, so we would need to go through it and make sure it is thread
  safe and locks where it needs to.
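
  Something like this (just an illustrative sketch, nothing from our code base;
  lookup() is a made-up stand-in for whatever data source call builds the
  answer, and I use C++11 threads for brevity):

    #include <string>
    #include <thread>
    #include <mutex>
    #include <vector>
    #include <sys/socket.h>

    std::string lookup(const char* query, size_t len);  // hypothetical

    std::mutex datasrc_lock;  // protects the (not yet thread safe) data source

    void worker(int sock) {
        char buf[4096];
        for (;;) {
            sockaddr_storage from;
            socklen_t fromlen = sizeof(from);
            ssize_t len = recvfrom(sock, buf, sizeof(buf), 0,
                                   reinterpret_cast<sockaddr*>(&from),
                                   &fromlen);
            if (len < 0) {
                continue;
            }
            std::string answer;
            {
                // One big lock around every lookup; finding finer-grained
                // locking is exactly the hard part mentioned above.
                std::lock_guard<std::mutex> guard(datasrc_lock);
                answer = lookup(buf, len);
            }
            sendto(sock, answer.data(), answer.size(), 0,
                   reinterpret_cast<sockaddr*>(&from), fromlen);
        }
    }

    void run_workers(int sock, int n) {
        std::vector<std::thread> threads;
        for (int i = 0; i < n; ++i) {
            threads.emplace_back(worker, sock);
        }
        for (size_t i = 0; i < threads.size(); ++i) {
            threads[i].join();
        }
    }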

• Multiple independent processes, sharing only the socket they read from. This
  is simple to do (even more so once we have the socket creator; the same socket
  would just be passed to multiple processes, see the sketch after this item),
  and it has the advantage of isolating them, so if one crashes, the others can
  go on.

  Disadvantages: first, our configuration system is designed around having
  exactly one instance of each module. We need to solve this somehow anyway,
  since we often have the problem of having no instance of a module running.

  Another one is that the in-memory data source would load the data n times
  into memory. This needlessly eats RAM (which users won't like) and there's a
  performance penalty too (having the same data in memory multiple times and
  accessing all the copies will make the instances kick each other out of the
  L3 caches). But, to be fair, this might on the other hand help NUMA systems
  (each copy of the data can live in the RAM module nearest to the CPU needing
  it, which the OS tries hard to ensure).
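
  For illustration, this is roughly what the fd passing in the socket creator
  would look like (a generic POSIX sketch, not anything we have written):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    // Pass one already-bound socket to another process over a UNIX
    // domain socket, using an SCM_RIGHTS control message.
    int send_fd(int channel, int fd_to_pass) {
        char dummy = 0;
        iovec iov = { &dummy, 1 };  // at least one byte must be sent
        char control[CMSG_SPACE(sizeof(int))];
        msghdr msg = {};
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = control;
        msg.msg_controllen = sizeof(control);

        cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

        return sendmsg(channel, &msg, 0) < 0 ? -1 : 0;
    }

  The receiving process gets a new descriptor referring to the very same
  socket, so all of them compete for the same incoming queries.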

• Multiple independent processes with a shared in-memory data source. This is
  the above variant, but the in-memory data source contains some internal logic
  to keep the data in shared memory. This needs some locking inside the
  in-memory data source, allocating the data in some specific manner, some
  arbiter process updating it, etc. There's some added complexity, but it
  solves the problem of loading the same data multiple times. All the
  complexity is located in one place, the in-memory data source, leaving the
  rest of the program as simple as it is now. In addition, this can be built
  gradually from the above version (just start multiple processes now and care
  about having only one instance of the data later).

  The disadvantage here is probably just the complexity; getting shared memory
  right is said to be hard (and there might be problems when the system is not
  shut down correctly, because the segment might stay allocated).
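
  To be concrete, the mechanics would be roughly this (a POSIX shm sketch; the
  segment name and size are made up):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    const char* kShmName = "/b10-inmem";  // hypothetical segment name
    const size_t kShmSize = 1 << 30;      // hypothetical size, 1 GiB

    // Arbiter side: create and size the segment; the in-memory data
    // source would then build its structures inside this region.
    void* create_segment() {
        int fd = shm_open(kShmName, O_CREAT | O_RDWR, 0600);
        if (fd < 0 || ftruncate(fd, kShmSize) < 0) {
            return nullptr;
        }
        void* p = mmap(nullptr, kShmSize, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);  // the mapping survives closing the fd
        return p == MAP_FAILED ? nullptr : p;
    }

    // Server side: map the same segment read only.
    void* attach_segment() {
        int fd = shm_open(kShmName, O_RDONLY, 0);
        if (fd < 0) {
            return nullptr;
        }
        void* p = mmap(nullptr, kShmSize, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);
        return p == MAP_FAILED ? nullptr : p;
    }

  (The "stays allocated" problem is exactly that the segment lives on until
  someone calls shm_unlink() on it, which nobody does after a crash.)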

• Multiple processes forked from some kind of parent. The parent loads all the
  data and forks itself multiple times. The children get a (read only) copy of
  the in-memory data source, everything else stays the same as now, and they
  happily serve answers from there. The OS takes care of copy-on-writing the
  pages, making the in-memory data effectively shared memory. If the data
  change, the parent just modifies its copy, kills all the children and starts
  them again with the new copy. No need to lock or synchronise anything (a
  sketch is below, after the disadvantages).

  There are two disadvantages. One of them is that the C++ allocator kind of
  likes to put data wherever it sees fit, which will fragment the in-memory
  data and mix it with other variables over time. We would need to provide our
  own allocator to make sure the data stays together (the same as with the
  shared memory version).

  The other one is the glorious Windows, which doesn't support fork (as some
  people point out, because it's too sharp and users might get hurt; they
  should use a spoon instead).
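
  The whole lifecycle would be about this simple (again just a sketch;
  load_data() and serve_forever() are made-up stand-ins):

    #include <signal.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <vector>

    void load_data();      // hypothetical: (re)builds the in-memory data
    void serve_forever();  // hypothetical: answers queries from it

    std::vector<pid_t> start_children(int n) {
        std::vector<pid_t> kids;
        for (int i = 0; i < n; ++i) {
            pid_t pid = fork();
            if (pid < 0) {
                continue;  // fork failed; real error handling elided
            }
            if (pid == 0) {
                serve_forever();  // child: serves from the COW copy
                _exit(0);
            }
            kids.push_back(pid);
        }
        return kids;
    }

    void reload(std::vector<pid_t>& kids, int n) {
        load_data();  // parent updates its own private copy
        for (size_t i = 0; i < kids.size(); ++i) {
            kill(kids[i], SIGTERM);  // retire the old children
            waitpid(kids[i], NULL, 0);
        }
        kids = start_children(n);  // fresh children see the new data
    }

  Note the parent never has to touch the children's copies at all; the
  kernel's copy-on-write does all the sharing for us.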

I believe all of these could be implemented and would serve our purpose. The
question is, which of them looks simplest to you and which do you like? We
probably should decide before we start implementing it. Or should this
discussion be moved to some future sprint?

For the record, my own order of preference would probably be (the one I like
most first):
• Shared memory
• Fork
• Threads
• Multiple processes

Thank you

With regards

-- 
Support your right to arm bears!!

Michal 'vorner' Vaner