BIND 10 trac2775, updated. bc9125aec50bb75fcf601eb2e0c9bcb5dd3a1a1b [2775] Sharing cache

BIND 10 source code commits bind10-changes at lists.isc.org
Mon Mar 11 10:15:51 UTC 2013


The branch, trac2775 has been updated
       via  bc9125aec50bb75fcf601eb2e0c9bcb5dd3a1a1b (commit)
      from  9869caead03573be43747c30c4f38929c975b409 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit bc9125aec50bb75fcf601eb2e0c9bcb5dd3a1a1b
Author: Michal 'vorner' Vaner <michal.vaner at nic.cz>
Date:   Mon Mar 11 11:15:39 2013 +0100

    [2775] Sharing cache

-----------------------------------------------------------------------

Summary of changes:
 doc/design/resolver/01-scaling-across-cores |   75 ++++++++++++++++++++++++++-
 1 file changed, 74 insertions(+), 1 deletion(-)

-----------------------------------------------------------------------
diff --git a/doc/design/resolver/01-scaling-across-cores b/doc/design/resolver/01-scaling-across-cores
index 95f1e91..dbd962f 100644
--- a/doc/design/resolver/01-scaling-across-cores
+++ b/doc/design/resolver/01-scaling-across-cores
@@ -271,4 +271,77 @@ could not fight over the query.
 [NOTE]
 This model would work only with threads, not processes.
 
-TODO: The shared caches
+Shared caches
+-------------
+
+While it seems good to have some sort of L1 cache with pre-rendered answers
+(according to measurements in the #2777 ticket), we probably also need some
+kind of larger shared cache.
+
+If we had just a single shared cache protected by one lock, there would be a
+lot of contention on that lock.
+
+Partitioning the cache
+~~~~~~~~~~~~~~~~~~~~~~
+
+We could split the cache into parts, either into layers or into parallel
+chunks selected by a hash of the key. Taken to the extreme, a separate lock
+on each hash bucket would be an instance of this approach, though that might
+waste resources (how expensive is it to create a lock?).
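+
+As a minimal illustrative sketch (the class name and the partition count are
+made up for this document, not an existing interface), per-bucket locking
+could look roughly like this:
+
+[source,cpp]
+----
+// Hypothetical sketch: a cache split into a fixed number of partitions,
+// each guarded by its own mutex and selected by a hash of the key.
+#include <array>
+#include <functional>
+#include <map>
+#include <mutex>
+#include <string>
+
+class PartitionedCache {
+public:
+    bool lookup(const std::string& key, std::string& value) {
+        Partition& p = partitionFor(key);
+        std::lock_guard<std::mutex> guard(p.mutex_);
+        const auto it = p.entries_.find(key);
+        if (it == p.entries_.end()) {
+            return (false);
+        }
+        value = it->second;
+        return (true);
+    }
+    void store(const std::string& key, const std::string& value) {
+        Partition& p = partitionFor(key);
+        std::lock_guard<std::mutex> guard(p.mutex_);
+        p.entries_[key] = value;
+    }
+private:
+    static const size_t PARTITIONS = 64;    // the count is arbitrary here
+    struct Partition {
+        std::mutex mutex_;
+        std::map<std::string, std::string> entries_;
+    };
+    Partition& partitionFor(const std::string& key) {
+        return (partitions_[std::hash<std::string>()(key) % PARTITIONS]);
+    }
+    std::array<Partition, PARTITIONS> partitions_;
+};
+----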
+
+Landlords
+~~~~~~~~~
+
+The landlords would do the synchronization themselves. Still, the cache would
+need to be partitioned.
+
+RCU
+~~~
+
+RCU (read-copy-update) is a lock-less synchronization mechanism. An item is
+accessed through a pointer. An updater creates a copy of the structure (in
+our case, it would be the content of a single hash bucket) and then atomically
+replaces the pointer. Readers that started before the swap keep the old
+version, new ones get the new version. When all the old readers are finished,
+the old copy is reclaimed. The reclamation can AFAIK also be postponed to a
+later, more idle time, or handed off to a different thread.
+
+We could use it for the cache ‒ on the fast path, we would just read the
+cache. On the slow path, the update would have to wait in a queue for a
+single updater thread (because we don't really want to be updating the same
+cell twice at the same time).
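+
+As a rough sketch of the access pattern (only an approximation: a shared_ptr
+with atomic load/store stands in for a real RCU implementation, and the names
+are made up):
+
+[source,cpp]
+----
+// Hypothetical sketch of RCU-style access to one hash bucket, approximated
+// with C++11 atomic operations on a shared_ptr.
+#include <memory>
+#include <vector>
+
+struct Bucket {
+    std::vector<int> records;   // stand-in for the cached records
+};
+
+std::shared_ptr<const Bucket> bucket_ptr = std::make_shared<Bucket>();
+
+// Fast path: a reader just grabs the current snapshot, no lock taken.
+std::shared_ptr<const Bucket> readBucket() {
+    return (std::atomic_load(&bucket_ptr));
+}
+
+// Slow path, run only in the single updater thread: copy the bucket,
+// modify the copy and atomically publish it.  Readers that started earlier
+// keep the old snapshot; it is reclaimed when the last of them drops it.
+void updateBucket(int new_record) {
+    std::shared_ptr<Bucket> copy =
+        std::make_shared<Bucket>(*std::atomic_load(&bucket_ptr));
+    copy->records.push_back(new_record);
+    std::atomic_store(&bucket_ptr, std::shared_ptr<const Bucket>(copy));
+}
+----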
+
+Proposals
+---------
+
+In any case, we would have some kind of L1 cache with pre-rendered answers.
+For these proposals (except the third), it would not matter whether we split
+the cache into parallel chunks or into layers.
+
+Hybrid RCU/Landlord
+~~~~~~~~~~~~~~~~~~~
+
+This is the landlord approach, except that read-only accesses to the cache
+are done directly by the peasants. Only if they don't find what they want
+would they append a task to the landlord's queue. The landlord would be doing
+the RCU updates. It could happen that by the time the landlord gets to the
+task the answer is already there, but that would not matter much.
+
+Access to the network would also be done by the landlords.
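+
+A possible shape of the peasant/landlord hand-off, purely illustrative (the
+queue type and the names are assumptions, not an existing interface):
+
+[source,cpp]
+----
+// Hypothetical sketch: peasants read the cache directly (the RCU fast path)
+// and queue a task for the landlord only on a miss.  The landlord is the
+// single updater that publishes RCU updates and talks to the network.
+#include <condition_variable>
+#include <deque>
+#include <mutex>
+#include <string>
+
+struct MissTask {
+    std::string query_name;     // what the peasant failed to find
+};
+
+class LandlordQueue {
+public:
+    void push(const MissTask& task) {       // called by peasants
+        std::lock_guard<std::mutex> guard(mutex_);
+        tasks_.push_back(task);
+        ready_.notify_one();
+    }
+    MissTask pop() {                        // called only by the landlord
+        std::unique_lock<std::mutex> guard(mutex_);
+        while (tasks_.empty()) {
+            ready_.wait(guard);
+        }
+        MissTask task = tasks_.front();
+        tasks_.pop_front();
+        return (task);
+    }
+private:
+    std::mutex mutex_;
+    std::condition_variable ready_;
+    std::deque<MissTask> tasks_;
+};
+
+// The landlord loop would pop() a task, re-check the cache (the answer may
+// have appeared in the meantime), and otherwise query upstream and publish
+// the result with an RCU update as described above.
+----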
+
+Coroutines+RCU
+~~~~~~~~~~~~~~
+
+We would use the coroutines, and reads from the shared cache would go without
+locking. When doing a write, we would have to take a lock.
+
+To avoid further locking, each worker thread would have its own set of
+upstream sockets, and we would dup() the sockets from users so we don't have
+to lock those either.
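+
+A small illustration of the socket duplication (plain POSIX dup(); the rest
+of the worker logic is omitted and the function name is made up):
+
+[source,cpp]
+----
+// Hypothetical sketch: each worker thread dup()s the client socket so it
+// can send its answer over its own descriptor, without sharing (and
+// therefore locking) the original one.
+#include <unistd.h>
+
+int duplicateForWorker(const int client_fd) {
+    const int worker_fd = dup(client_fd);   // independent descriptor,
+                                            // same underlying socket
+    if (worker_fd == -1) {
+        return (-1);                        // real code would report errno
+    }
+    return (worker_fd);
+}
+
+// Each worker thread would also create its own set of upstream sockets at
+// startup, so sending queries upstream needs no locking either.
+----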
+
+Multiple processes with coroutines and RCU
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This would need the layered cache. The upper caches would be mapped into
+local memory for read-only access. Each cache would be owned by a separate
+process. That process would do the updates ‒ if the answer was not there, it
+would be asked over some kind of IPC to pull the answer from the upstream
+cache or from the network.
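+
+One conceivable way to get the read-only mapping is POSIX shared memory; the
+segment name below is made up and error handling is shortened:
+
+[source,cpp]
+----
+// Hypothetical sketch: map an upper-layer cache segment read-only into the
+// current process.  The owning cache process would create and update the
+// segment; readers only ever map it with PROT_READ.
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+
+#include <cstddef>
+
+const void*
+mapUpperCache(const size_t size) {
+    const int fd = shm_open("/b10-upper-cache", O_RDONLY, 0);
+    if (fd == -1) {
+        return (NULL);
+    }
+    void* addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
+    close(fd);
+    if (addr == MAP_FAILED) {
+        return (NULL);
+    }
+    return (addr);
+}
+----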


