BIND 10 #2873: Test the landlord/RCU approach for resolver multi-threading

Mon Aug 12 14:15:09 UTC 2013

#2873: Test the landlord/RCU approach for resolver multi-threading
-------------------------------------+-------------------------------------
            Reporter:  vorner        |                        Owner:  muks
                Type:  task          |                       Status:
            Priority:  medium        |  reviewing
           Component:  resolver      |                    Milestone:
            Keywords:                |  Sprint-20130820
           Sensitive:  0             |                   Resolution:
         Sub-Project:  DNS           |                 CVSS Scoring:
Estimated Difficulty:  6             |              Defect Severity:  N/A
         Total Hours:  0             |  Feature Depending on Ticket:
                                     |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------
Changes (by vorner):

 * owner:  vorner => muks

Comment:

 Hello

 Replying to [comment:8 muks]:
 > * While it uses a landlord, this does not use RCU but uses a structure
 >   lock for interlocking which runs everything in lockstep. In an RCU
 >   implementation, `pop()` will not wait for the body of `push()` to
 >   complete and vice versa. RCU approach's benchmark results will be
 >   different if there's a benefit to using it.

 Actually, the RCU was never meant for the queues, but for the cache.
 Notice that cache read is done in the worker thread (without lock), only
 write is pushed to a landlord ‒ this is simulating the RCU on the cache.

 I don't think it makes sense to put RCU onto the queues. Because you need
 to protect RCU against concurrent writes (by a lock or something) anyway
 and there's non-zero overhead. And because both push and pop are
 effectively write operations, RCU doesn't make much sense.

 Also, the term „in lockstep“ seems to be misleading. The push or pop are
 relatively fast operations, compared with the work done with the whole
 batch. So most of the time the queue should be unlocked and both threads
 meeting there run in parallel. Just sometimes, one of them drops a batch
 in there (while the other is probably running at the moment) and later,
 the other picks that up.

 > * This is benchmark code, so I don't know how much nitpicking is
 >   valid. I don't like how the return value of `pop()` is used to
 >   indicate shutdown. It's better to use a `isRunning()` method in the
 >   `while` loops, and in other places.

 That would be wrong. The running flag is out of band. It means the sender
 does not want to push more stuff in. But it does not mean the queue is
 empty.

 > * What happens in `read_iface()` if less than `read_batch_size_` queries
 >   are received? Is this written so because it is a benchmark, and a real
 >   implementation will timeout and `push()`/flush?

 Yes, this was left out for simplicity of the benchmark. We would allow
 smaller batches if the network wasn't busy enough in the real resolver.
 Note that if there are not enough queries to create full batches, the
 resolver is not fully busy so the performance matters less.

 > * Is a second template parameter necessary in `LandlordResolver::Queue`?
 >   Why not simply have the following `typedef` inside `class
 LandlordResolver::Queue`:
 > {{{
 > typedef std::vector<T> Container;
 > }}}

 It's not necessary, but I thought someone might get an idea for different,
 cool data structure for the queue (like list of small vectors, or
 something), so we could reuse the code to test.

 If you think it is confusing, I can remove it.

 > * Can't `downstream_queue_` be made a stack variable between
 >   `checkUpstream()` and `completedUpstream()`? Its use and life seems to
 >   be within `checkUpstream()` codepath only.

 It can't. The `completeUpstream` may be called from `read_iface()` too and
 it can be run from future iteration of the `checkUpstream()`. The query's
 `completeUpstream` remembers the address for the time the query is „out
 there“, and the stack variable would not survive that long.

 > * Especially with how the condition variables are used in this branch, I
 >   propose creating another ticket that does the following. This will
 >   make such code cleaner without having to worry about the relationship
 >   between the mutex and the condition variable. The downside is we don't
 >   support `wait()` on unrelated mutexes, but I don't think we have such
 >   a usecase.

 I think there just can't really be any such usecase and not be broken.

 >  * Make `Locker` constructor take a `Lockable` (interface) object that
 >    implements `lock()`, `tryLock()` and `unlock()` methods. Move
 >    `Locker` out of the `Mutex::` class.
 >  * Make `Mutex` and `CondVar` implement the `Lockable` interface.

 I don't really like the idea of lockable. Locking a CondVar is counter-
 intuitive (especially when more conditional variables are constructed with
 the same Mutex).

 I think it would be better to have a getMutex() method for the CondVar, so
 it would be clear a real mutex is involved.

 I'll create a ticket for that (at the merge time), although I believe this
 is very minor unimportant detail.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2873#comment:9>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development