[bind10-dev] python & threads (Re: Agenda for call 2010-10-12)

JINMEI Tatuya / 神明達哉 jinmei at isc.org
Tue Oct 12 14:54:35 UTC 2010


At Mon, 11 Oct 2010 19:35:50 +0200,
Shane Kerr <shane at isc.org> wrote:

>       * Technical issues
>               * Python & threads (or perhaps threads in general)

As we discussed on jabber, I'd like to share some observations related
to this topic.

The direct trigger was trac #335, in which a race condition was found
in a tricky case.  (I was not 100% sure about this, but it seemed
to be very rare unless multiple threads can run on different cores.)

I also noticed a race condition in the xfrin code (around the time of
trac #179, 5 months ago), which subsequently happened to be solved as a
side effect of a different change.

Recently I noticed and pointed out another possible race condition
(and possible deadlock) in trac #352.

I've seen other suspicious python code in the way we handle
threads, but so far I had thought python threads could not really run
concurrently (i.e. on different cores), and so I treated these issues
as a relatively low priority.

But now that #335 has proved it's real, I think it's time to revisit
the usage seriously.

IMO, our current python code is a bit naive in handling threads.
Maybe not all of us have sufficient experience with thread-related
pitfalls.  We could (or might) fix specific bugs, and we could gain
experience through development, but the resulting code will become
more difficult to understand (BIND 9 has proved that).  That would go
against one of the major goals of using python: better understandability.

Meanwhile, one common advantage of using threads is that
dedicated worker thread code can be written in a straightforward way,
simply waiting for any blocking operation, etc.  But we've already
lost that advantage, because many of our python threads also need to
handle multiple events: one for the main task (such as xfr) and the
other for control events (typically shutdown).  As a result we still
need to use some complicated event-callback primitives such as select
or asyncore.
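
To illustrate the point above, here is a minimal sketch of a worker
thread that has to watch both its main socket and a shutdown channel.
The names (worker, data_sock, ctrl_sock, run_demo) and the use of
socket.socketpair() as the control channel are assumptions made for
illustration; this is not the actual BIND 10 code.

```python
import select
import socket
import threading


def worker(data_sock, ctrl_sock, received):
    """Collect data until a shutdown byte arrives on ctrl_sock."""
    while True:
        # A plain blocking data_sock.recv() would never notice a shutdown
        # request, so the thread has to multiplex both sockets.
        readable, _, _ = select.select([data_sock, ctrl_sock], [], [])
        if data_sock in readable:
            chunk = data_sock.recv(4096)
            if not chunk:      # peer closed the connection
                return
            received.append(chunk)
        if ctrl_sock in readable:
            ctrl_sock.recv(1)  # consume the shutdown notification
            return


def run_demo():
    # Hypothetical setup: one socket pair for the main task, one for control.
    data_main, data_worker = socket.socketpair()
    ctrl_main, ctrl_worker = socket.socketpair()
    received = []
    thread = threading.Thread(
        target=worker, args=(data_worker, ctrl_worker, received))
    thread.start()
    data_main.sendall(b"xfr payload")  # the "main task" event
    ctrl_main.sendall(b"q")            # the control (shutdown) event
    thread.join(timeout=5)
    for sock in (data_main, data_worker, ctrl_main, ctrl_worker):
        sock.close()
    return b"".join(received)
```

Even in this small case the thread body is an event loop around
select, so the simplicity of "just block on one operation" is gone.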

In short, what we currently have is buggy thread code that has
lost one major advantage of threads.

Others may have different opinions, but I'm personally inclined to
change the model to either
A: single-thread, event-only code, or
B: multi-process + event primitive

Either way, we can at least avoid thread pitfalls.  Plan A wouldn't be
much different from the current approach in terms of code complexity
(and might even be simpler, because it doesn't need any tricky
thread considerations).  Plan B may also be similar to our current
code (without thread-related bugs) at the cost of having more
processes.
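
As a rough sketch of what plan A could look like, the example below
runs the main task and the control channel from a single select loop
with no threads at all.  The names (event_loop, on_xfr, on_control,
run_demo) and the socketpair-based setup are hypothetical, chosen only
for illustration.

```python
import select
import socket


def event_loop(handlers):
    """Call handlers[sock](sock) for each readable socket.

    The loop ends after the batch in which a handler returns False.
    """
    running = True
    while running:
        readable, _, _ = select.select(list(handlers), [], [])
        for sock in readable:
            if handlers[sock](sock) is False:
                running = False


def run_demo():
    # Hypothetical setup: one socket pair per event source.
    xfr_peer, xfr_sock = socket.socketpair()
    ctrl_peer, ctrl_sock = socket.socketpair()
    log = []

    def on_xfr(sock):
        # Handlers run one at a time in the same thread, so the shared
        # list needs no lock.
        log.append(sock.recv(4096))

    def on_control(sock):
        sock.recv(1)   # consume the shutdown notification
        return False   # ask the loop to stop

    handlers = {xfr_sock: on_xfr, ctrl_sock: on_control}
    xfr_peer.sendall(b"zone data")
    ctrl_peer.sendall(b"q")
    event_loop(handlers)
    for sock in (xfr_peer, xfr_sock, ctrl_peer, ctrl_sock):
        sock.close()
    return b"".join(log)
```

Plan B would presumably run the same kind of loop in each of several
processes, with the control channel connecting them instead of shared
memory.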

---
JINMEI, Tatuya


