BIND 10 #2670: "A deadlock might be detected" failure on NetBSD

Wed Jan 30 09:40:32 UTC 2013

-------------------------------------+-------------------------------------
            Reporter:  naokikambe    |                        Owner:  jinmei
                Type:  defect        |                       Status:  reviewing
            Priority:  medium        |                    Milestone:  Sprint-20130205
           Component:  statistics    |                   Resolution:
            Keywords:                |                 CVSS Scoring:
           Sensitive:  0             |              Defect Severity:  N/A
         Sub-Project:  DNS           |  Feature Depending on Ticket:
Estimated Difficulty:  0             |          Add Hours to Ticket:  0
         Total Hours:  0             |                    Internal?:  0
-------------------------------------+-------------------------------------
Changes (by naokikambe):

 * owner:   => jinmei
 * status:  new => reviewing

Comment:

 Replying to the comments on the list:
 https://lists.isc.org/pipermail/bind10-dev/2013-January/004319.html

 > - I don't understand how the "deadlock" happened and how this lock
 >   solves that.  Please make more detailed explanations and/or
 >   comments.

 As I mentioned in the description above, as far as I can tell from the
 stack traces, the unit test seemed to hang between setting up msgq and
 starting msgq. So I suspected that one thread was setting up msgq while
 another thread was starting msgq just before the failure occurred. I
 therefore think the other thread should wait while one thread is setting
 up msgq. That's why I changed the code to hold a lock from setting up
 msgq until starting it.
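 A minimal Python sketch of that locking idea, assuming a single
 `threading.Lock` shared by all test threads. `MockMsgq`, `setup()`,
 `start()`, `setup_and_start()` and `run_concurrently()` are hypothetical
 names for illustration, not the actual stats test code in 'trac2670'. It
 shows how holding one lock across both steps keeps another thread from
 starting msgq while the first thread is still setting it up.

```python
import threading

# Hypothetical module-level lock; the real patch may name or place it
# differently.
_msgq_lock = threading.Lock()


class MockMsgq:
    """Hypothetical stand-in for the msgq wrapper used by the stats tests."""

    def __init__(self, log):
        self.log = log  # shared list recording (step, thread id) in order

    def setup(self):
        # In the real tests this would create the msgq socket file.
        self.log.append(("setup", threading.get_ident()))

    def start(self):
        # In the real tests this would start msgq on that socket.
        self.log.append(("start", threading.get_ident()))


def setup_and_start(msgq):
    # Hold one lock across both steps so that no other thread can run its
    # own setup() or start() between this thread's setup() and start().
    with _msgq_lock:
        msgq.setup()
        msgq.start()


def run_concurrently(n_threads):
    """Run setup_and_start() from several threads; return the shared log."""
    log = []
    threads = [threading.Thread(target=setup_and_start, args=(MockMsgq(log),))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    log = run_concurrently(4)
    # With the lock, every "setup" is immediately followed by the "start"
    # of the same thread; the pairs never interleave.
    for i in range(0, len(log), 2):
        assert log[i][0] == "setup" and log[i + 1][0] == "start"
        assert log[i][1] == log[i + 1][1]
    print("setup/start pairs never interleaved")
```

 Without the `with _msgq_lock:` block, the log could show one thread's
 "start" landing between another thread's "setup" and "start", which is
 the kind of interleaving suspected from the stack traces.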

 > - Same comments are repeated.  I think these should be unified:
 > +        # This locking is for dead-lock failures which often occurred
 > +        # while creating or deleting a socket file in msgq.py. See
 > +        # http://git.bind10.isc.org/~tester/builder//BIND10/20130129033301-NetBSD4-i386-GCC/logs/unittests.out.

 Judging from the stack traces at the above URL, the deadlock does not
 actually seem to occur when shutting down msgq, so we may not need the
 latter lock. Nevertheless, I've revised the two comments.

 > - I don't understand why we need to call
 >   isc.log.resetUnitTestRootLogger() from multiple places.  Isn't it
 >   enough to call it from the test main?  If not, please explain.

 Sorry, I can't explain clearly why the log messages are reduced when it
 is called there. I did check this at runtime: if
 `resetUnitTestRootLogger()` is called after creating an object of each
 mock module, the messages seem to be reduced.
 Anyway, this change isn't directly related to this deadlock failure, so
 it should be handled in another ticket.

 So I've pushed a new branch, 'trac2670', to replace the temporary branch
 'fix_stats_tests', which I'll delete. Please review the new branch.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2670#comment:2>
BIND 10 Development <http://bind10.isc.org>