BIND 10 #2670: "A deadlock might be detected" failure on NetBSD

Thu Jan 31 02:42:55 UTC 2013

-------------------------------------+-------------------------------------
            Reporter:  naokikambe    |                        Owner:  jinmei
                Type:  defect        |                       Status:  reviewing
            Priority:  medium        |                    Milestone:  Sprint-20130205
           Component:  statistics    |                   Resolution:
            Keywords:                |                 CVSS Scoring:
           Sensitive:  0             |              Defect Severity:  N/A
         Sub-Project:  DNS           |  Feature Depending on Ticket:
Estimated Difficulty:  0             |          Add Hours to Ticket:  0
         Total Hours:  0             |                    Internal?:  0
-------------------------------------+-------------------------------------
Changes (by naokikambe):

 * owner:  naokikambe => jinmei

Comment:

 Replying to [comment:3 jinmei]:
 > First off, the problem still seems to happen with this branch:
 >
 > http://git.bind10.isc.org/~tester/builder//BIND10/20130130180303-NetBSD4-i386-GCC/logs/unittests.out

 Oops! The problem isn't resolved. The change I added makes no sense.
 :-|

 > Secondly, I still don't understand how the (seeming) deadlock happened
 > and how the lock would solve it; normally adding a lock solves issues
 > like a race condition, and (although it's possible to solve a deadlock
 > happening due to a race condition) an added lock could even cause a
 > new deadlock, not solve it.
 >
 > I also didn't understand how multiple threads could instantiate
 > `BaseModules` at the same time.
 >
 > Since I don't understand these basic points it's not surprising there
 > are some other unclear points, but I'm also not sure why that's
 > specific to msgq.  From a quick look at the code, the same thing
 > (whether it's a race or deadlock) seems to be able to occur for the
 > rest of `BaseModules.__init__()`.
 >
 > And, in general: I don't like to make an arbitrary change simply
 > because it *might* help, without understanding these "how"s.
 >
 > For the same reason, I don't like to silence the log message without
 > understanding how this happened.  It's especially so because it's not
 > directly related to the main issue of this ticket.

 I thought locking would address the problem because, judging from the
 stack traces, multiple tests appeared to be hanging in parallel. But that
 turned out to be wrong.

 I understand your point, but right now I don't have enough information to
 answer your questions. Nor do I understand the mechanism by which such a
 deadlock happens only on NetBSD. I think we may need to investigate more
 deeply how the process behaves, e.g. at the system-call level, so that we
 can understand the mechanism clearly. IMO deadlock problems are very
 complicated in general.

 I'd like to stop working on it by random guessing, and I don't expect to
 have any other good ideas soon. So should we pull this ticket out of the
 current sprint? If so, I will remove the branch from the repository.

 Regards,

-- 
Ticket URL: <http://bind10.isc.org/ticket/2670#comment:5>
BIND 10 Development <http://bind10.isc.org>