BIND 10 #2670: "A deadlock might be detected" failure on NetBSD
BIND 10 Development
do-not-reply at isc.org
Thu Jan 31 02:42:55 UTC 2013
#2670: "A deadlock might be detected" failure on NetBSD
-------------------------------------+-------------------------------------
Reporter: naokikambe                 | Owner: jinmei
Type: defect                         | Status: reviewing
Priority: medium                     | Milestone: Sprint-20130205
Component: statistics                | Resolution:
Keywords:                            | CVSS Scoring:
Sensitive: 0                         | Defect Severity: N/A
Sub-Project: DNS                     | Feature Depending on Ticket:
Estimated Difficulty: 0              | Add Hours to Ticket: 0
Total Hours: 0                       | Internal?: 0
-------------------------------------+-------------------------------------
Changes (by naokikambe):
* owner: naokikambe => jinmei
Comment:
Replying to [comment:3 jinmei]:
> First off, the problem still seems to happen with this branch:
>
> http://git.bind10.isc.org/~tester/builder//BIND10/20130130180303-NetBSD4-i386-GCC/logs/unittests.out
Oops, the problem isn't resolved. The change I added makes no sense. :-|
> Secondly, I still don't understand how the (seeming) deadlock happened
> and how the lock would solve it; normally adding a lock solves issues
> like a race condition, and (although it's possible to solve a deadlock
> happening due to a race condition) an added lock could even cause a
> new deadlock, not solve it.
>
> I also didn't understand how multiple threads could instantiate
> `BaseModules` at the same time.
>
> Since I don't understand these basic points it's not surprising there
> are some other unclear points, but I'm also not sure why that's
> specific to msgq. From a quick look at the code, the same thing
> (whether it's a race or deadlock) seems to be able to occur for the
> rest of `BaseModules.__init__()`.
>
> And, in general: I don't like to make an arbitrary change simply
> because it *might* help, without understanding these "how"s.
>
> For the same reason, I don't like to silence the log message without
> understanding how this happened. It's especially so because it's not
> directly related to the main issue of this ticket.
I thought locking would make sense for this problem because, judging from
the stack traces, multiple tests were hanging in parallel. But that turned
out to be wrong.
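The reviewer's point that an added lock can itself introduce a deadlock can
be illustrated with a minimal, self-contained Python sketch. It is not taken
from the BIND 10 code: `lock_a`, `lock_b`, and `demo_lock_order_deadlock` are
hypothetical names, and the timeout exists only so that the demonstration
terminates instead of hanging.

```python
import threading


def demo_lock_order_deadlock(timeout=0.5):
    """Return the names of the workers that could not get their second lock."""
    lock_a = threading.Lock()  # hypothetical lock that already existed
    lock_b = threading.Lock()  # hypothetical lock added later "to be safe"
    barrier = threading.Barrier(2)
    stuck = []

    def worker(name, first, second):
        with first:
            barrier.wait()  # both workers now hold their first lock
            # A plain second.acquire() would block forever at this point.
            got = second.acquire(timeout=timeout)
            barrier.wait()  # keep holding `first` until both attempts ended
            if got:
                second.release()
            else:
                stuck.append(name)

    threads = [
        threading.Thread(target=worker, args=("t1", lock_a, lock_b)),
        threading.Thread(target=worker, args=("t2", lock_b, lock_a)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(stuck)


if __name__ == "__main__":
    print(demo_lock_order_deadlock())
```

Each worker holds one lock while waiting for the other, so neither can make
progress. With plain blocking `acquire()` calls both threads would wait
forever; taking the locks in the same order in every thread avoids this.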
I understand your opinion, but right now I don't have much information
about the points you are asking about. I don't understand the mechanism by
which such a deadlock happens only on NetBSD, either. I think we might need
to investigate more deeply how the process behaves, e.g. at the system-call
level, so that we can understand the mechanism clearly. IMO a deadlock
problem is very complicated in general.
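One low-cost way to see where a hanging test is stuck, before going down to
the system-call level, is to dump the Python stack of every thread. The
sketch below assumes CPython (it relies on `sys._current_frames()`);
`format_thread_stacks` is a hypothetical helper name, not part of BIND 10.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text report with the current stack of every Python thread."""
    # Map thread identifiers to readable names where they are known.
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (names.get(ident, "?"), ident))
        for entry in traceback.format_stack(frame):
            lines.append(entry.rstrip("\n"))
    return "\n".join(lines)


if __name__ == "__main__":
    blocker = threading.Event()
    worker = threading.Thread(target=blocker.wait, name="blocked-worker")
    worker.start()
    print(format_thread_stacks())  # shows where "blocked-worker" is waiting
    blocker.set()
    worker.join()
```

Such a report could be printed from a watchdog timer in a hanging test. It
only shows Python-level frames, so it complements rather than replaces
OS-level tracing.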
I'd like to stop working on it by guessing randomly, and I don't expect to
have any other good idea soon. So should we pull this ticket out of the
current sprint? If so, I would remove the branch from the repository.
Regards,
--
Ticket URL: <http://bind10.isc.org/ticket/2670#comment:5>
BIND 10 Development <http://bind10.isc.org>