[bind10-dev] server crashed after reload - Secondary Server only, caching enabled...

Jeremy C. Reed jreed at isc.org
Fri Sep 7 12:22:51 UTC 2012


On Fri, 7 Sep 2012, Michal 'vorner' Vaner wrote:

> > > 2012-09-03 16:03:19.884 ERROR [b10-xfrin.config]
> > > CONFIG_SESSION_STOPPING_FAILED error sending stopping message: [Errno 32]
> > > Broken pipe
> > > 2012-09-03 16:03:19.887 ERROR [b10-zonemgr.config]
> > > CONFIG_SESSION_STOPPING_FAILED error sending stopping message: [Errno 32]
> > > Broken pipe
> > 
> > And the order of shutdowns should not allow msgq or cfgmgr to exit 
> > before other components. Or if that is okay, then at least they 
> > shouldn't ever need to use msgq or cfgmgr in this case.
> 
> I don't think msgq ever exits before the others stop. There's just no 
> "clean stop" code in msgq, msgq needs to be killed in the next 
> shutdown stage. So unless the boss started to send SIGTERMs already, 
> msgq didn't exit, it crashed (and because it has no logging, the 
> messages got lost). I suspect it sometimes can't handle too many 
> components disappearing at once (some kind of SIGPIPE or race 
> condition there).

The order of the logging and timestamps may be misleading, but they seem 
to show that cfgmgr shut down cleanly and msgq received a SIGTERM 
before b10-xfrin tried to use them.

2012-09-03 16:03:18.877 INFO  [b10-boss.boss] BIND10_STOP_PROCESS asking
cfgmgr to shut down
...
2012-09-03 16:03:18.877 INFO  [b10-boss.boss] BIND10_STOP_PROCESS asking
b10-xfrin to shut down
...
2012-09-03 16:03:19.878 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED 
process 1953 of cfgmgr ended with status 0
2012-09-03 16:03:19.879 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED 
process 1957 of b10-auth-1 ended with status 256
2012-09-03 16:03:19.879 INFO  [b10-boss.boss] BIND10_PROCESS_ENDED 
process 1959 of b10-stats ended with status 0
2012-09-03 16:03:19.879 INFO  [b10-boss.boss] BIND10_SEND_SIGTERM 
sending SIGTERM to msgq (PID 1952)
2012-09-03 16:03:19.879 INFO  [b10-boss.boss] BIND10_SEND_SIGTERM 
sending SIGTERM to b10-xfrin (PID 1954)
2012-09-03 16:03:19.879 INFO  [b10-boss.boss] BIND10_SEND_SIGTERM 
sending SIGTERM to b10-zonemgr (PID 1955)
2012-09-03 16:03:19.879 INFO  [b10-boss.boss] BIND10_SEND_SIGTERM 
sending SIGTERM to b10-cmdctl (PID 1958)

From the logging above, it appears that stopping msgq and cfgmgr 
didn't get postponed.
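For illustration, here is a minimal Python sketch of why a component that writes after its peer has already exited gets "[Errno 32] Broken pipe". It assumes the components reach msgq over a Unix-domain stream socket, which is modelled here with socket.socketpair(). The function name and the "component"/"bus" roles are hypothetical stand-ins, not actual b10 code. Python ignores SIGPIPE by default, so the failure surfaces as BrokenPipeError (errno EPIPE, 32 on Linux) instead of killing the process. A process that does not ignore SIGPIPE would be terminated by the signal instead.

```python
import errno
import socket


def send_after_peer_exit() -> int:
    """Return the errno seen when writing to a stream socket whose peer closed.

    Hypothetical stand-in: 'component' plays a client such as b10-xfrin,
    'bus' plays msgq. Not taken from the BIND 10 sources.
    """
    component, bus = socket.socketpair()  # AF_UNIX stream pair on POSIX
    bus.close()  # the message bus goes away first (e.g. after SIGTERM)
    try:
        # The client then tries to send something like a "stopping" message.
        component.sendall(b"stopping")
    except BrokenPipeError as exc:
        return exc.errno  # EPIPE: the peer end is already closed
    finally:
        component.close()
    return 0  # send unexpectedly succeeded


if __name__ == "__main__":
    code = send_after_peer_exit()
    print(code, errno.errorcode.get(code))
```

If msgq and cfgmgr were only stopped after every other component had exited, no client would be left to hit this error path.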

