BIND 10 #1597: traceback in bind10 when cfgmrg can't start
BIND 10 Development
do-not-reply at isc.org
Wed Jan 18 19:18:53 UTC 2012
#1597: traceback in bind10 when cfgmrg can't start
-------------------------------------+-------------------------------------
Reporter: jreed | Owner:
Type: defect | Status: new
Priority: major | Milestone: New
Component: Boss of BIND | Tasks
Sensitive: 0 | Keywords:
Sub-Project: Core | Defect Severity: High
Estimated Difficulty: 0 | Feature Depending on Ticket:
Total Hours: 0 | Add Hours to Ticket: 0
| Internal?: 0
-------------------------------------+-------------------------------------
This maybe related to #1291.
I tried bind10 --user. I forgot about config permissions.
This is from shutdown first:
{{{
2012-01-18 12:54:51.411 ERROR [b10-boss.boss]
BIND10_CONFIGURATOR_PLAN_INTERRUPTED configurator plan interrupted, only 2
of 8 done
2012-01-18 12:54:52.503 INFO [b10-boss.boss] BIND10_PROCESS_ENDED process
1481 of Socket creator ended with status 0
}}}
Not sure what that 2 of 8 means or if it matters. Anyways here is the
problem:
{{{
2012-01-18 12:55:23.192 INFO [b10-boss.boss] BIND10_STARTING starting
BIND10: bind10 20110223 (BIND 10 20120119)
2012-01-18 12:55:23.192 DEBUG [b10-boss.boss]
BIND10_CHECK_MSGQ_ALREADY_RUNNING checking if msgq is already running
2012-01-18 12:55:23.204 INFO [b10-boss.boss] BIND10_CONFIGURATOR_START
bind10 component configurator is starting up
2012-01-18 12:55:23.204 DEBUG [b10-boss.boss] BIND10_CONFIGURATOR_BUILD
building plan '{}' -> '{'sockcreator': {'priority': 200, 'kind': 'core',
'special': 'sockcreator'}, 'msgq': {'priority': 199, 'kind': 'core',
'special': 'msgq'}, 'cfgmgr': {'priority': 198, 'kind': 'core', 'special':
'cfgmgr'}}'
2012-01-18 12:55:23.204 DEBUG [b10-boss.boss] BIND10_CONFIGURATOR_RUN
running plan of 3 tasks
2012-01-18 12:55:23.204 DEBUG [b10-boss.boss] BIND10_CONFIGURATOR_TASK
performing task start on Socket creator
2012-01-18 12:55:23.204 INFO [b10-boss.boss] BIND10_COMPONENT_START
component Socket creator is starting
2012-01-18 12:55:23.231 INFO [b10-boss.boss] BIND10_SOCKCREATOR_INIT
initializing socket creator parser
2012-01-18 12:55:23.231 DEBUG [b10-boss.boss] BIND10_STARTED_PROCESS_PID
started b10-sockcreator (PID 10365)
2012-01-18 12:55:23.231 INFO [b10-boss.boss] BIND10_SETUID setting UID to
1000
2012-01-18 12:55:23.232 DEBUG [b10-boss.boss] BIND10_CONFIGURATOR_TASK
performing task start on msgq
2012-01-18 12:55:23.232 INFO [b10-boss.boss] BIND10_COMPONENT_START
component msgq is starting
2012-01-18 12:55:23.232 INFO [b10-boss.boss] BIND10_STARTING_PROCESS
starting process b10-msgq
2012-01-18 12:55:23.348 DEBUG [b10-boss.boss] BIND10_STARTED_PROCESS_PID
started b10-msgq (PID 16077)
2012-01-18 12:55:23.569 DEBUG [b10-boss.boss] BIND10_CONFIGURATOR_TASK
performing task start on cfgmgr
2012-01-18 12:55:23.569 INFO [b10-boss.boss] BIND10_COMPONENT_START
component cfgmgr is starting
2012-01-18 12:55:23.569 INFO [b10-boss.boss] BIND10_STARTING_PROCESS
starting process b10-cfgmgr
2012-01-18 12:55:23.637 DEBUG [b10-boss.boss] BIND10_STARTED_PROCESS_PID
started b10-cfgmgr (PID 13175)
2012-01-18 12:55:23.637 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting
for configuration manager process to initialize
2012-01-18 12:55:23.951 FATAL [b10-cfgmgr.cfgmgr] CFGMGR_DATA_READ_ERROR
error reading configuration database from disk: Can't read configuration
file: [Errno 13] Permission denied:
'/home/reed/opt/bind10/var/bind10-devel/b10-config.db'
2012-01-18 12:55:23.967 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting
for configuration manager process to initialize
2012-01-18 12:55:24.977 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting
for configuration manager process to initialize
2012-01-18 12:55:25.987 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting
for configuration manager process to initialize
2012-01-18 12:55:26.998 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting
for configuration manager process to initialize
2012-01-18 12:55:28.008 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting
for configuration manager process to initialize
2012-01-18 12:55:29.018 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting
for configuration manager process to initialize
2012-01-18 12:55:30.029 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting
for configuration manager process to initialize
2012-01-18 12:55:31.039 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting
for configuration manager process to initialize
2012-01-18 12:55:32.050 DEBUG [b10-boss.boss] BIND10_WAIT_CFGMGR waiting
for configuration manager process to initialize
2012-01-18 12:55:33.060 ERROR [b10-boss.boss]
BIND10_COMPONENT_START_EXCEPTION component cfgmgr failed to start:
Configuration manager process has not started
2012-01-18 12:55:33.061 ERROR [b10-boss.boss] BIND10_COMPONENT_FAILED
component cfgmgr (pid None) failed with unknown exit status
2012-01-18 12:55:33.061 FATAL [b10-boss.boss] BIND10_COMPONENT_UNSATISFIED
component cfgmgr is required to run and failed
2012-01-18 12:55:33.062 ERROR [b10-boss.boss]
BIND10_CONFIGURATOR_PLAN_INTERRUPTED configurator plan interrupted, only 2
of 3 done
2012-01-18 12:55:33.062 INFO [b10-boss.boss] BIND10_KILLING_ALL_PROCESSES
killing all started processes
2012-01-18 12:55:33.063 INFO [b10-boss.boss] BIND10_KILL_PROCESS killing
process Socket creator
2012-01-18 12:55:33.063 WARN [b10-boss.boss] BIND10_SOCKCREATOR_KILL
killing the socket creator
Traceback (most recent call last):
File "/home/reed/opt/bind10/lib/python3.1/site-
packages/isc/bind10/component.py", line 165, in start
self._start_internal()
File "/home/reed/opt/bind10/lib/python3.1/site-
packages/isc/bind10/component.py", line 403, in _start_internal
procinfo = self._start_func()
File "/home/reed/opt/bind10/sbin/bind10", line 476, in start_cfgmgr
raise ProcessStartError("Configuration manager process has not
started")
__main__.ProcessStartError: Configuration manager process has not started
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/reed/opt/bind10/sbin/bind10", line 638, in startup
self.start_all_components()
File "/home/reed/opt/bind10/sbin/bind10", line 599, in
start_all_components
self._component_configurator.startup(self.__core_components)
File "/home/reed/opt/bind10/lib/python3.1/site-
packages/isc/bind10/component.py", line 504, in startup
self.__reconfigure_internal(self._components, configuration)
File "/home/reed/opt/bind10/lib/python3.1/site-
packages/isc/bind10/component.py", line 492, in __reconfigure_internal
self._run_plan(self._build_plan(old, new))
File "/home/reed/opt/bind10/lib/python3.1/site-
packages/isc/bind10/component.py", line 633, in _run_plan
component.start()
File "/home/reed/opt/bind10/lib/python3.1/site-
packages/isc/bind10/component.py", line 168, in start
self.failed(None)
File "/home/reed/opt/bind10/lib/python3.1/site-
packages/isc/bind10/component.py", line 222, in failed
self._boss.component_shutdown(1)
File "/home/reed/opt/bind10/sbin/bind10", line 671, in
component_shutdown
raise Exception("Component failed during startup");
Exception: Component failed during startup
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/reed/opt/bind10/sbin/bind10", line 1170, in <module>
main()
File "/home/reed/opt/bind10/sbin/bind10", line 1148, in main
startup_result = boss_of_bind.startup()
File "/home/reed/opt/bind10/sbin/bind10", line 640, in startup
self.kill_started_components()
File "/home/reed/opt/bind10/sbin/bind10", line 343, in
kill_started_components
self.components[pid].kill(True)
File "/home/reed/opt/bind10/lib/python3.1/site-
packages/isc/bind10/special_component.py", line 68, in kill
self.__creator.kill()
File "/home/reed/opt/bind10/lib/python3.1/site-
packages/isc/bind10/sockcreator.py", line 239, in kill
self.__process.kill()
File "/usr/pkg/lib/python3.1/subprocess.py", line 1403, in kill
self.send_signal(signal.SIGKILL)
File "/usr/pkg/lib/python3.1/subprocess.py", line 1393, in send_signal
os.kill(self.pid, sig)
OSError: [Errno 1] Operation not permitted
}}}
I tried to start bind10 again without --user and it failed due to running
still:
{{{
2012-01-18 12:56:05.693 INFO [b10-boss.boss] BIND10_STARTING starting
BIND10: bind10 20110223 (BIND 10 20120119)
2012-01-18 12:56:05.694 DEBUG [b10-boss.boss]
BIND10_CHECK_MSGQ_ALREADY_RUNNING checking if msgq is already running
2012-01-18 12:56:05.695 FATAL [b10-boss.boss] BIND10_MSGQ_ALREADY_RUNNING
msgq daemon already running, cannot start
2012-01-18 12:56:05.695 FATAL [b10-boss.boss] BIND10_STARTUP_ERROR error
during startup: b10-msgq already running, or socket file not cleaned ,
cannot start
[b10-msgq] Closing socket fd 5
[b10-msgq] Receive error: EOF
[b10-msgq] Closing socket fd 4
[b10-msgq] Receive error: EOF
[b10-msgq] Closing socket fd 4
[b10-msgq] Receive error: EOF
}}}
with these processes:
{{{
root 10365 0.0 0.2 17920 1592 ttyp1 I 12:55PM 0:00.01
b10-sockcreator
reed 16077 0.0 1.7 109924 15296 ttyp1 I 12:55PM 0:00.28
/usr/pkg/bin/python3.1 /home/reed/opt/bind10/libexec/bind10-devel/b10-msgq
}}}
I killed these manually and started bind10 without --user.
The traceback should not happen.
sockcreator should never stay running (I have seen this before).
msgq should never stay running.
bind10 should say to stderr if can't start (don't assume logging works).
--
Ticket URL: <http://bind10.isc.org/ticket/1597>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list