BIND9.2.0 not killing on SIGTERM signal
Dhanasekaran
dhana at india.hp.com
Mon Jun 3 14:55:23 UTC 2002
Hi,
I am running a BIND 9.2.0 nameserver on a multiprocessor system (8 CPUs).
When named receives a SIGTERM signal, it sometimes fails to exit. The
problem is intermittent; most of the time named exits properly.
I instrumented the code and found the following in the syslog:
May 24 11:34:17 tst1 named[3795]: isc_app_run: Return value of sigwait() :0
May 24 11:34:17 tst1 named[3795]: isc_app_run: signal received :15
May 24 11:34:17 tst1 named[3795]: isc_app_run: Setting want_shutdown to ISC_TRUE
May 24 11:34:17 tst1 named[3795]: isc_app_run: Returning SUCCESS
May 24 11:34:17 tst1 named[3795]: main: Return value of isc_app_run(): 0
May 24 11:34:17 tst1 named[3795]: main: Calling cleanup()
May 24 11:34:17 tst1 named[3795]: shutting down: flushing changes
May 24 11:34:17 tst1 named[3795]: stopping command channel on 127.0.0.1#953
May 24 11:34:17 tst1 named[3795]: no longer listening on 192.1.1.152#53
May 24 11:34:17 tst1 named[3795]: no longer listening on 15.14.148.121#53
May 24 11:34:17 tst1 named[3795]: no longer listening on 127.0.0.1#53
The log above confirms that the main thread receives the SIGTERM signal
and calls cleanup(). cleanup() in turn calls destroy_managers() to destroy
the task, timer and socket managers.
In isc_taskmgr_destroy(), the main thread sends a shutdown message to all
the tasks and then waits for every thread it created during
isc_taskmgr_create() to exit.
----
/*
 * Wake up any sleeping workers. This ensures we get work done if
 * there's work left to do, and if there are already no tasks left
 * it will cause the workers to see manager->exiting.
 */
BROADCAST(&manager->work_available);
/*
 * Wait for all the worker threads to exit.
 */
for (i = 0; i < manager->workers; i++)
        (void)isc_thread_join(manager->threads[i], NULL);
----
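The broadcast-then-join handshake in the excerpt above can be sketched with plain POSIX threads. This is a minimal illustration, not BIND's code: the struct, the function names (run_shutdown_demo, worker) and the MAX_WORKERS limit are all hypothetical, and the workers here wait only on an exiting flag, whereas the real workers also have tasks to run. It shows why the join loop returns only if every worker observes the flag after being woken.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the task manager (not BIND's struct). */
struct manager {
    pthread_mutex_t lock;
    pthread_cond_t  work_available;
    bool            exiting;
    int             exited;     /* number of workers that left their loop */
};

static void *worker(void *arg)
{
    struct manager *m = arg;

    pthread_mutex_lock(&m->lock);
    /* Re-check the predicate under the lock after every wakeup. */
    while (!m->exiting)
        pthread_cond_wait(&m->work_available, &m->lock);
    m->exited++;
    pthread_mutex_unlock(&m->lock);
    return NULL;
}

/*
 * Start nworkers threads, request shutdown, then join them all.
 * Returns the number of workers that exited, or -1 on error.
 */
int run_shutdown_demo(int nworkers)
{
    enum { MAX_WORKERS = 64 };  /* arbitrary limit for this sketch */
    pthread_t threads[MAX_WORKERS];
    struct manager m;
    int i, started = 0;

    if (nworkers < 0 || nworkers > MAX_WORKERS)
        return -1;

    pthread_mutex_init(&m.lock, NULL);
    pthread_cond_init(&m.work_available, NULL);
    m.exiting = false;
    m.exited = 0;

    for (i = 0; i < nworkers; i++) {
        if (pthread_create(&threads[i], NULL, worker, &m) != 0)
            break;
        started++;
    }

    /* Set the flag and wake every sleeping worker while holding the lock. */
    pthread_mutex_lock(&m.lock);
    m.exiting = true;
    pthread_cond_broadcast(&m.work_available);
    pthread_mutex_unlock(&m.lock);

    /* Blocks until each started worker has returned. */
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    pthread_cond_destroy(&m.work_available);
    pthread_mutex_destroy(&m.lock);

    return (started == nworkers) ? m.exited : -1;
}
```

Calling run_shutdown_demo(8) models the 8-CPU case. If any worker's wait predicate never became true, the corresponding pthread_join() would block forever, which matches the symptom described here.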
I think named is hanging in isc_thread_join(), since the loop above
completes only after every worker thread has exited. Using a tool, I
collected the following thread states before and after sending the
SIGTERM signal.
Before sending the signal:
Thread  PID   PPID  Ticks  Ticks  PRI  KT_STAT  COMMAND  WCHAN
                    since  since
                    run    idle
4477    3795  1     50     666    0    TSSLEEP  named    ksleep_one(0x21)
4481    3795  1     50     666    4    TSSLEEP  named    ksleep_one(0x21)
4472    3795  1     50     666    3    TSSLEEP  named    ksleep_one(0x21)
4482    3795  1     88     666    7    TSSLEEP  named    per_processor_selects+0x1c0
4479    3795  1     88     666    6    TSSLEEP  named    ksleep_one(0x21)
4480    3795  1     88     666    5    TSSLEEP  named    ksleep_one(0x21)
4473    3795  1     88     666    2    TSSLEEP  named    ksleep_one(0x21)
4474    3795  1     100    666    6    TSSLEEP  named    ksleep_one(0x21)
4476    3795  1     100    666    1    TSSLEEP  named    ksleep_one(0x21)
4475    3795  1     100    666    4    TSSLEEP  named    ksleep_one(0x21)
4471    3795  1     6717   680    3    TSSLEEP  named    pm_sigwait(0x400003ffffff0e6c)
After sending the signal:
Thread  PID   PPID  Ticks  Ticks  PRI  KT_STAT  COMMAND  WCHAN
                    since  since
                    run    idle
4474    3795  1     91     666    1    TSSLEEP  named    ksleep_one(0x21)
4472    3795  1     91     666    1    TSSLEEP  named    ksleep_one(0x21)
4479    3795  1     91     666    6    TSSLEEP  named    ksleep_one(0x21)
4481    3795  1     91     666    0    TSSLEEP  named    ksleep_one(0x21)
4477    3795  1     91     666    7    TSSLEEP  named    ksleep_one(0x21)
4473    3795  1     91     666    5    TSSLEEP  named    ksleep_one(0x21)
4476    3795  1     91     666    3    TSSLEEP  named    ksleep_one(0x21)
4475    3795  1     91     666    4    TSSLEEP  named    ksleep_one(0x21)
4480    3795  1     91     666    2    TSSLEEP  named    ksleep_one(0x21)
4482    3795  1     91     666    3    TSSLEEP  named    per_processor_selects+0xc0
4471    3795  1     92     666    3    TSSLEEP  named    thread_wait(0x1178)
So both before and after named receives the SIGTERM signal, one thread is in select().
The difference is thread 4471, which went from pm_sigwait() to thread_wait().
Thread 4471 is now waiting on thread 0x1178 (4472), which is sleeping.
For some reason, the threads created in
isc_taskmgr_create() are not exiting.
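The mapping from thread_wait(0x1178) to thread 4472 is a plain hexadecimal-to-decimal conversion, assuming (as the analysis above does) that the argument shown in the WCHAN column is the ID of the thread being waited on. A small sketch for checking such values follows; the function name wchan_arg_to_tid is hypothetical.

```c
#include <stdlib.h>

/*
 * Convert a hexadecimal wait-channel argument such as "0x1178" to a
 * decimal thread ID. strtoul() with base 16 accepts an optional "0x"
 * prefix.
 */
unsigned long wchan_arg_to_tid(const char *hex)
{
    return strtoul(hex, NULL, 16);
}
```

For "0x1178" the result is 1*4096 + 1*256 + 7*16 + 8 = 4472, which is the first worker thread listed after the main thread 4471.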
I suspect the problem is in the following code
in lib/isc/task.c:
if (task->references == 0 &&
    TASK_SHUTTINGDOWN(task)) {
        /*
         * The task is done.
         */
        XTRACE(isc_msgcat_get(
                      isc_msgcat,
                      ISC_MSGSET_TASK,
                      ISC_MSG_DONE,
                      "done"));
        finished = ISC_TRUE;
        task->state = task_state_done;
}
When the task manager is created, each worker thread that named creates
runs the dispatch() function. So in this case, 8 threads will be executing
dispatch().
I found that the following is the call sequence for any task
created by named:
isc_task_create()
isc_task_attach()
isc_task_sendanddetach()
task_send()
task_ready()
......
......
task_finished()
task_finished() is called only if the 'finished' variable is set to
ISC_TRUE; the task is then unlinked from the linked list.
Only after all the tasks have been unlinked from the list does a worker
leave the first while() loop in dispatch(), which lets that thread
exit.
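The dependency described above (a worker can leave its loop only once every task is done, and a task is done only with zero references while shutting down) can be modelled without any threads. The sketch below is a hypothetical simplification: struct task_model and the three functions are illustrative names, not BIND's API, and the worker exit condition is assumed from the description in this message rather than copied from task.c.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a task (not BIND's isc_task_t). */
struct task_model {
    unsigned int references;    /* outstanding attachments */
    bool         shutting_down; /* shutdown has been requested */
};

/* Same condition as the excerpt: no references and shutdown in progress. */
bool task_is_done(const struct task_model *t)
{
    return t->references == 0 && t->shutting_down;
}

/* Count the tasks that would still be linked on the manager's list. */
int tasks_remaining(const struct task_model *tasks, int n)
{
    int remaining = 0;

    for (int i = 0; i < n; i++)
        if (!task_is_done(&tasks[i]))
            remaining++;
    return remaining;
}

/* Assumed worker exit condition: manager exiting and no tasks left. */
bool workers_may_exit(bool exiting, const struct task_model *tasks, int n)
{
    return exiting && tasks_remaining(tasks, n) == 0;
}
```

In this model a single task left with references == 1 keeps workers_may_exit() false even though the manager is exiting, so every worker would keep sleeping and the join in isc_taskmgr_destroy() would never return.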
So I suspect the problem might be one of the following:
1. For some reason, some task is never detached, so its
task->references never reaches 0.
OR
2. When the main thread sends the broadcast, some tasks might not
receive and execute their shutdown events.
Are my assumptions right?
It would be great if anyone could give some additional information that
would help identify the exact problem.
Thanks,
With Regards,
D.Dhana.