BIND9.2.0 not killing on SIGTERM signal
Dhanasekaran
dhana at india.hp.com
Wed Jun 5 14:44:14 UTC 2002
Hello,
Has anyone had a chance to look into this mail?
Any additional information on this problem would be greatly
appreciated.
Thanks in advance,
With Regards,
D.Dhana.
Dhanasekaran wrote:
> Hi,
>
> I am running a BIND 9.2.0 nameserver on a multiprocessor system (8 CPUs).
> When named receives a SIGTERM signal, it sometimes does not exit. The
> problem occurs intermittently; most of the time named exits properly.
>
> I instrumented the code and found the following in the syslog.
>
> May 24 11:34:17 tst1 named[3795]: isc_app_run: Return value of sigwait() :0
> May 24 11:34:17 tst1 named[3795]: isc_app_run: signal received :15
> May 24 11:34:17 tst1 named[3795]: isc_app_run: Setting want_shutdown to ISC_TRUE
> May 24 11:34:17 tst1 named[3795]: isc_app_run: Returning SUCCESS
> May 24 11:34:17 tst1 named[3795]: main: Return value of isc_app_run(): 0
> May 24 11:34:17 tst1 named[3795]: main: Calling cleanup()
> May 24 11:34:17 tst1 named[3795]: shutting down: flushing changes
> May 24 11:34:17 tst1 named[3795]: stopping command channel on 127.0.0.1#953
> May 24 11:34:17 tst1 named[3795]: no longer listening on 192.1.1.152#53
> May 24 11:34:17 tst1 named[3795]: no longer listening on 15.14.148.121#53
> May 24 11:34:17 tst1 named[3795]: no longer listening on 127.0.0.1#53
>
> The log above confirms that the main thread receives the SIGTERM
> signal and that cleanup() is called. cleanup() in turn calls
> destroy_managers() to destroy the task, timer and socket managers.
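For readers following the log, the pattern it traces (block the signal, wait for it with sigwait(), set a shutdown flag) can be sketched roughly as below. This is only an illustration under assumptions: wait_for_sigterm() is a made-up name, want_shutdown here is a plain stand-in for the flag named in the log, and the raise() call exists only so the sketch returns immediately. It is not the isc_app_run() source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-in for the want_shutdown flag named in the log. */
static volatile bool want_shutdown = false;

/*
 * Block SIGTERM, then wait for it synchronously with sigwait().
 * Returns the signal number received, or -1 on error.
 */
int wait_for_sigterm(void)
{
    sigset_t set;
    int sig = 0;

    sigemptyset(&set);
    sigaddset(&set, SIGTERM);

    /* The signal must be blocked, or its default action would kill us. */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    /* Demo only: make SIGTERM pending so that sigwait() returns at once. */
    if (raise(SIGTERM) != 0)
        return -1;

    /* sigwait() returns 0 on success, matching the ":0" in the log. */
    if (sigwait(&set, &sig) != 0)
        return -1;

    if (sig == SIGTERM)
        want_shutdown = true;
    return sig;
}
```

In a real daemon the signal would come from outside (for example kill -TERM) rather than from raise(), and the caller would go on to its cleanup routine once the flag is set.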
>
> In isc_taskmgr_destroy(), the main thread sends a shutdown message
> to all the tasks and then waits for every thread it created during
> isc_taskmgr_create() to exit.
> ----
>     /*
>      * Wake up any sleeping workers. This ensures we get work done if
>      * there's work left to do, and if there are already no tasks left
>      * it will cause the workers to see manager->exiting.
>      */
>     BROADCAST(&manager->work_available);
>
>     /*
>      * Wait for all the worker threads to exit.
>      */
>     for (i = 0; i < manager->workers; i++)
>         (void)isc_thread_join(manager->threads[i], NULL);
> ----
>
> I think named is hanging in isc_thread_join(), since the loop finishes
> only after every worker thread has exited on its own. Using a tool, I
> collected the following information on the thread states before and
> after sending the SIGTERM signal.
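To make the broadcast-then-join step concrete, here is a minimal pthreads sketch of the same shape. It is not the BIND code: demo_mgr, demo_worker() and run_shutdown_demo() are hypothetical names, and there is no task queue at all, only an exiting flag. The point it illustrates is that the join loop returns only if every worker wakes up, observes the flag, and leaves its loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NWORKERS 4

/* Hypothetical, simplified manager; not the real isc_taskmgr_t. */
struct demo_mgr {
    pthread_mutex_t lock;
    pthread_cond_t work_available;
    bool exiting;
    int exited;                    /* workers that have left their loop */
    pthread_t threads[NWORKERS];
};

static void *demo_worker(void *arg)
{
    struct demo_mgr *m = arg;

    pthread_mutex_lock(&m->lock);
    /* Re-check the predicate after every wakeup (spurious wakeups happen). */
    while (!m->exiting)
        pthread_cond_wait(&m->work_available, &m->lock);
    m->exited++;
    pthread_mutex_unlock(&m->lock);
    return NULL;
}

/* Returns the number of workers that exited, or -1 on setup failure. */
int run_shutdown_demo(void)
{
    struct demo_mgr m;
    int i, started = 0;

    if (pthread_mutex_init(&m.lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&m.work_available, NULL) != 0) {
        pthread_mutex_destroy(&m.lock);
        return -1;
    }
    m.exiting = false;
    m.exited = 0;

    for (i = 0; i < NWORKERS; i++) {
        if (pthread_create(&m.threads[i], NULL, demo_worker, &m) != 0)
            break;
        started++;
    }

    /* Set the flag under the lock, then wake every sleeping worker. */
    pthread_mutex_lock(&m.lock);
    m.exiting = true;
    pthread_cond_broadcast(&m.work_available);
    pthread_mutex_unlock(&m.lock);

    /* Wait for all the workers; this blocks if any worker never exits. */
    for (i = 0; i < started; i++)
        (void)pthread_join(m.threads[i], NULL);

    pthread_cond_destroy(&m.work_available);
    pthread_mutex_destroy(&m.lock);
    return started == NWORKERS ? m.exited : -1;
}
```

In this sketch the exit condition is just the flag. In the real dispatch loop the condition also depends on the remaining tasks, so a broadcast alone does not guarantee that a worker leaves its loop.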
>
> Before sending the signal:
>
> Thread PID PPID Ticks Ticks
> since since PRI KT_STAT COMMAND WCHAN
> run idle
>
> 4477 3795 1 50 666 0 TSSLEEP named ksleep_one(0x21)
> 4481 3795 1 50 666 4 TSSLEEP named ksleep_one(0x21)
> 4472 3795 1 50 666 3 TSSLEEP named ksleep_one(0x21)
> 4482 3795 1 88 666 7 TSSLEEP named per_processor_selects+0x1c0
> 4479 3795 1 88 666 6 TSSLEEP named ksleep_one(0x21)
> 4480 3795 1 88 666 5 TSSLEEP named ksleep_one(0x21)
> 4473 3795 1 88 666 2 TSSLEEP named ksleep_one(0x21)
> 4474 3795 1 100 666 6 TSSLEEP named ksleep_one(0x21)
> 4476 3795 1 100 666 1 TSSLEEP named ksleep_one(0x21)
> 4475 3795 1 100 666 4 TSSLEEP named ksleep_one(0x21)
> 4471 3795 1 6717 680 3 TSSLEEP named pm_sigwait(0x400003ffffff0e6c)
>
> After sending the signal:
>
> Thread PID PPID Ticks Ticks
> since since PRI KT_STAT COMMAND WCHAN
> run idle
>
> 4474 3795 1 91 666 1 TSSLEEP named ksleep_one(0x21)
> 4472 3795 1 91 666 1 TSSLEEP named ksleep_one(0x21)
> 4479 3795 1 91 666 6 TSSLEEP named ksleep_one(0x21)
> 4481 3795 1 91 666 0 TSSLEEP named ksleep_one(0x21)
> 4477 3795 1 91 666 7 TSSLEEP named ksleep_one(0x21)
> 4473 3795 1 91 666 5 TSSLEEP named ksleep_one(0x21)
> 4476 3795 1 91 666 3 TSSLEEP named ksleep_one(0x21)
> 4475 3795 1 91 666 4 TSSLEEP named ksleep_one(0x21)
> 4480 3795 1 91 666 2 TSSLEEP named ksleep_one(0x21)
> 4482 3795 1 91 666 3 TSSLEEP named per_processor_selects+0xc0
> 4471 3795 1 92 666 3 TSSLEEP named thread_wait(0x1178)
>
> So both before and after named receives the SIGTERM signal, one thread is
> in select(). The difference is thread 4471, which went from pm_sigwait() to
> thread_wait(). Thread 4471 is now waiting on thread 0x1178 (4472), which is
> still sleeping.
>
> For some reason, the threads created in isc_taskmgr_create() are not
> exiting.
>
> I suspect the problem is in the following code in the
> lib/isc/task.c file.
>
>     if (task->references == 0 &&
>         TASK_SHUTTINGDOWN(task)) {
>             /*
>              * The task is done.
>              */
>             XTRACE(isc_msgcat_get(
>                     isc_msgcat,
>                     ISC_MSGSET_TASK,
>                     ISC_MSG_DONE,
>                     "done"));
>             finished = ISC_TRUE;
>             task->state = task_state_done;
>     }
>
> When the task manager is created, each worker thread that named starts
> runs the dispatch() function. So in this case, 8 threads will be executing
> dispatch().
>
> I found that every task created by named goes through the following
> sequence of calls:
>
> isc_task_create()
> isc_task_attach()
> isc_task_sendanddetach()
> task_send()
> task_ready()
> ......
> ......
> task_finished()
>
> task_finished() is called only if the 'finished' variable is set to
> ISC_TRUE, and only then is the task unlinked from the manager's task list.
> A worker leaves the first while() loop in dispatch() only after all
> the tasks have been unlinked from that list, and only then does the
> thread exit.
>
> So I suspect the problem might be one of the following:
>
> 1. For some reason, some task is never detached, so its
> task->references count never reaches 0.
> OR
> 2. When the main thread sends the broadcast, some tasks do not
> receive and execute their shutdown events.
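Hypothesis 1 can be illustrated with a toy model of the reference counting. This is a simplified sketch, not the lib/isc task code: toy_task, toy_mgr and all the function names are made up, and the model keeps only the quoted "references == 0 and shutting down" condition plus a count of tasks still linked to the manager. It shows how a single missing detach keeps the task linked, so the worker's exit condition never becomes true.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy task; hypothetical and much simpler than the real isc_task_t. */
struct toy_task {
    unsigned int references;
    bool shutting_down;
    bool done;
};

/* Toy manager: only what the worker exit condition needs. */
struct toy_mgr {
    unsigned int ntasks;   /* tasks still linked into the manager */
    bool exiting;
};

static void toy_create(struct toy_mgr *m, struct toy_task *t)
{
    t->references = 1;     /* the creator holds the first reference */
    t->shutting_down = false;
    t->done = false;
    m->ntasks++;
}

static void toy_attach(struct toy_task *t)
{
    t->references++;
}

/* Mirrors the quoted test: done only when unreferenced AND shutting down. */
static void toy_check_finished(struct toy_mgr *m, struct toy_task *t)
{
    if (!t->done && t->references == 0 && t->shutting_down) {
        t->done = true;
        m->ntasks--;       /* unlink the task from the manager */
    }
}

static void toy_detach(struct toy_mgr *m, struct toy_task *t)
{
    assert(t->references > 0);
    t->references--;
    toy_check_finished(m, t);
}

/* A worker may leave its loop only when no tasks remain. */
static bool toy_worker_may_exit(const struct toy_mgr *m)
{
    return m->exiting && m->ntasks == 0;
}

/* One task, `attaches` extra references, then shutdown and `detaches` detaches. */
bool shutdown_completes(unsigned int attaches, unsigned int detaches)
{
    struct toy_mgr m = { 0, false };
    struct toy_task t;
    unsigned int i;

    toy_create(&m, &t);
    for (i = 0; i < attaches; i++)
        toy_attach(&t);

    m.exiting = true;
    t.shutting_down = true;

    for (i = 0; i < detaches && t.references > 0; i++)
        toy_detach(&m, &t);

    return toy_worker_may_exit(&m);
}
```

With one extra attach, shutdown_completes(1, 2) is true because both references are dropped, while shutdown_completes(1, 1) is false: one reference is left, the task stays linked, and a worker waiting for an empty task list would sleep forever. Whether this is what actually happens in named is exactly the open question.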
>
> Are my assumptions right?
>
> It would be great if anyone could give some additional information which
> will help in identifying the exact problem.
>
> Thanks,
>
> With Regards,
> D.Dhana.