BIND 10 #642: SIGHUP and other signals cause boss to leave BIND 10 processes lying around
BIND 10 Development
do-not-reply at isc.org
Fri Apr 8 13:10:42 UTC 2011
#642: SIGHUP and other signals cause boss to leave BIND 10 processes lying around
-------------------------------------+-------------------------------------
 Reporter: shane                     | Owner: shane
 Type: defect                        | Status: reviewing
 Priority: minor                     | Milestone: Sprint-20110419
 Component: Boss of BIND             | Resolution:
 Keywords:                           | Sensitive: 0
 Estimated Number of Hours: 0.0      | Add Hours to Ticket: 0
 Billable?: 1                        | Total Hours: 0
 Internal?: 0                        |
-------------------------------------+-------------------------------------
Comment (by shane):
I sent a mail to the bind10-dev list about this:
{{{
From: Shane Kerr <shane at isc.org>
To: bind10-dev <bind10-dev at lists.isc.org>
Date: Fri, 18 Mar 2011 14:30:01 +0100
Subject: [bind10-dev] Handling Disappearing Terminals
All,
We need to think about what happens to the server when the terminal it
is running in disappears.
History
-------
(Skip if you are impatient for the good stuff.)
At the end of last month, Jeremy sent a mail about his problems setting
up a forwarding resolver:
https://lists.isc.org/pipermail/bind10-dev/2011-February/002038.html
He reported this:
I know why my bind10 was killed; it doesn't daemonize so when I
closed terminal it was running in, it was killed -- but
sometimes children didn't get killed. HUP or whatever signal was
not trapped or passed to children?
This led me to make a ticket so that we handle SIGHUP and other signals
that might kill the boss process:
http://bind10.isc.org/ticket/642
However, Michal noted that this didn't seem to do anything at all when
he started a process in the background and the terminal was closed. So I
had a look and discovered that the behavior for processes varies quite a
bit depending on the exact details of how the controlling terminal goes
away.
Details of Terminal Closing
---------------------------
I looked at what happens to a process under 3 ways of being started:
1. Running the program
2. Using "su" and then running the program
3. Using "sudo" to run the program
My theory was that each of these might behave slightly differently, and
it turns out that is true.
I tried 3 types of test:
A. Start program and close the terminal window
B. Start program in the background (with & at the shell) then logout
C. Start program in the background then close the terminal window
I wrote small Python programs to use for this test, to concentrate on
figuring out the behavior.
My 1st program intercepted all signals possible, and then just waited
around for a KILL signal. :)
My 2nd program intercepted all signals possible, and then wrote a stream
to STDOUT in a loop.
My 3rd program intercepted all signals possible, and then used select()
to see if anything was available for reading, and tried to read if it
was.
Results:
--[ 1: idle ]----------------------------------------------------------
         Start/Close            Background/Logout       Background/Close
normal   SIGHUP                 nothing                 SIGHUP
su       nothing                nothing                 nothing
sudo     SIGHUP (3x)            nothing                 SIGHUP
--[ 2: writing ]-------------------------------------------------------
         Start/Close            Background/Logout       Background/Close
normal   SIGHUP, SIGTSTP, err   err                     SIGHUP, err
su       err, SIGHUP, SIGTSTP   err                     err
sudo     SIGHUP, SIGTSTP, err   err                     SIGHUP, err
--[ 3: reading ]-------------------------------------------------------
         Start/Close            Background/Logout       Background/Close
normal   SIGHUP, EOF            EOF                     SIGHUP, EOF
su       EOF                    SIGTTIN, EOF            SIGTTIN, EOF
sudo     SIGHUP, err            SIGTTIN, SIGTSTP,       SIGHUP, EOF
                                SIGTERM, SIGTSTP, EOF
If more than one thing happened, they are listed in the order they
occurred.
Key:
  SIGXXX is a signal arriving
  err is an I/O error (either writing or reading)
  EOF means a read returned 0 bytes, indicating EOF
Michal's Observation
--------------------
I think we can understand Michal's results:
* When the terminal window closed, the boss got no signal at all.
* Then when one of the child processes tried to output some
message, it got a write error.
* When the boss caught the dying child, it tried to output a
message explaining this and *also* got a write error.
* Over time, more and more children got write errors and died.
Analysis
--------
The boss process can adapt itself to handle the terminal going away,
because based on the research above, we can detect this and change
outputs so that they go to /dev/null (or better yet so they call empty
functions).
The problem becomes what we do with child processes. If we want them to
write to the console, then they will get some sort of error too.
* We could let the children die, and restart them, but this is...
inelegant.
* We could perhaps have the boss act as a proxy and use pipes to
read the output.
* We could do the same thing, but with pseudo-ttys. Python even
has a module for this:
http://docs.python.org/py3k/library/pty.html
* We could shut down.
I realize some people want us to 'properly' daemonize. This would make
the problem go away, but we'll have to change all of the processes to
live in such an environment, and we'll *still* have to deal with these
issues when the program is run in the equivalent of '-f' or '-g' from
BIND 9 (run in foreground).
Please let me know what you think.
--
Shane
_______________________________________________
bind10-dev mailing list
bind10-dev at lists.isc.org
https://lists.isc.org/mailman/listinfo/bind10-dev
}}}
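For anyone wanting to repeat the tests described in the mail above, here is a rough sketch of what the first ("idle") program could look like. This is not the script that was actually used; the names `received`, `signal_name`, `catch_signals` and `run_idle`, and the choice to log to a file instead of the terminal, are assumptions made for illustration. It also relies on `signal.valid_signals()`, which needs Python 3.8 or later.

```python
import signal
import time

# Names of the signals seen so far, in arrival order (for example "SIGHUP").
received = []


def signal_name(signum):
    """Return a readable name; real-time signals may have no enum entry."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return "signal %d" % signum


def record_signal(signum, frame):
    """Handler: only remember the signal, since the terminal may be gone."""
    received.append(signal_name(signum))


def catch_signals(signals=None):
    """Install record_signal for the given signals (default: all of them).

    Returns a dict mapping each caught signal to its previous handler.
    """
    if signals is None:
        signals = signal.valid_signals()
    previous = {}
    for sig in signals:
        try:
            previous[sig] = signal.signal(sig, record_signal)
        except OSError:
            # SIGKILL and SIGSTOP cannot be caught; skip them.
            continue
    return previous


def run_idle(log_path, poll_seconds=1.0):
    """Catch everything, then idle until killed, logging new signals."""
    catch_signals()
    written = 0
    while True:
        time.sleep(poll_seconds)
        if len(received) > written:
            # Write to a file rather than STDOUT, which may be unusable.
            with open(log_path, "a") as log:
                for name in received[written:]:
                    log.write(name + "\n")
            written = len(received)
```

Calling `run_idle()` with some log file path (any writable location will do) and then closing the terminal or logging out should leave the names of whatever signals arrived in that file. Since this sketch catches SIGINT and SIGTERM as well, the process has to be stopped with a KILL signal, as in the original test.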
--
Ticket URL: <http://bind10.isc.org/ticket/642#comment:8>
BIND 10 Development <http://bind10.isc.org>