BIND 10 #2617: MSGQ_RECV_ERR errors on clean shutdown

Tue Jan 29 23:08:09 UTC 2013

#2617: MSGQ_RECV_ERR errors on clean shutdown
-------------------------------------+-------------------------------------
            Reporter:  jreed         |                        Owner:
                Type:  defect        |                       Status:  new
            Priority:  low           |                    Milestone:
           Component:  msgq          |  Sprint-20130205
            Keywords:                |                   Resolution:
           Sensitive:  0             |                 CVSS Scoring:
         Sub-Project:  Core          |              Defect Severity:  Low
Estimated Difficulty:  Discuss (4?)  |  Feature Depending on Ticket:
         Total Hours:  0             |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------

Comment (by jinmei):

 First off, I cannot reproduce it.  In some cases I see these messages
 on shutdown:

 {{{
 2013-01-29 14:57:23.926 ERROR [b10-msgq.msgq/25245] MSGQ_SEND_ERR Error
 while sending to socket 9: EPIPE
 2013-01-29 14:57:23.926 ERROR [b10-msgq.msgq/25245] MSGQ_READ_UNKNOWN_FD
 Got read on strange socket 9
 }}}

 but I have never managed to reproduce the errors reported in this
 ticket.

 In any case, judging from the code, I suspect these are in fact
 "erroneous" events, in that something unexpected is happening within
 the system, such as msgq encountering EPIPE or EOF in the middle of
 handling a message.  It's true that msgq cannot always control these
 events, but they are still a kind of error within the entire BIND 10
 system.  I guess what's happening is something like this: one
 process sends an "I'm quitting" message to some other process, but
 the other process terminates before accepting it.  So the right fix
 would be to clarify the system-shutdown process and implement it
 correctly (terminate the processes in the expected order, guaranteeing
 the necessary synchronization).  As long as shutdown doesn't work that
 way, it's reasonable to report such events as an ERROR at msgq.
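
 To illustrate the suspected race, here is a minimal Python 3 sketch.
 It is not msgq code: the function names and the "quitting"/"ack"
 payloads are made up for illustration, and a local socketpair stands
 in for two BIND 10 processes.  The first function closes the peer
 before the notice is sent; the second keeps the peer alive until it
 has read and acknowledged the notice.

```python
import errno
import socket


def send_quit_unsynchronized():
    """Peer is gone before the notice is sent; return the errno name seen."""
    sender, peer = socket.socketpair()
    peer.close()  # hypothetical: the other process has already terminated
    try:
        sender.sendall(b"quitting")
        return "OK"
    except OSError as ex:
        # Python ignores SIGPIPE by default, so the failure shows up here.
        return errno.errorcode.get(ex.errno, str(ex.errno))
    finally:
        sender.close()


def send_quit_synchronized():
    """Peer stays up until it has read and acknowledged the notice."""
    sender, peer = socket.socketpair()
    try:
        sender.sendall(b"quitting")
        received = peer.recv(64)   # the peer accepts the notice first
        peer.sendall(b"ack")       # hypothetical acknowledgement
        ack = sender.recv(64)      # the sender waits before closing
        return received, ack
    finally:
        sender.close()
        peer.close()
```

 On Linux I would expect the first function to report EPIPE, much like
 the MSGQ_SEND_ERR log above, while the second completes without any
 error because neither side closes until the exchange is done.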

 But such a higher-level fix is beyond the scope of this task.
 So my suggestion is to keep the error level but clarify these things
 in the detailed version of the log descriptions.  Even if people
 don't read them, that seems a better way to handle this matter than
 pretending there's no issue by lowering the log level.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2617#comment:5>
BIND 10 Development <http://bind10.isc.org>