BIND 10 #296: Timeouts on msgq's cc-channel

Mon Aug 16 18:29:51 UTC 2010

#296: Timeouts on msgq's cc-channel
-------------------------------+--------------------------------------------
      Reporter:  jelte         |        Owner:  jelte               
          Type:  enhancement   |       Status:  reviewing           
      Priority:  major         |    Milestone:  y2 6 month milestone
     Component:  Unclassified  |   Resolution:                      
      Keywords:                |    Sensitive:  0                   
Estimatedhours:  0.0           |        Hours:  0                   
      Billable:  1             |   Totalhours:  0                   
      Internal:  0             |  
-------------------------------+--------------------------------------------
Changes (by jinmei):

  * owner:  jinmei => jelte

Comment:

 Replying to [comment:15 jelte]:

 > >  - is it okay to do io_service().reset()?  According to the ASIO
 reference "this function must not be called while there are any unfinished
 calls".  I'm afraid we cannot assume such a thing in general.
 >
 > thing is, we can't assume we don't need it either; if io_service has
 decided it ran out of work, it'll stop doing anything on run_one(), and
 our while loop will run forever (this happens in one of the test cases
 here). The documentation also says that you need to call it before any
 second or later set of calls to run() or run_one().
 >
 Hmm, I see, but I'm afraid reset() can acutally "be called while there are
 unfinished calls to run()" in practice.  In fact, if I understand the code
 correctly, almost all calls to SessionImpl::readData() are performed
 inside a main io_service loop (i.e run()) in case of b10-auth (exceptions
 are the very first ones to establish sessions before starting a mail
 loop).

 From a quick look at the implementation of reset() (which I think is in
 asio/detail/xxx_io_service.hpp), reset() actually doesn't trigger an
 exception or otherwise indicate an error even if the assumption isn't met.
 But I'm not so sure if this means this usage is actually safe or it
 actually breaks something fundamental inside io_serivce which could cause
 strange failures in rare cases.

 One possible workaround would be
  - to set session timeout to 0 (meaning no timeout) in b10-auth just
 before starting the main loop, and
  - in readData(), skip the reset()->run_one() trick completely, if
 getTimeout() == 0

 And we should eventually integrate it to the asio wrapper so that we can
 handle this in the same io_service().run context or make everything
 asynchronous.

 > >  - I don't understand why we need to check getTimeout() != 0 here:
 > > {{{
 > >             if (read_result && getTimeout() != 0) {
 > > }}}
 >
 > If timeout is set to 0, the timer isn't started, and waiting for it to
 end would result in an eternal loop (after data has arrived). Added
 comment.
 >
 Hmm, but doesn't timer.cancel() triggers the setResult() event for the
 timer?  Perhaps it's not the case when the timer hasn't started in the
 first place?

 > >  - The following line isn't necessarily incorrect but looks a bit
 awkward:
 > > {{{
 > >         if (read_result->value() != 0) {
 > > }}}
 > >   why not this?  This seems to be more consistent with the other usage
 of asio::error_code in this code:
 > > {{{
 > >         if (*read_result) {
 > > }}}
 >
 > Personal preference; I don't have a strong opinion, but it is more clear
 to me as a reader what we are checking for here (success result). Unless
 i'm wrong in reading the code, there's no asio::error_code::success or
 something similar, so this would have to do. If you think *read_result
 directly is more clear, we can change it.
 >
 To me, the problem of this code is that it implicitly assumes the
 relationship between the low level value() and the abstract level of
 asio::error_code, that is, it assumes if value() is non 0 it means an
 error (not a success).

 This is true in the current implementation, and it's quite unlikely to
 change in future versions, but it's not guaranteed (the reference only
 says value() returns "the error value").  What if, again unlikely, a
 future version of ASIO introduces a new "success" error_category to
 separate different types of successful results?  In that case value()
 would return non 0 values while it can still indicate a success via
 operator! or the type conversion operator.

 I'm not sure if I fully understand "there's no asio::error_code::success
 or something similar", but if what you mean is a way to know if a
 particular error_code indicates a success or failure, you can use
 operator! or the type conversion operator (to bool), and the reference
 explicitly says so (unlike the semantics of value()):

  - operator unspecified_bool_type: Operator returns non-null if there is a
 non-success error code.
  - operator!: Operator to test if the error represents success.

 In fact, this code explicitly tests if there's a non-success error code
 using "operator unspecified_bool_type":

 {{{
         if (*read_result) {
 }}}

 Having said that, these are relatively pedantic argument, and it's not or
 probably won't be different from the current code usig value() in
 practice.  So, if this is not convincing and you still think value() is a
 more intuitive representation, please move forward with it.  (but please
 make a comment that the code assumes the implicit relationship between the
 raw value() and the success/failure semantics).

-- 
Ticket URL: <http://bind10.isc.org/ticket/296#comment:16>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development