BIND 10 #2942: b10-auth terminated with signal 6, Aborted.

Mon May 6 21:16:30 UTC 2013

#2942: b10-auth terminated with signal 6, Aborted.
-------------------------------------+-------------------------------------
            Reporter:  jreed         |                        Owner:
                Type:  defect        |                       Status:  new
            Priority:  very high     |                    Milestone:  Next-
           Component:  b10-auth      |  Sprint-Proposed
            Keywords:                |                   Resolution:
           Sensitive:  0             |                 CVSS Scoring:
         Sub-Project:  DNS           |              Defect Severity:  Very
Estimated Difficulty:  0             |  High
         Total Hours:  0             |  Feature Depending on Ticket:
                                     |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------

Comment (by jinmei):

 This is a use-after-free problem.

 According to the backtrace, what happened appears to be:
 - b10-auth first sets listen_on at the time of creating
   `ModuleCCSession`.  It eventually creates corresponding
   `SyncUDPServer` (and other `DNSServer`) objects, which register
   themselves with read event callbacks for the ASIO io_service.
 - In the case of `SyncUDPServer`, async_receive_from() is called,
   which first checks if there's any readable data using non-blocking
   I/O and, if there is, posts an event to the main loop of the
   io_service (at this point such events can no longer be canceled).
   As the production AS112 server is very busy, I guess the `SyncUDPServer`
   effectively starts receiving queries at this point.
 - b10-auth then calls configureAuthServer() to install all user
   configurations.  For listen_on, this means the previously created
   `SyncUDPServer` (and other `DNSServer`) objects are destroyed:
 {{{#!cpp
 void
 DNSService::clearServers() {
     BOOST_FOREACH(const DNSServiceImpl::DNSServerPtr& s, impl_->servers_) {
         s->stop();
     }
     impl_->servers_.clear();
 }
 }}}
 - SyncUDPServer::stop() closes the socket, but any completed read
   event that has already been posted is unaffected.  It will still
   trigger the read callback eventually.
 - the `SyncUDPServer` object itself is destroyed at this point.
 - then, to complete configureAuthServer(), b10-auth communicates with
   other modules via the CC session, which causes calls to
   io_service::run_one().  One of these calls results in a call to the
   `SyncUDPServer` callback (the backtrace shows that this is actually
   happening).
 - but at this point `this` server object has already been destroyed,
   so any behavior that relies on the local member variables of the
   object is undefined.  That should be the reason we saw "calling pure
   virtual" or other strange crashes.
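
 The sequence above can be sketched without ASIO.  This is only an
 illustration under stated assumptions: `EventLoop`, `Server` and
 `handlerOutlivesServer()` are hypothetical stand-ins (not BIND 10 or
 ASIO code), it uses C++11 standard library types rather than Boost,
 and the handler holds a `std::weak_ptr` purely so the sketch can
 observe the destroyed object safely instead of dereferencing it.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for the io_service queue: a handler that has been
// posted stays queued until run_one() invokes it.
struct EventLoop {
    std::deque<std::function<void()>> posted;

    void post(std::function<void()> handler) {
        posted.push_back(std::move(handler));
    }

    // Runs at most one queued handler; returns false if none was queued.
    bool run_one() {
        if (posted.empty()) {
            return false;
        }
        std::function<void()> handler = std::move(posted.front());
        posted.pop_front();
        handler();
        return true;
    }
};

// Hypothetical stand-in for SyncUDPServer.
struct Server {
    bool stopped = false;
    // The real stop() closes the socket.  Like this one, it does not
    // remove handlers that are already queued.
    void stop() { stopped = true; }
};

// Returns true if the posted handler ran after the server was destroyed.
bool handlerOutlivesServer() {
    EventLoop loop;
    bool ran_after_destroy = false;

    std::shared_ptr<Server> server = std::make_shared<Server>();
    std::weak_ptr<Server> observer = server;

    // 1. A completed read is posted to the loop.
    loop.post([observer, &ran_after_destroy] {
        // The real callback would use `this` here, which is the bug.
        ran_after_destroy = observer.expired();
    });

    // 2. The equivalent of clearServers(): stop, then destroy.
    server->stop();
    server.reset();

    // 3. Later traffic (the CC session in the ticket) drives the loop.
    loop.run_one();
    return ran_after_destroy;
}
```

 Calling `handlerOutlivesServer()` returns true in this sketch: the
 handler still runs even though stop() was called and the object is
 gone.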

 Before #2903, this code should have somehow prevented catastrophic results:
 {{{#!cpp
     if (checkin_callback_ != NULL) {
         (*checkin_callback_)(message);
         if (stopped_) {
             return;
         }
     }
 }}}

 While it still uses local member variables of the destroyed object and
 should cause a crash or other trouble, we were probably lucky and both
 `checkin_callback_` and `stopped_` retained their original values.

 So, simply reverting #2903 wouldn't be a real solution.  What we
 should do is to control the lifetime of the server objects so they are
 not destroyed until all posted events are completed.
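
 One common way to get that kind of lifetime control is to let each
 posted handler hold a shared reference to the server.  The following
 is a minimal sketch of that idea, not the fix adopted for this
 ticket: `EventLoop`, `Server`, `Result` and `runLifetimeDemo()` are
 hypothetical names, the loop is a plain queue rather than the ASIO
 io_service, and it uses C++11 `std::shared_ptr` and
 `std::enable_shared_from_this` rather than the Boost equivalents.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for the io_service queue.
struct EventLoop {
    std::deque<std::function<void()>> posted;

    void post(std::function<void()> handler) {
        posted.push_back(std::move(handler));
    }

    // Runs at most one queued handler, then destroys it on return.
    bool run_one() {
        if (posted.empty()) {
            return false;
        }
        std::function<void()> handler = std::move(posted.front());
        posted.pop_front();
        handler();
        return true;
    }
};

// Hypothetical server whose posted handlers keep it alive.
class Server : public std::enable_shared_from_this<Server> {
public:
    Server(int& handled, bool& destroyed)
        : handled_(handled), destroyed_(destroyed), stopped_(false) {}
    ~Server() { destroyed_ = true; }

    void startReceive(EventLoop& loop) {
        // The handler owns a reference, so the object cannot be
        // destroyed while the handler is still queued.
        std::shared_ptr<Server> self = shared_from_this();
        loop.post([self] { self->onReceive(); });
    }

    void stop() { stopped_ = true; }

private:
    void onReceive() {
        if (stopped_) {
            return;  // safe: the object is still alive here
        }
        ++handled_;
    }

    int& handled_;
    bool& destroyed_;
    bool stopped_;
};

struct Result {
    bool destroyed_before_run;
    bool destroyed_after_run;
    int handled;
};

Result runLifetimeDemo() {
    int handled = 0;
    bool destroyed = false;
    EventLoop loop;

    std::shared_ptr<Server> server =
        std::make_shared<Server>(handled, destroyed);
    server->startReceive(loop);
    server->stop();
    server.reset();  // the owner lets go; the queued handler does not

    Result result;
    result.destroyed_before_run = destroyed;
    loop.run_one();  // handler sees stopped_, returns, releases the server
    result.destroyed_after_run = destroyed;
    result.handled = handled;
    return result;
}
```

 In this sketch the server is destroyed only once the queued handler
 has run and been released, and the handler can check `stopped_`
 safely because the object is still alive when it does so.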

 But, for the 1.1.0-release (beta), I suggest applying a workaround
 that is not really correct but only as bad as pre-#2903
 (http://bind10.isc.org/raw-attachment/ticket/2942/udp-server3.diff),
 and developing a complete fix (not only for `SyncUDPServer` but also
 for other DNS server classes) separately.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2942#comment:18>
BIND 10 Development <http://bind10.isc.org>