BIND 10 #2459: call to getCachedZoneWriter must be protected by mutex

Sun Nov 4 02:40:11 UTC 2012

#2459: call to getCachedZoneWriter must be protected by mutex
-------------------------------------+-------------------------------------
            Reporter:  jinmei        |                        Owner:
                Type:  defect        |                       Status:  new
            Priority:  very high     |                    Milestone:  Next-
           Component:  b10-auth      |  Sprint-Proposed
           Sensitive:  0             |                     Keywords:
         Sub-Project:  DNS           |              Defect Severity:  N/A
Estimated Difficulty:  0             |  Feature Depending on Ticket:
         Total Hours:  0             |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------
 I just noticed one big issue in the thread-based zone reloading:
 there can be an inter-thread race for the same DB connection (if
 I understand it correctly).

 Consider an sqlite3 data source that doesn't use the in-memory "cache".
 If b10-auth receives a loadzone command for one of the zones in that
 data source (either manually or, more likely, from xfrin or DDNS),
 the separate builder thread calls `getCachedZoneWriter()`.  Since this
 data source has no in-memory cache, the method internally uses the
 `SQLite3` connection to identify the zone.  But the main thread also
 uses the same connection for normal query processing, and in this case
 both threads run in parallel.

 SQLite3 doesn't ensure that such concurrent use of a single connection
 from multiple threads succeeds:
 http://www.sqlite.org/faq.html#q6
 (By the way, the same kind of restriction seems to apply to PostgreSQL
 and MySQL.)

 This problem seems deep (I'll create a separate ticket for it), but
 for now I suggest an urgent stopgap fix: just protect the call to
 `getCachedZoneWriter()` with the mutex.
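 To illustrate the stopgap, here is a minimal C++ sketch, assuming a
 single mutex that both the main (query) thread and the builder thread
 take before touching the shared connection.  `Connection`,
 `datasrc_mutex`, `builderLookup()`, `processQuery()` and `runDemo()`
 are hypothetical names for illustration only, not the actual b10-auth
 classes; in the real code the lock would wrap the
 `getCachedZoneWriter()` call itself.

 {{{#!cpp
 #include <cassert>
 #include <cstddef>
 #include <mutex>
 #include <string>
 #include <thread>
 #include <vector>

 // Hypothetical stand-in for a data source connection (e.g. SQLite3)
 // that must not be used by two threads at the same time.  The vector
 // below is deliberately unsynchronized.
 class Connection {
 public:
     // Records the lookup so the calls can be counted afterwards.
     bool findZone(const std::string& origin) {
         calls_.push_back(origin);
         return !origin.empty();
     }
     std::size_t callCount() const { return calls_.size(); }
 private:
     std::vector<std::string> calls_;
 };

 // Hypothetical: the single mutex the main thread already holds while
 // using the data source clients.  The fix is that the builder thread
 // takes this same lock around getCachedZoneWriter().
 std::mutex datasrc_mutex;

 // Builder-thread side: the zone lookup done inside
 // getCachedZoneWriter() now runs with the mutex held.
 bool builderLookup(Connection& conn, const std::string& origin) {
     std::lock_guard<std::mutex> lock(datasrc_mutex);
     return conn.findZone(origin);
 }

 // Main-thread side: normal query processing, under the same mutex.
 bool processQuery(Connection& conn, const std::string& qname) {
     std::lock_guard<std::mutex> lock(datasrc_mutex);
     return conn.findZone(qname);
 }

 // Runs n lookups from each thread against one shared connection and
 // returns the number of calls the connection recorded.
 std::size_t runDemo(int n) {
     Connection conn;
     std::thread builder([&conn, n] {
         for (int i = 0; i < n; ++i) builderLookup(conn, "example.org.");
     });
     for (int i = 0; i < n; ++i) processQuery(conn, "www.example.org.");
     builder.join();
     return conn.callCount();  // safe: join() synchronizes with builder
 }
 }}}

 For example, `runDemo(1000)` returns 2000: because every access to the
 shared connection is serialized by the same mutex, no lookup is lost
 to a data race.  The cost is that query processing blocks for the
 duration of the builder's lookup, which is why this is only a stopgap
 for the deeper problem.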

 I also suggest a piggyback fix: don't treat the following case as an error:
 {{{#!cpp
     case datasrc::ConfigurableClientList::ZONE_NOT_CACHED:
         isc_throw(InternalCommandError, "failed to load zone " << origin
                   << "/" << rrclass << ": not served from memory");
 }}}
 because it can happen in the scenario described above, and if the
 request comes from xfrin or DDNS it's not really an error.  My
 suggestion is to just log it at the DEBUG level and return a NULL
 pointer.  `doLoadZone()` would then check the return value and simply
 return if it's NULL.
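 The suggested handling could look roughly like this.  This is a
 minimal, hypothetical C++ sketch: the `CacheStatus` enum, `ZoneWriter`
 and `getZoneWriter()` are simplified stand-ins (only the
 `ZONE_NOT_CACHED` and `doLoadZone()` names come from this ticket), and
 `std::runtime_error` stands in for the real `InternalCommandError` in
 the remaining error case.

 {{{#!cpp
 #include <cassert>
 #include <iostream>
 #include <memory>
 #include <stdexcept>
 #include <string>

 // Hypothetical, simplified result codes; only ZONE_NOT_CACHED is
 // taken from the ticket.
 enum class CacheStatus { ZONE_SUCCESS, ZONE_NOT_CACHED, ZONE_NOT_FOUND };

 // Hypothetical zone writer; the real one would build the new zone data.
 struct ZoneWriter {
     std::string origin;
     void load() { /* build the new zone data here */ }
 };

 // Proposed behavior: ZONE_NOT_CACHED is logged at debug level and
 // reported as a null pointer instead of throwing.
 std::unique_ptr<ZoneWriter> getZoneWriter(CacheStatus status,
                                           const std::string& origin) {
     switch (status) {
     case CacheStatus::ZONE_SUCCESS:
         return std::unique_ptr<ZoneWriter>(new ZoneWriter{origin});
     case CacheStatus::ZONE_NOT_CACHED:
         // Not an error, e.g. xfrin/DDNS for an uncached sqlite3 zone.
         std::clog << "DEBUG: " << origin << " not served from memory\n";
         return nullptr;
     case CacheStatus::ZONE_NOT_FOUND:
     default:
         throw std::runtime_error("failed to load zone " + origin);
     }
 }

 // Hypothetical doLoadZone(): a null writer means "nothing to do".
 // Returns true only if a reload was actually performed.
 bool doLoadZone(CacheStatus status, const std::string& origin) {
     std::unique_ptr<ZoneWriter> writer = getZoneWriter(status, origin);
     if (!writer) {
         return false;  // just return, as suggested above
     }
     writer->load();
     return true;
 }
 }}}

 Returning a null pointer keeps "nothing to reload" a normal outcome
 for the caller, while genuine failures still raise an exception.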

-- 
Ticket URL: <http://bind10.isc.org/ticket/2459>
BIND 10 Development <http://bind10.isc.org>