BIND 10 #356: Resolver address database

Thu Dec 9 18:46:25 UTC 2010

#356: Resolver address database
-------------------------------+--------------------------------------------
      Reporter:  stephen       |        Owner:  ocean    
          Type:  enhancement   |       Status:  reviewing
      Priority:  major         |    Milestone:           
     Component:  Unclassified  |   Resolution:           
      Keywords:                |    Sensitive:  0        
Estimatedhours:  0.0           |        Hours:  0        
      Billable:  1             |   Totalhours:  0        
      Internal:  0             |  
-------------------------------+--------------------------------------------
Changes (by stephen):

  * owner:  stephen => ocean

Comment:

 '''src/lib/nsas/tests/nameserver_entry_test.cc'''[[BR]]
 Changes are OK but in checking them I noticed what appears to be an error
 in the checks in test !AddressSelection.  The first test attempts to check
 that three addresses are selected with equal probability and tests this by
 iterating 10,000 times and incrementing one of three counters (c1, c2,
 c3).  The test is then:
 {{{
     // c1, c2 and c3 should almost be equal
     ASSERT_EQ(1, (int)(c1*1.0/c2 + 0.5));
     ASSERT_EQ(1, (int)(c2*1.0/c3 + 0.5));
 }}}
 But these two assertions are true if (for example) c1 = 4713, c2 = 3163
 and c3 = 2124, i.e. c1 > 2*c3 - which is a significant deviation from
 equal probability.

 (Figures obtained by noting that {{{(int) 1.99 = 1}}} then setting
 {{{c1/c2 = c2/c3 = 1.49}}} and {{{c1+c2+c3 = 10000}}}.  BTW the iteration
 count is 10,000 for the first loop in this test and 100,000 for the rest -
 I suggest that the higher number be used.)

 Similar comments apply to the ASSERT_EQ checks at the end of the test.

 Testing the output of a random number generator is an involved process.  I
 am not an expected on statistics, but I analyze the first test (where we
 expect each of three addresses to be selected with equal probability) as
 follows:

 For each address, each time through the loop we expected it to be selected
 with a probability of p (= 1/3) and not selected with a probability of q
 (= 1-p = 2/3).  This suggests a binomial distribution, but as the number
 of selections (N) is large, we end up with a normal distibution with mean
 Np (= N/3) and variance of Npq (= 2N/9) (see for example
 [http://mathworld.wolfram.com/NormalDistribution.html]).  With N =
 100,000, the standard deviation (= sqrt(variance)) is about 149 or roughly
 0.15% of N.

 Although we want an even distribution, it doesn't really matter too much
 if the numbers vary slightly.  So checking that the actual number of
 selections of a given address is within 1% of its expected value is
 probably good enough.  With the figures used here, one percent of N away
 from the mean is over six standard deviations, and we expect that actual
 value will be closer than this to the expected value well over 99.99% of
 the time.

 In summary, testing that "abs(c1 - (N/3)) <= (N/100)" (and the same for c2
 and c3) should check that they are selected with equal probability and
 work virtually every time.

 Note that slightly different values are required when the probabilities
 are unequal.

-- 
Ticket URL: <http://bind10.isc.org/ticket/356#comment:19>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development