[bind10-dev] Nameserver Address Store and RFC 2308 - Comments Requested

Fri Nov 26 10:43:27 UTC 2010

On 25 Nov 2010, at 23:43, Jerry Scharf wrote:

> Stephen,
> 
> Very cool paper (thanks Jinmei). It got me going to do some statistical analysis of the tables in the paper. Just from a quick look, the BIND 8 selection algorithm seems sub-optimal.
> 
> So I decided to take three of the algorithms, best server, d^2 and BIND 8, and see how they compared using some coarse metrics. I had the rtt from the data and created a column called server spread. This calculated the percentage of traffic diverted from the best server for each algorithm. I attached the spreadsheet so people can look at the details.
> 
> For best server the spread is 0 (by definition) and the rtt sets the baseline for the other two algorithms.
> for the d^2 algorithm, the average spread was 67% (2/3rds of traffic did not go to the best server) and there was a 57% increase in rtt.
> for the BIND 8 algorithm, there was 63% spread in traffic and a 168% increase in rtt.
> 
> To me, it seems like the d^2 algorithm produces far less penalty while still moving most of the load off the primary server. There will still need to be some method for testing low use server at some interval, but this can be very simple.  There will still need to be the weighted averaging of rtts.

I agree about the weighted averaging of RTTs.  If a server has a transiently high RTT, we don't want to mark it with a high RTT immediately as that will immediately drop the frequency with which it is accessed.  The smoothing algorithm used by BIND (equation 1 in the paper) should serve to dampen the transient.

Having said that, it has just occurred to me that a transient is time-dependent whereas the smoothing is query-dependent.  What I mean is that if at the time a server has a transiently high RTT and a burst of queries is made to it (and so the RTT is recalculated multiple times in a short period), the resultant RTT associated with it will be higher than if a single query were made in that period.  Of course, the severity of the effect depends on the smoothing parameter (the term alpha in the equation), but it is there.  At the expense of some complexity we could ensure that the RTT were updated no more than (say) once a second. 

I agree about use of the d^2 (or, more accurately, 1/d^2) algorithm.  Discussions between Ocean and I had touched on (1/d), but I think this sends too high a fraction of traffic to high-RTT servers (With two servers having RTTs of 1ms and 9ms, that fraction is 10%.)  (1/d^2) _seems_ about right (roughly 1%) but higher powers could be used - a (1/d^3) would send about 0.14% of traffic that way.  Mind you, again that could depend on query load - 1% of traffic to a high RTT server might be right for a low query rate, but for higher query rates would 0.1% be better? 

In both case, we could make the algorithm adaptive to query load.  But as we have no experience of how the (1/d^2) algorithm works in practice, I think that we should resist that temptation until we get some experience.

Stephen