

Fri Feb 17 00:34:30 UTC 2012


- at least here with two CPUs - as there is only one sched_yield()
call per 72 requests, and only 2-3 ksleeps.

So, the next step is to profile the thing with prospect [1]. What
follows is the prospect profile of named (+O2 compilation - aka -O).

Here is the massaged profile of the first thread in named. There are
304 routines in the list; only the first ~20 are shown here:

 11%  11% 1711             pthread_mutex_lock  /usr/lib/libpthread.1
  5%  15%  723               dns_rbt_findnode    /var/named/named.O2
  4%  19%  657           pthread_mutex_unlock  /usr/lib/libpthread.1
  4%  23%  572           dns_name_fullcompare    /var/named/named.O2
  3%  26%  508           dns_name_concatenate    /var/named/named.O2
  3%  29%  468                    __spin_lock  /usr/lib/libpthread.1
  3%  32%  421      dns_name_getlabelsequence    /var/named/named.O2
  2%  34%  383                     detachnode    /var/named/named.O2
  2%  37%  375                         memcpy        /usr/lib/libc.2
  2%  39%  326                 dns_name_split    /var/named/named.O2
  2%  41%  312                 dns_name_equal    /var/named/named.O2
  2%  43%  310             pthread_mutex_lock  /usr/lib/libpthread.1
  2%  44%  301                           doit    /var/named/named.O2
  2%  46%  286                  __spin_unlock  /usr/lib/libpthread.1
  2%  48%  252                    set_offsets    /var/named/named.O2
  2%  49%  249                  dns_name_hash    /var/named/named.O2
  2%  51%  249           pthread_mutex_unlock  /usr/lib/libpthread.1
  1%  52%  226                     chain_name    /var/named/named.O2
  1%  54%  194                  new_reference    /var/named/named.O2
  1%  55%  184                     cache_find    /var/named/named.O2

The remaining ~45% of the CPU time is scattered across a further ~284
routines... 

If there is not much contention on the mutexes, yet a non-trivial
quantity of time is being spent in the pthread mutex routines, it
could imply that the locking granularity is a bit too fine and that
many locks are simply being acquired and released on each request. It
could also suggest that further tuning of the path lengths in
libpthread is in order :) One way to check either hypothesis is to log
each request and each mutex lock and simply count the number of mutex
locks per request. Of course, if one looked deeply into the source,
there might even be some statistics already available... :)

A quick examination of the hitcounts shows that out of 15,904 sample
hits, 3,955 of them were in libpthread and 10,904 of them were in
named. So, roughly 25% of the time was spent in libpthread, 69% in
named, and 6% elsewhere.

A second thread had a substantially similar profile.


[1] prospect - ftp://ftp.cup.hp.com/dist/networking/tools/


