makehistory performance tests
Heath Kehoe
heath.kehoe at intermec.com
Tue Aug 8 23:43:08 UTC 2000
Ok, I've been running some performance tests with makehistory.
All of the tests were run on a server with a populated CNFS
spool, but with no innd running, so the spool was the same
for each run. I started makehistory with an empty overview,
and after an hour did a "inndf -n" to get a count of overview
records.
I varied the sorttype and the -F/-l options to get an idea of
which combination yielded the best results. The "nosort"
type uses Katsuhiro's recent patch, in which each OV
record is stored as it is read from the spool (no batching).
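For anyone unfamiliar with what the batching does, here is a rough
model of the loop in question. This is an illustrative Python sketch,
not INN's actual C code; the record dicts and sort keys are simplified
stand-ins for real overview data.

```python
# Illustrative model of makehistory's overview batching (NOT INN's
# real C code).  Articles are read from the spool, accumulated into
# batches of -l records, sorted by the chosen key, then flushed.

def build_overview(articles, sorttype="newsgroup", batch_size=100000):
    """Yield overview records in the order they would be stored."""
    sort_keys = {
        "newsgroup": lambda a: (a["newsgroup"], a["number"]),
        "arrived":   lambda a: a["arrived"],
    }
    if sorttype == "nosort":
        # Katsuhiro's patch: store each record as it is read.
        yield from articles
        return
    batch = []
    for art in articles:
        batch.append(art)
        if len(batch) >= batch_size:
            yield from sorted(batch, key=sort_keys[sorttype])
            batch = []
    if batch:
        yield from sorted(batch, key=sort_keys[sorttype])
```

With a smaller -l, each individual sort is cheaper, which is relevant
to the batch-size results below.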
First, ovdb:
ovdb
sorttype    options              # overview after one hour
----------------------------------------------------------
newsgroup   -O -x                       764804
newsgroup   -O -x -l 10000              709484
newsgroup   -F -O -x                    783193
newsgroup   -F -O -x -l 10000           852740 *
nosort      -O -x                       716960
* I noticed during this run that the forked child (the one storing
overview) always finished before the next 'batch' was ready.
This means that the data was being stored to overview faster than
it could be read from the spool; so the spool was the limiting
factor.
And here are the buffindexed results. Note that the -F option
does not work with buffindexed: the second OVopen (in the
child process) fails with
"buffindexed: dupulicate index in line '0'".
To get the -F numbers below, I hacked makehistory so that I
could use -F with buffindexed, by hard-coding "sorttype" in
makehistory.c.
buffindexed
sorttype    options              # overview after one hour
----------------------------------------------------------
arrived     -O -x                       544451
arrived     -O -x -l 10000              593206
arrived     -F -O -x                    590870
arrived     -F -O -x -l 10000           621499
newsgroup   -O -x                       544451
newsgroup   -O -x -l 10000              577456
nosort      -O -x                       622958
In all cases, setting -l (the batch size) to 10000
(the default is 100000) improved performance: the additional
time needed to sort the larger batches outweighs any
improvement in storing the data.
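A back-of-envelope comparison supports this: a comparison sort does
roughly n*log(n) work per batch, so splitting the same stream into
smaller batches does strictly less total sorting work. A quick sketch
(the million-record total is an arbitrary example, not from the runs
above):

```python
import math

def total_sort_cost(total_records, batch_size):
    """Approximate comparison count: one n*log2(n) sort per batch."""
    batches = total_records // batch_size
    return batches * batch_size * math.log2(batch_size)

# Same stream of records, the two batch sizes from the tests:
small = total_sort_cost(1_000_000, 10_000)    # -l 10000
large = total_sort_cost(1_000_000, 100_000)   # -l 100000 (default)
assert small < large   # smaller batches -> less total sort work
```

Of course the sorted output may be cheaper to *store* (better locality
in the overview method), which is why the net effect has to be
measured rather than predicted.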
buffindexed had nearly the same performance in unsorted
mode as in the "arrived/-F/10000" mode. This leads me to
believe that buffindexed does not benefit enough from
sorting to overcome the additional time needed to perform
the sorts.
While using fork mode (-F) showed an improvement, the
improvement was smaller than I expected. The reason for
this is that the test machine I used has its spool and
overview drives on the same (slow) SCSI bus, so the
spool-reading process and the overview-writing process
were sharing the same I/O path. On a server with
faster or multiple busses, fork mode would show
a much larger improvement.
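The win from -F comes from overlapping the two I/O streams: while the
forked child writes one batch to overview, the parent is already
reading the next batch from the spool. Here is a minimal Python sketch
of that pipeline, using multiprocessing in place of INN's fork() and
stand-in steps for the spool reads and OVadd() calls:

```python
import multiprocessing as mp

def overview_writer(queue, done):
    """Child: store batches to overview while the parent reads ahead."""
    stored = 0
    while True:
        batch = queue.get()
        if batch is None:         # sentinel: no more batches coming
            break
        stored += len(batch)      # stand-in for the real overview stores
    done.put(stored)

def rebuild(batches):
    """Parent: feed batches to the forked writer, then collect the count."""
    queue, done = mp.Queue(maxsize=1), mp.Queue()
    child = mp.Process(target=overview_writer, args=(queue, done))
    child.start()
    for batch in batches:         # stand-in for reading/sorting the spool
        queue.put(batch)          # blocks if the writer falls behind
    queue.put(None)
    child.join()
    return done.get()
```

If the writer always drains the queue before the next batch is ready
(as in the starred ovdb run above), the spool reader is the
bottleneck; putting spool and overview on separate busses or spindles
is what lets the two halves actually run in parallel.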