Design issues for nntp storage manager

Russell Vincent russellv at uk.uu.net
Fri May 25 10:32:07 UTC 2001


On Fri, May 25, 2001 at 12:00:56PM +0200, Marco d'Itri wrote:
> As I wrote, diablo people did not find this important enough to
> implement it. I think it should use the Xref server name as a hint, and
> if the article is not found there look on other servers (for
> redundancy).

In most of the Diablo installations I have seen, the Xref: hostname
bears no resemblance to the spool servers, and the host named in the
Xref: line (the Xref: generator) only keeps articles for a few
hours (so the readers don't even have access to it). The
articles reaching the readers often don't even pass through the
spool servers (they go via different routes).

When you start implementing large distributed systems, you come
across all sorts of interesting issues like that.

Diablo's current system of requesting all articles from equally
weighted spool servers works very well in very large installations,
and I haven't seen a need to change it.
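
To make the idea concrete, here is a rough sketch in Python. It is not
Diablo's code. It assumes that "equally weighted" means every spool server
is as likely as any other to be asked first, and that a miss on one server
simply falls through to the next. The names fetch_from_spools and fetch
are made up for illustration.

```python
import random
from typing import Callable, Optional, Sequence


def fetch_from_spools(
    message_id: str,
    spools: Sequence[str],
    fetch: Callable[[str, str], Optional[bytes]],
) -> Optional[bytes]:
    """Ask equally weighted spool servers for an article, in random order.

    `fetch(server, message_id)` is a hypothetical callable that returns the
    article bytes, or None if that server does not have the article.
    """
    order = list(spools)
    random.shuffle(order)  # no server is preferred over another
    for server in order:
        article = fetch(server, message_id)
        if article is not None:
            return article
    return None  # no spool server had the article
```

Note that nothing here looks at the Xref: hostname. The reader only needs
the Message-ID and the list of spool servers.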

>  >But I'd quite like to be able to cache articles in a local CNFS can,
>  >which means we need the history file.
> dreaderd uses a storage method similar to timecaf and does not use a
> history file but just the overview DB, so it can be done. It does not
> mind being fed the same article two times, as long as the Xref number is
> the same.

You are talking about the header db, but Alex was talking about an
article cache. For the dreaderd article cache, the article is stored
on local disk in a directory/filename made from a hash of the
Message-ID. Before an article is retrieved from the spool (by
Message-ID), dreaderd checks the local cache (if enabled). Very
simplistic, but it works well.
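
As a minimal sketch of that lookup (in Python, not dreaderd's actual code):
the choice of MD5, the two-level directory layout, and the names cache_path,
get_article and fetch_from_spool are all assumptions for illustration. Only
the overall flow (hash the Message-ID, check local disk, fall back to the
spool) comes from the description above.

```python
import hashlib
import os
from typing import Callable, Optional


def cache_path(cache_root: str, message_id: str) -> str:
    """Map a Message-ID to a directory/filename derived from its hash."""
    digest = hashlib.md5(message_id.encode("utf-8")).hexdigest()
    # Hypothetical layout: two directory levels keep each directory small.
    return os.path.join(cache_root, digest[:2], digest[2:4], digest)


def get_article(
    cache_root: str,
    message_id: str,
    fetch_from_spool: Callable[[str], Optional[bytes]],
) -> Optional[bytes]:
    """Return the article, checking the local cache before the spool."""
    path = cache_path(cache_root, message_id)
    try:
        with open(path, "rb") as f:
            return f.read()  # cache hit
    except FileNotFoundError:
        pass  # cache miss: fall through to the spool

    article = fetch_from_spool(message_id)
    if article is not None:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            f.write(article)
        os.replace(tmp, path)  # publish the file in one step
    return article
```

The first reader to ask for an article pays for the spool fetch, and later
requests for the same Message-ID are served from local disk.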

It would probably be reasonably easy to pre-load the cache (with
a bit of extra code), but I have never seen the need to
pre-load caches. The whole point of a cache is that the first
person to request an article loads it into the cache and later
requests find it there. The spool retrieval time is pretty negligible on a
well built network anyway. I am aware of some installations that
just turn off caching altogether to reduce disk IO on the reader
frontends. They claim it works very well in that mode.

 -Russell


