Tradindexed cache entries
Russ Allbery
rra at stanford.edu
Sat Mar 13 00:48:55 UTC 2010
Julien ÉLIE <julien at trigofacile.com> writes:
>> Running lsof on innd to see what files it has open when this happens would
>> be interesting. It would at least confirm if it's overview entries that
>> are being leaked.
> # lsof -c innd | more
[...]
Yes, indeed, you're leaking open overview files out of the cache somehow.
They're falling out of the hash table, so they're never found and closed,
and eventually you run out of open file descriptors.
> Hmm... I see that I have several times the same file...
> innd 20401 news 163u REG 3,2 30239064 9505754 /home/news/spool/overview/f/s/p/fr.soc.politique.IDX
> innd 20401 news 164u REG 3,2 282495821 9505755 /home/news/spool/overview/f/s/p/fr.soc.politique.DAT
Because it's fallen out of the cache and can't be found, when innd goes to
open the overview for that group again, it opens another copy.
> Is there something else to check?
I think there's a bug either in the tradindexed cache code or in the hash
table implementation, or possibly something elsewhere in the server is
corrupting memory and happening to corrupt the hash table.
> That's strange.
> Maybe a bug I introduced with:
> http://inn.eyrie.org/trac/changeset/8947/trunk/storage/tradindexed/tradindexed.c
> though I do not see why.
What data_cache_reopen does is this:
    tdx_cache_delete(global->cache, entry->hash);
    data = tdx_data_open(global->index, group, entry);
    if (data == NULL)
        return NULL;
    tdx_cache_insert(global->cache, entry->hash, data);
When the open group data is deleted from the hash, its files are closed if
the reference count is zero. The cache holds one reference, which it
decrements before checking the count. Any open search holds an additional
reference.
The behavior here is consistent with a reference leak somewhere. We're
deleting the record out of the cache so that we can reopen the data files,
but that only closes the old data files if the reference count is zero.
For some reason, it isn't zero.
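To make that failure mode concrete, here is a minimal, self-contained sketch of the delete/open/insert sequence with reference counting. It is not the INN code: `group_data`, `open_files`, `cache_slot`, and all of the helper functions are hypothetical names invented for illustration, the cache is reduced to a single slot, and opening a group is simulated by bumping a counter by two (one for the index file, one for the data file).

```c
#include <assert.h>
#include <stdlib.h>

/* Simulated number of open descriptors; stands in for real .IDX/.DAT files. */
static int open_files = 0;

/* Hypothetical model of one group's open overview data (not INN's struct). */
struct group_data {
    int refcount; /* one for the cache, plus one per open search */
};

/* A one-slot "cache", enough to show the effect for a single group. */
static struct group_data *cache_slot = NULL;

/* Open the (simulated) index and data files for a group. */
static struct group_data *
data_open(void)
{
    struct group_data *data = malloc(sizeof(*data));

    if (data == NULL)
        abort();
    data->refcount = 0;
    open_files += 2;
    return data;
}

/* Drop one reference; close the files only if no references remain. */
static void
data_release(struct group_data *data)
{
    assert(data->refcount > 0);
    data->refcount--;
    if (data->refcount == 0) {
        open_files -= 2;
        free(data);
    }
}

/* The cache takes its own reference when an entry is inserted. */
static void
cache_insert(struct group_data *data)
{
    data->refcount++;
    cache_slot = data;
}

/* Remove the entry from the cache and drop the cache's reference. */
static void
cache_delete(void)
{
    struct group_data *data = cache_slot;

    cache_slot = NULL;
    if (data != NULL)
        data_release(data);
}

/* Same shape as the reopen sequence above: delete, open, insert. */
static struct group_data *
cache_reopen(void)
{
    struct group_data *data;

    cache_delete();
    data = data_open();
    cache_insert(data);
    return data;
}

/* A search takes an extra reference; it must later call data_release(). */
static struct group_data *
search_open(void)
{
    assert(cache_slot != NULL);
    cache_slot->refcount++;
    return cache_slot;
}
```

In this model, calling `cache_reopen()` with balanced references leaves `open_files` at 2. If `search_open()` is called and the matching `data_release()` never happens, the next `cache_reopen()` leaves `open_files` at 4: the old pair of files is still open but no longer reachable through the cache, which would look like the duplicate entries in the lsof output.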
I suspect that if you did something like this:
--- tradindexed.c       (revision 8989)
+++ tradindexed.c       (working copy)
@@ -230,6 +230,9 @@
     if (data == NULL)
         return false;
     if (artnum > data->high) {
+        if (data->refcount > 1)
+            warn("tradindexed: reopening overview for %s with refcount %d",
+                 group, data->refcount);
         data = data_cache_reopen(tradindexed, group, entry);
         if (data == NULL)
             return false;
you'd find that it produced warnings prior to this sort of incident
happening.
The question is, where's the reference count leak coming from?
--
Russ Allbery (rra at stanford.edu) <http://www.eyrie.org/~eagle/>
Please send questions to the list rather than mailing me directly.
<http://www.eyrie.org/~eagle/faqs/questions.html> explains why.