Tradindexed cache entries

Russ Allbery rra at stanford.edu
Sat Mar 13 00:48:55 UTC 2010


Julien ÉLIE <julien at trigofacile.com> writes:

>> Running lsof on innd to see what files it has open when this happens would
>> be interesting.  It would at least confirm if it's overview entries that
>> are being leaked.

> # lsof -c innd | more

[...]

Yes, indeed, you're leaking open overview files out of the cache somehow.
They're falling out of the hash table and hence never being found and
closed until you run out of open file descriptors.

> Hmm...  I see that I have several times the same file...

> innd    20401 news  163u   REG        3,2   30239064  9505754 /home/news/spool/overview/f/s/p/fr.soc.politique.IDX
> innd    20401 news  164u   REG        3,2  282495821  9505755 /home/news/spool/overview/f/s/p/fr.soc.politique.DAT

Because it's fallen out of the cache and can't be found, when innd goes to
open the overview for that group again, it opens another copy.

> Is there something else to check?

I think there's a bug either in the tradindexed cache code or in the hash
table implementation, or possibly memory corruption elsewhere in the
server that happens to hit the hash table.

> That's strange.
> Maybe a bug I introduced with:
>    http://inn.eyrie.org/trac/changeset/8947/trunk/storage/tradindexed/tradindexed.c
> though I do not see why.

What data_cache_reopen does is this:

    tdx_cache_delete(global->cache, entry->hash);
    data = tdx_data_open(global->index, group, entry);
    if (data == NULL)
        return NULL;
    tdx_cache_insert(global->cache, entry->hash, data);

When the open group data is deleted from the hash table, its files are
closed only if the reference count drops to zero.  The cache holds one
reference, which it releases before checking the count.  Any open search
holds an additional reference.
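
To make that rule concrete, here is a minimal sketch of the reference
counting described above.  The struct and function names are hypothetical
stand-ins, not the actual tradindexed code, and a boolean stands in for the
open .IDX and .DAT descriptors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for open group data; not the real tradindexed
   struct. */
struct entry {
    int refcount;    /* one for the cache, plus one per open search */
    bool files_open; /* stands in for the open .IDX and .DAT files */
};

/* Drop one reference and close the files only when no holder remains. */
static void
entry_release(struct entry *e)
{
    assert(e->refcount > 0);
    e->refcount--;
    if (e->refcount == 0)
        e->files_open = false;
}

/* Deleting from the cache releases only the reference the cache holds. */
static void
cache_delete(struct entry *e)
{
    entry_release(e);
}
```

With a refcount of 1 (cache only), cache_delete closes the files.  With a
refcount of 2 (cache plus one search), the files stay open until the search
releases its reference as well.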

The behavior here is consistent with a reference leak somewhere.  We're
deleting the record out of the cache so that we can reopen the data files,
but that only closes the old data files if the reference count is zero.
For some reason, it isn't zero.
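
As a rough illustration of why a leaked reference would produce the
duplicate entries in the lsof output, the following sketch models repeated
delete-and-reopen cycles.  It is a simplified model under the assumption
that each copy of the data carries the same number of never-released
references; the function name and parameters are hypothetical.

```c
#include <assert.h>

/* Count how many copies of a group's files remain open after a number of
   delete-and-reopen cycles, assuming every copy carries leaked_refs
   references that are never released.  Hypothetical model only. */
static int
copies_after_reopens(int leaked_refs, int reopens)
{
    int open_copies = 0;
    int refcount;
    int i;

    assert(leaked_refs >= 0 && reopens >= 0);

    /* Initial open: the cache holds one reference, plus any leaked ones. */
    refcount = 1 + leaked_refs;
    open_copies++;

    for (i = 0; i < reopens; i++) {
        /* Cache delete: release the cache's reference, close only at 0. */
        refcount--;
        if (refcount == 0)
            open_copies--;

        /* Reopen and reinsert a fresh copy of the data. */
        refcount = 1 + leaked_refs;
        open_copies++;
    }
    return open_copies;
}
```

In this model, copies_after_reopens(0, 5) leaves a single open copy, while
copies_after_reopens(1, 5) leaves six, one extra per reopen, which is the
pattern of the same file appearing several times.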

I suspect that if you did something like this:

--- tradindexed.c       (revision 8989)
+++ tradindexed.c       (working copy)
@@ -230,6 +230,9 @@
     if (data == NULL)
         return false;
     if (artnum > data->high) {
+        if (data->refcount > 1)
+            warn("tradindexed: reopening overview for %s with refcount %d",
+                 group, data->refcount);
         data = data_cache_reopen(tradindexed, group, entry);
         if (data == NULL)
             return false;

you'd find that it produced warnings prior to this sort of incident
happening.

The question is, where's the reference count leak coming from?

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>

    Please send questions to the list rather than mailing me directly.
     <http://www.eyrie.org/~eagle/faqs/questions.html> explains why.


