lock between nnrpd and expireover

Russ Allbery rra at stanford.edu
Mon Sep 1 17:47:50 UTC 2003


mark heather <mark.heather at btinternet.com> writes:

> I have recently migrated from an older version of inn to inn2.4 on
> separate servers (Solaris 2.6). Using cnfs for storage and tradindexed
> for overview method.

> All works fine with no reported feed or client problems.

> However, the following error fills up my news.daily report each night :

> expireover: tradindexed: cannot lock group entry at 65544: Resource temporarily unavailable
> expireover: tradindexed: cannot unlock group entry at 65544: Resource temporarily unavailable
> expireover: tradindexed: cannot lock group entry at 65616: Resource temporarily unavailable
> expireover: tradindexed: cannot unlock group entry at 65616: Resource temporarily unavailable
> expireover: tradindexed: cannot lock group entry at 65688: Resource temporarily unavailable
> and so on for several megabytes.

> Truss'ing the expireover process, the process is trying to get a lock on
> the file ./news/spool/overview/groups.index but all the nnrpd processes
> also have a lock on this file. This causes the lock acquisition to fail
> and the error message to be generated.

Each nnrpd process only locks the region of the file corresponding to the
current group, and expireover locks the region of the file corresponding
to the group that it's currently expiring.  So they shouldn't have that
sort of contention.  Plus, even if the locks do contend, expireover should
block waiting for the lock, not have fcntl return EAGAIN (which appears to
be what's happening).

I must admit I've not tested this specifically on Solaris 2.6, but I
didn't think Solaris 2.6 had any problems with its fcntl implementation.
lib/lockfile.c has the relevant code, and block is set to true by the
tradindexed routines, so it should be passing F_SETLKW to fcntl.
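
For reference, here is a minimal sketch of a blocking byte-range lock with
fcntl.  It is not the code from lib/lockfile.c; the names lock_byte_range
and enum range_lock are made up for illustration.  With block set to true
it uses F_SETLKW, which should sleep until the lock is granted rather than
fail with EAGAIN.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for illustration; these are not INN's identifiers. */
enum range_lock { RANGE_READ, RANGE_WRITE, RANGE_UNLOCK };

/*
 * Lock or unlock len bytes of fd starting at offset.  Returns true on
 * success; on failure errno is left as set by fcntl.
 */
static bool
lock_byte_range(int fd, enum range_lock type, bool block, off_t offset,
                off_t len)
{
    struct flock fl;

    assert(len > 0);            /* l_len == 0 would mean "to end of file" */
    memset(&fl, 0, sizeof(fl));
    switch (type) {
    case RANGE_READ:   fl.l_type = F_RDLCK; break;
    case RANGE_WRITE:  fl.l_type = F_WRLCK; break;
    case RANGE_UNLOCK: fl.l_type = F_UNLCK; break;
    }
    fl.l_whence = SEEK_SET;     /* offset is relative to start of file */
    fl.l_start = offset;
    fl.l_len = len;

    /*
     * F_SETLKW waits for a conflicting lock to be released.  F_SETLK
     * returns immediately, failing with EAGAIN or EACCES on conflict.
     */
    return fcntl(fd, block ? F_SETLKW : F_SETLK, &fl) == 0;
}
```

A caller would lock one group entry with something like
lock_byte_range(fd, RANGE_WRITE, true, 65544, 72) and release it with
RANGE_UNLOCK on the same range.  The length of 72 is only the spacing
between the offsets in the log above, used here as an example.  Even in
blocking mode the call can still fail with errors such as EINTR or
EDEADLK, but a plain conflict should not show up as EAGAIN.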

> Perhaps I need to throttle INN first and then expireover. It could be
> that INN is holding an exclusive lock whilst the nnrpd's are only holding
> minor locks.

This is the case, but that shouldn't require throttling.  And in any case,
that shouldn't result in the above messages.  Even if nnrpd was holding a
lock and refused to ever give it up, that should just cause expireover to
wait forever, not to get error messages like the above.

(Particularly given that you get the error even on unlock.)

expireover is proceeding without a lock, so expiration is still working
normally.  (It does this under the assumption that even if something goes
horribly wrong with the locking process, nnrpd is read-only so the worst
thing that's going to happen is that one client will get a confused answer
as to the overview information for a newsgroup until it disconnects and
reconnects.)

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>

    Please send questions to the list rather than mailing me directly.
     <http://www.eyrie.org/~eagle/faqs/questions.html> explains why.

