problem with db/active file and server power off

bill davidsen davidsen at tmr.com
Mon May 7 15:45:28 UTC 2001


In article <1077221425.20010507164855 at priv6.onet.pl>,
Hawk  <mr_free at priv.onet.pl> wrote:

| I have a little problem. Today I have power off'ed my little LAN
| server by accident. After turning it back on I found that file
| db/active has wrong number of messages comparing to spool contents.
| Due to that no one can post new messages, because server rejects them
| with "error: /news/spool/articles/my.newsgroup/postnumber - file
| exists". Also history doesn't have info on all posts in spool.
| But... it was easy thing to fix. I have manually entered correct number
| of messages in db/active, deleted spool/overview/* spool/tradspool.map
| and db/history*. After rebuilding history from scratch everything is
| ok now. It looks like innd is not updating db/active and db/history
| right after receiving messages, but it keeps some latest stuff in memory
| (and this stuff is lost after sudden power off).

  I had an operator power cycle my numbering server the other day, and
as a result it has lower article numbers than my xref slaves. This
resulted in the numbering server sending misnumbered articles to all the
servers, hosing their overview, the reader's history, etc. I was looking
for the Execute Operator Immediate instruction, for sure.

  Out of this I generated a script to send a copy of the active to the
backup master very frequently, and one to identify the highest article
number on any slave server and be sure the master active is at least
that high.
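
  The second check could be sketched roughly as follows. This is only an
illustration in Python, not the actual script mentioned above; it assumes
the usual active layout of "group himark lomark flag" per line, and the
function name raise_high_marks is made up for the example. How the slave
actives are fetched, and how the result is safely installed on a running
server, are left out.

```python
def raise_high_marks(master_active, slave_actives):
    """Return the master active text with each group's high mark raised
    to the largest high mark seen for that group on any slave.

    master_active: contents of the master's active file (str)
    slave_actives: list of active file contents, one per slave (str)
    """
    # Collect the highest article number per group across all slaves.
    highest = {}
    for text in slave_actives:
        for line in text.splitlines():
            fields = line.split()
            if len(fields) != 4:
                continue  # skip blank or malformed lines
            group, high = fields[0], int(fields[1])
            highest[group] = max(highest.get(group, 0), high)

    # Rewrite master lines, never lowering a high mark.
    out = []
    for line in master_active.splitlines():
        fields = line.split()
        if len(fields) != 4:
            out.append(line)  # pass through anything unexpected
            continue
        group, high, low, flag = fields
        new_high = max(int(high), highest.get(group, 0))
        # Keep the zero-padded field width used by the original line.
        out.append(f"{group} {new_high:0{len(high)}d} {low} {flag}")
    return "\n".join(out) + "\n"
```

  Only high marks are touched, and only upward, so a master that is
already ahead of every slave comes back unchanged.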

| Finally my question: what and where may I change to force innd
| to update db/active and db/history right after receiving messages and
| not keeping some stuff in memory? Is it possible? Because rebuilding
| history from scratch may be annoying if power is (hypothetically) turned
| off a few times a day.

  What you can change is the msync() call to make sure the changes get
back to disk. However, the performance penalty of that can be
unpleasant. I would really like to see this as a parameter in inn.conf.

  I would rebuild and tell the users and management that it is caused by
bad power, and that a small UPS would remove the problem. Hopefully
that's true; if you have a system in a "real computer room" it may not
be so easy, and msync() is needed. My deep sympathy in that case: I've
had one operator-induced drop and one real outage, caused by a power
failure to the building plus a physical cable fault in the UPS circuit
which carries the building for 30 sec until the generators get up to
speed.

  I'm reasonably sure the o/s type is involved in this: some systems are
reported to do fine in this situation, while others don't seem to flush
to disk unless the file is munmap()ed.
-- 
bill davidsen <davidsen at tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

