INN 2.5.4 strange crash

Julien ÉLIE julien at trigofacile.com
Wed Oct 29 14:47:18 UTC 2014


Hi Petr,

> I have INN 2.5.4 on old FreeBSD box.
> After upgrade from 2.4.3, innd sometimes began to segfault.

How often does innd segfault?



> Last core creation time is Oct 29 03:20:19 2014, and looks like this
> is scanlogs problem.
> 
> Command "ctlinnd -s pause "Flushing log and syslog files" from line
> 114 looks like work fine, and I see this message in news.notice. But I
> suspect that "ctlinnd flushlogs" from line 118 is crash innd.
> 
> Has anyone have such problem?

I do not see that problem on my news server.
Maybe it comes from the following patch:
     https://inn.eyrie.org/trac/changeset/9463/trunk/innd/cc.c

   Rotate innfeed logs
     Exploder and process channels are now properly reopened when
     "ctlinnd flushlogs" is used, which is in particular the command
     invoked by scanlogs to rotate log files.

--- a/trunk/innd/cc.c
+++ b/trunk/innd/cc.c
@@ -630,9 +630,12 @@

  /*
-**  Flush the log files.
+**  Flush the log files as well as exploder and process channels.
  */
  static const char *
  CCflushlogs(char *unused[])
  {
+    SITE        *sp;
+    CHANNEL     *cp;
+    int         i;
      unused = unused;            /* ARGSUSED */

@@ -644,4 +647,13 @@
      ReopenLog(Log);
      ReopenLog(Errlog);
+    /* Flush exploder and process channels so that they take into account
+     * the new log files (for instance during log rotation). */
+    for (sp = Sites, i = nSites; --i >= 0; sp++) {
+        if (((cp = sp->Channel) != NULL)
+             && ((cp->Type == CTexploder) || (cp->Type == CTprocess))) {
+            SITEflush(sp, true);
+            syslog(L_NOTICE, "%s flush", sp->Name);
+        }
+    }
      return NULL;
  }



Could it be that we are flushing a channel that should not be flushed at
that very moment?  A race condition somewhere?

Or should the syslog() line be put before SITEflush() because sp->Name
is no longer valid afterwards?  The gdb trace should have shown it, had
that been the case.
I see "0x0 in ?? ()" in your gdb trace, so there seems to be a NULL
pointer somewhere.
Is your INN built with optimizations?  They sometimes prevent gdb from
giving useful information; in that case, you should try to rebuild INN
with, for instance, "-g -O0 -fno-inline".

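To make the syslog() ordering question concrete, here is a minimal sketch
in C.  It does not use INN's real types: SITE is a simplified stand-in,
and flush_site() and flush_with_log() are hypothetical helpers.  The
sketch simply assumes, for the sake of illustration, that the flush could
invalidate the name; nothing in the trace confirms that SITEflush()
actually does so.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for innd's SITE structure. */
typedef struct {
    const char *Name;
    int         flushed;
} SITE;

/* Hypothetical stand-in for SITEflush().  For illustration only, it
 * simulates the name becoming invalid once the site has been flushed. */
static void
flush_site(SITE *sp)
{
    sp->flushed = 1;
    sp->Name = NULL;
}

/* Build the log message first, then flush, so that no field of the site
 * is read after the flush.  Returns the length of the message. */
static int
flush_with_log(SITE *sp, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "%s flush", sp->Name);

    flush_site(sp);
    return n;
}
```

In CCflushlogs() the equivalent change would be to move the syslog() call
above SITEflush(); whether that is needed at all is exactly what a
complete backtrace would tell.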

In your message, the backtrace also hints at line 1284 of chan.c:
     (*cp->Waker)(cp);
but I do not see why an error would happen here.
*cp->Waker is either SITEspoolwake or CHANwakeup but I do not see
what could be NULL there.
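
A frame at address 0x0 is typically what a call through a NULL function
pointer looks like.  As a debugging aid, a guard of the following kind
could be put around such a call to see which channel is involved.  This
is only a sketch: CHANNEL here is a simplified stand-in for innd's
structure, and example_waker() and wake_channel_checked() are
hypothetical names, not functions from INN.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for innd's CHANNEL structure; the
 * real one has many more fields. */
typedef struct channel CHANNEL;
struct channel {
    const char *name;
    void      (*Waker)(CHANNEL *cp);   /* callback used to wake the channel */
    int         woken;
};

/* Example callback, for illustration only. */
static void
example_waker(CHANNEL *cp)
{
    cp->woken++;
}

/* Call the waker only when it is set.  Returns 1 if the callback was
 * invoked, 0 if the call was skipped and reported on stderr. */
static int
wake_channel_checked(CHANNEL *cp)
{
    if (cp == NULL || cp->Waker == NULL) {
        fprintf(stderr, "channel %s has no waker, skipping\n",
                (cp != NULL && cp->name != NULL) ? cp->name : "(unknown)");
        return 0;
    }
    (*cp->Waker)(cp);
    return 1;
}
```

Such a check would not fix the underlying problem, but the message it
logs would show whether the Waker pointer really is the NULL one.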


Note that there have been tons of changes since INN 2.4.3, so the issue
may lie somewhere else entirely than in the recent commit I had in mind.
A complete backtrace from gdb would help.

-- 
Julien ÉLIE

