INN 2.5.4 strange crash
Petr Novopashenniy
pety at rusnet.ru
Wed Oct 29 15:28:16 UTC 2014
On Wed, 29 Oct 2014, Julien ÉLIE wrote:
JÉ Hi Petr,
JÉ
JÉ > I have INN 2.5.4 on an old FreeBSD box.
JÉ > After upgrading from 2.4.3, innd has started to segfault from time to time.
JÉ
JÉ How often does innd segfault?
I upgraded INN on 9 September, and this is the result:
-rw------- 1 news news 187858944 Sep 14 03:20 innd.core1
-rw------- 1 news news 193404928 Sep 16 03:20 innd.core2
-rw------- 1 news news 189050880 Oct 27 02:20 innd.core3
-rw------- 1 news news 200908800 Oct 28 02:20 innd.core4
-rw------- 1 news news 196829184 Oct 29 03:20 innd.core5
JÉ
JÉ > The last core was created on Oct 29 03:20:19 2014, and it looks like
JÉ > this is a scanlogs problem.
JÉ >
JÉ > The command ctlinnd -s pause "Flushing log and syslog files" from line
JÉ > 114 of scanlogs seems to work fine, and I see this message in
JÉ > news.notice. But I suspect that "ctlinnd flushlogs" from line 118
JÉ > crashes innd.
JÉ >
JÉ > Has anyone had such a problem?
JÉ
JÉ I do not see that problem on my news server.
JÉ Maybe it comes from the following patch:
JÉ https://inn.eyrie.org/trac/changeset/9463/trunk/innd/cc.c
JÉ
JÉ Rotate innfeed logs
JÉ Exploder and process channels are now properly reopened when "ctlinnd
JÉ flushlogs" is used, which is in particular the command invoked by
JÉ scanlogs to rotate log files.
JÉ
JÉ --- a/trunk/innd/cc.c
JÉ +++ b/trunk/innd/cc.c
JÉ @@ -630,9 +630,12 @@
JÉ
JÉ  /*
JÉ -** Flush the log files.
JÉ +** Flush the log files as well as exploder and process channels.
JÉ  */
JÉ  static const char *
JÉ  CCflushlogs(char *unused[])
JÉ  {
JÉ +    SITE *sp;
JÉ +    CHANNEL *cp;
JÉ +    int i;
JÉ      unused = unused; /* ARGSUSED */
JÉ
JÉ @@ -644,4 +647,13 @@
JÉ      ReopenLog(Log);
JÉ      ReopenLog(Errlog);
JÉ +    /* Flush exploder and process channels so that they take into account
JÉ +     * the new log files (for instance during log rotation). */
JÉ +    for (sp = Sites, i = nSites; --i >= 0; sp++) {
JÉ +        if (((cp = sp->Channel) != NULL)
JÉ +            && ((cp->Type == CTexploder) || (cp->Type == CTprocess))) {
JÉ +            SITEflush(sp, true);
JÉ +            syslog(L_NOTICE, "%s flush", sp->Name);
JÉ +        }
JÉ +    }
JÉ      return NULL;
JÉ  }
JÉ
JÉ Could it be that we are flushing a channel that we should not flush at
JÉ that very moment?
JÉ A race condition somewhere?
Looks like it.
But my other servers with 2.5.4 (and other OS versions) have never had
such an error.
JÉ
JÉ Or that the syslog() line should be put before SITEflush() because sp->Name
JÉ is no longer valid? The gdb trace should have shown it, had it been the
JÉ case.
JÉ I see "0x0 in ?? ()" in your gdb trace, so it seems there is a NULL
JÉ pointer somewhere.
JÉ Is your INN built with optimizations? They sometimes prevent gdb from
JÉ giving useful information; in that case, you should try to rebuild INN
JÉ with, for instance, "-g -O0 -fno-inline".
JÉ
JÉ In your message, the backtrace also hints at line 1284 of chan.c:
JÉ     (*cp->Waker)(cp);
JÉ but I do not see why an error would happen here.
JÉ *cp->Waker is either SITEspoolwake or CHANwakeup, but I do not see
JÉ what could be NULL there.
JÉ
JÉ Note that there have been tons of changes since INN 2.4.3, so the issue
JÉ could be somewhere else entirely, not in the recent commit I had in mind.
JÉ A complete backtrace from gdb would be of help.
I rebuilt INN with "-g -O0 -fno-inline" and am now waiting for new segfaults.
Thanks, Julien!
--pety