INN 2.5.4 strange crash
Julien ÉLIE
julien at trigofacile.com
Wed Feb 4 20:07:06 UTC 2015
Hi Petr,
> Feb 4 08:34:33 ns innd: innfeed-neva1!:26:proc:15113 cant write: Resource
> temporarily unavailable
> Feb 4 08:34:33 ns innd: innfeed-neva1!:26:proc:15113 blocked sleeping 5
> [dd]
> Feb 4 08:34:34 ns innd: innfeed-neva1!:26:proc:15113 cant write: Resource
> temporarily unavailable
> Feb 4 08:34:34 ns innd: innfeed-neva1!:26:proc:15113 blocked sleeping 5
Strange that the log is written twice (the second one should have been
written at least 5 seconds later).
> Feb 4 08:34:39 ns innd: innfeed-neva1!:26:proc:15113 wakeup
> Feb 4 08:34:39 ns last message repeated 33 times
> [dd]
> Feb 4 08:34:39 ns innd: innfeed-neva1!:26:proc:15113 wakeup
> Feb 4 08:34:39 ns last message repeated 85 times
A lot of wake up at the same second...
> Feb 4 10:33:08 ns innd: controlchan!:26:proc:26571 26 sleeping without
> Waker
> Feb 4 10:33:08 ns innd: controlchan!:26:proc:26571 closed
> Feb 4 10:33:08 ns innd: controlchan! exit 0 elapsed 0 pid 26571
> Feb 4 10:33:08 ns innd: controlchan!:99:file closed
> Feb 4 10:33:09 ns innd: controlchan! spawned controlchan!:99:proc:26578
> Feb 4 10:33:09 ns innd: controlchan! restarted
That's the behaviour of the patch: it closes the channel if found in an
unexpected state.
Seems fine here.
> Feb 4 10:33:09 ns innd: goblin1.stu.neva.ru:26 26 sleeping without Waker
> Feb 4 10:33:09 ns innd: goblin1.stu.neva.ru:26 checkpoint seconds 0
> accepted 0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size
> 0 rejected size 0
> Feb 4 10:33:09 ns innd: goblin1.stu.neva.ru:26 closed seconds 0 accepted
> 0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size 0
> rejected size 0
> Feb 4 10:33:09 ns innd: goblin3.stu.neva.ru:26 26 sleeping without Waker
> Feb 4 10:33:09 ns innd: goblin3.stu.neva.ru:26 checkpoint seconds 0
> accepted 0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size
> 0 rejected size 0
> Feb 4 10:33:09 ns innd: goblin3.stu.neva.ru:26 closed seconds 0 accepted
> 0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size 0
> rejected size 0
>
> [and same for several peers]
>
> Feb 4 10:33:10 ns innd: free:-1 26 free but was in SMASK
>
> And no errors after that.
> Looks like problem with my system, I see same fd 26.
It should normally be fine as long as goblin1.stu.neva.ru:26 is closed
before goblin3.stu.neva.ru:26, and so on.
What I do not explain is why the sleeping state of these channels is not
removed by the call of SCHANremove(cp); it sounds as though
CHANsleeping(cp) were false when entering into SCHANremove.
Maybe you could change
void
SCHANremove(CHANNEL *cp)
{
if (!CHANsleeping(cp))
return;
to
void
SCHANremove(CHANNEL *cp)
{
if (!CHANsleeping(cp)) {
notice("%s was not sleeping", CHANname(cp));
return;
}
so as to verify that SCHANremove does the right thing when called after
a "sleeping without Waker" warning?
At least fd 26 is freed at 10:33:10, which solves the issue you are
facing. But I also do not explain why it had not been freed sooner...
--
Julien ÉLIE
« Ce qui m'intéresse chez une femme, c'est ce que je n'ose pas lui
demander. » (Wolinski)
More information about the inn-workers
mailing list