INN 2.5.4 strange crash

Julien ÉLIE julien at trigofacile.com
Wed Feb 4 20:07:06 UTC 2015


Hi Petr,

> Feb  4 08:34:33 ns innd: innfeed-neva1!:26:proc:15113 cant write: Resource
> temporarily unavailable
> Feb  4 08:34:33 ns innd: innfeed-neva1!:26:proc:15113 blocked sleeping 5
> [dd]
> Feb  4 08:34:34 ns innd: innfeed-neva1!:26:proc:15113 cant write: Resource
> temporarily unavailable
> Feb  4 08:34:34 ns innd: innfeed-neva1!:26:proc:15113 blocked sleeping 5

Strange that the log is written twice (the second one should have been 
written at least 5 seconds later).


> Feb  4 08:34:39 ns innd: innfeed-neva1!:26:proc:15113 wakeup
> Feb  4 08:34:39 ns last message repeated 33 times
> [dd]
> Feb  4 08:34:39 ns innd: innfeed-neva1!:26:proc:15113 wakeup
> Feb  4 08:34:39 ns last message repeated 85 times

A lot of wake up at the same second...


> Feb  4 10:33:08 ns innd: controlchan!:26:proc:26571 26 sleeping without
> Waker
> Feb  4 10:33:08 ns innd: controlchan!:26:proc:26571 closed
> Feb  4 10:33:08 ns innd: controlchan! exit 0 elapsed 0 pid 26571
> Feb  4 10:33:08 ns innd: controlchan!:99:file closed
> Feb  4 10:33:09 ns innd: controlchan! spawned controlchan!:99:proc:26578
> Feb  4 10:33:09 ns innd: controlchan! restarted

That's the behaviour of the patch:  it closes the channel if found in an 
unexpected state.
Seems fine here.


> Feb  4 10:33:09 ns innd: goblin1.stu.neva.ru:26 26 sleeping without Waker
> Feb  4 10:33:09 ns innd: goblin1.stu.neva.ru:26 checkpoint seconds 0
> accepted 0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size
> 0 rejected size 0
> Feb  4 10:33:09 ns innd: goblin1.stu.neva.ru:26 closed seconds 0 accepted
> 0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size 0
> rejected size 0
> Feb  4 10:33:09 ns innd: goblin3.stu.neva.ru:26 26 sleeping without Waker
> Feb  4 10:33:09 ns innd: goblin3.stu.neva.ru:26 checkpoint seconds 0
> accepted 0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size
> 0 rejected size 0
> Feb  4 10:33:09 ns innd: goblin3.stu.neva.ru:26 closed seconds 0 accepted
> 0 refused 0 rejected 0 duplicate 0 accepted size 0 duplicate size 0
> rejected size 0
>
> [and same for several peers]
>
> Feb  4 10:33:10 ns innd: free:-1 26 free but was in SMASK
>
> And no errors after that.
> Looks like problem with my system, I see same fd 26.

It should normally be fine as long as goblin1.stu.neva.ru:26 is closed 
before goblin3.stu.neva.ru:26, and so on.

What I do not explain is why the sleeping state of these channels is not 
removed by the call of SCHANremove(cp); it sounds as though 
CHANsleeping(cp) were false when entering into SCHANremove.

Maybe you could change

void
SCHANremove(CHANNEL *cp)
{
     if (!CHANsleeping(cp))
         return;

to

void
SCHANremove(CHANNEL *cp)
{
     if (!CHANsleeping(cp)) {
         notice("%s was not sleeping", CHANname(cp));
         return;
     }

so as to verify that SCHANremove does the right thing when called after 
a "sleeping without Waker" warning?

At least fd 26 is freed at 10:33:10, which solves the issue you are 
facing.  But I also do not explain why it had not been freed sooner...

-- 
Julien ÉLIE

« Ce qui m'intéresse chez une femme, c'est ce que je n'ose pas lui
   demander. » (Wolinski)


More information about the inn-workers mailing list