similar named 8.3.3-REL crashes on two different machines....

Greg A. Woods woods at weird.com
Fri Aug 16 05:08:05 UTC 2002


This is looking quite suspicious.  First, named (from BIND-8.3.3-REL) on
one of the cache servers I run complained about a strange error that I
don't believe I've ever seen from a production named:


Aug 15 08:40:54 corporate named[4070]: ns_main.c:831: INSIST(evRead(lev, rfd, &iov, 1, stream_getlen, sp, &sp->evID_r) != -1): Invalid argument failed.
Aug 15 08:40:54 corporate named[4070]: ns_main.c:831: INSIST(evRead(lev, rfd, &iov, 1, stream_getlen, sp, &sp->evID_r) != -1): Invalid argument failed.
Aug 15 08:40:56 corporate /netbsd: named: pid 4070 [eid 32769:40, rid 32769:40] sent signal 6: was set-id, core dump not permitted [in /etc/namedb]


Later, the exact same failure happened on the second cache server:

Aug 15 21:57:21 lucky named[152]: ns_main.c:831: INSIST(evRead(lev, rfd, &iov, 1, stream_getlen, sp, &sp->evID_r) != -1): Invalid argument failed.
Aug 15 21:57:21 lucky named[152]: ns_main.c:831: INSIST(evRead(lev, rfd, &iov, 1, stream_getlen, sp, &sp->evID_r) != -1): Invalid argument failed.
Aug 15 21:57:21 lucky /netbsd: named: pid 152 [eid 32769:40, rid 32769:40] sent signal 6: was set-id, core dump not permitted [in fs /, cwd inode 51912]


These are both i386 machines, one running NetBSD 1.3.2 and the other
running NetBSD 1.5W (-current as of 2001/06/24).  Both have ECC RAM.

Also curious is the double appearance of the message from syslog....

What really worries me is the comment above the line generating this
error:

	/* XXX FIXME: This should probably not cause a crash! */
	INSIST_ERR(evRead(lev, rfd, &iov, 1, stream_getlen, sp, &sp->evID_r)
	           != -1);

This is, of course, in stream_accept().  It seems a well-timed TCP connect
and reset can totally D.o.S. named, at least on NetBSD (there are some
bugaboos in the NetBSD TCP stack that might be allowing this to happen,
though I thought they'd been fixed before 1.5W).
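A minimal sketch of the kind of connect-and-reset client that might reproduce this follows. It is untested against named, and `connect_and_reset()` is a hypothetical helper, not anything from BIND. It opens a TCP connection and then closes it with SO_LINGER enabled and a zero linger time, which makes close() abort the connection with an RST instead of a normal FIN. Whether it ever trips the INSIST presumably depends on the reset landing around the time named accept()s the connection, so it would likely have to be run repeatedly against the server's TCP port 53.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical test client: connect to addr, then abort the connection.
 * SO_LINGER with l_onoff = 1 and l_linger = 0 makes close() drop the
 * connection and send an RST rather than a FIN.
 * Returns 0 if the connect and the abortive close both succeeded, else -1.
 */
static int
connect_and_reset(const struct sockaddr_in *addr)
{
	struct linger lg;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd == -1)
		return (-1);
	if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) == -1) {
		(void)close(fd);
		return (-1);
	}
	memset(&lg, 0, sizeof(lg));
	lg.l_onoff = 1;		/* linger on close... */
	lg.l_linger = 0;	/* ...for zero seconds: abortive close */
	if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) == -1) {
		(void)close(fd);
		return (-1);
	}
	return (close(fd));	/* sends the RST */
}
```

Calling it in a tight loop against a test server's address (never a production one) would be one way to probe for the race.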

What's it going to take to do as the comment says?  Is it a simple
matter of propagating the error back up, or are there other things that
need cleaning up to do it safely?
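A minimal sketch of what "propagating the error back up" might look like in shape, assuming the caller can simply drop the connection and return to its event loop. `prepare_stream()` and `handle_new_stream()` are hypothetical names. `prepare_stream()` only sets O_NONBLOCK with fcntl() as a stand-in for the evRead() registration; the real stream_accept() would also have to release the partly built stream state (`sp`), which is not shown here.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical per-connection setup step standing in for evRead():
 * switch the descriptor to non-blocking mode.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
prepare_stream(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags == -1)
		return (-1);
	return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0);
}

/*
 * Hypothetical replacement for the INSIST_ERR(): on failure, log the
 * error, close the just-accepted descriptor and return -1 so the caller
 * can carry on, instead of abort()ing the whole server.
 */
static int
handle_new_stream(int fd)
{
	if (prepare_stream(fd) == -1) {
		int saved_errno = errno;

		fprintf(stderr, "stream setup on fd %d failed: %s; "
		    "dropping connection\n", fd, strerror(saved_errno));
		(void)close(fd);		/* don't leak the descriptor */
		errno = saved_errno;	/* close() may clobber errno */
		return (-1);
	}
	return (0);
}
```

With a descriptor that fcntl() rejects, this logs, closes, and returns -1 with the original errno intact; the server keeps running and only that one connection is lost.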

-- 
								Greg A. Woods

+1 416 218-0098;            <g.a.woods at ieee.org>;           <woods at robohack.ca>
Planix, Inc. <woods at planix.com>; VE3TCP; Secrets of the Weird <woods at weird.com>


More information about the bind-workers mailing list