Memory corruption after AXFER

Thu Feb 3 23:11:58 UTC 2000

I found the source of the problem. It turned out to be a subtle difference in
the way that readv() and writev() behave under OS/2. The BIND functions
readable() and writable() [contained in ev_streams.c] ASSUME that the iov
array is left intact, exactly as it was passed in to readv() and writev().
They then call consume() to advance the pointers contained in the iov array
to the next available byte, based on how many bytes were just
read/written.

The problem is that under OS/2, the iov array is ALREADY updated by the
readv() and writev() functions before they return to the caller,
duplicating what consume() was attempting to accomplish. So consume()
moved pointers that had already been moved, and the next read/write went
through the wrong addresses. That is what corrupted memory.
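
To make the double advance concrete, here is a rough sketch. This is NOT the
BIND code: advance_iov() is a simplified, hypothetical stand-in for consume(),
and offset_after() is a hypothetical helper that accounts for the same 10-byte
transfer one or more times and reports where the cursor ends up.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical stand-in for consume(): step the iovec cursor past
 * `bytes` bytes that were just transferred. Returns the number of
 * entries still pending; *cur is left at the first pending entry.
 */
static int
advance_iov(struct iovec **cur, int count, size_t bytes)
{
    while (bytes > 0 && count > 0) {
        if (bytes < (*cur)->iov_len) {
            /* Partial entry: move the base forward, shrink the length. */
            (*cur)->iov_base = (char *)(*cur)->iov_base + bytes;
            (*cur)->iov_len -= bytes;
            bytes = 0;
        } else {
            /* Entry fully used: drop it and continue with the next one. */
            bytes -= (*cur)->iov_len;
            (*cur)++;
            count--;
        }
    }
    return count;
}

/*
 * Account for one 10-byte transfer `times` times and return how far
 * the cursor has moved from the start of the buffer.
 */
static size_t
offset_after(int times)
{
    char buf[64];
    struct iovec iov[1] = { { buf, sizeof buf } };
    struct iovec *cur = iov;
    int count = 1;

    for (int i = 0; i < times; i++)
        count = advance_iov(&cur, count, 10);

    assert(count == 1);   /* the single entry must still be pending */
    return (size_t)((char *)cur->iov_base - buf);
}
```

offset_after(1) is the correct bookkeeping for a 10-byte transfer. With
offset_after(2) the cursor sits 10 bytes too far, which is the situation when
both the stack and consume() advance the array for the same transfer.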

I cannot find any clear documentation for what the TCP stack may/may not do
with the iov array during a readv() or writev() call. It appears to be up
to the specific implementation of the TCP stack whether or not the array is
changed by the call. So other operating systems may have a similar
problem.
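
For anyone who wants to check another platform, a small probe along these
lines might help. It is only a sketch: writev_mutates_iov() is a hypothetical
name, and it assumes writev() works on a pipe, which holds on typical POSIX
systems. On a platform where writev() only accepts sockets, the pipe would
have to be replaced with a connected socket pair.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical probe: returns 1 if writev() changed the caller's iovec,
 * 0 if it left it alone, and -1 if the probe itself could not run.
 */
static int
writev_mutates_iov(void)
{
    int fds[2];
    char payload[] = "probe";
    struct iovec iov = { payload, sizeof payload };
    struct iovec before = iov;   /* snapshot to compare against */
    ssize_t n;
    int mutated;

    if (pipe(fds) != 0)
        return -1;
    assert(fds[0] >= 0 && fds[1] >= 0);

    n = writev(fds[1], &iov, 1);
    mutated = (iov.iov_base != before.iov_base ||
               iov.iov_len != before.iov_len);

    close(fds[0]);
    close(fds[1]);
    return (n < 0) ? -1 : mutated;
}
```

A result of 0 means the consume() bookkeeping is safe as written on that
platform; a result of 1 means the platform behaves like the OS/2 stack
described above.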

This is what I did to fix it on my system:

   ev_streams.c (Feb 03 2000 13:47:00)
           x:ev_streams.c (Feb 03 2000 13:28:38)
===================
      39      43  |
      40      44  |static int  copyvec(evStream *str, const struct iovec *iov, int iocnt);
+             45  |#ifndef DONT_NEED_CONSUME
      41      46  |static void consume(evStream *str, size_t bytes);
+             47  |#endif
      42      48  |static void done(evContext opaqueCtx, evStream *str);
      43      49  |static void writable(evContext opaqueCtx, void *uap, int fd, int evmask);
===================
     215     221  |}
     216     222  |
+            223  |#ifndef DONT_NEED_CONSUME
     217     224  |/* Pull off or truncate lead iovec(s). */
     218     225  |static void
===================
     233     240  |    }
     234     241  |}
+            242  |#endif
     235     243  |
     236     244  |/* Add a stream to Done list and deselect the FD. */
===================
     262     270  |        if ((str->flags & EV_STR_TIMEROK) != 0)
     263     271  |            evTouchIdleTimer(opaqueCtx, str->timer);
+            272  |#ifndef DONT_NEED_CONSUME
     264     273  |        consume(str, bytes);
+            274  |#else
+            275  |        str->ioDone += bytes;
+            276  |#endif
     265     277  |    } else {
     266     278  |        if (bytes < 0 && errno != EINTR) {
===================
     283     295  |        if ((str->flags & EV_STR_TIMEROK) != 0)
     284     296  |            evTouchIdleTimer(opaqueCtx, str->timer);
+            297  |#ifndef DONT_NEED_CONSUME
     285     298  |        consume(str, bytes);
+            299  |#else
+            300  |        str->ioDone += bytes;
+            301  |#endif
     286     302  |    } else {
     287     303  |        if (bytes == 0)
===================
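
An alternative to the compile-time switch, which I have not tried in BIND, would
be to hand the call a scratch copy of the array so the caller's copy can never
be touched, whatever the stack does. The sketch below assumes a small fixed
limit; writev_preserving() and SCRATCH_MAX are hypothetical names, not part of
BIND. Note that POSIX declares the iov argument of readv() and writev() as
const struct iovec *, so on such systems the copy is merely redundant.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define SCRATCH_MAX 16   /* hypothetical limit chosen for this sketch */

/*
 * Hypothetical wrapper: call writev() on a private copy of the iovec
 * array, so the caller's array is unchanged even if the underlying
 * implementation updates the array it is given.
 */
static ssize_t
writev_preserving(int fd, const struct iovec *iov, int iovcnt)
{
    struct iovec scratch[SCRATCH_MAX];

    if (iovcnt < 0 || iovcnt > SCRATCH_MAX) {
        errno = EINVAL;
        return -1;
    }
    assert(iov != NULL || iovcnt == 0);
    if (iovcnt > 0)
        memcpy(scratch, iov, (size_t)iovcnt * sizeof scratch[0]);

    return writev(fd, scratch, iovcnt);
}
```

With a wrapper like this (and a matching one for readv()), consume() could
stay enabled on every platform, at the cost of one small copy per call.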

-----Original Message-----
From: Cody.Gibson at intermec.com [mailto:Cody.Gibson at intermec.com]
Sent: Wednesday, February 02, 2000 1:15 PM
To: bind-workers at isc.org
Subject: Memory corruption after AXFER

I am trying to track down what appears to be some sort of memory corruption
by AXFER in my OS/2 port of BIND 8.2. I would like to hear from anyone that
can reproduce this, or knows how to fix it. If I do the following:

>nslookup
>ls -d <any active primary domain name here... I'm using jon.intermec.com>
<results displayed here>
>/exit
>ndc reconfig

I get an access violation inside of __memget_record() where it's dealing
with the freelists[] array (line 292) because a "next" pointer is invalid.
The "ls -d" works fine by itself, even when repeated. Also "ndc
reconfig" works fine (even if repeated many times) if NOT preceded by an
AXFER. It's the combination of the two that is deadly.

It would be very useful to know if I am dealing with an OS/2 port specific
problem, or a problem that exists in the common code base. Thx for any help
you can provide.

Cody Gibson