[bind10-dev] Socket creator and low-level code

Wed Oct 20 14:38:26 UTC 2010

At Fri, 15 Oct 2010 16:07:48 +0200,
Michal 'vorner' Vaner <michal.vaner at nic.cz> wrote:

> I and jinmei had a disagreement about the correct approach to more secure socket
> creator (which is what we want, as it is the only component that will need to
> run as root, hopefully, therefore it is more important to it to be secure).
> 
> I wrote the code using only OS-level API and C-like code (it is C++, but doesn't
> use fancy wrappers around types, classes), without any libraries and so. I did
> this on purpose, because:
> • Classes help keeping big projects together, hide implementation details and
>   stuff. But they make small programs more complicated.
> • Each library, wrapper, etc, contains some code. More code means potentially
>   more bugs (there's saying that average number of bugs depends only on the
>   number of lines and it is linear). And it makes it harder to check all the
>   code (if there's a paranoid admin and wants to read it herself, the less code
>   is better).

[snip]

> I hope I didn't misinterpret anything you said, jinmey. If I did,
> please correct me.

The summary seems to be accurate, thanks for that.

I see valid points in your arguments, but I simply have different
opinion/preference.  I'm going to supplement my own points a bit:

- just like it's true that the number of bugs is proportional to the
  number of code lines, it's also true (at least in my understanding)
  that most of the time one fails to beat sufficiently matured and
  widely used library implementations.
- I admit a paranoid admin doesn't trust anything, but that argument
  doesn't stop at the system call level.  If we want to worry about
  such level of paranoid admin, we'd need to bypass all OS
  implementations and somehow write/read our data to/from the wire
  directly with our own code that can satisfy that admin.  This is
  of course too extreme and bogus, but then the reasonable point of
  compromise is a matter of opinion after all.
- I don't necessarily agree that code written only with low level
  primitives such as pointer arithmetics is simpler.  In fact, it's in
  many cases difficult to be sure about its correctness, especially
  for those who review/modify the code.
- we may be able to provide a proof of correctness for the current
  version of the code, and that may help.  But, even for this type of
  single purpose program there will have to be a time to extend it
  (for example, we may want to specify some specific socket option).
  If we keep the current style, we'll have to keep using the (IMO)
  dangerous primitives for every new extension, and we'll need to
  worry about its safety.  IMO, it's much better to minimize the place
  of using risky operations and use much safer interfaces throughout
  the rest of the code.

Some specific examples: the current version have code like this
(omitting the error handling for simplicity).

ssize_t
read_data(const int fd, void *buffer_v, const size_t length) {
    unsigned char *buffer(static_cast<unsigned char *>(buffer_v));
    size_t rest(length), already(0);
    while (rest) { // Stil something to read
        ssize_t amount(read(fd, buffer, rest));
        if (amount) {
            already += amount;
            rest -= amount;
            buffer += amount;
        } else { // EOF
            return already;
        }
    }
    return already;
}

Using the read(2) system call is probably okay (higher level
abstraction like ASIO is probably overkilling for this purpose), but
I'd like to restrict the use of raw pointer only for that operation.

We could do, for example, as follows (although I admit this may not be
sufficiently clean - this is an example to explain the concept):

ssize_t
read_data(const int fd, std::vector<uint8_t>& buffer, const size_t length) {
    uint8_t readbuf[length];
    size_t rest(length);

    while (rest > 0) {
        const ssize_t amount(read(fd, readbuf, rest));
        if (amount > 0) {
            buffer.insert(buffer.end(), readbuf, readbuf + amount);
            rest -= amount;
        } else {
            break;
        }
    }
    return (buffer.size());
}

Also, instead of doing this:
                        READ(static_cast<char *>(static_cast<void *>(
                            &addr_in.sin_port)), 2);

I'd introduce a wrapper to extract a uint16_t value from a read
buffer, and do something like this:

	       		vector<uint8_t> read_buf;
	       		readData(input_fd, read_buf, 2)
                        addr_in.sin_port = htons(getUint16(read_buf));

where getUint16() is

uint16_t
getUint16(const vector<uint8_t>& buffer) {
    return ((buffer.at(0) << 8) | buffer.at(1));
}

You may still disagree with me.  As I admitted, this is probably a
matter of opinion, and IMO either approach has its own pros and cons.
Hopefully we can reach a group consensus on this list.

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.