[bind10-dev] #1534, IPV6_USE_MIN_MTU and similar

Mon Feb 20 09:34:20 UTC 2012

Michal,

On Thursday, 2012-02-16 13:10:52 +0100, 
Michal 'vorner' Vaner <michal.vaner at nic.cz> wrote:
> Hello
> 
> I tried to implement it, it is in the trac1534 branch (all the
> relevant changes are in the last commit). However, I have a problem
> I'm not sure how to solve. I took the code from bind9 and modified
> it, writing the tests to verify it works. But when I run the test, it
> fails with this:
> 
> [ RUN      ] get_sock.udp6_create
> sockcreator_tests.cc:131: Failure
> Value of: getsockopt(socknum, IPPROTO_IPV6, 24, &options, &len)
>   // The 24 comes from the IPV6_MTU, expanded by preprocessor
>   Actual: -1
> Expected: 0
> Transport endpoint is not connected
> sockcreator_tests.cc:132: Failure
> Value of: options
>   Actual: 1
> Expected: 1280
> [  FAILED  ] get_sock.udp6_create (0 ms)
> 
> I'm wondering, are we supposed to connect the socket, if it is UDP?
> Or setting works but getting doesn't and the test should be omitted?
> Or, is something really wrong?

Argh! This stuff is hard. No wonder everyone uses high-level APIs!!

I had a quick look at the kernel source and see this for setting the MTU:

    case IPV6_MTU:
    {
        struct dst_entry *dst;

        val = 0;
        rcu_read_lock();
        dst = __sk_dst_get(sk);
        if (dst)
            val = dst_mtu(dst);
        rcu_read_unlock();
        if (!val)
            return -ENOTCONN;
        break;
    }

So it looks like MTU is set per-destination, which means that a
connected socket is required. :(
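
To make the failure concrete, here is a minimal sketch (Linux only; the
helper name queryIpv6Mtu is made up for illustration and is not in the
trac1534 branch) that wraps the same getsockopt() call as the test and
hands back the errno. On a freshly created, unconnected UDP socket I
would expect it to fail with ENOTCONN, matching the "Transport endpoint
is not connected" message in the test output:

```cpp
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper, not part of BIND 10: ask the kernel which MTU it
// currently associates with an IPv6 socket.
// Returns 0 and stores the value in *mtu on success, or a negative errno
// value (for example -ENOTCONN) on failure.
int queryIpv6Mtu(int fd, int* mtu) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_IPV6, IPV6_MTU, &value, &len) != 0) {
        return (-errno);
    }
    *mtu = value;
    return (0);
}
```

If the socket were connect()ed first, the kernel should have a
destination entry to read the MTU from, though I have not checked what
value actually comes back in that case.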

That means that - at least on Linux - we can't set MTU on a per-socket
basis in the socket creator. Apparently this kind of logic would need to
be put into a send function instead in Linux:

  1. connect to the address we are sending the packet to
  2. set IPV6_MTU
  3. send packet
  4. disconnect and go back to the original listen address

The problem with this is that we'd lose packets that arrive between
step 1 and 4. So we'd really need *two* sockets, I guess, one that we
receive on and one that we send on. And we'd have to use sendmsg() so
we can specify the interface to make sure we reply on the same one that
the packet came in on.
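
A rough sketch of what the sending half might look like, assuming we
pick the outgoing interface and source address with an IPV6_PKTINFO
ancillary message (the RFC 3542 mechanism; that choice is my assumption,
as are the name sendFromInterface and its parameters). It only shows the
sendmsg() plumbing on a separate send socket and does not touch the MTU
at all:

```cpp
#include <cerrno>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

// Hypothetical helper: send one datagram on send_fd to dest, asking the
// kernel to use the given source address and outgoing interface index.
// Returns what sendmsg() returns: bytes sent, or -1 with errno set.
ssize_t sendFromInterface(int send_fd, const void* data, size_t len,
                          const sockaddr_in6& dest,
                          const in6_addr& source, unsigned int ifindex) {
    iovec iov;
    iov.iov_base = const_cast<void*>(data);
    iov.iov_len = len;

    // Control buffer sized for exactly one in6_pktinfo ancillary message.
    alignas(cmsghdr) char control[CMSG_SPACE(sizeof(in6_pktinfo))];
    std::memset(control, 0, sizeof(control));

    msghdr msg;
    std::memset(&msg, 0, sizeof(msg));
    msg.msg_name = const_cast<sockaddr_in6*>(&dest);
    msg.msg_namelen = sizeof(dest);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control;
    msg.msg_controllen = sizeof(control);

    // Fill in the single control message header.
    cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = IPPROTO_IPV6;
    cmsg->cmsg_type = IPV6_PKTINFO;
    cmsg->cmsg_len = CMSG_LEN(sizeof(in6_pktinfo));

    // Source address and interface the reply should leave from.
    in6_pktinfo info;
    std::memset(&info, 0, sizeof(info));
    info.ipi6_addr = source;
    info.ipi6_ifindex = ifindex;
    std::memcpy(CMSG_DATA(cmsg), &info, sizeof(info));

    return (sendmsg(send_fd, &msg, 0));
}
```

The receive socket would presumably need IPV6_RECVPKTINFO enabled so
that recvmsg() tells us which address and interface each query arrived
on; those values would then be passed in here as source and ifindex.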

This seems like enough work that it should be a separate ticket, at
least.

I wonder how BIND 9 handles this? Perhaps this doesn't actually do
anything on BIND 9 but it was never tested?

--
Shane