bind crashes with assertion, maybe due to many ephemeral network devices?

Erich Eckner bind at eckner.net
Tue Mar 11 07:05:33 UTC 2025


Hi Ondrej,

thanks for the fast answer :)

On Mon, 10 Mar 2025, Ondřej Surý wrote:

>> bind crashes with assertion, maybe due to many ephemeral network devices?
>
> Looking at the symptoms and your description, I actually think this is a problem
> of interfaces appearing during the network interface scan and then disappearing
> before named can process them.
>
> I would suggest disabling automatic-interface-scan and setting up named to
> listen on fixed addresses so it doesn't have to deal with the mayhem that
> docker is creating.

Yes, indeed: that fixes the issue for me! Bind has now been running
stably for more than 8 hours.
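
A minimal sketch of what such a change might look like in
/etc/named.conf. The listen addresses are assumptions for illustration
only: 192.168.188.1 is a hypothetical host address inside the
192.168.188.0/24 network from the allow-recursion line of the original
config, and any address the containers use to reach named would have to
be listed as well.

```
options {
    // Do not rescan interfaces when addresses are added or removed.
    automatic-interface-scan no;

    // Listen on fixed addresses only (assumed values, adjust to the host).
    listen-on { 127.0.0.1; 192.168.188.1; };
    listen-on-v6 { ::1; };
};
```

The periodic rescan controlled by interface-interval is a separate
option; whether it also needs adjusting is not covered in this thread.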

>
> I've unblocked and "trusted" your account, so it should not get blocked again.
> If you set up 2FA on the account, it also acts as a permanent marker that this
> is not a spam account.

Thanks a lot, I added 2FA. Though I think it will be some time before I
come back and actively participate in the bug tracker (due to bind's
stability :D).

>
> Feel free to file the issue, but I can't promise this will be looked at soon,
> as this is in the "doctor, it hurts when I do this" territory.

Yeah, makes sense: you probably have more important things to do. I'll
see whether the config change has any negative side effects for me, and
only open a bug report if I see any problems with the current solution.

>
> Ondrej

Cheers!
Erich

> --
> Ondřej Surý (He/Him)
> ondrej at isc.org
>
> My working hours and your working hours may be different. Please do not feel obligated to reply outside your normal working hours.
>
>> On 10. 3. 2025, at 21:19, Erich Eckner <bind at eckner.net> wrote:
>>
>> Hi,
>>
>> I'm running bind version 9.20.6 on Artix Linux (an Arch Linux derivative without systemd) with a pretty standard config:
>>
>> # named -V
>> BIND 9.20.6 (Stable Release) <id:72cbad0>
>> running on Linux x86_64 6.13.5-artix1-1 #1 SMP PREEMPT_DYNAMIC Fri, 28 Feb 2025 10:18:15 +0000
>> built by make with  '--prefix=/usr' '--sysconfdir=/etc' '--sbindir=/usr/bin' '--localstatedir=/var' '--disable-static' '--enable-fixed-rrset' '--enable-full-report' '--with-maxminddb' '--with-openssl' '--with-libidn2' '--with-json-c' '--with-libxml2' '--with-lmdb' 'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto -DDIG_SIGCHASE' 'LDFLAGS=-Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now          -Wl,-z,pack-relative-relocs -flto=auto'
>> compiled by GCC 14.2.1 20250207
>> compiled with OpenSSL version: OpenSSL 3.4.1 11 Feb 2025
>> linked to OpenSSL version: OpenSSL 3.4.1 11 Feb 2025
>> compiled with libuv version: 1.50.0
>> linked to libuv version: 1.50.0
>> compiled with liburcu version: 0.15.0
>> compiled with jemalloc version: 5.3.0
>> compiled with libnghttp2 version: 1.64.0
>> linked to libnghttp2 version: 1.65.0
>> compiled with libxml2 version: 2.13.5
>> linked to libxml2 version: 21306-GITv2.13.6
>> compiled with json-c version: 0.18
>> linked to json-c version: 0.18
>> compiled with zlib version: 1.3.1
>> linked to zlib version: 1.3.1
>> linked to maxminddb version: 1.12.2
>> threads support is enabled
>> DNSSEC algorithms: RSASHA1 NSEC3RSASHA1 RSASHA256 RSASHA512 ECDSAP256SHA256 ECDSAP384SHA384 ED25519 ED448
>> DS algorithms: SHA-1 SHA-256 SHA-384
>> HMAC algorithms: HMAC-MD5 HMAC-SHA1 HMAC-SHA224 HMAC-SHA256 HMAC-SHA384 HMAC-SHA512
>> TKEY mode 2 support (Diffie-Hellman): no
>> TKEY mode 3 support (GSS-API): yes
>>
>> default paths:
>>  named configuration:  /etc/named.conf
>>  rndc configuration:   /etc/rndc.conf
>>  nsupdate session key: /var/run/named/session.key
>>  named PID file:       /var/run/named/named.pid
>>  geoip-directory:      /usr/share/GeoIP
>>
>>
>> # grep '^\s*[^[:space:]#/]' /etc/named.conf
>> options {
>>    directory "/var/named";
>>    pid-file "/run/named/named.pid";
>>    allow-recursion { 127.0.0.1; 192.168.188.0/24; };
>>    allow-transfer { none; };
>>    allow-update { none; };
>>    version none;
>>    hostname none;
>>    server-id none;
>> };
>> zone "localhost" IN {
>>    type master;
>>    file "localhost.zone";
>> };
>> zone "0.0.127.in-addr.arpa" IN {
>>    type master;
>>    file "127.0.0.zone";
>> };
>> zone "1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa" {
>>    type master;
>>    file "localhost.ip6.zone";
>> };
>>
>> # pgrep -af named
>> 22958 /usr/bin/named -u named -L /var/log/named.log
>>
>> For a few days (or weeks?) now, it has been acting up. Every few tens of minutes, it crashes with:
>>
>> 10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.995 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.995 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.996 general: error: uv.c:95:isc__uverr2result(): unexpected error:
>> 10-Mar-2025 20:33:36.996 general: error: unable to convert libuv error code in start_udp_child_job (netmgr/udp.c:172) to isc_result: -19: no such device
>> 10-Mar-2025 20:33:36.996 network: error: creating IPv6 interface veth731351f failed; interface ignored
>> 10-Mar-2025 20:33:36.996 network: info: listening on IPv6 interface vetha808625, fe80::d0cf:5fff:fe3a:1e50%954915#53
>> 10-Mar-2025 20:33:36.998 network: info: listening on IPv6 interface veth92035bc, fe80::58f0:c5ff:fecf:4a8d%954971#53
>> 10-Mar-2025 20:33:37.000 network: info: listening on IPv6 interface vethb1ef26b, fe80::58e2:d2ff:fe3f:c77f%955141#53
>> 10-Mar-2025 20:33:37.003 network: info: listening on IPv6 interface veth0ee3ea4, fe80::44be:c7ff:fefd:83fb%955153#53
>> 10-Mar-2025 20:33:37.005 network: info: listening on IPv6 interface veth39e879e, fe80::34fb:98ff:fe9e:d49f%955162#53
>> 10-Mar-2025 20:33:37.007 network: info: listening on IPv6 interface veth2f2d6df, fe80::2c2b:e8ff:fe8e:2339%955167#53
>> 10-Mar-2025 20:33:37.010 network: info: listening on IPv6 interface vetha0e2b2b, fe80::84fd:7aff:fe72:9c82%955207#53
>> 10-Mar-2025 20:33:37.012 network: info: listening on IPv6 interface vethb633142, fe80::58a5:32ff:feaf:bdb2%955208#53
>> 10-Mar-2025 20:33:37.014 network: info: listening on IPv6 interface veth232d291, fe80::f442:a2ff:fe0d:18f8%955383#53
>> 10-Mar-2025 20:33:37.017 network: info: listening on IPv6 interface vetha87c0e9, fe80::2431:26ff:fe1e:adac%955384#53
>> 10-Mar-2025 20:33:37.021 network: info: listening on IPv6 interface vethadab24f, fe80::7d:44ff:fe11:7284%955606#53
>> 10-Mar-2025 20:33:37.024 network: info: listening on IPv6 interface vethe9c8381, fe80::1847:42ff:fe98:cd5c%955655#53
>> 10-Mar-2025 20:33:37.026 network: info: listening on IPv6 interface veth5f5869a, fe80::ec06:66ff:fe5d:ef74%955668#53
>> 10-Mar-2025 20:33:37.029 network: info: listening on IPv6 interface vethe46d2e1, fe80::f48e:14ff:fe94:2efd%955683#53
>> 10-Mar-2025 20:33:37.032 network: info: listening on IPv6 interface vethf87bbe4, fe80::6c0b:47ff:fed2:404d%955686#53
>> 10-Mar-2025 20:33:37.035 network: info: listening on IPv6 interface veth207c7ca, fe80::f019:b8ff:feda:517d%955692#53
>> 10-Mar-2025 20:33:37.038 network: info: listening on IPv6 interface veth1654fa8, fe80::fc83:fcff:fe79:8f01%955718#53
>> 10-Mar-2025 20:33:37.041 network: info: listening on IPv6 interface vethe4e528f, fe80::901d:7fff:fe58:ed2%955719#53
>> 10-Mar-2025 20:33:37.041 general: critical: netmgr/udp.c:77:isc__nm_udp_lb_socket(): fatal error:
>> 10-Mar-2025 20:33:37.041 general: critical: RUNTIME_CHECK(result == ISC_R_SUCCESS) failed
>> 10-Mar-2025 20:33:37.041 general: critical: exiting (due to fatal error in library)
>>
>> As first aid, I added a script that simply restarts the nameserver if it crashes. This showed me two things:
>>
>> 1. After a crash, restarts also fail for the next one or two minutes.
>>
>> 2. The crashes seem to correlate with the other main load on this machine: a couple hundred docker containers (each of which apparently sets up a network device on the host system) that are started every ten minutes and run for a few minutes (in rare cases longer). Counting the assertion log entries by the last digit of the minute in their timestamps shows a clear clustering in the minutes when many containers start(?)/run/stop:
>>
>> $ grep -F 'RUNTIME_CHECK(result == ISC_R_SUCCESS)' /var/log/named.log | cut -d' ' -f2 | cut -d: -f2 | cut -c2 | sort | uniq -c
>>   5976 0
>>  14767 1
>>  42850 2
>>  31292 3
>>    693 4
>>    204 5
>>    199 6
>>    211 7
>>    226 8
>>    198 9
>>
>> The containers are started via a cronjob:
>> */10 * * * *  /home/erich/git/archlinuxewe/build-all-with-docker
>>
>> In between the crashes, the nameserver seems to run as expected. Also, the docker containers (which require working name resolution on the host system) do not always fail, so at least some of the time named seems to process the containers' requests successfully.
>>
>> I hope someone has an idea where I should look. It feels strange that such a "reference" product as bind can be crashed simply by a large number of fluctuating network devices.
>>
>> Some side notes, maybe less related to the issue at hand, but I want to include them here in case they are relevant:
>>
>> The system seems to be somewhat under load while the containers run, but I would be astonished if this caused bind to crash: RAM usage goes up to 16GB of the 128GB available; CPU usage goes up to 100%, though.
>>
>> I have a second, similar machine (same distribution, similar bind setup), but without the "pulsed" load of docker containers, where named has been running for *looks*up*the*numbers* more than 8 days without crashes (which matches the uptime of that machine).
>>
>> I wanted to open a bug at gitlab.isc.org, but my account ("deep42thought", under which I reported something a few years ago) was blocked again after being reactivated, because I did not notice the big warning on the login page stating exactly this behaviour and took more than a day to gather the information for the bug. :-( Maybe someone can unblock me; then I could add 2FA to keep the account active?
>>
>> Some time ago, I tried to get the statistics channel working with
>>
>> options {
>>    zone-statistics full;
>> }
>> statistics-channels {
>>    inet 127.0.0.1 port 8053;
>> };
>>
>> but this seemed to crash the server back then. Since it was just a toy project, I didn't pursue it any further and removed it from the config quite some time ago.
>>
>> regards,
>> Erich