<div dir="ltr">Hi,<div><br></div><div>I have merged the config files from Tony, Robert, and me.</div><div>I have tried to keep it as generic as possible; the result is below.</div><div><br></div><div>It seems to work here without regressions, except for one warning:<br></div><div>managed-keys-zone: Unable to fetch DNSKEY set '.': operation canceled<br></div><div><br></div><div>But it only appears at the first boot; I no longer see the message when I restart the daemon.</div><div>Any clue?</div><div><br></div><div>Thanks for your feedback.</div><div><br></div><div><div>[Unit]</div><div>After=network-online.target</div><div><br></div><div>[Service]</div><div>Type=simple</div><div>TimeoutSec=25</div><div>Restart=always</div><div>RestartSec=1</div><div>User=bind</div><div>Group=bind</div><div>CapabilityBoundingSet=CAP_NET_BIND_SERVICE</div><div>AmbientCapabilities=CAP_NET_BIND_SERVICE</div><div>SystemCallFilter=~@mount @debug acct modify_ldt add_key adjtimex clock_adjtime delete_module fanotify_init finit_module get_mempolicy init_module io_destroy io_getevents iopl ioperm io_setup io_submit io_cancel kcmp kexec_load keyctl lookup_dcookie migrate_pages move_pages open_by_handle_at perf_event_open process_vm_readv process_vm_writev ptrace remap_file_pages request_key set_mempolicy swapoff swapon uselib vmsplice</div><div><br></div><div>NoNewPrivileges=true</div><div>PrivateDevices=true</div><div>PrivateTmp=true</div><div>ProtectHome=true</div><div>ProtectSystem=strict</div><div>ProtectKernelModules=true</div><div>ProtectKernelTunables=true</div><div>ProtectControlGroups=true</div><div>InaccessiblePaths=/home</div><div>InaccessiblePaths=/opt</div><div>InaccessiblePaths=/root</div><div>ReadWritePaths=/run/named<br></div><div>ReadWritePaths=/var/cache/bind</div><div>ReadWritePaths=/var/lib/bind</div></div><div><br></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div dir="ltr">--<br><div 
style="font-size:small"><div>Ludovic Gasc (GMLudo)</div></div></div></div></div></div></div></div>
<br><div class="gmail_quote">2018-01-15 21:14 GMT+01:00 Robert Edmonds <span dir="ltr"><<a href="mailto:edmonds@mycre.ws" target="_blank">edmonds@mycre.ws</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">Tony Finch wrote:<br>
> Ludovic Gasc <<a href="mailto:gmludo@gmail.com">gmludo@gmail.com</a>> wrote:<br>
> ><br>
> > 1. The list of minimal capabilities needed for bind to run correctly:<br>
> > <a href="http://man7.org/linux/man-pages/man7/capabilities.7.html" rel="noreferrer" target="_blank">http://man7.org/linux/man-pages/man7/capabilities.7.html</a><br>
><br>
> named already drops capabilities - have a look at the code around here:<br>
> <a href="https://source.isc.org/cgi-bin/gitweb.cgi?p=bind9.git;a=blob;f=bin/named/unix/os.c;hb=v9_11_2#l234" rel="noreferrer" target="_blank">https://source.isc.org/cgi-bin/gitweb.cgi?p=bind9.git;a=blob;f=bin/named/unix/os.c;hb=v9_11_2#l234</a><br>
><br>
> Note that it's a bit clever - the privileges are dropped in two stages,<br>
> right at the start, and after the server has been configured.<br>
<br>
</span>I checked just now to see what that code actually ends up doing, and on<br>
my system I ended up with:<br>
<br>
$ grep -h ^Cap /proc/$(pidof named)/**/status | sort | uniq -c<br>
6 CapAmb: 0000000000000000<br>
6 CapBnd: 0000003fffffffff<br>
6 CapEff: 0000000001000400<br>
6 CapInh: 0000000000000000<br>
6 CapPrm: 0000000001000400<br>
$<br>
<br>
That decodes to:<br>
<br>
- The effective and permitted capability sets were reduced to<br>
CAP_NET_BIND_SERVICE and CAP_SYS_RESOURCE.<br>
<br>
- The ambient and inheritable capability sets were cleared.<br>
<br>
- The capability bounding set was left completely open-ended.<br>
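<br>
As a rough illustration of how those hex masks decode (a minimal Python sketch, not part of BIND; decode_caps is a hypothetical helper, the bit numbers come from linux/capability.h, and only the capabilities mentioned in this thread are named):<br>
<br>
```python
# Capability bit numbers from linux/capability.h.
# Only the capabilities discussed in this thread are named here.
CAP_NAMES = {
    8: "CAP_SETPCAP",
    10: "CAP_NET_BIND_SERVICE",
    18: "CAP_SYS_CHROOT",
    24: "CAP_SYS_RESOURCE",
}


def decode_caps(hex_mask):
    """Return the names of the capabilities set in a hex mask.

    Bits without an entry in CAP_NAMES are reported as "bit N".
    """
    mask = int(hex_mask, 16)
    names = []
    for bit in range(64):
        if mask & (2 ** bit):
            names.append(CAP_NAMES.get(bit, "bit %d" % bit))
    return names


# CapEff / CapPrm value from the output above.
print(decode_caps("0000000001000400"))
# Number of capabilities left in the bounding set (CapBnd).
print(len(decode_caps("0000003fffffffff")))
```
<br>
Where libcap is installed, capsh --decode=0000000001000400 should give the same answer without any custom code.<br>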
<br>
It's not clear why CAP_SYS_RESOURCE needs to be retained past startup:<br>
<br>
/*<br>
* XXX We might want to add CAP_SYS_RESOURCE, though it's not<br>
* clear it would work right given the way linuxthreads work.<br>
* XXXDCL But since we need to be able to set the maximum number<br>
* of files, the stack size, data size, and core dump size to<br>
* support named.conf options, this is now being added to test.<br>
*/<br>
SET_CAP(CAP_SYS_RESOURCE);<br>
<br>
See commits 5e4b7294d88ab58371d8c98e05ea80086dcb67cd,<br>
108490a7f8529aff50a0ac7897580b59a73d9845. "[T]o test"?<br>
<br>
CAP_SYS_RESOURCE is documented as permitting:<br>
<br>
CAP_SYS_RESOURCE<br>
* Use reserved space on ext2 filesystems;<br>
* make ioctl(2) calls controlling ext3 journaling;<br>
* override disk quota limits;<br>
* increase resource limits (see setrlimit(2));<br>
* override RLIMIT_NPROC resource limit;<br>
* override maximum number of consoles on console allocation;<br>
* override maximum number of keymaps;<br>
* allow more than 64hz interrupts from the real-time clock;<br>
* raise msg_qbytes limit for a System V message queue above the<br>
limit in /proc/sys/kernel/msgmnb (see msgop(2) and msgctl(2));<br>
* allow the RLIMIT_NOFILE resource limit on the number of "in-<br>
flight" file descriptors to be bypassed when passing file<br>
descriptors to another process via a UNIX domain socket (see<br>
unix(7));<br>
* override the /proc/sys/fs/pipe-size-max limit when setting the<br>
capacity of a pipe using the F_SETPIPE_SZ fcntl(2) command.<br>
* use F_SETPIPE_SZ to increase the capacity of a pipe above the<br>
limit specified by /proc/sys/fs/pipe-max-size;<br>
* override /proc/sys/fs/mqueue/queues_max limit when creating<br>
POSIX message queues (see mq_overview(7));<br>
* employ the prctl(2) PR_SET_MM operation;<br>
* set /proc/[pid]/oom_score_adj to a value lower than the value<br>
last set by a process with CAP_SYS_RESOURCE.<br>
<br>
I would guess that retaining CAP_NET_BIND_SERVICE and CAP_SYS_RESOURCE<br>
during the process runtime permits open-ended reloading of the config at<br>
runtime (e.g., binding to a new IP address on port 53 without needing to<br>
restart the daemon). So even though BIND drops some capabilities, it's<br>
still running with elevated privileges compared to a traditional<br>
non-root user.<br>
<br>
systemd permits a nice pattern for network daemons that want to run as<br>
an unprivileged user, but bind to a privileged port (and without using<br>
socket activation), without starting the process as root. Basically, you<br>
put something like this in the unit file:<br>
<br>
[Service]<br>
User=…<br>
Group=…<br>
CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_SYS_CHROOT CAP_SETPCAP<br>
AmbientCapabilities=CAP_NET_BIND_SERVICE CAP_SYS_CHROOT CAP_SETPCAP<br>
…<br>
<br>
Any needed filesystem directories and permissions need to be set up<br>
correctly beforehand. The service is started by the init system as the<br>
unprivileged User/Group specified in the unit file, so there's no need<br>
to change UID/GID. CAP_NET_BIND_SERVICE is then used to bind to a<br>
privileged port, CAP_SYS_CHROOT is used to perform the chroot, and<br>
CAP_SETPCAP is used to drop all remaining capabilities from the<br>
capability sets and the capability bounding set, so you end up with a<br>
completely unprivileged process at runtime. (Alternatively you could<br>
keep CAP_NET_BIND_SERVICE and drop CAP_SYS_CHROOT and CAP_SETPCAP, if<br>
you wanted to retain the capability to perform privileged binds at<br>
runtime. Or you could eliminate CAP_SYS_CHROOT and use other systemd<br>
functionality to make parts of the filesystem inaccessible, etc.) This<br>
pattern might be a bit hard to retrofit into BIND at this point, though,<br>
other than by adding more knobs.<br>
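<br>
One way to check whether a daemon really ends up "completely unprivileged" is to look at the Cap* lines of its /proc/PID/status file. The following is only a sketch under that assumption; cap_sets and is_unprivileged are hypothetical helper names, not part of systemd or BIND:<br>
<br>
```python
def cap_sets(status_text):
    """Parse the Cap* lines of a /proc/PID/status dump into {name: int}."""
    sets = {}
    for line in status_text.splitlines():
        if line.startswith("Cap"):
            name, value = line.split(":", 1)
            sets[name] = int(value.strip(), 16)
    return sets


def is_unprivileged(status_text):
    """True only when every capability set, bounding set included, is empty."""
    sets = cap_sets(status_text)
    return bool(sets) and all(mask == 0 for mask in sets.values())
```
<br>
For example, is_unprivileged(open("/proc/self/status").read()) checks the calling process; for the named process shown earlier it would return False, since CapEff, CapPrm and CapBnd are all non-zero.<br>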
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Robert Edmonds<br>
</font></span><div class="HOEnZb"><div class="h5">_______________________________________________<br>
Please visit <a href="https://lists.isc.org/mailman/listinfo/bind-users" rel="noreferrer" target="_blank">https://lists.isc.org/mailman/listinfo/bind-users</a> to unsubscribe from this list<br>
<br>
bind-users mailing list<br>
<a href="mailto:bind-users@lists.isc.org">bind-users@lists.isc.org</a><br>
<a href="https://lists.isc.org/mailman/listinfo/bind-users" rel="noreferrer" target="_blank">https://lists.isc.org/mailman/listinfo/bind-users</a></div></div></blockquote></div><br></div>