Fixing systemd-resolved: Final Edition

I manage a number of Ubuntu servers, and over time almost all of them have developed DNS resolution issues that traced back to systemd-resolved. In my experience, systemd-resolved has a pretty horrible track record for actually working reliably.

The latest wonky behaviour: DNS resolution would fail completely at random, work again, drop out immediately after, then resolve only IPv6 addresses, then stop working altogether. After swapping out my nameservers and checking the stability of my connections to them, it all came back to systemd-resolved again.

Samples of the log output:

systemd-resolved[592]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
systemd-resolved[592]: Using degraded feature set (TCP) for DNS server x.x.x.x
systemd-resolved[592]: Using degraded feature set (UDP) for DNS server x.x.x.x

Then it would die. The only way to recover connectivity was to kick the service with systemctl restart systemd-resolved, and even then it would only work for a few more minutes before getting into a bad state again.
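To get a rough idea of how often the resolver falls back like this, the journal can be filtered for the "degraded feature set" messages shown above. This is a minimal sketch, not a full diagnostic: count_degraded is a hypothetical helper name, and the one-hour window in the usage comment is an arbitrary choice.

```shell
# Count "Using degraded feature set" messages in log text read from stdin.
# count_degraded is a hypothetical helper name, not part of systemd.
count_degraded() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's exit status clean.
    grep -c 'Using degraded feature set' || true
}

# Usage on an affected host (the time window is an assumption, adjust to taste):
#   journalctl -u systemd-resolved --since "1 hour ago" --no-pager | count_degraded
```

A count that keeps climbing between restarts suggests the service is repeatedly downgrading rather than hitting a one-off upstream hiccup.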

I found Launchpad bug #1822416, which seemed to describe the issue I was facing, but it remains open even though an upstream fix is in place in the systemd repository on GitHub.

The final solution?

Kill it with fire and switch to unbound. Unbound has been cropping up in more and more deployments, it has been relatively painless to run, and it didn't require any additional configuration out of the box.

$ sudo apt install unbound resolvconf
$ sudo systemctl disable systemd-resolved
$ sudo systemctl stop systemd-resolved
$ sudo systemctl enable unbound-resolvconf
$ sudo systemctl enable unbound
$ sudo systemctl start unbound
$ sudo systemctl start unbound-resolvconf
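After the switch, it is worth confirming that lookups are actually going through unbound. The sketch below assumes unbound is listening on 127.0.0.1 (its default) and that resolvconf has written that address into /etc/resolv.conf; uses_local_resolver is a hypothetical helper name, and your setup may differ.

```shell
# Succeeds when the given resolv.conf-style file lists 127.0.0.1 as a nameserver.
# uses_local_resolver is a hypothetical helper; the path defaults to /etc/resolv.conf.
uses_local_resolver() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+127\.0\.0\.1([[:space:]]|$)' "${1:-/etc/resolv.conf}"
}

# Usage, once unbound is running:
#   uses_local_resolver && echo "resolv.conf points at the local resolver"
#   dig @127.0.0.1 example.com +short   # query unbound directly, if dig is installed
```

If the file still lists 127.0.0.53, the systemd-resolved stub address is still in use and the resolvconf step has not taken effect yet.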

We'll see how it goes, but I haven't had any more DNS instability since switching over.