Caddy Certs Fail: systemd-resolved's DNS Shenanigans

Q: What is systemd-resolved?

`systemd-resolved` is a system service that provides network name resolution to local applications. It combines a resolution manager, a local stub resolver (listening on `127.0.0.53`), and a DNSSEC validator.

Q: Why would systemd-resolved cause Caddy certificate renewal failures?

`systemd-resolved` can sometimes return `SERVFAIL` (Server Failure) for specific DNS queries, even when tools like `dig` pointed at a public resolver get a correct answer. Caddy's ACME client needs working DNS during renewal, for example to reach the CA's API endpoint or to check that a DNS-01 challenge record has propagated. If `systemd-resolved` silently fails those lookups, the challenge never completes and the renewal fails.

Q: How can I prevent systemd-resolved from breaking my Caddy certificates?

The primary methods are bypassing `systemd-resolved` for Caddy's DNS-01 challenge lookups by pointing Caddy at public resolvers (e.g., 1.1.1.1) with the `resolvers` option of its `tls` directive, or forcing Go's built-in resolver with `GODEBUG=netdns=go`. Note that Go's resolver still queries whatever nameservers `/etc/resolv.conf` lists, so it doesn't help on its own if that file points at the `127.0.0.53` stub. The more permanent solution is to configure `/etc/resolv.conf` to use reliable upstream resolvers directly, rather than relying on the `systemd-resolved` stub.

Systemd resolvers lie.

That’s the headline. Forget all the nuanced systemd-resolved configurations. Forget fancy network stacks. Your Caddy certs are expiring. Why? Because systemd-resolved has a bug. A quiet, insidious bug. It selectively returns SERVFAIL for certain DNS queries. And then your ACME renewals just… die.

This isn’t a dramatic crash. Oh no. That would be too obvious. This is subtle. Caddy’s logs? They’ll tell you the renewal failed. But they won’t tell you why. Not really. You’ll see the errors. You’ll stare at them. You’ll check your firewall. You’ll check your domain registrar. You’ll assume Caddy is broken. It’s not. It’s the stub resolver. The thing that’s supposed to be helping. Instead, it’s silently sabotaging your HTTPS.

And the worst part? It’s inconsistent. Some DNS queries work. Some don’t. It depends on the upstream resolver. It depends on the phase of the moon. You can run dig against Google’s 8.8.8.8. It’ll work fine. Everything looks fine. But systemd-resolved? It’s holding back vital information. It’s playing dumb. Your application, however, doesn’t get the memo. It trusts the resolver. It gets a SERVFAIL. And the renewal attempt just evaporates into the ether. This happened to me. It wasted an entire afternoon. An afternoon that could have been spent contemplating the existential dread of modern infrastructure, or, you know, drinking coffee.

The stub resolver is the one lying to your application, and it doesn’t log it anywhere useful.

This is the core of the problem. There’s no trail. No smoking gun. Just a failed renewal. Caddy will retry, but if the failures keep coming, you eventually get an expired certificate. And then you get the frantic emails. Or worse, the silent, unnoticed downtime. Users complain about broken connections. You scramble. You check everything. You’re about to blame Caddy. Don’t. Blame systemd-resolved.

So, what do you do about it?

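There are a few obvious workarounds. The first is to bypass the lie altogether and force Caddy to use a known, reliable DNS server. If you renew via the DNS-01 challenge, the tls directive’s resolvers option does exactly this for the challenge lookups. In the Caddyfile below, example.com and the cloudflare DNS provider are placeholders for your own site and provider plugin:

```caddyfile
example.com {
	tls {
		# Placeholder DNS-01 provider; requires a Caddy build with this plugin
		dns cloudflare {env.CF_API_TOKEN}
		# Use 1.1.1.1 for challenge lookups instead of the system stub
		resolvers 1.1.1.1
	}
}
```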

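Or, if you’re feeling adventurous, you can set GODEBUG=netdns=go in Caddy’s environment (for example, with an Environment= line in its systemd unit). This forces Caddy’s underlying Go networking stack to use Go’s own DNS client instead of handing lookups to the system C library. The catch: Go’s client still sends its queries to whatever nameservers /etc/resolv.conf lists, so if that file points at the systemd-resolved stub (127.0.0.53), you’re still asking the liar. It’s a brute-force approach that only really pays off once resolv.conf is fixed too, but sometimes brute force is exactly what you need when dealing with this level of electronic perfidy.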

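Another option? Restart systemd-resolved with systemctl restart systemd-resolved. This clears its broken state. For a while. It’s a temporary patch. Like putting a band-aid on a gaping wound. You’ll be back here again. A more permanent fix is to take the stub out of the path in /etc/resolv.conf. On most systemd-resolved setups, /etc/resolv.conf is a symlink to /run/systemd/resolve/stub-resolv.conf, which points everything at 127.0.0.53. Relinking it to /run/systemd/resolve/resolv.conf, which lists the upstream servers systemd-resolved itself uses, lets applications query those servers directly. Stop relying on the broken stub resolver for everything. Make it honest.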
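Before and after changing /etc/resolv.conf, it helps to check what applications will actually query. This is a minimal Go sketch under stated assumptions: the `nameservers` and `usesStub` helpers are hypothetical, and the check only recognizes the standard stub address 127.0.0.53. It reads the same nameserver lines that Go’s built-in resolver uses, which is also why GODEBUG=netdns=go alone can’t escape the stub.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// stubAddr is the listener address of the systemd-resolved stub.
const stubAddr = "127.0.0.53"

// nameservers extracts the addresses from "nameserver" lines in resolv.conf text.
func nameservers(conf string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Only "nameserver <addr>" lines matter; comments (# or ;) and
		// other directives such as search/options are skipped.
		if len(fields) >= 2 && fields[0] == "nameserver" {
			out = append(out, fields[1])
		}
	}
	return out
}

// usesStub reports whether any configured nameserver is the systemd-resolved stub.
func usesStub(servers []string) bool {
	for _, s := range servers {
		if s == stubAddr {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/etc/resolv.conf")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read /etc/resolv.conf:", err)
		os.Exit(1)
	}
	servers := nameservers(string(data))
	fmt.Println("nameservers:", servers)
	if usesStub(servers) {
		fmt.Println("queries go through the systemd-resolved stub at", stubAddr)
	}
}
```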

And then there’s encrypted DNS. systemd-resolved doesn’t speak DNS-over-HTTPS (DoH), but it does support DNS-over-TLS (DoT) through the DNSOverTLS= setting in /etc/systemd/resolved.conf. It’s not a silver bullet for the SERVFAIL issue, but it encrypts your queries to the upstream and sidesteps a whole class of man-in-the-middle attacks. It makes your DNS queries slightly less pathetic.
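The same encryption idea can also be applied inside a Go service, independent of systemd-resolved. The sketch below uses DNS-over-TLS rather than DoH because Go’s standard resolver can speak it with nothing more than a custom Dial: a TLS connection is a stream, so the resolver uses the length-prefixed TCP framing that DoT expects. The `dotResolver` helper is hypothetical, and 1.1.1.1:853 with the certificate name cloudflare-dns.com is an assumed example upstream; substitute a DoT server you trust.

```go
package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"net"
	"time"
)

// dotResolver sends every query over DNS-over-TLS to server (host:853),
// verifying the server certificate against serverName.
func dotResolver(server, serverName string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // required so the custom Dial below is used
		Dial: func(ctx context.Context, _, _ string) (net.Conn, error) {
			d := tls.Dialer{
				NetDialer: &net.Dialer{Timeout: 5 * time.Second},
				Config:    &tls.Config{ServerName: serverName, MinVersion: tls.VersionTLS12},
			}
			// tls.Conn is not a PacketConn, so Go's resolver frames queries
			// as TCP-style length-prefixed messages, matching DoT.
			return d.DialContext(ctx, "tcp", server)
		},
	}
}

func main() {
	r := dotResolver("1.1.1.1:853", "cloudflare-dns.com")
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	addrs, err := r.LookupHost(ctx, "example.com")
	fmt.Println(addrs, err)
}
```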

Why Does This Matter for Developers?

Look, Caddy is supposed to make things easy. It’s a great web server. It’s fantastic for reverse proxies. And its automatic HTTPS is a godsend. When it works. But when something like this happens, it erodes trust. It makes you second-guess your tooling. You spend hours debugging a problem that boils down to a single, broken component in the operating system. A component that refuses to be forthcoming about its failures. This is the reality of modern DevOps. It’s not just about writing code. It’s about wrestling with the opaque, often inexplicable behavior of the underlying infrastructure.

This incident, while minor in the grand scheme of things, highlights a deeper issue. We build complex systems. We rely on layers and layers of abstraction. And when one of those layers decides to be a complete idiot, the entire edifice can tremble. For developers, this means more time spent in the trenches of system administration. More time staring at logs. More time questioning the fundamental reliability of the tools we use every day. It’s a stark reminder that even the simplest-seeming problems can have surprisingly deep and frustrating roots.

So, if your Caddy certificates are expiring without a clear reason, don’t go looking for complicated Caddy configurations. Check your DNS resolver. Specifically, check systemd-resolved. It’s probably lying to you. And it’s probably costing you sleep. This isn’t a sponsored post. It’s just a warning. A weary, slightly jaded warning from someone who’s been there.

