+BALUG-Admin Al,
Thanks for noticing!
Did some DNSSEC key rollovers, notably while changing, in BIND 1:9.18.28-1~deb12u2, from auto-dnssec maintain (auto-dnssec is deprecated/obsolete and will be going away in future) to dnssec-policy, plus related changes. "Of course" starting with non-canonical less critical domains. Yeah, if one goofs with that transition, DNSSEC can get messed up and cause some failures (at least for a wee bit, notably depending upon caching and relevant TTLs).
Was "only" mostly just rolling some ZSKs (and adding some KSKs), so even there damage rather limited in scope (and notably also in time).
Still, there ought not be hiccups. Switching from auto-dnssec maintain to dnssec-policy is a bit persnickety/hazardous (found that out many moons ago with my very first such transition): notably, any existing keys not compatible with the policy at the time of the switch are summarily ejected ... and that can be a (quite) bad thing. So, I was pretty careful, but still, not sure exactly why, for some zones it instantly dropped the old ZSK, whereas for others it did the much more expected rollover - adding the new, doing the relevant additional signings, waiting the requisite TTL times, and only then dropping the then vestigial old. Not 100% sure what's made that difference between the domains. Some are exact same gTLD and key types and sizes, and dnssec-policy and procedures, etc. used, yet still seeing somewhat different results. E.g. on sflug.com. vs. sf-lug.com.
See also: https://dnsviz.net/
Can also there view the older (where saved) DNSSEC data too.
Domains that have thus far had such changes in the last day or two:
Thus far mostly worked on: (non-canonicals and less critical):
e.9.1.0.5.0.f.1.0.7.4.0.1.0.0.2.ip6.arpa
sf-lug.com
sflug.com
savingthedolph.in
sf-lug.net
sflug.net
digitalwitness.org
sflug.org
And will probably hold off on these 'till all such kinks are worked out:
sf-lug.org
berkeleylug.com
balug.org
Longer term bit o' plan is to not only get all of the auto-dnssec maintain config bits updated to dnssec-policy, but also at least rotate the ZSKs if they've not been rotated in a while (ought get rotated about monthly - and they may or may not have been already doing that), and then work up to KSKs. KSKs probably ought get rotated about yearly or so, but looks like most all of 'em are 4+ years old. No extreme rush or anything, but they could use a wee bit of attention ... and also not - at least thus far - critical. Also, if/where registrars have implemented RFC 7344 that should make things way easier to handle that part of it.
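To see which keys are overdue against those rough monthly/yearly targets, something like the following could report a key's age. It's only a sketch, and it assumes GNU date (for -d) and that the K*.key file carries the usual "; Created: YYYYMMDDHHMMSS" comment line; the function names are made up for illustration.

```sh
#!/bin/sh
# Sketch: age of a BIND key, going by the "; Created:" comment line in
# its K*.key file.
# ASSUMPTIONS: GNU date (-d), and a "; Created: YYYYMMDDHHMMSS" line in
# the key file. Function names here are hypothetical helpers.

# Print the creation stamp (YYYYMMDDHHMMSS, UTC) found in a .key file.
key_created() {
    sed -n 's/^; Created: \([0-9]\{14\}\).*/\1/p' "$1"
}

# Convert a YYYYMMDDHHMMSS UTC stamp to seconds since the epoch.
stamp_to_epoch() {
    date -u -d "$(printf '%s\n' "$1" |
        sed 's/^\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)$/\1-\2-\3 \4:\5:\6/')" +%s
}

# Whole days between two stamps: created, then "now".
key_age_days() {
    echo $(( ( $(stamp_to_epoch "$2") - $(stamp_to_epoch "$1") ) / 86400 ))
}

# Usage (the key file name is a placeholder):
#   key_age_days "$(key_created Kexample.org.+013+12345.key)" \
#                "$(date -u +%Y%m%d%H%M%S)"
```

Anything reporting well over 30 days for a ZSK, or well over 365 for a KSK, would be a candidate for the attention mentioned above.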
Hmmm, and just a little while ago I blew away delegated subdomain that I'd much earlier used for some such testing ... maybe time for some more such testing to get to the bottom of the matter.
And if one's curious, that version of bind's default provided configurations for that:
$ named -C | sed -ne '/^dnssec-policy "default" {$/,/^};$/{/^$/d;p};/^dnssec-policy "insecure" {$/,/^};$/{p;/^};$/q}' | expand
dnssec-policy "default" {
        keys {
                csk key-directory lifetime unlimited algorithm 13;
        };
        dnskey-ttl 3600;
        publish-safety 3600;
        retire-safety 3600;
        purge-keys P90D;
        signatures-jitter PT12H;
        signatures-refresh P5D;
        signatures-validity P14D;
        signatures-validity-dnskey P14D;
        max-zone-ttl 86400;
        zone-propagation-delay 300;
        parent-ds-ttl 86400;
        parent-propagation-delay 3600;
};
dnssec-policy "insecure" {
        max-zone-ttl 0;
        keys { };
};
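For a rough feel of what those defaults imply for rollover timing, here is a small sketch that just adds up the relevant values. The simple sums (TTL + safety margin + propagation delay) are an assumption for illustration; named's actual key state machine may add further delays, so treat the results as lower bounds at best.

```sh
#!/bin/sh
# Rough estimate of rollover wait times, in seconds, from dnssec-policy
# values like those shown above.
# ASSUMPTION: plain sums of TTL + safety margin + propagation delay;
# this is not taken from named itself.

# Seconds a newly published DNSKEY should be visible before it is relied upon.
# args: dnskey-ttl publish-safety zone-propagation-delay
publish_wait() {
    echo $(( $1 + $2 + $3 ))
}

# Seconds an old ZSK should stay published after it stops signing.
# args: max-zone-ttl retire-safety zone-propagation-delay
retire_wait() {
    echo $(( $1 + $2 + $3 ))
}

# With the "default" policy values:
publish_wait 3600 3600 300     # prints 7500
retire_wait 86400 3600 300     # prints 90300
```

So with the defaults, an old ZSK vanishing within minutes of the new one appearing is far quicker than these sums would suggest.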
Here's what I've presently got, and some related notes:
// as of 2024-09-23:
//                                TTL   algorithm zsk bits ksk bits
// default              parent DS 86400 13
// 4.0.1.0.0.2.ip6.arpa DNSKEY    86400 8         2048     1024
// com                  parent DS 86400 13
// in                   parent DS 3600  8         2048     1024
// net                  parent DS 86400 13
// org                  parent DS 3600  8         2048     1024
dnssec-policy "my_8_2048_1024" { # legacy
        keys {
                ksk key-directory lifetime unlimited algorithm 8 2048;
                zsk key-directory lifetime 30d algorithm 8 1024;
        };
        nsec3param;
};
dnssec-policy "my_13_8_2048_1024" { # transition
        keys {
                ksk key-directory lifetime unlimited algorithm 13;
                zsk key-directory lifetime 30d algorithm 13;
                ksk key-directory lifetime unlimited algorithm 8 2048;
                zsk key-directory lifetime 30d algorithm 8 1024;
        };
        nsec3param;
};
dnssec-policy "my_8_2048_1024_ds_ttl_3600" { # in org
        keys {
                ksk key-directory lifetime unlimited algorithm 8 2048;
                zsk key-directory lifetime 30d algorithm 8 1024;
        };
        parent-ds-ttl 3600;
        nsec3param;
};
dnssec-policy "my_default" {
        keys {
                ksk key-directory lifetime unlimited algorithm 13;
                zsk key-directory lifetime 30d algorithm 13;
        };
        nsec3param;
};
And thus far, on that primary, these are the domains that have dnssec-policy and the policy they're set to:
my_8_2048_1024             e.9.1.0.5.0.f.1.0.7.4.0.1.0.0.2.ip6.arpa
my_13_8_2048_1024          sf-lug.com
my_13_8_2048_1024          sflug.com
my_8_2048_1024_ds_ttl_3600 savingthedolph.in
my_13_8_2048_1024          sf-lug.net
my_13_8_2048_1024          sflug.net
my_8_2048_1024_ds_ttl_3600 digitalwitness.org
my_8_2048_1024_ds_ttl_3600 sflug.org
$
On Mon, Sep 23, 2024 at 10:53 PM Al aw009@sunnyside.com wrote:
On the way to bed, and no time to really dig into this now. I'll have to have a better look tomorrow. Seems like something is amiss on my ns0 (aka nsx) but not on ns1 or ns2 (aka nsy). (all .sunnyside.com)
ns0/nsx sample logs:
root@post:/var/log/named# tail -f *.log | grep -i sflug
23-Sep-2024 22:39:25.360 lame-servers: info: lame server resolving 'sflug.org' (in 'sflug.org'?): 2603:3024:180d:f100:50:242:105:34#53
23-Sep-2024 22:39:25.368 lame-servers: info: insecurity proof failed resolving 'sflug.org/NS/IN': 2001:470:1f04:51a::2#53
23-Sep-2024 22:39:25.392 lame-servers: info: lame server resolving 'sflug.org' (in 'sflug.org'?): 2600:1f1c:528:c500:5e0b:8a37:6598:356c#53
23-Sep-2024 22:39:25.416 lame-servers: info: no valid RRSIG resolving 'sflug.org/NS/IN': 2001:470:1f05:19e::3#53
23-Sep-2024 22:39:25.440 lame-servers: info: no valid RRSIG resolving 'sflug.org/NS/IN': 96.95.217.99#53
23-Sep-2024 22:39:25.440 lame-servers: info: lame server resolving 'sflug.org' (in 'sflug.org'?): 50.242.105.52#53
23-Sep-2024 22:39:25.456 lame-servers: info: no valid RRSIG resolving 'sflug.org/NS/IN': 198.144.194.12#53
23-Sep-2024 22:39:25.476 lame-servers: info: lame server resolving 'sflug.org' (in 'sflug.org'?): 50.18.139.240#53
23-Sep-2024 22:39:25.504 lame-servers: info: no valid RRSIG resolving 'sflug.org/NS/IN': 96.86.170.229#53
23-Sep-2024 22:39:25.368 dnssec: info: view internal: validating sflug.org/NS: no valid signature found
23-Sep-2024 22:39:25.416 dnssec: info: view internal: validating sflug.org/NS: no valid signature found
23-Sep-2024 22:39:25.440 dnssec: info: view internal: validating sflug.org/NS: no valid signature found
23-Sep-2024 22:39:25.456 dnssec: info: view internal: validating sflug.org/NS: no valid signature found
23-Sep-2024 22:39:25.504 dnssec: info: view internal: validating sflug.org/NS: no valid signature found
All three machines show: sf-lug.org. 3591 IN SOA ns0.sf-lug.org. Michael.Paoli.berkeley.edu. 1727012375 10800 3600 3600000 86400
so at least they should be in sync.
I got this failure:
al@post:/z/dns$ dig ns sflug.org

; <<>> DiG 9.16.6 <<>> ns sflug.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 27626
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
; COOKIE: 6b64e1cd9d59aeff0100000066f2508d5ba5e929bf62035f (good)
;; QUESTION SECTION:
;sflug.org. IN NS

;; Query time: 155 msec
;; SERVER: 192.147.248.10#53(192.147.248.10)
;; WHEN: Mon Sep 23 22:39:25 PDT 2024
;; MSG SIZE rcvd: 66
But a few minutes later it was fine. So whatever it is, it's intermittent.
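One way to tell an intermittent DNSSEC validation failure apart from some other resolver trouble is to repeat the query with checking disabled (dig +cd) and compare the two statuses. A minimal sketch follows; the helper names are made up for illustration, and the classification is only a heuristic.

```sh
#!/bin/sh
# Sketch: compare a normal query with a +cd (checking disabled) query.
# The helper names are hypothetical; the classification is a heuristic.

# Extract the status field (e.g. SERVFAIL, NOERROR) from dig output on stdin.
dig_status() {
    sed -n 's/.*->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p'
}

# Classify a pair of statuses: normal query, then the same query with +cd.
classify() {
    if [ "$1" = "SERVFAIL" ] && [ "$2" = "NOERROR" ]; then
        echo "likely DNSSEC validation failure"
    elif [ "$1" = "NOERROR" ]; then
        echo "ok"
    else
        echo "other failure: $1"
    fi
}

# Usage (needs network access and a validating resolver):
#   normal=$(dig ns sflug.org | dig_status)
#   nocheck=$(dig +cd ns sflug.org | dig_status)
#   classify "$normal" "$nocheck"
```

Run in a loop with a timestamp, that would also show how long each failure window lasts.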
More later... Al
On Tue, Sep 24, 2024 at 12:42 AM Michael Paoli michael.paoli@berkeley.edu wrote:
Not 100% sure what's made that difference between the domains. Some are exact same gTLD and key types and sizes, and dnssec-policy and procedures, etc. used, yet still seeing somewhat different results. E.g. on sflug.com. vs. sf-lug.com.
So, e.g. compare both, right after first change, both have added new ZSK:
https://dnsviz.net/d/sf-lug.com/ZvGgLQ/dnssec/
https://dnsviz.net/d/sflug.com/ZvFWjQ/dnssec/
but sf-lug.com dropped the old ZSK way too fast (about instantly, or within minutes), whereas that rollover should've probably taken 1 or 24 hours, depending upon the TTL thereof, and we can see with sflug.com that it looks like it at least started a proper rollover.
On Tue, Sep 24, 2024 at 12:49 AM Michael Paoli michael.paoli@berkeley.edu wrote:
So,
I think I've got the procedures all smoothly worked out ... though some I should still do some more testing on.
So, where'd I hit rough spots?
Yeah, notably the transition from auto-dnssec maintain to dnssec-policy. That could've been better documented ... and/or I could've more thoroughly tested that first. Anyway, I believe I've got that all (or at least sufficiently) figured out at this point. BIND documentation implies that under dnssec-policy rotation of key algorithms should be a non-issue. Of course I also want to be sure to test that on less critical domain(s) before presuming the documentation is quite correct on that.
This is what some of my named.conf.local now contains on the matter:
////////////////////////////////////////////////////////////////////////
// dnssec-policy definitions
// doc/bind9-doc/arm/reference.html#dnssec-policy-block-grammar
// named -C | sed -ne '/^dnssec-policy "default" {$/,/^};$/{/^$/d;p};/^dnssec-policy "insecure" {$/,/^};$/{p;/^};$/q}'
// https://gitlab.isc.org/isc-projects/bind9/-/blob/main/doc/misc/dnssec-policy...
// https://www.iana.org/assignments/dns-sec-alg-numbers/dns-sec-alg-numbers.xht...

// auto-dnssec --> dnssec-policy:
// Beware the following which can break DNSSEC:
// Keys which are (far) too old, per policy, are instantly dropped,
// even if they're in active use. Make use of dnssec-settime to
// freshen that metadata and prevent that issue, e.g.:
// # (cd /var/cache/bind/keys &&
//    dnssec-settime -f -s Kdnssec.tmp.balug.org.+013+19831)
// on key to update. Then to ensure that's (re)loaded, e.g.:
// # systemctl reload named.service
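With more than a key or two per zone, the freshening step noted above gets repetitive. A small dry-run sketch like this could list the dnssec-settime commands for every key of a zone; it only prints them, for review and for running from within the key directory. The function name and the example zone are made up for illustration.

```sh
#!/bin/sh
# Dry run: print (do NOT execute) a dnssec-settime -f -s command for each
# key of one zone found in a key directory.
# ASSUMPTIONS: key files are named K<zone>.+<alg>+<id>.key, as in the
# example above; freshen_cmds is a hypothetical helper name.

# args: key-directory zone (zone given without the trailing dot)
freshen_cmds() {
    for f in "$1"/K"$2".+*.key; do
        [ -e "$f" ] || continue          # glob matched nothing
        printf 'dnssec-settime -f -s %s\n' "$(basename "$f" .key)"
    done
}

# Usage:
#   freshen_cmds /var/cache/bind/keys dnssec.tmp.balug.org
```

After reviewing and running the printed commands, ownerships/permissions would still need checking before reloading named.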
If one is curious, lots of testing on dnssec.tmp.balug.org. (And that, being useful enough, and perhaps again in future, I'll probably keep around ... though may move it to, e.g. dnssec-test.balug.org.). One can look over many of those changes, including smooth and proper, and also anything but (and how to thoroughly break things), by looking over relevant history, see:
https://dnsviz.net/d/dnssec.tmp.balug.org/dnssec/
And the earlier saved versions, the now current:
https://dnsviz.net/d/dnssec.tmp.balug.org/ZvOVUA/dnssec/
and what came before it. Examples of both rotating KSK and ZSK.

https://dnsviz.net/ will complain (slightly) about that zone, notably on some of the SOA data (vs. keys), notably:
"
RRSIG dnssec.tmp.balug.org/SOA alg 13, id 6523: With a TTL of 3600 the RRSIG RR can be in the cache of a non-validating resolver until 54 minutes after it expires at 2024-09-25 04:50:30+00:00. See RFC 4035, Sec. 5.3.3.
"
That's a consequence of the way some of the testing is configured for that domain. Notably quite minimally permissible SOA values (per RFCs), and also quite short key and related timings - notably to be able to run tests on rotations and the like in much more reasonable timeframes (e.g. around to well under an hour, as opposed to around 30 to 90 days or so). So, the compromises on those values - rather atypical, but quite handy for testing - lead to https://dnsviz.net/ finding one bit between SOA data and some DNSSEC data that's not fully okay per RFCs (or, well, one bit of one RFC). Yes, https://dnsviz.net/ does some pretty darn good comprehensive testing - well including DNSSEC, but also quite useful even more generally in that regard for DNS - even without DNSSEC.

So, domains and DNSSEC history testing ... now have:
dnssec.tmp.balug.org.
(though may relocate that to, e.g.: dnssec-test.balug.org.)
Just recently got rid of:
dnssec-test.mpaoli.net.
and earlier dropped:
tmp2.mpaoli.net.
tmp.mpaoli.net.
Though that last one may pop in and out of existence in various forms.
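The arithmetic behind the DNSViz complaint quoted above is simple enough to sketch: if a signature has less validity left than its TTL, a non-validating cache can hold it past expiry by the difference. In the example call below, the 360 seconds of remaining validity is an assumed figure, chosen only because it reproduces the 54 minutes in the message; it is not taken from the zone data.

```sh
#!/bin/sh
# Sketch: how many minutes an RRSIG could linger in a non-validating
# cache after it expires.
# ASSUMPTION: overhang = TTL - remaining validity (floored at zero).

# args: rrsig-ttl-seconds seconds-of-validity-remaining
rrsig_overhang_minutes() {
    over=$(( $1 - $2 ))
    [ "$over" -gt 0 ] || over=0
    echo $(( over / 60 ))
}

# TTL 3600 with an assumed 360 seconds of validity left:
rrsig_overhang_minutes 3600 360    # prints 54
```

With the very short test timings on that zone, signatures get that close to expiry on purpose, hence the warning.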
Okay, so, ... I think I got it effectively figured out. At least how to operate it, notably BIND 9 with dnssec-policy and including safely transitioning from auto-dnssec maintain to dnssec-policy and doing ZSK and KSK rollovers.
So, first, I don't think there are any issues with DNSSEC itself, that's fine and solid as always. And sure, wee bit of care is required - as with DNS in general - as it is possible to shoot oneself in the foot. But it's not rocket science, and pretty straightforward.
I also think ISC BIND's dnssec-policy is a very significant improvement over their earlier auto-dnssec and such. But, despite their extensive documentation, alas, I think this time around that's where they fell significantly short. Some things just don't behave, at least under certain circumstances, at all like the documentation implies - and in manners that can significantly break things (most notably DNSSEC, and very much violating the principle of least surprise, etc.). So, there's definitely room for improvement there - most notably on the documentation. "Other than that", seems all quite solid as far as I can tell. And as far as I'm able to tell thus far, with proper procedures, all works basically fine. Alas, exactly what those proper procedures are - and aren't - not so clear from their documentation (and in some cases what their documentation implies is in fact not correct). So, maybe case of the code having gotten rather far ahead of and kind'a out'a step with the documentation, and ... documentation needs to properly catch up? Seems like it to me.
Anyway, I'll likely ... when I get around to it, and likely after yet more testing to fully confirm, probably do some wiki writeup bits on it to better document that (for myself, and others that may be looking for similar). At first I was thinking on www.wiki.balug.org, but probably more fitting is further enhancing Debian wiki: DNSSEC Howto for BIND 9.9+ https://wiki.debian.org/DNSSEC%20Howto%20for%20BIND%209.9+ (I'd earlier written/enhanced quite a bit of its content ... looks like time for yet some more of that). I should also give the ISC folks a bit of a nudge on the matter too ... most notably regarding their documentation. Also possible things were intended to go as the documentation implies, but at some point they significantly diverged.
And yes, alas, did have some more glitches with DNSSEC on sflug.org (non-canonical, notice no - in it). And did much further extensive testing on dnssec-test.balug.org. See also, e.g.: https://dnsviz.net/d/dnssec-test.balug.org/ZvaWoQ/dnssec/ etc.
And, short synopsis of key (no pun intended) bits? Goes about like this:

auto-dnssec --> dnssec-policy:
Make target policy and files, etc. fully consistent with existing keys, lest instant breakage at conversion. Notably not only policy on key types and algorithms, aging/rotation, etc. but also, use dnssec-settime -f -s before switch to freshen creation time and provide any needed missing metadata. After that be sure to also readjust file ownerships/permissions; note also that under dnssec-policy BIND user/group will need to not only be able to read files, but also write directory/files.

Rollover of ZSKs seems to work as documented, including automagically and/or per triggering with rndc dnssec -rollover. I've not tested CSK, but I'm presuming it behaves similarly to KSK.

For KSK it's not nearly as smooth as documented, and though in some cases that seems to work as documented, that not uncommonly fails miserably. The proper way (/workaround) to do it smoothly appears to be like this:
  use dnssec-keygen to generate keypair consistent with policy
  use dnssec-settime -f -s to create all the relevant metadata
  appropriately adjust ownerships/permissions
  use dnssec-dsfromkey (and/or other means as relevant) to get the needed data to add the new DS record for this new key - note on all these steps to wait the relevant TTLs to prevent issues (standard practice, particularly noteworthy where these changes are being done/triggered manually)
  use rndc sign zone
  use rndc dnssec -checkds -key newid published zone
  use rndc dnssec -checkds -key oldid withdrawn zone
And probably highly similar applies to CSK.

What NOT to do, at least thus far (maybe this improves in future versions of BIND), for KSK and presumably also CSK keys:
  rndc dnssec -rollover -key oldid zone
(though that appears perfectly fine for ZSK), as that will often instantly eject that key and instantly break DNSSEC, despite what current ISC BIND documentation suggests/implies.

For KSK (and presumably likewise CSK) only use:
  rndc dnssec -rollover -key oldid zone
after all the other relevant steps (and TTLs) have passed. It also appears that using:
  rndc dnssec -rollover -key oldid zone
straight off on KSK (and presumably CSK) may not always break things. It's possible this depends upon metadata on existing key(s) and how they got to that state, notably some of the metadata traced in the files alongside the primary zone files.
Upon further testing, mostly just don't do:
  rndc dnssec -rollover -key oldid zone
even for ZSK, as it appears that in at least some circumstances it instantly ejects the old key when it ought not. So only use it as a last step to finally get that key out of DNS (and remove the key file itself, if policy is likewise set to do that, or do that later) after all other bits of the rollover have been properly done. So, in other words, don't use -rollover to do rollover. <sigh> Just don't. At least not for this (1:9.18.28-1~deb12u2) version of BIND ... and probably any BIND with dnssec-policy <= at least 9.18.x. I'll probably reevaluate with future major BIND version upgrade, but I'm not going to trust that to properly do rollover with the current, given its oft observed behaviors.
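To keep the manual KSK sequence from the synopsis straight, a dry-run sketch like the following could print the steps as a checklist for a given zone and key ids. It executes nothing; the function name, the placeholders in angle brackets, and the example arguments are made up for illustration, and the waits between steps are left to whoever runs it.

```sh
#!/bin/sh
# Dry run: print the manual KSK rollover steps as a checklist.
# Nothing is executed. ksk_rollover_plan is a hypothetical helper;
# <alg> is a placeholder to fill in per the zone's dnssec-policy.

# args: zone new-key-basename new-key-id old-key-id
ksk_rollover_plan() {
    zone=$1 newkey=$2 newid=$3 oldid=$4
    cat <<EOF
# generate a keypair consistent with policy, e.g.:
#   dnssec-keygen -a <alg> -f KSK $zone
dnssec-settime -f -s $newkey
# adjust ownerships/permissions so named can read and write the key files
dnssec-dsfromkey $newkey
# add the new DS at the parent; wait the relevant TTLs
rndc sign $zone
rndc dnssec -checkds -key $newid published $zone
# wait the relevant TTLs before the next step
rndc dnssec -checkds -key $oldid withdrawn $zone
EOF
}

# Usage (zone, key name and ids are placeholders):
#   ksk_rollover_plan example.org Kexample.org.+013+12345 12345 54321
```

Note that rndc dnssec -rollover is deliberately absent from the checklist, per the findings above.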
On Fri, Sep 27, 2024 at 7:45 AM Michael Paoli michael.paoli@berkeley.edu wrote:
Okay, so, ... I think I got it effectively figured out. rndc dnssec -rollover -key oldid zone (though that appears perfectly fine for ZSK)
And ... Yay! Think I've got it pretty well figured out. There is a way to go from auto-dnssec maintain to dnssec-policy and even do key rotations thereafter, and manage to not break DNSSEC at all. At this point have run through multiple test runs of that, and all looking good. Mostly a matter of exactly how to (and not to) do the transition from auto-dnssec maintain to dnssec-policy and how to do the first key rotations after that. Once all that's well taken care of, looks like it's easy peasy thereafter. Just have to properly make it to that point first.
E.g. can look at these sequences, from:
2024-09-29T08:23:45Z https://dnsviz.net/d/dnssec-test.balug.org/ZvkOkQ/dnssec/
through
2024-09-29T22:46:52Z https://dnsviz.net/d/dnssec-test.balug.org/dnssec/