+BALUG-Admin Al,
Thanks for noticing!
Did some DNSSEC key rollovers, notably switching, in bind 1:9.18.28-1~deb12u2, from auto-dnssec maintain (auto-dnssec is deprecated/obsolete and will be going away in the future) to dnssec-policy, plus related changes. "Of course" starting with the non-canonical, less critical domains. Yeah, if one goofs with that transition, DNSSEC can get messed up and cause some failures (at least for a wee bit, notably depending upon caching and the relevant TTLs).
It was "only" mostly just rolling some ZSKs (and adding some KSKs), so even there the damage was rather limited in scope (and notably also in time).
Still, there ought not be hiccups. Switching from auto-dnssec maintain to dnssec-policy is a bit persnickety/hazardous (found that out many moons ago with my very first such transition): notably, any existing keys not compatible with the policy at the time of the switch are summarily ejected ... and that can be a (quite) bad thing. So I was pretty careful, but still, and I'm not sure exactly why, for some zones it instantly dropped the old ZSK, whereas for others it did the much more expected rollover: adding the new key, doing the relevant additional signing, waiting the requisite TTL times, and only then dropping the by-then vestigial old key. Not 100% sure what's made that difference between the domains. Some use the exact same gTLD, key types and sizes, dnssec-policy, procedures, etc., yet are still showing somewhat different results, e.g. sflug.com. vs. sf-lug.com.
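For reference, the per-zone part of that switch is small. A minimal, hypothetical sketch of what it looks like in a zone stanza (the file and key-directory paths are made up for illustration; the policy name is one of those listed further below). The catch, per the above, is that the policy's keys clause has to match the existing keys' algorithm and sizes, or named won't keep them:

```
// Hypothetical zone stanza; file and key-directory paths are illustrative only.
zone "sflug.com" {
    type primary;
    file "/etc/bind/db.sflug.com";       // hypothetical path
    key-directory "/etc/bind/keys";      // hypothetical; must hold the existing K*.key/K*.private files
    inline-signing yes;
    // Before (deprecated):
    //   auto-dnssec maintain;
    // After: a policy whose keys clause matches the existing keys'
    // algorithm and sizes, so they're adopted rather than ejected.
    dnssec-policy "my_13_8_2048_1024";
};
```

Running named-checkconf before reloading at least catches syntax slips, though not a policy/key mismatch.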
See also: https://dnsviz.net/ (one can also view older, where saved, DNSSEC data there).

Domains that have had such changes in the last day or two (thus far mostly the non-canonical and less critical ones):
  e.9.1.0.5.0.f.1.0.7.4.0.1.0.0.2.ip6.arpa
  sf-lug.com
  sflug.com
  savingthedolph.in
  sf-lug.net
  sflug.net
  digitalwitness.org
  sflug.org

And I'll probably hold off on these 'till all such kinks are worked out:
  sf-lug.org
  berkeleylug.com
  balug.org
Longer term, the bit 'o plan is to not only get all of the auto-dnssec maintain config bits updated to dnssec-policy, but also to at least rotate the ZSKs if they've not been rotated in a while (they ought to get rotated about monthly, and they may or may not already have been), and then work up to the KSKs. KSKs probably ought to get rotated about yearly or so, but it looks like most all of 'em are 4+ years old. No extreme rush or anything; they could use a wee bit of attention, but are also not (at least thus far) critical. Also, if/where registrars have implemented RFC 7344 (CDS/CDNSKEY), that should make that part of it way easier to handle.
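For the "ZSKs about monthly, KSKs about yearly" idea, a minimal sketch of what such a policy might look like. The policy name is hypothetical, algorithm 13 follows the my_default policy below, and this isn't something already deployed here:

```
// Hypothetical policy: finite lifetimes so named rolls keys on its own.
dnssec-policy "my_default_rotating" {
    keys {
        ksk key-directory lifetime P1Y algorithm 13;   // KSK rolled about yearly
        zsk key-directory lifetime P30D algorithm 13;  // ZSK rolled about monthly
    };
    nsec3param;
};
```

With a finite KSK lifetime, named publishes CDS/CDNSKEY for the incoming KSK (the RFC 7344 records a registrar/parent can act on), but it won't finish the roll until it's told the new DS is at the parent, e.g. "rndc dnssec -checkds -key <keyid> published sflug.com" (the key id is a placeholder), or via parental-agents if configured. "rndc dnssec -status sflug.com" shows where each key is in its lifecycle.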
Hmmm, and just a little while ago I blew away a delegated subdomain that I'd much earlier used for some such testing ... maybe it's time for some more such testing to get to the bottom of the matter.
And if one's curious, here are the default dnssec-policy configurations provided by that version of bind:

$ named -C | sed -ne '/^dnssec-policy "default" {$/,/^};$/{/^$/d;p};/^dnssec-policy "insecure" {$/,/^};$/{p;/^};$/q}' | expand
dnssec-policy "default" {
        keys {
                csk key-directory lifetime unlimited algorithm 13;
        };
        dnskey-ttl 3600;
        publish-safety 3600;
        retire-safety 3600;
        purge-keys P90D;
        signatures-jitter PT12H;
        signatures-refresh P5D;
        signatures-validity P14D;
        signatures-validity-dnskey P14D;
        max-zone-ttl 86400;
        zone-propagation-delay 300;
        parent-ds-ttl 86400;
        parent-propagation-delay 3600;
};
dnssec-policy "insecure" {
        max-zone-ttl 0;
        keys { };
};
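To put rough numbers on "waiting the requisite TTL times": below is a hypothetical policy (the name is made up) that just restates those default timing values, annotated with a back-of-the-envelope estimate of a ZSK pre-publication rollover using the RFC 7583 intervals (Ipub = Dprp + TTLkey, Iret = Dsgn + Dprp + TTLsig) plus BIND's publish-safety/retire-safety margins. Taking the signing delay Dsgn as signatures-validity minus signatures-refresh is an assumption, and named's internal calculation may differ in detail; treat it as an order-of-magnitude sketch:

```
// Hypothetical policy; restates the default timing values, annotated.
dnssec-policy "timing_annotated_example" {
    keys {
        ksk key-directory lifetime unlimited algorithm 13;
        zsk key-directory lifetime P30D algorithm 13;
    };
    dnskey-ttl 3600;              // TTLkey
    publish-safety 3600;
    retire-safety 3600;
    signatures-refresh P5D;
    signatures-validity P14D;     // assumed Dsgn ~ P14D - P5D = 9 days = 777600 s
    max-zone-ttl 86400;           // upper bound on TTLsig
    zone-propagation-delay 300;   // Dprp
    // New ZSK published before it starts signing:
    //   Ipub = Dprp + TTLkey + publish-safety
    //        = 300 + 3600 + 3600 = 7500 s (2h05m)
    // Old ZSK kept after it stops signing:
    //   Iret = Dsgn + Dprp + TTLsig + retire-safety
    //        = 777600 + 300 + 86400 + 3600 = 867900 s (10d 1h05m)
};
```

So, by that estimate, with those defaults an old ZSK disappearing right away is well outside what the timing values would call for.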
Here's what I've presently got, and some related notes:

// as of 2024-09-23:
//                              TTL    algorithm  zsk bits  ksk bits
// default parent DS            86400  13
// 4.0.1.0.0.2.ip6.arpa DNSKEY  86400   8         2048      1024
// com parent DS                86400  13
// in parent DS                  3600   8         2048      1024
// net parent DS                86400  13
// org parent DS                 3600   8         2048      1024
dnssec-policy "my_8_2048_1024" { # legacy
        keys {
                ksk key-directory lifetime unlimited algorithm 8 2048;
                zsk key-directory lifetime 30d algorithm 8 1024;
        };
        nsec3param;
};
dnssec-policy "my_13_8_2048_1024" { # transition
        keys {
                ksk key-directory lifetime unlimited algorithm 13;
                zsk key-directory lifetime 30d algorithm 13;
                ksk key-directory lifetime unlimited algorithm 8 2048;
                zsk key-directory lifetime 30d algorithm 8 1024;
        };
        nsec3param;
};
dnssec-policy "my_8_2048_1024_ds_ttl_3600" { # in org
        keys {
                ksk key-directory lifetime unlimited algorithm 8 2048;
                zsk key-directory lifetime 30d algorithm 8 1024;
        };
        parent-ds-ttl 3600;
        nsec3param;
};
dnssec-policy "my_default" {
        keys {
                ksk key-directory lifetime unlimited algorithm 13;
                zsk key-directory lifetime 30d algorithm 13;
        };
        nsec3param;
};
And thus far, on that primary, these are the domains that have dnssec-policy and the policy each is set to:

my_8_2048_1024              e.9.1.0.5.0.f.1.0.7.4.0.1.0.0.2.ip6.arpa
my_13_8_2048_1024           sf-lug.com
my_13_8_2048_1024           sflug.com
my_8_2048_1024_ds_ttl_3600  savingthedolph.in
my_13_8_2048_1024           sf-lug.net
my_13_8_2048_1024           sflug.net
my_8_2048_1024_ds_ttl_3600  digitalwitness.org
my_8_2048_1024_ds_ttl_3600  sflug.org
On Mon, Sep 23, 2024 at 10:53 PM Al aw009@sunnyside.com wrote:
On the way to bed, and no time to really dig into this now. I'll have to have a better look tomorrow. Seems like something is amiss on my ns0 (aka nsx) but not on ns1 or ns2 (aka nsy). (all .sunnyside.com)
ns0/nsx sample logs:

root@post:/var/log/named# tail -f *.log | grep -i sflug
23-Sep-2024 22:39:25.360 lame-servers: info: lame server resolving 'sflug.org' (in 'sflug.org'?): 2603:3024:180d:f100:50:242:105:34#53
23-Sep-2024 22:39:25.368 lame-servers: info: insecurity proof failed resolving 'sflug.org/NS/IN': 2001:470:1f04:51a::2#53
23-Sep-2024 22:39:25.392 lame-servers: info: lame server resolving 'sflug.org' (in 'sflug.org'?): 2600:1f1c:528:c500:5e0b:8a37:6598:356c#53
23-Sep-2024 22:39:25.416 lame-servers: info: no valid RRSIG resolving 'sflug.org/NS/IN': 2001:470:1f05:19e::3#53
23-Sep-2024 22:39:25.440 lame-servers: info: no valid RRSIG resolving 'sflug.org/NS/IN': 96.95.217.99#53
23-Sep-2024 22:39:25.440 lame-servers: info: lame server resolving 'sflug.org' (in 'sflug.org'?): 50.242.105.52#53
23-Sep-2024 22:39:25.456 lame-servers: info: no valid RRSIG resolving 'sflug.org/NS/IN': 198.144.194.12#53
23-Sep-2024 22:39:25.476 lame-servers: info: lame server resolving 'sflug.org' (in 'sflug.org'?): 50.18.139.240#53
23-Sep-2024 22:39:25.504 lame-servers: info: no valid RRSIG resolving 'sflug.org/NS/IN': 96.86.170.229#53
23-Sep-2024 22:39:25.368 dnssec: info: view internal: validating sflug.org/NS: no valid signature found
23-Sep-2024 22:39:25.416 dnssec: info: view internal: validating sflug.org/NS: no valid signature found
23-Sep-2024 22:39:25.440 dnssec: info: view internal: validating sflug.org/NS: no valid signature found
23-Sep-2024 22:39:25.456 dnssec: info: view internal: validating sflug.org/NS: no valid signature found
23-Sep-2024 22:39:25.504 dnssec: info: view internal: validating sflug.org/NS: no valid signature found
All three machines show:
  sf-lug.org. 3591 IN SOA ns0.sf-lug.org. Michael.Paoli.berkeley.edu. 1727012375 10800 3600 3600000 86400
so at least they should be in sync.
I got this failure:

al@post:/z/dns$ dig ns sflug.org

; <<>> DiG 9.16.6 <<>> ns sflug.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 27626
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
; COOKIE: 6b64e1cd9d59aeff0100000066f2508d5ba5e929bf62035f (good)
;; QUESTION SECTION:
;sflug.org.                     IN      NS

;; Query time: 155 msec
;; SERVER: 192.147.248.10#53(192.147.248.10)
;; WHEN: Mon Sep 23 22:39:25 PDT 2024
;; MSG SIZE  rcvd: 66
But a few minutes later it was fine. So whatever it is, it's intermittent.
More later...
Al