Nameserver ns0.balug.org responds to ping, responds to DNS queries, _but_ refuses AXFR request for zone balug.org from slave nameserver ns1.linuxmafia.com. (Below mail from logcheck shows the first appearance of refusal in the logs; this behaviour is still ongoing.)
[rick@linuxmafia] ~ $ dig -t axfr balug.org @96.86.170.229
;; Connection to 96.86.170.229#53(96.86.170.229) for balug.org failed: connection refused.
[rick@linuxmafia] ~ $
----- Forwarded message from logcheck system account logcheck@linuxmafia.com -----
Date: Thu, 16 Jan 2020 12:02:01 -0800
From: logcheck system account logcheck@linuxmafia.com
To: root@linuxmafia.com
Subject: linuxmafia.com 2020-01-16 12:02 System Events

System Events
=-=-=-=-=-=-=
Jan 16 11:33:15 linuxmafia named[4620]: client 96.86.170.229#6229: received notify for zone 'balug.org'
Jan 16 11:33:15 linuxmafia named[4620]: zone balug.org/IN: Transfer started.
Jan 16 11:33:15 linuxmafia named[4620]: transfer of 'balug.org/IN' from 96.86.170.229#53: failed to connect: connection refused
Jan 16 11:33:15 linuxmafia named[4620]: transfer of 'balug.org/IN' from 96.86.170.229#53: Transfer completed: 0 messages, 0 records, 0 bytes, 0.028 secs (0 bytes/sec)
----- End forwarded message -----
Thanks for catching that. Should be "all better now".
Apparently was "operator error" (me) 8-O ... I shouldn't ought be trying to pay attention to & do multiple important things at the same time (was on a relatively long non-trivial phone call, while also doing unrelated Linux sysadmin activities at the same time ... what could possibly go wrong? Well ...).

Wednesday evening, I accidentally took down a network interface on the incorrect system: intended to do it on an unrelated virtual machine, but inadvertently did so on the physical host (oops!), which also had implications for other virtual machines (notably balug) on the same physical host. Went about working to correct that, and after some futzing about, got all the interfaces and networking stuff squared away again ... but alas, I forgot to check bind9 (and potentially other services too).

For better and/or worse, bind9 is pretty good at dropping privilege. Unfortunately this also causes bind to - at least mostly - stop listening on interfaces (or at least IPv4 IPs) that go away ... even if/when they come back ... at least typically without a reload or the like. I may have done a reload, but in any case, didn't check. And, with whatever happened in this particular case, reload wasn't sufficient to get bind also listening on TCP on the primary Internet facing IPv4 IP address ... (the interface didn't merely drop IPs, but was taken fully down and away and brought back - in part of the earlier troubleshooting/correction process). A restart of bind9 did, however, resolve the listening issue - and things seem to be "all better now".
I ought put some more proper monitoring in place - one of many things on the todo list. I'm more likely to do that after I upgrade my relevant Debian oldstable installations to stable (by adding the monitoring after such upgrade, that's one less set of custom configurations needing review and possibly upgrades going through major OS version upgrade).
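As a rough illustration of the kind of monitoring meant here, the sketch below sends the same SOA query to a nameserver over UDP and over TCP and reports each transport separately, since this failure only affected TCP. It is a minimal sketch using only the Python standard library, not a finished monitoring check; the function names (build_query, answers_over_udp, answers_over_tcp) and the fixed query ID are made up for the example.

```python
import socket
import struct


def build_query(name: str, qtype: int = 6, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query; QTYPE 6 is SOA, QCLASS 1 is IN."""
    # Header: ID, flags (all zero: standard query, no recursion), QDCOUNT=1.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def _recv_exact(sock: socket.socket, n: int):
    """Read exactly n bytes from a stream socket, or None if it closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf


def answers_over_udp(server: str, name: str, port: int = 53,
                     timeout: float = 3.0) -> bool:
    """True if the server sends back a reply with our query ID over UDP."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(query, (server, port))
            reply, _ = s.recvfrom(4096)
        except OSError:  # includes timeouts
            return False
    return len(reply) >= 12 and reply[:2] == query[:2]


def answers_over_tcp(server: str, name: str, port: int = 53,
                     timeout: float = 3.0) -> bool:
    """True if the server accepts a TCP connection and answers the query."""
    query = build_query(name)
    try:
        with socket.create_connection((server, port), timeout=timeout) as s:
            # DNS over TCP prefixes each message with a 2-byte length.
            s.sendall(struct.pack("!H", len(query)) + query)
            prefix = _recv_exact(s, 2)
            if prefix is None:
                return False
            (length,) = struct.unpack("!H", prefix)
            reply = _recv_exact(s, length)
    except OSError:  # connection refused, timeout, unreachable, ...
        return False
    return reply is not None and len(reply) >= 12 and reply[:2] == query[:2]
```

Run periodically against the primary's address with the zone name, a result where answers_over_udp(...) is true but answers_over_tcp(...) is false would correspond to the "connection refused" symptom in this thread.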
From: "Rick Moen" rick@linuxmafia.com
To: balug-admin@lists.balug.org
Cc: Michael Paoli Michael.Paoli@cal.berkeley.edu
Subject: ns0.balug.org refusing AXFR request from ns1.linuxmafia.com, zone balug.org
Date: Fri, 17 Jan 2020 00:37:18 -0800
Nameserver ns0.balug.org responds to ping, responds to DNS queries, _but_ refuses AXFR request for zone balug.org from slave nameserver ns1.linuxmafia.com. (Below mail from logcheck shows the first appearance of refusal in the logs; this behaviour is still ongoing.)
[rick@linuxmafia] ~ $ dig -t axfr balug.org @96.86.170.229
;; Connection to 96.86.170.229#53(96.86.170.229) for balug.org failed: connection refused.
[rick@linuxmafia] ~ $
----- Forwarded message from logcheck system account logcheck@linuxmafia.com -----
Date: Thu, 16 Jan 2020 12:02:01 -0800
From: logcheck system account logcheck@linuxmafia.com
To: root@linuxmafia.com
Subject: linuxmafia.com 2020-01-16 12:02 System Events
System Events
Jan 16 11:33:15 linuxmafia named[4620]: client 96.86.170.229#6229: received notify for zone 'balug.org'
Jan 16 11:33:15 linuxmafia named[4620]: zone balug.org/IN: Transfer started.
Jan 16 11:33:15 linuxmafia named[4620]: transfer of 'balug.org/IN' from 96.86.170.229#53: failed to connect: connection refused
Jan 16 11:33:15 linuxmafia named[4620]: transfer of 'balug.org/IN' from 96.86.170.229#53: Transfer completed: 0 messages, 0 records, 0 bytes, 0.028 secs (0 bytes/sec)
----- End forwarded message -----
Quoting Michael Paoli (Michael.Paoli@cal.berkeley.edu):
Thanks for catching that. Should be "all better now".
A slave nameserver admin running a well-tuned instance of logcheck is the next best thing to automated service monitoring -- and less likely to wake you up with SMS alerts. ;->
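For anyone without a logcheck setup, the log lines quoted earlier in this thread are regular enough to pick out mechanically. The following is only a sketch: the pattern is fitted to the named messages shown in this thread and may need adjusting for other BIND versions or syslog formats, and the names XFER_FAILED and failed_transfers are invented for the example.

```python
import re

# Matches named's "failed to connect" transfer message as quoted in this
# thread, e.g.:
#   named[4620]: transfer of 'balug.org/IN' from 96.86.170.229#53:
#   failed to connect: connection refused
XFER_FAILED = re.compile(
    r"named\[\d+\]: transfer of '(?P<zone>[^'/]+)/IN' from "
    r"(?P<master>[0-9A-Fa-f.:]+)#(?P<port>\d+): "
    r"failed to connect: (?P<reason>.+)$"
)


def failed_transfers(lines):
    """Yield (zone, master address, reason) for each failed-transfer line."""
    for line in lines:
        match = XFER_FAILED.search(line.rstrip("\n"))
        if match:
            yield match.group("zone"), match.group("master"), match.group("reason")
```

Feeding it the lines of a syslog file (an open file object works, since it iterates by line) yields one tuple per refused transfer, which a cron job could then mail to the zone's admins.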
And, with whatever happened in this particular case, reload wasn't sufficient to get bind also listening on TCP on the primary Internet facing IPv4 IP address ...
Wow, that's subtle -- and pernicious, in that almost all DNS queries will then work (because UDP), and the only things that won't are DNS queries with answers too long for a UDP reply (classically, longer than 512 bytes; EDNS0 raises that limit), which require TCP transport, and AXFR/IXFR zone transfers (requiring TCP transport).
(I'm explaining for the benefit of readers who may not be old hands at this.)
That's the kind of non-obvious breakage one normally sees only when attempting to pass DNS through firewall rules but forgetting that UDP isn't always sufficient.
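To make the UDP-versus-TCP point concrete for readers new to this: when an answer does not fit in a UDP reply, the server sets the TC (truncated) bit in the DNS header, and the client is expected to retry the same query over TCP. If TCP port 53 is refused, that retry is the step that fails. The sketch below only inspects the header flag; the function name is_truncated is made up for the example, while the bit position comes from the DNS header layout in RFC 1035.

```python
import struct

# Flags word of the DNS header (RFC 1035, section 4.1.1):
# QR=0x8000, Opcode=0x7800, AA=0x0400, TC=0x0200, RD=0x0100, RA=0x0080.
TC_FLAG = 0x0200


def is_truncated(message: bytes) -> bool:
    """Return True if a DNS message has the TC (truncated) bit set."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    _query_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & TC_FLAG)
```

A resolver that gets is_truncated(reply) as true for a UDP reply would then reconnect over TCP, which is why a server listening on UDP only looks healthy right up until a large answer or a zone transfer is needed.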