[BALUG-Admin] After compensating for flaky secondaries, one notes flaky primaries

Rick Moen rick@linuxmafia.com
Mon Oct 16 10:14:25 PDT 2017

Thought you'd be amused at this cautionary tale, Michael P.

----- Forwarded message from Rick Moen <rick@linuxmafia.com> -----

Date: Mon, 16 Oct 2017 10:05:29 -0700
From: Rick Moen <rick@linuxmafia.com>
To: Duncan MacKinnon <duncan1@gmail.com>
Subject: Rest of that story about secondary DNS
Organization: If you lived here, you'd be $HOME already.

I _think_ I started to tell you a story about doing secondary DNS for
people, and something I learned.  Of course, the standard model is
supposed to be:  You do auth DNS for my domains; I do it for yours.

Years ago, I started to see the flaw in that optimistic,
we-help-each-other mental model.  There was a user group in Santa Cruz,
SMAUG, which owned domain 'scruz.net'.   (Terrible naming and choice of
domain; not my doing, not my call.)  I ended up being primary/master DNS, and
we were doing really well because we signed up five more individuals
with auth nameservers to help out with secondary/slave DNS, for six
auth nameservers total, widely dispersed geographically.  That's
fabulous redundancy.  What could possibly go wrong?  </deadpan>

I relaxed about quality of service, because obviously we were way ahead
of the game.  (SMAUG had a mailing list on the SVLUG mailing list
server.  The mailing list still exists, derelict, the group having now
fallen apart.)  Roll forward to one day when my household uplink through
Raw Bandwidth Communications was offline for about an hour because
SBC/AT&T shot the company in the foot.

When my aDSL came back, I found postings to SMAUG's mailing list
bitching about the scruz.net nameservice having been totally offline.  
I noticed that some of the complaining came from the five individuals
who were allegedly doing secondary/slave nameservice.  Hmm?

So, I checked on the five secondaries.  Certainly, my aDSL being offline
for an hour should not have taken all DNS offline.  And what I found
was:  Over about a two-year period, some of the five had moved their
nameservers to new IPs and failed to notify me as master nameserver
admin.  Some had ceased doing auth DNS entirely, and failed to notify me
as master nameserver admin.  Some still had the same nameserver running
at the same IP as always, but had quietly ceased doing auth nameservice
for scruz.net, and failed to notify me as master nameserver admin.

All of the nameserver IPs they'd provided me for their secondary
nameservice were still listed in the whois (and as NS lines for the
domain in the parent .net zone).  But exactly one nameserver still
existed and was actually _doing_ auth DNS for scruz.net -- mine.
All five of the others had silently flaked out.  Which made it extra
galling that some of these guys complained about -my- nameservice being
unreliable, since theirs was 100% unreliable, their having broken it
in various ways, whereas mine worked great except once in a blue moon
when my uplink went down.

I thought:  OK, obviously it turns out to be a mistake to just trust
that secondaries will continue to exist and that their operators will
do due-diligence communication with the primary when something important
changes.  They _should_, but it turns out they don't.

So, I wrote a weekly cron script to check on all the secondaries for my
two domains, linuxmafia.com and unixmercenary.net:  It queries and
reports the parent-zone NS "glue" records, queries and reports the
nameservers declared authoritative in whois, and reports each auth
nameserver's zonefile S/N so I can make sure they all respond and give
the same value.  This means I can detect and act on flaky secondaries.
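
A minimal sketch of the serial-number part of such a check, in Python.
This is not the actual cron script, which isn't reproduced here.  It
assumes 'dig' is installed, all function names are made up for
illustration, and it compares serials with a plain max(), which ignores
RFC 1982 serial wraparound.

```python
import subprocess


def soa_serial(zone, nameserver, timeout=10):
    """Ask one nameserver for the zone's SOA serial; None if no usable answer.

    Assumption: 'dig' is on PATH. This does not check the AA flag, so a
    cached answer from a recursive server would still count as a reply.
    """
    cmd = ["dig", "+short", "+norecurse", "+time=3", "+tries=1",
           "SOA", zone, "@" + nameserver]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True,
                             timeout=timeout).stdout
    except (OSError, subprocess.TimeoutExpired):
        return None
    fields = out.split()
    # 'dig +short SOA' prints: mname rname serial refresh retry expire minimum
    if len(fields) >= 3 and fields[2].isdigit():
        return int(fields[2])
    return None


def classify(serials):
    """Split {nameserver: serial or None} into (silent, stale) lists.

    'silent' servers returned no serial at all; 'stale' servers answered
    with a serial different from the highest one seen.
    """
    answered = {ns: s for ns, s in serials.items() if s is not None}
    newest = max(answered.values(), default=None)
    silent = sorted(ns for ns, s in serials.items() if s is None)
    stale = sorted(ns for ns, s in answered.items() if s != newest)
    return silent, stale


def report(zone, nameservers):
    """Print each server's serial, then a warning line per problem server."""
    serials = {ns: soa_serial(zone, ns) for ns in nameservers}
    silent, stale = classify(serials)
    for ns in sorted(serials):
        print(f"{zone} @{ns}: serial {serials[ns]}")
    for ns in silent:
        print(f"WARNING: {ns} gave no SOA answer for {zone}")
    for ns in stale:
        print(f"WARNING: {ns} serves a different serial for {zone}")
    return silent, stale
```

A weekly cron job could call something like
report("example.com", ["ns1.example.com", "ns2.example.net"]) and mail
the output; those names are placeholders, not real secondaries.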

What I did _not_ do was bother writing a script to check on other
people's master nameservers, on domains for which *I* do secondary
nameservice.  Failures in this case are almost entirely the domain
owner's problem, not mine.  As long as I keep _my_ word for quality of
secondary DNS, I'm OK.

Well, almost.  I do secondary for five or six domains Ruben Safir owns,
and recently double-checked those.  For most of them, I advised Ruben
_again_ that having only two auth nameservers isn't enough and is
dangerously thin.  I urged him to find a couple more, somewhere.

For one of them, nylxs.com, I noticed and advised Ruben that _neither_
his nor my auth nameservers were authoritative any more.  Instead, the
whois records now listed other auth nameservers, as shown by:
$ whois nylxs.com | grep 'Name Server'

I wrote Ruben:  'I find that you have _ceased_ using my secondary
(slave) nameservice, but neglected to inform me.  That's rude, Ruben.
You need to friggin' tell your secondaries if/when you move auth
nameservice somewhere else.  Grr.  Learn to do it right, already!'

Turns out, that's not what happened, exactly.

Ruben had failed to pay his domain renewal, so his registrar (GoDaddy)
had repointed its DNS to 'parked domain' nameservice from its own
domaincontrol.com nameservers, making those authoritative in place of
Ruben's and mine.

Luckily, because I (in effect) warned Ruben of his expired domain in
time, he was able to renew it.

So, lesson:  When you find yourself annoyingly still doing futile
secondary DNS for a domain whose owner _seems_ to have moved auth
nameservice elsewhere without telling you, the explanation isn't always
owner lack of diligence concerning communicating with secondaries:
Sometimes, it's merely owner lack of diligence in paying the bill.
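
A rough Python sketch of how that kind of drift might be caught
automatically, by comparing the 'Name Server' lines in whois output
against the nameservers one expects to see.  The function names are
hypothetical, and it assumes the registry's whois output uses
'Name Server: host' lines, which holds for .com/.net but not for every
TLD.

```python
import subprocess


def whois_nameservers(whois_text):
    """Collect hostnames from 'Name Server:' lines, lowercased, no trailing dot."""
    found = set()
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "name server" and value.strip():
            found.add(value.split()[0].lower().rstrip("."))
    return found


def delegation_drift(expected, whois_text):
    """Return (missing, unexpected) nameserver sets relative to 'expected'."""
    actual = whois_nameservers(whois_text)
    wanted = {ns.lower().rstrip(".") for ns in expected}
    return wanted - actual, actual - wanted


def check_domain(domain, expected):
    """Run the system 'whois' (assumed installed) and report any drift."""
    text = subprocess.run(["whois", domain], capture_output=True,
                          text=True, timeout=30).stdout
    missing, unexpected = delegation_drift(expected, text)
    if missing or unexpected:
        print(f"{domain}: missing {sorted(missing)}, "
              f"unexpected {sorted(unexpected)}")
    return missing, unexpected
```

If every expected nameserver is missing and a registrar's own set has
appeared instead, an unpaid renewal is worth ruling out before scolding
anyone.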

----- End forwarded message -----
