[BALUG-Admin] online again: outage: [www.]{sf-lug,balug}.org

Wed Nov 21 16:59:41 PST 2018

online again,
the short version of the outage synopsis:
unmanaged switch failed, connectivity lost around:
2018-11-21T02:06:39Z
switch replaced, connectivity restored around:
2018-11-21T23:42:09Z
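
For reference, the outage duration implied by the two timestamps above can be worked out with a few lines of Python. This is only a sketch: the names (parse_utc, PST, etc.) are illustrative, and the local-time conversion assumes a fixed -08:00 offset with no DST handling.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%dT%H:%M:%SZ"  # the UTC "Z" form used in the synopsis above

def parse_utc(stamp):
    """Parse a timestamp such as 2018-11-21T02:06:39Z as an aware UTC datetime."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)

lost = parse_utc("2018-11-21T02:06:39Z")
restored = parse_utc("2018-11-21T23:42:09Z")
outage = restored - lost

# Assumption: fixed -08:00 offset (PST); no DST rules are applied.
PST = timezone(timedelta(hours=-8), "PST")

print("outage duration:", outage)
print("lost (PST):     ", lost.astimezone(PST).isoformat())
print("restored (PST): ", restored.astimezone(PST).isoformat())
```

That works out to 21:35:30 of lost connectivity, from 2018-11-20T18:06:39-08:00 to 2018-11-21T15:42:09-08:00 local time.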

the longer version, first mostly from the log, and with some adjustments
on details:
2018-11-20T-08:00
2018-11-21T+00:00
Lost (Internet+) connectivity, apparently first briefly, then solid:
Nov 21 01:44:09 [physical host that has BALUG as guest VM] kernel: [3213467.919511] e1000e: eth0 NIC Link is Down
Nov 21 01:44:41 [physical host that has BALUG as guest VM] kernel: [3213499.958862] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Nov 21 02:06:39 [physical host that has BALUG as guest VM] kernel: [3214817.086977] e1000e: eth0 NIC Link is Down
POTS connection verified still good; deferred remaining troubleshooting, etc.

2018-11-21
Investigated connectivity issue:
tigger shows link down - both in software and on the hardware LEDs,
unmanaged gig switch:
D-Link MODEL: DGS-1005D P/N: BDGS1005D.B1 S/N: DR1914C001239 H/W Ver.: B1
shows no link or activity (host tigger connected to Internet via above),
only LED(s) lit on it are power.
Westell DSL modem also shows no link nor activity, otherwise appears
normal ("top" two LEDs, "Power" and "Ready" are both solid green),
suspected the switch, but, covering bases also,
tried power cycling the Westell DSL modem - since it tends to require
that once in a while anyway (though it fails in a different mode than in
the current situation) ... no change.
Tried substituting in another DSL modem - probably not needed, but given the
semiregular failures of the Westell, thought it might be good to try out
another (a Motorola) DSL modem for a while and see if it's more reliable
... well, that one worked some year(s) back, but seems to have subsequently
failed and no longer works, as far as I could easily and quickly tell -
it showed absolutely no signs of life (not a single LED lit),
so switched back to the Westell again.
After powercycling the switch, not even the power LED on the switch would
come back on (though it may have briefly flickered), and still couldn't
get it to show any link on any ports.
Swapped out switch, removing the:
D-Link MODEL: DGS-1005D P/N: BDGS1005D.B1 S/N: DR1914C001239 H/W Ver.: B1
and replacing it with:
TP-Link Model: TL-SG1008D S/N: 2177442009713 8-Port Gigabit Desktop Switch (unmanaged),
and all fine again after that; service restored at:
Nov 21 23:42:09 [physical host that has BALUG as guest VM] kernel: [3292541.509278] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
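
The link-down intervals can also be pulled out of the kernel messages mechanically. Below is a rough sketch, assuming the wrapped log lines are rejoined onto single lines and the host name is left redacted as above; the regular expression and the function names (link_events, down_intervals) are illustrative only, not anything from the actual host.

```python
import re

# Sample lines copied from the log excerpts above, with wrapped lines
# rejoined; the host name stays redacted as in the original.
HOST = "[physical host that has BALUG as guest VM]"
LOG = f"""\
Nov 21 01:44:09 {HOST} kernel: [3213467.919511] e1000e: eth0 NIC Link is Down
Nov 21 01:44:41 {HOST} kernel: [3213499.958862] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Nov 21 02:06:39 {HOST} kernel: [3214817.086977] e1000e: eth0 NIC Link is Down
Nov 21 23:42:09 {HOST} kernel: [3292541.509278] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
"""

# syslog timestamp, bracketed kernel timestamp (seconds), NIC name, new state
LINK_RE = re.compile(
    r"^(?P<when>\w{3} +\d+ \d\d:\d\d:\d\d) .*kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\] e1000e: (?P<nic>\S+) NIC Link is (?P<state>Up|Down)"
)

def link_events(text):
    """Return (syslog_time, kernel_seconds, nic, state) for each link message."""
    events = []
    for line in text.splitlines():
        m = LINK_RE.match(line)
        if m:
            events.append((m["when"], float(m["uptime"]), m["nic"], m["state"]))
    return events

def down_intervals(events):
    """Pair each Down with the next Up; return (down_time, up_time, seconds)."""
    intervals = []
    down = None
    for when, uptime, _nic, state in events:
        if state == "Down" and down is None:
            down = (when, uptime)
        elif state == "Up" and down is not None:
            intervals.append((down[0], when, uptime - down[1]))
            down = None
    return intervals

for start, end, seconds in down_intervals(link_events(LOG)):
    print(f"{start} -> {end}: link down for {seconds:.0f} s")
```

By the bracketed kernel timestamps, the brief first drop lasted about 32 s and the main outage about 77724 s, within a few seconds of the wall-clock difference between the syslog timestamps.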

Anyway, ... as for that
D-Link MODEL: DGS-1005D P/N: BDGS1005D.B1 S/N: DR1914C001239 H/W Ver.: B1
it had zero issues prior to this failure. I originally obtained it
and put it into service ... from the log:
2005-10-17
purchased and installed:
D-Link
MODEL: DGS-1005D
P/N: BDGS1005D.B1
S/N: DR1914C001239
H/W Ver.: B1
and including power supply:
D-Link
Model No.: JTA0302C
2R01044803741
from CompUSA
replacing/"retiring":
Hawking Technology
Model NO: PN108ES

> From: "Michael Paoli" <Michael.Paoli@cal.berkeley.edu>
> Subject: [BALUG-Admin] outage: [www.]{sf-lug,balug}.org
> Date: Tue, 20 Nov 2018 19:02:21 -0800

> May be mostly moot by the time this makes it through to the BALUG
> list server and out.  In any case ...
>
> ----- Forwarded message from Michael.Paoli@cal.berkeley.edu -----
>     Date: Tue, 20 Nov 2018 18:59:24 -0800
>     From: "Michael Paoli" <Michael.Paoli@cal.berkeley.edu>
>  Subject: outage: [www.]{sf-lug,balug}.org
>       To: SF-LUG <sf-lug@linuxmafia.com>
>
> And ... outage again :-\
>
> Guesstimating if it's the "usual", will likely be on-line again
> by sometime Wednesday evening (I may or may not get to it before then).
>
> Per earlier:
> impacts all [*.]sf-lug.{org,com} & [*.]balug.org
> SF-LUG lists remain up and on-line (at least as far as I'm aware).
>
> Also, DNS mostly not impacted (in general, slaves remain functional),
> though there may be some additional latencies on DNS due to failovers
> to other nameservers.
>
>> From: "Michael Paoli" <Michael.Paoli@cal.berkeley.edu>
>> Subject: "all better now:" Re: outage: [www.]{sf-lug,balug}.org
>> Date: Wed, 10 Jan 2018 20:15:04 -0800
>
>> And ... again, same deal, went off-line at:
>> <~= 2018-01-11T01:23:21+00:00 2018-01-10T17:23:21-08:00
>> and back on-line by:
>> >~= 2018-01-11T04:04:06+00:00 2018-01-10T20:04:06-08:00
>> ... again, swift kick to the power switch on DSL modem to reset it,
>> and "all better" - again ... at least for now.
>>
>>> From: "Michael Paoli" <Michael.Paoli@cal.berkeley.edu>
>>> Subject: "all better now:" Re: outage: [www.]{sf-lug,balug}.org (ETR: eveningish)
>>> Date: Wed, 06 Dec 2017 18:08:09 -0800
>>
>>> And ... swift kick to the power switch on DSL modem to reset it,
>>> and "all better now".
>>> Looks like the outage started around:
>>> BALUG PING: ping6 2001:470:1f04:19e::2 FAILED 2017-12-06T20:53:56+00:00
>>> GW PING: ping 198.144.194.233 FAILED 2017-12-06T20:54:09+00:00
>>> and service restored a few minutes ago
>>>
>>>
>>>> From: "Michael Paoli" <Michael.Paoli@cal.berkeley.edu>
>>>> Subject: outage: [www.]{sf-lug,balug}.org (ETR: eveningish)
>>>> Date: Wed, 06 Dec 2017 14:46:38 -0800
>>>
>>>> So,
>>>>
>>>> They were up and on-line this morning, at least as late as about
>>>> mid-morning or later, but off-line now.
>>>> POTS line appears to be working, but not IP,
>>>> likely the DSL modem needs to be reset (powercycled) again ...
>>>> but don't have means to do that remotely.
>>>>
>>>> Presumably I'll have this resolved sometime this evening once I'm
>>>> on-site again.
>>>>
>>>> impacts all [*.]sf-lug.{org,com} & [*.]balug.org
>>>> SF-LUG lists remain up and on-line (at least as far as I'm aware).
>
>
> ----- End forwarded message -----
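
The "BALUG PING" and "GW PING" lines quoted in the December 2017 message evidently come from some monitoring check whose implementation isn't shown here. Purely as an illustration, a check that emits lines of that same shape could be sketched as below. Everything in it is an assumption rather than the actual monitor: the function names, the single-probe "-c 1" invocation, and the injectable runner (used here so the demonstration sends no packets).

```python
import subprocess
from datetime import datetime, timezone

def failure_line(label, command, target, when):
    """Format a failure record in the shape quoted above, e.g.
    'GW PING: ping 198.144.194.233 FAILED 2017-12-06T20:54:09+00:00'."""
    stamp = when.astimezone(timezone.utc).isoformat(timespec="seconds")
    return f"{label}: {command} {target} FAILED {stamp}"

def probe(command, target, runner=subprocess.run):
    """Send one echo request; True if the ping command exits 0.
    Assumption: a ping/ping6 that accepts '-c 1' (count of one)."""
    result = runner(
        [command, "-c", "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def check(label, command, target, runner=subprocess.run, now=None):
    """Return a failure line if the probe fails, else None."""
    if probe(command, target, runner):
        return None
    return failure_line(label, command, target, now or datetime.now(timezone.utc))

# Demonstration with a stand-in runner that always reports failure,
# so nothing is actually pinged here.
class _Failed:
    returncode = 1

line = check(
    "GW PING", "ping", "198.144.194.233",
    runner=lambda *args, **kwargs: _Failed(),
    now=datetime(2017, 12, 6, 20, 54, 9, tzinfo=timezone.utc),
)
print(line)
```

Run periodically (from cron, say) with the default runner, something like this would log the first failure time much as in the quoted lines; resetting the DSL modem still requires being on-site.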