Okay, good actually, that "failed" - held for moderator approval because, with the attachment, it was too large. So, now, it's forwarded below without the attachment, and there's a link in the body if one wants to view the attachment.
And here's at least part of the bounce bit that let me know it "failed":
From: balug-test-owner@lists.balug.org
Date: Thu, Aug 22, 2024 at 11:27 PM
Subject: balug-test@lists.balug.org post from michael.paoli@berkeley.edu requires approval
To: balug-test-owner@lists.balug.org
As list administrator, your authorization is requested for the following mailing list posting:
List: balug-test@lists.balug.org
From: michael.paoli@berkeley.edu
Subject: And the crash and the issue and image attachment test - test - ignore
The message is being held because:
The message is larger than the 40 KB maximum size
At your convenience, visit your dashboard to approve or deny the request.
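(Side note, in case anyone's curious how such a per-list size limit might be checked or adjusted on a Mailman 3 host - roughly something like the below. Just a sketch, assuming shell access on the Mailman core host and that the relevant knob is the list's max_message_size attribute (in KB); the user account and exact setup on lists.balug.org may well differ.)

# run as whichever user owns the Mailman 3 installation (assumption: "list")
$ sudo -u list mailman shell -l balug-test@lists.balug.org
>>> m.max_message_size        # current limit, in KB (0 would mean no limit)
40
>>> m.max_message_size = 100  # e.g. raise it to 100 KB
>>> commit()                  # persist the change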
---------- Forwarded message ---------
From: Michael Paoli <michael.paoli@berkeley.edu>
Date: Thu, Aug 22, 2024 at 11:26 PM
Subject: And the crash and the issue and image attachment test - test - ignore
To: BALUG-Test <balug-test@lists.balug.org>
So,
All was working fine until ... there was a crash(/lockup) ... see the "attached" image (if it makes it through to the list?). And ... if you really want to see that image and it's not (or no longer) attached, or not (or no longer) in the archive, I've also, at least temporarily, located it here: https://www.balug.org/tmp.lists/crash.jpg That's from physical host "vicki", upon which the BALUG VM sometimes runs. It was running on there this past Tuesday (US/Pacific) evening, and then, fairly late evening - a crash - relatively rare short of having a power glitch/outage (do have a moderate bit of those, don't have a UPS, and when running on the laptop (with a battery that holds no charge), if I manage to accidentally pull/nudge/wiggle the cord connection out - it drops hard). Anyway, the image - that's a photo I took of the console screen of the physical vicki host (powered on ye olde CRT - relatively rare that I do that), and took the picture with a "smart" phone (and then trimmed excess bits out of the image ... also dropped the quality a bit to reduce the image file size while not losing much, if anything, in readability of the text on the screen).
So, before that, all was working fine. After that, all seems to be working fine except for the web interface to the archives and such (Postorius / HyperKitty).
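(From the outside, the symptom looks roughly like the below - a sketch; the archive URL here is just illustrative, but the affected pages come back as an HTTP 500 rather than the expected content.)

# ask Apache for a HyperKitty archive page; print just the HTTP status code
$ curl -sS -o /dev/null -w '%{http_code}\n' \
>   'https://lists.balug.org/mailman3/hyperkitty/'
500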
And thus far, from all my testing and troubleshooting and isolating, it actually looks like the Mailman 3 parts of it (all, or almost all) are working properly. The mail interface still appears to be working fine. The Django portion of the web interface works fine. But through the Apache server, the Postorius portion (mostly*) fails ... however, when I examined (most notably with strace) the communication between the two, it appears Postorius is responding perfectly fine with good content, yet somehow Apache has issues with, or doesn't get, that content, and ends up giving a 500 error page. *mostly ... I did, the other day, stumble across a bit that still works. List membership roster ... with cookie ... can't log in now, but still having an older valid authentication cookie, I'm able to load up that page fine: https://lists.balug.org/mailman3/postorius/lists/balug-test.lists.balug.org/... ... well, it actually wants to download it, and downloads it fine with the correct data. So, yeah, odd, that bit of Postorius works all the way through Apache and to the client ... haven't, however, found other parts that make it through successfully like that while this issue is otherwise still present. And yes, did also do strace(1) data gathering on that too ... haven't yet isolated why that works and (most of the rest) doesn't - but it may use significantly different interface(s)/component(s) under the covers.
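(For anyone poking at something similar: that sort of strace data gathering is roughly along these lines - a sketch only, not the exact commands used; the worker process names and the PID are hypothetical, and the actual WSGI hookup (uwsgi, gunicorn, mod_wsgi, ...) varies by setup.)

# find the Apache and (assumed) WSGI/Postorius worker processes
$ ps -ef | grep -E 'apache2|uwsgi|gunicorn' | grep -v grep
# attach to a worker (hypothetical PID 12345), follow forks, capture network
# traffic with long string snippets, and save it all to a file for review
$ sudo strace -f -p 12345 -e trace=network -s 4096 -o /tmp/postorius.strace
# then hit one of the failing pages and compare what Postorius sent back
# with what Apache ultimately returned to the client (the 500 page)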
Anyway, that's where things presently stand. Still working to (isolate and) fix the issue.
Meantime, did also do some updates on the main page, notably: https://www.balug.org/#Lists - so folks at least have a clue/information about that (and workarounds, as feasible).
I'm guesstimating there's maybe some lock or state file that didn't get properly cleared or reset, or some subtle corruption or the like - and something along those lines is what's causing the issue. Don't know for sure, but given the circumstances, that seems possible/probable. Also possible (but perhaps not as likely) that there was some subtle latent defect - I'd not rebooted in some moderate while, and the (crash and) reboot exposed an issue that wasn't seen before. But looking over the (re)boot history, that doesn't seem most probable. Let's see, peeking at that again ... (times UTC / GMT0): the not quite latest (Aug 21 06:20) was the crash and subsequent boot, and at least quite a number of the ones immediately before that were regular normal reboots. And the one I did after that was another reboot, just to see if that might happen to clear the issue/error (no luck on that):
$ { who -H | head -n 1; who -r /var/log/wtmp | tac; } | head -n 20
NAME         LINE          TIME          COMMENT
run-level 5  Aug 21 08:57
run-level    Aug 21 08:56
run-level 5  Aug 21 06:20
run-level 5  Aug 19 07:23
run-level    Aug 19 07:22
run-level 5  Jul 30 15:13
run-level    Jul 30 14:50
run-level 5  Jul 28 00:10
run-level 5  Jul 6 14:45
run-level 5  Jul 5 09:02
run-level    Jul 5 09:01
run-level 5  Jul 5 03:31
run-level    Jul 5 03:30
run-level 5  Jun 23 00:03
run-level 5  Jun 18 15:44
run-level 5  Jun 17 08:16
run-level 5  Jun 17 05:53
run-level 5  May 26 22:58
run-level 5  May 26 19:44
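If/when I dig further into the stale lock/state file theory, it'll likely involve something roughly along these lines - just a sketch; the paths are assumptions (they vary by distro/packaging), not necessarily where things actually live on the BALUG VM:

# hunt for leftover lock/pid/state files under the (assumed) Mailman 3 areas
$ sudo find /var/lib/mailman3 /var/run/mailman3 -xdev \
>   \( -name '*.lck' -o -name '*.lock' -o -name '*.pid' \) -ls 2>/dev/null
# anything with a modification time predating the Aug 21 06:20 crash boot
# would be a prime suspect to (carefully) clear, with the services stopped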