Woo hoo, looks like I got the issue fixed! More details in a bit ...
On Thu, Aug 22, 2024 at 11:42 PM Michael Paoli <michael.paoli@berkeley.edu> wrote:
Okay, good actually: that "failed" (held for moderator approval) because, with the attachment, it was too large. So now it's forwarded below without the attachment, and there's a link in the body if one wants to view the attachment.
And here's at least part of the bounce bit that let me know it "failed":
From: balug-test-owner@lists.balug.org
Date: Thu, Aug 22, 2024 at 11:27 PM
Subject: balug-test@lists.balug.org post from michael.paoli@berkeley.edu requires approval
To: balug-test-owner@lists.balug.org
As list administrator, your authorization is requested for the following mailing list posting:
List: balug-test@lists.balug.org
From: michael.paoli@berkeley.edu
Subject: And the crash and the issue and image attachment test - test - ignore
The message is being held because:
The message is larger than the 40 KB maximum size
At your convenience, visit your dashboard to approve or deny the request.
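For anyone wanting to check a file against that limit before posting, here is a minimal sketch. It assumes the limit counts 1 KB as 1024 bytes and it compares only the raw file size; the function name too_large_for_list is made up for illustration. The actual message will be larger than the file itself, since base64 encoding of an attachment adds roughly a third, plus headers and body.

```shell
# Succeed (exit 0) if FILE is larger than LIMIT_KB kilobytes.
# LIMIT_KB defaults to 40, the limit quoted in the notice above.
# Assumption: 1 KB is counted as 1024 bytes.
too_large_for_list() {
    file=$1
    limit_kb=${2:-40}
    # Arithmetic expansion strips any padding some wc versions emit.
    size_bytes=$(( $(wc -c < "$file") ))
    [ "$size_bytes" -gt $(( limit_kb * 1024 )) ]
}

# Example (hypothetical file name):
#   too_large_for_list crash.jpg && echo "likely to be held for approval"
```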
---------- Forwarded message ---------
From: Michael Paoli <michael.paoli@berkeley.edu>
Date: Thu, Aug 22, 2024 at 11:26 PM
Subject: And the crash and the issue and image attachment test - test - ignore
To: BALUG-Test <balug-test@lists.balug.org>
So,
All was working fine until ... there was a crash (/lockup) ... see the "attached" image (if it makes it through to the list?). And if you really want to see that image and it's not (or no longer) attached, or not (or no longer) in the archive, I've also, at least temporarily, placed it here:
https://www.balug.org/tmp.lists/crash.jpg

That's from the physical host "vicki", upon which the BALUG VM sometimes runs. It was running on there this past Tuesday (US/Pacific) evening, and then fairly late in the evening: a crash. That's relatively rare, short of a power glitch/outage (I do have a moderate number of those, don't have a UPS, and when running on the laptop, whose battery holds no charge, if I accidentally pull/nudge/wiggle the cord connection out, it drops hard).

Anyway, the image: that's a photo I took of the console screen of the physical vicki host (powered on ye olde CRT, relatively rare that I do that), taken with a "smart" phone. I then trimmed the excess bits out of the image and dropped the quality a bit to reduce the file size while not losing much, if anything, in readability of the text on the screen.
So, before that, all was working fine. After that all seems to be working fine except for the web interface to archives and such (postorius / hyperkitty).
And thus far, from all my testing, troubleshooting, and isolating, it actually looks like the Mailman 3 parts of it (all, or almost all) are working properly. The mail interface still appears to be working fine. The django portion of the web interface works fine. But on the Apache server, the postorius portion (mostly*) fails ... however, when I gathered data (most notably with strace) on the communication between the two, it appears postorius is responding perfectly fine with good content, yet somehow Apache has issues with, or doesn't get, that content, and ends up giving a 500 error page.

*mostly ... I did, the other day, stumble across a bit that still works: the list membership roster, with cookie. I can't log in now, but still having an older valid authentication cookie, I'm able to load up that page fine:
https://lists.balug.org/mailman3/postorius/lists/balug-test.lists.balug.org/...
... well, it actually wants to download it, and downloads it fine with the correct data. So, yeah, odd: that bit of postorius works all the way through Apache and to the client ... I haven't, however, found other parts that make it through successfully like that while this issue is otherwise still present. And yes, I did also do strace(1) data gathering on that too ... haven't yet isolated why that works and (most of) the rest doesn't, but it may use significantly different interface/component(s) under the covers.
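The kind of strace data gathering mentioned above could look something like the following sketch. It is not necessarily the invocation that was used here: the helper name strace_io_cmd and the default output path are made up, and the PID would be whichever process serves postorius (or an Apache worker). The helper only prints the command, so it can be reviewed before being run with sufficient privileges.

```shell
# Print (do not run) an strace command line for watching the reads and
# writes of an already running process and its children.
#   -f   follow child processes/threads
#   -tt  timestamp each line with microsecond resolution
#   -s   show up to 4096 bytes of each string, to see the actual payloads
#   -e   restrict tracing to the listed syscalls (extend as needed)
#   -p   attach to the given PID
#   -o   write the trace to a file
# Hypothetical helper; the default output path is an assumption.
strace_io_cmd() {
    pid=$1
    out=${2:-/tmp/strace.$pid.out}
    printf 'strace -f -tt -s 4096 -e trace=read,write,sendto,recvfrom -p %s -o %s\n' \
        "$pid" "$out"
}

# Example: strace_io_cmd 1234 /tmp/postorius.trace
```

Comparing the bytes written by the backend with the bytes read on the Apache side, for the same request, is one way to see where a good response stops being good.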
Anyway, that's where things presently stand. Still working to (isolate and) fix the issue.
Meantime, did also do some updates on the main page, notably: https://www.balug.org/#Lists So folks at least have a clue/information about that (and workarounds as feasible).
I'm guesstimating there's maybe some lock or state file that didn't get properly cleared or reset, or some subtle corruption or the like, and something along those lines is what's causing the issue. Don't know for sure, but given the circumstances, that seems possible/probable. Also possible (but perhaps not as likely): there was some subtle latent defect, as I'd not rebooted in some moderate while, and the (crash and) reboot exposed that issue, where it wasn't seen before that. But looking over the (re)boot history, that doesn't seem most probable. Let's see, peeking at that again ... (times are UTC / GMT0): the not quite latest (Aug 21 06:20) was the crash and subsequent boot, and at least all of quite a number immediately before that were regular normal reboots. And the one I did after that was another reboot just to see if that might happen to clear the issue/error (no luck on that).

$ { who -H | head -n 1; who -r /var/log/wtmp | tac; } | head -n 20
NAME LINE TIME COMMENT
run-level 5 Aug 21 08:57
run-level Aug 21 08:56
run-level 5 Aug 21 06:20
run-level 5 Aug 19 07:23
run-level Aug 19 07:22
run-level 5 Jul 30 15:13
run-level Jul 30 14:50
run-level 5 Jul 28 00:10
run-level 5 Jul 6 14:45
run-level 5 Jul 5 09:02
run-level Jul 5 09:01
run-level 5 Jul 5 03:31
run-level Jul 5 03:30
run-level 5 Jun 23 00:03
run-level 5 Jun 18 15:44
run-level 5 Jun 17 08:16
run-level 5 Jun 17 05:53
run-level 5 May 26 22:58
run-level 5 May 26 19:44
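To follow up on the lock/state file guess above, one rough way to look for leftovers is to list files in a lock or state directory that were last modified at or before a given time, such as the crash or the boot that followed it. A minimal sketch: the function name stale_files is made up, and the directory in the usage comment is an assumption (a location some Debian-style installs use), not something confirmed for this host. It relies on the -newermt test as found in GNU find.

```shell
# List regular files under DIR whose modification time is not newer than
# TIMESTAMP, i.e. files that predate that moment and may be leftovers.
# Usage: stale_files DIR 'YYYY-MM-DD HH:MM:SS'
# The timestamp is interpreted in the local time zone.
stale_files() {
    dir=$1
    since=$2
    find "$dir" -type f ! -newermt "$since" -print | sort
}

# Example (directory is an assumption; uptime -s prints the boot time):
#   stale_files /var/lib/mailman3/locks "$(uptime -s)"
```

Anything it lists is only a candidate. A file that predates the boot is not necessarily stale, so it's worth checking whether a running process still holds it before removing anything.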