[posting to more fitting BALUG-Admin]
From: "Rick Moen" rick@linuxmafia.com
To: balug-test@temp.balug.org
Subject: Re: [BALUG-Test] Oooh, ... archive tests ... :-)
Date: Wed, 20 Sep 2017 15:18:59 -0700
Quoting aaronco36 (aaronco36@SDF.ORG):
Am wondering what the TL;DR of all this is, i.e., the summarized outcome and current status of all Michael P's efforts here?
I think he means he successfully used balug-test as a test case (pun semi-intended) for importing mail into the mailing list's mbox and regenerating the archive. The point was to gain confidence that doing likewise to the main mailing lists is a sound plan.
For balug-talk in particular (maybe some others), Michael has in mind to attempt to assemble a more-complete history, adding postings salvaged from other sources such as subscribers' saved mail, and merging that into the mailing list's mbox as a basis for running $MAILMAN_HOME/bin/arch to make a new pipermail archive. Seems worthwhile, but doubtless it'll take some time, and he might have to bodge headers on some of the saved mail to make it work. (You ask subscribers whether they have old postings in mbox format, and in my experience they send you the damnedest random formats, disregarding the 'mbox' qualifier because they didn't understand it and weren't smart enough to ask. I experienced this when sf-lug mailing list subscribers supplied backfill material for that archive. A lot of what I got was not at all what I asked for.)
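For the merge step, something like the following Python sketch might serve as a starting point. It is not what Michael actually uses; it assumes every salvaged source has already been massaged into a readable mbox, treats matching Message-ID headers as duplicates, and orders the result by the Date header (undated messages go last). The function names are hypothetical.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _sort_key(msg):
    """Sort by Date header; undated or unparseable messages sort last."""
    raw = msg.get("Date")
    try:
        dt = parsedate_to_datetime(raw) if raw else None
    except (TypeError, ValueError):
        dt = None
    if dt is None:
        return (1, datetime.max.replace(tzinfo=timezone.utc))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # treat "-0000" / naive dates as UTC
    return (0, dt)


def merge_mboxes(source_paths, dest_path):
    """Merge several mbox files into dest_path (assumed new), dropping duplicate Message-IDs."""
    seen = set()
    merged = []
    for path in source_paths:
        for msg in mailbox.mbox(path):
            mid = (msg.get("Message-ID") or "").strip()
            if mid and mid in seen:
                continue  # same posting already salvaged from another source
            if mid:
                seen.add(mid)
            merged.append(msg)
    merged.sort(key=_sort_key)
    out = mailbox.mbox(dest_path, create=True)
    out.lock()
    try:
        for msg in merged:
            out.add(msg)  # mboxMessage keeps its original "From " envelope line
        out.flush()
    finally:
        out.unlock()
        out.close()
    return len(merged)
```

The merged file would then be what gets handed to bin/arch, after a sanity check.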
Sufficient scripting-fu can, in my experience, reassemble an mbox that $MAILMAN_HOME/bin/arch will parse correctly. Just make sure you keep a safety copy of the most recent mbox _verified_ to parse correctly, so you don't burn your bridges, only your spare time.
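One way to keep that safety copy honest is a cheap pre-flight check before overwriting the last known-good mbox. The Python sketch below is an assumption, not a substitute for actually running bin/arch: it flags any message lacking From/Date/Subject headers, which is the usual symptom of an unescaped "From " line in a body splitting one posting into two, and only then copies the candidate over the known-good file. Function names are hypothetical.

```python
import mailbox
import shutil

# Headers every real posting should have; a fragment split off by an
# unescaped body "From " line typically has none of them.
REQUIRED = ("From", "Date", "Subject")


def check_mbox(path):
    """Return (message_count, problems); problems lists (index, missing_headers)."""
    problems = []
    count = 0
    for idx, msg in enumerate(mailbox.mbox(path, create=False)):
        count += 1
        missing = [h for h in REQUIRED if msg.get(h) is None]
        if missing:
            problems.append((idx, missing))
    return count, problems


def promote_if_clean(candidate, known_good):
    """Copy candidate over known_good only if it is non-empty and every message looks whole."""
    count, problems = check_mbox(candidate)
    if problems or count == 0:
        return False  # leave the last verified copy untouched
    shutil.copy2(candidate, known_good)
    return True
```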
I believe backfilling the archives is the only task remaining. (Dreamhost "oops"-discarded quite a lot, over the years.)
So, yes, DreamHost.com lost a lot of BALUG's list archives over the years. :-( Fortunately I *do* have much of that ... but alas, most of what I have (I started saving regularly, most notably, after DreamHost started messing up, and quite repeatedly) is in "cooked" format, as mailman serves it up on the web, and alas, too, with @ --> " at " munging.
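Undoing the " at " munging can be partly automated, at least for headers. The Python sketch below is a heuristic under stated assumptions: it assumes the cooked text keeps mbox-style "From " envelope lines, only touches address-bearing header lines (From, To, Cc, Reply-To) in each header block, and only rewrites "user at host.domain" where the host part contains a dot. Body text is left alone, since "at" there is usually just English. The pattern and function name are hypothetical.

```python
import re

# Hypothetical heuristic: pipermail's cooked output shows user@host as "user at host".
_MUNGED = re.compile(r"([A-Za-z0-9._%+-]+) at ([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")
_HEADER_PREFIXES = ("From ", "From:", "To:", "Cc:", "Reply-To:")


def unmunge_headers(text):
    """Restore '@' in address-bearing header lines; body text is left unchanged."""
    out = []
    in_headers = False
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            in_headers = True   # envelope line opens a new message's header block
        elif in_headers and line.strip() == "":
            in_headers = False  # blank line ends the header block
        if in_headers and line.startswith(_HEADER_PREFIXES):
            line = _MUNGED.sub(r"\1@\2", line)
        out.append(line)
    return "".join(out)
```

Anything the heuristic misses (odd display names, continuation lines) would still need a human eye.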
So, ... let's see ...

$ hostname && pwd -P && id
balug-sf-lug-v2.balug.org
/home/balug/e-mail_lists
uid=10582(balug) gid=10582(balug) groups=10582(balug)
$ grep . */archive_date_ranges
balug-admin/archive_date_ranges:2005-03-18--2013-05-24,2014-01-11--2015-01-31,2015-05-01--2015-11-30,2016-02-18--
balug-announce/archive_date_ranges:2001-06-15--2013-07-12,2013-11-18--2014-09-30,2014-11-14--2015-01-31,2015-04-20--2015-11-30--2015-12-15,2016-02-15--
balug-talk/archive_date_ranges:2001-06-15--2013-07-13,2013-11-09--2014-10-19,2014-10-22--2015-01-31,2015-04-06--2015-12-05,2016-01-23--
$

So, ... see the (--) ranges? Those are the "cooked" ranges I have. See the ,'s between the ranges? Those are DreamHost.com's repeated f*ck ups[1].
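Those archive_date_ranges lines are easy to turn into an explicit list of gaps to hunt for. A minimal Python sketch, assuming the format shown above: comma-separated START--END ranges, an empty END meaning "still ongoing", and (as an assumption about the odd 2015-04-20--2015-11-30--2015-12-15 entry) a chunk with more than two dates spanning its first to its last date. Names are hypothetical.

```python
from datetime import date


def parse_ranges(line):
    """Parse 'START--END,START--END,START--' into (start, end) pairs; end None = open."""
    ranges = []
    for chunk in line.strip().split(","):
        parts = chunk.split("--")
        start = date.fromisoformat(parts[0])
        end = date.fromisoformat(parts[-1]) if parts[-1] else None
        ranges.append((start, end))
    return ranges


def gaps(ranges):
    """Return (last_date_held, next_date_held) pairs bracketing each missing stretch."""
    result = []
    for (_, end), (next_start, _) in zip(ranges, ranges[1:]):
        if end is not None and next_start > end:
            result.append((end, next_start))
    return result
```

For balug-admin, that yields three gaps: 2013-05-24 to 2014-01-11, 2015-01-31 to 2015-05-01, and 2015-11-30 to 2016-02-18.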
$ du -sh balug-*/archive/
832K    balug-admin/archive/
664K    balug-announce/archive/
3.3M    balug-talk/archive/
$

And that's almost entirely gzip-compressed stuff ... so quite a bit 'o material ... but also with some significant gaps.
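Getting that gzip-compressed material into one file in chronological order is a natural first step before any cleanup. The Python sketch below assumes the files follow pipermail's per-month naming, e.g. 2015-November.txt.gz; if the saved copies are named differently, the sort key would need adjusting. The English month list is hardcoded so the result doesn't depend on locale. Names are hypothetical.

```python
import gzip
from pathlib import Path

# Assumed pipermail naming: YYYY-Month.txt.gz with English month names.
_MONTHS = {m: i for i, m in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}


def _month_key(path):
    """Sort key for names like '2015-November.txt.gz'."""
    year, month = path.name.split(".", 1)[0].split("-", 1)
    return (int(year), _MONTHS[month])


def concat_monthly_archives(archive_dir, dest):
    """Decompress every YYYY-Month.txt.gz in archive_dir into dest, oldest first."""
    files = sorted(Path(archive_dir).glob("*.txt.gz"), key=_month_key)
    with open(dest, "wb") as out:
        for f in files:
            with gzip.open(f, "rb") as src:
                data = src.read()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\n")  # keep the next month's "From " line at column 0
    return [f.name for f in files]
```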
Anyway, it's not top of the priorities, but it's now much more feasible to merge that back into the archives under mailman ... without the encumbrances of DreamHost being in the way.
I also have some partial archive material from pre-DreamHost days. But again, that's in "cooked" format (sucked from archive.org), and has some gaps too (notably between when archive.org last grabbed it and when those lists went bye-bye and were replaced with new lists started from scratch under DreamHost.com hosting).
Anyway, (more) stuff to do (archivist) among many BALUG tasks on the quite long todo list ... for the volunteer(s) to do in their copious <cough, cough> free time.
Oh, ... it's been a while since he mentioned it, but I have heard from Jim Stockford that he has "all his old mail" (notably including earlier list traffic, etc.) at least somewhere on some drive(s). If I ever get access to that, or can have Jim run relevant programs against it, we might well be able to get all or most of our list messages, and might also be able to unmunge messages for which we have, thus far, only munged versions.
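If someone like Jim is willing to run a program against his own mail rather than hand the whole thing over, it could be as small as the Python sketch below. It assumes his mail is (or can be exported to) an mbox file, and uses a hypothetical match rule: keep any message whose List-Id, To, or Cc header mentions balug-talk, balug-announce, balug-admin, or balug-test. Real-world mail may need a broader rule, e.g. for very old list addresses.

```python
import mailbox
import re

# Hypothetical match rule: a List-Id, To, or Cc header mentioning one of the
# balug-* list names seen in this thread. Real mail may need a broader rule.
_BALUG = re.compile(r"balug-(talk|announce|admin|test)", re.IGNORECASE)


def extract_balug(src_path, dest_path):
    """Copy messages that look like BALUG list traffic into dest_path; return the count."""
    src = mailbox.mbox(src_path, create=False)
    dest = mailbox.mbox(dest_path)
    copied = 0
    dest.lock()
    try:
        for msg in src:
            headers = " ".join(str(msg.get(h, "")) for h in ("List-Id", "To", "Cc"))
            if _BALUG.search(headers):
                dest.add(msg)
                copied += 1
        dest.flush()
    finally:
        dest.unlock()
        dest.close()
        src.close()
    return copied
```

The output mbox is then the only thing that needs to come back, and since it is raw mail, it would carry unmunged addresses.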
We can also put out more calls/mentions/requests for email archivists/hoarders ... if we can get the data (either to pull from directly, or by handing over programs that extract the BALUG stuff and getting the results back), then we could better fill in and complete our archives. In the meantime, ... gaps. :-/
references/footnotes:
1. More references on DreamHost's repeated f*ck ups. Not a complete list, but ... the support reference identifiers, at least from when I started explicitly tracking them until I totally gave up on that as being quite futile:
$ cat balug/dreamhost_f*ckups
#7395573 list archives broken again, please fix ASAP
DreamHost Support Ticket #6693834 (backup request)
DreamHost Support Ticket #6693826
DreamHost Support Ticket #6614786
DreamHost Support Ticket #6532542
DreamHost Support Ticket #5972304
DreamHost Support Ticket #5933596
DreamHost Support Ticket #5865872
DreamHost Support Ticket #5855851
[micpao 98044886]
[micpao 97906756]
[micpao 95718194]
[micpao 95607623]
[micpao 95543944]
[micpao 95510775]
[micpao 82295833]
[micpao 80000975]
[micpao 79685471]
[micpao 78726933]
[micpao 77934213]
[micpao 77127808]
[micpao 77039262]
[micpao 75952335]
[micpao 75625344]
[micpao 75481168]
[micpao 75473081]