[BALUG-Admin] BALUG archive merges/resurrections? ...

Michael Paoli Michael.Paoli@cal.berkeley.edu
Wed Sep 20 18:45:38 PDT 2017

[posting to more fitting BALUG-Admin]

> From: "Rick Moen" <rick@linuxmafia.com>
> To: balug-test@temp.balug.org
> Subject: Re: [BALUG-Test] Oooh, ... archive tests ... :-)
> Date: Wed, 20 Sep 2017 15:18:59 -0700

> Quoting aaronco36 (aaronco36@SDF.ORG):
>> Am wondering what the TL;DR of all this is, i.e., the summarized
>> outcome and current status of all Michael P's efforts here?
> I think he means he successfully used balug-test as a test case (pun
> semi-intended) for import of mail to the mailing list's mbox and
> generation of the archive.  This was to gain confidence that doing
> likewise to the main mailing lists has a good plan.
> For balug-talk in particular (maybe some others), Michael has in mind to
> attempt to assemble a more-complete history, adding postings salvaged
> from other sources such as subscribers' saved mail, and merging that
> into the mailing list's mbox as a basis for running
> $MAILMAN_HOME/bin/arch to make a new pipermail archive.  Seems
> worthwhile, but doubtless it'll take some time, and he might have to
> bodge headers on some of the saved mail to make it work.  (You ask
> subscribers whether they have old postings in mbox format, and in my
> experience they send you the damnedest random formats, disregarding the
> qualifier about 'mbox' because they didn't understand it and weren't
> smart enough to ask.  I experienced this when sf-lug mailing list
> subscribers supplied backfill material for that archive.  A lot of what
> I got was not at all what I asked.)
> Sufficient scripting-fu can, in my experience, reassemble an mbox that
> $MAILMAN_HOME/bin/arch will parse correctly.  Just make sure that you
> keep a safety copy of the most recent mbox _verified_ to parse right,
> so you don't burn your bridges, only your spare time.
> I believe backfilling the archives is the only task remaining.
> (Dreamhost "oops"-discarded quite a lot, over the years.)

So, yes, DreamHost.com lost a lot of BALUG's list archives over the years.
Fortunately I *do* have much of that ... but alas, most all of what
I have ... most notably, I started saving regularly after DreamHost started
messing up - and quite repeatedly ... is in "cooked" format - as
mailman serves it up on the web, and alas, too, with @ --> " at " munging.
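
A minimal sketch of undoing that munging, assuming the "cooked" text
looks like pipermail's downloadable text (a "From user at host ..."
separator line and a "From: user at host (Name)" header line).  The
function name unmunge is made up for illustration.  It only touches
lines that start with "From " or "From: " and only when the host part
contains a dot, since " at " elsewhere is usually ordinary prose:

```shell
# unmunge: hypothetical helper; reads cooked archive text on stdin and
# writes the result to stdout.
# Rewrites "user at host.domain" to "user@host.domain", but only on
# lines starting with "From " or "From: ".  All other lines pass
# through unchanged.
unmunge() {
    sed -E 's/^(From:? [^ ]+) at ([^ ]+\.[^ ]+)/\1@\2/'
}
```

Usage would be something like: gzip -dc SOMEFILE.txt.gz | unmunge > SOMEFILE.mbox
(file names here are placeholders).  A body line that happens to begin
with "From word at some.thing" would also get rewritten, so the output
still wants a look-over before it goes anywhere near the real mbox.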

So, ... let's see ...
$ hostname && pwd -P && id
uid=10582(balug) gid=10582(balug) groups=10582(balug)
$ grep . */archive_date_ranges
So, ... see the (--) ranges?  Those are the "cooked" ranges I have.
See the ,'s between the ranges?  Those are DreamHost.com's repeated
f*ck ups[1].

$ du -sh balug-*/archive/
832K    balug-admin/archive/
664K    balug-announce/archive/
3.3M    balug-talk/archive/
And that's almost entirely gzip compressed stuff ... so quite a bit o'
material ... but also with some significant gaps.

Anyway, not top of the priorities, but now also much more feasible
to merge that back into archives under mailman ... without the
encumbrances of DreamHost being in the way.
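
Roughly, and following Rick's advice above about keeping a safety copy,
the merge step might be sketched like this.  This is not Mailman's own
tooling; count_msgs and merge_into_mbox are hypothetical helper names,
and it assumes the recovered material is already in mbox format.  It
does not sort by date or weed out duplicates; it only checks that no
"From " separators got lost in the append:

```shell
# count_msgs: count mbox message separators ("From " at start of line).
count_msgs() {
    grep -c '^From ' "$1"
}

# merge_into_mbox: hypothetical helper.
#   $1 = existing list mbox
#   $2 = file of recovered messages, already in mbox format
# Keeps a safety copy of the original as "$1.safe", appends the
# recovered messages, then succeeds only if the separator count of the
# result equals the sum of the two inputs.
merge_into_mbox() {
    mbox=$1
    extra=$2
    cp -p "$mbox" "$mbox.safe" || return 1
    before=$(count_msgs "$mbox")
    added=$(count_msgs "$extra")
    cat "$extra" >> "$mbox" || return 1
    after=$(count_msgs "$mbox")
    [ "$after" -eq $((before + added)) ]
}
```

If the check fails (say, the old mbox did not end with a newline, so
the first appended "From " line got glued onto the previous message),
the .safe copy is still there to go back to before running
$MAILMAN_HOME/bin/arch on anything.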

I do also have some partial archive stuff from pre-DreamHost too.
But again, that's in "cooked" format (sucked from archive.org), and has
some gaps too (notably from when archive.org last grabbed that, and
when those lists went bye-bye and got replaced with new lists started
from scratch under DreamHost.com hosting).

Anyway, (more) stuff to do (archivist) among many BALUG tasks on the
quite long todo list ... for the volunteer(s) to do in their
copious <cough, cough> free time.

Oh, ... been a while since he mentioned it, but I have heard from
Jim Stockford that he has "all his old mail" - notably from earlier
list traffic, etc. ... at least somewhere on some drive(s).  If I
ever get access to that - or can have Jim run relevant programs against
such ... might well be able to get all or most all of our list messages,
and may also be able to much better unmunge messages where we may only
have munged versions at least thus far.

Can also put out more calls/mentions/requests for email
archivists/hoarders ... if we can get the data (to pull from, or
hand over programs to extract the BALUG stuff, and then get that),
then we could better fill in and complete our archives.  In the meantime,
... gaps.  :-/


1. More references on DreamHost's repeated f*ck ups, not a complete list,
but ... the support reference identifiers ... at least from when I started
explicitly tracking them ... until I totally gave up on that as being
quite futile:
$ cat balug/dreamhost_f\*ckups
#7395573        list archives broken again, please fix ASAP
DreamHost Support Ticket #6693834 (backup request)
DreamHost Support Ticket #6693826
DreamHost Support Ticket #6614786
DreamHost Support Ticket #6532542
DreamHost Support Ticket #5972304
DreamHost Support Ticket #5933596
DreamHost Support Ticket #5865872
DreamHost Support Ticket #5855851
[micpao 98044886]
[micpao 97906756]
[micpao 95718194]
[micpao 95607623]
[micpao 95543944]
[micpao 95510775]
[micpao 82295833]
[micpao 80000975]
[micpao 79685471]
[micpao 78726933]
[micpao 77934213]
[micpao 77127808]
[micpao 77039262]
[micpao 75952335]
[micpao 75625344]
[micpao 75481168]
[micpao 75473081]
