[Balug-Admin] BALUG history/archives

Michael Paoli Michael.Paoli@cal.berkeley.edu
Thu Nov 15 06:16:42 PST 2007


Quoting Rick Moen <rick@linuxmafia.com>:

> Quoting Michael Paoli (Michael.Paoli@cal.berkeley.edu):
>
> > I did manage to retrieve a fair chunk of older BALUG history/archives.
> >
> > Thanks to The Internet Archive, I got what would seem to be much
> > of the older mail list materials.  Covering from approximately
> > 1997-01-27 through approximately 2001-02-11.  We have the list materials
> > from 2001-06-15 going forward.
>
> Impressive.

Thanks :-)  Was mostly a matter of:
* Finding/saving/noting the old URLs
* Catching the material on The Internet Archive and/or mirror(s) thereof
  when it was available
* Some available time to actually get to it (among many other BALUG todo
  tasks)
* and some wee bits of scripting to retrieve the stuff (e.g. over 4,000
  individual messages - and web pages from the old balug-talk archive)
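
The "wee bits of scripting" in the last item might look something like the Python sketch below. It assumes the Wayback Machine's `https://web.archive.org/web/<timestamp>/<original-URL>` addressing, where a shortened timestamp such as `2001` resolves to a nearby capture. The `lists.example.org` base URL and the six-digit `NNNNNN.html` message naming are hypothetical stand-ins, not the actual old BALUG archive URLs.

```python
import time
import urllib.error
import urllib.request

WAYBACK = "https://web.archive.org/web"  # Wayback Machine URL prefix


def wayback_url(original_url: str, timestamp: str) -> str:
    """Wayback URL for original_url near timestamp (YYYYMMDDhhmmss or a prefix of it)."""
    return f"{WAYBACK}/{timestamp}/{original_url}"


def message_urls(base: str, count: int) -> list:
    """Hypothetical per-message pages: base000000.html, base000001.html, ..."""
    return [f"{base}{n:06d}.html" for n in range(count)]


def fetch_all(urls, timestamp: str, delay: float = 2.0) -> dict:
    """Fetch each page from the Internet Archive, pausing between requests.

    Returns {original_url: page bytes, or None if no copy could be fetched}."""
    pages = {}
    for url in urls:
        try:
            with urllib.request.urlopen(wayback_url(url, timestamp)) as resp:
                pages[url] = resp.read()
        except urllib.error.HTTPError:
            pages[url] = None  # not captured (or not retrievable) at that time
        time.sleep(delay)  # be gentle with the Internet Archive
    return pages


if __name__ == "__main__":
    base = "http://lists.example.org/balug-talk/"  # hypothetical old archive location
    for u in message_urls(base, 3):
        print(wayback_url(u, "2001"))
```

Saving the fetched bytes keyed by original URL keeps any later link rewriting simple; the two-second `delay` is an arbitrary choice.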

Interestingly enough, one could say CABAL led me to much of it.  It was
an old broken link to a much earlier BALUG list item about how the CABAL
got started that was the first key to tracking down the old URLs
for the older mailing list archive locations ... and that also led me
to all the various older lists we once had, and their archive locations.

> If you have (or can get) the mbox files, it's possible to rebuild the
> current mailman archives to present the mailing list as a seamless
> whole.  The offsetting disadvantage (minor) would be that the URLs of
> currently archived postings would change.

It's likely still feasible to extract/retrieve mbox format files for the
more recent archive stuff, but that's probably infeasible for the much
older stuff ... though it may be feasible to effectively cobble together
a pretty good fakery of mbox format, from list archive contents ... if
we had any particular need/reason to do that.  The older list archives
don't have e-mail address munging enabled ... so ... the at sign (@) is
unmunged.  Where such munging is in place, it's not so feasible to
reconstruct "originals" or good approximations, as:
s/@/ at /g;s/ at /@/g
isn't an identity transformation (any " at " already in the original
text, as in "meet at noon", also comes back as "@").
And if you're looking at a munged version of that line, try
this:
$ echo 'cy9ALyBhdCAvZztzLyBhdCAvQC9nCg==' | mimencode -u
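
To see why the round trip fails, here is a small Python sketch. It decodes the same base64 string as the `mimencode -u` command (the standard `base64` module stands in for `mimencode`), then applies the two substitutions in order, with `str.replace` playing the part of `s///g`. The sample text is made up for illustration.

```python
import base64

# Same payload as the mimencode -u example; decodes to the substitution expression.
ENCODED = "cy9ALyBhdCAvZztzLyBhdCAvQC9nCg=="
sed_expr = base64.b64decode(ENCODED).decode("ascii").rstrip("\n")


def munge(text: str) -> str:
    """First half of the expression, s/@/ at /g."""
    return text.replace("@", " at ")


def unmunge(text: str) -> str:
    """Second half of the expression, s/ at /@/g."""
    return text.replace(" at ", "@")


if __name__ == "__main__":
    original = "Meet at noon, or mail user@example.org"
    print(sed_expr)                              # s/@/ at /g;s/ at /@/g
    print(unmunge(munge(original)))              # Meet@noon, or mail user@example.org
    print(unmunge(munge(original)) == original)  # False
```

The address survives the round trip, but the innocent " at " in "Meet at noon" does not, which is why munged archives can't be reliably un-munged.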

Also, we couldn't stitch all the lists together seamlessly.  Not only
is there a bit of a gap, and a difference in the list software and
archive format used at the time, but the number of lists and their
names also changed.  About the only two that are relatively contiguous
in name and purpose are the balug-talk and balug-announce lists.  In
the range of approximately 1997-01-27 through approximately 2001-02-11
we had these lists:
* advocacy-talk
* balug-announce
* balug-talk
* rc5-crack
* webdevelopers
whereas we presently have:
* balug-admin
* balug-announce
* balug-talk
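
If a "fakery" of mbox format is ever wanted, say one file per old list, a Python sketch using the standard `mailbox` module could look like this. It assumes the sender, date, subject and body have already been scraped out of each archived page into a dict; those key names are hypothetical. Mailman 2.x's `bin/arch` script can rebuild a list's pipermail archive from an mbox file, though the old and new lists still wouldn't line up seamlessly.

```python
import mailbox
from email.message import EmailMessage
from email.utils import parseaddr, parsedate_to_datetime


def write_fake_mbox(path: str, messages) -> int:
    """Append reconstructed messages to an mbox file; returns how many were written.

    Each item in messages is a dict with hypothetical keys 'from', 'date',
    'subject' and 'body', as scraped from an old list archive page."""
    box = mailbox.mbox(path)  # created if it doesn't exist yet
    box.lock()
    written = 0
    try:
        for m in messages:
            msg = EmailMessage()
            msg["From"] = m["from"]
            msg["Date"] = m["date"]
            msg["Subject"] = m["subject"]
            msg.set_content(m["body"])
            mbox_msg = mailbox.mboxMessage(msg)
            # Build the "From " separator line from the sender and the original Date.
            sender = parseaddr(m["from"])[1] or "MAILER-DAEMON"
            mbox_msg.set_from(sender, parsedate_to_datetime(m["date"]).timetuple())
            box.add(mbox_msg)
            written += 1
        box.flush()
    finally:
        box.unlock()
        box.close()
    return written
```

Deriving the `From ` separator line from the original sender and `Date:` makes the reconstructed file look more like a real list mbox than the `MAILER-DAEMON` default would.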

> I can describe how to go about that, if interested.  Might be best by

I'll certainly keep it in mind.

> If your regex-fu is better than mine, you might be able to spot those

My regex-fu (and perl) is pretty good.  I was thinking that if I can find
someone who saved *all* their e-mail covering the "gap" period(s), I
could do up a perl script that would pull out the BALUG list e-mails.
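
One way to pull the list traffic out of someone's saved mail, sketched in Python rather than as the Perl script described. It assumes the saved mail is a single mbox file and treats a message as a BALUG post if its `List-Id:` mentions "balug" or its `To:`/`Cc:` includes one of the list addresses; the `example.org` addresses are placeholders, since the actual list addresses during the gap period aren't given here.

```python
import mailbox
from email.utils import getaddresses

# Hypothetical list addresses; substitute whatever the lists actually used.
BALUG_ADDRESSES = {"balug-talk@example.org", "balug-announce@example.org"}


def is_balug_post(msg) -> bool:
    """Loose test: List-Id mentions balug, or a list address is in To/Cc."""
    if "balug" in str(msg.get("List-Id", "")).lower():
        return True
    fields = [str(v) for v in msg.get_all("To", []) + msg.get_all("Cc", [])]
    return any(addr.lower() in BALUG_ADDRESSES for _, addr in getaddresses(fields))


def extract_balug(src_path: str, dst_path: str) -> int:
    """Copy matching messages from src_path into dst_path; returns the count."""
    src = mailbox.mbox(src_path)
    dst = mailbox.mbox(dst_path)
    dst.lock()
    count = 0
    try:
        for msg in src:
            if is_balug_post(msg):
                dst.add(msg)  # keeps the original "From " line and headers
                count += 1
        dst.flush()
    finally:
        dst.unlock()
        dst.close()
        src.close()
    return count
```

Matching "balug" anywhere in `List-Id:` is deliberately loose; a stricter version would compare against the exact list IDs once those are known.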

I'll probably also get to exercise some of my perl and/or regex-fu on
the stuff extracted from the Internet Archive and/or its mirrors.
I'll keep copies of them as I retrieved them ... but those contain some
JavaScript code to adjust references, e.g. so that links go not to
the present-day Internet, but to the Internet Archive materials from
at/around the same time ... as often the original Web URLs point to
stuff that's no longer present, or is quite substantially different
from the earlier content.  Anyway, I'll want to tweak those to go to our
retrieved copies, rather than bang on the Internet Archive again.  The
retrieved copies also include comments added by the Internet Archive
noting when they were archived and when they were retrieved ... I'd
likely want to retain those comments.
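
For the link tweaking, a regular-expression sketch in Python: it rewrites Wayback-style URLs (absolute `http://web.archive.org/web/<timestamp>/<original-URL>` or site-relative `/web/...`) to point at local copies, and passes HTML comments, such as the archive's capture/retrieval notes, through verbatim. The `/archive/` layout in `local_path` is hypothetical, and removing the injected JavaScript itself is not attempted here.

```python
import re

# Wayback-rewritten link: optional host, /web/<timestamp><optional modifier>/<original URL>
WAYBACK_LINK = re.compile(
    r"(?:https?://web\.archive\.org)?/web/\d{1,14}[a-z_]*/(https?://[^\"'\s<>]+)"
)
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def local_path(original_url: str) -> str:
    """Hypothetical layout: http://host/path -> /archive/host/path"""
    return "/archive/" + re.sub(r"^https?://", "", original_url)


def _rewrite(fragment: str) -> str:
    return WAYBACK_LINK.sub(lambda m: local_path(m.group(1)), fragment)


def rewrite_links(html: str) -> str:
    """Rewrite Wayback links outside HTML comments; keep comments untouched."""
    out, pos = [], 0
    for comment in HTML_COMMENT.finditer(html):
        out.append(_rewrite(html[pos:comment.start()]))
        out.append(comment.group(0))  # e.g. the archive's capture/retrieval notes
        pos = comment.end()
    out.append(_rewrite(html[pos:]))
    return "".join(out)
```

Splitting around comments first means a Wayback URL quoted inside the archive's own notes stays exactly as it was captured.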

> > Anyway, I/we will likely make these older materials available (at least
> > what we've got or been able to retrieve thus far) in the
> > not-too-horribly-distant future.
>
> I very much respect and admire this effort:  It's all too common for the
> new Web guys to just discard the entire group's history up to that
> point -- and a significant batch of work to later correct that error.

Thanks.  I think having the history there (or at least getting it back)
makes a site/organization/(L)UG more credible, and provides a much
better sense of history and origins.  I also tend to think nasty rough
transitions - particularly those that break existing functionality, or
require end-users to take actions or repeat actions they've taken
before (e.g. resubscribe to lists and set all their preferences again)
generally turn people off and away ... so hopefully we can keep any
nasty rough changes to the minimum feasible.  Since BALUG has had some
rough transitions here and there in the past, I suspect that history
also reduces folks' tolerance for additional rough transitions.

I think also, at this point, I've retrieved all the older content from
the Internet Archive that does not exist at all and cannot possibly be
retrieved/recovered from our existing web site.  There's still a fair
bunch of stuff to be extracted/retrieved from our existing web site ...
once some of the broken bits on it get sufficiently fixed, ... and also
some work to be done when we're ready to move the existing mailman stuff
to a new host.
