Quoting Rick Moen rick@linuxmafia.com:
Quoting Michael Paoli (Michael.Paoli@cal.berkeley.edu):
I did manage to retrieve a fair chunk of older BALUG history/archives.
Thanks to The Internet Archive, I got what would seem to be much of the older mail list materials, covering approximately 1997-01-27 through approximately 2001-02-11. We have the list materials from 2001-06-15 going forward.
Impressive.
Thanks :-) Was mostly a matter of:
* Finding/saving/noting the old URLs
* Catching the material on The Internet Archive and/or mirror(s) thereof when it was available
* Some available time to actually get to it (among many other BALUG todo tasks)
* and some wee bits of scripting to retrieve the stuff (e.g. over 4,000 individual messages - and web pages from the old balug-talk archive)
Interestingly enough, one could say CABAL led me to much of it. It was an old broken link to a much earlier BALUG list item about how the CABAL got started that was the first key into my tracking down the old URLs for the older mailing list archive locations ... and that also led me to all the various older lists we once had, and their archive locations.
If you have (or can get) the mbox files, it's possible to rebuild the current mailman archives to present the mailing list as a seamless whole. The offsetting disadvantage (minor) would be that the URLs of currently archived postings would change.
It's likely still feasible to extract/retrieve mbox format files for the more recent archive stuff, but that's probably infeasible for the much older stuff ... though it may be feasible to effectively cobble together a pretty good fakery of mbox format, from list archive contents ... if we had any particular need/reason to do that.

The older list archives don't have e-mail address munging enabled ... so ... the at sign (@) is unmunged. Where such munging is in place, it's not so feasible to reconstruct "originals" or good approximations, as:
    s/@/ at /g;s/ at /@/g
isn't an identity transformation. And if you're looking at a munged version of that line, try this:
    $ echo 'cy9ALyBhdCAvZztzLyBhdCAvQC9nCg==' | mimencode -u

Also, we couldn't stitch all the lists together seamlessly. Not only is there a bit of a gap and difference in list software and archive format that was used at the time, but also, the number of lists and their names changed. About the only two that are relatively contiguous in name and purpose are the balug-talk and balug-announce lists.

In the range of approximately 1997-01-27 through approximately 2001-02-11 we had these lists:
    advocacy-talk
    balug-announce
    balug-talk
    rc5-crack
    webdevelopers
whereas we presently have:
    balug-admin
    balug-announce
    balug-talk
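To make the munging point concrete, here's a minimal Python sketch of those same two substitutions. The function names and sample strings are made up for illustration; the only thing taken from the above is the pair of s/// expressions and the base64 string. It shows that the round trip happens to work for a bare address, but not for text that already contained " at ".

```python
import base64


def munge(text: str) -> str:
    """Python equivalent of s/@/ at /g (what the archive munging does)."""
    return text.replace("@", " at ")


def unmunge(text: str) -> str:
    """Python equivalent of s/ at /@/g (the naive attempt to undo it)."""
    return text.replace(" at ", "@")


def round_trip(text: str) -> str:
    """Munge, then try to undo the munging."""
    return unmunge(munge(text))


# Illustrative inputs only, not taken from any real archive.
plain_address = "someone@example.org"
prose = "Meet at noon, or mail someone@example.org"

# A bare address survives the round trip ...
survives = round_trip(plain_address) == plain_address

# ... but any pre-existing " at " gets turned into "@" as well,
# so the original cannot be told apart from the munged form.
damaged = round_trip(prose)

# The base64 string above decodes to the unmunged sed expressions.
decoded = base64.b64decode("cy9ALyBhdCAvZztzLyBhdCAvQC9nCg==").decode("ascii")

if __name__ == "__main__":
    print(survives)
    print(damaged)
    print(decoded, end="")
```

So any reconstruction from munged archives would be a best guess: there's no way to know which "@" came from an original " at ".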
I can describe how to go about that, if interested. Might be best by
I'll certainly keep it in mind.
If your regex-fu is better than mine, you might be able to spot those
My regex-fu (and perl) is pretty good. I was thinking if I can find someone who saved *all* their e-mail, covering the "gap" period(s), I could do up a perl script that would pull out the BALUG list e-mails.
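For what it's worth, the same idea can be roughed out with Python's standard mailbox module instead of perl. This is only a sketch under assumptions: the list addresses below are placeholders (the real historical posting addresses would need to be checked), and matching on To/Cc/Sender is my guess at what would be reliable for mail of that era, since older list software may not have added a List-Id header.

```python
import mailbox
from email.utils import getaddresses

# Placeholder addresses (assumption): substitute the lists' real
# historical posting addresses before using this on actual mail.
LIST_ADDRESSES = {"balug-talk@example.org", "balug-announce@example.org"}


def is_list_message(msg, list_addresses=LIST_ADDRESSES):
    """True if any To/Cc/Sender address matches one of the list addresses."""
    values = []
    for name in ("To", "Cc", "Sender"):
        values.extend(str(v) for v in msg.get_all(name, []))
    found = {addr.lower() for _, addr in getaddresses(values)}
    wanted = {addr.lower() for addr in list_addresses}
    return bool(found & wanted)


def extract_list_messages(source_path, dest_path, list_addresses=LIST_ADDRESSES):
    """Copy matching messages from one mbox file to another; return the count."""
    source = mailbox.mbox(source_path, create=False)
    dest = mailbox.mbox(dest_path)
    kept = 0
    try:
        for msg in source:
            if is_list_message(msg, list_addresses):
                dest.add(msg)
                kept += 1
        dest.flush()
    finally:
        dest.close()
        source.close()
    return kept
```

Usage would be along the lines of extract_list_messages("saved-mail.mbox", "balug-gap.mbox"), where both file names are hypothetical. Duplicates (e.g. the same posting saved by several people) aren't handled here; de-duplicating on Message-ID would be a sensible next step.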
I'll probably also get to exercise some of my perl and/or regex-fu on the stuff extracted from the Internet Archive and/or its mirrors. I'll keep copies as I retrieved them, ... but those contain some JavaScript code to adjust references ... e.g. so that links go not to the present day Internet, but go to the Internet Archive materials - generally at/around the same time ... as often the original Internet Web URLs point to stuff that's no longer present, or is quite substantially different from the earlier content. Anyway, ... will want to tweak those to go to our retrieved copies, rather than bang on the Internet Archive again. The retrieved copies also include comments added by the Internet Archive noting when they were archived and when they were retrieved, ... I'd likely want to retain those comments.
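A rough sketch of that link tweaking, in Python rather than perl, might look like the following. Assumptions, loudly: the local directory layout ("retrieved/<timestamp>/<host>/<path>") is invented for illustration, the regex only covers the usual web.archive.org/web/<timestamp>/<original URL> link form, and it does nothing about the JavaScript itself. HTML comments are passed through untouched so the Internet Archive's notes survive.

```python
import re
from urllib.parse import urlsplit

# Matches links of the form http://web.archive.org/web/<timestamp>/<original URL>,
# optionally with a two-letter flag such as "im_" after the timestamp.
WAYBACK_URL = re.compile(
    r'https?://web\.archive\.org/web/(\d{4,14})(?:[a-z]{2}_)?/(https?://[^\s"\'<>]+)'
)
HTML_COMMENT = re.compile(r"(<!--.*?-->)", re.DOTALL)


def local_path(timestamp, original_url, root="retrieved"):
    """Map an archived URL to a local file path (hypothetical layout).

    Query strings and fragments are dropped in this sketch.
    """
    parts = urlsplit(original_url)
    path = parts.path.lstrip("/") or "index.html"
    if path.endswith("/"):
        path += "index.html"
    return f"{root}/{timestamp}/{parts.netloc}/{path}"


def rewrite_links(html, root="retrieved"):
    """Point Wayback links at local copies, leaving HTML comments as-is."""
    pieces = HTML_COMMENT.split(html)
    # With one capture group, even indices are text outside comments
    # and odd indices are the comments themselves.
    for i in range(0, len(pieces), 2):
        pieces[i] = WAYBACK_URL.sub(
            lambda m: local_path(m.group(1), m.group(2), root), pieces[i]
        )
    return "".join(pieces)
```

Working from the kept-as-retrieved copies and writing the rewritten pages elsewhere would keep the originals intact in case the mapping needs to change later.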
Anyway, I/we will likely make these older materials available (at least what we've got or been able to retrieve thus far) in the not-too-horribly-distant future.
I very much respect and admire this effort: It's all too common for the new Web guys to just discard the entire group's history up to that point -- and a significant batch of work to later correct that error.
Thanks. I think having the history there (or at least getting it back) makes a site/organization/(L)UG more credible, and provides a much better sense of history and origins. I also tend to think nasty rough transitions - particularly those that break existing functionality, or require end-users to take actions or repeat actions they've taken before (e.g. resubscribe to lists and set all their preferences again) generally turn people off and away ... so hopefully we can keep any nasty rough changes to the minimum feasible. Certainly since BALUG has had some rough transitions here and there in the past, I tend to think that also tends to reduce the tolerance of folks for additional rough transitions.
I think also, at this point, I've retrieved all the older content from the Internet Archive that does not exist at all and cannot possibly be retrieved/recovered from our existing web site. There's still a fair bunch of stuff to be extracted/retrieved from our existing web site ... once some of the broken bits on it get sufficiently fixed, ... and also some work to be done when we're ready to move the existing mailman stuff to a new host.