I did manage to retrieve a fair chunk of older BALUG history/archives.
Thanks to The Internet Archive, I got what would seem to be much of the older mail list materials, covering approximately 1997-01-27 through approximately 2001-02-11. We have the list materials from 2001-06-15 going forward. The gap from 2001-02-11 through 2001-06-15 might be "lost", or very hard to obtain ... unless we find someone who saved an e-mail archive that covers it. That gap likely falls between the Internet Archive's last pass over the older mail list locations and the last time those locations were used. On 2001-06-15 the lists picked up again at a new location.
There's still a fair bit of other material to retrieve. Some of the quite old web stuff can probably still be pulled from The Internet Archive (I've gotten much of that already, but have some other pieces to retrieve). Then there's much of the mid-timeframe web content, from roughly 2003-09 through 2007-05 (e.g. meeting minutes and materials, and lots of other useful/interesting stuff) ... most of that is "trapped" behind some broken PHP/MySQL on our "legacy" site. There may also be some stuff toward the early end of that range that is difficult or even infeasible to find - most notably content from the prior site, between the Internet Archive's last pass over it and the last changes made to that older site before it went away and was replaced with the newer site.
Anyway, I/we will likely make these older materials available (at least what we've got or been able to retrieve thus far) in the not-too-horribly-distant future. It would also be nice to close the gaps in material that is thus far missing or not presently accessible ... at least as much as feasible, anyway.
Quoting Michael Paoli (Michael.Paoli@cal.berkeley.edu):
I did manage to retrieve a fair chunk of older BALUG history/archives.
Thanks to The Internet Archive, I got what would seem to be much of the older mail list materials, covering approximately 1997-01-27 through approximately 2001-02-11. We have the list materials from 2001-06-15 going forward.
Impressive.
If you have (or can get) the mbox files, it's possible to rebuild the current mailman archives to present the mailing list as a seamless whole. The offsetting disadvantage (minor) would be that the URLs of currently archived postings would change.
I can describe how to go about that, if interested. Might be best by telephone. (Basically, you cat together a composite mbox. Then, you run the $MAILMAN_HOME/bin/arch utility with appropriate options and parameters to build the archive. The result usually has at least one bit of garbage at the end of the Web archive: this comes from lines _within_ a message body that start with the flush-left text "From " being misparsed as the beginning of a new message. You then have to track down that line in the mbox and fix it, e.g., by prefacing it with ">". Then, build the archive again. Repeat until clean.)
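A minimal sketch of the "cat together a composite mbox" step, in Python 3 rather than a shell one-liner. It assumes the input mbox files are passed oldest-first, and only guards against one common snag: an input file that doesn't end with a blank line, which would glue the next file's first "From " line onto the previous message. The function name is hypothetical, and the mailman arch step itself isn't shown, since its options depend on the installation.

```python
from pathlib import Path

def build_composite_mbox(paths: list[str], out_path: str) -> None:
    """Concatenate mbox files (in the order given) into one composite mbox.

    Works on raw bytes so odd 8-bit content in old mail passes through
    untouched. Each input is made to end with exactly one blank line, so
    the next file's first "From " envelope line starts a new message.
    """
    with open(out_path, "wb") as out:
        for p in paths:
            data = Path(p).read_bytes()
            if not data.strip():
                continue  # skip empty inputs entirely
            # Normalize the tail: one newline ending the last line, plus a blank line.
            out.write(data.rstrip(b"\n") + b"\n\n")
```

e.g. build_composite_mbox(["old-1997-2001.mbox", "current.mbox"], "composite.mbox"), where those file names are placeholders.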
If your regex-fu is better than mine, you might be able to spot those lines programmatically (as distinct from the same string at a legitimate beginning of a message's headers, in its role as the first few characters of an envelope header), and fix them before the _first_ build.
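One possible regex heuristic for that, sketched in Python 3 (the pattern and function name are my own assumptions, not anything mailman ships): treat a "From " line as a real envelope line only if it is the first line or follows a blank line, and looks like "From sender weekday month day hh:mm[:ss] [TZ] year"; every other flush-left "From " line gets a ">" prefix. It assumes "\n" line endings, e.g. the composite read with open(path, encoding="latin-1") so arbitrary 8-bit bytes round-trip.

```python
import re

# Assumed shape of a genuine envelope line, e.g.
#   From someone@example.com Mon Jan 27 12:34:56 1997
# (allowing an optional zone name before the year, or a numeric offset after it).
ENVELOPE = re.compile(
    r"^From \S+\s+"
    r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)\s+"
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+"
    r"\d{1,2}\s+\d{1,2}:\d{2}(?::\d{2})?\s+"
    r"(?:[A-Z]{3,4}\s+)?\d{4}(?:\s+[-+]\d{4})?\s*$"
)

def escape_body_from_lines(mbox_text: str) -> str:
    """Prefix '>' to flush-left "From " lines that don't look like envelopes."""
    out = []
    prev_blank = True  # the start of the file counts as a message boundary
    for line in mbox_text.split("\n"):
        is_envelope = prev_blank and ENVELOPE.match(line) is not None
        if line.startswith("From ") and not is_envelope:
            line = ">" + line  # body text: escape it so it isn't split off
        out.append(line)
        prev_blank = line == ""
    return "\n".join(out)
```

A body line that happens to look exactly like an envelope line right after a blank line would still slip through, so checking the built archive afterward is still worthwhile.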
Anyway, I/we will likely make these older materials available (at least what we've got or been able to retrieve thus far) in the not-too-horribly-distant future.
I very much respect and admire this effort: It's all too common for the new Web guys to just discard the entire group's history up to that point -- and a significant batch of work to later correct that error.
Quoting Rick Moen (rick@linuxmafia.com):
Quoting Michael Paoli (Michael.Paoli@cal.berkeley.edu):
I did manage to retrieve a fair chunk of older BALUG history/archives.
Thanks to The Internet Archive, I got what would seem to be much of the older mail list materials, covering approximately 1997-01-27 through approximately 2001-02-11. We have the list materials from 2001-06-15 going forward.
Impressive.
Thanks :-) Was mostly a matter of:
* Finding/saving/noting the old URLs
* Catching the material on The Internet Archive and/or mirror(s) thereof when it was available
* Some available time to actually get to it (among many other BALUG todo tasks)
* Some wee bits of scripting to retrieve the stuff (e.g. over 4,000 individual messages and web pages from the old balug-talk archive)
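As an illustration of the kind of "wee bits of scripting" involved, here is a hedged Python 3 sketch, not the script actually used. It relies on the Wayback Machine's https://web.archive.org/web/<timestamp>/<original-url> form, where, as I understand it, a partial timestamp such as a bare year gets redirected to a nearby capture. The function names, default timestamp, and delay are assumptions.

```python
import time
import urllib.request

def wayback_url(original_url: str, timestamp: str = "2001") -> str:
    """Build a Wayback Machine URL for original_url at (or near) timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"

def fetch_all(original_urls, timestamp: str = "2001", delay: float = 2.0) -> dict:
    """Fetch each page via the Wayback Machine, pausing between requests."""
    pages = {}
    for url in original_urls:
        try:
            with urllib.request.urlopen(wayback_url(url, timestamp), timeout=60) as resp:
                pages[url] = resp.read()
        except OSError as exc:  # URLError/HTTPError and timeouts are OSError subclasses
            print(f"skipped {url}: {exc}")
        time.sleep(delay)  # be gentle with the archive
    return pages
```

e.g. fetch_all([f"http://old-list-host.example/balug-talk/msg{n:05d}.html" for n in range(1, 11)]), where that host and message-naming pattern are purely hypothetical placeholders for the real old archive URLs.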
Interestingly enough, one could say CABAL led me to much of it. An old broken link to a much earlier BALUG list item about how the CABAL got started was the first key to tracking down the old URLs for the older mailing list archive locations ... and that also led me to all the various older lists we once had, and their archive locations.
If you have (or can get) the mbox files, it's possible to rebuild the current mailman archives to present the mailing list as a seamless whole. The offsetting disadvantage (minor) would be that the URLs of currently archived postings would change.
It's likely still feasible to extract/retrieve mbox format files for the more recent archive stuff, but that's probably infeasible for the much older stuff ... though it may be feasible to cobble together a pretty good fakery of mbox format from the list archive contents ... if we had any particular need/reason to do that. The older list archives don't have e-mail address munging enabled ... so ... the at sign (@) is unmunged. Where such munging is in place, it's not so feasible to reconstruct "originals" or good approximations, as:
s/@/ at /g;s/ at /@/g
isn't an identity transformation (any " at " already present in the text also gets turned into "@"). And if you're looking at a munged version of that line, try this:
$ echo 'cy9ALyBhdCAvZztzLyBhdCAvQC9nCg==' | mimencode -u

Also, we couldn't stitch all the lists together seamlessly. Not only is there a bit of a gap, and a difference in the list software and archive format used at the time, but the number of lists and their names also changed. About the only two that are relatively contiguous in name and purpose are the balug-talk and balug-announce lists. In the range of approximately 1997-01-27 through approximately 2001-02-11 we had these lists:
* advocacy-talk
* balug-announce
* balug-talk
* rc5-crack
* webdevelopers
whereas we presently have:
* balug-admin
* balug-announce
* balug-talk
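To illustrate why that munging isn't reversible, here's a small Python 3 sketch that applies the two substitutions in turn, plus a stand-in for the mimencode -u step (base64.b64decode from the standard library). The sample text and function names are made up for the example.

```python
import base64

def munge(text: str) -> str:
    return text.replace("@", " at ")   # s/@/ at /g

def unmunge(text: str) -> str:
    return text.replace(" at ", "@")   # s/ at /@/g

original = "Meet me at noon; mail joe@example.com"
round_trip = unmunge(munge(original))
print(round_trip)              # Meet me@noon; mail joe@example.com
print(round_trip == original)  # False: the pre-existing " at " was turned into "@"

# Python equivalent of: echo 'cy9A...' | mimencode -u
print(base64.b64decode("cy9ALyBhdCAvZztzLyBhdCAvQC9nCg==").decode("ascii"), end="")
# s/@/ at /g;s/ at /@/g
```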
I can describe how to go about that, if interested. Might be best by
I'll certainly keep it in mind.
If your regex-fu is better than mine, you might be able to spot those
My regex-fu (and perl) is pretty good. I was thinking that if I can find someone who saved *all* their e-mail, covering the "gap" period(s), I could do up a perl script that would pull out the BALUG list e-mails.
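A rough sketch of that filtering, in Python 3 with the standard library mailbox module rather than perl. It assumes list messages can be recognized by the list name appearing in the List-Id, To, or Cc header, and uses the 2001-02-11 through 2001-06-15 gap as the date window; the list names, dates, and function names are placeholders to adjust.

```python
import mailbox
from datetime import date
from email.utils import parsedate_to_datetime

# Placeholders: which lists to keep, and the gap we're trying to fill.
LIST_NAMES = ("balug-talk", "balug-announce")
GAP_START, GAP_END = date(2001, 2, 11), date(2001, 6, 15)

def is_list_message(msg, list_names=LIST_NAMES) -> bool:
    """Heuristic: a list name shows up in List-Id, To, or Cc."""
    headers = " ".join(str(msg.get(h, "")) for h in ("List-Id", "To", "Cc"))
    return any(name in headers.lower() for name in list_names)

def in_window(msg, start=GAP_START, end=GAP_END) -> bool:
    """True if the Date header falls within [start, end] (inclusive)."""
    try:
        sent = parsedate_to_datetime(str(msg.get("Date", ""))).date()
    except (TypeError, ValueError):
        return False  # missing/unparsable Date: leave for a manual look
    return start <= sent <= end

def extract(src_path: str, dst_path: str) -> int:
    """Copy matching messages from src_path into dst_path; return the count."""
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    count = 0
    dst.lock()
    try:
        for msg in src:
            if is_list_message(msg) and in_window(msg):
                dst.add(msg)
                count += 1
        dst.flush()
    finally:
        dst.unlock()
        dst.close()
        src.close()
    return count
```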
I'll probably also get to exercise some of my perl and/or regex-fu on the stuff extracted from the Internet Archive and/or its mirrors. I'll keep copies as I retrieved them ... but those contain some JavaScript code to adjust references ... e.g. so that links go not to the present-day Internet but to the Internet Archive materials from at/around the same time ... as the original Web URLs often point to stuff that's no longer present, or is quite substantially different from the earlier content. Anyway, ... I'll want to tweak those to go to our retrieved copies, rather than bang on the Internet Archive again. The retrieved copies also include comments, added by the Internet Archive, noting when they were archived and when they were retrieved ... I'd likely want to retain those comments.
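For the link tweaking, a hedged Python 3 sketch: it assumes archived links take the Wayback form [https://web.archive.org]/web/<timestamp>[<xx>_]/<original-url>, and that we keep a mapping from original URLs to our saved copies (the mapping and function name are hypothetical). Only matched Wayback URLs change, so the archive's added comments about capture times are kept (apart from any Wayback URLs inside them), and links we have no local copy for are left pointing at the Internet Archive. It doesn't try to strip the archive's injected JavaScript, since its exact form isn't something I'd want to assume.

```python
import re

# Assumed Wayback link shapes, absolute or site-relative:
#   https://web.archive.org/web/20010211123456/http://www.example.org/page.html
#   /web/20010211123456im_/http://www.example.org/logo.gif
WAYBACK = re.compile(
    r"(?:https?://web\.archive\.org)?/web/\d{1,14}(?:[a-z]{2}_)?/(https?://[^\"'\s<>]+)"
)

def rewrite_wayback_links(html: str, local_copies: dict[str, str]) -> str:
    """Swap Wayback URLs for paths to our own saved copies, where we have one."""
    def replace(m: re.Match) -> str:
        # group(1) is the original URL; keep the Wayback URL if no local copy.
        return local_copies.get(m.group(1), m.group(0))
    return WAYBACK.sub(replace, html)
```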
Anyway, I/we will likely make these older materials available (at least what we've got or been able to retrieve thus far) in the not-too-horribly-distant future.
I very much respect and admire this effort: It's all too common for the new Web guys to just discard the entire group's history up to that point -- and a significant batch of work to later correct that error.
Thanks. I think having the history there (or at least getting it back) makes a site/organization/(L)UG more credible, and provides a much better sense of history and origins. I also tend to think that nasty rough transitions - particularly those that break existing functionality, or require end-users to take or repeat actions they've taken before (e.g. resubscribe to lists and set all their preferences again) - generally turn people off and away ... so hopefully we can keep any nasty rough changes to the minimum feasible. And since BALUG has had some rough transitions here and there in the past, I suspect folks' tolerance for additional rough transitions is already reduced.
I also think that, at this point, I've retrieved all the older content from the Internet Archive that doesn't exist at all on our existing web site, and so can't be retrieved/recovered from it. There's still a fair bunch of stuff to be extracted/retrieved from our existing web site ... once some of the broken bits on it are sufficiently fixed ... and also some work to be done when we're ready to move the existing mailman stuff to a new host.