[BALUG-Admin] Thanks!: Re: Michael Hubbard - need you (primary account user) to open support ticket with DreamHost: Re: [micpao 138399635] Please provide raw copy of list mbox archive files (we're migrating lists)

Rick Moen rick@linuxmafia.com
Sat Sep 16 21:42:25 PDT 2017


Quoting Michael Paoli (Michael.Paoli@cal.berkeley.edu):

> Thanks!
> 
> Looks like DreamHost assembled and made the files available by
> approximately 2017-09-16T16:10:15-0700.
> I've successfully retrieved 'em and run some basic sanity checks -
> looks like they are in fact what we're expecting from DreamHost
> (at least of what they still have and hadn't otherwise lost earlier).
> 
> I'll update when clear to "pull the plug" on DreamHost (notably after
> I complete a mostly queued set of final migration config changes,
> and then the last of the longest relevant DNS TTLs have expired).
> Anyway - fairly soon (guestimating "all clear" by about
> mid-week +- a bit - will update when it's all done 'n clear).

I've just now tried to do some remedial looking around for third-party
utilities to deal with problems like mbox files needing cleanup.
I've been feeling some sense of loss, in that I'm _sure_ Marc Merlin
pointed me to a bunch of such tools years ago, but I failed to capture
that knowledge.

But not quite so fast.  Lookie here:

[rick@linuxmafia]
/usr/local/src/rickstuff $ cd svlug
[rick@linuxmafia]
/usr/local/src/rickstuff/svlug $ ls -al
total 27516
drwxrwsr-x 10 rick rick     4096 Sep  3 15:55 .
drwsr-sr-x 27 rick rick     4096 Mar 25 01:38 ..
drwxr-sr-x  3 root rick     4096 May 28  2015 etc
-rw-r--r--  1 rick rick     5880 Jun 22  2009 marc-merlin-mailman-scripts.tar.gz
drwxrwsr-x  2  469  469     4096 Mar 16  2012 mboxes
-rw-r--r--  1 rick rick 27614753 Sep  3 15:55 svlug-rosters
-rw-------  1 rick rick    27960 Jan  5  2016 svlug-rosters~.
-r-xr-x---  1 rick rick     3590 Jun 22  2009 svlug.svlug.org-backup-scripts.tar.gz
-r-xr-x---  1 rick rick   365822 Jun 22  2009 svlug.svlug.org-usr.local.sbin.tar.gz
-r-xr-x---  1 rick rick    58947 Jul 19  2009 svlug.svlug.org-var.lib.scr.tar.gz
-r-xr-x---  1 rick rick     5154 Jun 22  2009 svlug.svlug.org-var.www.bin.tar.gz
drwxr-sr-x  2 root rick     4096 May 28  2015 usr-local-sbin
drwxr-sr-x  3 root rick     4096 May 28  2015 var-local-mailman-backup
drwxr-sr-x  3 root rick     4096 May 28  2015 var-local-mailman-bin
drwxr-sr-x  2 root rick     4096 May 28  2015 var-local-mailman-cron
drwxr-sr-x  3 root rick     4096 May 28  2015 var-local-scr
drwxrwsr-x  2 rick rick     4096 Apr 30  2016 www
[rick@linuxmafia]
/usr/local/src/rickstuff/svlug $


Oh, joy.  Looks like I did something useful in the way of looking ahead
to situations just like this one.

/usr/local/src/rickstuff/svlug $ cp marc-merlin-mailman-scripts.tar.gz /tmp
[rick@linuxmafia]
/usr/local/src/rickstuff/svlug $ cd /tmp
[rick@linuxmafia]
/tmp $ tar xvzf marc-merlin-mailman-scripts.tar.gz
marc-merlin-mailman-scripts/
marc-merlin-mailman-scripts/addusertolists
marc-merlin-mailman-scripts/checkalllists
marc-merlin-mailman-scripts/dumplistconfigs
marc-merlin-mailman-scripts/excludelists
marc-merlin-mailman-scripts/findemail
marc-merlin-mailman-scripts/findemailpattern
marc-merlin-mailman-scripts/findvauser
marc-merlin-mailman-scripts/listlists
marc-merlin-mailman-scripts/listsoutside
marc-merlin-mailman-scripts/mailman_force_settings
marc-merlin-mailman-scripts/mailman_setowner_settings
marc-merlin-mailman-scripts/README
marc-merlin-mailman-scripts/reconfigalllistsfromdump
marc-merlin-mailman-scripts/removevauser
marc-merlin-mailman-scripts/renameuser
marc-merlin-mailman-scripts/renamevauser
marc-merlin-mailman-scripts/resetlistconfigs
marc-merlin-mailman-scripts/savelistinfo
marc-merlin-mailman-scripts/listsconfig/
marc-merlin-mailman-scripts/crontab
[rick@linuxmafia]
/tmp $

Lots of good tricks and script snippets in there.  Here's one I loved 
when I discovered it.  (This is all stuff captured from lists.svlug.org,
by the way.)

The crontab file you see in there is the main system crontab file from
lists.svlug.org, and it includes:

14 5 * * 1-5 root       bash -c 'export IFS=" "; A=`/var/local/mailman/bin/scripts/resetlistconfigs 2>&1`; if [ z"$A" != z ]; then echo $A | Mail -s "~mailman/bin/scripts/resetlistconfigs output" owner-mailman; fi'
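That entry runs resetlistconfigs at 05:14 every weekday (Monday through Friday). It captures stdout and stderr together and sends mail only if the script printed something. The `export IFS=" "` makes the unquoted `echo $A` split on spaces only, so the output's line breaks survive into the mail body. Here is a rough Python 3 sketch of the capture-and-test half, assuming Python 3.5+ for `subprocess.run`. The name `run_for_cron` is hypothetical, and actually mailing the text is left out:

```python
import subprocess
from typing import Optional, Sequence


def run_for_cron(cmd: Sequence[str]) -> Optional[str]:
    """Run cmd and return its combined output, or None if it printed nothing.

    Mirrors the crontab wrapper: stderr is folded into stdout (2>&1),
    trailing newlines are dropped (as backtick substitution does), and an
    empty result means "nothing to report, send no mail".
    """
    result = subprocess.run(
        list(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as 2>&1
        universal_newlines=True,   # decode output to str
    )
    output = result.stdout.rstrip("\n")
    return output or None
```

In the real job, a non-None result is what would be piped to `Mail -s ... owner-mailman`.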

And what does resetlistconfigs do?

/tmp/marc-merlin-mailman-scripts $ cat resetlistconfigs
#!/bin/bash

cd ~mailman/lists/
for i in *
do
    ~mailman/bin/config_list -o - $i | tail +3  > /var/tmp/tmp_mm_settings.before
    ~mailman/bin/config_list -i ~mailman/bin/scripts/mailman_force_settings $i
    ~mailman/bin/config_list -o - $i | tail +3  > /var/tmp/tmp_mm_settings.after

    if ! diff -q /var/tmp/tmp_mm_settings.before /var/tmp/tmp_mm_settings.after &>/dev/null; then
        echo "Fixing config of list $i"
        diff -u0 /var/tmp/tmp_mm_settings.before /var/tmp/tmp_mm_settings.after
    fi
done
[rick@linuxmafia]
/tmp/marc-merlin-mailman-scripts $

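You get the gist of that.  It feeds the text file mailman_force_settings
to the config_list utility as input (config_list -i), dumping each list's
settings before and after and printing a diff for any list that changed.
And the input file is thus the punch line: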


[rick@linuxmafia]
/tmp/marc-merlin-mailman-scripts $ cat mailman_force_settings
reply_goes_to_list = 0
host_name = 'lists.svlug.org'
archive = 1
[rick@linuxmafia]
/tmp/marc-merlin-mailman-scripts $
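resetlistconfigs is a snapshot, apply, diff loop: dump a list's settings, force the values from mailman_force_settings, dump again, and report any difference. The Python 3 sketch below models that loop on plain dictionaries rather than real Mailman lists, and all function names in it are hypothetical. It treats the settings file as Python assignments because Mailman 2's config_list -i executes its input file as Python. For the same reason, this approach is only reasonable for a trusted, admin-owned file:

```python
import difflib
from typing import Dict, List


def load_forced_settings(text: str) -> Dict[str, object]:
    """Read settings written as Python assignments, e.g. "archive = 1".

    Executes the text as Python, so only use it on a trusted file.
    """
    namespace: Dict[str, object] = {}
    exec(text, {}, namespace)
    # Skipping underscore-prefixed helper names is a choice of this sketch.
    return {k: v for k, v in namespace.items() if not k.startswith("_")}


def dump(config: Dict[str, object]) -> List[str]:
    """Render a config as sorted 'name = value' lines."""
    return [f"{name} = {config[name]!r}" for name in sorted(config)]


def reset_list_config(name: str, config: Dict[str, object],
                      forced: Dict[str, object]) -> List[str]:
    """Force settings onto one list's config; return a report if it changed."""
    before = dump(config)
    config.update(forced)
    after = dump(config)
    if before == after:
        return []  # nothing changed, so nothing to mail
    report = [f"Fixing config of list {name}"]
    # n=0 gives zero lines of context, like the script's diff -u0.
    report.extend(difflib.unified_diff(before, after, lineterm="", n=0))
    return report
```

Run over every list, this produces output only for lists someone has tampered with. Combined with the cron wrapper's "mail only if non-empty" test, the admin hears about a list only when its settings were actually reverted.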


Marc suspected well in advance that SVLUG would eventually put some
officer or other volunteer in a position to do harm: someone who'd insist
on doing stupid things to the Mailman settings, like forcing Reply-To
munging, trying to make lists.svlug.org forge some other FQDN, or
deliberately leaving a mailing list unarchived.  So Marc set up a cron
job, running every weekday morning, to undo the idiot's damage.

The idiot eventually did arrive in the person of raving loon and drunk
President Paul Reiber, who indeed did attempt to force Reply-To munging
and was reputedly driven to distraction by the fact that his change 
kept reverting automatically shortly after he made it.


I'd be glad to give you a copy of the marc-merlin-mailman-scripts.tar.gz 
collection, but the _real_ prize I wanted to give you is cleanarch,
a Python script Marc had in /var/local/mailman/bin.  File comments:

  """Clean up an .mbox archive file.

  The archiver looks for Unix-From lines separating messages in an mbox archive
  file.  For compatibility, it specifically looks for lines that start with
  "From " -- i.e. the letters capital-F, lowercase-r, o, m, space, ignoring
  everything else on the line.

  Normally, any lines that start "From " in the body of a message should be
  escaped such that a > character is actually the first on a line.  It is
  possible though that body lines are not actually escaped.  This script
  attempts to fix these by doing a stricter test of the Unix-From lines.  Any
  lines that start "From " but do not pass this stricter test are escaped with a
  > character.

  Usage: cleanarch [options] < inputfile > outputfile

  Options:
      -s n
      --status=n
          Print a # character every n lines processed

      -q / --quiet
          Don't print changed line information to standard error.

      -n / --dry-run
          Don't actually output anything.

      -h / --help
          Print this message and exit
  """

You need this.  I'm attaching a copy (GPLed by author).
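The stricter test that cleanarch's comments describe has two stages. First, a "From " line must match a full Unix-From pattern. Second, the line after it must look like an RFC 2822 header: a field name made of characters 33-126 except the colon, then a colon. Anything failing either stage gets a > prefix. Python 3's mailbox module no longer has `UnixMailbox`, which cleanarch borrows its pattern from, so the sketch below carries its own From_ regex modeled on the old `_fromlinepattern`. `clean_mbox_lines` is a hypothetical in-memory helper, not a drop-in replacement for the attached script. Its header test is also slightly stricter than cleanarch's, because the colon must immediately follow the field name:

```python
import re
from typing import List, Tuple

# Unix-From pattern modeled on Python 2's mailbox.UnixMailbox._fromlinepattern.
FROM_LINE = re.compile(
    r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
    r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*"
    r"[^\s]*\s*$"
)

# RFC 2822 field name: characters 33-126 except colon (58), then a colon.
HEADER = re.compile(r"[\x21-\x39\x3b-\x7e]+:")


def clean_mbox_lines(lines: List[str]) -> Tuple[List[str], int]:
    """Escape "From " lines that fail the stricter Unix-From test.

    Returns the cleaned lines and how many lines were escaped.
    """
    out: List[str] = []
    escaped = 0
    for i, line in enumerate(lines):
        if not line.startswith("From "):
            out.append(line)
            continue
        nextline = lines[i + 1] if i + 1 < len(lines) else ""
        if FROM_LINE.match(line) and HEADER.match(nextline):
            # A real message separator: make sure a blank line precedes it.
            if out and out[-1] != "\n":
                out.append("\n")
            out.append(line)
        else:
            # Body text that merely starts with "From ": escape it.
            out.append(">" + line)
            escaped += 1
    return out, escaped
```

To run it on a real archive, reading the file with `encoding="latin-1"` round-trips every byte and avoids decode errors on mixed-charset mail.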


-------------- next part --------------
#! /usr/bin/python

# Copyright (C) 2001-2003 by the Free Software Foundation, Inc.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

"""Clean up an .mbox archive file.

The archiver looks for Unix-From lines separating messages in an mbox archive
file.  For compatibility, it specifically looks for lines that start with
"From " -- i.e. the letters capital-F, lowercase-r, o, m, space, ignoring
everything else on the line.

Normally, any lines that start "From " in the body of a message should be
escaped such that a > character is actually the first on a line.  It is
possible though that body lines are not actually escaped.  This script
attempts to fix these by doing a stricter test of the Unix-From lines.  Any
lines that start "From " but do not pass this stricter test are escaped with a
> character.

Usage: cleanarch [options] < inputfile > outputfile
Options:
    -s n
    --status=n
        Print a # character every n lines processed

    -q / --quiet
        Don't print changed line information to standard error.

    -n / --dry-run
        Don't actually output anything.

    -h / --help
        Print this message and exit
"""

import re
import sys
import getopt
import mailbox

import paths
from Mailman.i18n import _

cre = re.compile(mailbox.UnixMailbox._fromlinepattern)

# From RFC 2822, a header field name must contain only characters from 33-126
# inclusive, excluding colon.  I.e. from oct 41 to oct 176 less oct 072.  Must
# use re.match() so that it's anchored at the beginning of the line.
fre = re.compile(r'[\041-\071\073-\176]+')



def usage(code, msg=''):
    if code:
        fd = sys.stderr
    else:
        fd = sys.stdout
    print >> fd, _(__doc__)
    if msg:
        print >> fd, msg
    sys.exit(code)



def escape_line(line, lineno, quiet, output):
    if output:
        sys.stdout.write('>' + line)
    if not quiet:
        print >> sys.stderr, _('Unix-From line changed: %(lineno)d')
        print >> sys.stderr, line[:-1]



def main():
    try:
        opts, args = getopt.getopt(
            sys.argv[1:], 'hqns:',
            ['help', 'quiet', 'dry-run', 'status='])
    except getopt.error, msg:
        usage(1, msg)

    quiet = False
    output = True
    status = -1

    for opt, arg in opts:
        if opt in ('-h', '--help'):
            usage(0)
        elif opt in ('-q', '--quiet'):
            quiet = True
        elif opt in ('-n', '--dry-run'):
            output = False
        elif opt in ('-s', '--status'):
            try:
                status = int(arg)
            except ValueError:
                usage(1, _('Bad status number: %(arg)s'))

    if args:
        usage(1)

    lineno = 0
    statuscnt = 0
    messages = 0
    prevline = None
    while True:
        lineno += 1
        line = sys.stdin.readline()
        if not line:
            break
        if line.startswith('From '):
            if cre.match(line):
                # This is a real Unix-From line.  But it could be a message
                # /about/ Unix-From lines, so as a second order test, make
                # sure there's at least one RFC 2822 header following
                nextline = sys.stdin.readline()
                lineno += 1
                if not nextline:
                    # It was the last line of the mbox, so it couldn't have
                    # been a Unix-From
                    escape_line(line, lineno, quiet, output)
                    break
                fieldname = nextline.split(':', 1)
                if len(fieldname) < 2 or not fre.match(nextline):
                    # The following line was not a header, so this wasn't a
                    # valid Unix-From
                    escape_line(line, lineno, quiet, output)
                    if output:
                        sys.stdout.write(nextline)
                else:
                    # It's a valid Unix-From line
                    messages += 1
                    if output:
                        # Before we spit out the From_ line, make sure the
                        # previous line was blank.
                        if prevline is not None and prevline <> '\n':
                            sys.stdout.write('\n')
                        sys.stdout.write(line)
                        sys.stdout.write(nextline)
            else:
                # This is a bogus Unix-From line
                escape_line(line, lineno, quiet, output)
        elif output:
            # Any old line
            sys.stdout.write(line)
        if status > 0 and (lineno % status) == 0:
            sys.stderr.write('#')
            statuscnt += 1
            if statuscnt > 50:
                print >> sys.stderr
                statuscnt = 0
        prevline = line
    print >> sys.stderr, _('%(messages)d messages found')



if __name__ == '__main__':
    main()

