List: General Discussion
From: Van  Date: May 18 1999 2:20am
Subject: Most excellent work, Thimble (was) Parsing of Mail files into a Database
Here's what resulted from:

bash$ cat nsmail/Inbox | perl suckmail.PL > tmp.txt

(The Perl hack itself is included below, courtesy of Tim.)
Sorry for the long message, but I wanted to be accurate.  The previous
57 messages are snipped, along with 59 and up.  Interestingly, my Inbox
shows only 23 messages.  Tim's script also accurately detected the
binary attachments that have been snipped.  Very nice hack.
Requires MIME-tools and its prerequisites.

<--begin output-->
>> Message #58
>> Headers:
>>>> Content-Type
      text/plain; charset=us-ascii
>>>> Date
      Mon, 17 May 1999 18:45:12 -0700
>>>> From
      Thimble Smith <tim@stripped>
>>>> In-Reply-To
      <3740B6B6.1D1FD290@stripped>; from Van on Mon, May 17, 1999 at 08:39:18PM -0400
>>>> Message-ID
      <19990517184512.E25813@stripped>
>>>> Mime-Version
      1.0
>>>> Received
      from tim.Desert.NET (tim@stripped [207.182.32.20])  by tempe (8.8.7/8.8.7) with ESMTP id VAA28732 for <vanboers@stripped>; Mon, 17 May 1999 21:45:25 -0400
      (from tim@localhost)  by tim.Desert.NET (8.8.8/8.8.8) id SAA26944 for vanboers@stripped; Mon, 17 May 1999 18:45:13 -0700 (MST) (envelope-from tim)
>>>> References
      <04d201bea0c2$4a2c2820$f22005cf@stripped> <3740B304.123DFAC0@dedserius.com> <19990517173257.C25813@stripped> <3740B6B6.1D1FD290@stripped>
>>>> Return-Path
      <tim@stripped>
>>>> Subject
      Re: Parsing of Mail files into a Database
>>>> To
      Van <vanboers@stripped>
>>>> X-Mailer
      Mutt 0.93.2
>>>> X-Mozilla-Status
      8013
>>>> X-POP3-Rcpt
      vanboers@tempe

>> Body:
On Mon, May 17, 1999 at 08:39:18PM -0400, Van wrote:
> > Using Perl to parse shouldn't be hard.  And you could easily store
> > the header info in a hash variable, and then use those values as
> > fields in your db table if you wanted it.
> I suck at Perl, Tim, but, ya know....  Maybe in this case you're right.
> I'd always thought hash was a fun thing, not a useful thing.....

I threw this together.  I haven't read the RFC for mail for a long
time, so this could have some bogus stuff in it!  Anyway, it'll be
good for a look through, maybe.


#!/usr/bin/perl -w

use strict;

my @emails;
while (<>) {
    if (/^From / .. /^$/) {
        # inside the header   

        # don't want the line terminator
        chomp;

        # ignore the blank line
        next unless length;

        /^From (.*)/                and do {
            # the From stuff is saved in the $1 variable

            # start a new e-mail
            push @emails, {
                'headers'       => {},  # for instant access by name
                'headers_list'  => [],  # in case you need the order
                'body'          => '',
            };

            # reset line number count
            $. = 1;

            next;
        };

        unless (@emails) {
            warn "non 'From' header before any 'From' line\n";
            next;
        }

        my $email = $emails[-1];

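        # a header name starts with a character that is neither
        # whitespace nor a colon, and runs up to the first colon
        /^([^\s:][^:]*):(.*)$/      and do {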
            push @{$email->{'headers_list'}}, $1;
            push @{$email->{'headers'}{$1}}, $2;

            next;
        };

        /^\s/                       and do {
            unless (@{$email->{'headers_list'}}) {
                warn "found continuation line before any headers\n";
                next;
            }

            # do you understand this next line?  ;-P
            # (it appends the continuation text to the latest value
            # of the most recently seen header)
            $email->{'headers'}{$email->{'headers_list'}[-1]}[-1] .= $_;

            next;
        };

        warn "unrecognized line: $.: $_\n";
    }
    else {
        unless (@emails) {
            warn "body line before any headers\n";
            next;
        }

        my $email = $emails[-1];
        $email->{'body'} .= $_;
    }

}

# print them all out:

for (my $i = 0; $i < @emails; ++$i) {
    my $headers = $emails[$i]{'headers'};

    print ">> Message #", $i + 1, "\n";

    print ">> Headers:\n";
    foreach (sort keys %$headers) {
        print ">>>> $_\n";
        foreach (@{$headers->{$_}}) {
            print "     $_\n";
        }
    }

    print "\n";

    print ">> Body:\n";
    print $emails[$i]{'body'};

    print "\n";
}

exit 0;


> > Probably this isn't useful.  Sorry if I've wasted your time.  :|
> > If you want to use Perl but you're having trouble with some syntax
> > or a pattern match (I don't have any idea if you use Perl or not),
> > I'd be glad to lend some help.
> Actually, it was, Tim.  You are actually quite useful quite often.  I may
> hit you up on this.
> Headin' to CPAN.  Long nite ahead.......

If you're going to be up late, you might want to find a Tom Petty
album to listen to.  It'll keep you in a good mood.

Tim
<--end output-->
Thanks, again, Tim.  I'll try to work with this.  
Van
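
For the "use those values as fields in your db table" step mentioned
in the quoted thread, here is a minimal sketch of one way it might go.
It is not part of Tim's script.  The sub names email_to_row and
insert_statement, the table name "mail", and the "hdr_" column prefix
are all assumptions made up for illustration; the only thing taken
from the script above is the shape of each e-mail record (a 'headers'
hash of value lists plus a 'body' string).

```perl
#!/usr/bin/perl -w

use strict;

# Flatten one e-mail record (same shape as the records the script
# above builds) into a column => value hash.  Hypothetical helper.
sub email_to_row {
    my ($email, @header_names) = @_;
    my %row;
    foreach my $name (@header_names) {
        # copy the value list so the original record is untouched
        my @values = @{ $email->{'headers'}{$name} || [] };
        # strip the whitespace left over after the colon
        foreach (@values) { s/^\s+//; s/\s+$//; }
        # "Message-ID" -> "hdr_message_id"; the prefix avoids SQL
        # reserved words such as "from" and "to"
        (my $column = lc $name) =~ s/[^a-z0-9]+/_/g;
        # a header may occur more than once (e.g. Received)
        $row{"hdr_$column"} = join "\n", @values;
    }
    $row{'body'} = $email->{'body'};
    return \%row;
}

# Build an INSERT with placeholders and return it together with the
# values to bind, in matching (sorted) column order.
sub insert_statement {
    my ($table, $row) = @_;
    my @columns = sort keys %$row;
    my $sql = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        join(', ', @columns),
        join(', ', ('?') x @columns);
    return ($sql, @{$row}{@columns});
}

# Example with a made-up record:
my $example = {
    'headers' => { 'Subject' => [' Re: Parsing of Mail files'] },
    'body'    => "I threw this together.\n",
};
my ($sql, @bind) = insert_statement('mail',
    email_to_row($example, 'Subject'));
print "$sql\n";
```

With DBI, the statement and values could then be handed to something
like $dbh->do($sql, undef, @bind), assuming a table with matching
columns already exists.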

=========================================================================
Linux rocks!!!   www.dedserius.com
=========================================================================