List: General Discussion
From: Van
Date: May 18 1999 11:09pm
Subject: Re: Now, How to sed on Perl Vars?
End of story:
What follows is the ugliest hack I've ever taken credit for, but it
does the simple job I need it to do.
If any Perl guru out there can improve on two things, I'll revisit it,
but I have other things to move on to.
1.	Can we get the Date separated as its own field?  Critical for
sorting.
2.	Subject and sender as their own fields would be nice, too.
Now, the ugliness begins (remember, be gentle, folks: I'm not a Perl
programmer):
<begin perl script>
#!/usr/bin/perl -w
# This was sent to me by Thimble Smith (tim@stripped) on a request I'd sent
# to the MySQL users group to help find a decent way of parsing a mailbox
# file into a MySQL-friendly import format.  Much thanks to him, if I pull
# this off.

use strict;

my @emails;
while (<>) {
    if (/^From / .. /^$/) {
        # inside the header

        # don't want the line terminator
        chomp;

        # ignore the blank line
        next unless length;

        /^From (.*)/                and do {
            # the From stuff is saved in the $1 variable

            # start a new e-mail
            push @emails, {
                'headers'       => {},  # for instant access by name
                'headers_list'  => [],  # in case you need the order
                'body'          => '',
            };

            # reset line number count
            $. = 1;

            next;
        };

        unless (@emails) {
            warn "non 'From' header before any 'From' line\n";
            next;
        }

        my $email = $emails[-1];

        # header name: starts with a non-space, non-colon character
        /^([^\s:][^:]*):(.*)$/      and do {
            push @{$email->{'headers_list'}}, $1;
            push @{$email->{'headers'}{$1}}, $2;


            next;
        };

        /^\s/                       and do {
            unless (@{$email->{'headers_list'}}) {
                warn "found continuation line before any headers\n";
                next;
            }

            # do you understand this next line?  ;-P
            $email->{'headers'}{$email->{'headers_list'}[-1]}[-1] .= $_;

            next;
        };

        warn "unrecognized line: $.: $_\n";
    }
    else {
        unless (@emails) {
            warn "body line before any headers\n";
            next;
        }

        my $email = $emails[-1];
        $email->{'body'} .= $_;
    }
}

# print them all out:

for (my $i = 0; $i < @emails; ++$i) {
    my $headers = $emails[$i]{'headers'};
    my $body = $emails[$i]{'body'};

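    # msgno field
    print "Message #", $i + 1;

    # headers field: each header name followed by its value(s)
    print "\tHeaders:";
    foreach my $name (sort keys %$headers) {
        print $name;
        foreach my $value (@{$headers->{$name}}) {
            # tab is the field terminator, so strip it from the data
            $value =~ s#\t##g;
            print "$value:  ";
        }
    }

    # body field: newlines and tabs are stripped from the message text
    print "\tBody:\n";

    $body =~ s#\n##g;
    $body =~ s#\t##g;
    print $body;

    # NUL terminates the record
    print "\0";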
}

exit 0;
<end perl script>
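The print loop above deletes tabs and newlines outright, which runs words together in the stored body. A minimal sketch of an alternative is to escape them instead. The helper name escape_field is hypothetical (it is not part of Tim's script), and the sketch assumes LOAD DATA is left at its default escape character (backslash), so that \t, \n and \0 are turned back into the real characters on import:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not part of the script above): escape the characters
# that the import treats specially instead of deleting them.
# Assumes LOAD DATA uses its default escape character, the backslash.
sub escape_field {
    my ($text) = @_;
    $text =~ s/\\/\\\\/g;   # backslash first, so later escapes are not doubled
    $text =~ s/\t/\\t/g;    # tab is the field terminator
    $text =~ s/\n/\\n/g;    # keep line breaks as the two characters \ and n
    $text =~ s/\0/\\0/g;    # NUL is the line terminator used here
    return $text;
}

# Example: a two-line body containing a tab comes out as one escaped string.
print escape_field("line one\nline\ttwo"), "\n";
```

In the script this would mean printing escape_field($body) in place of the two substitutions on the body. I have not tried this against the table, so treat it as a starting point only.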
Now, I do:
$ cat nsmail/mbox | perl suckmail.PL > tmp.txt
Then:
mysql> load data infile '/home/vanboers/tmp.txt' into table nsmail fields
    -> terminated by '\t' lines terminated by '\0';

The table schema is:
+---------+-------------+------+-----+---------+-------+
| Field   | Type        | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| msgno   | varchar(12) | YES  |     | NULL    |       |
| headers | longtext    | YES  |     | NULL    |       |
| body    | blob        | YES  |     | NULL    |       |
+---------+-------------+------+-----+---------+-------+   
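On items 1 and 2 above (Date, subject and sender as their own fields), here is a rough sketch of how they might be pulled out of the headers hash the script already builds. The helper name header_field and the sample message are made up for illustration. The sketch assumes the headers are spelled exactly "Date", "Subject" and "From" (the hash keys are case-sensitive), and the table would need three more columns, which are not in the schema shown above:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return the first value of a named header, cleaned up,
# or '' if the header is absent.  Assumes the $email->{'headers'} layout
# built by the script: header name => array ref of values.
sub header_field {
    my ($email, $name) = @_;
    my $values = $email->{'headers'}{$name};
    return '' unless $values && @$values;
    my $value = $values->[0];
    $value =~ s/[\t\n\0]//g;     # must not contain field or line terminators
    $value =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return $value;
}

# Stand-in for one element of @emails; the values are made-up sample data.
my $email = {
    'headers' => {
        'Date'    => [' Tue, 18 May 1999 23:09:00 -0400'],
        'Subject' => [' Re: Now, How to sed on Perl Vars?'],
        'From'    => [' Van <van@example.com>'],
    },
};

# Three tab-separated fields, then the NUL record terminator.
print join("\t", map { header_field($email, $_) } qw(Date Subject From)), "\0";
```

The Date field is still the raw mail-format string here, so for proper sorting it would have to be converted to something MySQL can store in a date column. That part is not attempted in this sketch.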
Thanks to Tim, and all who helped me with this kludge.  
Regards,
Van


-- 
=========================================================================
Linux rocks!!!   www.dedserius.com
=========================================================================