List:General Discussion« Previous MessageNext Message »
From:Gavin Brown Date:May 22 2002 5:37pm
Subject:Re: Loading massive data set from CSV
View as plain text  
----- Original Message -----
From: "Dan Harrington" <dan@stripped>
To: <mysql@stripped>
Sent: Wednesday, May 22, 2002 5:41 PM
Subject: Loading massive data set from CSV


> Greetings everyone,
>
> I have an ASCII CSV or Tab Delimited file that is
> roughly 3.5 gigabytes, and I want to load it into a mysql
> database so I can do some analysis.
>
> First of all, I'm wondering, is there anything I should be aware of,
> or worried about, size-wise?
>
> I know that I can't even look at the file using basic text functions
> in my Linux box like 'head' or 'split'
>
> Initially, I was thinking I'd use 'split' to break it into smaller chunks
> but split won't read it.
>
> I can't even use 'wc -l' to find out how many lines or records are in the
file.

I thought most linux utils was quite good at dealing with files ie only
reading what was needed into memory, but I suppose it depends on the
program.

But then again I have never tried to manage a 3.5 gig file :)

>
> There is a list of the fields in the file, so I know what my table should
> look like, but I don't want to crash the SQL server if its too large a
file,
> or something else like that.  I didn't know how big it was originally, so
I
> was just going to use phpMyAdmin to load the file through a web
browser....
> though I don't know if that will work either.  Is there a size limitation
> to HTTP-POST (I assume it uses that method to upload).

I wouldn't think that phpMyAdmin  / Apache (guessing at web server here)
would be able to handle such a big file through CGI.
It would depend on how php handles POST cgi - if like most porgrams it just
tries to read all of the uploaded file into memory then something is going
to keel over, either apache or php.

You might be able to load the file into mysql from the mysql client prog -
ie do a select into from (I can't remember the actual command , the one that
lets you load a db from a file) again this will depend on how the program
handles files.

If that doesn't work my advice would be to split the file up into managable
chucks.

I would say a few hundered megs at most, and try them one at a time.

Will 'grep' parse the filet - ie can you do "grep someWord myBigFile.txt" ?
If so is there anything "splitable" in the file - ie if the file is a log
file with a date field then you could try
 grep "Sun 5th May" myBigFile.txt > 5thMay.txt , etc.

Just out of curiosity where did you get the file from?

If grep won't look at it ;

Hows your C?

You could write a c prog to split the file ,  just reading and writing a few
thousand lines at a time.

If you don't know C then you could either, try and learn it (if you are a
programmer) or alternativley give me a shout and I I'll try and knock
something together for you. (I'm a bit rusty with my C though :)

I have no idea about the max size of data mysql can hold , I have never had
to deal with anything that was over a few hundred thousand records
unfortunately. Maybe someone else can shed light on that - I keep hearing
that mysql can handle a LOT of rows.

Hope some of this helps...

Gav Brown

>
> Comments?
>
> Thanks
> Dan
>
>
> ---------------------------------------------------------------------
> Before posting, please check:
>    http://www.mysql.com/manual.php   (the manual)
>    http://lists.mysql.com/           (the list archive)
>
> To request this thread, e-mail <mysql-thread109865@stripped>
> To unsubscribe, e-mail
<mysql-unsubscribe-gav=impression.wox.org@stripped>
> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
>

Thread
connecting with myodbcMir S Islam13 Mar
  • connecting with myodbcsinisa14 Mar
  • Re: Pre-release of MySQL 3.23.50Philip Molter23 Apr
    • Re: Pre-release of MySQL 3.23.50(Trond Eivind Glomsrød)23 Apr
  • Re: Pre-release of MySQL 3.23.50Unknown Sender23 Apr
    • Re: Pre-release of MySQL 3.23.50Michael Widenius15 May
      • Loading massive data set from CSVDan Harrington22 May
        • Re: Loading massive data set from CSVEberhard Lisse22 May
        • Re: Loading massive data set from CSVGavin Brown22 May
  • Download of sourceLuciano Barcaro29 Apr
    • Re: Download of sourcePaul DuBois29 Apr
  • Re: Download of sourceLuciano Barcaro29 Apr
Re: Pre-release of MySQL 3.23.50David Huxtable25 Apr