List:General Discussion« Previous MessageNext Message »
From:Michael Widenius Date:January 14 2000 11:18am
Subject:Re: newbe: populating a database
View as plain text  

>>>>> "Sasha" == Sasha Pachev <sasha@stripped> writes:

Sasha> "Yury V. Bukhman" wrote:
>> Hi all!
>> I am relatively new to MySQL: have been using it for a few months.  The
>> application I am developing must scan databases of flat text files
>> (about 10,000 - 100,000 entries in separate files or a single large
>> file) into a relational database.  I am writing perl programs that parse
>> the text files and use DBI to talk to MySQL and insert data into a set
>> of database tables.  For a large set of texts the process can take many
>> hours.  What are the ways to speed it up?  I lock my tables.  I do not
>> want to use multiple inserts, because a failure to insert one entry
>> should produce an error, but it should not prevent insertion of the
>> nearby entries. Would it help to drop all indexes before populating the
>> database and rebuild them afterwards?  Would INSERT DELAYED help?  Are
>> there any other tricks?  Is there a way to test the speed of INSERT
>> statements quickly?  I try to read manuals and mailing list archives,
>> but often my general ignorance gets in the way of fully understanding
>> them.  I would appreciate any help.
>> Yury

Sasha> - make sure that your data is clean before you do an insert -- that way
Sasha> you will never get an error trying to insert unless there is something
Sasha> wrong with your tables , connections, mysqld, etc ..

Sasha> - droping indeces will help quite a bit

The manual describes how you can drop indexes and create them very
fast later with isamchk..

Sasha> - not sure if INSERT DELAYED will be very much help if you are already
Sasha> locking the tables, and doing multiple inserts -- any thoughts, Monty?

If you are locking the table, neither INSERT DELAYED or multiple row
inserts will help that much.

Sasha> - run top as you are importing the data, and see how CPU time is
Sasha> distributed -- if your Perl script is using a significant portion of the
Sasha> CPU, re-write it in C or C++.

Sasha> - benchmarking is quite easy --  just time several thousand inserts and
Sasha> compute how many of them you do per second in your program.

To get more speed:

- If you have many indexes, increase the key buffer.
- If possible, use LOAD DATA INFILE.
- Create indexes at a later stage (see above)

newbe: populating a databaseYury V. Bukhman14 Jan
  • Re: newbe: populating a databaseSasha Pachev14 Jan
    • Re: newbe: populating a databaseMichael Widenius14 Jan
  • Re: newbe: populating a databaseJan Dvorak14 Jan
  • Re: newbe: populating a databaseYury V. Bukhman14 Jan