List:General Discussion« Previous MessageNext Message »
From:Benjamin Pflugmann Date:June 25 1999 2:22am
Subject:Re: adding data is slow
View as plain text  
Hi.

On Thu, Jun 24, 1999 at 03:19:20PM -0700, jsd@stripped wrote:
> At 05:15 AM 6/24/99 , Sasha Pachev wrote:
> >> Dropping the indexes and recreating them after import is not a good
> >> solution as it takes about an hour to rebuild each index.
> >> 
> >> Selects are actually quite quick - it's just importing.  Is there any way
> >> to speed it up?
> >> 
> >
> >Drop all indeces before you import, then add them back when you are
> >done.
> 
[...]
> it's been running for almost an hour now and there is no end in sight.  and
> i need to do this for two indexes!  there's no way i can keep adding and
> removing the indexes every day.  surely there has to be a way to speed up
> indexing.  i am running mysqld with key_buffer=20M.

20MB key buffers do not fit the requirements of your indexes.

You wrote in your first mail:
> CREATE TABLE pageview (
>   url char(128) DEFAULT '' NOT NULL,
>   count mediumint(9) DEFAULT '0' NOT NULL,
>   date date DEFAULT '0000-00-00' NOT NULL,
>   site char(32) DEFAULT '' NOT NULL,
>   KEY urlidx (url),
>   KEY dateidx (date)
> );
> 
> I'm using it store web traffic, of course.  For each day, I store
> every URL requested and the number of requests.  Importing 30k rows
> when the table already has somewhere around 500k rows is incredibly
> slow - 40 minutes!  I load the data using the LOAD DATA INFILE command
> as recommended in the manual.

The problem is, with 500k rows and an indexes of 128 (+4?) bytes per
row your url index needs about 63MB. You key buffer should be about
70-80MB (considering the second index, too) to have a reasonable
buffer for your indexes for 500k rows. You have to increase that
proportially with your table growth.

Though, a better solution is to decrease the needed buffer size by
decreasing the size of your indexes. MySQL works fine with partial
indexes as long as they are still quite different for the field. For
example, the indexes for email fields in my databases are only

INDEX emailidx (email(10))

which makes INSERTs quite fast and even SELECTs are executed faster
because the index file is faster to load.

If you can do the same with your URLs depends on whether they
distinguish enough in the first letters (e.g. if they all start with
"http://your.domain.com..." you should not limit the index on the
first 10 characters or throw away the common part ;-)

If you have a lot of common URLs (e.g. one website I handle has only
about 50 different URLs, the rest is done with parameters to CGIs),
then I strongly second the suggestion of Daniel and you should
normalize your table, so you would only have an id for your URLs which
would result in an index size of only about 1MB for your table (and
also reduce the size of your database).

If none of that fits to your problem, give some more information about
the data in your table.

Bye,

        Benjamin.

Attachment: [application/pgp-signature]
Thread
adding data is slowJon Drukman24 Jun
  • Re: adding data is slowSasha Pachev24 Jun
    • Re: adding data is slowJon Drukman25 Jun
      • Re: adding data is slowBenjamin Pflugmann25 Jun
        • Re: adding data is slowJon Drukman25 Jun
        • Yet Another Optimization QuestionJon Drukman28 Jun
  • Re: adding data is slowDaniel Koch25 Jun
  • adding data is slowMichael Widenius29 Jun
Re: Yet Another Optimization QuestionKevin Smith29 Jun
Re: Yet Another Optimization QuestionJon Drukman29 Jun
  • Re: Yet Another Optimization QuestionBenjamin Pflugmann30 Jun
    • Re: Yet Another Optimization QuestionJon Drukman30 Jun
Re: Yet Another Optimization QuestionKevin Smith29 Jun
Re: Yet Another Optimization QuestionJon Drukman29 Jun
  • Re: Yet Another Optimization QuestionTõnu Samuel29 Jun
    • Re: Yet Another Optimization QuestionJon Drukman29 Jun
      • Re: Yet Another Optimization QuestionMichael Widenius30 Jun
        • Re: Yet Another Optimization QuestionJon Drukman30 Jun
RE: Yet Another Optimization QuestionNigel Parker30 Jun
  • RE: Yet Another Optimization QuestionMichael Widenius30 Jun
Re: Yet Another Optimization Question(David Sklar)1 Jul
RE: Yet Another Optimization Questiontony2 Jul
  • RE: Yet Another Optimization QuestionMichael Widenius2 Jul
  • slow joinsJon Drukman7 Oct