List:General Discussion« Previous MessageNext Message »
From:Bennett Haselton Date:October 2 2009 7:28pm
Subject:Re: many-inserts go faster the second time
View as plain text  
At 02:53 AM 10/2/2009, Joerg Bruehe wrote:
>Hi Bennett, all!
>
>
>Bennett Haselton wrote:
> > At 08:24 AM 9/25/2009, Dan Nelson wrote:
> >> In the last episode (Sep 25), Bennett Haselton said:
> >> > I have a script that runs several times in the evening, and on each
> >> run it
> >> > adds several thousand entries to a table.
> >> >
> >> > On the first run, it adds the entries rather slowly.  But then 
> on all
> >> > subsequent runs (usually about a minute or two later), the many
> >> inserts go
> >> > a lot faster.  This is true regardless of how many entries are 
> added by
> >> > each run -- whether the first and second run both add 50,000 or the
> >> first
> >> > and second run both add 10,000, the first run goes slowly and the
> >> second
> >> > one goes fast.  But by the following evening, the first run is 
> back to
> >> > going slowly again.
> >> >
> >> > It's as if in the minute or two following the first run of the 
> script,
> >> > MySQL catches its breath and realizes, hey, that table is getting a
> >> lot of
> >> > entries added to it, so it waves some magic dust so that the next
> >> time I
> >> > add a lot of entries, it goes a lot faster.  (Hope I'm not losing
> >> anybody
> >> > with the technical terminology here.) Then by the next evening the
> >> > optimization parameter has exp^W^W^W^W the fairy dust has worn off.
> >>
> >> More likely, this is a relatively unused table, and the first batch of
> >> inserts pulls most of the index and some of the table data into RAM,
> >> which
> >> makes for much faster lookups on the next run.  What do top and iostat
> >> stats
> >> show on both runs?  I'd expect heavy disk usage and little CPU on the
> >> first
> >> run, and light disk and heavier CPU usage on the second.
> >
> > That's interesting, I can look at that next time I try it.  But if
> > that's the case, wouldn't the first run go slowly at first, but then
> > pick up speed once all of the indexes etc. have been pulled into
> > memory?  Because that's not what I'm seeing -- if I insert 50,000 
> in the
> > first run, it's slow all the way through, but then the second 
> 50,000 get
> > inserted quickly.
>
>Your "fairy dust" is called "access pattern", evaluated by a LRU or
>similar policy.
>
>Don't forget you may have caching on two levels: database and operating
>system. Both have their own cache aging mechanisms.
>The details about caching and its effects will vary by the table handler
>you are using, MyISAM structures and policies definitely from InnoDB ones.
>
>Even if MySQL would not cache data and index pages, they would still
>reside in the operating system's file I/O cache, so the next access to
>them will be faster than the first one - regardless whether you read
>them or modify them.
>
>However, sooner or later they will be removed from all caches because
>they are not accessed until the next evening, whereas other pages were
>accessed and needed space in RAM.
>(Here, I ignore the case of a RAM which is larger than all data accessed
>for a day, it is too unlikely.)
>In the evening, when your job is run again, this starts anew.

I understand that, but here's the problem:

If the speed increase were just the result of values being recently 
accessed, then the speed increase should be a function of the number 
of inserts that I've already done.  So if I insert 10,000 rows in one 
script run and then 10,000 rows in a second script run, and the 
second script run is a lot faster, then the first 10,000 inserts were 
enough to optimize everything.  So that should mean if I do 50,000 
inserts in a single script run, then the first 10,000 inserts should 
be enough to speed everything up.

But that's not what I'm seeing.  What I'm seeing is that if I do 
10,000 inserts on the first run and 10,000 on the second, then the 
first run is slow and the second run is fast.  On the other hand if I 
do 50,000 inserts on the first run and 50,000 on the second, then the 
entire first run is slow and the entire second run is fast.

In any case, is there anything I can do to force MySQL to manually 
pre-optimize the entire table (even if it takes some time to do so, 
reading indexes into memory or whatever), other than kludgy solutions 
like doing a dummy insert of several thousand rows and then deleting them?

         -Bennett 

Thread
many-inserts go faster the second timeBennett Haselton25 Sep
Re: many-inserts go faster the second timeDan Nelson25 Sep
  • Re: many-inserts go faster the second timeBennett Haselton25 Sep
Re: many-inserts go faster the second timeJoerg Bruehe2 Oct
  • Re: many-inserts go faster the second timeBennett Haselton2 Oct
Re: many-inserts go faster the second timeWm Mussatto2 Oct
Re: many-inserts go faster the second timeJoerg Bruehe6 Oct