List: General Discussion
From: Wm Mussatto
Date: October 2 2009 8:18pm
Subject: Re: many-inserts go faster the second time

On Fri, October 2, 2009 12:28, Bennett Haselton wrote:
> At 02:53 AM 10/2/2009, Joerg Bruehe wrote:
>>Hi Bennett, all!
>>
>>
>>Bennett Haselton wrote:
>> > At 08:24 AM 9/25/2009, Dan Nelson wrote:
>> >> In the last episode (Sep 25), Bennett Haselton said:
>> >> > I have a script that runs several times in the evening, and on each run it
>> >> > adds several thousand entries to a table.
>> >> >
>> >> > On the first run, it adds the entries rather slowly.  But then on all
>> >> > subsequent runs (usually about a minute or two later), the many inserts go
>> >> > a lot faster.  This is true regardless of how many entries are added by
>> >> > each run -- whether the first and second run both add 50,000 or the first
>> >> > and second run both add 10,000, the first run goes slowly and the second
>> >> > one goes fast.  But by the following evening, the first run is back to
>> >> > going slowly again.
>> >> >
>> >> > It's as if in the minute or two following the first run of the script,
>> >> > MySQL catches its breath and realizes, hey, that table is getting a lot of
>> >> > entries added to it, so it waves some magic dust so that the next time I
>> >> > add a lot of entries, it goes a lot faster.  (Hope I'm not losing anybody
>> >> > with the technical terminology here.) Then by the next evening the
>> >> > optimization parameter has exp^W^W^W^W the fairy dust has worn off.
>> >>
>> >> More likely, this is a relatively unused table, and the first batch of
>> >> inserts pulls most of the index and some of the table data into RAM, which
>> >> makes for much faster lookups on the next run.  What do top and iostat stats
>> >> show on both runs?  I'd expect heavy disk usage and little CPU on the first
>> >> run, and light disk and heavier CPU usage on the second.
>> >
>> > That's interesting, I can look at that next time I try it.  But if
>> > that's the case, wouldn't the first run go slowly at first, but then
>> > pick up speed once all of the indexes etc. have been pulled into
>> > memory?  Because that's not what I'm seeing -- if I insert 50,000 in the
>> > first run, it's slow all the way through, but then the second 50,000 get
>> > inserted quickly.
>>
>>Your "fairy dust" is called "access pattern", evaluated by a LRU or
>>similar policy.
>>
>>Don't forget you may have caching on two levels: database and operating
>>system. Both have their own cache aging mechanisms.
>>The details about caching and its effects will vary by the table handler
>>you are using, MyISAM structures and policies definitely from InnoDB
>>ones.
>>
>>Even if MySQL would not cache data and index pages, they would still
>>reside in the operating system's file I/O cache, so the next access to
>>them will be faster than the first one - regardless whether you read
>>them or modify them.
>>
>>However, sooner or later they will be removed from all caches because
>>they are not accessed until the next evening, whereas other pages were
>>accessed and needed space in RAM.
>>(Here, I ignore the case of a RAM which is larger than all data accessed
>>for a day, it is too unlikely.)
>>In the evening, when your job is run again, this starts anew.
>
> I understand that, but here's the problem:
>
> If the speed increase were just the result of values being recently
> accessed, then the speed increase should be a function of the number
> of inserts that I've already done.  So if I insert 10,000 rows in one
> script run and then 10,000 rows in a second script run, and the
> second script run is a lot faster, then the first 10,000 inserts were
> enough to optimize everything.  So that should mean if I do 50,000
> inserts in a single script run, then the first 10,000 inserts should
> be enough to speed everything up.
>
> But that's not what I'm seeing.  What I'm seeing is that if I do
> 10,000 inserts on the first run and 10,000 on the second, then the
> first run is slow and the second run is fast.  On the other hand if I
> do 50,000 inserts on the first run and 50,000 on the second, then the
> entire first run is slow and the entire second run is fast.
>
> In any case, is there anything I can do to force MySQL to manually
> pre-optimize the entire table (even if it takes some time to do so,
> reading indexes into memory or whatever), other than kludgy solutions
> like doing a dummy insert of several thousand rows and then deleting them?
>
>         -Bennett

Maybe not dummies but rather a subset of the first group.
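Or skip the re-inserting altogether: what you seem to want is a way to warm the caches before the evening job starts. A rough sketch of what that might look like on a 5.0/5.1 server (my_big_table and my_indexed_col are made-up names, substitute your own):

    -- MyISAM only: preload the table's index blocks into the key cache.
    LOAD INDEX INTO CACHE my_big_table;

    -- InnoDB has no preload statement in these versions; the usual trick
    -- is a scan.  COUNT(*) walks whichever index the optimizer picks,
    -- pulling those pages into the buffer pool and the OS file cache.
    SELECT COUNT(*) FROM my_big_table;

    -- Counting an indexed column may make the optimizer scan that
    -- secondary index as well, warming the pages your inserts will touch.
    SELECT COUNT(my_indexed_col) FROM my_big_table;

This is only cache warming, not an optimization of the table itself, so the effect wears off overnight once other activity pushes those pages out again - exactly the aging Joerg described.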

Thread
many-inserts go faster the second time - Bennett Haselton, 25 Sep
Re: many-inserts go faster the second time - Dan Nelson, 25 Sep
  • Re: many-inserts go faster the second time - Bennett Haselton, 25 Sep
Re: many-inserts go faster the second time - Joerg Bruehe, 2 Oct
  • Re: many-inserts go faster the second time - Bennett Haselton, 2 Oct
Re: many-inserts go faster the second time - Wm Mussatto, 2 Oct
Re: many-inserts go faster the second time - Joerg Bruehe, 6 Oct