At 02:53 AM 10/2/2009, Joerg Bruehe wrote:
>Hi Bennett, all!
>Bennett Haselton wrote:
> > At 08:24 AM 9/25/2009, Dan Nelson wrote:
> >> In the last episode (Sep 25), Bennett Haselton said:
> >> > I have a script that runs several times in the evening, and on
> >> > each run it adds several thousand entries to a table.
> >> >
> >> > On the first run, it adds the entries rather slowly. But then on
> >> > all subsequent runs (usually about a minute or two later), the
> >> > many inserts go a lot faster. This is true regardless of how many
> >> > entries are added by each run -- whether the first and second run
> >> > both add 50,000 or the first and second run both add 10,000, the
> >> > first run goes slowly and the second one goes fast. But by the
> >> > following evening, the first run is back to going slowly again.
> >> >
> >> > It's as if in the minute or two following the first run of the
> >> > evening, MySQL catches its breath and realizes, hey, that table
> >> > is getting a lot of entries added to it, so it waves some magic
> >> > dust so that the next time I add a lot of entries, it goes a lot
> >> > faster. (Hope I'm not losing anybody with the technical
> >> > terminology here.) Then by the next evening the optimization
> >> > parameter has exp^W^W^W^W the fairy dust has worn off.
> >> More likely, this is a relatively unused table, and the first batch
> >> of inserts pulls most of the index and some of the table data into
> >> RAM, which makes for much faster lookups on the next run. What do
> >> top and iostat stats show on both runs? I'd expect heavy disk usage
> >> and little CPU on the first run, and light disk and heavier CPU
> >> usage on the second.
> > That's interesting, I can look at that next time I try it. But if
> > that's the case, wouldn't the first run go slowly at first, but then
> > pick up speed once all of the indexes etc. have been pulled into
> > memory? Because that's not what I'm seeing -- if I insert 50,000 in
> > the first run, it's slow all the way through, but then the second
> > 50,000 get inserted quickly.
>Your "fairy dust" is called "access pattern", evaluated by a LRU or
>Don't forget you may have caching on two levels: database and operating
>system. Both have their own cache aging mechanisms.
>The details of caching and its effects will vary with the table handler
>you are using; MyISAM structures and policies definitely differ from
>InnoDB ones.
>Even if MySQL did not cache data and index pages, they would still
>reside in the operating system's file I/O cache, so the next access to
>them will be faster than the first one - regardless of whether you read
>them or modify them.
>However, sooner or later they will be evicted from all caches, because
>they are not accessed again until the next evening while other pages
>are accessed and need the space in RAM.
>(Here, I ignore the case where RAM is larger than all the data accessed
>in a day; it is too unlikely.)
>In the evening, when your job is run again, this starts anew.
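(Side note: if what's going on is the cache warming that Dan and Joerg
describe, I suppose I could watch MySQL's own cache-miss counters before
and after each run instead of top/iostat. A rough sketch -- the table
name below is just a placeholder for mine, and this assumes a reasonably
recent 5.x server:

    -- check which storage engine the table actually uses
    SHOW TABLE STATUS LIKE 'my_table';
    -- MyISAM: Key_reads (went to disk) vs. Key_read_requests (total)
    SHOW GLOBAL STATUS LIKE 'Key_read%';
    -- InnoDB equivalents, if that's the engine
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

If Key_reads climbs quickly relative to Key_read_requests during the
first run but stays nearly flat during the second, that would fit the
disk-vs-RAM explanation.)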
I understand that, but here's the problem:
If the speed increase were just the result of values being recently
accessed, then the speed increase should be a function of the number of
inserts I've already done. So if I insert 10,000 rows in one script run
and then 10,000 rows in a second script run, and the second run is a lot
faster, then the first 10,000 inserts were enough to optimize
everything. That should mean that if I do 50,000 inserts in a single
script run, the first 10,000 of them should be enough to speed up the
remaining 40,000.
But that's not what I'm seeing. If I do 10,000 inserts on the first run
and 10,000 on the second, the first run is slow and the second run is
fast. On the other hand, if I do 50,000 inserts on the first run and
50,000 on the second, the *entire* first run is slow and the entire
second run is fast -- even the tail end of the first run, which comes
after far more than 10,000 rows have already been inserted.
In any case, is there anything I can do to manually force MySQL to
pre-optimize the entire table (even if that takes some time -- reading
the indexes into memory or whatever), other than kludgy solutions like
doing a dummy insert of several thousand rows and then deleting them?
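For instance, if the table turns out to be MyISAM, would a scheduled
pre-warming step just before the first evening run do the trick? A guess
at what that might look like -- untested, the table name again a
placeholder, and I gather LOAD INDEX INTO CACHE only preloads the
indexes, not the data file:

    -- MyISAM: preload the table's indexes into the key cache
    LOAD INDEX INTO CACHE my_table;
    -- InnoDB (no preload statement as far as I know): a full scan
    -- to pull index and data pages into the buffer pool
    SELECT COUNT(*) FROM my_table;

Or is there a better-supported way to do this?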