Hi Bennett, all,
sorry about the delay, I was somewhat busy.
Bennett Haselton wrote:
> At 02:53 AM 10/2/2009, Joerg Bruehe wrote:
>> Hi Bennett, all!
>> Bennett Haselton wrote:
>> > [[...]]
>> Your "fairy dust" is called "access pattern", evaluated by an LRU or
>> similar policy.
>> Don't forget you may have caching on two levels: database and operating
>> system. Both have their own cache aging mechanisms.
>> The details of caching and its effects will vary with the table handler
>> you are using; MyISAM's structures and policies definitely differ from
>> InnoDB's.
>> Even if MySQL did not cache data and index pages, they would still
>> reside in the operating system's file I/O cache, so the next access to
>> them will be faster than the first one - regardless of whether you read
>> them or modify them.
>> However, sooner or later they will be removed from all caches because
>> they are not accessed until the next evening, whereas other pages were
>> accessed and needed space in RAM.
>> (Here, I ignore the case where RAM is larger than all data accessed
>> in a day; it is too unlikely.)
>> In the evening, when your job is run again, this starts anew.
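To make the aging I described concrete, here is a toy LRU sketch (Python, purely illustrative - capacity and page numbers are made up, and a real database/OS cache is far more elaborate). It shows why a second identical run hits the cache while the next evening starts cold again:

```python
# Toy LRU cache: pages touched by the evening batch are hot afterwards,
# but a day of unrelated traffic evicts them before the next evening.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def access(self, page):
        """Return True on a cache hit, False on a miss (a 'disk read')."""
        hit = page in self.pages
        if hit:
            self.pages.move_to_end(page)     # mark as most recently used
        else:
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict least recently used
        return hit

cache = LRUCache(capacity=100)
batch_pages = range(50)                       # pages the insert batch touches

first_run = [cache.access(p) for p in batch_pages]   # cold: all misses
second_run = [cache.access(p) for p in batch_pages]  # warm: all hits

for p in range(1000, 1200):                   # daytime traffic on other pages
    cache.access(p)                           # evicts the batch's pages

next_evening = [cache.access(p) for p in batch_pages]  # cold again
```

So both the "second run is fast" and the "tomorrow it is slow again" observations fall out of the same eviction policy.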
> I understand that, but here's the problem:
> If the speed increase were just the result of values being recently
> accessed, then the speed increase should be a function of the number of
> inserts that I've already done. So if I insert 10,000 rows in one
> script run and then 10,000 rows in a second script run, and the second
> script run is a lot faster, then the first 10,000 inserts were enough to
> optimize everything. So that should mean if I do 50,000 inserts in a
> single script run, then the first 10,000 inserts should be enough to
> speed everything up.
That reasoning sounds valid, but I guess it is incomplete:
Which pages (data and index) would the first 10,000 row insert touch,
and which does the 50,000 row insert touch?
If your rows are ordered by some criterion (maybe not intentionally, just
by the way you get them), then it is possible that the first part of a
batch touches only part of the data pages, so the last part does not
benefit from the pages the first part warmed.
Yes, this is speculation. Also, I don't know the in-memory or on-disk
data structures well enough to prove my idea.
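A small sketch of that speculation (Python; the page size of 100 keys and the flat key-to-page mapping are arbitrary assumptions, not how InnoDB or MyISAM actually lay out pages): if 50,000 keys arrive in order, the first 10,000 rows touch a range of pages that is completely disjoint from the pages the last 10,000 rows need.

```python
# If rows arrive ordered by key and each page holds PAGE_SIZE adjacent
# keys, the first part of a batch cannot warm the pages the last part
# will touch. Purely illustrative; real page layouts differ.
PAGE_SIZE = 100  # keys per page - an arbitrary assumption

def pages_touched(keys):
    return {k // PAGE_SIZE for k in keys}

batch = list(range(50_000))                  # 50,000 ordered keys
first_part = pages_touched(batch[:10_000])   # pages 0..99
last_part = pages_touched(batch[-10_000:])   # pages 400..499

overlap = first_part & last_part             # empty: no warming carries over
```

Under that assumption, the first 10,000 inserts of a 50,000-row batch would not speed up the rest of the same batch at all, which would match what you observe.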
> But that's not what I'm seeing. What I'm seeing is that if I do 10,000
> inserts on the first run and 10,000 on the second, then the first run is
> slow and the second run is fast. On the other hand if I do 50,000
> inserts on the first run and 50,000 on the second, then the entire first
> run is slow and the entire second run is fast.
Also, are you sure there is no action at the end of your insert batch
run that finishes the accesses, so that some aspect is still missing as
long as it has not been done?
> In any case, is there anything I can do to force MySQL to manually
> pre-optimize the entire table (even if it takes some time to do so,
> reading indexes into memory or whatever), other than kludgy solutions
> like doing a dummy insert of several thousand rows and then deleting them?
If I knew one, I would tell you, but I have no knowledge there.
You might try a "SELECT *" or a "SELECT AVG(some_field)", but I won't
claim this helps.
I would not do any write manipulation; the risk of missing the cleanup
is simply too high (and writing is more effort - reading should be
enough, I hope).
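To show the shape of the read-only warm-up I mean, here is a sketch (Python; sqlite3 stands in for MySQL purely so the snippet runs, and the table/column names `t` and `some_field` are made up). The point is that a full scan and an aggregate touch the data without modifying anything, so there is no cleanup to forget:

```python
# Read-only warm-up sketch: scan the table before the batch, hoping the
# pages end up in the database/OS caches. sqlite3 is only a stand-in
# for MySQL here; no claim that this warms MySQL's caches the same way.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, some_field REAL)")
conn.executemany("INSERT INTO t (some_field) VALUES (?)",
                 [(float(i),) for i in range(1000)])

# Warm-up reads: nothing is written, so nothing needs cleaning up.
avg = conn.execute("SELECT AVG(some_field) FROM t").fetchone()[0]
rows = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

Whether such reads actually pull in the same pages your inserts need is exactly the open question from above - I make no claim either way.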
Joerg Bruehe, MySQL Build Team, Joerg.Bruehe@stripped
Sun Microsystems GmbH, Komturstraße 18a, D-12099 Berlin
Managing Directors: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Chairman of the Supervisory Board: Martin Haering; Munich: HRB161028