In the last episode (Jun 13), jeffrey gage said:
> so the db is very simple - two fields - id(int) and word(varchar).
> I have the word column indexed with 8 characters of depth.
> so say we have a query that is very common:
> select count(*) where word = 'computers';
> I get maybe 150,000 results, but the count takes about 80-120 seconds
> when you give a new common word the old counts are cached and if
> executed again are instantaneous.
> So I am thinking that count(*) should be really fast, even with
> 150,000 records on an indexed column of single words, but it takes
> around two minutes for a query with many results.
Every key in your index is at most 12 bytes, and will probably average
8 bytes due to index compression. So to get a count that returns
150000, mysql will probably have to scan 150000*8 = 1.2 MB of index
Mysql uses 1024-byte key blocks (way too small imho), so that means at
least 1200 separate I/O operations to fetch the data. This isn't so
bad, unless your index is fragmented, which would mean quite a few
random I/Os. Even a random seek per index block won't add up to 120
As a comparison, I have a small database that logs all the traffic on a
high-traffic IRC channel. I have a "words" table, which is just word
and a unique ID, a "lines" table for the actual channel content, and a
"lines_words" linking table. A search for the word "the"
select count(*) from lines_words lw, words w
where lw.wordid=w.id and w.word="the";
returns the number 436616, and takes 14 seconds. The same query run
immediately afterwards (so the index is in memory) takes 8 seconds. So
that's 6 seconds to read from the index, and 8 seconds to calculate the
result (P6/200, 512MB RAM, FreeBSD 5, mysql 3.23.18). The lines_words
linking table has 23 million entries in it and the .MYI file is 200 MB.
You might want to try ktracing mysqld during the query. Run "ktrace -p
<mysqldpid>", run your query, run "ktrace -C", then run "kdump -T -m16
| grep pread". Index reads will show up as calls with the 3rd argument
0x400. Keep an eye on both the second argument (file offset), and the
elapsed time of the syscall (on the RET lines). If the offset jumps
around, you can optimize your index with myisamchk. If the offset
increments steadily but the elapsed time is still high (over 0.0001),
you might want to get a faster disk :)