In the last episode (Jun 03), Dossy said:
> On 2004.06.03, Dan Nelson <dnelson@stripped> wrote:
> > Consider multicolumn indexes; if you always "SELECT field1 from
> > table where field2=blah", creating an index on (field2,field1) will
> > let mysql bypass table lookups completely.
> Our typical queries look something like:
> SELECT level1, level2, level3, SUM(count1), SUM(count2)
> FROM table
> WHERE level1 = 'value1'
> AND level2 = 'value2'
> AND level3 IS NULL
> GROUP BY level1, level2, level3
> In the real data, we have many more than 3 levels. Perhaps I should
> have been including the count columns in the indexes as well -- hmm.
Yes, that would help. Make sure your WHERE clause fields come first in
the index. Assuming you have an index on (lavel1,level2,level3) and
the number of matching rows is relatively low compared to the total
record volume, though, there's no reason mysql should be doing full
table scans here.
> > > Also, joins on this table are miserable since the BLOB columns
> > > make MySQL use tmp disk for sorting instead of keeping everything
> > > in memory.
> > Unless you're selecting those blob fields, I don't think mysql will
> > keep them during the join operation. I could be wrong though.
> The problem is that our longest running queries are this shape:
> INSERT INTO destination_table SELECT
> level1, level2, ... leveN, SUM(count1), SUM(count2) ...
> FROM source_merge_table
> -- no WHERE clause
> GROUP BY level1 ... etc
> source_merge_table is a MERGE table type that can union 12 or more
> tables. (Once we go to 4.1, we can eliminate the MERGE tables and
> use a derived table in the FROM clause with UNION ALL, but for now
> in 4.0, we have to use MERGE tables.)
> Yes, the BLOB columns are included in the SELECT clause. This forces
> the query to go straight to tmpfile on disk.
Since you're GROUPing, how to you pull the blob field? MIN, MAX, and
GROUP_CONCAT are the only aggregate functions I can think of that even
make sense on BLOB columns.
> Well, what seems to happen is the data gets read from the MyISAM
> table, gets written back out to disk in the tmpdir, then when it's
> all done, gets actually placed in the .MYD file -- it's the
> back-and-forthing to disk that I presume is eating a lot of the time.
> Not sure what can be done about making it not go straight to tmpdir
> with a BLOB column in the SELECT clause, though. Probably nothing,
> in 4.0.
Do you actually see a temp file being created? With tmp_table_size set
to 2gb, it shouldn't have to go to disk. Some more interesting data
would be the State column from "show processlist" during the query,
every 10 seconds or so.