On 2004.06.03, Dan Nelson <dnelson@stripped> wrote:
> Consider multicolumn indexes; if you always "SELECT field1 from table
> where field2=blah", creating an index on (field2,field1) will let mysql
> bypass table lookups completely.
Our typical queries look something like:
SELECT level1, level2, level3, SUM(count1), SUM(count2)
FROM ...
WHERE level1 = 'value1'
AND level2 = 'value2'
AND level3 IS NULL
GROUP BY level1, level2, level3
In the real data, we have many more than 3 levels. Perhaps I should
have been including the count columns in the indexes as well -- hmm.
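Taking Dan's suggestion one step further and tacking the count columns
onto the end might let the whole aggregate be served from the index.
Just a sketch -- fact_table and the index name are placeholders for our
real schema:

```sql
-- Hypothetical covering index: the level columns we filter and group on
-- come first, then the count columns we SUM(), so the query can be
-- answered entirely from the index without touching the data rows.
CREATE INDEX ix_levels_counts
    ON fact_table (level1, level2, level3, count1, count2);
```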
> > Also, joins on this table are miserable since the BLOB columns make
> > MySQL use tmp disk for sorting instead of keeping everything in
> > memory.
> Unless you're selecting those blob fields, I don't think mysql will
> keep them during the join operation. I could be wrong though.
The problem is that our longest running queries are this shape:
INSERT INTO destination_table SELECT
level1, level2, ... levelN, SUM(count1), SUM(count2) ...
FROM source_merge_table
-- no WHERE clause
GROUP BY level1 ... etc
source_merge_table is a MERGE table that unions 12 or more underlying
MyISAM tables. (Once we go to 4.1, we can eliminate the MERGE tables
and use a derived table in the FROM clause with UNION ALL, but for now
in 4.0, we have to use MERGE tables.)
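For reference, the 4.0 setup and the 4.1 alternative look roughly like
this. A sketch only -- the part table names are hypothetical, and all
underlying tables must be identical MyISAM tables:

```sql
-- 4.0: MERGE table over the underlying partition tables
CREATE TABLE source_merge_table (
    level1 VARCHAR(32),
    level2 VARCHAR(32),
    -- ... more level columns ...
    count1 INT,
    count2 INT
) TYPE=MERGE UNION=(part01, part02, part03 /* ... part12 */);

-- 4.1: the MERGE table goes away; a derived table with UNION ALL
-- does the same job directly in the FROM clause:
--   INSERT INTO destination_table
--   SELECT level1, level2, SUM(count1), SUM(count2)
--   FROM (SELECT * FROM part01
--         UNION ALL SELECT * FROM part02
--         /* ... */) AS src
--   GROUP BY level1, level2;
```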
Yes, the BLOB columns are included in the SELECT clause. Since the
in-memory HEAP table type can't hold BLOBs, that forces the GROUP BY's
temporary table straight to tmpfile on disk.
> > I've got read_buffer_size and read_rnd_buffer_size cranked up to 64M,
> > which I'm hoping will make these full table scans do better -- I
> > could be wrong, though. Storage is not local disk but on
> > HBA-attached SAN.
> What's your I/O throughput (run "iostat -xcnz 2") during one of these
> full table scans? If you can get 100MB/sec, a scan of a 365MB file
> should take under 5 seconds.
I'll have to run iostat next time the query is going. Thanks for the
reminder, duh.
> > As I said, obvious steps to take are to rework the schema and
> > introduce composite/conjoint tables where possible and to move those
> > BLOB columns out of the main fact table ... but even then, there
> > should be "optimal" settings for a DB that generally does full table
> > scans on 2M rows ... a lot of the data can be kept in that 10 GB of
> > memory, if I could only force MySQL to use it: those BLOB columns are
> > probably killing me.
> With 10GB of RAM, Solaris should be caching your entire table in
> memory. You will still have the overhead of mysql reading the data
> from the OS cache but you should still get good performance. Make
> sure you're not mounting your filesystem with the forcedirectio
> option, which will disable the OS cache.
Well, what seems to happen is the data gets read from the MyISAM table,
gets written back out to disk in the tmpdir, and then when it's all done
gets placed in the destination table's .MYD file. It's that
back-and-forth to disk that I presume is eating most of the time.
Not sure what can be done about making it not go straight to tmpdir with
a BLOB column in the SELECT clause, though. Probably nothing, in 4.0.
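Probably the only real fix is the schema rework mentioned earlier: move
the BLOBs into a side table so the aggregate queries never touch them.
Roughly like this -- all table and column names here are hypothetical:

```sql
-- Hypothetical split: BLOBs live in a side table keyed by the fact
-- row's id. Aggregate queries read only fact_table, so the GROUP BY
-- temp table has no BLOB columns and can stay in memory (HEAP).
CREATE TABLE fact_blobs (
    fact_id INT NOT NULL PRIMARY KEY,
    payload BLOB
);
-- Fetch the BLOBs separately, only for the rows actually needed:
--   SELECT payload FROM fact_blobs WHERE fact_id IN (...);
```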
Dossy Shiobara mail: dossy@stripped
Panoptic Computer Network web: http://www.panoptic.com/
"He realized the fastest way to change is to laugh at your own
folly -- then you can let go and quickly move on." (p. 70)