List:General Discussion« Previous MessageNext Message »
From:Jigal van Hemert Date:July 9 2004 12:17pm
Subject:Re: Cost of joins?
View as plain text  
From: "Margaret MacDonald" <scratch65536@stripped>

> Is there a generally-accepted rule of thumb for estimating the
> performance cost of joins?   I can't find one even in Date, but
> intuitively it seems as though there must be one by now.

It's hard to estimate the cost of a join as such. The performance is
influenced by the combination of database lay-out and the query.
If you know the queries you will perform you can optimize the database and
the query itself.

The MySQL documentation contains a lot of articles with directions for
optimizations.

> I'm working on a large information system --probably tens of millions
> of records-- and am guessing that reads should outnumber writes by a
> factor of 10 at least.   High performance from the human user's point
> of view is an absolute requirement for success.
>
> Rather than store category and other  'flavoring' strings repetitively
> in the information records themselves, I've stored indexes into other
> tables.  But that means every lookup joins 3-7 tables.  How much am I
> paying for that?

If the number of 'flavouring' strings are relatively low you can also decide
to do the translation from categoryID to category name in the program you
write around your queries.

> Adding the flavoring strings themselves to each record would increase
> the overall storage requirement by some large fraction less than 50%.
> Since disk space is already relatively cheap, that shouldn't be a
> problem.  But some of the flavoring is guaranteed to change at an
> unpredictable rate,  which would mean repeatedly traversing every
> information record to update it.

Disk space is not your only concern here. Indices are smaller for integer
data (ID numbers) and for integer data MySQL often does not have to access
the database itself if it uses the index for that column (which saves you
disk reads).
Also, smaller data files are easier to cache by the OS.

The best thing will be to see to it that you keep your data as small as
possible.

Analyse the queries using EXPLAIN.

A smaller record set in each step will produce faster queries (see estimated
number of records).

See if you can use indexes for each column in your query. You can make
indexes on two or more columns!

If you don't have any "using filesort" or "using temporary" in the explain
chances are the query is blazingly fast.

Try to avoid columns with very little distinct values (e.g. sex:
male/female) if you use these in your query and MySQL guesses that it will
result in more than apporx. 30% of the records it will perform a full table
scan instead of using the index. If you use such a low cardinality column in
a combined index MySQL can use the index again.

If necessary use USE INDEX or FORCE INDEX to make sure the right index is
used.

Optimize where clauses yourself by using "column operator constant" (e.g.
`birthdate` < NOW() - INTERVAL 16 YEAR). The constant part may be an
expression as long as the result is a constant value. This is a lot faster
than calculating the expression for each record in the table (e.g.
`birthdate` + INTERVAL 16 YEAR < NOW()). Use precalculated values in a
seperate column if necessary to prevent MySQL from having to perform
calculations on every record for a WHERE condition.

The speed of your query also depends on the size of the resulting record set
and the operations you will perform on that. Sorting 10 relatively small
records is done in memory and will be very fast. Sorting 100.000 records
without an index will most likely require a temporary table and wil take
'forever'.

Read all you can find in the MySQL documentation and perhaps take a look at
Jeremy Zawodny's "High Performance MySQL".

Regards, Jigal.

Thread
Cost of joins?Margaret MacDonald9 Jul
  • Re: Cost of joins?Jigal van Hemert9 Jul
  • Re: Cost of joins?Martijn Tonies9 Jul
    • Re: Cost of joins?Alec.Cawley9 Jul
    • RE: Cost of joins?Lachlan Mulcahy12 Jul
      • Re: Cost of joins?Martijn Tonies12 Jul
      • Re: query gets count wrongGerald Taylor12 Jul
  • Re: Cost of joins?Alec.Cawley9 Jul
  • Re: Cost of joins?Jigal van Hemert9 Jul
    • Re: Cost of joins?Jochem van Dieten9 Jul
  • Re: Cost of joins?Martijn Tonies9 Jul
    • Re: Cost of joins?Alec.Cawley9 Jul
      • Re: Cost of joins?Jochem van Dieten9 Jul
    • Re: Cost of joins?(Michael Johnson)9 Jul
  • Re: Cost of joins?Jigal van Hemert9 Jul
  • Re: Cost of joins?Martijn Tonies9 Jul
RE: Cost of joins?Bob)9 Jul
Re: Cost of joins?Martijn Tonies9 Jul
Re: Cost of joins?Martijn Tonies9 Jul
Re: query gets count wrongSGreen12 Jul