On Thu, May 15, 2014 at 11:01 AM, Johan De Meersman <vegivamp@stripped> wrote:
> ----- Original Message -----
>> From: "Larry Martell" <larry.martell@stripped>
>> Subject: Re: Performance boost by splitting up large table?
>> This table is queried based on requests from the users. There are 10
>> different lookup columns they can specify, and they can provide any or
> That makes it rather more of a bother, as MySQL can't (yet) skip columns in an index,
> as far as I'm aware. Someone please correct me if I'm wrong here.
>> all of these. Currently each one of the columns has an index on it.
> I'm a bit fuzzy on multiple-index (index merge) queries, but I believe support isn't all that sharp.
>> Would it be beneficial to create an index with all 10? Rarely are all
>> 10 specified in the query. Typically it's 3 or 4. Would it be
>> worthwhile to see which are most commonly specified and create an
>> index with just those? Or would it be better to put the commonly
>> selected columns on the index with the lookup columns?
> You may want to grab a day or week's worth of queries (either general_log with all
> the overhead and disk space that entails, or tcpdump) and pump that through
> pt-query-digest (Percona Toolkit) to see which combinations of fields are most often used,
> and add the necessary covering indices to help those queries.
We have a log going back over 2 years with over 200,000 queries, and
I've written Python scripts to report the frequency of the columns
pulled for the report and the columns used for filtering. Not
surprisingly, a very small number of each are used most of the time,
and a huge number are used just 1 or 2 times. I'll be digging into
that next week.
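For what it's worth, the counting side of that kind of analysis boils down to something like the sketch below. The column names and the one-statement-per-line log format are assumptions for illustration, not our real schema:

```python
from collections import Counter
import re

# Count which combinations of filter columns appear together in the
# WHERE clause, so the hottest combinations can get covering indexes.
# LOOKUP_COLS here is a placeholder for the real 10 lookup columns.
LOOKUP_COLS = {"col_a", "col_b", "col_c"}

def filter_combos(lines):
    combos = Counter()
    for line in lines:
        where = re.search(r"\bWHERE\b(.*)", line, re.IGNORECASE)
        if not where:
            continue
        used = frozenset(c for c in LOOKUP_COLS
                         if re.search(r"\b%s\b" % re.escape(c), where.group(1)))
        if used:
            combos[used] += 1
    return combos

queries = [
    "SELECT x FROM t WHERE col_a = 1 AND col_b = 2",
    "SELECT x FROM t WHERE col_a = 3",
    "SELECT x FROM t WHERE col_a = 5 AND col_b = 6",
]
for combo, n in filter_combos(queries).most_common():
    print(sorted(combo), n)
```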
> A few points to keep in mind during analysis:
> * order of fields in the where clause is largely irrelevant (although
> most-selective-first is preferred)
> * not all the fields in the index must be queried; but you MUST query a full prefix
> -> i.e., for index (a, b, c, d) a query on (a, b) can use the index, but for
> (a, c) only (a) will be used
> * every index constitutes a (small) performance penalty upon table updates, so don't
> go too wild either :-)
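That leftmost-prefix rule is easy to model. A toy Python sketch of the idea (not MySQL's actual optimizer, just the rule itself):

```python
def usable_prefix(index_cols, queried_cols):
    """Return the leftmost run of index columns that are all present in
    the query -- a toy model of MySQL's leftmost-prefix rule, where the
    usable part of a composite index stops at the first missing column."""
    prefix = []
    for col in index_cols:
        if col not in queried_cols:
            break
        prefix.append(col)
    return prefix

idx = ["a", "b", "c", "d"]
print(usable_prefix(idx, {"a", "b"}))  # ['a', 'b'] -- full prefix usable
print(usable_prefix(idx, {"a", "c"}))  # ['a']      -- stops at the missing b
print(usable_prefix(idx, {"b", "c"}))  # []         -- no leftmost column, no use
```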
> Also helpful:
> * plenty of memory, at least enough to keep the working set in memory
> * faster disks (SSD, or FusionIO things) help a lot for data that doesn't fit in
> memory
The db host has 250GB of memory and 40 cores; the db is 470GB, on
spinning 2.5" 15K RPM drives.
> * if you notice a tendency for multiple users (say, a dozen) to run identical
> queries (and that means /bitwise/ identical query text, down to the last space), the query
> cache might help. Don't make it too big, though; a couple of megabytes should suffice.