List:General Discussion« Previous MessageNext Message »
From:Larry Martell Date:May 16 2014 12:03am
Subject:Re: Performance boost by splitting up large table?
View as plain text  
On Thu, May 15, 2014 at 11:01 AM, Johan De Meersman <vegivamp@stripped> wrote:
> ----- Original Message -----
>> From: "Larry Martell" <larry.martell@stripped>
>> Subject: Re: Performance boost by splitting up large table?
>>
>> This table is queried based on requests from the users. There are 10
>> different lookup columns they can specify, and they can provide any or
>
> That makes it rather more of a bother, as MySQL can't (yet) skip columns in an index,
> as far as I'm aware. Someone please correct me if I'm wrong here.
>
>
>> all of these. Currently each one of the columns has an index on it.
>
> I'm a bit fuzzy on multiple-index queries, but I think support isn't all too sharp.
>
>
>> Would it be beneficial to create an index with all 10? Rarely are all
>> 10 specified in the query. Typically it's 3 or 4. Would it be
>> worthwhile to see which are much commonly specified and create an
>> index with just those? Or would it be better to put the commonly
>> selected columns on the index with the lookup columns?
>
>
> You may want to grab a day or week's worth of queries (either general_log with all
> the overhead and disk space that entails, or tcpdump) and pump that through
> pt-query-digest (Percona Toolkit) to see which combinations of fields are most often used,
> and add the necessary covering indices to help those queries.

We have a log going back over 2 years with over 200,000 queries, and
I've written python scripts to give the frequency of the column pulled
for the report and the column used for filtering. Not surprising, a
very small number of each are used most of the time, and a huge number
used just 1 or 2 times. I'll be digging into that next week.

> A few points to keep in mind during analysis:
>  * order of fields in the where clause is largely irrelevant (although
> most-selective-first is preferred)
>  * not all the fields in the index must be queried; but you MUST query a full prefix
> set
>    -> ie, for index (a, b, c, d) you can query (a, b) but for (a, c) only (a) will
> be used
>  * every index constitutes a (small) performance penalty upon table updates, so don't
> go too wild either :-)
>
> Also helpful:
>  * plenty of memory, at least enough to keep the working set in memory
>  * faster disks (SSD, or FusionIO things) helps bundles for data that doesn't fit in
> memory

The db host is running with 250GB memory, db size is 470GB, spinning
2.5” 15k rpm drives, and 40 cores.

>  * if you notice a tendency for multiple users (say, a dozen) to run identical
> queries (and that means /bitwise/ identical query text, down to the last space), the query
> cache might help. Don't make it too big, though, a couple of meg should suffice.
Thread
Performance boost by splitting up large table?Larry Martell14 May 2014
  • Re: Performance boost by splitting up large table?Sukhjinder K. Narula14 May 2014
  • Re: Performance boost by splitting up large table?Morgan Tocker14 May 2014
    • Re: Performance boost by splitting up large table?Johan De Meersman15 May 2014
      • Re: Performance boost by splitting up large table?Larry Martell15 May 2014
        • Re: Performance boost by splitting up large table?Johan De Meersman15 May 2014
          • Re: Performance boost by splitting up large table?Larry Martell16 May 2014