List:General Discussion« Previous MessageNext Message »
From:mos Date:July 8 2011 9:24pm
Subject:Re: MySQL and set complements
View as plain text  
Leonardo,
     What happens when you use "force index(user_id)" ?

See http://dev.mysql.com/doc/refman/5.1/en/index-hints.html

Mike

At 09:19 AM 7/8/2011, you wrote:
>Same as before, but with the new index listed in the possible keys:
>
>+----+-------------+-------+-------+----------------------------------------------+----------+---------+------------+--------+-------------+
>| id | select_type | table | type  | possible_keys
>      | key      | key_len | ref        | rows   | Extra       |
>+----+-------------+-------+-------+----------------------------------------------+----------+---------+------------+--------+-------------+
>|  1 | SIMPLE      | u     | index | NULL
>       | id_idx   | 5       | NULL       | 972064 | Using index |
>|  1 | SIMPLE      | a     | ref   |
>user_idx,email_idx,activity_idx,compound_idx | user_idx | 5       |
>getup.u.id |     20 | Using where |
>+----+-------------+-------+-------+----------------------------------------------+----------+---------+------------+--------+-------------+
>
>
>
>On Sat, Jul 9, 2011 at 12:00 AM, Johnny Withers <johnny@stripped>wrote:
>
> > What did the explain output look like after the new index?
> >
> >
> > On Fri, Jul 8, 2011 at 8:53 AM, Leonardo Borges <
> > leonardoborges.rj@stripped> wrote:
> >
> >> Hi Johnny,
> >>
> >> I just gave that a try but it didn't help as I suspected.
> >>
> >> I still believe the problem is in mysql not being able to handle set
> >> subtractions. Therefore, it has to perform the work harder to return the
> >> rows that represent a "no match" with NULL values in place so they can 
> then
> >> be filtered by the WHERE clause.
> >>
> >>
> >> This type of query seems to be a corner case in mysql one should be aware
> >> about when working with large datasets.
> >>
> >> Cheers,
> >> Leonardo Borges
> >> www.leonardoborges.com
> >>
> >>
> >> On Fri, Jul 8, 2011 at 11:18 PM, Johnny Withers 
> <johnny@stripped>wrote:
> >>
> >>> Leonardo,
> >>>
> >>> I think a new compound key on email_id and activity in the activities
> >>> table may help.
> >>>
> >>> I'm not sure if this will help or not, Its hard to test w/o having a
> >>> large data set to test against.
> >>>
> >>>
> >>> On Thu, Jul 7, 2011 at 9:07 PM, Leonardo Borges <
> >>> leonardoborges.rj@stripped> wrote:
> >>>
> >>>> Sure can:
> >>>>
> >>>> show create table activities;
> >>>>
> >>>> CREATE TABLE `activities` (
> >>>>   `id` int(11) NOT NULL AUTO_INCREMENT,
> >>>>   `user_id` int(11) DEFAULT NULL,
> >>>>   `email` varchar(100) DEFAULT NULL,
> >>>>   `country_iso` varchar(2) DEFAULT NULL,
> >>>>   `tags` varchar(255) DEFAULT NULL,
> >>>>   `postcode` int(11) DEFAULT NULL,
> >>>>   `activity` varchar(100) DEFAULT NULL,
> >>>>   `page_id` int(11) DEFAULT NULL,
> >>>>   `donation_frequency` varchar(100) DEFAULT NULL,
> >>>>   `email_id` int(11) DEFAULT NULL,
> >>>>   `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON
> UPDATE
> >>>> CURRENT_TIMESTAMP,
> >>>>   PRIMARY KEY (`id`),
> >>>>   KEY `user_idx` (`user_id`),
> >>>>   KEY `email_idx` (`email_id`),
> >>>>   KEY `activity_idx` (`activity`)
> >>>> ) ENGINE=MyISAM AUTO_INCREMENT=11331976 DEFAULT CHARSET=latin1
> >>>>
> >>>>
> >>>> And the explain:
> >>>>
> >>>>
> >>>> 
>
> +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+
> >>>> | id | select_type | table | type  | possible_keys                  
> |
> >>>> key      | key_len | ref        | rows   | Extra       |
> >>>>
> >>>> 
>
> +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+
> >>>> |  1 | SIMPLE      | u     | index | NULL                           
> |
> >>>> id_idx   | 5       | NULL       | 972064 | Using index |
> >>>> |  1 | SIMPLE      | a     | ref   | user_idx,email_idx,activity_idx
> |
> >>>> user_idx | 5       | getup.u.id |     20 | Using where |
> >>>>
> >>>> 
>
> +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+
> >>>>
> >>>>
> >>>> Cheers,
> >>>> Leonardo Borges
> >>>> www.leonardoborges.com
> >>>>
> >>>>
> >>>> On Fri, Jul 8, 2011 at 11:58 AM, Johnny Withers 
> <johnny@stripped>wrote:
> >>>>
> >>>>> Can you post show create table for activity and explain output
> of the
> >>>>> problem query?
> >>>>>
> >>>>> On Jul 7, 2011 8:51 PM, "Leonardo Borges"
> <leonardoborges.rj@stripped>
> >>>>> wrote:
> >>>>>
> >>>>> Hello everyone,
> >>>>>
> >>>>> I have an increasingly popular web application running on top of
> mysql
> >>>>> and
> >>>>> due to its popularity, I'm running into performance issues.
> After
> >>>>> carefully
> >>>>> examining database indexes and tuning queries I was able to pin
> down
> >>>>> the
> >>>>> slowest part of the system.
> >>>>>
> >>>>> The app's got a user segmentation tool that allows you to filter
> users
> >>>>> based
> >>>>> on a range of criteria from which the slowest is: "Select all
> users
> >>>>> that did
> >>>>> not receive the email of id 100"
> >>>>>
> >>>>> To answer this question we turn to the activities table, which
> is
> >>>>> basically
> >>>>> a denormalized log of actions taken by the user in this format:
> >>>>> user_id  | activity        | email_id | ...
> >>>>> 10         | email_sent   | 100        | ...
> >>>>> 10         | subscribed   | NULL     | ...
> >>>>> 10         | email_open  | 100        | ...
> >>>>>
> >>>>>
> >>>>> Given this table and the question above, the usual way of
> finding out
> >>>>> all
> >>>>> users who did not receive this email is through the use of a
> left outer
> >>>>> join, such as:
> >>>>>
> >>>>> select u.id
> >>>>> from users u
> >>>>> left outer join activities a
> >>>>> on u.id = a.user_id
> >>>>> and a.activity = 'email_sent'
> >>>>> and a.email_id = 100
> >>>>> where a.user_id is null
> >>>>>
> >>>>> That's all fine  for medium-ish tables. However our current
> activities
> >>>>> table
> >>>>> has over 13 million rows, slowing the hell out of this left
> outer join,
> >>>>> taking about 52 seconds in my machine.
> >>>>>
> >>>>> What this query is trying to do is to get the relative
> complement of
> >>>>> set
> >>>>> A(users) to B(activities). As far as I know mysql doesn't
> support set
> >>>>> subtraction, thus the reason for these queries being slow.
> >>>>>
> >>>>> Based on that I've setup a test database on Postgresql, which
> supports
> >>>>> this
> >>>>> very set operation and rewrote the query to look like this:
> >>>>>
> >>>>> select u.id
> >>>>> from users u
> >>>>> except
> >>>>> select a.user_id
> >>>>> from activities a
> >>>>> where a.activity = 'email_sent'
> >>>>> and a.email_id = 100;
> >>>>>
> >>>>> The fact that postgresql knows how to subtract sets brought this
> query
> >>>>> down
> >>>>> to only 4 seconds.
> >>>>>
> >>>>> My question then is: since this is a somewhat common query in
> our
> >>>>> system,
> >>>>> are there any workarounds I could use in mysql to improve
> things?
> >>>>>
> >>>>> I did find one myself, but it's a bit convoluted and might not
> perform
> >>>>> well
> >>>>> under load, but the following sql script gives me similar
> performance
> >>>>> in
> >>>>> mysql:
> >>>>>
> >>>>> create temporary table email_sent_100
> >>>>> select a.user_id
> >>>>> from user_activity_events a
> >>>>> where a.activity = 'email_sent'
> >>>>>
> >>>>>
> >>>>> and a.email_id = 100;
> >>>>>
> >>>>> create index user_id_idx on email_sent_100(user_id);   //this
> could
> >>>>> potentially bring the runtime down in the case of a larg temp
> table.
> >>>>>
> >>>>> select count(u.id)
> >>>>> from users u
> >>>>> left outer join email_sent_100 s
> >>>>>   on u.id = s.user_id
> >>>>>   and s.user_id is null;
> >>>>>
> >>>>> A lot more lines and a lot more complex, but does the job in
> this
> >>>>> example.
> >>>>>
> >>>>> I'd appreciate your thoughts.
> >>>>>
> >>>>> Cheers,
> >>>>> Leonardo Borges
> >>>>> www.leonardoborges.com
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> -----------------------------
> >>> Johnny Withers
> >>> 601.209.4985
> >>> johnny@stripped
> >>>
> >>
> >>
> >
> >
> > --
> > -----------------------------
> > Johnny Withers
> > 601.209.4985
> > johnny@stripped
> >

Thread
MySQL and set complementsLeonardo Borges8 Jul
  • Re: MySQL and set complementsJohnny Withers8 Jul
    • Re: MySQL and set complementsLeonardo Borges8 Jul
      • Re: MySQL and set complementsJohnny Withers8 Jul
        • Re: MySQL and set complementsLeonardo Borges8 Jul
          • Re: MySQL and set complementsJohnny Withers8 Jul
            • Re: MySQL and set complementsLeonardo Borges8 Jul
Re: MySQL and set complementsmos8 Jul
  • Re: MySQL and set complementsLeonardo Borges12 Jul