Leonardo,
What happens when you use "force index(user_id)" ?
See http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
Mike
At 09:19 AM 7/8/2011, you wrote:
>Same as before, but with the new index listed in the possible keys:
>
>+----+-------------+-------+-------+----------------------------------------------+----------+---------+------------+--------+-------------+
>| id | select_type | table | type | possible_keys
> | key | key_len | ref | rows | Extra |
>+----+-------------+-------+-------+----------------------------------------------+----------+---------+------------+--------+-------------+
>| 1 | SIMPLE | u | index | NULL
> | id_idx | 5 | NULL | 972064 | Using index |
>| 1 | SIMPLE | a | ref |
>user_idx,email_idx,activity_idx,compound_idx | user_idx | 5 |
>getup.u.id | 20 | Using where |
>+----+-------------+-------+-------+----------------------------------------------+----------+---------+------------+--------+-------------+
>
>
>
>On Sat, Jul 9, 2011 at 12:00 AM, Johnny Withers <johnny@stripped>wrote:
>
> > What did the explain output look like after the new index?
> >
> >
> > On Fri, Jul 8, 2011 at 8:53 AM, Leonardo Borges <
> > leonardoborges.rj@stripped> wrote:
> >
> >> Hi Johnny,
> >>
> >> I just gave that a try but it didn't help as I suspected.
> >>
> >> I still believe the problem is in mysql not being able to handle set
> >> subtractions. Therefore, it has to perform the work harder to return the
> >> rows that represent a "no match" with NULL values in place so they can
> then
> >> be filtered by the WHERE clause.
> >>
> >>
> >> This type of query seems to be a corner case in mysql one should be aware
> >> about when working with large datasets.
> >>
> >> Cheers,
> >> Leonardo Borges
> >> www.leonardoborges.com
> >>
> >>
> >> On Fri, Jul 8, 2011 at 11:18 PM, Johnny Withers
> <johnny@stripped>wrote:
> >>
> >>> Leonardo,
> >>>
> >>> I think a new compound key on email_id and activity in the activities
> >>> table may help.
> >>>
> >>> I'm not sure if this will help or not, Its hard to test w/o having a
> >>> large data set to test against.
> >>>
> >>>
> >>> On Thu, Jul 7, 2011 at 9:07 PM, Leonardo Borges <
> >>> leonardoborges.rj@stripped> wrote:
> >>>
> >>>> Sure can:
> >>>>
> >>>> show create table activities;
> >>>>
> >>>> CREATE TABLE `activities` (
> >>>> `id` int(11) NOT NULL AUTO_INCREMENT,
> >>>> `user_id` int(11) DEFAULT NULL,
> >>>> `email` varchar(100) DEFAULT NULL,
> >>>> `country_iso` varchar(2) DEFAULT NULL,
> >>>> `tags` varchar(255) DEFAULT NULL,
> >>>> `postcode` int(11) DEFAULT NULL,
> >>>> `activity` varchar(100) DEFAULT NULL,
> >>>> `page_id` int(11) DEFAULT NULL,
> >>>> `donation_frequency` varchar(100) DEFAULT NULL,
> >>>> `email_id` int(11) DEFAULT NULL,
> >>>> `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON
> UPDATE
> >>>> CURRENT_TIMESTAMP,
> >>>> PRIMARY KEY (`id`),
> >>>> KEY `user_idx` (`user_id`),
> >>>> KEY `email_idx` (`email_id`),
> >>>> KEY `activity_idx` (`activity`)
> >>>> ) ENGINE=MyISAM AUTO_INCREMENT=11331976 DEFAULT CHARSET=latin1
> >>>>
> >>>>
> >>>> And the explain:
> >>>>
> >>>>
> >>>>
>
> +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+
> >>>> | id | select_type | table | type | possible_keys
> |
> >>>> key | key_len | ref | rows | Extra |
> >>>>
> >>>>
>
> +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+
> >>>> | 1 | SIMPLE | u | index | NULL
> |
> >>>> id_idx | 5 | NULL | 972064 | Using index |
> >>>> | 1 | SIMPLE | a | ref | user_idx,email_idx,activity_idx
> |
> >>>> user_idx | 5 | getup.u.id | 20 | Using where |
> >>>>
> >>>>
>
> +----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+
> >>>>
> >>>>
> >>>> Cheers,
> >>>> Leonardo Borges
> >>>> www.leonardoborges.com
> >>>>
> >>>>
> >>>> On Fri, Jul 8, 2011 at 11:58 AM, Johnny Withers
> <johnny@stripped>wrote:
> >>>>
> >>>>> Can you post show create table for activity and explain output
> of the
> >>>>> problem query?
> >>>>>
> >>>>> On Jul 7, 2011 8:51 PM, "Leonardo Borges"
> <leonardoborges.rj@stripped>
> >>>>> wrote:
> >>>>>
> >>>>> Hello everyone,
> >>>>>
> >>>>> I have an increasingly popular web application running on top of
> mysql
> >>>>> and
> >>>>> due to its popularity, I'm running into performance issues.
> After
> >>>>> carefully
> >>>>> examining database indexes and tuning queries I was able to pin
> down
> >>>>> the
> >>>>> slowest part of the system.
> >>>>>
> >>>>> The app's got a user segmentation tool that allows you to filter
> users
> >>>>> based
> >>>>> on a range of criteria from which the slowest is: "Select all
> users
> >>>>> that did
> >>>>> not receive the email of id 100"
> >>>>>
> >>>>> To answer this question we turn to the activities table, which
> is
> >>>>> basically
> >>>>> a denormalized log of actions taken by the user in this format:
> >>>>> user_id | activity | email_id | ...
> >>>>> 10 | email_sent | 100 | ...
> >>>>> 10 | subscribed | NULL | ...
> >>>>> 10 | email_open | 100 | ...
> >>>>>
> >>>>>
> >>>>> Given this table and the question above, the usual way of
> finding out
> >>>>> all
> >>>>> users who did not receive this email is through the use of a
> left outer
> >>>>> join, such as:
> >>>>>
> >>>>> select u.id
> >>>>> from users u
> >>>>> left outer join activities a
> >>>>> on u.id = a.user_id
> >>>>> and a.activity = 'email_sent'
> >>>>> and a.email_id = 100
> >>>>> where a.user_id is null
> >>>>>
> >>>>> That's all fine for medium-ish tables. However our current
> activities
> >>>>> table
> >>>>> has over 13 million rows, slowing the hell out of this left
> outer join,
> >>>>> taking about 52 seconds in my machine.
> >>>>>
> >>>>> What this query is trying to do is to get the relative
> complement of
> >>>>> set
> >>>>> A(users) to B(activities). As far as I know mysql doesn't
> support set
> >>>>> subtraction, thus the reason for these queries being slow.
> >>>>>
> >>>>> Based on that I've setup a test database on Postgresql, which
> supports
> >>>>> this
> >>>>> very set operation and rewrote the query to look like this:
> >>>>>
> >>>>> select u.id
> >>>>> from users u
> >>>>> except
> >>>>> select a.user_id
> >>>>> from activities a
> >>>>> where a.activity = 'email_sent'
> >>>>> and a.email_id = 100;
> >>>>>
> >>>>> The fact that postgresql knows how to subtract sets brought this
> query
> >>>>> down
> >>>>> to only 4 seconds.
> >>>>>
> >>>>> My question then is: since this is a somewhat common query in
> our
> >>>>> system,
> >>>>> are there any workarounds I could use in mysql to improve
> things?
> >>>>>
> >>>>> I did find one myself, but it's a bit convoluted and might not
> perform
> >>>>> well
> >>>>> under load, but the following sql script gives me similar
> performance
> >>>>> in
> >>>>> mysql:
> >>>>>
> >>>>> create temporary table email_sent_100
> >>>>> select a.user_id
> >>>>> from user_activity_events a
> >>>>> where a.activity = 'email_sent'
> >>>>>
> >>>>>
> >>>>> and a.email_id = 100;
> >>>>>
> >>>>> create index user_id_idx on email_sent_100(user_id); //this
> could
> >>>>> potentially bring the runtime down in the case of a larg temp
> table.
> >>>>>
> >>>>> select count(u.id)
> >>>>> from users u
> >>>>> left outer join email_sent_100 s
> >>>>> on u.id = s.user_id
> >>>>> and s.user_id is null;
> >>>>>
> >>>>> A lot more lines and a lot more complex, but does the job in
> this
> >>>>> example.
> >>>>>
> >>>>> I'd appreciate your thoughts.
> >>>>>
> >>>>> Cheers,
> >>>>> Leonardo Borges
> >>>>> www.leonardoborges.com
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> -----------------------------
> >>> Johnny Withers
> >>> 601.209.4985
> >>> johnny@stripped
> >>>
> >>
> >>
> >
> >
> > --
> > -----------------------------
> > Johnny Withers
> > 601.209.4985
> > johnny@stripped
> >