Sure can:
show create table activities;
CREATE TABLE `activities` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`email` varchar(100) DEFAULT NULL,
`country_iso` varchar(2) DEFAULT NULL,
`tags` varchar(255) DEFAULT NULL,
`postcode` int(11) DEFAULT NULL,
`activity` varchar(100) DEFAULT NULL,
`page_id` int(11) DEFAULT NULL,
`donation_frequency` varchar(100) DEFAULT NULL,
`email_id` int(11) DEFAULT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE
CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `user_idx` (`user_id`),
KEY `email_idx` (`email_id`),
KEY `activity_idx` (`activity`)
) ENGINE=MyISAM AUTO_INCREMENT=11331976 DEFAULT CHARSET=latin1
And the explain:
+----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+
| id | select_type | table | type | possible_keys | key
| key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+
| 1 | SIMPLE | u | index | NULL |
id_idx | 5 | NULL | 972064 | Using index |
| 1 | SIMPLE | a | ref | user_idx,email_idx,activity_idx |
user_idx | 5 | getup.u.id | 20 | Using where |
+----+-------------+-------+-------+---------------------------------+----------+---------+------------+--------+-------------+
Cheers,
Leonardo Borges
www.leonardoborges.com
On Fri, Jul 8, 2011 at 11:58 AM, Johnny Withers <johnny@stripped>wrote:
> Can you post show create table for activity and explain output of the
> problem query?
>
> On Jul 7, 2011 8:51 PM, "Leonardo Borges" <leonardoborges.rj@stripped>
> wrote:
>
> Hello everyone,
>
> I have an increasingly popular web application running on top of mysql and
> due to its popularity, I'm running into performance issues. After carefully
> examining database indexes and tuning queries I was able to pin down the
> slowest part of the system.
>
> The app's got a user segmentation tool that allows you to filter users
> based
> on a range of criteria from which the slowest is: "Select all users that
> did
> not receive the email of id 100"
>
> To answer this question we turn to the activities table, which is basically
> a denormalized log of actions taken by the user in this format:
> user_id | activity | email_id | ...
> 10 | email_sent | 100 | ...
> 10 | subscribed | NULL | ...
> 10 | email_open | 100 | ...
>
>
> Given this table and the question above, the usual way of finding out all
> users who did not receive this email is through the use of a left outer
> join, such as:
>
> select u.id
> from users u
> left outer join activities a
> on u.id = a.user_id
> and a.activity = 'email_sent'
> and a.email_id = 100
> where a.user_id is null
>
> That's all fine for medium-ish tables. However our current activities
> table
> has over 13 million rows, slowing the hell out of this left outer join,
> taking about 52 seconds in my machine.
>
> What this query is trying to do is to get the relative complement of set
> A(users) to B(activities). As far as I know mysql doesn't support set
> subtraction, thus the reason for these queries being slow.
>
> Based on that I've setup a test database on Postgresql, which supports this
> very set operation and rewrote the query to look like this:
>
> select u.id
> from users u
> except
> select a.user_id
> from activities a
> where a.activity = 'email_sent'
> and a.email_id = 100;
>
> The fact that postgresql knows how to subtract sets brought this query down
> to only 4 seconds.
>
> My question then is: since this is a somewhat common query in our system,
> are there any workarounds I could use in mysql to improve things?
>
> I did find one myself, but it's a bit convoluted and might not perform well
> under load, but the following sql script gives me similar performance in
> mysql:
>
> create temporary table email_sent_100
> select a.user_id
> from user_activity_events a
> where a.activity = 'email_sent'
>
>
> and a.email_id = 100;
>
> create index user_id_idx on email_sent_100(user_id); //this could
> potentially bring the runtime down in the case of a larg temp table.
>
> select count(u.id)
> from users u
> left outer join email_sent_100 s
> on u.id = s.user_id
> and s.user_id is null;
>
> A lot more lines and a lot more complex, but does the job in this example.
>
> I'd appreciate your thoughts.
>
> Cheers,
> Leonardo Borges
> www.leonardoborges.com
>
>