List:General Discussion« Previous MessageNext Message »
From:Johnny Withers Date:July 8 2011 1:58am
Subject:Re: MySQL and set complements
View as plain text  
Can you post show create table for activity and explain output of the
problem query?

On Jul 7, 2011 8:51 PM, "Leonardo Borges" <leonardoborges.rj@stripped>
wrote:

Hello everyone,

I have an increasingly popular web application running on top of mysql and
due to its popularity, I'm running into performance issues. After carefully
examining database indexes and tuning queries I was able to pin down the
slowest part of the system.

The app's got a user segmentation tool that allows you to filter users based
on a range of criteria from which the slowest is: "Select all users that did
not receive the email of id 100"

To answer this question we turn to the activities table, which is basically
a denormalized log of actions taken by the user in this format:
user_id  | activity        | email_id | ...
10         | email_sent   | 100        | ...
10         | subscribed   | NULL     | ...
10         | email_open  | 100        | ...


Given this table and the question above, the usual way of finding out all
users who did not receive this email is through the use of a left outer
join, such as:

select u.id
from users u
left outer join activities a
on u.id = a.user_id
and a.activity = 'email_sent'
and a.email_id = 100
where a.user_id is null

That's all fine  for medium-ish tables. However our current activities table
has over 13 million rows, slowing the hell out of this left outer join,
taking about 52 seconds in my machine.

What this query is trying to do is to get the relative complement of set
A(users) to B(activities). As far as I know mysql doesn't support set
subtraction, thus the reason for these queries being slow.

Based on that I've setup a test database on Postgresql, which supports this
very set operation and rewrote the query to look like this:

select u.id
from users u
except
select a.user_id
from activities a
where a.activity = 'email_sent'
and a.email_id = 100;

The fact that postgresql knows how to subtract sets brought this query down
to only 4 seconds.

My question then is: since this is a somewhat common query in our system,
are there any workarounds I could use in mysql to improve things?

I did find one myself, but it's a bit convoluted and might not perform well
under load, but the following sql script gives me similar performance in
mysql:

create temporary table email_sent_100
select a.user_id
from user_activity_events a
where a.activity = 'email_sent'


and a.email_id = 100;

create index user_id_idx on email_sent_100(user_id);   //this could
potentially bring the runtime down in the case of a larg temp table.

select count(u.id)
from users u
left outer join email_sent_100 s
  on u.id = s.user_id
  and s.user_id is null;

A lot more lines and a lot more complex, but does the job in this example.

I'd appreciate your thoughts.

Cheers,
Leonardo Borges
www.leonardoborges.com

Thread
MySQL and set complementsLeonardo Borges8 Jul
  • Re: MySQL and set complementsJohnny Withers8 Jul
    • Re: MySQL and set complementsLeonardo Borges8 Jul
      • Re: MySQL and set complementsJohnny Withers8 Jul
        • Re: MySQL and set complementsLeonardo Borges8 Jul
          • Re: MySQL and set complementsJohnny Withers8 Jul
            • Re: MySQL and set complementsLeonardo Borges8 Jul
Re: MySQL and set complementsmos8 Jul
  • Re: MySQL and set complementsLeonardo Borges12 Jul