List:Internals« Previous MessageNext Message »
From:Stephen Tu Date:February 21 2012 7:29am
Subject:getting aggregate UDFs to execute with temp tables
View as plain text  

Please bear with me as this is going to be kind of a long post.

I have a query that is like (call it query A):

     SELECT fieldA, agg(fieldB) FROM some_tbl WHERE fieldC <= [some 
constant] GROUP BY fieldA;

Where both fieldA, fieldB, and fieldC are *not* part of any index on the 
table, and where agg() is some aggregate UDF.

EXPLAIN tells me that mysql is doing the following (in the extra 
column): "Using where; Using filesort"

Ok, that works. But then I notice in a very similar query (call it query B):

     SELECT fieldA, sum(fieldB) FROM some_tbl WHERE fieldC <= [some 
constant] GROUP BY fieldA;

EXPLAIN says: "Using where; Using temporary table; Using filesort"

When I measure the performance, I notice that query B substantially 
outperforms query A (about an order of magnitude). Note that agg() isn't 
doing anything terribly expensive, so the perf difference is not all 
attributed to the UDF. So I want to get query A to use the same 
execution strategy as query B (namely the use of a temporary table).

Therefore, I took the source code for mysql-5.5.21, and in 
sql/, I commented out the following lines:

15278  //if (!sum_item->quick_group)
15279  //  param->quick_group=0; // UDF SUM function

Upon re-compilation, EXPLAIN now tells me for query A that it is doing 
the following: "Using where; Using temporary table; Using filesort".

Great, I thought, that was easy! However, now when I execute query A, 
agg(fieldB) simply returns NULL. I verified that my UDF isn't even being 
called. Presumably, there is a reason why UDFs don't go down this code 
path :)

As I am not very familiar with the mysql internals, I would like to know 
how I can actually get this to work. Is this even possible?

I realize that in general, UDFs can be non-deterministic etc, so you 
have to be very careful when executing them. However, in my particular 
case, my agg() UDF has the exact same semantics as sum(), so I would 
hopefully like it to also have the same performance characteristics. I'm 
ok with running some modified mysql to get this to work.

I hope what I am asking is clear. I'm willing to provide more 
information if this is not enough.

Stephen Tu
getting aggregate UDFs to execute with temp tablesStephen Tu21 Feb
  • re: getting aggregate UDFs to execute with temp tablesMichael Widenius21 Feb