List:Internals« Previous MessageNext Message »
From:Michael Widenius Date:February 21 2012 6:50pm
Subject:re: getting aggregate UDFs to execute with temp tables
View as plain text  
Hi!

>>>>> "Stephen" == Stephen Tu <stephentu@stripped> writes:

Stephen> Hello,
Stephen> Please bear with me as this is going to be kind of a long post.

Stephen> I have a query that is like (call it query A):

Stephen>      SELECT fieldA, agg(fieldB) FROM some_tbl WHERE fieldC <= [some 
Stephen> constant] GROUP BY fieldA;

Stephen> Where both fieldA, fieldB, and fieldC are *not* part of any index on the 
Stephen> table, and where agg() is some aggregate UDF.

<cut>

Stephen> Therefore, I took the source code for mysql-5.5.21, and in 
Stephen> sql/sql_select.cc, I commented out the following lines:

Stephen> 15278  //if (!sum_item->quick_group)
Stephen> 15279  //  param->quick_group=0; // UDF SUM function

Stephen> Upon re-compilation, EXPLAIN now tells me for query A that it is doing 
Stephen> the following: "Using where; Using temporary table; Using filesort".

Stephen> Great, I thought, that was easy! However, now when I execute query A, 
Stephen> agg(fieldB) simply returns NULL. I verified that my UDF isn't even being 
Stephen> called. Presumably, there is a reason why UDFs don't go down this code 
Stephen> path :)

Stephen> As I am not very familiar with the mysql internals, I would like to know 
Stephen> how I can actually get this to work. Is this even possible?

The reason for the above is the following:

MySQL has two ways to calculate aggreagates:

a)
- Sort things according to the items in the GROUP BY part
- For each row
  - call add() method for all aggregated items
- When the GROUP BY changes value, send the current calculated values
  to the next stage (normally the client) and call the clear() method
  on all aggregate items.

This method is is the one that is supported with the aggregate UDF
interface.

b)
- Create a temporary table to hold one result row. (For each aggregate
  item we use the method create_temp_field() to create a field)

- For each row
  - Generate the GROUP BY KEY
  - Search for a row in the temporary table matching that key
  - If row didn't exists
     - Call reset_field() for all aggregate items
     - Write a row with the group-by key and aggregated item values to
       the temporary table
  - If row existed
    - call update_field() for all aggregate fields
    - Update row in temporary table
- Sort temporary table in group-by order and send result to client.


The reason UDF:s can't use b) is that the UDF interface doesn't
support the reset_field(), update_field(), create_temp_field() and
result_item() methods.

(This was done to keep the udf interface simple).

If you have a high desire for speed, then you should consider creating
a native aggregate function in MariaDB/MySQL.

Check the files item_sum.h and item_sum.cc. You should be able to use
solve your problem relatively easily by inherit from some of the
already existing items.

Regards,
Monty
Creator of MySQL and MariaDB
Thread
getting aggregate UDFs to execute with temp tablesStephen Tu21 Feb
  • re: getting aggregate UDFs to execute with temp tablesMichael Widenius21 Feb