From: Michael Widenius Date: February 21 2012 6:50pm Subject: re: getting aggregate UDFs to execute with temp tables List-Archive: http://lists.mysql.com/internals/38450 Message-Id: <20291.59235.228001.787817@narttu.askmonty.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi! >>>>> "Stephen" == Stephen Tu writes: Stephen> Hello, Stephen> Please bear with me as this is going to be kind of a long post. Stephen> I have a query that is like (call it query A): Stephen> SELECT fieldA, agg(fieldB) FROM some_tbl WHERE fieldC <= [some Stephen> constant] GROUP BY fieldA; Stephen> Where both fieldA, fieldB, and fieldC are *not* part of any index on the Stephen> table, and where agg() is some aggregate UDF. Stephen> Therefore, I took the source code for mysql-5.5.21, and in Stephen> sql/sql_select.cc, I commented out the following lines: Stephen> 15278 //if (!sum_item->quick_group) Stephen> 15279 // param->quick_group=0; // UDF SUM function Stephen> Upon re-compilation, EXPLAIN now tells me for query A that it is doing Stephen> the following: "Using where; Using temporary table; Using filesort". Stephen> Great, I thought, that was easy! However, now when I execute query A, Stephen> agg(fieldB) simply returns NULL. I verified that my UDF isn't even being Stephen> called. Presumably, there is a reason why UDFs don't go down this code Stephen> path :) Stephen> As I am not very familiar with the mysql internals, I would like to know Stephen> how I can actually get this to work. Is this even possible? The reason for the above is the following: MySQL has two ways to calculate aggreagates: a) - Sort things according to the items in the GROUP BY part - For each row - call add() method for all aggregated items - When the GROUP BY changes value, send the current calculated values to the next stage (normally the client) and call the clear() method on all aggregate items. This method is is the one that is supported with the aggregate UDF interface. b) - Create a temporary table to hold one result row. (For each aggregate item we use the method create_temp_field() to create a field) - For each row - Generate the GROUP BY KEY - Search for a row in the temporary table matching that key - If row didn't exists - Call reset_field() for all aggregate items - Write a row with the group-by key and aggregated item values to the temporary table - If row existed - call update_field() for all aggregate fields - Update row in temporary table - Sort temporary table in group-by order and send result to client. The reason UDF:s can't use b) is that the UDF interface doesn't support the reset_field(), update_field(), create_temp_field() and result_item() methods. (This was done to keep the udf interface simple). If you have a high desire for speed, then you should consider creating a native aggregate function in MariaDB/MySQL. Check the files item_sum.h and item_sum.cc. You should be able to use solve your problem relatively easily by inherit from some of the already existing items. Regards, Monty Creator of MySQL and MariaDB