From: Date: June 13 2008 3:36am Subject: Can subqueries be handled smarter? List-Archive: http://lists.mysql.com/internals/35693 Message-Id: <4851CF14.2000305@adaniels.nl> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Hi all, I've been using lib_mysqludf_xql (http://www.mysqludf.com) to create XML. To get an XML tree I use subqueries which aggregate the data. Unfortunately however this doesn't work well since the performance of using subqueries isn't good. When I perform something like: SELECT xql_agg('Countries' as `xql:root`, 'country' as `xql:child`, `Code`, `Name`, `Continent`, `Region`, `cities` as `xql:raw`, `languages` as `xql:raw`) FROM `Country` LEFT JOIN (SELECT `CountryCode`, xql_agg('cities' AS `xql:root`, `Name` AS `City`) AS `cities` FROM `City` GROUP BY `CountryCode`) AS `City` ON `Country`.`Code` = `City`.`CountryCode` LEFT JOIN (SELECT `CountryCode`, xql_agg('Languages' AS `xql:root`, `Language`) AS `languages` FROM `CountryLanguage` WHERE `IsOfficial` = 'T' GROUP BY `CountryCode`) AS `CountryLanguage` ON `Country`.`Code` = `CountryLanguage`.`CountryCode` WHERE `Country`.`Code` IN ('NLD', 'BEL', 'USA', 'XXY'); An explain shows me that the subqueries are executed for all rows. Also no keys were used to join the subqueries. +----+-------------+-----------------+-------+---------------+---------+---------+------+------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+-------+---------------+---------+---------+------+------+----------------+ | 1 | PRIMARY | Country | range | PRIMARY | PRIMARY | 3 | NULL | 3 | Using where | | 1 | PRIMARY | | ALL | NULL | NULL | NULL | NULL | 232 | | | 1 | PRIMARY | | ALL | NULL | NULL | NULL | NULL | 190 | | | 3 | DERIVED | CountryLanguage | ALL | NULL | NULL | NULL | NULL | 984 | Using filesort | | 2 | DERIVED | City | ALL | NULL | NULL | NULL | NULL | 4079 | Using filesort | +----+-------------+-----------------+-------+---------------+---------+---------+------+------+----------------+ This way subqueries are almost useless. It would probably perform a lot better if something like to following was done. SELECT 'Countries' as `xql:root`, 'country' as `xql:child`, `Code`, `Name`, `Continent`, `Region` FROM `Country` WHERE `Country`.`Code` IN ('NLD', 'BEL', 'USA', 'XXY'); -- Index the values `Country`.`Code` as the resultset is created. -- Compare the cardinality of `City`.`CountryCode` (239) to the temporary index of `Country`.`Code` (3). Since it is much smaller add a filter which acts like an IN. SELECT `CountryCode`, xql_agg('cities' AS `xql:root`, `Name` AS `City`) AS `cities` FROM `City` WHERE `CountryCode` IN ('NLD', 'BEL', 'USA') GROUP BY `CountryCode`; -- Don't create a temp table for this, but join each record to the main resultset using the temporary index, adding `cities`. -- Compare the cardinality of `City`.`CountryCode` (246) to the temporary index of `Country`.`Code` (3). Since it is much smaller add a filter which acts like an IN. SELECT `CountryCode`, xql_agg('Languages' AS `xql:root`, `Language`) AS `languages` FROM `CountryLanguage` WHERE `IsOfficial` = 'T' AND `CountryCode` IN ('NLD', 'BEL', 'USA') GROUP BY `CountryCode`; -- Don't create a temp table for this, but join each record to the main resultset using the temporary index, adding `languages`. SELECT xql_agg(`xql:root` as `xql:root`, `xql:child` as `xql:child`, `Code`, `Name`, `Continent`, `Region`, `cities` as `xql:raw`, `languages` as `xql:raw`) FROM
This is just a very rough idea. I'm very aware that this isn't this simple in many situation, like when it isn't a 1-to-1 relationship. However I think something like this should be possible when looked at be even smarter people than me. It could open the road to getting non-flat data using SQL queries. - Arnold