List:Internals« Previous MessageNext Message »
From:Arnold Daniels Date:June 13 2008 3:36am
Subject:Can subqueries be handled smarter?
View as plain text  
Hi all,

I've been using lib_mysqludf_xql (http://www.mysqludf.com) to create 
XML. To get an XML tree I use subqueries which aggregate the data. 
Unfortunately however this doesn't work well since the performance of 
using subqueries isn't good.

When I perform something like:

SELECT xql_agg('Countries' as `xql:root`, 'country' as `xql:child`, 
`Code`, `Name`, `Continent`, `Region`, `cities` as `xql:raw`, 
`languages` as `xql:raw`) FROM `Country`
LEFT JOIN (SELECT `CountryCode`, xql_agg('cities' AS `xql:root`, `Name` 
AS `City`) AS `cities` FROM `City` GROUP BY `CountryCode`) AS `City` ON  
`Country`.`Code` = `City`.`CountryCode`
LEFT JOIN (SELECT `CountryCode`, xql_agg('Languages' AS `xql:root`, 
`Language`) AS `languages` FROM `CountryLanguage` WHERE `IsOfficial` = 
'T' GROUP BY `CountryCode`) AS `CountryLanguage` ON  `Country`.`Code` = 
`CountryLanguage`.`CountryCode`
WHERE `Country`.`Code` IN ('NLD', 'BEL', 'USA', 'XXY');

An explain shows me that the subqueries are executed for all rows. Also 
no keys were used to join the subqueries.
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+----------------+
| id | select_type | table           | type  | possible_keys | key     | 
key_len | ref  | rows | Extra          |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+----------------+
|  1 | PRIMARY     | Country         | range | PRIMARY       | PRIMARY | 
3       | NULL |    3 | Using where    |
|  1 | PRIMARY     | <derived2>      | ALL   | NULL          | NULL    | 
NULL    | NULL |  232 |                |
|  1 | PRIMARY     | <derived3>      | ALL   | NULL          | NULL    | 
NULL    | NULL |  190 |                |
|  3 | DERIVED     | CountryLanguage | ALL   | NULL          | NULL    | 
NULL    | NULL |  984 | Using filesort |
|  2 | DERIVED     | City            | ALL   | NULL          | NULL    | 
NULL    | NULL | 4079 | Using filesort |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+----------------+

This way subqueries are almost useless.


It would probably perform a lot better if something like to following 
was done.

SELECT 'Countries' as `xql:root`, 'country' as `xql:child`, `Code`, 
`Name`, `Continent`, `Region` FROM `Country` WHERE `Country`.`Code` IN 
('NLD', 'BEL', 'USA', 'XXY');
-- Index the values `Country`.`Code` as the resultset is created.

-- Compare the cardinality of `City`.`CountryCode` (239) to the 
temporary index of `Country`.`Code` (3). Since it is much smaller add a 
filter which acts like an IN.
SELECT `CountryCode`, xql_agg('cities' AS `xql:root`, `Name` AS `City`) 
AS `cities` FROM `City` WHERE `CountryCode` IN ('NLD', 'BEL', 'USA') 
GROUP BY `CountryCode`;
-- Don't create a temp table for this, but join each record to the main 
resultset using the temporary index, adding `cities`.

-- Compare the cardinality of `City`.`CountryCode` (246) to the 
temporary index of `Country`.`Code` (3). Since it is much smaller add a 
filter which acts like an IN.
SELECT `CountryCode`, xql_agg('Languages' AS `xql:root`, `Language`) AS 
`languages` FROM `CountryLanguage` WHERE `IsOfficial` = 'T' AND 
`CountryCode` IN ('NLD', 'BEL', 'USA') GROUP BY `CountryCode`;
-- Don't create a temp table for this, but join each record to the main 
resultset using the temporary index, adding `languages`.

SELECT xql_agg(`xql:root` as `xql:root`, `xql:child` as `xql:child`, 
`Code`, `Name`, `Continent`, `Region`, `cities` as `xql:raw`, 
`languages` as `xql:raw`) FROM <main resultset>


This is just a very rough idea. I'm very aware that this isn't this 
simple in many situation, like when it isn't a 1-to-1 relationship. 
However I think something like this should be possible when looked at be 
even smarter people than me. It could open the road to getting non-flat 
data using SQL queries.


 - Arnold
Thread
Can subqueries be handled smarter?Arnold Daniels13 Jun
  • Re: Can subqueries be handled smarter?Sergey Petrunia17 Jun