Hi all,
I've been using lib_mysqludf_xql (http://www.mysqludf.com) to create
XML. To get an XML tree I use subqueries which aggregate the data.
Unfortunately however this doesn't work well since the performance of
using subqueries isn't good.
When I perform something like:
SELECT xql_agg('Countries' as `xql:root`, 'country' as `xql:child`,
`Code`, `Name`, `Continent`, `Region`, `cities` as `xql:raw`,
`languages` as `xql:raw`) FROM `Country`
LEFT JOIN (SELECT `CountryCode`, xql_agg('cities' AS `xql:root`, `Name`
AS `City`) AS `cities` FROM `City` GROUP BY `CountryCode`) AS `City` ON
`Country`.`Code` = `City`.`CountryCode`
LEFT JOIN (SELECT `CountryCode`, xql_agg('Languages' AS `xql:root`,
`Language`) AS `languages` FROM `CountryLanguage` WHERE `IsOfficial` =
'T' GROUP BY `CountryCode`) AS `CountryLanguage` ON `Country`.`Code` =
`CountryLanguage`.`CountryCode`
WHERE `Country`.`Code` IN ('NLD', 'BEL', 'USA', 'XXY');
An explain shows me that the subqueries are executed for all rows. Also
no keys were used to join the subqueries.
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+----------------+
| 1 | PRIMARY | Country | range | PRIMARY | PRIMARY |
3 | NULL | 3 | Using where |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL |
NULL | NULL | 232 | |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL |
NULL | NULL | 190 | |
| 3 | DERIVED | CountryLanguage | ALL | NULL | NULL |
NULL | NULL | 984 | Using filesort |
| 2 | DERIVED | City | ALL | NULL | NULL |
NULL | NULL | 4079 | Using filesort |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+----------------+
This way subqueries are almost useless.
It would probably perform a lot better if something like to following
was done.
SELECT 'Countries' as `xql:root`, 'country' as `xql:child`, `Code`,
`Name`, `Continent`, `Region` FROM `Country` WHERE `Country`.`Code` IN
('NLD', 'BEL', 'USA', 'XXY');
-- Index the values `Country`.`Code` as the resultset is created.
-- Compare the cardinality of `City`.`CountryCode` (239) to the
temporary index of `Country`.`Code` (3). Since it is much smaller add a
filter which acts like an IN.
SELECT `CountryCode`, xql_agg('cities' AS `xql:root`, `Name` AS `City`)
AS `cities` FROM `City` WHERE `CountryCode` IN ('NLD', 'BEL', 'USA')
GROUP BY `CountryCode`;
-- Don't create a temp table for this, but join each record to the main
resultset using the temporary index, adding `cities`.
-- Compare the cardinality of `City`.`CountryCode` (246) to the
temporary index of `Country`.`Code` (3). Since it is much smaller add a
filter which acts like an IN.
SELECT `CountryCode`, xql_agg('Languages' AS `xql:root`, `Language`) AS
`languages` FROM `CountryLanguage` WHERE `IsOfficial` = 'T' AND
`CountryCode` IN ('NLD', 'BEL', 'USA') GROUP BY `CountryCode`;
-- Don't create a temp table for this, but join each record to the main
resultset using the temporary index, adding `languages`.
SELECT xql_agg(`xql:root` as `xql:root`, `xql:child` as `xql:child`,
`Code`, `Name`, `Continent`, `Region`, `cities` as `xql:raw`,
`languages` as `xql:raw`) FROM <main resultset>
This is just a very rough idea. I'm very aware that this isn't this
simple in many situation, like when it isn't a 1-to-1 relationship.
However I think something like this should be possible when looked at be
even smarter people than me. It could open the road to getting non-flat
data using SQL queries.
- Arnold