List:General Discussion« Previous MessageNext Message »
From:Lucio Chiappetti Date:June 10 2008 4:59pm
Subject:optimizing UNIONs ?
View as plain text  
(mysql  5.0.27 on SuSE Linux)

I recently thought to use UNIONs to allow me to concatenate "vertically" 
some database tables with statements like this

create or replace view combo  as
   (select * from nov06) union
   (select *,ra_corr as ra_cor2,dec_corr as dec_cor2 from jul07) union
   (select *,ra_corr as ra_cor2,dec_corr as dec_cor2 from subaru where
    field<2100);

(a show create view tells me this was created as CREATE 
ALGORITHM=UNDEFINED, and of course gives the full list of columns)

The sense of the above is that "nov06" is a first release of an 
astronomical database containing a sky region, and the related catalogue 
has already been published (so it shall be mantained unchanged).

"jul07" and "subaru" are two incremental additions in other sky regions, 
which I'd like to see "all together" as an union ... and at the same time
to keep physically separate to ease maintenance.

The three tables in the union have the same layout, except that nov06 has 
two columns more. To allow the unions these columns are "mimicked" as 
identical copies of two other columns in the other two tables (per CREATE 
statement above).

I do not expect the CREATE makes any difficulty ... it is shown as a full 
list of columns.

All three tables have two indices, an UNIQUE one on two columns 
(`field`,`id`), and another one on the single column "seq" (a sequence 
number which is also auto_increment ... but the tables are static once 
created). In particular seq runs sequentially from table to table so that 
the first jul07.seq is equal to the last nov06.seq + 1 and so on.

So far so good ...

   ... those unions work nicely, only slightly slower than a single
   table

.............................................................

In the past I had (and still have) also some views which allow 
simultaneous "horizontal" access to more than one single table via a 
glorified correlation table (just a table of pointers, I hope the 
definition below illustrates the usage clearly enough)

create ALGORITHM=TEMPTABLE VIEW XLSS as
  list of column aliases
  from glorlss06 left join nov06   on glorlss06.nov06  =nov06.seq
                 left join nov06b  on glorlss06.nov06b =nov06b.seq
                 left join nov06cd on glorlss06.nov06cd=nov06cd.seq ;

This is just an example with three "horizontal" members. I have more 
complex examples with up to 30 members, and lived satisfactorily with 
them.

(the ALGORITHM=TEMPTABLE and sometimes the usage of SELECT STRAIGHT_JOIN
are the only tricks required to improve efficiency, the latter was even 
discussed on this list ... ah the glorlss06 of course have a couple of 
indices, an unique one on (`seq`,`nov06`) and another on nov06 alone.

Just for reference this is example of EXPLAIN SELECT on such view

explain select Xcatname,Xseq,Xra,Xdec from XLSS where Xseq=13121;
+----+-------------+------------+------+---------------+-----------+---------+-------------------------+------+-------------+
| id | select_type | table      | type | possible_keys | key       | 
key_len | ref                     | rows | Extra       |
+----+-------------+------------+------+---------------+-----------+---------+-------------------------+------+-------------+
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL      | NULL 
| NULL                    | 3385 | Using where |
|  2 | DERIVED     | glorlss06  | ALL  | NULL          | NULL      | NULL 
| NULL                    | 3385 |             |
|  2 | DERIVED     | nov06      | ref  | auxiliary     | auxiliary | 4 
| lssdb.glorlss06.nov06   |   16 |             |
|  2 | DERIVED     | nov06b     | ref  | auxiliary     | auxiliary | 4 
| lssdb.glorlss06.nov06b  |   16 |             |
|  2 | DERIVED     | nov06cd    | ref  | auxiliary     | auxiliary | 4 
| lssdb.glorlss06.nov06cd |   16 |             |
+----+-------------+------------+------+---------------+-----------+---------+-------------------------+------+-------------+

In case this gets wrapped in the mail a copy can be seen at
http://sax.iasf-milano.inaf.it/~lucio/temp/Temp/example.txt

Such a statement (the select, not the explain) takes 0.19 seq

.............................................................

and now the trouble comes ... when I want to put everything together

I create an "horizontal" view whose members are "vertical" unions

create ALGORITHM=TEMPTABLE VIEW INTERIM as
  list of column aliases
  from glorcombo left join combo   on glorcombo.combo  =combo.seq
                 left join combob  on glorcombo.combob =combob.seq
                 left join combocd on glorcombo.combocd=combocd.seq ;

combo with its three members was illustrated above, and combob and combocd 
are fully equivalent unions with 3 members each. glorcombo is instead a 
physical table.

A statement fully analogous to the previous one takes now 49 sec instead 
of a fraction. All the time is spent in the analysis phase of EXPLAIN 
select (which I report below, and, in case of wrap, at the URL given 
above

explain select Xcatname,Xseq,Xra,Xdec,Xlssflag from INTERIM where 
Xseq=13121;

+----+--------------+----------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type  | table          | type | possible_keys | key  | 
key_len | ref  | rows  | Extra       |
+----+--------------+----------------+------+---------------+------+---------+------+-------+-------------+
|  1 | PRIMARY      | <derived2>     | ALL  | NULL          | NULL | NULL 
| NULL |  6391 | Using where |
|  2 | DERIVED      | glorcombo      | ALL  | NULL          | NULL | NULL 
| NULL |  6391 |             |
|  2 | DERIVED      | <derived3>     | ALL  | NULL          | NULL | NULL 
| NULL | 18652 |             |
|  2 | DERIVED      | <derived6>     | ALL  | NULL          | NULL | NULL 
| NULL | 12303 |             |
|  2 | DERIVED      | <derived9>     | ALL  | NULL          | NULL | NULL 
| NULL |  8921 |             |
|  9 | DERIVED      | nov06cd        | ALL  | NULL          | NULL | NULL 
| NULL |  5917 |             |
| 10 | UNION        | jul07cd        | ALL  | NULL          | NULL | NULL 
| NULL |  2185 |             |
| 11 | UNION        | subarucd       | ALL  | indice        | NULL | NULL 
| NULL |  1414 | Using where |
| NULL | UNION RESULT | <union9,10,11> | ALL  | NULL          | NULL | 
NULL    | NULL |  NULL |             |
|  6 | DERIVED      | nov06b         | ALL  | NULL          | NULL | NULL 
| NULL |  7986 |             |
|  7 | UNION        | jul07b         | ALL  | NULL          | NULL | NULL 
| NULL |  3262 |             |
|  8 | UNION        | subarub        | ALL  | indice        | NULL | NULL 
| NULL |  2044 | Using where |
| NULL | UNION RESULT | <union6,7,8>   | ALL  | NULL          | NULL | 
NULL    | NULL |  NULL |             |
|  3 | DERIVED      | nov06          | ALL  | NULL          | NULL | NULL 
| NULL | 12380 |             |
|  4 | UNION        | jul07          | ALL  | NULL          | NULL | NULL 
| NULL |  4783 |             |
|  5 | UNION        | subaru         | ALL  | indice        | NULL | NULL 
| NULL |  2793 | Using where |
| NULL | UNION RESULT | <union3,4,5>   | ALL  | NULL          | NULL | 
NULL    | NULL |  NULL |             |
+----+--------------+----------------+------+---------------+------+---------+------+-------+-------------+

in this case replacing the  select with a select straight_join has no
effect.

Is there any way to optimize this stuff (without making a physical copy of 
the union) i.e. to force proper usage of the various indices present in 
the individual tables ?

It shall be noted that the various member tables are all of comparable 
length with just a few thousand records each.

-- 
------------------------------------------------------------------------
Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy)
For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html
------------------------------------------------------------------------
Do not blame ME, I did NOT vote Berlusconi.
------------------------------------------------------------------------
Thread
optimizing UNIONs ?Lucio Chiappetti10 Jun
  • using Unix soft linksLucio Chiappetti9 Feb
    • Re: using Unix soft linksJohan De Meersman10 Feb
Re: optimizing UNIONs ?Lucio Chiappetti11 Jun