List:Internals« Previous MessageNext Message »
From:Michael Widenius Date:June 21 2001 10:37am
Subject:Re: Testing MySQL 4.0
View as plain text  

>>>>> "Paul" == Paul Cadach <paul@stripped> writes:

Paul> Hi,
Paul> ----- Original Message -----
Paul> From: "Michael Widenius" <monty@stripped>
Paul> To: "Paul Cadach" <paul@stripped>
Paul> Cc: <internals@stripped>
Paul> Sent: Thursday, June 21, 2001 3:28 AM
Paul> Subject: Re: Testing MySQL 4.0

Paul> [skip]

Paul> Michael, can you explain which function sets already used in the
Paul> stages
Paul> listed above (to find where to split mysql_query())? I'm not
Paul> familiar with
Paul> MySQL sources... :(((
>> sql/ is where the big work should be done.
>> The function mysql_select() should be slit to 3-7 smaller functions
>> to handle the different stages instead of being a big massive
>> function. The init part and free resources part should be pretty
>> obvious, but the rest is a bit harder.

Paul> Back to story about using a class to hold and process select's information.
Paul> Splitting mysql_select() into parts will provide many (at least join, etc.)
Paul> "globals" for those parts. IMHO better to store them in classes' members
Paul> than somewhere else (thd, etc.).

This may be ok, as long as we can do this without loosing any speed...

>> I know; GDB doesn't work really good with threads.
>> try creating the following .gdbinit file in the directory you are
>> doing debugging:

Paul> Ok, I'll try...

Paul> [skip]

>> Rename 'mysql_new_subselect()' to 'mysql_init_subselect()'
>> This is more in line with mysql_init_select()

Paul> I've just track Sinisa's idea about UNIONs. For sub-select it calls
Paul> mysql_new_subselect() which allocates new select_lex and fills some fields
Paul> depended on sub-selects, then call mysql_init_select() to fill other common
Paul> parts (used for regular, sub- selects, and UNIONs) of select_lex structure.

Some of Sinisas functions names are not well thought off and I will
fix them when he gots things to work; You can ignore the name comment
for now.

>> I would be greatfull if you can follow the same coding style as the
>> rest of the code ;  This makes the total much easier to read.
>> - Space around !=
>> - If and command on different rows:

Paul> It's looks like my coding style but I've just copied 'select:' block and
Paul> modified it just a bit.


>> The expr_expr change looks VERY long.
>> Wouldn't some of the things, like
>> expr NOT LIKE '(' subselect_stmt ')'
>> be handle by adding '(' subselect_stmt ')' to 'simple_expr' ?

Paul> I think it's wrong way because it will provide syntax like:
Paul> select (select a from x)
Paul> which is ugly.

Yes, it's ugly but we have to allow that because many other databases
support this.  It will also make some things possible that you couldn't do
easily any other way.

Paul> IMHO better is group all sub-select syntax into one non-terminal like
Paul> subselect_subexpr:
Paul> NOT LIKE '(' subselect_stmt')
Paul> LIKE '(' subselect_stmt ')'
Paul> ...
Paul> and add next to ..._expr:
Paul> expr subselect_subexpr
Paul> {
Paul> $$ = mysql_subselect_make_item($1, $2);
Paul> }
Paul> where mysql_subselect_make_item() must deduce type of operation specified
Paul> before sub-select and create required Item_...

See above.

Paul> At least this will simplify YACC's source but brings complex
Paul> mysql_subselect_make_item() function. :(

Paul> I think for first time of modelling it's better to use YACC's sources...

Ok, as long we can do select (select...) in the final version.

>> I assume you plan to separate functions classes to handle the
>> different 'normal' sub selects like:
>> expr IN_SYM '(' subselect_stmt ')'
>> { $$= new Item_sub_select_in($1, $4); }

Paul> Yep. May be better is to enhance existing Item_... classes to handle
Paul> Item_subselect (for example, $$ = new Item_in($1, $4); ) which will works
Paul> with temporary table generated by sub-select?

Generating temporary tables only works when you don't refer to an
outside table in the sub select.  This is hard to check at the yacc

Another problem is that if $1 is not a constant, you don't want to
generate a temporary table at all. You really want to have:

WHERE 1 in (select a from table) 

to be executed as:

where exists_row(select 1 from table where a=1)

>> What do you plan to do with:
>> expr GT_SYM some_all '(' subselect_stmt ')'
>> It seams kind of stupid to have to do a separate class for
>> each operator just to be able to handle 'some_all'.

Paul> It's just a flag for subselect execution... SOME/ANY will bring 'LIMIT 1' to
Paul> sub-query, while handling of ALL is much harder.

>> We need to do some conversation to do this efficiently.
>> For example:
>> 1000 >= SOME (SELECT a FROM t1 WHERE X )
>> should of be converted to the expression:
>> If there is a row
>> (SELECT 1 from T1 WHERE (X) AND (1000 >= A) LIMIT 1)
>> return 1 else 0

Paul> Yea! I've thinking about converting original query to
Paul> 1000 < (SELECT MAX(a) FROM t1 WHERE X)
Paul> where sub-select will return just single value.

Using MAX/MIN is maybe not that a good idea as these functions may be
very slow with other table handlers.  The LIMIT 1 is implemented at
the MySQL level and is more efficient.

Paul> BTW, your solution brings one side effect: it will create dependences
Paul> between parent and child queries, which will make optimization like "execute
Paul> independed sub-queries before main query" impossible. :(((

We should never execute a sub query before the main query, becasue
if we do that, we can't handle cases like the following efficiently:

SELECT * FROM a where a>0 and (select ...) or a<0 and (select ...)

We will also get a problem with sub queries that refere to tables
outside the sub query.

Instead we should have lazy evaluation of the sub select.

First time -val() is called we should check what to do:

- Generate a single answer and keep it (const arguments)
- Generate a temporary table.
- Do it on the fly.

It's trivial to cache the last answer and keep it in the case of
all arguments are constants.

>> 1000 >= ALL (SELECT a FROM t1)
>> should be converted to
>> If there is a row
>> (SELECT 1 from T1 WHERE (X) AND (1000 < A) LIMIT 1)
>> return 0 else 1

Paul> My idea is to execute
Paul> 1000 >= (SELECT MIN(a) FROM t1 WHERE X)

See above.

>> Note that you should not use NOT as MySQL doesn't use keys when you
>> use Item_func_not().

Paul> Thanks for information.

Paul> BTW, does MIN(), MAX() functions works on non-values (strings, etc.)?

Yes, but see above about my comments about MIN/MAX.

I think instead implementing find-one and find-all functions will make
things much more efficient.