List:Commits« Previous MessageNext Message »
From:Dmitri Lenev Date:June 20 2007 7:57pm
Subject:Re: bk commit - 4.1 tree (istruewing:1.2630) BUG#26379
View as plain text  
Hello Ingo!

* ingo@stripped <ingo@stripped> [07/05/21 19:55]:
> ChangeSet@stripped, 2007-05-21 17:48:22+02:00, istruewing@stripped +23 -0
>   Bug#26379 - Combination of FLUSH TABLE and REPAIR TABLE corrupts
>               a MERGE table
>   
>   This bug report revealed three problems:
>   
>   1. A thread trying to lock a MERGE table performs busy waiting while
>      REPAIR TABLE or a similar table administration task is ongoing on
>      one or more of its MyISAM tables.
>   
>   2. A thread trying to lock a MERGE table performs busy waiting until
>      all threads that did REPAIR TABLE or similar table administration
>      tasks on one or more of its MyISAM tables in LOCK TABLES segments
>      do UNLOCK TABLES. The difference against problem #1 is that the
>      busy waiting takes place *after* the administration task. It is
>      terminated by UNLOCK TABLES only.
>   
>   3. Two FLUSH TABLES within a LOCK TABLES segment can invalidate the
>      lock. This does *not* require a MERGE table. The first FLUSH
>      TABLES can be replaced by any statement that requires other
>      threads to reopen the table. In 5.0 and 5.1 a single FLUSH TABLES
>      can provoke the problem.
>   
>   Problem #1 and #2 are fixed by opening the MERGE child tables
>   through the table cache. So they are visible to other threads
>   and even for certain statements in the same thread. The child
>   tables are chained to the parent table, but do not appear in
>   thd->open_tables nor thd->locked_tables. This does also fix
>   the bugs 8306, 25038, 25700, 26377, 26867, and 27660.
>   I reverted the preliminary fix for 8306.

Altough, as we have discussed on IRC, IMO, such solution is correct
in principle (in other words I agree that we should open child tables
through the table cache) I still think that there is a problem with
the particular way in which you do this. More importantly I think
that the fact that you make table-cache aware about relationship
between merge and child tables will cause us troubles long term.

First problem is caused by the fact that you break meta-data-locking
protocol when you call open_table() function from myrg_open().
Since the former function releases LOCK_open in some cases you may
end-up in situation when some connection has half-open merge table
but does not hold shared meta-data lock on table (i.e. it is not
present in table cache yet) and there is no protection from
LOCK_open. Indeed, this opens gap for various dangerous situations.
Also meta-data-locking protocol assumes that if during obtaining
shared meta-data lock on table (opening it) we discover that there
is active or pending exclusive lock on it (name-lock) we should
release all meta-data locks held by current thread (close all open
tables) and try to obtain them again. Failure to follow this rule
will lead to deadlock. Of course, these issues are small enough to
be fixed without significant digression from your patch, but please
read further.

Problem with making table-cache aware of relationship between merge
and child tables is that by doing this your actually introduce new
kind of object "merge-table-with-all-its-child-tables" which should
be treated more or less atomically by the table cache. This works
fine as long as we use table cache for both meta-data locking and
caching of table objects. But as soon as we want to separate
meta-data locking from table cache (and we are going to do this to
fix certain bugs) we will encounter some issues.

To begin with we will be forced to support this concept in both
meta-data locking subsystem and table cache. And even then there
is no guarantee that the whole scheme will work, since currently
approach with complex "merge-and-its-children" object heavily
relies on the fact that opening of table and obtaining meta-data
lock for it is a single atomic event. Only because of this we are
able start opening a table, discover that table is merge table,
open its children and construct object for the table cache which
also serves as comlex meta-data lock atomically under protection
of LOCK_open. With separation of meta-data locking and table caching
we will be no longer to do this as we will have to obtain meta-data
lock on the table before trying to open it. So it is likely that
we won't be able to create complex lock on the merge table and all
its children atomically.


One of possible approaches to this problem is to move opening of
child tables from myrg_open() up to the open_tables() layer and
make handling of child tables for the merge table much similar
to the handling of underlying tables in a view.

Here is the approximate list of steps which will be required to
do this (each step is prefixed with tentative list of functions
affected by it):


1) myrg_open():

   Change this function in such way that it will only read list of
   child tables from the .MRG and associate this list with handler
   or even TABLE object (this still can be done with the help of
   callback function). The rest of the code from this function,
   which assumes that child tables are already open, should be
   moved to separate function which should be called once SQL-layer
   will open them.

2) ha_myisammrg::extra():

   Introduce a new value of parameter for handler::extra() which will
   inform handler that SQL-layer opened child tables so handler can
   use them. Upon this call handler also may perform those checks
   that we have moved to separate function during the first step.
   Also we might need a separate parameter value that will tell
   handler that it couldn't use child tables as they are about to be
   closed. It is probably a good idea to add asserts which will
   ensure that most of ha_myisammrg methods are not called outside
   of these bracketing calls of handler::extra(). 

3) open_tables()/close_thread_tables() (tentatively):

   Introduce code at SQL-layer that for each merge table add its
   child tables to statement's global table list for further opening,
   much in the same way it is done now in mysql_make_view() for views.
   But unlike for views we also have to remove child tables from the
   statement table list to make the process prepared-statement safe.
   
   Two obvious (but not necessarily the best) places for this are 
   open_tables() and close_thread_tables(). In the first function we
   can add all child tables for each merge table being opened to the
   statement's table list, when doing this we might want to keep
   pointers to the first and last of these tables associated with
   merge table to simplify their removal afterwards. When we open
   last child table for a merge table we should call handler::extra()
   for this merge table to inform handler that all underlying tables
   are open. Alternatively we can do this for all merge tables at
   the end of open_tables() call. During the close_thread_tables()
   we should pass through all merge tables and use pointers stored
   earlier to exclude child tables from the statement's table list.
   We also will have to take some measures to exclude those child
   tables from the privilege checks and from any chance to be used
   by statement under explicit or implicit LOCK TABLES mode (those
   statements should use merge table).
   
4) ha_myisammrg::lock_count()/store_lock() or lock_tables():

   Change ha_myisammrg::lock_count() and store_lock() to do nothing
   and rely on SQL-layer locking child tables. Another possible
   alternative is to change lock_tables() to ignore child tables.

5) open_ltable(), mysql_alter_table(), mysql_checksum_table(), ... :

   Ensure that open open_ltable() returs appropriate error message
   for merged tables. Replace calls to open_ltable() in places which
   really require opening merge tables with calls to open_and_lock_tables()
   (AFAIU we have only two such places: mysql_alter_table() and
   mysql_checksum_table()). We also might have to adjust other code
   in those places as they are likely to rely that there is only one
   table open.
   
6) reopen_table(), reopen_tables():

   Remove those usages of reopen_table()/reopen_tables() which are not
   associated with reopening table under LOCK TABLES (i.e. get rid of
   its usage within wait_for_tables()). This will simplify changing of
   reopen_table() in such way that it will safely handle merge tables.
   In particular I think we should disallow situations when we will
   reopen merge tables in place of non-merge tables or merge tables
   with different child tables (this can lead to deadlock).
   
7) reopen_name_locked_table():

   Finally we should adjust reopen_name_locked_table() in such way
   that it will be able to handle merge tables. I suspect that it
   in places where we use this function it might be enough to open
   only merge table and there is no need to open child tables.


Of course, changes described above are quite intrusive so probably it
makes sense to do them only starting from 5.1.
Also it might be a good idea to do changes described in points 1-4
and in points 5, 6 and 7 as separate patches. This should simplify
review and make the changes more manageable.

As always I (as well as whole Runtime team) will be happy to
provide any additional explanations. We also can lend you a hand
if you encounter some problems with implementation.

What do you think about all this ?

-- 
Dmitri Lenev, Software Developer
MySQL AB, www.mysql.com

Are you MySQL certified?  http://www.mysql.com/certification
Thread
Re: bk commit - 4.1 tree (istruewing:1.2630) BUG#26379Dmitri Lenev20 Jun
  • Re: bk commit - 4.1 tree (istruewing:1.2630) BUG#26379Ingo Strüwing21 Jun
  • Re: bk commit - 4.1 tree (istruewing:1.2630) BUG#26379Ingo Strüwing13 Jul
  • reopen_tables() [Re: bk commit - 4.1 tree (istruewing:1.2630) BUG#26379]Ingo Strüwing29 Aug
    • Re: reopen_tables() [Re: bk commit - 4.1 tree (istruewing:1.2630) BUG#26379]Dmitri Lenev7 Sep
      • Re: reopen_tables() [Re: bk commit - 4.1 tree (istruewing:1.2630)BUG#26379]Ingo Strüwing7 Sep
        • Re: reopen_tables() [Re: bk commit - 4.1 tree (istruewing:1.2630) BUG#26379]Dmitri Lenev11 Sep
Re: bk commit - 4.1 tree (istruewing:1.2630) BUG#26379Dmitri Lenev25 Jul
  • Re: bk commit - 4.1 tree (istruewing:1.2630) BUG#26379Ingo Strüwing26 Jul