List: Internals
From: Ingo Strüwing
Date: December 11 2007 9:58pm
Subject:Re: Proposal for a Test Synchronization Facility
Hi Martin,

Martin Friebe, 11.12.2007 14:24:
...
>   The syncronisation-point 
> is a fixed point (like a hook) in the code, the user variable is only used to 
> toggle it.

Right! That's an excellent brief description, though I would say
"control" instead of "toggle". Thanks.

Regarding the previously mentioned BACKUP_BREAKPOINT facility: in short,
I was not able to meet my synchronization needs with it. Here is the
full story:

BACKUP_BREAKPOINT
-----------------
It is not usable for me. I have to emphasize and explain "for me". I do
most of my debugging with the DBUG trace facility. I start the server
with --debug and see the path of code execution in the trace file.
BACKUP_BREAKPOINT is based on DBUG_EXECUTE_IF("backup_debug",...). That
is, it does not work when running with just --debug. It requires
--debug=d,backup_debug. But when one or more keyword modifiers of the
'd' flag are specified, all non-listed keywords are disabled. This
means I effectively disable tracing when enabling BACKUP_BREAKPOINT, or
I have to give a long list of keywords to the option. This is too
inconvenient for the facility to be useful *for me*. Other developers
may prefer using the debugger for most debugging. They don't have this
problem.
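
The keyword dilemma described above can be sketched as a small model.
This is only an illustration of the behaviour as stated in this mail,
not the DBUG implementation; the struct and its member names are
hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of the 'd' flag keyword list (not DBUG source code).
struct DebugKeywordModel {
    // Keywords given after "--debug=d,". Empty models plain "--debug".
    std::set<std::string> listed;

    // Ordinary keyword-tagged trace output: with no list every keyword
    // is shown, with a list only the listed keywords are shown.
    bool print_enabled(const std::string& keyword) const {
        return listed.empty() || listed.count(keyword) > 0;
    }

    // DBUG_EXECUTE_IF-style code, as described above: it runs only if
    // its keyword is listed explicitly.
    bool execute_if_enabled(const std::string& keyword) const {
        return listed.count(keyword) > 0;
    }
};
```

In this model, an empty list gives full trace output but never runs the
"backup_debug" code, while listing "backup_debug" runs it but silences
every other keyword.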

DBUG_SYNC_POINT
---------------
But this isn't the end of the story. BACKUP_BREAKPOINT uses the
DBUG_SYNC_POINT facility internally. This facility has existed since
MySQL 4.0 (!), and I didn't know it. Sigh. (Though it is so widely
used ;-) There is one single call of it, in sql_repl.cc.) It follows an
approach similar to my MYSQL_TEST_SYNC proposal. One can set
synchronization points in the code just like I proposed:

    DBUG_SYNC_POINT("debug_lock.created_file_event",10);

This facility is not based on user variables, but on user locks. So it
is kind of a combination of your description of GET_LOCK() etc. and my
MYSQL_TEST_SYNC proposal. But it behaves quite differently:

When code execution reaches DBUG_SYNC_POINT, any user lock held by the
thread is released. The thread then tries to acquire the named user lock
"debug_lock.created_file_event" for up to 10 seconds, but only if that
lock is in use by another thread.

This can be used as a "signal". The thread acquires a lock (the "signal"
lock) and releases it implicitly when reaching the sync point. The other
thread, which tried to get the "signal" lock after this thread, gets the
lock at the same moment and can continue.

It can be used as a "wait". The other thread has the "sync point" lock
("debug_lock.created_file_event" in this case) and this thread blocks on
it in the sync point.

Unfortunately I was not able to figure out how to use it for "signal"
*plus* "wait". The other thread could have the "sync point" lock and
this thread the "signal" lock, so that reaching the sync point would
release the "signal" lock and wait on the "sync point" lock. But the
other thread would not be able to wait on the "signal" lock, because it
has the "sync point" lock. A thread can have one user lock only. When
the other thread tries to wait for the "signal" lock, it implicitly
releases the "sync point" lock. This would be ok if one could be sure
that this thread reaches the sync point before the other thread releases
the "sync point" lock. Otherwise no wait would happen at the sync point.
The test would not test what it should test.
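
The lost wait can be traced with a small state model of the rules
described above. It is a single-threaded sketch that tracks lock
ownership only; real blocking and timeouts are left out, and all names
in it are hypothetical, not MySQL code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical state model of named user locks (ownership only).
class UserLockModel {
public:
    // GET_LOCK as described: a session holds one user lock only, so any
    // lock it already holds is released first. Returns false where the
    // real call would have to wait for another session.
    bool get_lock(int session, const std::string& name) {
        release_all(session);
        if (owner_.count(name) != 0) return false;
        owner_[name] = session;
        return true;
    }

    // Sync point as described: release the session's user lock, then
    // wait only if another session holds `name`. Returns true if the
    // sync point would wait, false if it is passed straight through.
    bool sync_point_would_wait(int session, const std::string& name) {
        release_all(session);
        return owner_.count(name) != 0;
    }

    bool holds(int session, const std::string& name) const {
        auto it = owner_.find(name);
        return it != owner_.end() && it->second == session;
    }

private:
    void release_all(int session) {
        for (auto it = owner_.begin(); it != owner_.end();) {
            if (it->second == session) it = owner_.erase(it);
            else ++it;
        }
    }

    std::map<std::string, int> owner_;  // lock name -> owning session
};

// The problematic ordering: the other session (2) starts waiting for
// the "signal" lock before this session (1) reaches the sync point.
inline bool sync_point_wait_is_lost() {
    UserLockModel locks;
    locks.get_lock(2, "sync_point");  // other thread arms the sync point
    locks.get_lock(1, "signal");      // this thread takes the signal lock
    locks.get_lock(2, "signal");      // other thread waits for the signal
                                      // and implicitly drops "sync_point"
    return !locks.sync_point_would_wait(1, "sync_point");
}
```

In this model sync_point_wait_is_lost() returns true: once session 2
asks for "signal", nothing holds "sync_point" any more, so session 1
runs through the sync point without waiting.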

A possible workaround might be a third thread, which takes the "sync
point" lock in the beginning and releases it at the right moment. But I
was not able to solve my problem with it:

CREATE TABLE t1 (c1 INT) ENGINE=MyISAM;
CREATE TABLE m1 (c1 INT) ENGINE=MERGE UNION=(t1) INSERT_METHOD=LAST;
        thread2
        INSERT INTO m1 VALUES (2); This ought to wait after
          open_tables() and before attach_merge_children().
thread1
FLUSH TABLE m1; This ought to wait before flushing until thread2 is at
  the mentioned sync point. Then it can proceed. A positive test result
  would be that it does not crash when it aborts thread2's locks.
  Some time after the lock abort, thread2 can also proceed.

I tried:

I placed DBUG_SYNC_POINT("debug_lock.before_myisammrg_attach", 10) at
the beginning of ha_myisammrg::attach_children().

                thread3
                SELECT GET_LOCK("debug_lock.before_myisammrg_attach", 10);
        thread2
        SELECT GET_LOCK("reach.before_myisammrg_attach", 10);
        INSERT INTO m1 VALUES (2);
thread1
SELECT GET_LOCK("reach.before_myisammrg_attach", 10);
FLUSH TABLE m1;
                thread3
                SELECT RELEASE_LOCK("debug_lock.before_myisammrg_attach");

The result was that thread2, at the sync point, released
'reach.before_myisammrg_attach' and waited for
'debug_lock.before_myisammrg_attach'. thread1 got the lock and ran the
flush at the right moment. But then the flush blocked, waiting for
thread2 to close the flushed table. The situation resolved only after
the 10 second timeout in thread2. What I would need here is a sync point
in flush tables that signals after lock aborting. That signal would have
to let thread3 proceed with releasing thread2's sync point lock. Since
thread3 cannot wait for another user lock, I would need another SQL lock
and a fourth thread.

I stopped at this point. I think the DBUG_SYNC_POINT facility has the
potential to solve many simple synchronization needs. But it requires
test cases so complex that it becomes too difficult to work with.

One remark for users of this facility: a sync point generally releases
any user lock (of the thread that hits it). Even if it waits for its
lock and finally gets it, that same lock is immediately released again.
In most cases this is probably not a disadvantage. But one should keep
it in mind.

But one real problem of the facility is that it always releases any user
lock. If we used it widely and placed several sync points in the code,
there would be a high probability that test cases which want to use
explicit user locks would not work any more. The locks might be released
too often implicitly by the sync points.

And yet another problem is that it might be difficult to write test
cases that work on a debug server as well as on a non-debug server. To
some extent the same is also true for my test synchronization facility,
though I was able to solve it with limited effort. While I did not
really try it with the DBUG_SYNC_POINT facility, I fear that it would be
much more difficult to avoid stalling test cases with the locks, which
cannot be released implicitly in a non-debug server. To stress the
difference, I want to note again that in the test synchronization
facility any waiting is done only inside the synchronization point. In a
non-debug server these points don't exist. Hence there is no waiting.
In the DBUG_SYNC_POINT facility user locks need to be taken/waited for
through SQL statements. These statements exist whether the test runs on
a debug or a non-debug server.
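
Why a sync point "does not exist" in a non-debug server can be shown
with a conditionally compiled macro. This is a minimal sketch assuming
a build flag; SKETCH_DEBUG_BUILD, SKETCH_SYNC_POINT and the helper
functions are hypothetical names, not the proposed facility itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log of sync points reached, for illustration only.
inline std::vector<std::string>& reached_points() {
    static std::vector<std::string> points;
    return points;
}

#ifdef SKETCH_DEBUG_BUILD
// Debug build: the sync point does something (here it only records its
// name; a real facility would signal or wait at this place).
#define SKETCH_SYNC_POINT(name) reached_points().push_back(name)
#else
// Non-debug build: the macro expands to nothing, so no waiting can
// ever happen at this place in the code.
#define SKETCH_SYNC_POINT(name) do {} while (0)
#endif

// Hypothetical server function with a sync point placed inside it.
inline int flush_table_sketch() {
    SKETCH_SYNC_POINT("before_flush");
    return 0;
}
```

Compiled without SKETCH_DEBUG_BUILD, flush_table_sketch() runs straight
through and records nothing. A user lock taken by an SQL statement in
the test file has no such switch: the statement is executed on either
kind of server.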

Regards
Ingo
-- 
Ingo Strüwing, Senior Software Developer
MySQL GmbH, Dachauer Str. 37, D-80335 München
Geschäftsführer: Kaj Arnö - HRB München 162140