Hi Rafal,
thanks for the review.
Please do not quote huge emails in full. I had to save the message to
find your comments with an editor. :-(
Please find replies below.
Rafal Somla, 20.03.2008 11:19:
...
> Ingo Struewing wrote:
...
>> ChangeSet@stripped, 2008-03-11 18:54:28+01:00, istruewing@stripped
>> +32 -0
>> WL#4259 - Debug Sync Facility
...
>> +/* Debug sync control. Referenced by THD. */
>> +struct st_debug_sync_control
>> +{
>> + st_debug_sync_action *ds_action; /* array of actions */
>
> Wouldn't it be nicer to declare it as "st_debug_sync_action ds_action[]"?
Maybe a matter of taste. Your form would look to me as an in-place
array, not a pointer to the array somewhere in memory.
Anyway I tried it and had problems to cast the malloc() result into it.
>
> Another thing - why not use dynamic array here and reuse the code from
> mysys library?
The key is performance. I could get approval for string sync points only
after proving that the cost is not too high. The main time consumer is
the binary search in that array. It would become much bigger if I had to
access the array elements via get_dynamic().
Also, I think, the code for managing this array is pretty small. It
won't profit much from changing to dynamic array.
...
>> +/**
>> + Callback from wait_for_lock() for debug sync. See thr_lock.c.
>> +
>> + @description
>> + One can use this to signal when a thread is going to wait for a
>> lock.
>
> I was bit confused looking at this callback function. Perhaps in the
> comment you could elaborate more on why you define it here and how it is
> used.
Ok. I added:
We cannot place a sync point directly in thr_lock.c. It is C code
and does not include mysql_priv.h. So it does not know the macro
DEBUG_SYNC(thd, sync_point_name). The macro needs a 'thd' argument.
Hence it cannot be used in files outside of the sql/ directory.
The workaround is to call back simple functions like this one from
non-sql/ files.
We want to allow modules like thr_lock to be used without sql/ and
especially without Debug Sync. So we cannot just do a simple call
of the callback function. Instead we provide a global pointer in
the other file, which is to be set to the callback by Debug Sync.
If the pointer is not set, no call back will be done. If Debug
Sync sets the pointer to a callback function like this one, it will
be called. That way thr_lock.c does not have an undefined reference
to Debug Sync and can be used without it. Debug Sync, in contrast,
has an undefined reference to that pointer and thus requires
thr_lock to be linked too. But this is not a problem as it is part
of the MySQL server anyway.
>
>> +
>> + @note
>> + Beware of waiting for a signal here. The lock has aquired its mutex.
>> + While waiting on a signal here, the locking thread could not aquire
>> + the mutex to release the lock. One could lock up the table
>> + completely.
>
> Hmm, I don't understand this note. But perhaps it's only me...
Ok. I added:
In detail it works so: When thr_lock() tries to acquire a table
lock, it locks the lock->mutex, checks if it can have the lock, and
if not, it calls wait_for_lock(). Here it unlocks the table lock
while waiting on a condition. The sync point is located before this
wait for condition. If we have a waiting action here, we hold the
the table locks mutex all the time. Any attempt to look at the table
lock by another thread blocks it immediately on lock->mutex. This
can easily become an unexpected and unobvious blockage. So be
warned: Do not request a WAIT_FOR action for the 'wait_for_lock'
sync point unless you really know what you do.
...
>> + for (; action < action_end; action++)
>> + {
>> + action->signal.free();
>> + action->wait_for.free();
>> + action->sync_point.free();
>> + }
>> + DBUG_VOID_RETURN;
>> +}
>> +
>
> It is just a suggestion, but adding constructor/destructor to
> st_debug_sync_action structure would mke the code look nicer and easier
> to maintain IMO.
Interesting idea. But how would I call the destructor here? I do not
create the objects with 'new'. So I cannot 'delete' them. For efficiency
I want to allocate and free the actions in bunches.
A compromise might be an action->free() method that includes the above.
...
>> +static st_debug_sync_action *debug_sync_find(st_debug_sync_action
>> *actionarr,
>> + int quantity,
>> + const char* dsp_name,
>> + uint name_len)
>> +{
>> + st_debug_sync_action *action;
>> + int low ;
>> + int high ;
>> + int mid ;
>> + int diff ;
>> +
>> + low= 0;
>> + high= quantity;
>> +
>> + while (low < high)
>> + {
>> + mid= (low + high) / 2;
>> + action= actionarr + mid;
>
> What if actionarr is NULL?
It will crash. Just like most of MySQL functions will crash if called
with a NULL 'thd' argument.
In other words, before calling debug_sync_find() I always check
ds_control->ds_active. If the algorithms in this file are correct,
'ds_active' can be non-zero only after an action array has been
successfully allocated. If they are not correct, we are lost anyway. ;-)
I'll add a DBUG_ASSERT().
...
>> +static void debug_sync_reset(THD *thd)
>> +{
>> + st_debug_sync_control *ds_control= thd->debug_sync_control;
>> + DBUG_ENTER("debug_sync_reset");
>> +
>> + /* Remove all actions of this thread. */
>> + ds_control->ds_active= 0;
>
> What if ds_control is NULL?
Same question, same answer. :)
But how can it become NULL? At thread start it is allocated, at thread
end deallocated. If allocation fails, the whole facility is disabled.
debug_sync_reset() will never be called then. Is there any other place
where thd->debug_sync_control can become NULL?
>
>> +
>> + /* Clear the global signal. */
>> + debug_sync_global.ds_signal.length(0);
>> +
>
> Don't we need to call String destructors for all stored actions?
It depends.
1. debug_sync_global.ds_signal is not an action. It is the container for
the signal sent by the last SIGNAL action.
2. The idea with all the 'String' objects is to keep the allocated
memory for the whole life time of the 'String' object to reduce the
number of allocations/frees.
String::length(0) means that the string length is set to zero, but the
allocated memory is kept for later reuse. String::copy() does not need
to do a new allocation if the new string is not longer than the longest
string assigned to this object before.
Freeing of the allocated memory is only done before the enclosing
container (action or debug_sync_global) is freed. This happens at thread
end for all actions and at server end for debug_sync_global.ds_signal.
One exception is the compression of the actions array when an action in
the middle between other actions is removed. This is done by shifting
the later actions down one slot. This overwrites the string pointers of
the String objects inside the removed action. To avoid loss of the
allocated memory, I free it first. After the shift, the last action is
duplicated. This includes the string pointers. I fix this by bzeroing
the now unused action (including the string pointers).
At the time of this writing I noticed that I could avoid the free at
this point if I copy the to-be-removed action to a temporary variable,
do the shift, and then copy the saved action back to the now free position.
...
>> +static void debug_sync_remove_action(st_debug_sync_control *ds_control,
>> + st_debug_sync_action *action)
>> +{
>
>
> DBUG_ASSERT(ds_control) ?
> DBUG_ASSERT(action) ?
Ok. Will add. Plus:
DBUG_ASSERT(ds_control == current_thd->debug_sync_control);
But please note that most of the functions in this file are static.
Their use is an internal affair. The risk of misuse is limited.
>
>> + uint dsp_idx= action - ds_control->ds_action;
>
> This code will break if we store pointer to an action, then add new
> actions to ds_control->ds_action so that it is re-allocated to a new
> location and then try to remove stored action... Should be documented as
> precondition for this function or... a safer implementation used.
It is always a risk to store a pointer to something that can be deleted,
reallocated or otherwise become invalid.
If we store a pointer to an action over code that can reallocate
ds_action, then all later uses of that pointer is broken from the
beginning. There is nothing special with debug_sync_remove_action().
But if we can have a safer implementation with same or better
performance, that would be good anyway.
...
>> + /* Free strings before copying them over. */
>> + debug_sync_free_actions(action, 1);
>> +
>> + /* Move actions down. */
>> + memmove(ds_control->ds_action + dsp_idx,
>> + ds_control->ds_action + dsp_idx + 1,
>> + (ds_control->ds_active - dsp_idx) *
>> + sizeof(st_debug_sync_action));
>> +
>> + /* Remove double references of String pointers. */
>> + bzero(ds_control->ds_action +
>> + ds_control->ds_active, sizeof(st_debug_sync_action));
>
> To my taste it is too dangerous to move constructed objects like this.
> My objections are:
>
> - I'm not sure if C++ compilers guarantee (and will guarantee that in
> the future) that if x is an object of type X and you move sizeof(X)
> bytes from &x to a different addres b then you will get a valid instance
> of X object at b
>
> - In the future one can change either st_debug_sync_action or String
> class. A possible change is to add member with non-trivial copy
> constructor (e.g. with reference couts). Then above code will break things.
In theory you might be right.
>
> Btw. Have you considered using some pointer structure (binary search
> tree or even a simple linked list) for storing actions. Then there will
> be no need to move objects around.
Yes, I did. But that would introduce an additional layer of indirection
in the binary search algorithm, which is the most performance critical
piece of this patch.
But I could replace String by something handcrafted that doesn't use
constructors.
...
>> +static st_debug_sync_action *debug_sync_get_action(THD *thd,
>> + const char *dsp_name,
>> + uint name_len)
>> +{
>> + st_debug_sync_control *ds_control= thd->debug_sync_control;
>> + st_debug_sync_action *action;
>> + DBUG_ENTER("debug_sync_get_action");
>> + DBUG_PRINT("debug_sync", ("sync_point: '%.*s'", (int) name_len,
>> dsp_name));
>> +
>
> DBUG_ASSERT(ds_control) ?
Ok.
...
>> + void *new_action= my_realloc(ds_control->ds_action,
>> + new_alloc *
>> sizeof(st_debug_sync_action),
>> + MYF(MY_WME | MY_ALLOW_ZERO_PTR));
>
> Is it the case that if my_realloc moves the memory to a different
> location it copies all the bytes from the old location?
Yes.
...
>> +/**
>> + Set a debug sync action.
...
>> + @description
>> + This is called from the debug sync parser.
>
> Perhaps elaborate more what does that mean to "set an action".
I changed:
@description
This is called from the debug sync parser. It arms the action for
the requested sync point. If the action parsed into an empty action,
it is removed instead.
Setting an action for a sync point means to make the sync point
active. When it is hit it will execute this action.
Before parsing, we "get" an action object. This is placed at the
end of the thread's action array unless the requested sync point
has an action already.
Then the parser fills the action object from the request string.
Finally the action is "set" for the sync point. If it was parsed
to be empty, it is removed from the array. If it did belong to a
sync point before, the sync point becomes inactive. If the action
became non-empty and it did not belong to a sync point before (it
was added at the end of the action array), the action array needs
to be sorted by sync point.
If the sync point name is "now", it is executed immediately.
>
>> +*/
>> +
>> +static bool debug_sync_set_action(THD *thd, st_debug_sync_action
>> *action)
>> +{
>> + st_debug_sync_control *ds_control= thd->debug_sync_control;
>> + bool is_dsp_now= FALSE;
>> + DBUG_ENTER("debug_sync_set_action");
>> + DBUG_ASSERT(ds_control);
>
> You also assume action != NULL.
Ok.
...
>> + DEBUG_SYNC(thd, "now");
>> + /* If HIT_LIMIT is 1, return error to suppress send_ok(). */
>
> I don't understand the above comment...
I changed to:
If HIT_LIMIT for sync point "now" was 1, the execution of the sync
point decremented it to 0. In this case an error message was
reported and the statement killed.
If a statement reports an error, it must not call send_ok().
To suppress a call to send_ok(), we return TRUE from this
function. The killed statement is the indication for the hit limit
error.
...
>> +static char *debug_sync_token(char **token_p, uint *token_length_p,
>> char *ptr)
>> +{
>
> How about asserting here that token_p, token_length_p and ptr are not
> NULL? The code below uses that assumption.
Ok. I'll also add DBUG_ASSERT(thd) to all functions that use thd. And
asserts to all pointer input parameters of all functions.
Do you write your own code so that you assert everything? I didn't have
the impression when reviewing one of your patches recently.
...
>> +/**
>> + Evaluate a debug sync action string.
...
>> + @description
>> + This is called when the DEBUG_SYNC system variable is set.
>> + Parse action string, build a debug sync action, activate it.
>
> Mention that the result of parsing is stored inside thd's action table.
I added:
Before parsing, we "get" an action object. This is placed at the
end of the thread's action array unless the requested sync point
has an action already.
Then the parser fills the action object from the request string.
Finally the action is "set" for the sync point. This means that the
sync point becomes active or inactive, depending on the action
values.
>
>> +
>> + @note
>> + The input string needs to be ASCII NUL ('\0') terminated. We split
>> + nul-terminated tokens in it without copy.
>
> Perhaps add here all the other assumptions about action_str. For
> example, that it must be in system character set.
I added:
For more constraints for the string see the function comment of
debug_sync_token().
So I avoided "comment duplication". ;-)
>
>> +*/
>> +
>
> This is the grammar for command strings which I deduced from
> reverse-engenieering the code (there are additional constraints, for
> example EXECUTE must be preceeded by SIGNAL or WAIT_FOR:
>
> RESET
> <point name> TEST
> <point name> CLEAR
> <point name> [SIGNAL <signal name>] [WAIT_FOR <signal name>
> [TIMEOUT
> <seconds>]]
> [EXECUTE <count>] [HIT_LIMIT <count>]
>
> Would be good if this grammar is documented somewhere. In particular,
> the order of actions is fixed which is not clearly documented in the
> existing documentation.
The big comment at top of debug_sync.cc explains the syntax, though not
in the from you suggested. Also it points to the following resources,
which repeat the syntax explanation:
==== Further reading ====
For a discussion of other methods to synchronize threads see
http://forge.mysql.com/wiki/MySQL_Internals_Test_Synchronization
For complete syntax tests, functional tests, and examples see the test
case debug_sync.test.
See also worklog entry WL#4259 - Test Synchronization Facility
I added:
==== Formal Syntax ====
The string to "assign" to the DEBUG_SYNC variable can contain:
{RESET |
<sync point name> TEST |
<sync point name> CLEAR |
<sync point name> {{SIGNAL <signal name> |
WAIT_FOR <signal name> [TIMEOUT <seconds>]}
[EXECUTE <count>] &| HIT_LIMIT <count>}
Here '&|' means 'and/or'. This means that one of the sections
separated by '&|' must be present or both of them.
>
>> +static bool debug_sync_eval_action(THD *thd, char *action_str)
>
> How about calling it debug_sync_parse_action instead?
It does not only parse the string, but also [de-]activate the sync point
or executes other functions. "Evaluate" is usually something like parse
+ execute.
>
>> +{
>> + st_debug_sync_action *action= NULL;
>> + const char *errmsg;
>> + char *ptr;
>> + char *token;
>> + uint token_length;
>> + DBUG_ENTER("debug_sync_eval_action");
>> +
>> + /*
>> + Get debug sync point name. Or a special command.
>> + */
>> + if (!(ptr= debug_sync_token(&token, &token_length, action_str)))
>> + {
>> + errmsg= "Missing synchronization point name";
>
> Why error messages are not internationalized?
Too many new error messages for a debugging facility. Experts, who are
able to make use of this facility, should know English language. We
often use such combinations of of an internationalized error message
(here ER_PARSE_ERROR) with English details. See for example
mysql_binlog_send().
...
>> + /*
>> + Now check for actions that define a new action.
>> + Initialize action. Do not use bzero(). Strings may have malloced.
>> + */
>> + action->activation_count= 0;
>> + action->hit_limit= 0;
>> + action->execute= 0;
>> + action->timeout= 0;
>> + action->signal.length(0);
>> + action->wait_for.length(0);
>> +
>
> Use of constructor would be nice...
But I don't construct action here. I initialize it. And only part of it.
...
>> +static void debug_sync_execute(THD *thd, st_debug_sync_action *action)
>> +{
...
>> + if (action->execute)
>> + {
...
>> + action->execute--;
>
> Wouldn't it be better to decrease the counter at the very beginning?
> This way we don't have to worry about early returns and such...
Ok.
...
>> +++ b/sql/mysql_priv.h 2008-03-11 18:54:24 +01:00
...
>> +/* Command line option --debug-sync. See mysqld.cc. */
>> +extern uint opt_debug_sync_timeout;
>
> Comment should say "--debug-sync-timeout".
Ok.
...
Regards
Ingo
--
Ingo Strüwing, Senior Software Developer
MySQL GmbH, Dachauer Str. 37, D-80335 München
Geschäftsführer: Kaj Arnö - HRB München 162140