Hello Ingo,
Ingo Strüwing a écrit, Le 11/21/2008 05:59 PM:
> Hi Guilhem,
>
> thank you for your thoughts. I made a new patch that includes most of
> your wishes, as I understood them. I did also make it a bit smaller,
> leaving out debug print I don't need any more.
>
> But you write a lot of "could" and "can". Could you please be a bit
> clearer, how much you would like me to make the respective changes?
Normally my "could" or "can" are optional. They propose something, which
you use if you like.
> Please see below for comments.
>
> Guilhem Bichot, 14.11.2008 15:21:
> ...
>> Ingo Struewing a écrit, Le 11/13/2008 09:55 AM:
> ...
>>> 2730 Ingo Struewing 2008-11-13
>>> Bug#38133 - Myisamlog test fails on Windows
>>> Updating of a table from the myisam.log file was not
>>> possible in most cases.
>>> - The initialization of myisamlog set the number of usable file
>>> descriptors
>>> to one (1!) on Windows.
>>> - Every log command to a different file required a close of the
>>> previous file.
>>> After a 'close' command the closing of the previous file
>>> failed because
>>> all files were closed already and myisamlog stopped.
>>> - File locking on Windows failed for the sequence "exclusive
>>> lock, shared lock,
>>> unlock, exclusive lock". It blocked on the last try to acquire
>>> an exclusive lock.
>>> Fixed by
>>> - requesting a reasonable number of file descriptors,
>>> - before re-open, close a file only if all file descriptors are
>>> in use,
>>> - do not re-open a file, if the command is 'close',
>>> - unlock a file before every lock.
>>> per-file messages:
>>> mysys/my_lock.c
>>> Bug#38133 - Myisamlog test fails on Windows
>>> Added unlocking before locking to prevent multiple locks by the
>>> same process.
>>> sql/handler.cc
>>> Bug#38133 - Myisamlog test fails on Windows
>>> Changed abort() to DBUG_ABORT().
>> you could say why this change was needed.
>
> IMHO file comments should mainly tell what was changed. This can be
> valuable information for difficult merges.
>
> The reason for changes is usually better placed in the revision comment,
> in the code (if it can be written as a reason for that implementation,
> not for a change), or perhaps in the bug report.
>
> For a small change like this, none of these seem appropriate. Especially
> as I don't find a written reason for the same change in all the other
> uses within sql/handler.cc.
But you did the change for a good reason (popup under Windows), and it
is good to tell people about this reason, in case they wonder (at review
time, or later when they study revision history); for me the file's
revision comment is appropriate.
> Anyway, the change was unrelated to the bug. So I reverted it.
ok. It would have been fine to keep it, it's so small and unquestionable.
> ...
>>> === modified file 'mysys/my_lock.c'
>>> --- a/mysys/my_lock.c 2008-09-12 12:57:19 +0000
>>> +++ b/mysys/my_lock.c 2008-11-13 08:55:37 +0000
>>> @@ -71,13 +71,23 @@ static int win_lock(File fd, int locktyp
>>> /* write lock is mapped to an exclusive lock. */
>>> dwFlags= LOCKFILE_EXCLUSIVE_LOCK;
>>>
>>> + /*
>>> + Drop old lock first to avoid double locking.
>>> + During analyze of Bug#38133 (Myisamlog test fails on Windows)
>>> + I met the situation that the program myisamlog locked the file
>>> + exclusively, then additionally shared, then did one unlock, and
>>> + then blocked on an attempt to lock it exclusively again.
>>> + Unlocking before every lock fixed the problem.
>>> + */
>>> + (void) UnlockFileEx(hFile, 0, liLength.LowPart, liLength.HighPart,
>>> &ov);
>> I'm anxious to have a fix like this which will in theory affect the
>> entire server, only to fix something exhibited by myisamlog, without
>> having a clear understanding of what is actually wrong (what module is
>> guilty).
>> When I read the comment above, my impression is that myisamlog is doing
>> something wrong:
>> - why lock shared if already has an exclusive lock?
>> - why re-lock exclusive if already has an exclusive lock?
>> It is "expected" that myisamlog does weird things: it's code which was
>> dead for >6 years, is very little tested in mysql-test, it may have old
>> bugs or new bugs due to other code's evolution. So I'd prefer if we dug
>> into myisamlog and fixed myisamlog than the server-wide locking code.
>>
>> In other words, I believe the bug could be in the user of mysys
>> (myisamlog) more than in mysys. If the server didn't have any problem
>> with mysys of before-the-patch, given that the server is much more used
>> than myisamlog, I'd trust mysys more than myisamlog.
>
> Here is a trace of the situation with statements from the test case:
> create table t1 (a int, b varchar(100)) engine=myisam;
> insert into t1 values(1,'life'), (2,'file');
> - open and lock for write (exclusive on Windows)
> - write (two times)
> - unlock
> insert into t1 select a*5, concat("A ",b) from t1;
> - open and lock for write (exclusive on Windows)
> - open second instance and lock for read
> (lock suppressed by mi_lock_database: stronger lock exists)
> - write (two times)
> - unlock first instance (mi_lock_database wants to turn write lock
> to read lock and thus adds a shared lock on Windows. It expects
> a new lock to replace an old lock)
> - unlock second instance (final unlock by mi_lock_database. This
> unlocks the latest lock on Windows: the shared lock. The exclusive
> lock still persists.)
> update t1 set b="A knife" where a=5;
> - lock (first instance) for write (exclusive on Windows, this one
> hangs as another exclusive lock still exists.)
If I understand well, all the locks which you describe above happen in
the Server when running with --external-locking?
So you mean that the hang happens in the Server too, if using
--external-locking? It's not a myisamlog-specific problem?
> My initial proposal was to fix it in my_lock() so that its behavior is
> consistent over platforms: A new lock replaces an old one.
>
> The second best solution would be to make mi_lock_database() aware of
> the different behavior on the platforms IMHO. In case of lock type
> changes on Windows, it would first unlock.
>
> But since this does also affect the server, I implemented kind of a
> "behavior prediction engine" in myisamlog. Depending on the current
> locks on the share, and the platform, I release a lock before calling
> mi_lock_database().
Now I'm confused. If the hang can also exist in the Server, why fix only
myisamlog?
> But I wonder how you can be so sure that "the server didn't have any
> problem with mysys of before-the-patch", given that the server is much
> more used than myisamlog". How much do we test with --external-locking
> on Windows?
I had not understood that the bug was affecting the Server too.
> ...
>>> @@ -186,6 +194,14 @@ int mi_examine_log(MI_EXAMINE_LOG_PARAM (void)
>>> init_key_cache(dflt_key_cache,KEY_CACHE_BLOCK_SIZE,KEY_CACHE_SIZE,
>>> 0, 0);
>>>
>>> + /*
>>> + Initialize members of file_info that are used for pointing to
>>> + allocated memory. At the error labels we want to be able to free it.
>>> + */
>>> + file_info.name= NULL;
>>> + file_info.show_name= NULL;
>>> + file_info.record= NULL;
>> A super-critical comment: you could have put all inits on one line:
>> file_info.name= file_info.show_name= file_info.record= NULL;
>
> Here's one of these "could"s. And yes, you are right. I could have
> written it on one line. And I did know that before. It is just so that
> for my old eyes the one-line writing is worse to parse than the three
> line writing. When reading in the code below and raising the question
> "has file_info.show_name been initialized above?", I would find it among
> the one-assignment-per-line faster than among the
> as-many-assignments-as-you-can-per-line.
Ok.
> So if you "could" is weaker than "it is our coding style", then I'd just
> ignore it here.
It's weaker, it's a suggestion. It's really up to you. Ok for 3 lines.
>>> + /*
>>> + 'command' is a number that is used to index arrays. Better check
>>> + it for range.
>>> + */
>>> + if (command >= MI_LOG_END_SENTINEL)
>>> + {
>>> + my_errno= HA_ERR_WRONG_COMMAND;
>>> + fprintf(stderr,"Unknown command %u in logfile at position %s\n",
>>> + command, llstr(isamlog_filepos, llbuff));
>>> + fflush(stderr);
>>> + goto end;
>> After "end:" we don't make use of my_errno, so no need to set it.
>> Or, set it, but use "goto err", in that case you can remove the
>> fflush(stderr).
>
> Ok, removed. I trust you that nobody will ever expect
> mi_examine_log() to set my_errno on error.
I believe no one should expect this, as it's not announced by the
Doxygen comment of mi_examine_log(). For me, setting my_errno is part of
the API. If it's not documented, it does not have to be...
I looked at the two only callers of mi_examine_log() (myisamlog.c and
myisam_backup_engine.cc and they don't read my_errno.
But, I'm fine if you make mi_examine_log() set my_errno, and document
this in the function's Doxygen comment.
>
> The other option would print a second, confusing error message.
>
> ...
>>> === modified file 'storage/myisam/mi_locking.c'
>>> --- a/storage/myisam/mi_locking.c 2008-09-09 19:02:38 +0000
>>> +++ b/storage/myisam/mi_locking.c 2008-11-13 08:55:37 +0000
>>> @@ -104,7 +104,11 @@ int mi_lock_database(MI_INFO *info, int
>>> mi_print_error(share, HA_ERR_CRASHED);
>>> mi_mark_crashed(info);
>>> }
>>> - if (info->lock_type != F_EXTRA_LCK)
>>> + /*
>>> + If we have a pseudo lock (F_EXTRA_LCK) or a temporary table,
>>> + skip file locking.
>>> + */
>>> + if ((info->lock_type != F_EXTRA_LCK) &&
> !share->temporary)
>>> {
>>> if (share->r_locks)
>>> { /* Only read locks left */
>> You explained that this change's reason was:
>> "I believe, I have seen cases where a temporary table did not take a
>> F_EXTRA_LCK lock. I want to prevent locking in this case too."
>>
>> It is a desirable goal, but more of an optimization than a bug fix.
>> Also, normally, there should not be such cases: temporary tables should
>> always use F_EXTRA_LCK, like here:
>>
>> int ha_myisam::external_lock(THD *thd, int lock_type)
>> {
>> file->in_use.data= thd;
>> return mi_lock_database(file, !table->s->tmp_table ?
>> lock_type : ((lock_type == F_UNLCK) ?
>> F_UNLCK : F_EXTRA_LCK));
>> }
>>
>> There is this code at start of mi_lock_database():
>> if (lock_type == F_EXTRA_LCK) /* Used by TMP tables */
>> {
>> ++share->w_locks;
>> ++share->tot_locks;
>> info->lock_type= lock_type;
>> info->s->in_use= list_add(info->s->in_use,
> &info->in_use);
>> DBUG_RETURN(0);
>> }
>> I believe this is the optimized code which is intended for temporary
>> tables. If myisamlog does not pass F_EXTRA_LCK, it may be because it's a
>> too old program which needs some fixing. By the way, myisamlog should
>> not log anything about temporary tables at all.
>>
>> I suggest to add
>>
>> else
>> DBUG_ASSERT(!share->temporary);
>>
>> to the if() block just above, run the tests and see what happens.
>
> You mean the following?
>
> if (lock_type == F_EXTRA_LCK) /* Used by TMP tables */
> {
> ++share->w_locks;
> ++share->tot_locks;
> info->lock_type= lock_type;
> info->s->in_use= list_add(info->s->in_use, &info->in_use);
> DBUG_RETURN(0);
> }
> else if (lock_type != F_UNLOCK)
> DBUG_ASSERT(!share->temporary);
yes, that would be the correct assertion; I had missed the F_UNLOCK case.
>> Another note: isn't it worrying that if short of file descriptors,
>> close_some_file() closes a file which may be locked? I hope that the
>> mi_close() code is ready for such awkward situation, but I'm not sure.
>> Maybe close_some_file() should, if appropriate, unlock the table before
>> closing it.
>
> mi_close() contains:
>
> if (share->reopen == 1 && share->kfile >= 0)
> _mi_decrement_open_count(info);
>
> if (info->lock_type != F_UNLCK)
> {
> if (mi_lock_database(info,F_UNLCK))
> error=my_errno;
> }
>
> _mi_decrement_open_count() calls mi_lock_database(info,F_WRLCK), updates
> open_count and calls mi_lock_database(info,old_lock).
>
> This means:
>
> Even for a temporary table, a write lock is taken, regardless, what the
> previous lock was. Thereafter the old lock is taken back, regardless,
> what it was. This could make problems on Windows.
Ok, and this would extend beyond your bugfix.
In the code above I see that, if the table is still locked after
_mi_decrement_open_count(), mi_lock_database(info,F_UNLCK) is called. So
when the file is finally closed it has been unlocked already, which I
was not sure about.
>
> At least I need to remove DBUG_ASSERT(!share->temporary). It crashed in
> alter_table.test, mix2_myisam.test, temp_table.test, and
> rpl_temp_table.test.
It's excellent that you tested the idea. So indeed, it proves that
F_WRLCK is used for temp table sometimes... Then could you please change
this comment:
if (lock_type == F_EXTRA_LCK) /* Used by TMP tables */
by adding "sometimes" in the comment?
> The updated patch: http://lists.mysql.com/commits/59568
Could you please tell me against what revision id this patch is, so that
I can apply it?
I recently changed our commit emailer so that such information is
automatically included in commit mails.