rHello Ingo,
Ingo Strüwing a écrit, Le 11/29/2008 04:07 PM:
> Hi Guilhem,
>
> Guilhem Bichot, 27.11.2008 21:42:
> ...
>> Ingo Strüwing a écrit, Le 11/21/2008 05:59 PM:
> ...
>>> Guilhem Bichot, 14.11.2008 15:21:
>>> ...
>>>> Ingo Struewing a écrit, Le 11/13/2008 09:55 AM:
>>> ...
>>>>> 2730 Ingo Struewing 2008-11-13
>>>>> Bug#38133 - Myisamlog test fails on Windows
> ...
>>>>> sql/handler.cc
>>>>> Bug#38133 - Myisamlog test fails on Windows
>>>>> Changed abort() to DBUG_ABORT().
> ...
>> But you did the change for a good reason (popup under Windows), and it
>> is good to tell people about this reason, in case they wonder (at review
>> time, or later when they study revision history); for me the file's
>> revision comment is appropriate.
>
> I learned, how to configure Windows to avoid the popup.
>
> IMHO it is too tedious to get at the file comments. I won't expect
> reasoning of changes there.
>
>>> Anyway, the change was unrelated to the bug. So I reverted it.
>> ok. It would have been fine to keep it, it's so small and unquestionable.
>
> Not that unquestionable. You think it requires a reasoning. Or
> explanation at least. And also it is not required with proper Windows
> configuration.
Proper Windows configuration is not obivous to set up (when one does not
have his Wlad Reference Manual around :-).
I don't insist for a revision comment, do as you like.
> ...
>> If I understand well, all the locks which you describe above happen in
>> the Server when running with --external-locking?
>> So you mean that the hang happens in the Server too, if using
>> --external-locking? It's not a myisamlog-specific problem?
>
> Yes. Adding --external-locking to myisamlog-master.opt makes the server
> hang in my_lock() after the same locking pattern that made myisamlog to
> hang.
And THAT is the news.
> ...
>> Now I'm confused. If the hang can also exist in the Server, why fix only
>> myisamlog?
>
> You know: The reviewer rules. Here again a quote from your review:
> "
>>>> In other words, I believe the bug could be in the user of mysys
>>>> (myisamlog) more than in mysys. If the server didn't have any problem
>>>> with mysys of before-the-patch, given that the server is much more used
>>>> than myisamlog, I'd trust mysys more than myisamlog.
> "
Yes but the confused reviewer should not rule.
Here's how I got confused:
- the bug's title is about a myisamlog.test failure
- this inclined me to think that the patch was fixing only myisamlog
problems and that's all
- I didn't realize, until your last emails, that the server was affected too
- before I realized that, I reasoned that as it's to fix only myisamlog
problems, it sounds suspicious to have to change my_lock.c, given that
server uses my_lock.c quite a lot and does not have problems.
Now, the perspective has changed: I realize that patch is about fixing a
bug which exists in server too, the reasoning collapses.
> Personally, I'm not a friend of such arguments. For me it has
> similarities to "I don't fasten my seat belt, I never had an incident."
> Or "I don't need virus protection. I have never been infected". I admit
> that it may be an unfair analogy. And I respect your longer experience
> with MySQL. So I did just fix it in myisamlog.
Given the new facts which you explained... I am sorry... I am going to
ask you to instead exhume your fix of my_lock.c as in
http://lists.mysql.com/commits/58606 .
By the way, that fix to my_lock.c which unlocks before locking: this
opens the possibility of a race: does the existing usage of my_lock
expect that putting a 2nd lock atomically replaces the first lock (with
no window for any other process acquiring the lock during replacement)?
You mentioned that problem in a comment in
http://bugs.mysql.com/bug.php?id=41124
Imagine that process P1 does my_lock to change from write to read lock:
it first unlocks the file (in your patch), and before it can re-lock it
for read, then another process P2 comes and locks the file and reads its
state for example. When P2 is done, P1 can finish its my_lock (get a
read lock) and then unlock it in mi_lock_database(F_UNLCK), and what
worries me is that things happen in that last call: it may write the
state, but this state is out-of-date (P2 has done some changes to the
table) so P2's state is going to be overwritten by P1's old state.
Perhaps a way to test this is to have a test with external-locking,
multiple clients, and a DBUG_EXECUTE_IF sleep between unlock and re-lock
in my_lock, to see if a client going through the window hurts.
You also showed this scenario:
<quote>
create table t1 (a int, b varchar(100)) engine=myisam;
insert into t1 values(1,'life'), (2,'file');
- open and lock for write (exclusive on Windows)
- write (two times)
- unlock
insert into t1 select a*5, concat("A ",b) from t1;
- open and lock for write (exclusive on Windows)
- open second instance and lock for read
(lock suppressed by mi_lock_database: stronger lock exists)
- write (two times)
- unlock first instance (mi_lock_database wants to turn write lock
to read lock and thus adds a shared lock on Windows. It expects
a new lock to replace an old lock)
- unlock second instance (final unlock by mi_lock_database. This
unlocks the latest lock on Windows: the shared lock. The exclusive
lock still persists.)
</quote>
And this confused me: "(lock suppressed by mi_lock_database: stronger
lock exists)": I see in mi_lock_database() that if info->lock_type is
write-lock, and a read-lock is requested, mi_lock_database() does not
suppress that read-lock request, it rather downgrades write-lock to
read-lock.
And there is a comment in mi_lock_database() which says "mysqld does not
turn write locks to read locks", has it missed external-locking?
I wondered if fcntl(), which my_lock.c uses, made this guarantee of
atomicity when replacing an old lock, but "man fcntl" is unclear, and so
is "man lockf".
And I don't see in
http://msdn.microsoft.com/en-us/library/aa365203(VS.85).aspx
a way to atomically convert the old lock to the new lock.
I fear this needs some investigation: does the race window already exist
in my_lock on non-Windows? Is it a bug?
I however understand that this is going out of scope of BUG#38133. So
I'd go with your my_lock() fix, but putting all worrying info as
comments in BUG#41124 (like you started to do, quite well). You don't
have to investigate callers of my_lock() and test the race window if it
would take you significant time, you can leave this as a todo in the bug
report.
> OTOH I'm still having the concern that our test coverage is weak. Facing
> the complexity of the system, we probably can't do much better. But
> anyway, an argument like "it ran without problems for many years"
> doesn't count much. The only test case that uses --external-locking is
> backup_myisam1. And this is 6.0 only. And even this test doesn't run an
> INSERT ... SELECT, which would show the problem. No wonder that the
> problem has lived undetected for years.
>
> ...
>> I had not understood that the bug was affecting the Server too.
>
> Neither did I. My only worry was that my_lock() behaves differently on
> different platforms. I thought that the main task of mysys is to
> compensate differences between platforms.
Right
> mi_lock_database() tells a clear story that it expects from my_lock()
> that a new lock replaces an existing lock. This isn't true on Windows.
Right
> I reported it as Bug#41124 - Server hangs on Windows with
> --external-locking after INSERT...SELECT.
Thank you
> ...
>>> Ok, removed. I trust you that nobody will ever expect
>>> mi_examine_log() to set my_errno on error.
>> I believe no one should expect this, as it's not announced by the
>> Doxygen comment of mi_examine_log(). For me, setting my_errno is part of
>> the API. If it's not documented, it does not have to be...
>
> It's always amazing to hear an experienced MySQL developer say that. I
> have a limited comprehension for Monty's standpoint: "The source code is
> the ultimate documentation". But Doxygen? How many MyISAM functions have
> Doxygen comments? Not even mysys has complete Doxygen docs. At least I
> don't know about them. It's often unclear, what to expect from mysys
> functions. See our problem with my_lock().
Ah wait, you removed a piece of my reply, which was "I looked at the two
only callers of mi_examine_log() (myisamlog.c and
myisam_backup_engine.cc and they don't read my_errno."
That is:
- in code currently nobody reads my_errno after mi_examine_log()
- so practical use of the API does not require it
- and theoretical use of the API (=comments) neither
- so we are free to declare that mi_examine_log() does not need to set
my_errno: doesn't break code and API.
But if you want to set my_errno, I'm fine!
I do dream of a day where all functions would announce in their preamble
comment what they take in input, what they do, what they return in
output, including my_errno (which is a pretty critical variable in some
code like Maria, unset it and things break badly).
>
> ...
>>> At least I need to remove DBUG_ASSERT(!share->temporary). It crashed in
>>> alter_table.test, mix2_myisam.test, temp_table.test, and
>>> rpl_temp_table.test.
>> It's excellent that you tested the idea. So indeed, it proves that
>> F_WRLCK is used for temp table sometimes... Then could you please change
>> this comment:
>> if (lock_type == F_EXTRA_LCK) /* Used by TMP tables */
>> by adding "sometimes" in the comment?
>
> I'll change to "Temporary tables are usually locked with F_EXTRA_LCK.
> But there are cases, where they use other lock types too."
Perfect.
> "Sometimes used by TMP tables" might raise the wrong assumption that it
> is common to lock them with other types. Or that other table types might
> use that lock type too.
>
> BTW, I wonder why _mi_decrement_open_count() needs to lock temporary
> tables at all?
It does not need to lock the file, but who knows, mi_lock_database() is
more than a file lock, maybe it does important things; I'd stay away
from optimizations.
--
Mr. Guilhem Bichot <guilhem@stripped>
Sun Microsystems / MySQL, Lead Software Engineer
Bordeaux, France
www.sun.com / www.mysql.com