List:Commits« Previous MessageNext Message »
From:Ingo Strüwing Date:May 6 2009 3:59pm
Subject:Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780
View as plain text  
Hi Davi,

Davi Arnaut, 06.05.2009 01:03:

> On 5/5/09 4:33 PM, Ingo Strüwing wrote:
>> Davi Arnaut, 05.05.2009 16:21:
>> ...
>>> On 2/25/09 12:47 PM, Ingo Struewing wrote:
>> ...
>>>>    2779 Ingo Struewing    2009-02-25
>>>>         Bug#37780 - main.kill fails randomly
>>>>         The test case main.kill did not work reliably.
>>> Well, this is not exactly safe or portable either. What actually appears
>>> to do the cancellation tricky on some platforms is the socket shutdown
>>> in vio_close.
>>> Perhaps it's too abstract.. i think we should add a explicit
>>> cancellation operation to Vio to better implement and document the
>>> quirks. What do you think?
>> Please be more verbose. Why is it not exactly safe? If the connection is
>> closed, how can the thread miss it's kill and continue to read more
>> statements? Why is it not portable? If "the socket shutdown in
> You should know this answer. Testing on Linux, for example, what does
> the trick is a combination of shutdown and close. Simply closing won't
> interrupt the thread. On the other hand, just a close causes a interrupt
> on Mac. So, this is definitely not portable, doesn't appear to be widely
> documented... so it all boils down to a best effort.
>> vio_close" is not portable, we shouldn't use vio_close() at all. But how
>> should we then close a connection at session end?
> I said earlier that the ultimate and right solution is using
> notification (through a event fd), but i think we deemed it too complex
> for the scope of this bug fix.
>> Please note that I got rid of the idea to try shutdown of one direction
>> only. I'm not fond of proving that this is portable. But if a full close
>> of a connection isn't portable either, how could MySQL use connections
>> at all?
> I made a simple using a thread that connected to a server and slept
> reading from the socket. Once the thread was sleeping, the main thread
> would call close on the socket. Only with the close, nothing would
> happen and the thread would still wait for i/o. A shutdown followed by a
> close caused the thread to be interrupted.
> It would be nice if you could confirm my results...

Yes, I confirm that a mere close(2) of a socket does not awake another
thread that is already waiting in read(2) on the socket.

If I eliminate sutdown() from vio_close() and thr_alarm_kill() from
THD::awake(), kill.test hangs due to the non-interrupted read(2).

However, I doubt that this test is relevant here. We should distinguish
two cases:

1. connection is waiting in read(2),
2. connection is on its way through the code from the last check of
thd->killed up to that read(2).

THD::awake() uses two mechanisms to awake a connection from its
receiving state:

A. thr_alarm_kill(), which is basically pthread_kill(target_thread,
SIGALRM), which is meant to awake the target thread from 1.
B. close_active_vio(), which is meant to prevent the target thread to
fall asleep if it is in state 2. at the moment.

If we cannot rely on thr_alarm_kill() to awake a thread from read(), or
we cannot rely on close() preventing a thread from falling asleep if it
runs into read(2) after the socket is closed, then we are indeed in trouble.

But then *all* our efforts should have an implicit "best effort" stamp
on them. If we need to expect that operating systems fail on the above,
we also cannot assume that they transport statements and results reliably.

BTW, we have another race condition in awake(). We send the alarm signal
before we close the connection. So it could happen that the target
thread is not in read(2) when the signal comes in, but is already in
read(2) when the connection is closed. Exchanging the operations should
fix it.

> Anyway, this whole thing for me always sounded like a shot at
> interrupting a thread sleeping on a socket operation. I never found a
> clear and portable indication (docs) that this should work at all...

This is a general problem with docs. You can never rely on them:

Many things are not documented. My wife bought a new mobile phone
recently. It came with a manual. It explains in detail, how to send MMS,
listen radio, take pictures and videos, and many, many advanced
features. But it does not even mention that you can do simple phone
calls with it. Let alone, how to do that. Same with simple SMS. There
are always things that are expected to be common knowledge.

Many things are documented, which don't exist.

Many things don't work as they are documented.

Even though one could blame others for improper documentation, there is
no way around working with things in spite of unreliable documentation.

>> ...
>>>> === modified file 'mysql-test/t/kill.test'
>> ...
>>>> @@ -4,329 +4,594 @@
>>>>    # This would work if mysqltest run would be threaded and handle each
>>>>    # connection in a separate thread.
>>>>    #
>>> BTW, is the above remark still true?
>> I haven't read of any change in this area. I'd expect a proper
>> announcement if either the embedded server or mysqltest are changed to
>> run sessions in multiple threads.
> --send on a embedded library linked mysqltest uses threads. It has been
> this way for quite some time.

Well, indeed it seems to be possible to have multiple sessions with an
embedded server. So the description is not accurate. However, running
the test on an embedded server hangs on waiting for the target session
to end. There is no vio connection that can be closed. The user thread
is not in a state where SIGALRM awakes it from. It is not in a wait
condition. So KILL blows out with no effect.

>> ...
>>>> +--echo # Disable concurrent inserts to avoid test failures when
>>>> reading the
>>>> +--echo # connection id which was inserted into a table by another
>>>> thread.
>>>> +SET @old_concurrent_insert= @@global.concurrent_insert;
>>>> +SET @@global.concurrent_insert= 0;
>>> Is this still the case?
>> I haven't read about a change in this area. The server sends ok to the
>> client before it unlocks the tables. That way the client can signal
>> another client to examine the data before the table is unlocked. For
>> MyISAM it means that the other client can read lock the table, where it
>> stores the "currently known" table size, before the server unlocks the
>> table from the concurrent insert, where it updates the "currently known"
>> table size.
> I mean, the test case still inserts the connection id on a table as a
> mean to pass the id between sessions?

Aha. Ok. So your question was if this is still needed? Indeed it's not.
I'd remove it.

>> 1. For me it is much better readable if connection switches are
>> emphasized by indentation. Otherwise the switches don't stand out and I
>> have difficulties to see which actions are done by which connection. I
>> need much more time to read such a test case to be confident that I
>> don't miss actions done by a certain connection.
> OTOH, i find more difficult to follow indentation mixed with switches.
>> 2. We do already have several tests that follow this scheme.
> Really? From a quick glance, I encountered only one. Do other people
> follow the same schema?

Hm. My quick glance produced a lot too much. After going through these
tests, I found just 11, 7 of which are from me. The rest is from the
backup team.

>> 3. Some colleagues find this scheme useful too and use it.
> A large majority of MySQL developers?

No. What do you want to say with the question? Come up with new ideas
only after you convinced a majority of MySQL developers?

>> 4. I'm nor sure what you mean with "code structure" in conjunction with
>> tests. I guess that you mainly mean indentation of continuation lines of
>> long statements. Anyway, in MySQL we indent such things by two columns,
> "code structure" is the demarcation used in programming languages.

Most programming languages that I know, are format free. Coding
guidelines are an agreement between developers, not enforced by the

If you use this term with test cases, do you say that the mysqltest
language is a "programming language"?

And even if so, it contains the concept of executing certain pieces of
code in certain pre-determined threads. I don't see comparable in
(other) programming languages. A different concept may suggest a
different style of indentation.

>> while I (we) use to indent connections by four columns and separate it
>> by an empty line. That way continuation lines can easily be
>> distinguished from connection changes on each level.
> BTW, where is this indentation schema documented? Now i got curious how
> things are indented if there are ifs, loops, etc.

As mentioned already there is no style documentation for test cases.

>> 5. Test case coding style
>> ( does not comment
>> about indentation at all.
> No comment at all? I think you missed the heading "1.1.1 Coding style".

I don't see any discussion about indentation in there. There is just a
pointer to the C/C++ guidelines "when they make sense". And there too
isn't a rule, how to treat dedication of pieces of code to
pre-determined threads.

>> So this style is neither demanded nor
>> forbidden. I don't think, we need to change that.

But still it remains a matter of taste. The goal is to make the tests as
readable a s possible. As we noticed, you and me differ gravely in our
perception of what is better readable. We won't find an agreement in
this. So I suggest to let the second reviewer have the final vote.

>>> Nonetheless, I think we would be better off with a sync point that would
>>> work sometimes, compared to a fixed 2 second sleep.
>> How would you make it into a stable test, in cases where the sync point
>> doesn't work?
> I was under the impression that the sync point would always work. What
> we would sometimes miss testing is the "interrupt thread sleeping" part.

Well, indeed. The output of the test should be the same. But we could
never be sure that we do indeed test that part on every platform at
least sometimes.

>> What we could do, however, is to use both. Send a signal at the very
>> last reasonable moment before going into read(2)/recv(2) and then sleep
>> two seconds after receiving the signal.
> I still don't like the two seconds sleep part.

I understand that. But please be more definite if you want me to get rid
of it.


So, after all, I wonder what is left from my patch?

1. Checking thd->killed after enter_cond() doesn't help as we have no
guarantee that we will see the latest change of this variable in another

2. Closing the connection doesn't help as we have no guarantee that this
will prevent another thread from blocking in read(2).

3. It is not important that we test if we can interrupt read(2) on all

4. Re-formatting of kill.test is not appreciated.

Under these circumstances I don't see a value in this patch any more. I
draw back my proposal. The remaining question is if to reassign to
someone else, to declare it "not a bug: KILL is unreliable by design",
or "won't fix"?

Ingo Strüwing, Database Group
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Geschäftsführer: Thomas Schröder,   Wolfgang Engels,   Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Häring   HRB München 161028
bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Ingo Struewing25 Feb
  • Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Davi Arnaut5 May
    • Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Ingo Strüwing5 May
      • Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Konstantin Osipov5 May
        • Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Ingo Strüwing6 May
          • Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Konstantin Osipov6 May
            • Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Ingo Strüwing6 May
              • Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Konstantin Osipov6 May
                • Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Ingo Strüwing7 May
                  • Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Ingo Strüwing12 May
              • Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Davi Arnaut6 May
      • Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Davi Arnaut6 May
        • Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Ingo Strüwing6 May
          • Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Konstantin Osipov6 May
            • Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Ingo Strüwing7 May
          • Re: bzr commit into mysql-6.0 branch (ingo.struewing:2779) Bug#37780Davi Arnaut7 May