Hi!
Jørgen Løland wrote:
> (...)
>
> Hmm... an interesting suggestion, but are you sure it works to set sync
> points in a connection that is already processing an operation? Like:
>
> send BACKUP ...
> ...
> SET DEBUG_SYNC ...
> ...
> reap;
>
I'm sure it does not work (and will crash the server most probably). I hope that
in my example I don't do things like this - I switch to 'default' connection to
set additional sync points. If I did set a sync point after "send ..." then that
was unintended mistake.
> Personally, if the tests get complex, I like starting as many clients as
> there are connections, and cut'n paste the test script into the
> different windows. That does not work with the suggested script, but
> maybe it is easier to read.
>
I think it can still work - you just need one additional client to do the
"orchestrating" which in the test is done from the default connection.
> I suggest to do the following:
>
> 1. Add better comments to all operations with sync points, and what they
> are supposed to trigger.
> 2. Indent 2 spaces for each connection
> 3. See if any connection can be removed.
>
I would add to these:
4. Not using "cascading" sync points, i.e., some sync point activates another
sync point when hit. Instead, always send signals explicitly with "SET
DEBUG_SYNC= 'now SIGNAL signal'" as I did in my example.
> If you disagree, and Chuck agrees with you, I'll do as I'm told to :)
>
I will not put my suggestions as requirement, so the last word is yours :)
>> 1. How do we test that things are really blocked by each other.
>> Waiting for reaching given sync point is not a good method.
>
> Why is it not good to wait for sync? I need some suggestion for how to
> do this test if this is a bad idea.
Ok, it depends on the sync points used. For many of them, they will be
eventually reached regardless of whether the statement was blocked or not.
But, examining the code (lock.cc:1130), I can see that
'wait_for_global_read_lock' is placed so that it is hit only if we are going to
wait. At least that is the intention. In practice it will not always work,
because protect_against_glbal_read_lock is examined outside enter_cond() and
thus we can hit the sync point and then, moment later, not wait for anything (a
typical race condition).
One could try to test, e.g., that ongoing INSERT is blocked, by checking that no
data has been inserted to the corresponding table.
>> However, my main concern is that I think this test does not test what
>> it should. I raised this issue few times before but so far no one
>> seems to understand me.
>>
>> I have elaborated on this in an email thread with subject "Commit
>> blocker testing" cc-ed to dev-backup and taking place months
>> 07-08/2008. In short:
>>
>> The task of commit blocker is to block ongoing COMMITS. It also
>> ensures that VP is not created while COMMITs are running. The basic
>> properties:
>>
>> a) when activated, COMMITs can do not commit transactions but wait for
>> its deactivation.
>> b) if there are ongoing COMMITs, activating commit blocker will wait
>> for them to finish and at the same time prevent new COMMITs from
>> happening.
>>
>> I don't see these things being tested in this test (and I see a lot of
>> other things tested, which in my opinion are not that much related to
>> the commit blocker itself).
>>
>> Thus my request would be to:
>>
>> 1. Either change backup_commit_blocker test so that it test the above
>> basic properties of the commit blocker. If doing this, I suggest to
>> simplify it by not testing anything else (leaving it for other tests).
>>
>> 2. Or, rename and redefine the purpose of this test. As it is now, it
>> does not test the commit blocker but rather tests the property that
>> backup image should
>> contain only committed data and only finished non-trx statements. In
>> that case make sure that testing the commit blocker is on our roadmap
>> (WL#4135 most probably - but not sure if the above properties are
>> tested in this WL either).
>
> Regarding commits, I'll see what I can do about scenarios a) and b) above.
>
> However, I think it makes sense to test both non-tx and tx engines in
> the same test file. This test file should test that everything committed
> [1] before VP is restored, but nothing else.
>
Ok. I still want to emphasize that this test should focus on testing the commit
blocker functionality. I would consider testing that backup image contains only
committed data changes as a separate task. Although, I do realise that checking
that only committed data changes are included is one of the ways of testing the
commit blocker.
Anyways, I think we all will be lest confused if we have two separate tests:
1. backup_consistency - basically, does what backup_commit_blocker does now,
i.e. checks that only committed data is included in the image (for trx and
non-trx engines). Alternative name could be backup_validity_point, as another
way of looking at this test is that it checks that VP is what it is supposed to be.
2. backup_commit_blocker - tests blocking of COMMITS by the commit blocker.
In this scenario, I envision that backup_commit_blocker test would be quite
different and way simpler than the current one.
> [1] Including non-tx DMLs; I think of these as operations with implicit
> commits. Maybe that's wrong.
I try to think in a similar way, being aware that things inside the server are
often more complicated and counter-intuitive than we would expect.
Rafal