Ann W. Harrison wrote:
> Kevin,
>
> I'm concerned about your second fix to bug 41194.
>
> "The previous patch [which delayed switching the transaction
> state to committed] does not work. What happens is that the
> gopher thread immediately starts processing the serial log
> commit record and gets all the way to Transaction::writeComplete
> before the Transaction::commit even gets to the state = committed;
> line. So then it asserts in writeComplete that (state == committed).
>
> "The call to database->commit(this); does not have to occur before
> switching the transaction from activeTransaction list to
> comittedTransactions. But it does need to be called before other
> waiting transactions are signaled that this transaction is
> committed."
>
> Obviously, the assertion you saw in the gopher thread is a problem,
> but I worry that your solution introduces another possible failure
> that will be harder to detect. Consider a transaction A that's
> starting to commit. It moves itself to the committed list.
> Transaction B starts. B has never waited for A, but reads a
> record that A created and takes some action based on that record,
> and commits itself. Then A's call to database->commit(this) fails.
> Unless there's a lock that I haven't seen that keeps B from
> starting or committing while A is between moving itself to the
> committed list and actually writing its commit to the serial
> log, there's a problem.
The lock that prevents action being taken on a record being committed by
transaction A is Transaction::syncIsActive.
Transaction::waitForTransaction() gets a shared lock on it if
Transaction::state == active. So now we have this order
1) synchronously move trans from Active to committed lists
2) Transaction::state = committed
3) Durability - write committed record to serial log.
4) Transaction::syncIsActive.unlock()
There is admittedly a gap between 2 and 4 where another transaction can
take action on a record that is "committed" but not yet durable.
> An alternate solution might be to have the gopher check that
> the transaction either no longer exists or has a state of
> committed before it starts to move changes out of the serial
> log.
Good idea. The gopher thread can get a quick shared lock on
Transaction::syncIsActive before processing it. Since the gopher is in
the background the performance cost is acceptable. And the transaction
in the serial log with writePending == true has a predictable path
bewteen Transactin::state == active to committed. So that wait is
deterministic.
I'll try it.
It is still a good idea to do #3-durability before #4-signal, right?
It was the other way around.