From: Kevin Lewis Date: December 17 2008 6:05am Subject: Re: Please review Bug#41194 - Falcon durability should occur earlier in the commit process List-Archive: http://lists.mysql.com/falcon/318 Message-Id: <494896C7.3050407@sun.com> MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset=ISO-8859-1 Content-Transfer-Encoding: 7BIT Here is the way the timing of a commit works with this change Client1 Gopher ============================== =========================== Transaction::commit call Database::commit(this) call SRLCommit::append() call log->flush <= wait for logger I/O, group commit wakeup a gopher Starts processing the log Transaction::fullyCommitted while isActive(), wait on syncIsActive max 1 sec move from active list to committed, serialized state = Committed signal waiting threads, syncIsActive.unlock() I have a while loop because either of these could happen; 1) The client waits for more than a second to get the exclusive lock on syncActiveTransactions or syncCommittedTransactions. In this case, the gopher should keep waiting since the transaction is still committed. 2) The gopher thread sometimes waits for more time than however long I set as the timeout. Originally, I used waitForTransaction which uses 50 seconds by default. When it waits this long and times out, I see that the transaction state is Committed and the commit thread is no longer around. Usually, running falcon_bug_22173a, there will be a couple more threads in waitForWriteComplete(). So I don't understand why the sync.lock(Shared, 1000); does not get the lock. If I try it again right after it times out, it does get the lock. I assume that the state == committed so isActive is true, then the gopher calls sync->lock() at the same time that the client thread calls syncIsActive->unlock() and the gopher is left waiting having missed the signal. We used to have a problem with this, but it was fixed last spring in Bug#34890. At least I thought it was... Read the comment on 28 Mar 2008. So the 1 second wait allows the gophers to keep busy when this missed signal happens. I am still investigating this apparent missed signal. I could either try to fix it in this bug or open a new bug. It may be an existing issue, which would affect performance because it would cause an incorrect 50 second lockWaitTimout periodically. This patch may actually increase the likelihood that a lock and unlock are called at about the same time for syncIsActive. Kevin Vladislav Vaintroub wrote: > Kevin , I do not fully understand the loop, or more exactly the locking > inside > + while (isActive()) >> + { >> + try >> + { >> + Sync sync(&syncIsActive, >> "Transaction::fullyCommitted"); >> + sync.lock(Shared, 1000); >> + } >> + catch (...) >> + { >> + } >> + } > > Is it polling for the isActive() to change to committed? > Why not just sleep() if so? > > >> -----Original Message----- >> From: Kevin.Lewis@stripped [mailto:Kevin.Lewis@stripped] >> Sent: Wednesday, December 17, 2008 12:12 AM >> To: Ann.Harrison@stripped; Vladislav Vaintroub; Kelly Long >> Cc: FalconDev >> Subject: Please review Bug#41194 - Falcon durability should occur >> earlier in the commit process >> >> Vlad, Ann, Kelly and Jim, >> >> Please take a look at this change to move the point of durability up >> higher in the commit process. I also needed to make the gopher thread >> wait for the transaction. Many bad things would happen when a >> transaction is completed before it is fully committed. >> >> Kevin >> >> >> >> Bug#41194 Falcon durability should occur earlier in the commit process. >> >> http://lists.mysql.com/commits/61832 >> >> 2938 Kevin Lewis 2008-12-16 >> Bug#41194 - Move point of durability up higher in the commit >> so that by the time other waiting threads are signaled that >> this transaction is committed, it will already be durable. >> >> modified: >> storage/falcon/Transaction.cpp >> >> per-file messages: >> storage/falcon/Transaction.cpp >> Bug#41194 - Move point of durability up higher in the commit >> so that by the time other waiting threads are signalled that >> this transaction is committed, it will already be durable. >> === modified file 'storage/falcon/Transaction.cpp' >> --- a/storage/falcon/Transaction.cpp 2008-12-16 20:40:38 +0000 >> +++ b/storage/falcon/Transaction.cpp 2008-12-16 22:48:55 +0000 >> @@ -266,6 +266,11 @@ void Transaction::commit() >> >> database->flushInversion(this); >> >> + // Write the commit message to the serial log for durability. >> + // If a crash happens after this, the recover will commit. >> + >> + database->commit(this); >> + >> // Transfer transaction from active list to committed list, set >> committed state >> >> Sync >> syncActiveTransactions(&transactionManager- >>> activeTransactions.syncObject, >> "Transaction::commit(2)"); >> @@ -294,8 +299,6 @@ void Transaction::commit() >> >> syncIsActive.unlock(); // signal waiting transactions >> >> - database->commit(this); >> - >> delete [] xid; >> xid = NULL; >> xidLength = 0; >> @@ -1413,12 +1416,30 @@ void Transaction::getInfo(InfoTable* inf >> } >> } >> >> +// Called by the gopher thread to complete this transaction >> + >> void Transaction::fullyCommitted(void) >> { >> ASSERT(inList); >> >> if (useCount < 2) >> - Log::debug("Transaction::fullyCommitted: funny use >> count\n"); >> + Log::debug("Transaction::fullyCommitted: Unusual use >> count=%d\n", >> useCount); >> + >> + // The commit record is flushed to the serial log before the >> transaction >> + // is fully committed and waiting threads are signalled. This >> gopher may >> + // have picked up that record too soon, so wait for that >> transaction. >> + >> + while (isActive()) >> + { >> + try >> + { >> + Sync sync(&syncIsActive, >> "Transaction::fullyCommitted"); >> + sync.lock(Shared, 1000); >> + } >> + catch (...) >> + { >> + } >> + } >> >> writeComplete(); >> releaseCommittedTransaction(); >> >> >> >> -- >> Falcon Storage Engine Mailing List >> For list archives: http://lists.mysql.com/falcon >> To unsubscribe: http://lists.mysql.com/falcon?unsub=wlad@stripped > > >