List:Falcon Storage Engine« Previous MessageNext Message »
From:Kevin Lewis Date:December 17 2008 6:05am
Subject:Re: Please review Bug#41194 - Falcon durability should occur earlier
in the commit process
View as plain text  
Here is the way the timing of a commit works with this change

Client1                         Gopher
==============================  ===========================
Transaction::commit
   call Database::commit(this)
     call SRLCommit::append()
       call log->flush  <= wait for logger I/O, group commit
       wakeup a gopher
                                 Starts processing the log
                                 Transaction::fullyCommitted
                                 while isActive(),
                                   wait on syncIsActive max 1 sec

     move from active list to committed, serialized
     state = Committed
     signal waiting threads, syncIsActive.unlock()


I have a while loop because either of these could happen;

1) The client waits for more than a second to get the exclusive lock on 
syncActiveTransactions or syncCommittedTransactions.  In this case, the 
gopher should keep waiting since the transaction is still committed.

2) The gopher thread sometimes waits for more time than however long I 
set as the timeout.  Originally, I used waitForTransaction which uses 50 
seconds by default.  When it waits this long and times out, I see that 
the transaction state is Committed and the commit thread is no longer 
around.  Usually, running falcon_bug_22173a, there will be a couple more 
threads in waitForWriteComplete().  So I don't understand why the 
sync.lock(Shared, 1000); does not get the lock.  If I try it again right 
after it times out, it does get the lock.  I assume that the state == 
committed so isActive is true, then the gopher calls sync->lock() at the 
same time that the client thread calls syncIsActive->unlock() and the 
gopher is left waiting having missed the signal.  We used to have a 
problem with this, but it was fixed last spring in Bug#34890.  At least 
I thought it was...  Read the comment on 28 Mar 2008.

So the 1 second wait allows the gophers to keep busy when this missed 
signal happens.  I am still investigating this apparent missed signal. 
I could either try to fix it in this bug or open a new bug.  It may be 
an existing issue, which would affect performance because it would cause 
an incorrect 50 second lockWaitTimout periodically.  This patch may 
actually increase the likelihood that a lock and unlock are called at 
about the same time for syncIsActive.

Kevin




Vladislav Vaintroub wrote:
> Kevin , I do not fully understand the loop, or more exactly the locking
> inside
> +	while (isActive())
>> +		{
>> +		try
>> +			{
>> +			Sync sync(&syncIsActive,
>> "Transaction::fullyCommitted");
>> +			sync.lock(Shared, 1000);
>> +			}
>> +		catch  (...)
>> +			{
>> +			}
>> +		}
> 
> Is it polling for the  isActive() to change to committed?
> Why not just sleep() if so?
> 
> 
>> -----Original Message-----
>> From: Kevin.Lewis@stripped [mailto:Kevin.Lewis@stripped]
>> Sent: Wednesday, December 17, 2008 12:12 AM
>> To: Ann.Harrison@stripped; Vladislav Vaintroub; Kelly Long
>> Cc: FalconDev
>> Subject: Please review Bug#41194 - Falcon durability should occur
>> earlier in the commit process
>>
>> Vlad, Ann, Kelly and Jim,
>>
>> Please take a look at this change to move the point of durability up
>> higher in the commit process.  I also needed to make the gopher thread
>> wait for the transaction.  Many bad things would happen when a
>> transaction is completed before it is fully committed.
>>
>> Kevin
>>
>>
>>
>> Bug#41194 Falcon durability should occur earlier in the commit process.
>>
>>    http://lists.mysql.com/commits/61832
>>
>> 2938 Kevin Lewis	2008-12-16
>>        Bug#41194 - Move point of durability up higher in the commit
>>        so that by the time other waiting threads are signaled that
>>        this transaction is committed, it will already be durable.
>>
>> modified:
>>    storage/falcon/Transaction.cpp
>>
>> per-file messages:
>>    storage/falcon/Transaction.cpp
>>      Bug#41194 - Move point of durability up higher in the commit
>>      so that by the time other waiting threads are signalled that
>>      this transaction is committed, it will already be durable.
>> === modified file 'storage/falcon/Transaction.cpp'
>> --- a/storage/falcon/Transaction.cpp	2008-12-16 20:40:38 +0000
>> +++ b/storage/falcon/Transaction.cpp	2008-12-16 22:48:55 +0000
>> @@ -266,6 +266,11 @@ void Transaction::commit()
>>
>>   	database->flushInversion(this);
>>
>> +	// Write the commit message to the serial log for durability.
>> +	// If a crash happens after this, the recover will commit.
>> +
>> +	database->commit(this);
>> +
>>   	// Transfer transaction from active list to committed list, set
>> committed state
>>
>>   	Sync
>> syncActiveTransactions(&transactionManager-
>>> activeTransactions.syncObject,
>> "Transaction::commit(2)");
>> @@ -294,8 +299,6 @@ void Transaction::commit()
>>
>>   	syncIsActive.unlock(); // signal waiting transactions
>>
>> -	database->commit(this);
>> -
>>   	delete [] xid;
>>   	xid = NULL;
>>   	xidLength = 0;
>> @@ -1413,12 +1416,30 @@ void Transaction::getInfo(InfoTable* inf
>>   		}
>>   }
>>
>> +// Called by the gopher thread to complete this transaction
>> +
>>   void Transaction::fullyCommitted(void)
>>   {
>>   	ASSERT(inList);
>>
>>   	if (useCount < 2)
>> -		Log::debug("Transaction::fullyCommitted: funny use
>> count\n");
>> +		Log::debug("Transaction::fullyCommitted: Unusual use
>> count=%d\n",
>> useCount);
>> +
>> +	// The commit record is flushed to the serial log before the
>> transaction
>> +	// is fully committed and waiting threads are signalled.  This
>> gopher may
>> +	// have picked up that record too soon, so wait for that
>> transaction.
>> +
>> +	while (isActive())
>> +		{
>> +		try
>> +			{
>> +			Sync sync(&syncIsActive,
>> "Transaction::fullyCommitted");
>> +			sync.lock(Shared, 1000);
>> +			}
>> +		catch  (...)
>> +			{
>> +			}
>> +		}
>>
>>   	writeComplete();
>>   	releaseCommittedTransaction();
>>
>>
>>
>> --
>> Falcon Storage Engine Mailing List
>> For list archives: http://lists.mysql.com/falcon
>> To unsubscribe:    http://lists.mysql.com/falcon?unsub=1
> 
> 
> 
Thread
Please review Bug#41194 - Falcon durability should occur earlier inthe commit processKevin Lewis17 Dec
  • RE: Please review Bug#41194 - Falcon durability should occur earlierin the commit processVladislav Vaintroub17 Dec
    • Re: Please review Bug#41194 - Falcon durability should occur earlierin the commit processKevin Lewis17 Dec