List:Falcon Storage Engine« Previous MessageNext Message »
From:Olav Sandstaa Date:December 9 2008 11:34pm
Subject:Re: New dependency manager - optimization for read-only transactions
View as plain text  
Hi Jim,

Thanks for your comments and suggestions. See inline for my respons:

Jim Starkey wrote:
> Olav Sandstaa wrote:
>>
>> 2. Introduce CAS allocation and batch-allocation of transaction objects:
>>
>>    -This patch reduces the number of times an exclusive lock is 
>> needed on the active transaction list when allocating a transaction 
>> object. The allocation is now done by searching the list for an 
>> "available" (free) transaction object. This is done when only having 
>> a shared lock on the list and the allocation is done by using a CAS 
>> (compare-and-swap) operation.
>>
>>    -If not available transaction is found in the active list, an 
>> exclusive lock is set and a new transaction object is allocated. To 
>> increase the likelihood for the next thread searching for an 
>> available transaction, we allocates ten extra free transaction 
>> objects since we anyway have the exclusive lock on the list.
>>
>>    -The "drawback" about introducing this is that the active 
>> transaction list no longer is guaranteed to be sorted by 
>> transactionId. This complicates the test for determining when a 
>> transaction object can be purged from the committed transaction list. 
>> Until this introduction it was enough to compare the transaction's 
>> commitId (endEvent) with the transactionId (startEvent) of the first 
>> transaction in the active list. After this change we have to search 
>> the active transaction list in order to find the oldest active 
>> transaction (and do this while other threads are doing CAS on it is a 
>> bit, well..  (Yes, Jim, I agree with you about non-blocking, 
>> concurrent data structures....) (and I am actually considering to put 
>> an exclusive lock on it while searching for the oldest transaction... 
>> .- and I am considering to do purging less frequently - but that 
>> might be a topic of another discussion...)
>>
>>    -The saving of this patch is that we in most cases is able to 
>> allocate a transaction object without an exclusive lock on the active 
>> transaction list.
>>
>>    - The patch is available here:
>>
>>        http://lists.mysql.com/commits/61009
> Olav, have you consider pre-assigning transaction ids to the extra 
> transactions added to the list?  That would keep the transaction ids 
> in order and eliminate the hassle about the oldest active transaction.

I am not sure if that would work correctly, at least not after we 
"joined" the transactionId and the startEvent as the same number. We are 
using the transactionId in "visibility" checks like this example:

    // If the other transaction committed after we started then it 
should not
    // be visible to us

    if (transaction->commitId > transactionId)
        return false;

By preallocating transactionId we could/would possibly change the 
visibility of transactions: two following transactions on the same 
connection where the second transaction would not see the changes done 
by the previous transacation....
(this is after midnight so I might not think clearly).

>
> If you were to add cycle locking, you could let the cycle manager 
> remove completed, no-commit transactions from the middle of the active 
> transaction list and move them to the end of the list as available 
> transactions.  (If you added two cycle managers, you could have 
> bicycles or maybe outboard motors)

:-)

Could be a good alternative but I am not quite sure how this would work. 
Do you have an example or explanation of cycle locking ? (if it was not 
this late I could probably have found an example myself).

>>
>>
>> 3. Re-use read-only transaction objects.
>>
>>    -This patch changes Transaction::commitNoUpdates() to not delete 
>> the transaction objects but let them stay in the active transaction 
>> list marked as "available" (so they can be picked up and re-used by 
>> the code used in patch 2). Basically, commitNoUpdates() are back as 
>> it where when I started on this rewrite.
>>
>>    -The advantage of this is that we now are able to run read-only 
>> transaction without having to set exclusive locks on the active 
>> transaction list neither at start of the transaction nor at commit. 
>> As an example: for each of the patches I have made for the new 
>> dependency manager I have run a ten hour stress test. Before this 
>> latest patch this test allocates about 225 million transaction 
>> objects. With the latest patch it allocates in total about 140 
>> transaction objects (yes, it is a read-only test).
>>
>>    -The patch is ready, but not yet committed.
>>
>> The effect of these patches are shown in the attached graph for a 
>> "singel-record select test" (yes, I know it is not representative but 
>> I selected this test because it is simple and a test where the 
>> dependency manager's allocation of transaction objects is really 
>> stressed and it will have huge impact on performance):
>>
>>   -Red line: performance of old transaction manager
>>   -Green and blue: new implementation without any optimizations for 
>> read-only transactions
>>   -Purple: patch 1 above: shorter code path, no exclusive lock on 
>> committed list
>>   -Cyan: patch 2 above: batch allocation of transaction objects and 
>> CAS allocation
>>   -Brown:  patch 3 above: reuse of transaction objects.
>>
>> So far so good, and hadn't it been for this morning a single test 
>> failed on a Windows machine that is most likely related to changes in 
>> the dependency manager, well time for debugging..... (more info about 
>> this is a separate email)...
>>
>> Any questions and comments are as usual welcome (as long as they do 
>> not involve sorting of lists using CAS....)
>>
> Leaving committed no-update transactions in the active transaction 
> list for a cycle manager to move to the end is simple, fast, elegant, 
> and subversive.
>


I will consider it. But I do not think the current implementation is too 
bad either. We only need to find "the oldest active" when we want to 
purge old committed transactions. This we do in two cases in the new 
implementation:

1. For each commit of an update transasaction

2. Every 30 second (or so) by the scavenger.

Only one of these would stricly be necessary (I think). The impact of 
having to read the active list is largest for the purge done by 
commit(). To reduce the impact of it one simple way would be to not do 
this on every commit but on only every N commit.

But I would like to learn how a cycle manager would work and consider 
implement it.

Olav
Thread
New dependency manager - optimization for read-only transactionsOlav Sandstaa9 Dec
  • Re: New dependency manager - optimization for read-only transactionsOlav Sandstaa9 Dec
  • Re: New dependency manager - optimization for read-only transactionsJim Starkey9 Dec
    • Re: New dependency manager - optimization for read-only transactionsOlav Sandstaa10 Dec
      • Re: New dependency manager - optimization for read-only transactionsKevin Lewis10 Dec