List:Falcon Storage Engine« Previous MessageNext Message »
From:Kevin Lewis Date:December 10 2008 4:36am
Subject:Re: New dependency manager - optimization for read-only transactions
View as plain text  

We will have time after Beta to consider implementing cycle managers for 
record verions, transactions, and whatever else.  But that is purely a 
performance improvement.  I have added the descriptions that I have in 
email form from Jim to the new Future section of the Falcon home page.

I think it is a great idea which will helpt our high concurrency 
response time and even throughput.  But let's do it after Beta.


Olav Sandstaa wrote:
> Hi Jim,
> Thanks for your comments and suggestions. See inline for my respons:
> Jim Starkey wrote:
>> Olav Sandstaa wrote:
>>> 2. Introduce CAS allocation and batch-allocation of transaction objects:
>>>    -This patch reduces the number of times an exclusive lock is 
>>> needed on the active transaction list when allocating a transaction 
>>> object. The allocation is now done by searching the list for an 
>>> "available" (free) transaction object. This is done when only having 
>>> a shared lock on the list and the allocation is done by using a CAS 
>>> (compare-and-swap) operation.
>>>    -If not available transaction is found in the active list, an 
>>> exclusive lock is set and a new transaction object is allocated. To 
>>> increase the likelihood for the next thread searching for an 
>>> available transaction, we allocates ten extra free transaction 
>>> objects since we anyway have the exclusive lock on the list.
>>>    -The "drawback" about introducing this is that the active 
>>> transaction list no longer is guaranteed to be sorted by 
>>> transactionId. This complicates the test for determining when a 
>>> transaction object can be purged from the committed transaction list. 
>>> Until this introduction it was enough to compare the transaction's 
>>> commitId (endEvent) with the transactionId (startEvent) of the first 
>>> transaction in the active list. After this change we have to search 
>>> the active transaction list in order to find the oldest active 
>>> transaction (and do this while other threads are doing CAS on it is a 
>>> bit, well..  (Yes, Jim, I agree with you about non-blocking, 
>>> concurrent data structures....) (and I am actually considering to put 
>>> an exclusive lock on it while searching for the oldest transaction... 
>>> .- and I am considering to do purging less frequently - but that 
>>> might be a topic of another discussion...)
>>>    -The saving of this patch is that we in most cases is able to 
>>> allocate a transaction object without an exclusive lock on the active 
>>> transaction list.
>>>    - The patch is available here:
>> Olav, have you consider pre-assigning transaction ids to the extra 
>> transactions added to the list?  That would keep the transaction ids 
>> in order and eliminate the hassle about the oldest active transaction.
> I am not sure if that would work correctly, at least not after we 
> "joined" the transactionId and the startEvent as the same number. We are 
> using the transactionId in "visibility" checks like this example:
>    // If the other transaction committed after we started then it should 
> not
>    // be visible to us
>    if (transaction->commitId > transactionId)
>        return false;
> By preallocating transactionId we could/would possibly change the 
> visibility of transactions: two following transactions on the same 
> connection where the second transaction would not see the changes done 
> by the previous transacation....
> (this is after midnight so I might not think clearly).
>> If you were to add cycle locking, you could let the cycle manager 
>> remove completed, no-commit transactions from the middle of the active 
>> transaction list and move them to the end of the list as available 
>> transactions.  (If you added two cycle managers, you could have 
>> bicycles or maybe outboard motors)
> :-)
> Could be a good alternative but I am not quite sure how this would work. 
> Do you have an example or explanation of cycle locking ? (if it was not 
> this late I could probably have found an example myself).
>>> 3. Re-use read-only transaction objects.
>>>    -This patch changes Transaction::commitNoUpdates() to not delete 
>>> the transaction objects but let them stay in the active transaction 
>>> list marked as "available" (so they can be picked up and re-used by 
>>> the code used in patch 2). Basically, commitNoUpdates() are back as 
>>> it where when I started on this rewrite.
>>>    -The advantage of this is that we now are able to run read-only 
>>> transaction without having to set exclusive locks on the active 
>>> transaction list neither at start of the transaction nor at commit. 
>>> As an example: for each of the patches I have made for the new 
>>> dependency manager I have run a ten hour stress test. Before this 
>>> latest patch this test allocates about 225 million transaction 
>>> objects. With the latest patch it allocates in total about 140 
>>> transaction objects (yes, it is a read-only test).
>>>    -The patch is ready, but not yet committed.
>>> The effect of these patches are shown in the attached graph for a 
>>> "singel-record select test" (yes, I know it is not representative but 
>>> I selected this test because it is simple and a test where the 
>>> dependency manager's allocation of transaction objects is really 
>>> stressed and it will have huge impact on performance):
>>>   -Red line: performance of old transaction manager
>>>   -Green and blue: new implementation without any optimizations for 
>>> read-only transactions
>>>   -Purple: patch 1 above: shorter code path, no exclusive lock on 
>>> committed list
>>>   -Cyan: patch 2 above: batch allocation of transaction objects and 
>>> CAS allocation
>>>   -Brown:  patch 3 above: reuse of transaction objects.
>>> So far so good, and hadn't it been for this morning a single test 
>>> failed on a Windows machine that is most likely related to changes in 
>>> the dependency manager, well time for debugging..... (more info about 
>>> this is a separate email)...
>>> Any questions and comments are as usual welcome (as long as they do 
>>> not involve sorting of lists using CAS....)
>> Leaving committed no-update transactions in the active transaction 
>> list for a cycle manager to move to the end is simple, fast, elegant, 
>> and subversive.
> I will consider it. But I do not think the current implementation is too 
> bad either. We only need to find "the oldest active" when we want to 
> purge old committed transactions. This we do in two cases in the new 
> implementation:
> 1. For each commit of an update transasaction
> 2. Every 30 second (or so) by the scavenger.
> Only one of these would stricly be necessary (I think). The impact of 
> having to read the active list is largest for the purge done by 
> commit(). To reduce the impact of it one simple way would be to not do 
> this on every commit but on only every N commit.
> But I would like to learn how a cycle manager would work and consider 
> implement it.
> Olav
New dependency manager - optimization for read-only transactionsOlav Sandstaa9 Dec
  • Re: New dependency manager - optimization for read-only transactionsOlav Sandstaa9 Dec
  • Re: New dependency manager - optimization for read-only transactionsJim Starkey9 Dec
    • Re: New dependency manager - optimization for read-only transactionsOlav Sandstaa10 Dec
      • Re: New dependency manager - optimization for read-only transactionsKevin Lewis10 Dec