List:Falcon Storage Engine« Previous MessageNext Message »
From:Jim Starkey Date:January 21 2009 3:53pm
Subject:Re: When you have a chance...
View as plain text  
Kevin, let's rethink this.  In the original design, this wasn't an issue 
because the only transaction that would look at the data was in the same 
thread as the chill, so no race condition could occur.  Did I miss 
something, or did something change that made this assumption unwarranted?


Kevin Lewis wrote:
> Jim,
>
> Cycle locking addresses the continued existence of an object while 
> there may be pointers to it.  The problem we have here is that there 
> is a pointer within the record to a data buffer.  That pointer can go 
> away at any time, even an instant after a call to hasData().  It takes 
> two steps to use the data buffer;  1) Ask whether the record is 
> chilled or hasData(),  2) Use the data buffer.
>
> We need to surround these with some kind of SyncObject.  Otherwise, 
> there is no guarantee that a data buffer will be available, even if 
> you check for its existance first.  Cycle locking protects objects tat 
> threads have pointers to.  We need to protect the pointer.
>
> Kevin
>
> Jim Starkey wrote:
>> I suggest you consider cycle locking rather than a syncPrior lock.  
>> The basic idea is this:
>>
>>   1. A thread takes a shared lock on the cycle lock for the duration of
>>      an engine operation.
>>   2. Intermediate and trailing records are removed from record chains
>>      with an interlocked CAS.  The removed records are insert, again
>>      with an interocked CAS, into a "record purgatory".
>>   3. After a suitable period of waiting, the cycle manager flips cycle
>>      locks, waits for an exclusive lock on the previous cycle lock,
>>      then deletes records in record purgatory.
>>
>> In effect, this extends the lifetime of otherwise deleted records to 
>> the duration of the cycle.  When the cycle is complete (cycle manager 
>> gets an exclusive lock on the cycle's lock), there will be no 
>> pointers to the deleted records, and they can be safely deleted.  In 
>> the interim, however, "deleted" record's prior pointer and state 
>> remain valid.
>>
>> This gets rid of syncPrior and a lot addRef/releases.
>>
>> A skeleton for a cycle manager is something like this:
>>
>>    void XXX::cycleManager(void)
>>    {
>>        Thread *thread = Thread::getThread("XXX::cycleManager");
>>              while (!shutdownServer)
>>            {
>>            thread->sleep(CYCLE_SLEEP * 1000);
>>            SyncObject *prior = currentCycle;
>>            currentCycle = (prior == &cycle1) ? &cycle2 : &cycle1;
>>            ++cycleNumber;
>>            Sync sync(prior, "XXX::cycleManager");
>>            sync.lock(Exclusive);
>>            sync.unlock();
>>                          agingCycle(cycleType);      // this is where 
>> post cycle work
>>    gets done
>>                      }
>>                  cycleThread = NULL;
>>    }
>>
>> Kevin Lewis wrote:
>>> Chris,
>>>
>>> I thought a lot about this solution the last couple of days.  Using 
>>> a timeout as a standard expected event in handling a concurrency 
>>> issue is not the best approach, I'm sure Jim would agree.
>>>
>>> But there are extenuating circumstances.  You have shown that we 
>>> need to protect other threads that call hasData() from calling that 
>>> concurrently with an active chill thread because it could be there 
>>> in one instant and not the next. The act of chilling a record is 
>>> very intrusive and dangerous without a record mutex.  But using a 
>>> record mutex is against Jim's MVCC philosophy for many reasons which 
>>> need not be listed here. We just can't do it.  So testing and using 
>>> the syncPrior as an alternative to the record mutex is OK, I think.  
>>> Especially since there is no hard requirement to chill any 
>>> particular record.
>>>
>>> Since the action on the record is not  required, and chilling should 
>>> not happen very often, I think a timeout and move-on approach is 
>>> acceptable here.
>>>
>>> Especially since I cannot think of a better approach! :)
>>>
>>> Jim, this is your invitation to spark a better idea, or think of a 
>>> better approach.  Note that this use of syncPrior during a chill 
>>> actually works, even though it is not pretty.
>>>
>>> Kevin
>>>
>>> Christopher Powers wrote:
>>>> ...call me today 612-729-1519 or my cell, 612-616-1069. I'd like to 
>>>> discuss the changes I did for garbage collect.
>>>>
>>>> I pushed the fixes to mysql-6.0-falcon-chris on PB2, and *both* 
>>>> falcon_online_alter and falcon_chill_thaw passed. I find this to be 
>>>> very encouraging.
>>>>
>>>> Using syncPrior in chill was not straightforward solution because 
>>>> of a deadlock with a low-level thaw, but what I have works.
>>>>
>>>
>>> ------------------------------------------------------------------------ 
>>>
>>>
>>> Subject:
>>> bzr commit into mysql-6.0-falcon-team branch 
>>> (christopher.powers:2964) Bug#42131
>>> From:
>>> Christopher Powers <Christopher.Powers@stripped>
>>> Date:
>>> Sat, 17 Jan 2009 02:23:04 -0600 (CST)
>>> To:
>>> commits@stripped
>>>
>>> To:
>>> commits@stripped
>>>
>>>
>>> #At file:///home/cpowers/work/dev/dev-03/mysql/
>>>
>>>  2964 Christopher Powers    2009-01-17
>>>       Bug #42131 "falcon_backlog test crashes in 
>>> Record::getEncodedValue"
>>>             Transaction:chill() must get an exclusive syncPrior lock 
>>> before deleting record data.
>>> modified:
>>>   storage/falcon/Record.h
>>>   storage/falcon/SRLUpdateRecords.cpp
>>>   storage/falcon/SRLUpdateRecords.h
>>>   storage/falcon/Table.h
>>>
>>> per-file messages:
>>>   storage/falcon/Record.h
>>>     Disable CHECK_RECORD_ACTIVITY. This flag has not been 
>>> maintained, and causes
>>>     invalid fatal assertions.
>>>   storage/falcon/SRLUpdateRecords.cpp
>>>     SRLUpdateRecords::chill() now gets an exclusive syncPrior lock 
>>> before
>>>     deleting the record data of a record being chilled.
>>>         To prevent a deadlock with concurrent record thaws, abandon 
>>> the chill for
>>>     any record for which the syncPrior cannot be acquired within 50ms.
>>>         The presumption is that if syncPrior is held for a record 
>>> chain, then
>>>     all records should remain fully intact.
>>>   storage/falcon/SRLUpdateRecords.h
>>>     Change return type of SRLUpdateRecords::chill() to boolean
>>>   storage/falcon/Table.h
>>>     Increase size of syncPrior and syncThaw hash table
>>> === modified file 'storage/falcon/Record.h'
>>> --- a/storage/falcon/Record.h    2009-01-09 22:00:52 +0000
>>> +++ b/storage/falcon/Record.h    2009-01-17 08:22:44 +0000
>>> @@ -27,7 +27,7 @@
>>>  #define ALLOCATE_RECORD(n)        (char*) MemMgrRecordAllocate (n, 
>>> __FILE__, __LINE__)
>>>  #define DELETE_RECORD(record)    MemMgrRecordDelete (record);
>>>  
>>> -#define CHECK_RECORD_ACTIVITY
>>> +//#define CHECK_RECORD_ACTIVITY
>>>  
>>>  #include "SynchronizationObject.h"
>>>  
>>>
>>> === modified file 'storage/falcon/SRLUpdateRecords.cpp'
>>> --- a/storage/falcon/SRLUpdateRecords.cpp    2008-11-19 17:00:02 +0000
>>> +++ b/storage/falcon/SRLUpdateRecords.cpp    2009-01-17 08:22:44 +0000
>>> @@ -28,6 +28,7 @@
>>>  #include "Sync.h"
>>>  #include "SerialLogWindow.h"
>>>  #include "Format.h"
>>> +#include "SQLError.h"
>>>  
>>>  SRLUpdateRecords::SRLUpdateRecords(void)
>>>  {
>>> @@ -37,8 +38,19 @@ SRLUpdateRecords::~SRLUpdateRecords(void
>>>  {
>>>  }
>>>  
>>> -void SRLUpdateRecords::chill(Transaction *transaction, 
>>> RecordVersion *record, uint dataLength)
>>> +bool SRLUpdateRecords::chill(Transaction *transaction, 
>>> RecordVersion *record, uint dataLength)
>>>  {
>>> +    Sync syncPrior(record->getSyncPrior(), "SRLUpdateRecords::chill");
>>> +   +    try
>>> +        {
>>> +        syncPrior.lock(Exclusive, 50);
>>> +        }
>>> +    catch (...)
>>> +        {
>>> +        return false;
>>> +        }
>>> +        // Record data has been written to the serial log, so 
>>> release the data
>>>      // buffer and set the state accordingly
>>>     @@ -55,6 +67,8 @@ void SRLUpdateRecords::chill(Transaction
>>>          transaction->totalRecordData -= dataLength;
>>>      else
>>>          transaction->totalRecordData = 0;
>>> +       +    return true;
>>>  }
>>>  
>>>  int SRLUpdateRecords::thaw(RecordVersion *record, bool *thawed)
>>> @@ -200,15 +214,16 @@ void SRLUpdateRecords::append(Transactio
>>>                      {
>>>                      int chillBytes = record->getEncodedSize();
>>>  
>>> -                    chill(transaction, record, chillBytes);
>>> -                   -                    log->chilledRecords++;
>>> -                    log->chilledBytes += chillBytes;
>>> +                    if (chill(transaction, record, chillBytes))
>>> +                        {
>>> +                        log->chilledRecords++;
>>> +                        log->chilledBytes += chillBytes;
>>>                     -                    
>>> ASSERT(transaction->thawedRecords > 0);
>>> +                        //ASSERT(transaction->thawedRecords > 0);
>>>  
>>> -                    if (transaction->thawedRecords)
>>> -                        transaction->thawedRecords--;
>>> +                        if (transaction->thawedRecords)
>>> +                            transaction->thawedRecords--;
>>> +                        }
>>>                      }
>>>                  else
>>>                      {
>>> @@ -253,9 +268,11 @@ void SRLUpdateRecords::append(Transactio
>>>                          if (chillRecords && record->state != 
>>> recDeleted)
>>>                  {
>>> -                chill(transaction, record, stream.totalLength);
>>> -                chilledRecordsWindow++;
>>> -                chilledBytesWindow += stream.totalLength;
>>> +                if (chill(transaction, record, stream.totalLength))
>>> +                    {
>>> +                    chilledRecordsWindow++;
>>> +                    chilledBytesWindow += stream.totalLength;
>>> +                    }
>>>                  }
>>>              } // next record version
>>>        
>>> === modified file 'storage/falcon/SRLUpdateRecords.h'
>>> --- a/storage/falcon/SRLUpdateRecords.h    2008-11-14 02:30:11 +0000
>>> +++ b/storage/falcon/SRLUpdateRecords.h    2009-01-17 08:22:44 +0000
>>> @@ -33,7 +33,7 @@ public:
>>>      virtual void    read(void);
>>>      virtual void    pass2(void);
>>>      void            append(Transaction *transaction, RecordVersion 
>>> *records, bool chillRecords = false);
>>> -    void            chill(Transaction *transaction, RecordVersion 
>>> *record, uint dataLength);
>>> +    bool            chill(Transaction *transaction, RecordVersion 
>>> *record, uint dataLength);
>>>      int                thaw(RecordVersion *record, bool *thawed);
>>>          const UCHAR        *data;
>>>
>>> === modified file 'storage/falcon/Table.h'
>>> --- a/storage/falcon/Table.h    2009-01-15 20:29:54 +0000
>>> +++ b/storage/falcon/Table.h    2009-01-17 08:22:44 +0000
>>> @@ -42,8 +42,8 @@ static const int PostCommit    = 128;
>>>  
>>>  static const int BL_SIZE            = 128;
>>>  static const int FORMAT_HASH_SIZE    = 20;
>>> -static const int SYNC_VERSIONS_SIZE    = 16;
>>> -static const int SYNC_THAW_SIZE        = 16;
>>> +static const int SYNC_VERSIONS_SIZE    = 32; // 16;
>>> +static const int SYNC_THAW_SIZE        = 32; // 16;
>>>  
>>>  #define FOR_FIELDS(field,table)    {for (Field 
>>> *field=table->fields; field; field = field->next){
>>>  
>>>
>>>
>>>  
>>> ------------------------------------------------------------------------ 
>>>
>>>
>>> Subject:
>>> Re: Bug#36631, "Assertion in SerialLogControl::nextRecord"
>>> From:
>>> Christopher Powers <Christopher.Powers@stripped>
>>> Date:
>>> Fri, 16 Jan 2009 00:47:06 -0600
>>> To:
>>> Kevin Lewis <klewis@stripped>
>>>
>>> To:
>>> Kevin Lewis <klewis@stripped>
>>> CC:
>>> FalconDev <falcon@stripped>
>>>
>>>
>>> Christopher Powers wrote:
>>>> Kevin,
>>>>
>>>> This is the reopened chill/thaw stress test bug.
>>>>
>>>> I can easily reproduce an assert in Record::getEncodedValue() that 
>>>> always occurs during a scavenge/record pruning (stack attached.)
>>>>
>>>> At the time of the assert, the record state is 'recChilled', so 
>>>> that explains why getEncodedValue() fails. It appears that the 
>>>> record 'oldie' in Index::duplicateKey() gets chilled while it's 
>>>> being accessed.
>>>>
>>>> Note that hasRecord() should have thawed the record. I added a 
>>>> second thaw to verify this, but the breakpoint never hit, so I 
>>>> think the record is being chilled during makeKey().
>>>>
>>>>
>>>> bool Index::duplicateKey(IndexKey *key, Record * record)
>>>> {
>>>>   for (Record *oldie = record; oldie; oldie = 
>>>> oldie->getPriorVersion())
>>>>     if (oldie->hasRecord())  <<<< will thaw record
>>>>       {
>>>>       IndexKey oldKey(this);
>>>>                  if (oldie->state == recChilled)
>>>>         oldie->thaw();     <<<< breakpoint never hit
>>>>                      makeKey (oldie, &oldKey); <<<<
> assertion 
>>>> here--chill< during makeKey??
>>>>
>>>>       if (oldKey.isEqual(key))
>>>>         return true;
>>>>       }
>>>>   return false;
>>>> }
>>>>
>>> ------------------------------------------------------------------------ 
>>>
>>>
>>>
>>
>>
>


-- 
Jim Starkey
President, NimbusDB, Inc.
978 526-1376

Thread
Re: When you have a chance...Kevin Lewis19 Jan
  • Re: When you have a chance...Jim Starkey19 Jan
    • Re: When you have a chance...Kevin Lewis20 Jan
      • Re: When you have a chance...Christopher Powers20 Jan
      • Re: When you have a chance...Jim Starkey20 Jan
        • Re: When you have a chance...Kevin Lewis21 Jan
      • Re: When you have a chance...Jim Starkey21 Jan
        • Re: When you have a chance...Kevin Lewis21 Jan
          • RE: When you have a chance...Vladislav Vaintroub21 Jan
            • Re: When you have a chance...Jim Starkey21 Jan
            • Re: When you have a chance...Kevin Lewis21 Jan