List:Falcon Storage Engine« Previous MessageNext Message »
From:Christopher Powers Date:January 20 2009 5:43am
Subject:Re: When you have a chance...
View as plain text  
Kevin Lewis wrote:
> Jim,
> 
> Cycle locking addresses the continued existence of an object while there 
> may be pointers to it.  The problem we have here is that there is a 
> pointer within the record to a data buffer.  That pointer can go away at 
> any time, even an instant after a call to hasData().  It takes two steps 
> to use the data buffer;  1) Ask whether the record is chilled or 
> hasData(),  2) Use the data buffer.
> 
> We need to surround these with some kind of SyncObject.  Otherwise, 
> there is no guarantee that a data buffer will be available, even if you 
> check for its existance first.  Cycle locking protects objects tat 
> threads have pointers to.  We need to protect the pointer.
> 
> Kevin

Kevin,

I find cycle locking to be very intriguing, and have been meaning to 
explore it further, but, as you pointed out, it does not appear to be a 
solution here (btw, I like your comment, "The act of chilling a record 
is very intrusive and dangerous...").

I briefly considered the value of a separate chill/thaw task that could 
be signaled like a scavenge from a low-level record alloc, but that 
won't resolve this issue. The thaw task still needs syncWrite and the 
chill still needs the syncPrior.

I think this is more about task priority than locking order. A record 
thaw is critical because the current task cannot continue and very well 
may have other dependencies. A chill operation is a much lower priorty 
task with no direct dependencies.

The timeout approach effectively assigns a lower priority to the chill 
task by deferring to the thaw task (or whatever is operating on the 
record chain.) It is not a pretty solution because it is contrary to the 
Falcon locking paradigm. Does the paradigm need to be updated to 
accommodate task priority?


> Jim Starkey wrote:
>> I suggest you consider cycle locking rather than a syncPrior lock.  
>> The basic idea is this:
>>
>>   1. A thread takes a shared lock on the cycle lock for the duration of
>>      an engine operation.
>>   2. Intermediate and trailing records are removed from record chains
>>      with an interlocked CAS.  The removed records are insert, again
>>      with an interocked CAS, into a "record purgatory".
>>   3. After a suitable period of waiting, the cycle manager flips cycle
>>      locks, waits for an exclusive lock on the previous cycle lock,
>>      then deletes records in record purgatory.
>>
>> In effect, this extends the lifetime of otherwise deleted records to 
>> the duration of the cycle.  When the cycle is complete (cycle manager 
>> gets an exclusive lock on the cycle's lock), there will be no pointers 
>> to the deleted records, and they can be safely deleted.  In the 
>> interim, however, "deleted" record's prior pointer and state remain 
>> valid.
>>
>> This gets rid of syncPrior and a lot addRef/releases.
>>
>> A skeleton for a cycle manager is something like this:
>>
>>    void XXX::cycleManager(void)
>>    {
>>        Thread *thread = Thread::getThread("XXX::cycleManager");
>>              while (!shutdownServer)
>>            {
>>            thread->sleep(CYCLE_SLEEP * 1000);
>>            SyncObject *prior = currentCycle;
>>            currentCycle = (prior == &cycle1) ? &cycle2 : &cycle1;
>>            ++cycleNumber;
>>            Sync sync(prior, "XXX::cycleManager");
>>            sync.lock(Exclusive);
>>            sync.unlock();
>>                          agingCycle(cycleType);      // this is where 
>> post cycle work
>>    gets done
>>                      }
>>                  cycleThread = NULL;
>>    }
>>
>> Kevin Lewis wrote:
>>> Chris,
>>>
>>> I thought a lot about this solution the last couple of days.  Using a 
>>> timeout as a standard expected event in handling a concurrency issue 
>>> is not the best approach, I'm sure Jim would agree.
>>>
>>> But there are extenuating circumstances.  You have shown that we need 
>>> to protect other threads that call hasData() from calling that 
>>> concurrently with an active chill thread because it could be there in 
>>> one instant and not the next. The act of chilling a record is very 
>>> intrusive and dangerous without a record mutex.  But using a record 
>>> mutex is against Jim's MVCC philosophy for many reasons which need 
>>> not be listed here. We just can't do it.  So testing and using the 
>>> syncPrior as an alternative to the record mutex is OK, I think.  
>>> Especially since there is no hard requirement to chill any particular 
>>> record.
>>>
>>> Since the action on the record is not  required, and chilling should 
>>> not happen very often, I think a timeout and move-on approach is 
>>> acceptable here.
>>>
>>> Especially since I cannot think of a better approach! :)
>>>
>>> Jim, this is your invitation to spark a better idea, or think of a 
>>> better approach.  Note that this use of syncPrior during a chill 
>>> actually works, even though it is not pretty.
>>>
>>> Kevin
>>>
>>> Christopher Powers wrote:
>>>> ...call me today 612-729-1519 or my cell, 612-616-1069. I'd like to 
>>>> discuss the changes I did for garbage collect.
>>>>
>>>> I pushed the fixes to mysql-6.0-falcon-chris on PB2, and *both* 
>>>> falcon_online_alter and falcon_chill_thaw passed. I find this to be 
>>>> very encouraging.
>>>>
>>>> Using syncPrior in chill was not straightforward solution because of 
>>>> a deadlock with a low-level thaw, but what I have works.
>>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> Subject:
>>> bzr commit into mysql-6.0-falcon-team branch 
>>> (christopher.powers:2964) Bug#42131
>>> From:
>>> Christopher Powers <Christopher.Powers@stripped>
>>> Date:
>>> Sat, 17 Jan 2009 02:23:04 -0600 (CST)
>>> To:
>>> commits@stripped
>>>
>>> To:
>>> commits@stripped
>>>
>>>
>>> #At file:///home/cpowers/work/dev/dev-03/mysql/
>>>
>>>  2964 Christopher Powers    2009-01-17
>>>       Bug #42131 "falcon_backlog test crashes in 
>>> Record::getEncodedValue"
>>>             Transaction:chill() must get an exclusive syncPrior lock 
>>> before deleting record data.
>>> modified:
>>>   storage/falcon/Record.h
>>>   storage/falcon/SRLUpdateRecords.cpp
>>>   storage/falcon/SRLUpdateRecords.h
>>>   storage/falcon/Table.h
>>>
>>> per-file messages:
>>>   storage/falcon/Record.h
>>>     Disable CHECK_RECORD_ACTIVITY. This flag has not been maintained, 
>>> and causes
>>>     invalid fatal assertions.
>>>   storage/falcon/SRLUpdateRecords.cpp
>>>     SRLUpdateRecords::chill() now gets an exclusive syncPrior lock 
>>> before
>>>     deleting the record data of a record being chilled.
>>>         To prevent a deadlock with concurrent record thaws, abandon 
>>> the chill for
>>>     any record for which the syncPrior cannot be acquired within 50ms.
>>>         The presumption is that if syncPrior is held for a record 
>>> chain, then
>>>     all records should remain fully intact.
>>>   storage/falcon/SRLUpdateRecords.h
>>>     Change return type of SRLUpdateRecords::chill() to boolean
>>>   storage/falcon/Table.h
>>>     Increase size of syncPrior and syncThaw hash table
>>> === modified file 'storage/falcon/Record.h'
>>> --- a/storage/falcon/Record.h    2009-01-09 22:00:52 +0000
>>> +++ b/storage/falcon/Record.h    2009-01-17 08:22:44 +0000
>>> @@ -27,7 +27,7 @@
>>>  #define ALLOCATE_RECORD(n)        (char*) MemMgrRecordAllocate (n, 
>>> __FILE__, __LINE__)
>>>  #define DELETE_RECORD(record)    MemMgrRecordDelete (record);
>>>  
>>> -#define CHECK_RECORD_ACTIVITY
>>> +//#define CHECK_RECORD_ACTIVITY
>>>  
>>>  #include "SynchronizationObject.h"
>>>  
>>>
>>> === modified file 'storage/falcon/SRLUpdateRecords.cpp'
>>> --- a/storage/falcon/SRLUpdateRecords.cpp    2008-11-19 17:00:02 +0000
>>> +++ b/storage/falcon/SRLUpdateRecords.cpp    2009-01-17 08:22:44 +0000
>>> @@ -28,6 +28,7 @@
>>>  #include "Sync.h"
>>>  #include "SerialLogWindow.h"
>>>  #include "Format.h"
>>> +#include "SQLError.h"
>>>  
>>>  SRLUpdateRecords::SRLUpdateRecords(void)
>>>  {
>>> @@ -37,8 +38,19 @@ SRLUpdateRecords::~SRLUpdateRecords(void
>>>  {
>>>  }
>>>  
>>> -void SRLUpdateRecords::chill(Transaction *transaction, RecordVersion 
>>> *record, uint dataLength)
>>> +bool SRLUpdateRecords::chill(Transaction *transaction, RecordVersion 
>>> *record, uint dataLength)
>>>  {
>>> +    Sync syncPrior(record->getSyncPrior(), "SRLUpdateRecords::chill");
>>> +   +    try
>>> +        {
>>> +        syncPrior.lock(Exclusive, 50);
>>> +        }
>>> +    catch (...)
>>> +        {
>>> +        return false;
>>> +        }
>>> +        // Record data has been written to the serial log, so 
>>> release the data
>>>      // buffer and set the state accordingly
>>>     @@ -55,6 +67,8 @@ void SRLUpdateRecords::chill(Transaction
>>>          transaction->totalRecordData -= dataLength;
>>>      else
>>>          transaction->totalRecordData = 0;
>>> +       +    return true;
>>>  }
>>>  
>>>  int SRLUpdateRecords::thaw(RecordVersion *record, bool *thawed)
>>> @@ -200,15 +214,16 @@ void SRLUpdateRecords::append(Transactio
>>>                      {
>>>                      int chillBytes = record->getEncodedSize();
>>>  
>>> -                    chill(transaction, record, chillBytes);
>>> -                   -                    log->chilledRecords++;
>>> -                    log->chilledBytes += chillBytes;
>>> +                    if (chill(transaction, record, chillBytes))
>>> +                        {
>>> +                        log->chilledRecords++;
>>> +                        log->chilledBytes += chillBytes;
>>>                     -                    
>>> ASSERT(transaction->thawedRecords > 0);
>>> +                        //ASSERT(transaction->thawedRecords > 0);
>>>  
>>> -                    if (transaction->thawedRecords)
>>> -                        transaction->thawedRecords--;
>>> +                        if (transaction->thawedRecords)
>>> +                            transaction->thawedRecords--;
>>> +                        }
>>>                      }
>>>                  else
>>>                      {
>>> @@ -253,9 +268,11 @@ void SRLUpdateRecords::append(Transactio
>>>                          if (chillRecords && record->state != 
>>> recDeleted)
>>>                  {
>>> -                chill(transaction, record, stream.totalLength);
>>> -                chilledRecordsWindow++;
>>> -                chilledBytesWindow += stream.totalLength;
>>> +                if (chill(transaction, record, stream.totalLength))
>>> +                    {
>>> +                    chilledRecordsWindow++;
>>> +                    chilledBytesWindow += stream.totalLength;
>>> +                    }
>>>                  }
>>>              } // next record version
>>>        
>>> === modified file 'storage/falcon/SRLUpdateRecords.h'
>>> --- a/storage/falcon/SRLUpdateRecords.h    2008-11-14 02:30:11 +0000
>>> +++ b/storage/falcon/SRLUpdateRecords.h    2009-01-17 08:22:44 +0000
>>> @@ -33,7 +33,7 @@ public:
>>>      virtual void    read(void);
>>>      virtual void    pass2(void);
>>>      void            append(Transaction *transaction, RecordVersion 
>>> *records, bool chillRecords = false);
>>> -    void            chill(Transaction *transaction, RecordVersion 
>>> *record, uint dataLength);
>>> +    bool            chill(Transaction *transaction, RecordVersion 
>>> *record, uint dataLength);
>>>      int                thaw(RecordVersion *record, bool *thawed);
>>>          const UCHAR        *data;
>>>
>>> === modified file 'storage/falcon/Table.h'
>>> --- a/storage/falcon/Table.h    2009-01-15 20:29:54 +0000
>>> +++ b/storage/falcon/Table.h    2009-01-17 08:22:44 +0000
>>> @@ -42,8 +42,8 @@ static const int PostCommit    = 128;
>>>  
>>>  static const int BL_SIZE            = 128;
>>>  static const int FORMAT_HASH_SIZE    = 20;
>>> -static const int SYNC_VERSIONS_SIZE    = 16;
>>> -static const int SYNC_THAW_SIZE        = 16;
>>> +static const int SYNC_VERSIONS_SIZE    = 32; // 16;
>>> +static const int SYNC_THAW_SIZE        = 32; // 16;
>>>  
>>>  #define FOR_FIELDS(field,table)    {for (Field *field=table->fields; 
>>> field; field = field->next){
>>>  
>>>
>>>
>>>  
>>> ------------------------------------------------------------------------
>>>
>>> Subject:
>>> Re: Bug#36631, "Assertion in SerialLogControl::nextRecord"
>>> From:
>>> Christopher Powers <Christopher.Powers@stripped>
>>> Date:
>>> Fri, 16 Jan 2009 00:47:06 -0600
>>> To:
>>> Kevin Lewis <klewis@stripped>
>>>
>>> To:
>>> Kevin Lewis <klewis@stripped>
>>> CC:
>>> FalconDev <falcon@stripped>
>>>
>>>
>>> Christopher Powers wrote:
>>>> Kevin,
>>>>
>>>> This is the reopened chill/thaw stress test bug.
>>>>
>>>> I can easily reproduce an assert in Record::getEncodedValue() that 
>>>> always occurs during a scavenge/record pruning (stack attached.)
>>>>
>>>> At the time of the assert, the record state is 'recChilled', so that 
>>>> explains why getEncodedValue() fails. It appears that the record 
>>>> 'oldie' in Index::duplicateKey() gets chilled while it's being 
>>>> accessed.
>>>>
>>>> Note that hasRecord() should have thawed the record. I added a 
>>>> second thaw to verify this, but the breakpoint never hit, so I think 
>>>> the record is being chilled during makeKey().
>>>>
>>>>
>>>> bool Index::duplicateKey(IndexKey *key, Record * record)
>>>> {
>>>>   for (Record *oldie = record; oldie; oldie =
> oldie->getPriorVersion())
>>>>     if (oldie->hasRecord())  <<<< will thaw record
>>>>       {
>>>>       IndexKey oldKey(this);
>>>>                  if (oldie->state == recChilled)
>>>>         oldie->thaw();     <<<< breakpoint never hit
>>>>                      makeKey (oldie, &oldKey); <<<<
> assertion 
>>>> here--chill< during makeKey??
>>>>
>>>>       if (oldKey.isEqual(key))
>>>>         return true;
>>>>       }
>>>>   return false;
>>>> }
>>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>
>>
Thread
Re: When you have a chance...Kevin Lewis19 Jan
  • Re: When you have a chance...Jim Starkey19 Jan
    • Re: When you have a chance...Kevin Lewis20 Jan
      • Re: When you have a chance...Christopher Powers20 Jan
      • Re: When you have a chance...Jim Starkey20 Jan
        • Re: When you have a chance...Kevin Lewis21 Jan
      • Re: When you have a chance...Jim Starkey21 Jan
        • Re: When you have a chance...Kevin Lewis21 Jan
          • RE: When you have a chance...Vladislav Vaintroub21 Jan
            • Re: When you have a chance...Jim Starkey21 Jan
            • Re: When you have a chance...Kevin Lewis21 Jan