List: Falcon Storage Engine
From: Philip Stoev
Date: April 9 2009 10:09am
Subject: Re: Recovery bugs, classified by me
I do not think a 30-second data loss is very acceptable :-)

If two consecutive forced checkpoints or some other (simple) trick will
reduce the window, then let's go for it.

Note that by default mysqld is automatically restarted after every crash
by the mysqld_safe script (formerly safe_mysqld). This means that a customer
could easily rack up repeated restarts and recoveries without even noticing.
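
To make the concern concrete, here is a minimal sketch of how losses could add up across unnoticed restarts. It is a toy model, not Falcon code: the function names are hypothetical, the 30-second figure is the one mentioned in this thread, and it assumes that a crash before the first checkpoint forces the "delete tablespaces" workaround while a later crash recovers normally.

```python
def lost_commits(commit_times, crash_time, first_checkpoint_at=30.0):
    """Return the commits lost in one server run (toy model, hypothetical names).

    commit_times: seconds after startup at which transactions committed.
    crash_time: seconds after startup at which the server was killed.
    Assumption: a crash before the first checkpoint means the tablespaces are
    deleted, so every commit made before the crash in that run is lost.
    """
    if crash_time >= first_checkpoint_at:
        # Assumed: system tables reached disk, so normal recovery applies.
        return []
    return [t for t in commit_times if t <= crash_time]


def total_lost(runs, first_checkpoint_at=30.0):
    """Sum the lost commits over several (commit_times, crash_time) runs."""
    return sum(
        len(lost_commits(commits, crash, first_checkpoint_at))
        for commits, crash in runs
    )


if __name__ == "__main__":
    # Three automatic restarts; two of them crash inside the window.
    runs = [([1.0, 5.0], 10.0), ([2.0], 3.0), ([1.0, 50.0], 60.0)]
    print(total_lost(runs))
```

Each run inside the window loses only a few transactions, but with automatic restarts the total keeps growing without anyone looking at the server.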


Philip Stoev

----- Original Message ----- 
From: "Vladislav Vaintroub" <wlad@stripped>
To: "'Philip Stoev'" <pstoev@stripped>; "'FalconDev'" 
<falcon@stripped>
Sent: Thursday, April 09, 2009 12:59 PM
Subject: RE: Recovery bugs, classified by me


>
>
>> -----Original Message-----
>> From: Philip.Stoev@stripped [mailto:Philip.Stoev@stripped] On Behalf Of
>> Philip Stoev
>> Sent: Thursday, April 09, 2009 10:46 AM
>> To: Vladislav Vaintroub; 'FalconDev'
>> Subject: Re: Recovery bugs, classified by me
>>
>> >> So, this remains a valid bug for me. I do intend to test recovery
>> >> systematically with kill -9 immediately after server startup, so a
>> >> decision and a solution must be implemented for that one. Maybe the
>> >> solution is to do extra checkpoints after creating the system tables
>> >> and waiting for the gophers to write everything to disk.
>> >
>> > And what do you do if you kill before the checkpoint has run?
>>
>> It appears to me that the current behavior is as follows:
>>
>> 1. Falcon starts up, system tables are created in memory
>> 2. Server becomes available for connections
>> 3. Queries start arriving
>> 4. A scheduled checkpoint arrives, the gophers write the system tables
>>    to disk, etc.
>>
>> If there is a crash in Step #3, you cannot use the workaround "delete
>> tablespaces and start from scratch", because you would lose the
>> transactions that were issued by the users.
>
> If Step 3 took < 30 seconds, I'd think "delete tablespaces and start from
> scratch" is still a reasonable workaround. We are not talking about lost
> terabytes of user data, are we?
>
>
> 
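
The "kill -9 immediately after server startup" test mentioned in the quoted message could be sketched roughly as below. This is only an illustration under stated assumptions: `crash_after` is a hypothetical helper, the command passed in is a stand-in rather than a real mysqld invocation, and the return-code check assumes a POSIX system where `Popen.kill()` sends SIGKILL.

```python
import subprocess
import sys
import time


def crash_after(cmd, delay_seconds):
    """Start cmd, wait delay_seconds, then kill it hard (hypothetical helper).

    On POSIX, Popen.kill() sends SIGKILL, the same signal as `kill -9`, so the
    child gets no chance to flush buffers or run a checkpoint.
    Returns the child's return code (negative signal number on POSIX).
    """
    proc = subprocess.Popen(cmd)
    time.sleep(delay_seconds)
    proc.kill()
    return proc.wait()


if __name__ == "__main__":
    # Stand-in for the server: a child that would otherwise run for 60 seconds.
    stand_in = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(crash_after(stand_in, 0.5))
```

A real recovery test would replace the stand-in command with the server start-up command, restart the server after the kill, and then check that recovery completes and the data is consistent.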
