List: Falcon Storage Engine
From: Philip Stoev    Date: April 9 2009 7:32am
Subject:Re: Recovery bugs, classified by me
> Category: Undebuggable one time glitch.
> http://bugs.mysql.com/bug.php?id=39703
> Bug #39703 Assertion in Section.cpp failed during recovery after
> falcon_limit test
> Short description: seen in September. Crash dump is not usable for anyone
> but Olav (Sparc, wrong endian).
> Possible fix:
>  Close as non-reproducible

Yes, I can confirm we have not seen that recently. However, I object to the
"wrong endian" comment. Falcon does not like Sparc, and 5.4 will not be
released on Windows, so what are we left with? Linux only? I suggest that
whoever feels the least aversion to GDB at least try to replay the recovery
before closing this bug.

> Category: Won't fix
> http://bugs.mysql.com/bug.php?id=39130
> Bug #39130 Unbounded serial log growth with online ALTER
> Short description: discussion on how gophers can lag behind
> Suspected cause: gophers lag behind
> Suggested fix: won't fix (works as designed)

I am afraid this is a valid bug. It is not about the gophers lagging behind,
since, as the original comment says, all gopher threads have *gone to sleep*
and the serial log files are still not getting shrunk. I repeated the test
today with the same results (see my latest comment on bug 39130).
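To make "the serial log files are not getting shrunk" easy to re-check, a test
could sample the combined size of the serial log files while the server sits
idle. The following is only a minimal sketch under stated assumptions: the
directory and the file pattern `*.fl*` are hypothetical placeholders for
wherever the test setup keeps its Falcon serial logs, and the helper names
are made up for illustration.

```python
import time
from pathlib import Path


def serial_log_size(directory, pattern="*.fl*"):
    """Combined size in bytes of the files in `directory` matching `pattern`.

    The default pattern is an assumption; adjust it to the actual
    serial log file names used by the test setup.
    """
    return sum(p.stat().st_size for p in Path(directory).glob(pattern) if p.is_file())


def sample_sizes(directory, samples, interval_s, pattern="*.fl*"):
    """Take `samples` size readings, `interval_s` seconds apart."""
    sizes = []
    for i in range(samples):
        sizes.append(serial_log_size(directory, pattern))
        if i + 1 < samples:
            time.sleep(interval_s)
    return sizes


def never_shrinks(sizes):
    """True if the total size never decreased between consecutive samples."""
    return all(later >= earlier for earlier, later in zip(sizes, sizes[1:]))
```

If `never_shrinks(sample_sizes(...))` stays true long after the workload has
stopped and the gophers have gone idle, that matches the symptom reported in
the bug.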

It is OK, however, to re-triage this bug to a lower SR tag.

> http://bugs.mysql.com/bug.php?id=36993
> Bug#39139 Falcon reports Index SCHEDULE..PRIMARY_KEY in SYSTEM.SCHEDULE
> damaged
> Suspected cause:
> Kill -9 before system tables were completely created.
> Suggested fix: won't fix (good workaround)
> Workaround: delete all falcon spaces and serial logs.

Note that this is just an error printed in the log; the database continues
to run. Therefore "delete all falcon tablespaces" is not a good workaround,
because a person may not even notice the problem, since it does not manifest
itself as a crash. God knows what else is damaged as well.

Also, the kill -9 did not happen while the server was starting up. The
server had already started, and databases and tables had been created by the
time the kill -9 arrived. Therefore, it is not about "killing before system
tables were completely created"; it may be about "killing before the gophers
applied all serial log events related to system tables".

So, this remains a valid bug for me. I do intend to test recovery
systematically with kill -9 immediately after server startup, so a decision
must be made and a solution implemented for this one. Maybe the solution is
to do extra checkpoints after creating the system tables and to wait for the
gophers to write everything to disk.
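The systematic test described above (kill -9 shortly after startup, then
check recovery) could be driven by a small loop. This is only a sketch under
assumptions: `server_cmd` stands in for however the test starts the server,
and `verify` is a hypothetical hook that would restart the server and look
for damage (for example, errors such as the "damaged" index message in the
server log). Neither is an existing Falcon or MySQL test API.

```python
import signal
import subprocess
import time


def start_and_kill(cmd, delay_s):
    """Start `cmd`, wait `delay_s` seconds, then send SIGKILL (kill -9).

    Returns the exit status; on POSIX a process terminated by SIGKILL
    reports -signal.SIGKILL (that is, -9).
    """
    proc = subprocess.Popen(cmd)
    time.sleep(delay_s)
    proc.send_signal(signal.SIGKILL)  # no-op if the process already exited
    return proc.wait()


def kill9_recovery_loop(server_cmd, verify, iterations=10, delay_s=1.0):
    """Repeat kill -9 shortly after startup; collect iterations that fail.

    `verify` is a hypothetical callable: restart the server, run the
    recovery checks, and return True if no damage was found.
    """
    failures = []
    for i in range(iterations):
        start_and_kill(server_cmd, delay_s)
        if not verify():
            failures.append(i)
    return failures
```

Varying `delay_s` from very small values upward would cover both "killed
during startup" and "killed after startup but before the gophers caught up",
which is the distinction that matters for bug 39139.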

Philip Stoev 
