Ann W. Harrison wrote:
> Olav,
>>
>> In Bug #45845 "Falcon crashes while running falcon_bug_36294-big
>> test" (http://bugs.mysql.com/bug.php?id=45845) Ann brings up the
>> issue that in Thread::thread() we catch an exception and then
>> rethrow it immediately, with no other code catching it, which leads
>> to a process crash. I agree that this is a problem.
>
>
> Jim, Kevin, and I talked about it yesterday. The problem is not this
> double throw, but the fact that the thread (probably a gopher) is
> throwing the "record memory exhausted" error and not catching it.
> Jim and Kevin agreed that uncaught exceptions ought to crash the
> server (not their words). I disagree. Some MySQL engines corrupt
> data when they crash, so crashing the server is not socially
> acceptable, even if it does make debugging easier.
Ann, neither Kevin nor I thought that it was acceptable either, but the
place to catch it is in the worker threads. The internal threading system
is not the place to implement server-friendly semantics, however.
>>
>>
>> 1. The main issue is that we have a Falcon internal thread that has
>> thrown an exception that is likely due to something "very serious"
>> since it is not handled anywhere (or it is caused by a coding bug).
>
> I suspect that the gopher never expected to run out of record memory -
> and I'm not really sure how it can, since, at least in my simplistic
> diagram, it moves records from the serial log into the page cache
> and shouldn't be mucking with the record cache at all.
All it takes is a single memory allocation to find that another thread
has run the pool out of memory.
I hate to say it, but a gopher discovering a memory shortage should
notify a responsible adult (probably SerialLog) that it is exiting, then
exit. If the server recovers, SerialLog can restart gophers after the
crisis has passed.
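
A minimal sketch of that notify-then-exit pattern, under loud
assumptions: GopherSupervisor is a hypothetical stand-in for whatever
SerialLog would expose, runGopher and doWork are illustrative names, and
std::bad_alloc stands in for Falcon's own "record memory exhausted"
exception. It is not the actual Falcon code.

```cpp
#include <atomic>
#include <functional>
#include <new>

// Hypothetical stand-in for the supervisory role SerialLog would play.
class GopherSupervisor {
public:
    // Called by a gopher just before it exits because of a memory shortage.
    void gopherExiting(int /*gopherId*/) { ++exited; }

    // Called once the crisis has passed; returns how many gophers
    // need to be restarted and clears the count.
    int restartExited() { return exited.exchange(0); }

    int pendingRestarts() const { return exited.load(); }

private:
    std::atomic<int> exited{0};
};

// Runs one gopher's work loop. doWork returns false when there is nothing
// left to do. Returns true on a normal finish, false if the gopher gave up
// because of a memory shortage. std::bad_alloc is an assumption here; the
// real code would catch the engine's own out-of-memory exception.
bool runGopher(int gopherId, GopherSupervisor& supervisor,
               const std::function<bool()>& doWork) {
    try {
        while (doWork()) {
        }
        return true;
    } catch (const std::bad_alloc&) {
        supervisor.gopherExiting(gopherId);  // tell the responsible adult
        return false;                        // then exit quietly
    }
}
```

The point of the design is that the exception never leaves the worker's
own loop, so the threading layer never sees it.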
>
>> So at this point in the code we are basically handling an exception
>> that the thread's own code did not manage to handle. Unfortunately,
>> as the code is now it leads to a process crash. What is the
>> alternative? It is fairly easy to avoid the process crash by just
>> catching the exception - but what should we do with the thread (or
>> rather the lack of this thread)? Should we just restart the same
>> thread? Continue without this thread? In many cases the cause for the
>> exception is so serious that Falcon will not be able to continue.
>
> If it were me, I'd put a try/catch all on every call from the server
> to avoid crashes. In debugging mode, the handler could set the machine
> on fire to call attention to the problem.
That's appropriate.
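
Roughly, a catch-all at the server boundary could look like the sketch
below. Everything named here is hypothetical: EngineStatus, DebugAlarm,
and guardedCall are illustrative, not part of the MySQL handler
interface or of Falcon. The alarm callback is where a debug build would
do its attention-getting.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical status codes; the real handler interface defines its own.
enum EngineStatus { ENGINE_OK = 0, ENGINE_INTERNAL_ERROR = 1 };

// Hypothetical debug hook, called with a description of the exception.
using DebugAlarm = std::function<void(const std::string&)>;

// Wraps one call from the server so that no exception can escape it.
template <typename Call>
EngineStatus guardedCall(Call&& call, const DebugAlarm& alarm = DebugAlarm()) {
    try {
        call();
        return ENGINE_OK;
    } catch (const std::exception& e) {
        if (alarm) alarm(e.what());
    } catch (...) {
        if (alarm) alarm("unknown exception");
    }
    return ENGINE_INTERNAL_ERROR;
}
```

Each entry point from the server would run its body through
guardedCall, so an unexpected exception becomes an error return rather
than a process crash.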
>>
>> The best we can do is probably to try to restart the crashed
>> thread - and hope the issue was temporary? The second best is
>> probably to let the thread die and just continue as if nothing had
>> happened...
>
> I think we need to return a severe error and ignore further handler
> calls. Falcon is dead, but something else might survive.
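
One way to picture "return a severe error and ignore further calls" is
a one-way latch that every handler call checks first. This is only a
sketch: EngineGate, HandlerStatus, and markDead are hypothetical names,
not existing Falcon or MySQL identifiers.

```cpp
#include <atomic>

// Hypothetical status codes for calls coming in from the server.
enum HandlerStatus { HANDLER_OK = 0, HANDLER_ENGINE_DEAD = 1 };

// One-way latch: once the engine is marked dead it stays dead.
class EngineGate {
public:
    // Called by an internal thread that hit an unrecoverable error.
    void markDead() { dead.store(true); }

    bool isDead() const { return dead.load(); }

    // Runs the handler call only while the engine is alive; otherwise
    // refuses with a severe error and does not touch engine state.
    template <typename Call>
    HandlerStatus enter(Call&& call) {
        if (dead.load()) {
            return HANDLER_ENGINE_DEAD;
        }
        call();
        return HANDLER_OK;
    }

private:
    std::atomic<bool> dead{false};
};
```

The server keeps running and other engines keep working; only calls
into the dead engine are refused.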
>>
>> 2. Another issue is that if we continue to let this lead to process
>> crashes - the catch/rethrow is mostly annoying (at least for
>> developers) since it leads to a call stack that just shows where the
>> rethrow took place and not where the initial problem was. This makes
>> it much harder to identify what was the real cause for the crash. I
>> did a commit last week where I changed the logging code so that we at
>> least write to the log what kind of exception this was - earlier this
>> catch/rethrow was done "silently" (if there were no debug flags
>> specified).
>>
> That's a start...
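
For reference, logging the kind of exception before a rethrow can be
done with a small helper like the one below. This is a generic C++
sketch, not the code from the commit mentioned above;
describeCurrentException is a hypothetical name, and the text returned
by typeid(...).name() is implementation-defined.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Describes the exception currently being handled.
// Must only be called from inside a catch block; calling it with no
// active exception makes the bare "throw;" terminate the process.
std::string describeCurrentException() {
    try {
        throw;  // rethrow the active exception so it can be classified
    } catch (const std::exception& e) {
        return std::string(typeid(e).name()) + ": " + e.what();
    } catch (...) {
        return "unknown exception";
    }
}
```

A catch (...) handler can write this string to the log before it
rethrows, so the log at least names the original failure even when the
call stack only shows the rethrow site.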
>
>
> Cheers,
>
> Ann
--
Jim Starkey
President, NimbusDB, Inc.
978 526-1376