From: Jim Starkey
Date: July 2 2009 3:38pm
Subject: Re: Catch and rethrow of exceptions in Thread::thread()
List-Archive: http://lists.mysql.com/falcon/779
Message-Id: <4A4CD479.7000206@nimbusdb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Ann W. Harrison wrote:
> Olav,
>>
>> In Bug #45845 "Falcon crashes while running falcon_bug_36294-big
>> test" (http://bugs.mysql.com/bug.php?id=45845) Ann brings up the
>> issue that we in Thread::thread() catch an exception and then
>> rethrow it immediately without any other code catching it, leading to
>> a process crash. I agree that this is a problem.
>
>
> Jim, Kevin, and I talked about it yesterday. The problem is not this
> double throw, but the fact that the thread (probably a gopher) is
> throwing the "record memory exhausted" error and not catching it.
> Jim and Kevin agreed that uncaught exceptions ought to crash the
> server (not their words). I disagree. Some MySQL engines corrupt
> data when they crash, so crashing the server is not socially
> acceptable, even if it does make debugging easier.

Ann, neither Kevin nor I thought that it was acceptable either, but the
place to catch it is in the worker threads. The internal threading system,
however, is not the place to implement server-friendly semantics.

>>
>>
>> 1. The main issue is that we have a Falcon internal thread that has
>> thrown an exception that is likely due to something "very serious"
>> since it is not handled anywhere (or it is caused by a coding bug).
>
> I suspect that the gopher never expected to run out of record memory -
> and I'm not really sure how it can, since, at least in my simplistic
> diagram, it moves records from the serial log into the page cache
> and shouldn't be mucking with the record cache at all.

All it takes is a single memory allocation to find that another thread
has run the pool out of memory.
I hate to say it, but a gopher discovering a memory shortage should
notify a responsible adult (probably SerialLog) that it is exiting, then
exit. If the server recovers, SerialLog can restart gophers after the
crisis has passed.

>
>> So at this point in the code we are basically handling an exception
>> that the thread's own code did not manage to handle. Unfortunately,
>> as the code is now it leads to a process crash. What is the
>> alternative? It is fairly easy to avoid the process crash by just
>> catching the exception - but what should we do with the thread (or
>> rather the lack of this thread)? Should we just restart the same
>> thread? Continue without this thread? In many cases the cause for the
>> exception is so serious that Falcon will not be able to continue.
>
> If it were me, I'd put a try/catch all on every call from the server
> to avoid crashes. In debugging mode, the handler could set the machine
> on fire to call attention to the problem.

That's appropriate.

>>
>> The best we can do is probably to try to restart the crashed
>> thread - and hope the issue was temporary? The second best is
>> probably to let the thread die and just continue as if nothing had
>> happened...
>
> I think we need to return a severe error and ignore further handler
> calls. Falcon is dead, but something else might survive.
>>
>> 2. Another issue is that if we continue to let this lead to process
>> crashes - the catch/rethrow is mostly annoying (at least for
>> developers) since it leads to a call stack that just shows where the
>> rethrow took place and not where the initial problem was. This makes
>> it much harder to identify what was the real cause for the crash. I
>> did a commit last week where I changed the logging code so that we at
>> least write to the log what kind of exception this was - earlier this
>> catch/rethrow was done "silently" (if there were no debug flags
>> specified).
>>
> That's a start...
>
>
> Cheers,
>
> Ann

--
Jim Starkey
President, NimbusDB, Inc.
978 526-1376
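
For anyone reading this in the archive, here is a minimal sketch of the "notify a responsible adult, then exit" idea from the reply above. It uses standard C++11 threads rather than Falcon's own Thread and SerialLog classes, and every name in it (Supervisor, doWork, workerMain, runWithOneRestart) is hypothetical, made up for illustration. The point it tries to show is that the worker catches its own failure, reports it, and returns normally, so nothing propagates out of the thread entry point.

```cpp
#include <functional>
#include <mutex>
#include <new>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the "responsible adult" (SerialLog in the
// discussion above). It only records exit reports from workers.
class Supervisor {
public:
    // A worker calls this from its own catch block, just before returning.
    void workerExiting(int id, const std::string& reason) {
        std::lock_guard<std::mutex> guard(lock);
        reports.push_back("worker " + std::to_string(id) + " exiting: " + reason);
    }

    // Hands the pending reports to the caller and clears the internal list.
    std::vector<std::string> takeReports() {
        std::lock_guard<std::mutex> guard(lock);
        std::vector<std::string> out;
        out.swap(reports);
        return out;
    }

private:
    std::mutex lock;
    std::vector<std::string> reports;
};

// Hypothetical unit of work. It fails the way an allocation would if
// another thread had already run the shared pool out of memory.
void doWork(bool poolExhausted) {
    if (poolExhausted)
        throw std::bad_alloc();
}

// Thread entry point. The worker handles its own exceptions here; an
// exception that escapes a std::thread function calls std::terminate.
void workerMain(int id, bool poolExhausted, Supervisor& supervisor) {
    try {
        doWork(poolExhausted);
    } catch (const std::bad_alloc&) {
        supervisor.workerExiting(id, "memory exhausted");
    } catch (...) {
        supervisor.workerExiting(id, "unknown exception");
    }
}

// Runs one worker with an exhausted pool, then restarts it once with
// memory available if it reported a failure. Returns the reports from
// the first attempt.
std::vector<std::string> runWithOneRestart(Supervisor& supervisor) {
    std::thread first(workerMain, 1, true, std::ref(supervisor));
    first.join();

    std::vector<std::string> reports = supervisor.takeReports();
    if (!reports.empty()) {
        // Assumption for the sketch: the crisis has passed by now.
        std::thread second(workerMain, 1, false, std::ref(supervisor));
        second.join();
    }
    return reports;
}
```

Calling runWithOneRestart on a fresh Supervisor should yield a single report from the first attempt and none from the restarted worker. When to restart, and whether a restart is safe at all, is left open here exactly as it is in the discussion.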