List: | Commits | « Previous MessageNext Message » | |
From: | Andrei Elkin | Date: | September 23 2009 1:05pm |
Subject: | Re: bzr commit into mysql-5.1-bugteam branch (zhenxing.he:2973) Bug#37148 | ||
View as plain text |
Zhen Xing, hello. As we have talked today the whole problem of the bug is proven to be serious indeed. The reporter did not mean it but eventually you showed a good way to "reproduce" the problem. >> CREATE USER test_user; >> SET GLOBAL debug="d,injecting_fault_writing"; >> REVOKE ALL PRIVILEGES, GRANT OPTION FROM test_user; >> >> And there is still assertion in Diagnostics_area::set_error_status(): >> DBUG_ASSERT(! is_set() || can_overwrite_status); So that happens without your patch. The assert issue is in that my_ok() is called after my_error(). And my_ok() in this particual execution branch is called because the caller does not have any clue that `injecting_fault_writing' error has happened. Your patch actually corrects all binlog-error-prone invocations of my_ok(). And that is good. I have one proposal. Why not assert DBUG_ASSERT(! is_error() ) in my_ok()? After all my_ok() does not mean error-to-ok state transition. More radical suggestion, that I would like to hear opinion from Kostja, maybe we can remove my_ok() altogether but rather to check is_error(), as it happens anyway in net_end_statement(), to decide how to terminate the current statement? cheers, Andrei > >>> >>> 1. as said above, to simulate an error >> >> This is easy. and there is already such a mechanism. >> >>> 2. to execute one by one all or most or even some affected query cases >>> that easy to identify >>> >> >> Actually this is the part that I think difficult to do. I think it's >> hard (and implementation dependent) to check if the return value of >> mysql_bin_log.write() is correctly propagated or not. Could you please >> provide an example on how to test this? > > As suggested let's not aim at the most comprehensive testing now. > So cases we could check could include (some) DDL:s and some DML "failing" inside > mysql_bin_log.write(). > > The client shall expect a corresponding error code. > > >>> > I don't agree it's only about optimization, fail to check the return >>> > value early can result in problem. For example, if we got an error in >>> > mysql_bin_log.write(), it will set thd->main_da->is_error(), and > if we >>> > continue the execution and then encounter another error and try to call >>> > my_error() to report it, this will cause the assertion in >>> > Diagnostics_area::set_error_status() to fail because the error status >>> > has already been set. This problem can be avoided by checking the error >>> > and quite early. >>> >>> I must confess some of my standing points have been connected to 5.4 >>> code features. >>> What you have written is actually exclusive 5.1 issue, not 5.4. >>> Multiple my_error() per query is okay in the latter version. >>> So more to the optimization the patch is trying to circumevent this >>> 5.1 limitation as well. >>> >> >> I don't think so, I run the following test and it crashes on 5.4 too: > > First, I really meant the multiple error per query feature is certainly in 6.0. > I am not sure if it in azalea, needs investigating. > > >> >> CREATE USER test_user; >> SET GLOBAL debug="d,injecting_fault_writing"; >> REVOKE ALL PRIVILEGES, GRANT OPTION FROM test_user; >> >> And there is still assertion in Diagnostics_area::set_error_status(): >> DBUG_ASSERT(! is_set() || can_overwrite_status); >> > > The same assertion happens to me on the vanilla mysql-azalea. It seems to be a > separate bug > because along the execution patch there is just one call to my_error(). > > >>> >>> Why not to make it >>> >>> > >>> >> if (thd->binlog_query(..) >>> >> { >>> >> table->file->ha_release_auto_increment(); >>> >> DBUG_RETURN(1); >>> >> } >>> >> table->file->ha_release_auto_increment(); >>> >> >>> >> >> The above should be: >> >> error= <possible handler errors> >> >> if (!error && mysql_bin_log.is_open()) >> { >> if (thd->binlog_query()) >> { >> table->file->ha_release_auto_increment(); >> DBUG_RETURN(1); >> } >> } >> table->file->ha_release_auto_increment(); >> >> if (error) >> { >> table->file->print_error(error,MYF(0)); >> DBUG_RETURN(1); >> } >> >>> error= thd->binlog_query(..) >>> table->file->ha_release_auto_increment(); >>> if (error) >>> DBUG_RETURN(error) >>> >>> one line less then? >>> >> >> I'm afraid this is not correct, please check the code above. >> > > I did not mean a copy-paste ready code, that was just a concept. > >> If we want to only have one ha_release_auto_increment() here, then we >> need a new variable like 'binlog_error', and change the code like this: >> >> error= <possible handler errors> >> >> int binlog_error= 0; >> if (!error && mysql_bin_log.is_open()) >> { >> ... >> binlog_error= thd->binlog_query(); >> } >> table->file->ha_release_auto_increment(); >> >> if (binlog_error) >> DBUG_RETURN(1); >> >> if (error) >> { >> table->file->print_error(error,MYF(0)); >> DBUG_RETURN(1); >> } > > > I see. print_error() should be called solely for engine related > not for binlog related error. > > However, > > error= binlog_error= thd->binlog_query(); > table->file->ha_release_auto_increment(); > ... > > if (error) > if (error != binlog_error) > table->file->print_error(error,MYF(0)); > DBUG_RETURN(1); > > does not have the repeating pattern. That's for your deliberation. > > >>> >> >>> >> Obviously, the old code is okay: >>> >> >>> > >>> > No, actually the old code will cause problem, because if there is >>> > already error in write_bin_log, then try to report another error by >>> > my_message will cause assertion failure. >>> >>> well, in 5.1 only. >>> >> >> See above comments. >> >>> > >>> >> if (result) >>> >> my_message() >>> >> >>> >> because the returned combined result >>> >> + DBUG_RETURN(result || binlog_error); >>> >> is not yet merged at that point. >>> >> >>> >> >>> >> === modified file 'sql/sql_base.cc' >>> >> --- a/sql/sql_base.cc 2009-06-19 08:24:43 +0000 >>> >> +++ b/sql/sql_base.cc 2009-08-20 09:53:53 +0000 >>> >> @@ -1333,7 +1333,8 @@ void close_thread_tables(THD *thd) >>> >> handled either before writing a query log event (inside >>> >> binlog_query()) or when preparing a pending event. >>> >> */ >>> >> - thd->binlog_flush_pending_rows_event(TRUE); >>> >> + /* TODO: I (hezx) think the error here should not be ignored > */ >>> >> + (void)thd->binlog_flush_pending_rows_event(TRUE); >>> >> mysql_unlock_tables(thd, thd->lock); >>> >> thd->lock=0; >>> >> } >>> >> >>> >> I don't think you need such TODO here. The error is not ignored but >>> >> rather its check-out delayed till net_end_statement(). >>> >> >>> > >>> > I mean the code should check the error and return early. >>> > >>> >>> We can say should in 5.1. But you want leave these lines in 5.4. >>> >>> >>> >>> >> Similar hunk follows: >>> >> >>> >> === modified file 'sql/sql_delete.cc' >>> >> --- a/sql/sql_delete.cc 2009-06-19 08:24:43 +0000 >>> >> +++ b/sql/sql_delete.cc 2009-08-20 09:53:53 +0000 >>> >> @@ -841,9 +841,10 @@ void multi_delete::abort() >>> >> if (mysql_bin_log.is_open()) >>> >> { >>> >> int errcode= query_error_code(thd, thd->killed == > THD::NOT_KILLED); >>> >> - thd->binlog_query(THD::ROW_QUERY_TYPE, >>> >> - thd->query, thd->query_length, >>> >> - transactional_tables, FALSE, errcode); >>> >> + /* possible error of writing binary log is ignored > deliberately */ >>> >> + (void) thd->binlog_query(THD::ROW_QUERY_TYPE, >>> >> + thd->query, > thd->query_length, >>> >> + transactional_tables, FALSE, > errcode); >>> >> >>> >> Yet another with the same comment: >>> >> >>> > >>> > This error is ignored deliberately, and is OK. >>> > >>> >> @@ -3784,7 +3794,8 @@ void select_create::abort() >>> >> select_insert::abort(); >>> >> thd->transaction.stmt.modified_non_trans_table= FALSE; >>> >> reenable_binlog(thd); >>> >> - thd->binlog_flush_pending_rows_event(TRUE); >>> >> + /* possible error of writing binary log is ignored deliberately > */ >>> >> + (void)thd->binlog_flush_pending_rows_event(TRUE); >>> >> >>> >> And this one: >>> >> >>> > >>> > here too. >>> > >>> >> >>> >> === modified file 'sql/sql_update.cc' >>> >> --- a/sql/sql_update.cc 2009-06-18 14:16:14 +0000 >>> >> +++ b/sql/sql_update.cc 2009-08-20 09:53:53 +0000 >>> >> @@ -1851,9 +1851,10 @@ void multi_update::abort() >>> >> into repl event. >>> >> */ >>> >> int errcode= query_error_code(thd, thd->killed == > THD::NOT_KILLED); >>> >> - thd->binlog_query(THD::ROW_QUERY_TYPE, >>> >> - thd->query, thd->query_length, >>> >> - transactional_tables, FALSE, errcode); >>> >> + /* the error of binary logging is ignored */ >>> >> + (void)thd->binlog_query(THD::ROW_QUERY_TYPE, >>> >>> ok >>> >>> >> >>> >> >>> >> > =============================================================================== >>> >> COMMENTS: >>> >> >>> >> This hunk seems to be okay >>> >> >>> >> @@ -1543,7 +1544,11 @@ void close_temporary_tables(THD *thd) >>> >> s_query.length() - 1 /* to remove > trailing ',' */, >>> >> 0, FALSE, 0); >>> >> thd->variables.character_set_client= cs_save; >>> >> - mysql_bin_log.write(&qinfo); >>> >> + if (mysql_bin_log.write(&qinfo)) >>> >> + { >>> >> + push_warning(thd, MYSQL_ERROR::WARN_LEVEL_ERROR, MYF(0), >>> >> + "Failed to write the DROP statement for > temporary tables to binary log"); >>> >> + } >>> >> thd->variables.pseudo_thread_id= save_pseudo_thread_id; >>> >> } >>> >> else >>> >> >>> >> though my_error() must have been done inside mysql_bin_log.write(). >>> >> I wonder if the warning is really necessary. >>> >> >>> > >>> > Agree, and since close_temporary_tables() is called when closing the >>> > current connection, so I think the warning is no necessary, will be >>> > removed. >>> > >>> >>> And moreover, this subsequent to the write's my_error() >>> push_warning() should lead to the assertion, should it not? >>> >> >> No, because push_warning will not set error status if it's already set. >> The check is done in THD::raise_condition(). >> >>> >>> >> > ============================================================================ >>> >> QUESTIONS: >>> >> > > >>> > >>> >> > ========================================================================= >>> >> REMARKS >>> >> >>> >> The base code mysql_grant() calls >>> >> >>> >> if (!result) >>> >> { >>> >> write_bin_log(thd, TRUE, thd->query, thd->query_length); >>> >> } >>> >> >>> >> rw_unlock(&LOCK_grant); >>> >> close_thread_tables(thd); >>> >> >>> >> if (!result) >>> >> my_ok(thd); >>> >> >>> >> my_ok() not after examining main_da->is_error(). Sounds like an > issue. >>> >> >>> >> Similarly in mysql_create_db() and some more places. >>> >> >>> > >>> > if (!result) means there is no error, and main_da->is_error() must > be >>> > false in this case. >>> > >>> It's just a hope. net_end_statement() does it right to separate >>> failure from ok via testing main_da. >>> Just one little argument about the error propagation via integer >>> return value. It just can't work with functions that return an integer >>> informal results. Like random(). if that could call my_error() inside >>> we would have to examine main_da not -1 of the return value. >>> > >> I think there is a difference between the return value of a function and >> the error in main_da, main_da->is_error() means there is an error that >> causes current statement to fail, while the return value of a function >> (if the return value indicates success or failure) only means if there >> is a problem when executing that function or not. it may or may not set >> main_da->is_error() when the function fails. > > I would agree, but only in principle. In practice if our code's function fail to > read > it does not supply a valid result to the caller we need to issue my_error() or smth. > And my point is you can't delegate that to the caller. Therefore practically > it's inevitable to have a stack of errors reported. > >> But on the other hand, if >> the function succeeded, it should never set main_da->is_error(). > > agreed. > >> I think >> this must be followed, otherwise a lot of the code would be break on >> this. >> >> And I think check the return value of the function and do error handling >> as early as possible should be a coding style that we should follow. >> > > Yes, callers have to examine main_da at their critical execution points. > And I agree with `as early as possible'. However, `possible' > implies subjective thinking. Most of error can be tolerated until > deciding whether to commit or what send back to the user: net_end_statement(). > > >> For functions like random(), the return value does not indicate failure >> or success, I think this is another issue. >> >> And let's suppose random() would call set is_error() when it failed, >> then I think you should check is_error() **immediately** after calling >> random(), not postpone the check to the end of the execution. >> >> ------- I think this is correct ---------------- >> val= random(); >> if (is_error()) >> // do error handling >> else >> // use the 'val' to continue >> ---------------------------------------------------- >> >> --------- I think this is not ---------------------- >> val= random(); >> // use the 'val' to continue >> .... >> if (is_error()) >> // do error handling >> > > Immediately is too strong. Checking is enough to do at critical points.