Alfranio Correia wrote:
> Hi Jasonh,
>
> Great work.
>
> Two major comments:
> 1 - How about a test a case? You don't need to create a complete test
> case in the context
> of this bug but I think you should create a worklog to handle this later
> and write something
> for this bug.
Hmm, Actually I don't know how to write a test case for this :(, maybe
I'll just create a WL for this.
>
> 2 - I think we should also try to make uniform the use of binlog_query,
> write_bin_log, etc, etc.
> Let's adopt one of them and change the calls.
Good point, I'll try
>
> Find more comments below:
>
>
> === modified file 'sql/sp_head.cc'
> --- a/sql/sp_head.cc 2009-05-30 13:32:28 +0000
> +++ b/sql/sp_head.cc 2009-07-13 04:01:35 +0000
> @@ -1788,6 +1788,7 @@ sp_head::execute_function(THD *thd, Item
> push_warning(thd, MYSQL_ERROR::WARN_LEVEL_WARN, ER_UNKNOWN_ERROR,
> "Invoked ROUTINE modified a transactional table
> but MySQL "
> "failed to reflect this change in the binary log");
> + err_status= TRUE;
> }
>
>
> I am not sure if just setting the err_status is enough.
>
err_status will be returned by sp_head::execute_function, and should be
check by the called of this function.
>
> === modified file 'sql/sql_delete.cc'
> --- a/sql/sql_delete.cc 2009-06-19 08:24:43 +0000
> +++ b/sql/sql_delete.cc 2009-07-13 04:01:35 +0000
> @@ -841,9 +841,10 @@ void multi_delete::abort()
> int errcode= query_error_code(thd, thd->killed == THD::NOT_KILLED);
> - thd->binlog_query(THD::ROW_QUERY_TYPE,
> - thd->query, thd->query_length,
> - transactional_tables, FALSE, errcode);
> + /* possible error of writing binary log is ignored */
> + (void) thd->binlog_query(THD::ROW_QUERY_TYPE,
> + thd->query, thd->query_length,
> + transactional_tables, FALSE, errcode);
>
> There is no problem in ignoring the error at this point. Note that
> an incident log will be written to the binary log.
>
Yes, that's why I added the comment, and add (void) explicitly. Maybe I
should add 'deliberately' to the comment to make it more clear.
>
> === modified file 'sql/sql_insert.cc'
> --- a/sql/sql_insert.cc 2009-06-22 14:01:42 +0000
> +++ b/sql/sql_insert.cc 2009-07-13 04:01:35 +0000
> @@ -3214,9 +3216,13 @@ bool select_insert::send_eof()
>
> thd->clear_error();
> else
> errcode= query_error_code(thd, killed_status == THD::NOT_KILLED);
> - thd->binlog_query(THD::ROW_QUERY_TYPE,
> - thd->query, thd->query_length,
> - trans_table, FALSE, errcode);
> + if (thd->binlog_query(THD::ROW_QUERY_TYPE,
> + thd->query, thd->query_length,
> + trans_table, FALSE, errcode))
> + {
> + table->file->ha_release_auto_increment();
> + DBUG_RETURN(1);
> + }
> }
> table->file->ha_release_auto_increment();
>
> Why don't you just set the error variable?
>
That would call table->file->print_error(), which I think is not correct for binlog
error.
>
> I am not sure if this is completely safe since you are trying to log it
> despite the
> presence of a previous error. So if we fail logging the changes, what
> are the bad
> things that may happen on the slave?
Hmm, you're right, it is suspicious, seems like it should check if there
is no error or have modified non_trans tables.
>
>
> void select_create::store_values(List<Item> &values)
> @@ -3784,7 +3793,8 @@ void select_create::abort()
> select_insert::abort();
> thd->transaction.stmt.modified_non_trans_table= FALSE;
> reenable_binlog(thd);
> - thd->binlog_flush_pending_rows_event(TRUE);
> + /* possible error of writing binary log is ignored */
> + (void)thd->binlog_flush_pending_rows_event(TRUE);
>
> if (m_plock)
> {
>
> I don't see any problem in ignoring the error. Howerver, I am wondering
>
Same here, will add 'deliberately' here to make it clear.
> why we
> need a flush at this point. Can you explain why?
>
This line was added by the fix to bug#35762, but I have to say I do not
understand the description in the commits :( We can ask Andrei to make
sure :)
>
> === modified file 'sql/log.cc'
> --- a/sql/log.cc 2009-06-19 08:24:43 +0000
> +++ b/sql/log.cc 2009-07-13 04:01:35 +0000
> @@ -1446,7 +1446,7 @@ binlog_end_trans(THD *thd, binlog_trx_da
> if (all || !(thd->options & (OPTION_BEGIN | OPTION_NOT_AUTOCOMMIT)))
> {
> if (trx_data->has_incident())
> - mysql_bin_log.write_incident(thd, TRUE);
> + error= mysql_bin_log.write_incident(thd, TRUE);
> trx_data->reset();
> }
> else // ...statement
> @@ -4433,10 +4433,10 @@ bool MYSQL_BIN_LOG::write_incident(THD *
> Incident_log_event ev(thd, incident, write_error_msg);
> if (lock)
> pthread_mutex_lock(&LOCK_log);
> - ev.write(&log_file);
> - if (lock)
> + error= ev.write(&log_file);
> + if (!error && lock)
>
> There is an error in this point. The mutex will not be released
> if the write fails.
>
You're right, will be fixed.
>
> {
> - if (!error && !(error= flush_and_sync()))
> + if (!(error= flush_and_sync()))
> {
> signal_update();
> rotate_and_purge(RP_LOCK_LOG_IS_ALREADY_LOCKED);
>
> == modified file 'sql/log_event.cc'
> --- a/sql/log_event.cc 2009-06-18 17:58:56 +0000
> +++ b/sql/log_event.cc 2009-07-13 04:01:35 +0000
> @@ -5784,7 +5784,7 @@ Slave_log_event::Slave_log_event(const c
>
>
> int Slave_log_event::do_apply_event(Relay_log_info const *rli)
> {
> if (mysql_bin_log.is_open())
> - mysql_bin_log.write(this);
> + return mysql_bin_log.write(this);
> return 0;
> }
> #endif /* !MYSQL_CLIENT */
>
> Is the class Slave_log_event used?
> Can you show me where?
No, I think this event is never used, looks like this class was designed
for failsafe replication.
I think we can considering remove all the related codes, but I think
I'll not do it in this patch.
> I am not sure if you should return TRUE/FALSE or the actual
> error code.
>
I think both are OK, no strong opinion in this case.
>
> - error= ha_autocommit_or_rollback(thd, 0);
> + error|= ha_autocommit_or_rollback(thd, error);
>
> You should also change the ha_autocommit_or_rollback as done
> by you in the next hunk :).
>
Right, will fix it.
>
> === modified file 'sql/log_event_old.cc'
> --- a/sql/log_event_old.cc 2009-02-13 16:41:47 +0000
> +++ b/sql/log_event_old.cc 2009-07-13 04:01:35 +0000
> @@ -1541,7 +1541,15 @@ int Old_rows_log_event::do_apply_event(R
> NOTE: For this new scheme there should be no pending event:
> need to add code to assert that is the case.
> */
>
> - thd->binlog_flush_pending_rows_event(false);
> + error= thd->binlog_flush_pending_rows_event(false);
> + if (error)
> + {
> + rli->report(ERROR_LEVEL, ER_SLAVE_FATAL_ERROR,
> + ER(ER_SLAVE_FATAL_ERROR),
> + "call to binlog_flush_pending_rows_event() failed");
> + thd->is_slave_error= 1;
> + DBUG_RETURN(error);
> + }
> TABLE_LIST *tables= rli->tables_to_lock;
> close_tables_for_reopen(thd, &tables);
>
> I think you should return the error code and not only TRUE or FALSE.
> Can you check this?
>
??? I think that's what the code does, isn't it?
>
> === modified file 'sql/mysql_priv.h'
> --- a/sql/mysql_priv.h 2009-06-05 11:23:58 +0000
> +++ b/sql/mysql_priv.h 2009-07-13 04:01:35 +0000
> @@ -1025,8 +1025,8 @@ check_and_unset_inject_value(int value)
>
> #endif
>
> -void write_bin_log(THD *thd, bool clear_error,
> - char const *query, ulong query_length);
> +int write_bin_log(THD *thd, bool clear_error,
> + char const *query, ulong query_length);
>
> /* sql_connect.cc */
> int check_user(THD *thd, enum enum_server_command command,
>
>
> Here we have a funny function. It is defined in the current file,
> used by several modules, implemented in the sql/sql_table.cc and
> calls functions in the log.cc :)
>
> We need to do a refactory here. I don't know if in the context
> of this bug. What do you think?
>
Yes, I noticed that, I'll try to see if we can make it better.
>
>
> === modified file 'sql/sql_update.cc'
> --- a/sql/sql_update.cc 2009-06-18 14:16:14 +0000
> +++ b/sql/sql_update.cc 2009-07-13 04:01:35 +0000
> @@ -1851,9 +1851,10 @@ void multi_update::abort()
> into repl event.
> */
> int errcode= query_error_code(thd, thd->killed == THD::NOT_KILLED);
> - thd->binlog_query(THD::ROW_QUERY_TYPE,
> - thd->query, thd->query_length,
> - transactional_tables, FALSE, errcode);
> + /* the error of binary logging is ignored */
> + (void)thd->binlog_query(THD::ROW_QUERY_TYPE,
> + thd->query, thd->query_length,
> + transactional_tables, FALSE, errcode);
> }
> thd->transaction.all.modified_non_trans_table= TRUE;
> }
>
> Do you plan to change this? I hope not because I think it is ok as
> it is.
>
Yes, it's OK to ignore the error :)
>
> === modified file 'sql/sql_view.cc'
> --- a/sql/sql_view.cc 2009-06-19 08:24:43 +0000
> +++ b/sql/sql_view.cc 2009-07-13 04:01:35 +0000
> @@ -662,8 +662,9 @@ bool mysql_create_view(THD *thd, TABLE_L
>
> buff.append(views->source.str, views->source.length);
>
>
> int errcode= query_error_code(thd, TRUE);
> - thd->binlog_query(THD::STMT_QUERY_TYPE,
> - buff.ptr(), buff.length(), FALSE, FALSE, errcode);
> + if (thd->binlog_query(THD::STMT_QUERY_TYPE,
> + buff.ptr(), buff.length(), FALSE, FALSE,
> errcode))
> + res= TRUE;
> }
>
> We need to discuss what should be the approach because the write to
> binary log
> failed but the view was created. We should try to write an incident event
> to the binary log or if possible rollback the created view. So I would
> classify
> this case as one of the difficult cases.
>
> Although we can change the code as suggested above, we will never have a
> transactional
> behavior.
>
I think we do not need to do that, because if we failed to write the
event of this CREATE VIEW to binary log, then try to write an incident
event to binary log would most like fail as well. And since DDLs cannot
be rolled back, so I think we do not need to handle this case (as you
also pointed out, we will never have a transactional behavior for this).
>
> VOID(pthread_mutex_unlock(&LOCK_open));
> @@ -1653,7 +1654,8 @@ bool mysql_drop_view(THD *thd, TABLE_LIS
> /* if something goes wrong, bin-log with possible error code,
> otherwise bin-log with error code cleared.
> */
> - write_bin_log(thd, !something_wrong, thd->query, thd->query_length);
> + if (write_bin_log(thd, !something_wrong, thd->query,
> thd->query_length))
> + something_wrong= 1;
> }
>
> VOID(pthread_mutex_unlock(&LOCK_open));
>
> Note that a similar problem happens in here and in other modules.
As I pointed out in my previous comment, I think we do not need to
handle all these cases in this patch.
>
> Specifically,
> in the
> . sql/sql_view.cc
> . sql/sql_udf.cc
> . sql/sql_trigger.cc
> . sql/sql_tablespace.cc
> . sql/sql_table.cc
> . sql/sql_rename.cc
> . sql/sql_parse.cc, the following cases are troublesome:
> SQLCOM_FLUSH, SQLCOM_DROP_PROCEDURE, SQLCOM_DROP_FUNCTION.
> . sql/sql_delete.cc, the truncate case is troublesome (i.e. mysql_truncate).
> . sql/sql_db.cc
> . sql/sql_base.cc, the drop of a temporary table.
> . sql/sql_acl.cc
> . sql/sp.cc
>
> I am not sure about:
>
> . sql/sql_partition.cc
> . sql/ha_ndbcluster_binlog.cc
>
> Please, check them carefully.
>
I think the change to sql_partition.cc should be pretty safe, but I have
to admit that I'm not 100% sure about the changes to
ha_ndbcluster_binlog.cc, we can ask Mats and the cluster team for
advice, if unsure, we probably should not include these in this patch.
> Cheers
>