Thanks for the review, and much good advice, Rafal. Some responses inline.
Rafal Somla wrote:
> STATUS
> ------
> Changes requested
>
> REQUESTS
> --------
> 1. Fix the problem with errors only partially logged by internal server
> functions (see below).
It seems to me that it would be better if backup/restore picked up the
already logged error, than to log a new one. I must admit that I do
not quite like this logging at different levels. It would be better
if errors where reported back to some central "brain", e.g
do_backup()/do_restore, where it was decided what to actually log.
That would make easier to do consistent reporting. Like it is now, I
think it is very likely that same error is logged several times (using
different error codes).
> 2. Change the way errors in Global_iterator constructor are indicated.
> Also, make Backup_info::get_global() return NULL if the iterator could
> not be correctly constructed, instead of checking this outside.
Agree. I will do this.
> 3. Change the error message for binlog position read operation since it
> can happen only during BACKUP operation.
OK. Did not know that. Will do.
> 4. Try to consistently use "// Never errors" comment.
Will do.
> 5. Add information in Backup_restore_ctx::lock_tables_for_restore()
> that mk_table_list() logs errors (so we don't have to do it there).
Will do.
> 6. In Backup_restore_ctx::close() make sure that m_error is set as in
> the original code.
This part I do not quite understand. I do not see how setting m_error
would help. As far as I can tell, the only thing that happens after
close on failure, is that send_error() is called. To affect
send_error(), I think you need to register the error with the logger.
> 7. In Inpit_stream::close() document return values.
Will do.
>
> COMMENTARY
> ----------
>
> In few places you rely on internal functions (such as memory allocators)
> logging errors by pushing them on the error stack. This is not enough
> for our purposes. In online backup the primary destination of error
> messages are the online_backup log tables in mysql database. We should
> assume that something is logged only if backup::Logger::report_error()
> has been called. Only this ensures that the message has reached all the
> required destinations (including the error stack).
>
> So, if a memory allocation function has pushed an error on the error
> stack, this is not yet reported from the perspective of online backup.
> The code which called the allocation function should detect the error
> and log it using the backup::Logger class.
>
>
> Another issue: You wrote that init_alloc_root() never errors. This is
> strange - what if there is not enough memory in the system?
From documentation:
Altough error can happen during execution of this function if
pre_alloc_size is non-0 it won't be reported. Instead it will be
reported as error in first alloc_root() on this memory root.
> SUGGESTIONS
> -----------
>
> 1. Inside Backup_restore_ctx::restore_triggers_and_events() do validity
> checks after DBUG_ENTER().
Make sense. Will do.
>
> 2. Don not interrupt if close() fails inside Stream::open() methods.
Generally, if we do not know a reason for why it is OK for close() to
fail, i do not think we should not continue. A fail-fast strategy
usually makes it much easier to debug errors. However, since we do
not report why open fails anyway, I guess it should be OK to ignore
close failures.
>
> 3. Add new error messages at the end of errmsg.txt
I will.
>
> DETAILS
> -------
>
...
>> @@ -1253,10 +1249,9 @@ int Backup_pump::pump(size_t *howmuch)
>>
>> if (m_buf.size > 0)
>> mode= WRITING;
>> - else
>> - // FIXME: detect errors (perhaps ensure that drop_buf never
>> errors).
>> - // FIXME: error logging.
>> - m_bw.drop_buf(m_buf);
>> + else {
>> + m_bw.drop_buf(m_buf); // Never errors
>> + }
>
> The braces violate our coding conventions. Not that I care much about
> that but others might...
Will fix this. Old habits are difficult to change ...
...
>> @@ -811,24 +807,21 @@ void Image_info::save_binlog_pos(const :
>> inline
>> Image_info::Db_iterator* Image_info::get_dbs() const
>> {
>> - // FIXME: error logging (in case allocation fails).
>> - return new Db_iterator(*this);
>> + return new Db_iterator(*this); // Let caller handle allocation
>> failures
>
> And do the callers handle these failures? This is perhaps a question to
> check when you have another scan of the code.
>
I think I have already checked this, but Chuck was not happy with the
way I did this. He wanted the error to be checked where it happens,
and I see his point. So maybe I will change this ... However, caller
would still have to handle that null values are returned. So I am not
sure how much I achieve.
...
>> @@ -791,6 +782,7 @@ int Backup_restore_ctx::unlock_tables()
>> */ int Backup_restore_ctx::close()
>
> The original implementation of this method was setting m_error member
> which has a meaning - it indicates that the context object is in error
> state. Your patch changes this and I'm not convinced that this will have
> no bad effects.
>
>> {
>> + int error= 0;
>> if (m_state == CLOSED)
>> return 0;
>>
>> @@ -810,15 +802,10 @@ int Backup_restore_ctx::close()
>> }
>>
>> // unlock tables if they are still locked
>> -
>> - // FIXME: detect errors if reported.
>> - unlock_tables();
>> + unlock_tables(); // Never errors
>>
>> // unfreeze meta-data
>> -
>> - // FIXME: detect errors if reported.
>> - // FIXME: error logging.
>> - obs::ddl_blocker_disable();
>> + obs::ddl_blocker_disable(); // Never errors
>>
>> // restore thread options
>>
>> @@ -826,10 +813,11 @@ int Backup_restore_ctx::close()
>>
>> // close stream
>>
>> - if (m_stream)
>> - // FIXME: detect errors if reported.
>> - // FIXME: error logging.
>> - m_stream->close();
>> + if (m_stream && !m_stream->close())
>> + {
>> + // Note error, but complete clean-up
>> + error= ER_BACKUP_CLOSE;
>> + }
>>
>> if (m_catalog)
>> m_catalog->save_end_time(when); // Note: no errors.
>> @@ -861,19 +849,27 @@ int Backup_restore_ctx::close()
>> */
>> if (res && my_errno != ENOENT)
>> {
>> - report_error(ER_CANT_DELETE_FILE, m_path, my_errno);
>> - if (!m_error)
>> - m_error= ER_CANT_DELETE_FILE;
>> + error= ER_CANT_DELETE_FILE;
>
> Originally I was setting m_error to ER_CANT_DELETE_FILE only if no error
> was detected earlier. This is because in the situation when we have
> detected an error and then have another error while deleting the file,
> it is more important to report the first error IMO.
As I said, I do not think m_error will be picked up either.
--
Øystein