Hi Alfranio,
please have a look at the patch (from 21st Apr.):
http://lists.mysql.com/commits/106164
It should address all your comments.
Thanks,
Luís
On Fri, 2010-04-23 at 16:36 +0100, Alfranio Correia wrote:
> Hi Luis,
>
> See some comments in-line.
>
> Cheers.
>
> Luís Soares wrote:
> > Hi Alfranio,
> >
> > Before proceeding some considerations about the server
> > behavior before this patch and when tested against the
> > tests cases presented:
> >
> > case #1: Master and slave(s) become out of sync.
> >
> > case #2: Master and first slave in sync, but third slave is out
> > of sync.
> >
> > case #3: triggers assertion in
> > #7 0x00000000007ef82d in Rows_log_event::unpack_current_row
> > (this=0x32f95d8, rli=0x331ac60, cols=0x32f9660,
> > abort_on_warning=false) at log_event.h:3631
> >
> > case #4: although AI has data only for columns that slave does
> > not know about (thence it should do nothing), it still
> > reports an error.
> >
> > Last_SQL_Error Could not execute Update_rows event on table test.t1;
> > Can't find record in 't1', Error_code: 1032; handler error
> > HA_ERR_END_OF_FILE; the event's master log mysqld-bin.000001,
> > end_log_pos 555
> >
> > case #5: there is no bug here. We are just adding the test case
> > to cover this case. In this case, despite the fact that
> > there is valid data for columns on the slave's table, the
> > slave cannot find the record because there is no valid
> > data in the BI to use when searching for the record.
> > Thence we just test that the slave stops (by adding the
> > test case).
> >
> > Some more comments inline.
> >
> > On Tue, 2010-04-20 at 17:33 +0100, Alfranio Correia wrote:
> >> Hi Luis,
> >>
> >> Not approved.
> >>
> >> The code seems ok but we need to improve the comments and clarify
> >> some points. See comments in-line
> >>
> >> Cheers.
> >>
> >> Luis Soares wrote:
> >>> #At
> file:///home/lsoares/Workspace/bzr/work/features/wl5092/mysql-5.1-rpl-wl5092/ based on
> revid:luis.soares@stripped
> >>>
> >>> 3186 Luis Soares 2010-03-31
> >>> BUG#49100: RBR: Unexpected behavior when AI contains no usable
> >>> data for slave columns
> >>>
> >>> RBR was not fully compliant with empty before and after
> >>> images (BI and AI) in row events. Several faulty scenarios
> >>> surfaced during WL#5092 implementation, especially when MINIMAL
> >>> images were used:
> >>>
> >>> S1. Inserting an empty row on the master would cause slave to
> >>> skip the event (empty AI);
> >>>
> >>> Expected: slave(s) shall apply the insert and fill its
> >>> table with default values as master did.
> >>>
> >>> S2. Master inserts values just for columns that slave doesn't
> >>> have (unusable AI at the slave).
> >> Please, describe what was the bug. Most likely, it is equivalent to S1
> >> but it is better to make this clear.
> >
> > No it's not the same thing. In S1 the event is skipped, while in
> > S2 the event is not skipped.
>
> As you said S1 and S2 go out of sync.
> This is exactly what I've requested to be added to the changeset comments and
> what I meant by equivalent.
>
> >
> >>>
> >>> Expected: slave(s) shall apply the insert and fill its
> >>> table with default values as master did for columns that
> >>> user did not provide data.
> >>>
> >>> S3. Master updates columns that only it knows about (unusable
> >>> AI at the slave) but references columns that it shares with
> >>> the slave (usable BI at the slave). This would trigger an
> >>> assertion on a third slave in chained replication: m ->
> s1
> >>> -> s2, s2 would abort.
> >>>
> >>> Expected: slave s1 shall not update any row and slave s2
> >>> shall not assert.
> >>>
> >>> S4. Master updates and references columns that only it knows
> >>> about (unusable AI and BI at the slave).
> >>
> >> The same here. See above.
> >
> > No this (S4) is the case the master uses for reading and writing columns
> > that only it knows about. Whereas the previous (S3) is master reads
> > columns that the slave knows about but writes into columns that only
> > master is aware of...
>
> I just asked to describe the bug. I did not say that they are equivalent.
>
> "The same here. See above" means "please, describe the visible effects of the
> bug.", i.e. slave stops, crashes, whatever happens.
>
> >
> >>>
> >>> Expected: slave(s) shall skip the event and not update its
> >>> tables with any value.
> >>>
> >>> S5. Master references columns that only it knows
> >>> about (unusable BI at the slave) and updates columns shared
> >>> with the slave (usable AI at the slave). Slave must stop.
> >>>
> >> What was the error before the patch?
> >
> > No error (in WL#5092 tree, have not tested in legacy 6.0), we're
> > just adding test coverage in this patch.
>
>
> ok. Add this to the comments.
>
> >
> >>> Expected: slave(s) shall fail with HA_ERR_END_OF_FILE.
> >>>
> >>> We fix S1 to S5 by relaxing some asserts that were triggered on
> >>> when processing empty images. Additionally, on the
> >>> Update_rows_log_event::do_exec_row we deployed a check the
> >>> verifies if there is any value for AI. If there is not, then the
> >>> event is skipped (neither BI nor AI is are processed).
> >> Are S3 and S4 only about asserts? I don't think so. Please, update the
> summary of
> >> the solutions.
> >
> > They are completely different issues. S3 is about the assertion
> > on the 2nd slave. While S4 is about the slave stopping unexpectedly.
>
> That's what I thought but reading your comments it seems that S1 - S5 are fixed
> by only relaxing assertions. So, I am asking to improve the comments.
>
> >
> >>>
> >>> For mysqlbinlog --verbose --base64=decode-rows we print out an
> >>> INSERT INTO t VALUES () when a write rows event contains an
> >>> empty AI.
> >> ok.
> >>
> >>> @ mysql-test/suite/rpl/r/rpl_row_img.result
> >>> Updates to result files.
> >>> @ mysql-test/suite/rpl/t/rpl_row_img.test
> >>> Added test cases.
> >>> @ sql/log_event.cc
> >>> 1. Added printout for INSERT INTO ... VALUES () when AI is empty
> for
> >>> Write_rows_log_events.
> >>> 2. Changed assertion from 'm_curr_row < ...' to 'm_curr_row
> <= ...'
> >>> 3. Calculation of estimated rows for bulk insert is now
> conditional,
> >>> and dependent of whether m_curr_row < m_rows_end or not.
> >>> 4. In the Update_rows_log_event, changed it to take into
> consideration
> >>> if there are any values in AI. If there are not, then skip
> event.
> >>> @ sql/log_event.h
> >>> Changed assertion to from 'm_curr_row < ...' to ' m_curr_row
> <= ...'
> >>> @ sql/rpl_record.cc
> >>> Do not unpack if master did not send any bit set in the
> write_set.
> >>>
> >>> modified:
> >>> mysql-test/suite/rpl/r/rpl_row_img.result
> >>> mysql-test/suite/rpl/t/rpl_row_img.test
> >>> sql/log_event.cc
> >>> sql/log_event.h
> >>> sql/rpl_record.cc
> >>> === modified file 'mysql-test/suite/rpl/r/rpl_row_img.result'
> >>> --- a/mysql-test/suite/rpl/r/rpl_row_img.result 2010-03-30 16:06:36
> +0000
> >>> +++ b/mysql-test/suite/rpl/r/rpl_row_img.result 2010-03-30 23:19:25
> +0000
> >>> @@ -24179,3 +24179,259 @@ FLUSH TABLES;
> >>> SHOW VARIABLES LIKE 'binlog_row_image';
> >>> Variable_name Value
> >>> binlog_row_image FULL
> >>> +include/stop_slave.inc
> >
> > [snip]
> >
> >>> @@ -8695,7 +8706,11 @@ Rows_log_event::write_row(const Relay_lo
> >>> /* this is the first row to be inserted, we estimate the rows with
> >>> the size of the first row and use that value to initialize
> >>> storage engine for bulk insertion */
> >>> - ulong estimated_rows= (m_rows_end - m_curr_row) / (m_curr_row_end -
> m_curr_row);
> >>> + DBUG_ASSERT(!(m_curr_row > m_curr_row_end));
> >>> + ulong estimated_rows= 1;
> >>> + if (m_curr_row < m_curr_row_end)
> >>> + estimated_rows= (m_rows_end - m_curr_row) / (m_curr_row_end -
> m_curr_row);
> >>> +
> >>
> >> Please, rewrite it as follows:
> >>
> >>
> >> + if (m_curr_row < m_curr_row_end)
> >> + estimated_rows= (m_rows_end - m_curr_row) / (m_curr_row_end -
> m_curr_row);
> >> + else if (m_curr_row = m_curr_row_end)
> >> + ulong estimated_rows= 1;
> >>
> >>
> >> It think it would be easier to read.
> >
> > OK, will do, but will change
> > ulong estimated_rows= 1;
> > to
> > estimated_rows= 1;
> > in the last line. ;)
>
> yeap.
>
> >
> >>> m_table->file->ha_start_bulk_insert(estimated_rows);
> >>> }
> >>>
> >>> @@ -9665,8 +9680,34 @@ int
> >>> Update_rows_log_event::do_exec_row(const Relay_log_info *const rli)
> >>> {
> >>> DBUG_ASSERT(m_table != NULL);
> >>> + int error= 0;
> >>> +
> >>> + /* check if update contains only values (for both BI and AI)
> >>> + for columns that do not exist on the slave. If they do, we
> >>> + can just unpack the rows and return (do nothing on the local
> >>> + table).
> >>> +
> >>> + We can optimize and check only if there are usable values on
> >>> + the AI. However, if no good values on AI but we have values
> >>> + on BI, we will not be letting the slave thread to search the
> >>> + SE for the row. This means that we may be loosing a bit on
> >>> + the safety side!
> >>> +
> >>> + After all, find_row is a consistency check as well.
> >>> + */
> >> These comments are difficult to read. Please, improve them.
> >> Does this handle S3 and S4?
> >
> > S4. I think you refer to the second paragraph as being hard
> > to read...
> >
> > Will improve it.
> >
> >>> + if (/*!is_any_column_signaled_for_table(m_table, &m_cols)
> &&*/ /* BI */
> >>> + !is_any_column_signaled_for_table(m_table, &m_cols_ai) /* AI
> */)
> >>> + {
> >>> + /* read and discard images, because they don't contain any
> >>> + useful information for this server table */
> >>> + error = unpack_current_row(rli, &m_cols);
> >>> + m_curr_row= m_curr_row_end;
> >>> + error = error | unpack_current_row(rli, &m_cols_ai);
> >>> +
> >>> + return error;
> >>> + }
> >> See above.
> >
> > What should I see? Is it hard to read? Do you want me to remove
> > the comment in the 'if' ?
> >
> > I'll assume that you want me to remove the comment on the 'if'.
> > I only added for reviewing purposes anyway...
> >
>
> ok.
>
>
> >>>
> >>> - int error= find_row(rli);
> >>> + error= find_row(rli);
> >>> if (error)
> >>> {
> >>> /*
> >> ok.
> >>
> >>> === modified file 'sql/log_event.h'
> >>> --- a/sql/log_event.h 2010-03-04 17:00:32 +0000
> >>> +++ b/sql/log_event.h 2010-03-30 23:19:25 +0000
> >>> @@ -3628,7 +3628,7 @@ protected:
> >>> DBUG_ASSERT(m_table);
> >>>
> >>> bool first_row= (m_curr_row == m_rows_buf);
> >>> - ASSERT_OR_RETURN_ERROR(m_curr_row < m_rows_end,
> HA_ERR_CORRUPT_EVENT);
> >>> + ASSERT_OR_RETURN_ERROR(m_curr_row <= m_rows_end,
> HA_ERR_CORRUPT_EVENT);
> >>> int const result= ::unpack_row(rli, m_table, m_width, m_curr_row,
> cols,
> >>> &m_curr_row_end,
> &m_master_reclength,
> >>> abort_on_warning, first_row);
> >> ok.
> >>
> >>> === modified file 'sql/rpl_record.cc'
> >>> --- a/sql/rpl_record.cc 2010-03-04 17:00:32 +0000
> >>> +++ b/sql/rpl_record.cc 2010-03-30 23:19:25 +0000
> >>> @@ -208,6 +208,20 @@ unpack_row(Relay_log_info const *rli,
> >>> uchar const *null_ptr= row_data;
> >>> uchar const *pack_ptr= row_data + master_null_byte_count;
> >>>
> >>> + if (bitmap_is_clear_all(cols))
> >>> + {
> >>> + /**
> >>> + There was no data sent from the master, so there is
> >>> + nothing to unpack. In case we are unpacking an AI (for
> >>> + update or insert purposes), we just use the default
> >> Is this comment right? Default value for an update?
> >
> > On an update one unpacks the row on top of the record that
> > the engine has returned in find_row. See:
> >
> > Update_rows_log_event::do_exec_row
> >
> > So this comment is misleading...
> >
> >>> + values from the slave to do the insert.
> >> Updates ---> Inserts?
> >>
> >> Please, improve these comments.
> >
> > Again, these comments are misleading... Perhaps the reference
> > to AI is probably not worth the trouble... I'll remove it.
> >
> > As I take it code-wise (apart from the refactoring above in one 'if')
> > the patch is ok, you just need me to improve some comments.
> >
> > Will rewrite the comments do the small change and recommit.
>
>
> ok.
>
> >
> > Thanks,
> > Luís
> >
> >>> + */
> >>> + *row_end= pack_ptr;
> >>> + *master_reclength= 0;
> >>> + DBUG_RETURN(error);
> >>> + }
> >>> +
> >>> +
> >>> Field **const begin_ptr = table->field;
> >>> Field **field_ptr;
> >>> Field **const end_ptr= begin_ptr + colcnt;
> >>>
> >>>
> >>>
> ------------------------------------------------------------------------
> >>>
> >>> This body part will be downloaded on demand.
> >> --
> >> MySQL Code Commits Mailing List
> >> For list archives: http://lists.mysql.com/commits
> >> To unsubscribe: http://lists.mysql.com/commits?unsub=1
> >>
> >
> >