Hi Luis,
I forgot to mention that you should create a patch based on the rep+3.
Cheers.
Alfranio Correia wrote:
> Hi Luis,
>
> Great work.
> Please, find some comments in what follows.
>
> STATUS
> ------
> Not approved.
>
> REQUIRED CHANGES
> ----------------
> See comments in-line.
>
> REQUESTS
> --------
> See comments in-line.
>
> SUGGESTIONS
> -----------
> See comments in-line.
>
> DETAILS
> -------
> n/a
>
>
> Cheers.
>
>
>
> Luis Soares wrote:
>> #At
> file:///home/lsoares/Workspace/bzr/work/features/wl5092/mysql-5.1-rep%2B2-commit/ based on
> revid:li-bing.song@stripped
>>
>> 3137 Luis Soares 2009-11-06
>> WL#5092: RBR: Options for writing partial or full row images in RBR events
>>
>> Summary
>> =======
>>
>> Implement mysqld option so that the user is able to select
>> whether to log minimal or full images on RBR events depending
>> on the keys available in the master. When sending minimal rows,
>> one can save binlog disk space, network resources and mysqld
>> memory footprint.
>>
>> BACKGROUND
>> ==========
>>
>> Before and After Images
>> -----------------------
>>
>> In row based replication row events may contain two copies for the
>> row that they are changing. These are generally known as images. The
>> first one, called *before image* (BI), contains data that existed on
>> the row before it was actually changed. The second one, called
>> *after image* (AI), contain the actual changes. Each, BI and AI,
>> usage is confined to two different moments in the execution
>> flow. The BI is used while the slave is searching for the row to be
>> updated, while AI is used when replaying the changes in the row.
>>
>> Both have some restrictions:
>>
>> - BI: needs to hold a set of values that can be used by the
>> slave to fetch the correct row. In other words, it should
>> provide a set a values that *uniquely* identify the row to
>> be changed;
>>
>> - AI: needs to hold values that are needed to replay all the
>> changes, that were actually done during the original
>> execution, in an identical set (meaning, same index
>> structures, same engine, ...).
>>
>> Given that BI and AI have different usages, their usefulness can be
>> mapped into the data modification row events:
>>
>> - Write_rows_log_event: *requires only AI*.
>>
>> There is no need for a BI because, we are adding a record,
>> and not changing an existing one. The current
>> implementation logs only the AI.
>>
>> - Delete_rows_log_event: *requires only BI*.
>>
>> There is no need for an AI because, the row ceases to
>> exist, as it is removed. However, before removing it, one
>> needs to find it, thence BI is required.
>>
>> - Update_rows_log_event: *requires both: AI and BI*.
>>
>> Both BI and AI, are required. The row needs to be found (BI
>> comes to play) before being changed (AI comes to play).
>>
>> Summing up, BI must contain values that uniquely identifies rows,
>> acting like a primary key equivalent (PKE), while AI must contain
>> values that make possible changing the row according to the original
>> execution.
>>
>> Primary Key Equivalent
>> ----------------------
>>
>> Tables contain an index structure - mysql calls indexes keys. This
>> structure holds information on which keys are declared, and these
>> range from plain keys (K), to unique keys (UK), or even a primary
>> key (PK).
>>
>> PK - When it comes to logging row based events, PK plays an
>> important role, as it covers the BI requirement of uniquely
>> identifying a given row just by searching using the PK
>> value. Thence, if replicating the master contents, one should
>> be fine by just logging the PK column. The slave would then use
>> this value to search the correct row. Technically, "A PRIMARY
>> KEY is a unique index where all key columns must be defined as
>> NOT NULL. If they are not explicitly declared as NOT NULL,
>> MySQL declares them so implicitly (and silently). A table can
>> have only one PRIMARY KEY." [1]
>>
>> UK - Unique keys share the same usefulness of the PK, except that
>> they need to be declared without nullable parts. From the
>> manual [1]: "[...] For all engines, a UNIQUE index allows
>> multiple NULL values for columns that can contain NULL." In
>> fact, if there is no PK declared in a table and an application
>> requests one, MySQL "[...] returns the first UNIQUE index that
>> has no NULL columns as the PRIMARY KEY." [1]
>>
>> K - Regular keys or nullable UK are of no particular interest when
>> logging row events. Given that these are stripped from
>> uniqueness property provided by UK and PK, these cannot be used
>> to uniquely identify a row.
>>
>> There can be the case that a table does not declare any index at
>> all, or all indexes are regular keys. In this case the master must
>> ensure that the data provided in BI shall be enough to uniquely
>> identify the row. As such, the alternative is to log the full
>> row. This ensures that when searching (using the BI) and applying
>> (using the AI) the next record to be fetched will be uniquely
>> identified by all the fields in the BI. Should there be
>> indistinguishable rows, searching and updating either one in any
>> given order leads to a correct state.
>>
>> Consequently, a *Primary Key Equivalent* (PKE) is defined as:
>>
>> 1. If a PK exists, the PKE is equal to the PK.
>>
>> 2. Otherwise, if there exists a UK where all columns have the NOT
>> NULL attribute, then that is the PKE (if there are more than
>> one such UKs, then one is chosen arbitrarily).
>>
>> 3. Otherwise, the PKE is equal to the set of all columns.
>>
>> Hereafter, we will be considering to PK to be a subset of PKE and it
>> shall map into items 1. and 2. of PKE definition. Furthermore, if no
>> explicit primary key nor UK NOT NULL exists in the table, it is said
>> that the table has no PK.
>>
>> PROBLEM STATEMENT
>> =================
>>
>> Given the definition of PKE, one can have a smaller set of columns
>> to be logged (instead of the full row), and still be able to find
>> (BI) and update a record (AI), while replaying the row event.
>>
>> This can be useful to reduce bandwidth (less network traffic) and
>> storage (samller binlogs) usage as well as mysqld memory
>> footprint. It becomes even more important if tables contain large
>> blobs that do not need to be logged as part of the BI. Currently, in
>> MySQL 5.1 GA, PKE is always assumed to be the full row, ie, the
>> index structure is ignored. Consequently, AI and BI are always
>> logged with all their columns.
>>
>> It should be possible for the user to configure this behavior such
>> that he could request that BI and AI always log full rows, or a PK
>> when available. Put it more simply, the user should be able to state
>> that he wants row images to be logged according to a PKE or always
>> as a full row.
>>
>> SOLUTION
>> ========
>>
>> MySQL shall provide an option and the different configurations
>> should be:
>>
>> - minimal:
>> Means PKE in the before image and changed columns
>> in after image
>>
>> - full:
>> Means all columns in both before and after image
>>
>> - noblob:
>> Works as full but avoids sending blobs when these are not
>> needed. Blobs are still replicated if:
>>
>> 1. In AI, if they have been changed.
>> 2. In BI, if they are part of PK.
>>
>> It shall be named:
>>
>> --binlog-row-image={minimal,noblob,full}
>>
>> DEFAULT VALUE SHALL BE: 'FULL'.
>>
>> BACKPORTS
>> =========
>>
>> Part of this work required the backporting of patches for
>> BUG 33055 and BUG 40045.
>>
>> BUG FIXES
>> =========
>>
>> - BUG 47200
>> - BUG 47313
>> - BUG 14068
>> - BUG 33055 (backport)
>> - BUG 40045 (backport)
>> - BUG 47323 (fix applied in tree used - needed for testing purposes)
>> @ sql/log.h
>> Added enumeration for binlog_row_image option.
>> @ sql/log_event.cc
>> Several changes:
>> 1. refactored find_row so that BUG 47200 and BUG 47303 are fixed.
>> 2. added column marking according to binlog-row-image on slave
>> while applying events.
>> 3. merged backport fix for BUG 40045 and BUG 33055.
>> 4. merged fix for BUG 47323, which is not available in
>> mysql-5.1-rep+2 ATM but is needed for testing purposes.
>> @ sql/log_event.h
>> Changes according to backport of BUG 33055.
>> @ sql/mysqld.cc
>> Added binlog-row-image option. Default set to: BINLOG_ROW_IMAGE_FULL
>> @ sql/rpl_injector.cc
>> Merged backport for BUG 33055.
>> @ sql/rpl_injector.h
>> Merged backport for BUG 33055.
>> @ sql/rpl_record.cc
>> Merged backport for BUG 40045 and BUG 33055.
>> @ sql/set_var.cc
>> Added dynamic variable binlog-row-image.
>> @ sql/set_var.h
>> Added class sys_var_thd_binlog_row_image.
>> @ sql/sql_class.cc
>> Several changes:
>> 1. merged backport for BUG 33055
>> 2. changed THD::binlog_*_row so that, when needed, some fields are
>> removed from the read_set according to the binlog-row-image
>> option used.
>> @ sql/sql_class.h
>> Merged backport for BUG 33055. Also added binlog_prepare_row_images
>> method to THD.
>> @ sql/sql_insert.cc
>> Fixes insert delayed {read,write}_set.
>> @ sql/sql_update.cc
>> Merged BUG 40045 backport fix.
>> @ sql/table.cc
>> Changed mark_columns_needed_for_* so that these take into
>> account binlog-row-image option.
>> @ sql/table.h
>> Added method mark_columns_per_binlog_row_image.
>>
>> modified:
>> sql/handler.cc
>> sql/log.h
>> sql/log_event.cc
>> sql/log_event.h
>> sql/log_event_old.cc
>> sql/log_event_old.h
>> sql/mysql_priv.h
>> sql/mysqld.cc
>> sql/rpl_injector.cc
>> sql/rpl_injector.h
>> sql/rpl_record.cc
>> sql/rpl_record.h
>> sql/set_var.cc
>> sql/set_var.h
>> sql/sql_class.cc
>> sql/sql_class.h
>> sql/sql_insert.cc
>> sql/sql_update.cc
>> sql/table.cc
>> sql/table.h
>> === modified file 'sql/handler.cc'
>> --- a/sql/handler.cc 2009-10-03 10:50:25 +0000
>> +++ b/sql/handler.cc 2009-11-06 01:09:40 +0000
>> @@ -2484,8 +2484,9 @@ int handler::update_auto_increment()
>> void handler::column_bitmaps_signal()
>> {
>> DBUG_ENTER("column_bitmaps_signal");
>> - DBUG_PRINT("info", ("read_set: 0x%lx write_set: 0x%lx", (long)
> table->read_set,
>> - (long) table->write_set));
>> + if (table)
>> + DBUG_PRINT("info", ("read_set: 0x%lx write_set: 0x%lx", (long)
> table->read_set,
>> + table->write_set));
>
> Why is this needed now?
>
>
>> DBUG_VOID_RETURN;
>> }
>>
>> @@ -4553,8 +4554,8 @@ static int write_locked_table_maps(THD *
>> }
>>
>>
>> -typedef bool Log_func(THD*, TABLE*, bool, MY_BITMAP*,
>> - uint, const uchar*, const uchar*);
>> +typedef bool Log_func(THD*, TABLE*, bool,
>> + const uchar*, const uchar*);
>>
>> static int binlog_log_row(TABLE* table,
>> const uchar *before_record,
>> @@ -4568,30 +4569,19 @@ static int binlog_log_row(TABLE* table,
>>
>> if (check_table_binlog_row_based(thd, table))
>> {
>> - MY_BITMAP cols;
>> - /* Potential buffer on the stack for the bitmap */
>> - uint32 bitbuf[BITMAP_STACKBUF_SIZE/sizeof(uint32)];
>> - uint n_fields= table->s->fields;
>> - my_bool use_bitbuf= n_fields <= sizeof(bitbuf)*8;
>> + DBUG_DUMP("read_set 10", (uchar*) table->read_set->bitmap,
>> + (table->s->fields + 7) / 8);
>>
>> /*
>> If there are no table maps written to the binary log, this is
>> the first row handled in this statement. In that case, we need
>> to write table maps for all locked tables to the binary log.
>> */
>> - if (likely(!(error= bitmap_init(&cols,
>> - use_bitbuf ? bitbuf : NULL,
>> - (n_fields + 7) & ~7UL,
>> - FALSE))))
>> + if (likely(!(error= write_locked_table_maps(thd))))
>> {
>> - bitmap_set_all(&cols);
>> - if (likely(!(error= write_locked_table_maps(thd))))
>> - error= (*log_func)(thd, table, table->file->has_transactions(),
>> - &cols, table->s->fields,
>> - before_record, after_record);
>> -
>> - if (!use_bitbuf)
>> - bitmap_free(&cols);
>> + bool const has_trans= table->file->has_transactions();
>> + error=
>> + (*log_func)(thd, table, has_trans, before_record, after_record);
>> }
>> }
>> return error ? HA_ERR_RBR_LOGGING_FAILED : 0;
>>
>> === modified file 'sql/log.h'
>> --- a/sql/log.h 2009-10-07 21:13:07 +0000
>> +++ b/sql/log.h 2009-11-06 01:09:40 +0000
>> @@ -593,6 +593,25 @@ public:
>> }
>> };
>>
>> +enum enum_binlog_row_image {
>> + /*
>> + MINIMAL
>> + means PK in the before image and changed columns in after image
>> +
>> + NOBLOB
>> + means that whenever possible, before and after image contain all columns
>> + except blobs
>> +
>> + FULL
>> + means all columns in both before and after image
>> + */
>> +
>> + BINLOG_ROW_IMAGE_MINIMAL= 0,
>> + BINLOG_ROW_IMAGE_NOBLOB= 1,
>> + BINLOG_ROW_IMAGE_FULL= 2
>> +};
>> +extern TYPELIB binlog_row_image_typelib;
>> +
>
> ok.
>
>> enum enum_binlog_format {
>> /*
>> statement-based except for cases where only row-based can work (UUID()
>>
>> === modified file 'sql/log_event.cc'
>> --- a/sql/log_event.cc 2009-10-09 13:26:37 +0000
>> +++ b/sql/log_event.cc 2009-11-06 01:09:40 +0000
>> @@ -1852,6 +1852,7 @@ Rows_log_event::print_verbose_one_row(IO
>> {
>> const uchar *value0= value;
>> const uchar *null_bits= value;
>> + uint null_bit_index= 0;
>> char typestr[64]= "";
>>
>> value+= (m_width + 7) / 8;
>> @@ -1860,7 +1861,8 @@ Rows_log_event::print_verbose_one_row(IO
>>
>> for (size_t i= 0; i < td->size(); i ++)
>> {
>> - int is_null= (null_bits[i / 8] >> (i % 8)) & 0x01;
>> + int is_null= (null_bits[null_bit_index / 8]
>> + >> (null_bit_index % 8)) & 0x01;
>>
>> if (bitmap_is_set(cols_bitmap, i) == 0)
>> continue;
>> @@ -1897,6 +1899,8 @@ Rows_log_event::print_verbose_one_row(IO
>> }
>>
>> my_b_printf(file, "\n");
>> +
>> + null_bit_index++;
>> }
>> return value - value0;
>> }
>> @@ -7162,6 +7166,18 @@ int Rows_log_event::do_add_row_data(ucha
>> DBUG_ENTER("Rows_log_event::do_add_row_data");
>> DBUG_PRINT("enter", ("row_data: 0x%lx length: %lu", (ulong) row_data,
>> (ulong) length));
>> +
>> + /*
>> + If length is zero, there is nothing to write, so we just
>> + return. Note that this is not an optimization, since calling
>> + realloc() with size 0 means free().
>
> Please, improve this comment.
> Is this the case for insert into tt values()?
>
>
>> + */
>> + if (length == 0)
>> + {
>> + m_row_count++;
>> + DBUG_RETURN(0);
>> + }
>> +
>> /*
>> Don't print debug messages when running valgrind since they can
>> trigger false warnings.
>> @@ -7476,7 +7492,7 @@ int Rows_log_event::do_apply_event(Relay
>> (ulong) m_curr_row, (ulong) m_curr_row_end, (ulong)
> m_rows_end));
>>
>> if (!m_curr_row_end && !error)
>> - error= unpack_current_row(rli);
>> + error= unpack_current_row(rli, &m_cols);
>>
>> // at this moment m_curr_row_end should be set
>> DBUG_ASSERT(error || m_curr_row_end != NULL);
>> @@ -8238,9 +8254,8 @@ void Table_map_log_event::print(FILE *fi
>> #if !defined(MYSQL_CLIENT)
>> Write_rows_log_event::Write_rows_log_event(THD *thd_arg, TABLE *tbl_arg,
>> ulong tid_arg,
>> - MY_BITMAP const *cols,
>> bool is_transactional)
>> - : Rows_log_event(thd_arg, tbl_arg, tid_arg, cols, is_transactional)
>> + : Rows_log_event(thd_arg, tbl_arg, tid_arg, tbl_arg->write_set,
> is_transactional)
>> {
>> }
>> #endif
>> @@ -8444,15 +8459,18 @@ Rows_log_event::write_row(const Relay_lo
>> /* fill table->record[0] with default values */
>> bool abort_on_warnings= (rli->sql_thd->variables.sql_mode &
>> (MODE_STRICT_TRANS_TABLES |
> MODE_STRICT_ALL_TABLES));
>> - if ((error= prepare_record(table, m_width,
>> + if ((error= prepare_record(table, &m_cols,
>> table->file->ht->db_type !=
> DB_TYPE_NDBCLUSTER,
>> abort_on_warnings, m_curr_row == m_rows_buf)))
>> DBUG_RETURN(error);
>>
>> /* unpack row into table->record[0] */
>> - if ((error= unpack_current_row(rli, abort_on_warnings)))
>> + if ((error= unpack_current_row(rli, &m_cols, abort_on_warnings)))
>> DBUG_RETURN(error);
>>
>> + // Temporary fix to find out why it fails [/Matz]
>> + memcpy(m_table->write_set->bitmap, m_cols.bitmap,
> (m_table->write_set->n_bits + 7) / 8);
>> +
>
> Is this really necessary after your changes?
>
>> if (m_curr_row == m_rows_buf)
>> {
>> /* this is the first row to be inserted, we estimate the rows with
>> @@ -8477,6 +8495,8 @@ Rows_log_event::write_row(const Relay_lo
>> TODO: Add safety measures against infinite looping.
>> */
>>
>> + m_table->mark_columns_per_binlog_row_image();
>> +
>> while ((error= table->file->ha_write_row(table->record[0])))
>> {
>> if (error == HA_ERR_LOCK_DEADLOCK ||
>> @@ -8493,7 +8513,7 @@ Rows_log_event::write_row(const Relay_lo
>> - or because the information which key is not available
>> */
>> table->file->print_error(error, MYF(0));
>> - DBUG_RETURN(error);
>> + goto error;
>> }
>> /*
>> We need to retrieve the old row into record[1] to be able to
>> @@ -8515,7 +8535,7 @@ Rows_log_event::write_row(const Relay_lo
>> if (error == HA_ERR_RECORD_DELETED)
>> error= HA_ERR_KEY_NOT_FOUND;
>> table->file->print_error(error, MYF(0));
>> - DBUG_RETURN(error);
>> + goto error;
>> }
>> }
>> else
>> @@ -8525,7 +8545,8 @@ Rows_log_event::write_row(const Relay_lo
>> if (table->file->extra(HA_EXTRA_FLUSH_CACHE))
>> {
>> DBUG_PRINT("info",("Error when setting HA_EXTRA_FLUSH_CACHE"));
>> - DBUG_RETURN(my_errno);
>> + error= my_errno;
>> + goto error;
>> }
>>
>> if (key.get() == NULL)
>> @@ -8534,7 +8555,8 @@ Rows_log_event::write_row(const Relay_lo
>> if (key.get() == NULL)
>> {
>> DBUG_PRINT("info",("Can't allocate key buffer"));
>> - DBUG_RETURN(ENOMEM);
>> + error= ENOMEM;
>> + goto error;
>> }
>> }
>>
>> @@ -8550,7 +8572,7 @@ Rows_log_event::write_row(const Relay_lo
>> if (error == HA_ERR_RECORD_DELETED)
>> error= HA_ERR_KEY_NOT_FOUND;
>> table->file->print_error(error, MYF(0));
>> - DBUG_RETURN(error);
>> + goto error;
>> }
>> }
>>
>> @@ -8567,7 +8589,7 @@ Rows_log_event::write_row(const Relay_lo
>> if (!get_flags(COMPLETE_ROWS_F))
>> {
>> restore_record(table,record[1]);
>> - error= unpack_current_row(rli);
>> + error= unpack_current_row(rli, &m_cols);
>> }
>>
>> #ifndef DBUG_OFF
>> @@ -8612,7 +8634,7 @@ Rows_log_event::write_row(const Relay_lo
>> table->file->print_error(error, MYF(0));
>> }
>>
>> - DBUG_RETURN(error);
>> + goto error;
>> }
>> else
>> {
>> @@ -8621,12 +8643,14 @@ Rows_log_event::write_row(const Relay_lo
>> {
>> DBUG_PRINT("info",("ha_delete_row() returns error %d",error));
>> table->file->print_error(error, MYF(0));
>> - DBUG_RETURN(error);
>> + goto error;
>> }
>> /* Will retry ha_write_row() with the offending row removed. */
>> }
>> }
>>
>> +error:
>> + m_table->default_column_bitmaps();
>> DBUG_RETURN(error);
>
>
> Why do you need to that?
> What is the difference between the master and the slave in this case?
> Please, add a comment when you call mark_columns_per_binlog_row_image
> and after setting the default values.
>
>
>> }
>>
>> @@ -8668,7 +8692,7 @@ void Write_rows_log_event::print(FILE *f
>>
>> Returns TRUE if different.
>> */
>> -static bool record_compare(TABLE *table)
>> +static bool record_compare(TABLE *table, MY_BITMAP *cols)
>> {
>> /*
>> Need to set the X bit and the filler bits in both records since
>> @@ -8700,14 +8724,16 @@ static bool record_compare(TABLE *table)
>> }
>> }
>>
>> - if (table->s->blob_fields + table->s->varchar_fields == 0)
>> + if (table->s->blob_fields + table->s->varchar_fields == 0
> &&
>> + bitmap_is_set_all(cols))
>> {
>> result= cmp_record(table,record[1]);
>> goto record_compare_exit;
>> }
>>
>> /* Compare null bits */
>> - if (memcmp(table->null_flags,
>> + if (bitmap_is_set_all(cols) &&
>> + memcmp(table->null_flags,
>> table->null_flags+table->s->rec_buff_length,
>> table->s->null_bytes))
>> {
>> @@ -8716,12 +8742,17 @@ static bool record_compare(TABLE *table)
>> }
>>
>> /* Compare updated fields */
>> - for (Field **ptr=table->field ; *ptr ; ptr++)
>> + for (Field **ptr=table->field ;
>> + *ptr && ((ptr - table->field) < cols->n_bits);
>> + ptr++)
>> {
>> - if ((*ptr)->cmp_binary_offset(table->s->rec_buff_length))
>> + if (bitmap_is_set(cols, (*ptr)->field_index))
>> {
>> - result= TRUE;
>> - goto record_compare_exit;
>> + if ((*ptr)->cmp_binary_offset(table->s->rec_buff_length))
>> + {
>> + result= TRUE;
>> + goto record_compare_exit;
>> + }
>> }
>> }
>>
>> @@ -8744,6 +8775,159 @@ record_compare_exit:
>> return result;
>> }
>>
>> +
>> +/**
>> + Validates before image. Searches the bitmap for
>> + columns set. If no colum, for the existing table,
>> + then the image cannot be used for searching a
>> + record (regardless of using position(), index scan
>> + or table scan).
>> +
>> + @param table the table we are using.
>> + @param bi_cols the bitmap that signals usable columns.
>> +
>> + @return TRUE if invalid, FALSE otherwise.
>> +*/
>> +static
>> +my_bool invalid_bi(TABLE *table, MY_BITMAP *bi_cols)
>> +{
>> +
>> + int nfields_set= 0;
>> + for (Field **ptr=table->field ;
>> + *ptr && ((ptr - table->field) < bi_cols->n_bits);
>> + ptr++)
>> + {
>> + if (bitmap_is_set(bi_cols, (*ptr)->field_index))
>> + nfields_set++;
>> + }
>> +
>> + return (nfields_set == 0);
>> +}
>
>
> ok.
>
>> +
>> +/**
>> + Validates whether the before image is usable for the
>> + given key. It can be the case that the before image
>> + does not contain values for the key (eg, master was
>> + using 'minimal' option for image logging and slave has
>> + different index structure on the table).
>> +
>> + @param keyinfo reference to key.
>> + @param bi_cols the bitmap that signals usable columns.
>> +
>> + @return TRUE if usable, FALSE otherwise.
>> +*/
>> +static
>> +my_bool is_usable_key(KEY *keyinfo, MY_BITMAP *bi_cols)
>> +{
>> + for (uint i=0 ; i < keyinfo->key_parts ;i++)
>> + {
>> + uint fieldnr= keyinfo->key_part[i].fieldnr - 1;
>> + if (fieldnr >= bi_cols->n_bits ||
>> + !bitmap_is_set(bi_cols, fieldnr))
>> + return FALSE;
>> + }
>> +
>> + return TRUE;
>> +}
>> +
>> +/**
>> + Searches the table for a given key that can be used
>> + according to the existing values, ie, columns set
>> + in the bitmap.
>> +
>> + The caller can specify which type of key to find by
>> + setting the following flags in the key_type parameter:
>> +
>> + - PRI_KEY_FLAG
>> + Returns the primary key.
>> +
>> + - UNIQUE_KEY_FLAG
>> + Returns a unique key (flagged with HA_NOSAME)
>> +
>> + - MULTIPLE_KEY_FLAG
>> + Returns a key not unique nor PK.
>> +
>> + The above flags can be used together, in which case, the
>> + search is conducted in the above listed order. Eg, the
>> + following flag:
>> +
>> + (PRI_KEY_FLAG | UNIQUE_KEY_FLAG | MULTIPLE_KEY_FLAG)
>> +
>> + means that a primary key is returned if it is suitable. If
>> + not then the unique keys are searched. If no unique key is
>> + suitable, then the keys are searched. Finally, if no key
>> + is suitable, MAX_KEY is returned.
>> +
>> + @param table reference to the table.
>> + @param bi_cols the bitmap that signals usable columns.
>> + @param key_type the type of key to search.
>> +
>> + @return MAX_KEY if no key, according to the key_type specified
>> + is suitable. Returns the key otherwise.
>> +
>> +*/
>> +static
>> +uint
>> +search_key_for_bi(TABLE *table, MY_BITMAP *bi_cols, uint key_type)
>> +{
>> + KEY *keyinfo;
>> + uint res= MAX_KEY;
>> + uint key;
>> +
>> + if (key_type & PRI_KEY_FLAG)
>> + {
>> + if (table->s->primary_key < MAX_KEY)
>> + {
>> + keyinfo= table->s->key_info + (uint) table->s->primary_key;
>> + if (is_usable_key(keyinfo, bi_cols))
>> + return table->s->primary_key;
>> + }
>> + }
>> +
>> + if (key_type & UNIQUE_KEY_FLAG)
>> + {
>> + if (table->s->uniques)
>> + {
>> + for (key=0,keyinfo= table->key_info ;
>> + (key < table->s->keys) && (res == MAX_KEY);
>> + key++,keyinfo++)
>> + {
>> + if (!(keyinfo->flags & HA_NOSAME) || /* skip not unique */
>> + (key == table->s->primary_key)) /* skip primary */
>> + continue;
>> + res= is_usable_key(keyinfo, bi_cols) ? key : MAX_KEY;
>> +
>> + if (res < MAX_KEY)
>> + return res;
>> + }
>> + }
>> + }
>> +
>> + if (key_type & MULTIPLE_KEY_FLAG)
>> + {
>> + if (table->s->keys)
>> + {
>> + for (key=0,keyinfo= table->key_info ;
>> + (key < table->s->keys) && (res == MAX_KEY);
>> + key++,keyinfo++)
>> + {
>> + if (!(table->s->keys_in_use.is_set(key)) || /* key is no active
> */
>> + (keyinfo->flags & HA_NOSAME) || /* skip uniques */
>> + (key == table->s->primary_key)) /* skip primary */
>> + continue;
>> +
>> + res= is_usable_key(keyinfo, bi_cols) ? key : MAX_KEY;
>> +
>> + if (res < MAX_KEY)
>> + return res;
>> + }
>> + }
>> + }
>> +
>> + return res;
>> +}
>> +
>> +
>> /**
>> Locate the current row in event's table.
>>
>> @@ -8771,6 +8955,7 @@ record_compare_exit:
>> @c position() and @c rnd_pos() will be used.
>> */
>>
>> +
>> int Rows_log_event::find_row(const Relay_log_info *rli)
>> {
>> DBUG_ENTER("Rows_log_event::find_row");
>> @@ -8779,6 +8964,8 @@ int Rows_log_event::find_row(const Relay
>>
>> TABLE *table= m_table;
>> int error= 0;
>> + KEY *keyinfo;
>> + uint key;
>>
>> /*
>> rpl_row_tabledefs.test specifies that
>> @@ -8787,16 +8974,28 @@ int Rows_log_event::find_row(const Relay
>> Todo: fix wl3228 hld that requires defauls for all types of events
>> */
>>
>> - prepare_record(table, m_width, FALSE);
>> - error= unpack_current_row(rli);
>> + prepare_record(table, &m_cols, FALSE);
>> + error= unpack_current_row(rli, &m_cols);
>> +
>> + // Temporary fix to find out why it fails [/Matz]
>> + memcpy(m_table->read_set->bitmap, m_cols.bitmap,
> (m_table->read_set->n_bits + 7) / 8);
>> +
>> + if (invalid_bi(table, &m_cols))
>> + {
>> + error= HA_ERR_END_OF_FILE;
>> + goto err;
>> + }
>>
>> #ifndef DBUG_OFF
>> DBUG_PRINT("info",("looking for the following record"));
>> DBUG_DUMP("record[0]", table->record[0], table->s->reclength);
>> #endif
>>
>> - if ((table->file->ha_table_flags() &
> HA_PRIMARY_KEY_REQUIRED_FOR_POSITION) &&
>> - table->s->primary_key < MAX_KEY)
>> + if ((key= search_key_for_bi(table, &m_cols, PRI_KEY_FLAG)) >= MAX_KEY)
>> + /* we dont have a PK, or PK is not usable with BI values */
>> + goto INDEX_SCAN;
>> +
>> + if ((table->file->ha_table_flags() &
> HA_PRIMARY_KEY_REQUIRED_FOR_POSITION))
>> {
>> /*
>> Use a more efficient method to fetch the record given by
>> @@ -8830,11 +9029,7 @@ int Rows_log_event::find_row(const Relay
>>
>> // We can't use position() - try other methods.
>>
>> - /*
>> - We need to retrieve all fields
>> - TODO: Move this out from this function to main loop
>> - */
>> - table->use_all_columns();
>> +INDEX_SCAN:
>>
>> /*
>> Save copy of the record in table->record[1]. It might be needed
>> @@ -8842,8 +9037,16 @@ int Rows_log_event::find_row(const Relay
>> */
>> store_record(table,record[1]);
>>
>> - if (table->s->keys > 0)
>> + if ((key= search_key_for_bi(table, &m_cols,
>> + (PRI_KEY_FLAG | UNIQUE_KEY_FLAG |
> MULTIPLE_KEY_FLAG)))
>> + >= MAX_KEY)
>> + /* we dont have a key, or no key is suitable for the BI values */
>> + goto TABLE_SCAN;
>> +
>> {
>> + keyinfo= table->key_info + key;
>> +
>> +
>> DBUG_PRINT("info",("locating record using primary key (index_read)"));
>>
>> /* We have a key: search the table using the index */
>> @@ -8857,14 +9060,14 @@ int Rows_log_event::find_row(const Relay
>> /* Fill key data for the row */
>>
>> DBUG_ASSERT(m_key);
>> - key_copy(m_key, table->record[0], table->key_info, 0);
>> + key_copy(m_key, table->record[0], keyinfo, 0);
>>
>> /*
>> Don't print debug messages when running valgrind since they can
>> trigger false warnings.
>> */
>> #ifndef HAVE_purify
>> - DBUG_DUMP("key data", m_key, table->key_info->key_length);
>> + DBUG_DUMP("key data", m_key, keyinfo->key_length);
>> #endif
>>
>> /*
>> @@ -8911,7 +9114,7 @@ int Rows_log_event::find_row(const Relay
>> found. I can see no scenario where it would be incorrect to
>> chose the row to change only using a PK or an UNNI.
>> */
>> - if (table->key_info->flags & HA_NOSAME)
>> + if (keyinfo->flags & HA_NOSAME || key ==
> table->s->primary_key)
>> {
>> table->file->ha_index_end();
>> goto ok;
>> @@ -8924,7 +9127,7 @@ int Rows_log_event::find_row(const Relay
>> */
>> DBUG_PRINT("info",("non-unique index, scanning it to find matching
> record"));
>>
>> - while (record_compare(table))
>> + while (record_compare(table, &m_cols))
>> {
>> /*
>> We need to set the null bytes to ensure that the filler bit
>> @@ -8957,8 +9160,12 @@ int Rows_log_event::find_row(const Relay
>> Have to restart the scan to be able to fetch the next row.
>> */
>> table->file->ha_index_end();
>> + goto ok;
>> }
>> - else
>> +
>> +TABLE_SCAN:
>> +
>> + /* All that we can do now is rely on a table scan */
>> {
>> DBUG_PRINT("info",("locating record using table scan (rnd_next)"));
>>
>> @@ -8978,7 +9185,6 @@ int Rows_log_event::find_row(const Relay
>> {
>> restart_rnd_next:
>> error= table->file->rnd_next(table->record[0]);
>> -
>> DBUG_PRINT("info", ("error: %s", HA_ERR(error)));
>> switch (error) {
>>
>> @@ -9005,7 +9211,7 @@ int Rows_log_event::find_row(const Relay
>> goto err;
>> }
>> }
>> - while (restart_count < 2 && record_compare(table));
>> + while (restart_count < 2 && record_compare(table, &m_cols));
>>
>> /*
>> Note: above record_compare will take into accout all record fields
>> @@ -9041,9 +9247,9 @@ err:
>>
>> #ifndef MYSQL_CLIENT
>> Delete_rows_log_event::Delete_rows_log_event(THD *thd_arg, TABLE *tbl_arg,
>> - ulong tid, MY_BITMAP const *cols,
>> + ulong tid,
>> bool is_transactional)
>> - : Rows_log_event(thd_arg, tbl_arg, tid, cols, is_transactional)
>> + : Rows_log_event(thd_arg, tbl_arg, tid, tbl_arg->read_set,
> is_transactional)
>> {
>> }
>> #endif /* #if !defined(MYSQL_CLIENT) */
>> @@ -9065,22 +9271,14 @@ Delete_rows_log_event::Delete_rows_log_e
>> int
>> Delete_rows_log_event::do_before_row_operations(const Slave_reporting_capability
> *const)
>> {
>> - if ((m_table->file->ha_table_flags() &
> HA_PRIMARY_KEY_REQUIRED_FOR_POSITION) &&
>> - m_table->s->primary_key < MAX_KEY)
>> - {
>> - /*
>> - We don't need to allocate any memory for m_key since it is not used.
>> - */
>> - return 0;
>> - }
>> -
>> if (m_table->s->keys > 0)
>> {
>> // Allocate buffer for key searches
>> - m_key= (uchar*)my_malloc(m_table->key_info->key_length, MYF(MY_WME));
>> + m_key= (uchar*)my_malloc(MAX_KEY_LENGTH, MYF(MY_WME));
>> if (!m_key)
>> return HA_ERR_OUT_OF_MEM;
>> }
>> +
>> return 0;
>> }
>>
>> @@ -9103,10 +9301,13 @@ int Delete_rows_log_event::do_exec_row(c
>>
>> if (!(error= find_row(rli)))
>> {
>> +
>> + m_table->mark_columns_per_binlog_row_image();
>> /*
>> Delete the record found, located in record[0]
>> */
>> error= m_table->file->ha_delete_row(m_table->record[0]);
>> + m_table->default_column_bitmaps();
>> }
>> return error;
>> }
>> @@ -9132,21 +9333,10 @@ void Delete_rows_log_event::print(FILE *
>> #if !defined(MYSQL_CLIENT)
>> Update_rows_log_event::Update_rows_log_event(THD *thd_arg, TABLE *tbl_arg,
>> ulong tid,
>> - MY_BITMAP const *cols_bi,
>> - MY_BITMAP const *cols_ai,
>> bool is_transactional)
>> -: Rows_log_event(thd_arg, tbl_arg, tid, cols_bi, is_transactional)
>> +: Rows_log_event(thd_arg, tbl_arg, tid, tbl_arg->read_set, is_transactional)
>> {
>> - init(cols_ai);
>> -}
>> -
>> -Update_rows_log_event::Update_rows_log_event(THD *thd_arg, TABLE *tbl_arg,
>> - ulong tid,
>> - MY_BITMAP const *cols,
>> - bool is_transactional)
>> -: Rows_log_event(thd_arg, tbl_arg, tid, cols, is_transactional)
>> -{
>> - init(cols);
>> + init(tbl_arg->write_set);
>> }
>>
>> void Update_rows_log_event::init(MY_BITMAP const *cols)
>> @@ -9232,7 +9422,7 @@ Update_rows_log_event::do_exec_row(const
>> able to skip to the next pair of updates
>> */
>> m_curr_row= m_curr_row_end;
>> - unpack_current_row(rli);
>> + unpack_current_row(rli, &m_cols_ai);
>> return error;
>> }
>>
>> @@ -9253,7 +9443,7 @@ Update_rows_log_event::do_exec_row(const
>> (MODE_STRICT_TRANS_TABLES |
> MODE_STRICT_ALL_TABLES));
>> m_curr_row= m_curr_row_end;
>> /* this also updates m_curr_row_end */
>> - if ((error= unpack_current_row(rli, abort_on_warnings)))
>> + if ((error= unpack_current_row(rli, &m_cols_ai, abort_on_warnings)))
>> return error;
>>
>> /*
>> @@ -9270,9 +9460,15 @@ Update_rows_log_event::do_exec_row(const
>> DBUG_DUMP("new values", m_table->record[0], m_table->s->reclength);
>> #endif
>>
>> + // Temporary fix to find out why it fails [/Matz]
>> + memcpy(m_table->read_set->bitmap, m_cols.bitmap,
> (m_table->read_set->n_bits + 7) / 8);
>> + memcpy(m_table->write_set->bitmap, m_cols_ai.bitmap,
> (m_table->write_set->n_bits + 7) / 8);
>> +
>> + m_table->mark_columns_per_binlog_row_image();
>> error= m_table->file->ha_update_row(m_table->record[1],
> m_table->record[0]);
>> if (error == HA_ERR_RECORD_IS_THE_SAME)
>> error= 0;
>> + m_table->default_column_bitmaps();
>>
>> return error;
>> }
>>
>> === modified file 'sql/log_event.h'
>> --- a/sql/log_event.h 2009-10-01 17:22:44 +0000
>> +++ b/sql/log_event.h 2009-11-06 01:09:40 +0000
>> @@ -3563,13 +3563,14 @@ protected:
>>
>> // Unpack the current row into m_table->record[0]
>> int unpack_current_row(const Relay_log_info *const rli,
>> + MY_BITMAP const *cols,
>> const bool abort_on_warning= TRUE)
>> {
>> DBUG_ASSERT(m_table);
>>
>> bool first_row= (m_curr_row == m_rows_buf);
>> ASSERT_OR_RETURN_ERROR(m_curr_row < m_rows_end, HA_ERR_CORRUPT_EVENT);
>> - int const result= ::unpack_row(rli, m_table, m_width, m_curr_row,
> &m_cols,
>> + int const result= ::unpack_row(rli, m_table, m_width, m_curr_row, cols,
>> &m_curr_row_end,
> &m_master_reclength,
>> abort_on_warning, first_row);
>> if (m_curr_row_end > m_rows_end)
>> @@ -3661,7 +3662,7 @@ public:
>>
>> #if !defined(MYSQL_CLIENT)
>> Write_rows_log_event(THD*, TABLE*, ulong table_id,
>> - MY_BITMAP const *cols, bool is_transactional);
>> + bool is_transactional);
>> #endif
>> #ifdef HAVE_REPLICATION
>> Write_rows_log_event(const char *buf, uint event_len,
>> @@ -3670,14 +3671,12 @@ public:
>> #if !defined(MYSQL_CLIENT)
>> static bool binlog_row_logging_function(THD *thd, TABLE *table,
>> bool is_transactional,
>> - MY_BITMAP *cols,
>> - uint fields,
>> const uchar *before_record
>> __attribute__((unused)),
>> const uchar *after_record)
>> {
>> return thd->binlog_write_row(table, is_transactional,
>> - cols, fields, after_record);
>> + after_record);
>> }
>> #endif
>>
>> @@ -3724,7 +3723,6 @@ public:
>> bool is_transactional);
>>
>> Update_rows_log_event(THD*, TABLE*, ulong table_id,
>> - MY_BITMAP const *cols,
>> bool is_transactional);
>>
>> void init(MY_BITMAP const *cols);
>> @@ -3740,13 +3738,11 @@ public:
>> #if !defined(MYSQL_CLIENT)
>> static bool binlog_row_logging_function(THD *thd, TABLE *table,
>> bool is_transactional,
>> - MY_BITMAP *cols,
>> - uint fields,
>> const uchar *before_record,
>> const uchar *after_record)
>> {
>> return thd->binlog_update_row(table, is_transactional,
>> - cols, fields, before_record, after_record);
>> + before_record, after_record);
>> }
>> #endif
>>
>> @@ -3800,7 +3796,7 @@ public:
>>
>> #ifndef MYSQL_CLIENT
>> Delete_rows_log_event(THD*, TABLE*, ulong,
>> - MY_BITMAP const *cols, bool is_transactional);
>> + bool is_transactional);
>> #endif
>> #ifdef HAVE_REPLICATION
>> Delete_rows_log_event(const char *buf, uint event_len,
>> @@ -3809,14 +3805,12 @@ public:
>> #if !defined(MYSQL_CLIENT)
>> static bool binlog_row_logging_function(THD *thd, TABLE *table,
>> bool is_transactional,
>> - MY_BITMAP *cols,
>> - uint fields,
>> const uchar *before_record,
>> const uchar *after_record
>> __attribute__((unused)))
>> {
>> return thd->binlog_delete_row(table, is_transactional,
>> - cols, fields, before_record);
>> + before_record);
>> }
>> #endif
>>
>>
>> === modified file 'sql/log_event_old.cc'
>> --- a/sql/log_event_old.cc 2009-10-09 13:26:37 +0000
>> +++ b/sql/log_event_old.cc 2009-11-06 01:09:40 +0000
>> @@ -2007,7 +2007,7 @@ Old_rows_log_event::write_row(const Rela
>>
>> /* fill table->record[0] with default values */
>>
>> - if ((error= prepare_record(table, m_width,
>> + if ((error= prepare_record(table, table->write_set,
>> TRUE /* check if columns have def. values */)))
>> DBUG_RETURN(error);
>>
>> @@ -2222,7 +2222,7 @@ int Old_rows_log_event::find_row(const R
>> /* unpack row - missing fields get default values */
>>
>> // TODO: shall we check and report errors here?
>> - prepare_record(table, m_width, FALSE /* don't check errors */);
>> + prepare_record(table, table->read_set, FALSE /* don't check errors */);
>> error= unpack_current_row(rli);
>>
>> #ifndef DBUG_OFF
>>
>> === modified file 'sql/log_event_old.h'
>> --- a/sql/log_event_old.h 2007-12-05 19:00:14 +0000
>> +++ b/sql/log_event_old.h 2009-11-06 01:09:40 +0000
>> @@ -358,7 +358,8 @@ class Write_rows_log_event_old : public
>> public:
>> #if !defined(MYSQL_CLIENT)
>> Write_rows_log_event_old(THD*, TABLE*, ulong table_id,
>> - MY_BITMAP const *cols, bool is_transactional);
>> + MY_BITMAP const *cols,
>> + bool is_transactional);
>> #endif
>> #ifdef HAVE_REPLICATION
>> Write_rows_log_event_old(const char *buf, uint event_len,
>> @@ -367,14 +368,12 @@ public:
>> #if !defined(MYSQL_CLIENT)
>> static bool binlog_row_logging_function(THD *thd, TABLE *table,
>> bool is_transactional,
>> - MY_BITMAP *cols,
>> - uint fields,
>> const uchar *before_record
>> __attribute__((unused)),
>> const uchar *after_record)
>> {
>> return thd->binlog_write_row(table, is_transactional,
>> - cols, fields, after_record);
>> + after_record);
>> }
>> #endif
>>
>> @@ -444,13 +443,11 @@ public:
>> #if !defined(MYSQL_CLIENT)
>> static bool binlog_row_logging_function(THD *thd, TABLE *table,
>> bool is_transactional,
>> - MY_BITMAP *cols,
>> - uint fields,
>> const uchar *before_record,
>> const uchar *after_record)
>> {
>> return thd->binlog_update_row(table, is_transactional,
>> - cols, fields, before_record, after_record);
>> + before_record, after_record);
>> }
>> #endif
>>
>> @@ -509,7 +506,8 @@ class Delete_rows_log_event_old : public
>> public:
>> #ifndef MYSQL_CLIENT
>> Delete_rows_log_event_old(THD*, TABLE*, ulong,
>> - MY_BITMAP const *cols, bool is_transactional);
>> + MY_BITMAP const *cols,
>> + bool is_transactional);
>> #endif
>> #ifdef HAVE_REPLICATION
>> Delete_rows_log_event_old(const char *buf, uint event_len,
>> @@ -518,14 +516,12 @@ public:
>> #if !defined(MYSQL_CLIENT)
>> static bool binlog_row_logging_function(THD *thd, TABLE *table,
>> bool is_transactional,
>> - MY_BITMAP *cols,
>> - uint fields,
>> const uchar *before_record,
>> const uchar *after_record
>> __attribute__((unused)))
>> {
>> return thd->binlog_delete_row(table, is_transactional,
>> - cols, fields, before_record);
>> + before_record);
>> }
>> #endif
>>
>
>
> ok.
>
>> === modified file 'sql/mysql_priv.h'
>> --- a/sql/mysql_priv.h 2009-09-29 14:40:52 +0000
>> +++ b/sql/mysql_priv.h 2009-11-06 01:09:40 +0000
>> @@ -1860,6 +1860,7 @@ extern ulong binlog_cache_size, open_fil
>> extern ulonglong max_binlog_cache_size;
>> extern ulong max_binlog_size, max_relay_log_size;
>> extern ulong opt_binlog_rows_event_max_size;
>> +extern ulong opt_binlog_row_image_id;
>> extern ulong rpl_recovery_rank, thread_cache_size, thread_pool_size;
>> extern ulong back_log;
>> #endif /* MYSQL_SERVER */
>> @@ -1983,6 +1984,7 @@ extern TYPELIB thread_handling_typelib;
>> extern uint8 uc_update_queries[SQLCOM_END+1];
>> extern uint sql_command_flags[];
>> extern TYPELIB log_output_typelib;
>> +extern TYPELIB binlog_row_image_typelib;
>>
>> /* optional things, have_* variables */
>> extern SHOW_COMP_OPTION have_community_features;
>>
>> === modified file 'sql/mysqld.cc'
>> --- a/sql/mysqld.cc 2009-10-03 10:50:25 +0000
>> +++ b/sql/mysqld.cc 2009-11-06 01:09:40 +0000
>> @@ -506,6 +506,14 @@ TYPELIB binlog_format_typelib=
>> binlog_format_names, NULL };
>> ulong opt_binlog_format_id= (ulong) BINLOG_FORMAT_UNSPEC;
>> const char *opt_binlog_format= binlog_format_names[opt_binlog_format_id];
>> +
>> +const char *binlog_row_image_names[]= {"MINIMAL", "NOBLOB", "FULL", NullS};
>> +TYPELIB binlog_row_image_typelib=
>> + { array_elements(binlog_row_image_names) - 1, "",
>> + binlog_row_image_names, NULL };
>> +ulong opt_binlog_row_image_id= (ulong) BINLOG_ROW_IMAGE_FULL;
>> +const char *opt_binlog_row_image_arg=
>> + binlog_row_image_names[BINLOG_ROW_IMAGE_MINIMAL];
>> #ifdef HAVE_INITGROUPS
>> static bool calling_initgroups= FALSE; /**< Used in SIGSEGV handler. */
>> #endif
>> @@ -5685,7 +5693,8 @@ enum options_mysqld
>> OPT_IGNORE_BUILTIN_INNODB,
>> OPT_SYNC_RELAY_LOG,
>> OPT_SYNC_RELAY_LOG_INFO,
>> - OPT_SYNC_MASTER_INFO
>> + OPT_SYNC_MASTER_INFO,
>> + OPT_BINLOG_ROW_IMAGE
>> };
>>
>>
>> @@ -7038,6 +7047,16 @@ The minimum value for this variable is 4
>> (uchar**) &max_system_variables.net_wait_timeout, 0, GET_ULONG,
>> REQUIRED_ARG, NET_WAIT_TIMEOUT, 1, IF_WIN(INT_MAX32/1000, LONG_TIMEOUT),
>> 0, 1, 0},
>
>
>> + {"binlog-row-image", OPT_BINLOG_ROW_IMAGE,
>> + "Controls wether the records should be logged in a full, reversible or "
>> + "minimal manner. Full means that all columns are stored in the before "
>> + "and after image are logged. Reversible means that PK and changed "
>> + "columns are stored in the before and after image. Minimal means that PK "
>> + "columns are stored in before image and changed columns in the after "
>> + "image. (Default: minimal.)",
>
> Don't use the expression manner, replace it by the possible values.
> Reversible? I thought we would postpone its implementation?
> And blobs?
> Is minimal the default?
>
>> + (uchar **) &opt_binlog_row_image_arg,
>> + (uchar **) &opt_binlog_row_image_arg,
>> + 0, GET_STR, REQUIRED_ARG, 0, 0, 0, 0, 0, 0},
>> {0, 0, 0, 0, 0, 0, GET_NO_ARG, NO_ARG, 0, 0, 0, 0, 0, 0}
>> };
>>
>> @@ -7779,6 +7798,8 @@ static int mysql_init_variables(void)
>> global_system_variables.old_passwords= 0;
>> global_system_variables.old_alter_table= 0;
>> global_system_variables.binlog_format= BINLOG_FORMAT_UNSPEC;
>> + global_system_variables.binlog_row_image= BINLOG_ROW_IMAGE_FULL;
>> +
>> /*
>> Default behavior for 4.1 and 5.0 is to treat NULL values as unequal
>> when collecting index statistics for MyISAM tables.
>> @@ -8044,6 +8065,13 @@ mysqld_get_one_option(int optid,
>> global_system_variables.binlog_format= opt_binlog_format_id= id - 1;
>> break;
>> }
>> + case OPT_BINLOG_ROW_IMAGE:
>> + {
>> + int id;
>> + id= find_type_or_exit(argument, &binlog_row_image_typelib,
> opt->name);
>> + global_system_variables.binlog_row_image= opt_binlog_row_image_id= id - 1;
>> + break;
>> + }
>> case (int)OPT_BINLOG_DO_DB:
>> {
>> binlog_filter->add_do_db(argument);
>>
>> === modified file 'sql/rpl_injector.cc'
>> --- a/sql/rpl_injector.cc 2008-02-19 11:43:01 +0000
>> +++ b/sql/rpl_injector.cc 2009-11-06 01:09:40 +0000
>> @@ -109,14 +109,15 @@ int injector::transaction::write_row (se
>> record_type record)
>> {
>> DBUG_ENTER("injector::transaction::write_row(...)");
>> -
>> if (int error= check_state(ROW_STATE))
>> DBUG_RETURN(error);
>>
>> server_id_type save_id= m_thd->server_id;
>> m_thd->set_server_id(sid);
>> - m_thd->binlog_write_row(tbl.get_table(), tbl.is_transactional(),
>> - cols, colcnt, record);
>> +
>> + table::save_sets saveset(tbl, cols, cols);
>> +
>> + m_thd->binlog_write_row(tbl.get_table(), tbl.is_transactional(), record);
>> m_thd->set_server_id(save_id);
>> DBUG_RETURN(0);
>> }
>> @@ -133,8 +134,8 @@ int injector::transaction::delete_row(se
>>
>> server_id_type save_id= m_thd->server_id;
>> m_thd->set_server_id(sid);
>> - m_thd->binlog_delete_row(tbl.get_table(), tbl.is_transactional(),
>> - cols, colcnt, record);
>> + table::save_sets saveset(tbl, cols, cols);
>> + m_thd->binlog_delete_row(tbl.get_table(), tbl.is_transactional(),
> record);
>> m_thd->set_server_id(save_id);
>> DBUG_RETURN(0);
>> }
>> @@ -151,8 +152,11 @@ int injector::transaction::update_row(se
>>
>> server_id_type save_id= m_thd->server_id;
>> m_thd->set_server_id(sid);
>> - m_thd->binlog_update_row(tbl.get_table(), tbl.is_transactional(),
>> - cols, colcnt, before, after);
>> +
>> + // The read- and write sets with autorestore
>
> Please, elaborate on this.
>
>> + table::save_sets saveset(tbl, cols, cols);
>> +
>> + m_thd->binlog_update_row(tbl.get_table(), tbl.is_transactional(), before,
> after);
>> m_thd->set_server_id(save_id);
>> DBUG_RETURN(0);
>> }
>
>
> Why do I need this if before writting to the binary log we save bitmap and after
> restore?
> Can you elaborate and also add some comments?
> Can you also explain why the readset is equal to the writeset?
>
>> === modified file 'sql/rpl_injector.h'
>> --- a/sql/rpl_injector.h 2007-05-10 09:59:39 +0000
>> +++ b/sql/rpl_injector.h 2009-11-06 01:09:40 +0000
>> @@ -117,6 +117,27 @@ public:
>> class table
>> {
>> public:
>> + class save_sets {
>> + public:
>> + save_sets(table const &tbl, MY_BITMAP const *new_rs, MY_BITMAP
> const *new_ws)
>> + : m_table(tbl.get_table()),
>> + save_read_set(m_table->read_set),
>> + save_write_set(m_table->write_set)
>> + {
>> +
> m_table->column_bitmaps_set(const_cast<MY_BITMAP*>(new_rs),
>> + const_cast<MY_BITMAP*>(new_ws));
>> + }
>> +
>> + ~save_sets() {
>> + m_table->column_bitmaps_set(save_read_set, save_write_set);
>> + }
>> +
>> + private:
>> + TABLE *m_table;
>> + MY_BITMAP *save_read_set;
>> + MY_BITMAP *save_write_set;
>> + };
>> +
>> table(TABLE *table, bool is_transactional)
>> : m_table(table), m_is_transactional(is_transactional)
>> {
>>
>> === modified file 'sql/rpl_record.cc'
>> --- a/sql/rpl_record.cc 2009-09-29 14:18:44 +0000
>> +++ b/sql/rpl_record.cc 2009-11-06 01:09:40 +0000
>> @@ -75,15 +75,20 @@ pack_row(TABLE *table, MY_BITMAP const*
>> unsigned int null_bits= (1U << 8) - 1;
>> // Mask to mask out the correct but among the null bits
>> unsigned int null_mask= 1U;
>> + DBUG_PRINT("debug", ("null ptr: 0x%lx; row start: %p; null bytes: %d",
>> + (ulong) null_ptr, row_data, null_byte_count));
>> + DBUG_DUMP("cols", (uchar*) cols->bitmap, cols->last_word_ptr -
> cols->bitmap + 1);
>> for ( ; (field= *p_field) ; p_field++)
>> {
>> - DBUG_PRINT("debug", ("null_mask=%d; null_ptr=%p; row_data=%p;
> null_byte_count=%d",
>> - null_mask, null_ptr, row_data, null_byte_count));
>> + DBUG_PRINT("debug", ("field: %s; null mask: 0x%x",
>> + field->field_name, null_mask));
>> if (bitmap_is_set(cols, p_field - table->field))
>> {
>> my_ptrdiff_t offset;
>> if (field->is_null(rec_offset))
>> {
>> + DBUG_PRINT("debug", ("Is NULL; null_mask: 0x%x; null_bits: 0x%x",
>> + null_mask, null_bits));
>> offset= def_offset;
>> null_bits |= null_mask;
>> }
>> @@ -104,9 +109,9 @@ pack_row(TABLE *table, MY_BITMAP const*
>> #endif
>> pack_ptr= field->pack(pack_ptr, field->ptr + offset,
>> field->max_data_length(), TRUE);
>> - DBUG_PRINT("debug", ("field: %s; pack_ptr: 0x%lx;"
>> + DBUG_PRINT("debug", ("Packed into row; pack_ptr: 0x%lx;"
>> " pack_ptr':0x%lx; bytes: %d",
>> - field->field_name, (ulong) old_pack_ptr,
>> + (ulong) old_pack_ptr,
>> (ulong) pack_ptr,
>> (int) (pack_ptr - old_pack_ptr)));
>> }
>> @@ -120,6 +125,12 @@ pack_row(TABLE *table, MY_BITMAP const*
>> null_bits= (1U << 8) - 1;
>> }
>> }
>> +#ifndef DBUG_OFF
>> + else
>> + {
>> + DBUG_PRINT("debug", ("Skipped"));
>> + }
>> +#endif
>> }
>>
>> /*
>> @@ -207,6 +218,11 @@ unpack_row(Relay_log_info const *rli,
>> {
>> Field *const f= *field_ptr;
>>
>> + DBUG_PRINT("debug", ("field: %s; null mask: 0x%x; null bits: 0x%lx;"
>> + " row start: %p; null bytes: %ld",
>> + f->field_name, null_mask, (ulong) null_bits,
>> + pack_ptr, (ulong) master_null_byte_count));
>> +
>> /*
>> No need to bother about columns that does not exist: they have
>> gotten default values when being emptied above.
>> @@ -268,15 +284,20 @@ unpack_row(Relay_log_info const *rli,
>> uchar const *const old_pack_ptr= pack_ptr;
>> #endif
>> pack_ptr= f->unpack(f->ptr, pack_ptr, metadata, TRUE);
>> - DBUG_PRINT("debug", ("field: %s; metadata: 0x%x;"
>> + DBUG_PRINT("debug", ("Unpacked; metadata: 0x%x;"
>> " pack_ptr: 0x%lx; pack_ptr': 0x%lx; bytes: %d",
>> - f->field_name, metadata,
>> - (ulong) old_pack_ptr, (ulong) pack_ptr,
>> + metadata, (ulong) old_pack_ptr, (ulong) pack_ptr,
>> (int) (pack_ptr - old_pack_ptr)));
>> }
>>
>> null_mask <<= 1;
>> }
>> +#ifndef DBUG_OFF
>> + else
>> + {
>> + DBUG_PRINT("debug", ("Non-existent: skipped"));
>> + }
>> +#endif
>> i++;
>> }
>>
>> @@ -331,8 +352,6 @@ unpack_row(Relay_log_info const *rli,
>> be NULL. Otherwise error is reported.
>>
>> @param table Table whose record[0] buffer is prepared.
>> - @param skip Number of columns for which default/nullable check
>> - should be skipped.
>> @param check Specifies if lack of default error needs checking.
>> @param abort_on_warning
>> Controls how to react on lack of a field's default.
>> @@ -341,8 +360,7 @@ unpack_row(Relay_log_info const *rli,
>>
>> @returns 0 on success or a handler level error code
>> */
>> -int prepare_record(TABLE *const table,
>> - const uint skip, const bool check,
>> +int prepare_record(TABLE *const table, const MY_BITMAP *cols, const bool check,
>> const bool abort_on_warning, const bool first_row)
>> {
>> DBUG_ENTER("prepare_record");
>> @@ -350,12 +368,7 @@ int prepare_record(TABLE *const table,
>> int error= 0;
>> restore_record(table, s->default_values);
>>
>> - /*
>> - This skip should be revisited in 6.0, because in 6.0 RBR one
>> - can have holes in the row (as the grain of the writeset is
>> - the column and not the entire row).
>> - */
>> - if (skip >= table->s->fields || !check)
>> + if (!check)
>> DBUG_RETURN(0);
>>
>> /*
>> @@ -364,34 +377,38 @@ int prepare_record(TABLE *const table,
>> explicit value for a field not having the explicit default
>> (@c check_that_all_fields_are_given_values()).
>> */
>> - for (Field **field_ptr= table->field+skip; *field_ptr; ++field_ptr)
>> +
>> + DBUG_PRINT_BITSET("debug", "cols: %s", cols);
>> + for (Field **field_ptr= table->field; *field_ptr; ++field_ptr)
>> {
>> - uint32 const mask= NOT_NULL_FLAG | NO_DEFAULT_VALUE_FLAG;
>> - Field *const f= *field_ptr;
>> - if ((f->flags & NO_DEFAULT_VALUE_FLAG) &&
>> - (f->real_type() != MYSQL_TYPE_ENUM))
>> - {
>> -
>> - MYSQL_ERROR::enum_warning_level error_type=
>> - MYSQL_ERROR::WARN_LEVEL_NOTE;
>> - if (abort_on_warning && (table->file->has_transactions() ||
>> - first_row))
>> - {
>> - error= HA_ERR_ROWS_EVENT_APPLY;
>> - error_type= MYSQL_ERROR::WARN_LEVEL_ERROR;
>> - }
>> - else
>> + if ((uint) (field_ptr - table->field) >= cols->n_bits ||
>> + !bitmap_is_set(cols, field_ptr - table->field))
>> + {
>> + Field *const f= *field_ptr;
>> + if ((f->flags & NO_DEFAULT_VALUE_FLAG) &&
>> + (f->real_type() != MYSQL_TYPE_ENUM))
>> {
>> - f->set_default();
>> - error_type= MYSQL_ERROR::WARN_LEVEL_WARN;
>> + MYSQL_ERROR::enum_warning_level error_type=
>> + MYSQL_ERROR::WARN_LEVEL_NOTE;
>> + if (abort_on_warning && (table->file->has_transactions()
> ||
>> + first_row))
>> + {
>> + error= HA_ERR_ROWS_EVENT_APPLY;
>> + error_type= MYSQL_ERROR::WARN_LEVEL_ERROR;
>> + }
>> + else
>> + {
>> + DBUG_PRINT("debug", ("Set default; field: %s", f->field_name));
>> + f->set_default();
>> + error_type= MYSQL_ERROR::WARN_LEVEL_WARN;
>> + }
>> + push_warning_printf(current_thd, error_type,
>> + ER_NO_DEFAULT_FOR_FIELD,
>> + ER(ER_NO_DEFAULT_FOR_FIELD),
>> + f->field_name);
>> }
>> - push_warning_printf(current_thd, error_type,
>> - ER_NO_DEFAULT_FOR_FIELD,
>> - ER(ER_NO_DEFAULT_FOR_FIELD),
>> - f->field_name);
>> }
>> }
>> -
>> DBUG_RETURN(error);
>> }
>>
>>
>> === modified file 'sql/rpl_record.h'
>> --- a/sql/rpl_record.h 2009-09-29 14:18:44 +0000
>> +++ b/sql/rpl_record.h 2009-11-06 01:09:40 +0000
>> @@ -31,7 +31,8 @@ int unpack_row(Relay_log_info const *rli
>> const bool abort_on_warning= TRUE, const bool first_row= TRUE);
>>
>> // Fill table's record[0] with default values.
>> -int prepare_record(TABLE *const table, const uint skip, const bool check,
>> +int prepare_record(TABLE *const table, const MY_BITMAP *cols,
>> + const bool check,
>> const bool abort_on_warning= TRUE,
>> const bool first_row= TRUE);
>> #endif
>
> ok.
>
>> === modified file 'sql/set_var.cc'
>> --- a/sql/set_var.cc 2009-09-29 14:40:52 +0000
>> +++ b/sql/set_var.cc 2009-11-06 01:09:40 +0000
>> @@ -180,6 +180,9 @@ static sys_var_long_ptr sys_binlog_cache
>> &binlog_cache_size);
>> static sys_var_thd_binlog_format sys_binlog_format(&vars, "binlog_format",
>> &SV::binlog_format);
>> +static sys_var_thd_binlog_row_image sys_binlog_row_image(&vars,
>> + "binlog_row_image",
>> +
> &SV::binlog_row_image);
>> static sys_var_thd_ulong sys_bulk_insert_buff_size(&vars,
> "bulk_insert_buffer_size",
>> &SV::bulk_insert_buff_size);
>> static sys_var_const_os sys_character_sets_dir(&vars,
>> @@ -1289,12 +1292,23 @@ bool sys_var_thd_binlog_format::is_reado
>> return sys_var_thd_enum::is_readonly();
>> }
>
>>
>> -
>
> Please, avoid this.
>
>> void fix_binlog_format_after_update(THD *thd, enum_var_type type)
>> {
>> thd->reset_current_stmt_binlog_row_based();
>> }
>>
>> +bool sys_var_thd_binlog_row_image::check(THD *thd, set_var *var) {
>> + /*
>> + All variables that affect writing to binary log (either format or
>> + turning logging on and off) use the same checking. We call the
>> + superclass ::check function to assign the variable correctly, and
>> + then check the value.
>> + */
>
> Please, improve the comment.
>
>> + bool result= sys_var_thd_enum::check(thd, var);
>> + if (!result)
>> + result= check_log_update(thd, var);
>> + return result;
>> +}
>>
>> static void fix_max_binlog_size(THD *thd, enum_var_type type)
>> {
>>
>> === modified file 'sql/set_var.h'
>> --- a/sql/set_var.h 2009-09-29 14:40:52 +0000
>> +++ b/sql/set_var.h 2009-11-06 01:09:40 +0000
>> @@ -1281,6 +1281,19 @@ public:
>> bool is_readonly() const;
>> };
>>
>> +class sys_var_thd_binlog_row_image :public sys_var_thd_enum
>> +{
>> +public:
>> + sys_var_thd_binlog_row_image(sys_var_chain *chain, const char *name_arg,
>> + ulong SV::*offset_arg)
>> + :sys_var_thd_enum(chain, name_arg, offset_arg,
>> + &binlog_row_image_typelib,
>> + NULL)
>> + {};
>> + bool check(THD *thd, set_var *var);
>> +};
>> +
>> +
>> /****************************************************************************
>> Classes for parsing of the SET command
>> ****************************************************************************/
>>
>> === modified file 'sql/sql_class.cc'
>> --- a/sql/sql_class.cc 2009-10-03 10:50:25 +0000
>> +++ b/sql/sql_class.cc 2009-11-06 01:09:40 +0000
>> @@ -3306,8 +3306,6 @@ void xid_cache_delete(XID_STATE *xid_sta
>>
>> template <class RowsEventT> Rows_log_event*
>> THD::binlog_prepare_pending_rows_event(TABLE* table, uint32 serv_id,
>> - MY_BITMAP const* cols,
>> - size_t colcnt,
>> size_t needed,
>> bool is_transactional,
>> RowsEventT *hint __attribute__((unused)))
>> @@ -3345,13 +3343,11 @@ THD::binlog_prepare_pending_rows_event(T
>> pending->server_id != serv_id ||
>> pending->get_table_id() != table->s->table_map_id ||
>> pending->get_type_code() != type_code ||
>> - pending->get_data_size() + needed > opt_binlog_rows_event_max_size
> ||
>> - pending->get_width() != colcnt ||
>> - !bitmap_cmp(pending->get_cols(), cols))
>> + pending->get_data_size() + needed > opt_binlog_rows_event_max_size)
>
>> {
>> /* Create a new RowsEventT... */
>> Rows_log_event* const
>> - ev= new RowsEventT(this, table, table->s->table_map_id, cols,
>> + ev= new RowsEventT(this, table, table->s->table_map_id,
>> is_transactional);
>> if (unlikely(!ev))
>> DBUG_RETURN(NULL);
>> @@ -3377,18 +3373,15 @@ THD::binlog_prepare_pending_rows_event(T
>> compiling option.
>> */
>> template Rows_log_event*
>> -THD::binlog_prepare_pending_rows_event(TABLE*, uint32, MY_BITMAP const*,
>> - size_t, size_t, bool,
>> +THD::binlog_prepare_pending_rows_event(TABLE*, uint32, size_t, bool,
>> Write_rows_log_event*);
>>
>> template Rows_log_event*
>> -THD::binlog_prepare_pending_rows_event(TABLE*, uint32, MY_BITMAP const*,
>> - size_t colcnt, size_t, bool,
>> +THD::binlog_prepare_pending_rows_event(TABLE*, uint32, size_t, bool,
>> Delete_rows_log_event *);
>>
>> template Rows_log_event*
>> -THD::binlog_prepare_pending_rows_event(TABLE*, uint32, MY_BITMAP const*,
>> - size_t colcnt, size_t, bool,
>> +THD::binlog_prepare_pending_rows_event(TABLE*, uint32, size_t, bool,
>> Update_rows_log_event *);
>> #endif
>>
>> @@ -3580,7 +3573,6 @@ namespace {
>>
>>
>> int THD::binlog_write_row(TABLE* table, bool is_trans,
>> - MY_BITMAP const* cols, size_t colcnt,
>> uchar const *record)
>> {
>> DBUG_ASSERT(current_stmt_binlog_row_based &&
> mysql_bin_log.is_open());
>> @@ -3595,11 +3587,10 @@ int THD::binlog_write_row(TABLE* table,
>>
>> uchar *row_data= memory.slot(0);
>>
>> - size_t const len= pack_row(table, cols, row_data, record);
>> + size_t const len= pack_row(table, table->write_set, row_data, record);
>>
>> Rows_log_event* const ev=
>> - binlog_prepare_pending_rows_event(table, server_id, cols, colcnt,
>> - len, is_trans,
>> + binlog_prepare_pending_rows_event(table, server_id, len, is_trans,
>>
> static_cast<Write_rows_log_event*>(0));
>>
>> if (unlikely(ev == 0))
>> @@ -3609,12 +3600,26 @@ int THD::binlog_write_row(TABLE* table,
>> }
>>
>> int THD::binlog_update_row(TABLE* table, bool is_trans,
>> - MY_BITMAP const* cols, size_t colcnt,
>> const uchar *before_record,
>> const uchar *after_record)
>> {
>> + int error= 0;
>> DBUG_ASSERT(current_stmt_binlog_row_based &&
> mysql_bin_log.is_open());
>>
>> + /**
>> + Save a reference to the original read and write set bitmaps.
>> + We will need this to restore the bitmaps at the end.
>> + */
>> + MY_BITMAP *old_read_set= table->read_set;
>> + MY_BITMAP *old_write_set= table->write_set;
>> +
>> + /**
>> + This will remove spurious fields required during execution but
>> + not needed for binlogging. This is done according to the:
>> + binlog-row-image option.
>> + */
>> + binlog_prepare_row_images(table);
>> +
>> size_t const before_maxlen = max_row_length(table, before_record);
>> size_t const after_maxlen = max_row_length(table, after_record);
>>
>> @@ -3625,9 +3630,9 @@ int THD::binlog_update_row(TABLE* table,
>> uchar *before_row= row_data.slot(0);
>> uchar *after_row= row_data.slot(1);
>>
>> - size_t const before_size= pack_row(table, cols, before_row,
>> + size_t const before_size= pack_row(table, table->read_set, before_row,
>> before_record);
>> - size_t const after_size= pack_row(table, cols, after_row,
>> + size_t const after_size= pack_row(table, table->write_set, after_row,
>> after_record);
>>
>> /*
>> @@ -3642,24 +3647,43 @@ int THD::binlog_update_row(TABLE* table,
>> #endif
>>
>> Rows_log_event* const ev=
>> - binlog_prepare_pending_rows_event(table, server_id, cols, colcnt,
>> + binlog_prepare_pending_rows_event(table, server_id,
>> before_size + after_size, is_trans,
>> static_cast<Update_rows_log_event*>(0));
>>
>> if (unlikely(ev == 0))
>> return HA_ERR_OUT_OF_MEM;
>>
>> - return
>> - ev->add_row_data(before_row, before_size) ||
>> - ev->add_row_data(after_row, after_size);
>> + error= ev->add_row_data(before_row, before_size) ||
>> + ev->add_row_data(after_row, after_size);
>> +
>
>> + /* restore read set for the rest of execution */
>> + table->column_bitmaps_set_no_signal(old_read_set,
>> + old_write_set);
>
> Restore not only the read set but also the write set.
>
>> +
>> + return error;
>> }
>>
>> int THD::binlog_delete_row(TABLE* table, bool is_trans,
>> - MY_BITMAP const* cols, size_t colcnt,
>> uchar const *record)
>> {
>> + int error= 0;
>> DBUG_ASSERT(current_stmt_binlog_row_based &&
> mysql_bin_log.is_open());
>>
>> + /**
>> + Save a reference to the original read and write set bitmaps.
>> + We will need this to restore the bitmaps at the end.
>> + */
>> + MY_BITMAP *old_read_set= table->read_set;
>> + MY_BITMAP *old_write_set= table->write_set;
>> +
>> + /**
>> + This will remove spurious fields required during execution but
>> + not needed for binlogging. This is done according to the:
>> + binlog-row-image option.
>> + */
>> + binlog_prepare_row_images(table);
>> +
>> /*
>> Pack records into format for transfer. We are allocating more
>> memory than needed, but that doesn't matter.
>> @@ -3670,19 +3694,80 @@ int THD::binlog_delete_row(TABLE* table,
>>
>> uchar *row_data= memory.slot(0);
>>
>> - size_t const len= pack_row(table, cols, row_data, record);
>
>> + DBUG_DUMP("table->read_set", (uchar*) table->read_set->bitmap,
> (table->s->fields + 7) / 8);
>
> Add a line explaining why you are only interesing in dumping information on the
> delete.
>
>> + size_t const len= pack_row(table, table->read_set, row_data, record);
>>
>> Rows_log_event* const ev=
>> - binlog_prepare_pending_rows_event(table, server_id, cols, colcnt,
>> - len, is_trans,
>> + binlog_prepare_pending_rows_event(table, server_id, len, is_trans,
>> static_cast<Delete_rows_log_event*>(0));
>>
>> if (unlikely(ev == 0))
>> return HA_ERR_OUT_OF_MEM;
>>
>> - return ev->add_row_data(row_data, len);
>> + error= ev->add_row_data(row_data, len);
>> +
>> + /* restore read set for the rest of execution */
>> + table->column_bitmaps_set_no_signal(old_read_set,
>> + old_write_set);
>> +
>> + return error;
>
> Same as above.
>
>> }
>>
>> +void THD::binlog_prepare_row_images(TABLE *table)
>> +{
>> + DBUG_ENTER("THD::binlog_prepare_row_images");
>> + /**
>> + Remove from read_set spurious columns. The write_set has been
>> + handled before in table->mark_columns_needed_for_update.
>> + */
>> +
>> + DBUG_PRINT_BITSET("debug", "table->read_set (before preparing): %s",
> table->read_set);
>> + THD *thd= table->in_use;
>> +
>> + /**
>> + if there is a primary key in the table (ie, user declared PK or a
>> + non-null unique index) and we dont want to ship the entire image.
>> + */
>> + if (table->s->primary_key < MAX_KEY &&
>> + (thd->variables.binlog_row_image < BINLOG_ROW_IMAGE_FULL))
>> + {
>> + DBUG_ASSERT(table->read_set != &table->tmp_set);
>
> Please, try to describe what the assertion means.
>
>> +
>> + bitmap_clear_all(&table->tmp_set);
>> +
>> + switch(thd->variables.binlog_row_image)
>> + {
>> + case BINLOG_ROW_IMAGE_MINIMAL:
>> + /* MINIMAL: Mark only PK */
>> +
> table->mark_columns_used_by_index_no_reset(table->s->primary_key,
>> + &table->tmp_set);
>> + break;
>> + case BINLOG_ROW_IMAGE_NOBLOB:
>> + /**
>> + NOBLOB: Remove unnecessary BLOB fields from read_set
>> + (the ones that are not part of PK).
>> + */
>> + bitmap_union(&table->tmp_set, table->read_set);
>> + for (Field **ptr=table->field ; *ptr ; ptr++)
>> + {
>> + Field *field= (*ptr);
>> + if ((field->type() == MYSQL_TYPE_BLOB) &&
>> + !(field->flags & PRI_KEY_FLAG))
>> + bitmap_clear_bit(&table->tmp_set, field->field_index);
>> + }
>> + break;
>> + default:
>> + DBUG_ASSERT(0); // impossible.
>> + }
>> +
>> + /* set the temporary read_set */
>> + table->column_bitmaps_set_no_signal(&table->tmp_set,
>> + table->write_set);
>> + }
>> +
>> + DBUG_PRINT_BITSET("debug", "table->read_set (after preparing): %s",
> table->read_set);
>> + DBUG_VOID_RETURN;
>> +}
>>
>> int THD::binlog_remove_pending_rows_event(bool clear_maps)
>> {
>>
>> === modified file 'sql/sql_class.h'
>> --- a/sql/sql_class.h 2009-10-03 10:50:25 +0000
>> +++ b/sql/sql_class.h 2009-11-06 01:09:40 +0000
>> @@ -355,6 +355,7 @@ struct system_variables
>> ulong ndb_index_stat_cache_entries;
>> ulong ndb_index_stat_update_freq;
>> ulong binlog_format; // binlog format for this thd (see enum_binlog_format)
>> + ulong binlog_row_image;
>> /*
>> In slave thread we need to know in behalf of which
>> thread the query is being run to replicate temp tables properly
>> @@ -1405,14 +1406,12 @@ public:
>> void binlog_set_stmt_begin();
>> int binlog_write_table_map(TABLE *table, bool is_transactional);
>> int binlog_write_row(TABLE* table, bool is_transactional,
>> - MY_BITMAP const* cols, size_t colcnt,
>> - const uchar *buf);
>> + const uchar *new_data);
>> int binlog_delete_row(TABLE* table, bool is_transactional,
>> - MY_BITMAP const* cols, size_t colcnt,
>> - const uchar *buf);
>> + const uchar *old_data);
>> int binlog_update_row(TABLE* table, bool is_transactional,
>> - MY_BITMAP const* cols, size_t colcnt,
>> const uchar *old_data, const uchar *new_data);
>> + void binlog_prepare_row_images(TABLE* table);
>>
>> void set_server_id(uint32 sid) { server_id = sid; }
>>
>> @@ -1421,8 +1420,6 @@ public:
>> */
>> template <class RowsEventT> Rows_log_event*
>> binlog_prepare_pending_rows_event(TABLE* table, uint32 serv_id,
>> - MY_BITMAP const* cols,
>> - size_t colcnt,
>> size_t needed,
>> bool is_transactional,
>> RowsEventT* hint);
>>
>> === modified file 'sql/sql_insert.cc'
>> --- a/sql/sql_insert.cc 2009-08-31 14:09:09 +0000
>> +++ b/sql/sql_insert.cc 2009-11-06 01:09:40 +0000
>> @@ -2202,6 +2202,11 @@ int write_delayed(THD *thd, TABLE *table
>> DBUG_PRINT("delayed", ("transmitting auto_inc: %lu",
>> (ulong) row->forced_insert_id));
>> }
>> +
>> + /* Copy the original write set */
>> + bitmap_clear_all(di->table->write_set);
>> + bitmap_union(di->table->write_set, table->write_set);
>> + di->table->file->column_bitmaps_signal();
>
> Please, add an explanation on this part.
>
>>
>> di->rows.push_back(row);
>> di->stacked_inserts++;
>> @@ -2568,7 +2573,13 @@ bool Delayed_insert::handle_inserts(void
>> pthread_mutex_unlock(&mutex);
>>
>> table->next_number_field=table->found_next_number_field;
>> - table->use_all_columns();
>> + /*
>> + needed for some autoinc not null fields.
>> + Otherwise, one would hit an assertion
>> + when the insert tried to read the field.
>> + */
>> + bitmap_set_all(table->read_set);
>> + table->mark_columns_needed_for_insert();
>>
>> thd_proc_info(&thd, "upgrading lock");
>> if (thr_upgrade_write_delay_lock(*thd.lock->locks, delayed_lock))
>
>
> Can you explain the hunks above?
>
>
>> === modified file 'sql/sql_update.cc'
>> --- a/sql/sql_update.cc 2009-08-28 16:21:54 +0000
>> +++ b/sql/sql_update.cc 2009-11-06 01:09:40 +0000
>> @@ -326,6 +326,7 @@ int mysql_update(THD *thd,
>> /* Update the table->file->stats.records number */
>> table->file->info(HA_STATUS_VARIABLE | HA_STATUS_NO_LOCK);
>>
>> + table->mark_columns_needed_for_update();
>
>
> I think you should explain why you are moving.
>
>
>> select= make_select(table, 0, 0, conds, 0, &error);
>> if (error || !limit ||
>> (select && select->check_quick(thd, safe_update, limit)))
>> @@ -364,8 +365,6 @@ int mysql_update(THD *thd,
>> }
>> init_ftfuncs(thd, select_lex, 1);
>>
>> - table->mark_columns_needed_for_update();
>> -
>> /* Check if we are modifying a key that we are used to search with */
>>
>> if (select && select->quick)
>>
>> === modified file 'sql/table.cc'
>> --- a/sql/table.cc 2009-08-12 09:46:08 +0000
>> +++ b/sql/table.cc 2009-11-06 01:09:40 +0000
>> @@ -4443,6 +4443,8 @@ void st_table::mark_auto_increment_colum
>>
>> void st_table::mark_columns_needed_for_delete()
>> {
>> + mark_columns_per_binlog_row_image();
>> +
>> if (triggers)
>> triggers->mark_fields_used(TRG_EVENT_DELETE);
>> if (file->ha_table_flags() & HA_REQUIRES_KEY_COLUMNS_FOR_DELETE)
>> @@ -4455,15 +4457,26 @@ void st_table::mark_columns_needed_for_d
>> }
>> file->column_bitmaps_signal();
>> }
>> - if (file->ha_table_flags() & HA_PRIMARY_KEY_REQUIRED_FOR_DELETE)
>> + if (file->ha_table_flags() & HA_PRIMARY_KEY_REQUIRED_FOR_DELETE ||
>> + mysql_bin_log.is_open() && in_use &&
>> + in_use->current_stmt_binlog_row_based)
>
> I don't get why you need this. Can you explain?
> And if you really need this why don't you do whatever you need to do in the
> mark_columns_per_binlog_row_image()?
>
>> {
>> /*
>> - If the handler has no cursor capabilites, we have to read either
>> - the primary key, the hidden primary key or all columns to be
>> - able to do an delete
>> + If the handler has no cursor capabilites, or we have row-based
>> + replication active for the current statement, we have to read
>> + either the primary key, the hidden primary key or all columns to
>> + be able to do an delete
>> */
>
> ok.
>
>> if (s->primary_key == MAX_KEY)
>> - file->use_hidden_primary_key();
>> + {
>> + /*
>> + If in RBR, this was done already in:
>> + mark_columns_per_binlog_row_image
>> + */
>> + if (!(mysql_bin_log.is_open() && in_use &&
>> + in_use->current_stmt_binlog_row_based))
>> + file->use_hidden_primary_key();
>> + }
>
> Please, merge the comments and the "if".
>
>> else
>> {
>> mark_columns_used_by_index_no_reset(s->primary_key, read_set);
>> @@ -4493,6 +4506,9 @@ void st_table::mark_columns_needed_for_d
>>
>> void st_table::mark_columns_needed_for_update()
>> {
>> +
>> + mark_columns_per_binlog_row_image();
>> +
>
> Please, move this to a place after the DBUG_ENTER.
>
>
>> DBUG_ENTER("mark_columns_needed_for_update");
>> if (triggers)
>> triggers->mark_fields_used(TRG_EVENT_UPDATE);
>
>
>
>> @@ -4511,12 +4527,21 @@ void st_table::mark_columns_needed_for_u
>> if (file->ha_table_flags() & HA_PRIMARY_KEY_REQUIRED_FOR_DELETE)
>> {
>> /*
>> - If the handler has no cursor capabilites, we have to read either
>> + If the handler has no cursor capabilites, or we have row-based
>> + logging active for the current statement, we have to read either
>> the primary key, the hidden primary key or all columns to be
>> able to do an update
>> */
>> if (s->primary_key == MAX_KEY)
>> - file->use_hidden_primary_key();
>> + {
>> + /*
>> + If in RBR, this was done already in:
>> + mark_columns_per_binlog_row_image
>> + */
>> + if (!(mysql_bin_log.is_open() && in_use &&
>> + in_use->current_stmt_binlog_row_based))
>> + file->use_hidden_primary_key();
>> + }
>
> Please, merge the comments and the "if".
>
>
>> else
>> {
>> mark_columns_used_by_index_no_reset(s->primary_key, read_set);
>> @@ -4526,6 +4551,67 @@ void st_table::mark_columns_needed_for_u
>> DBUG_VOID_RETURN;
>> }
>
>
>
>>
>> +void st_table::mark_columns_per_binlog_row_image()
>> +{
>> + DBUG_ENTER("mark_columns_per_binlog_row_image");
>
>
> Add a description.
>
>> +
>> + /**
>> + If in RBR we may need to mark some extra columns,
>> + depending on the binlog-row-image command line argument.
>> + */
>> + if ((mysql_bin_log.is_open() && in_use &&
>> + in_use->current_stmt_binlog_row_based))
>
>
> Why do you need the in_use?
> Should it be always true, shouldn't it?
>
>
>> + {
>> +
>> + THD *thd= current_thd;
>> +
>> + /* if there is no PK, then mark all columns for the BI. */
>> + if (s->primary_key >= MAX_KEY)
>> + bitmap_set_all(read_set);
>> +
>> + switch (thd->variables.binlog_row_image)
>> + {
>> + case BINLOG_ROW_IMAGE_FULL:
>
>> + if (s->primary_key < MAX_KEY)
>> + bitmap_set_all(read_set);
>
> Why do you need the if? Performance?
> It would clearer if you remove the "if".
>
>> + bitmap_set_all(write_set);
>> + break;
>> + case BINLOG_ROW_IMAGE_NOBLOB:
>> + /* for every field that is not set, mark it unless it is a blob */
>> + for (Field **ptr=field ; *ptr ; ptr++)
>> + {
>> + Field *field= *ptr;
>> + /*
>> + bypass blob fields. These can be set or not set, we don't care.
>> + Later if we don't need them in the before image, we will discard
>> + them.
>
> Later? When? Please, just give a hint. You don't need to get into too many details.
> Also explain why you do not get rid of them here.
>
>> +
>> + If set in the AI, then the blob is really needed. There is nothing
>> + we can do.
>
> s/. There is/, there is/
>
>> + */
>> + if ((field->flags & PRI_KEY_FLAG) ||
>> + (field->type() != MYSQL_TYPE_BLOB))
>> + bitmap_set_bit(read_set, field->field_index);
>
>
> Now I think you should check if bitmap_set_all was called.
>
>
>> +
>> + if (field->type() != MYSQL_TYPE_BLOB)
>> + bitmap_set_bit(write_set, field->field_index);
>
>
> ok.
>
>
>> + }
>> + break;
>> + case BINLOG_ROW_IMAGE_MINIMAL:
>> + /* mark the primary key if available in the read_set */
>> + if (s->primary_key < MAX_KEY)
>> + mark_columns_used_by_index_no_reset(s->primary_key, read_set);
>> + break;
>> +
>> + default:
>> + DBUG_ASSERT(FALSE);
>> + }
>> + file->column_bitmaps_signal();
>> + }
>> +
>> + DBUG_VOID_RETURN;
>> +}
>> +
>>
>> /*
>> Mark columns the handler needs for doing an insert
>> @@ -4536,6 +4622,7 @@ void st_table::mark_columns_needed_for_u
>>
>> void st_table::mark_columns_needed_for_insert()
>> {
>> + mark_columns_per_binlog_row_image();
>> if (triggers)
>> {
>> /*
>
> ok.
>
>> === modified file 'sql/table.h'
>> --- a/sql/table.h 2009-09-23 21:32:31 +0000
>> +++ b/sql/table.h 2009-11-06 01:09:40 +0000
>> @@ -820,6 +820,7 @@ struct st_table {
>> void mark_columns_needed_for_update(void);
>> void mark_columns_needed_for_delete(void);
>> void mark_columns_needed_for_insert(void);
>> + void mark_columns_per_binlog_row_image(void);
>> inline void column_bitmaps_set(MY_BITMAP *read_set_arg,
>> MY_BITMAP *write_set_arg)
>> {
>
>
> Change the name of the function. Maybe something like
> mark_columns_needed_for_binlog?
>
>
>>
>> ------------------------------------------------------------------------
>>
>> This body part will be downloaded on demand.
>
>