At the very least, this comment in the 5.1 code is not accurate:
Informs handler that write_row() which tries to insert new row into the
table and encounters some already existing row with same primary/unique
key can replace old row with new row instead of reporting error (basically
it informs handler that we do REPLACE instead of simple INSERT).
Off by default.
My storage engine cannot replace the old row with the new row instead
of reporting an error, because row-based replication will not work.
Hopefully MySQL will eventually record this flag in the binlog and
propagate the proper behavior to the slave. That would provide a
performance boost for engines, and would avoid having NDB inject the
row itself into the binary log. NDB touching the binary log sounds
like a workaround for this issue, IMO.
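
For reference, here is a minimal sketch of the behavior that comment
describes, as I read it. Everything in it is a toy stand-in: ToyEngine,
ExtraOp and WriteResult are hypothetical names, not the real handler
API, and only the ideas of write_row() and HA_EXTRA_WRITE_CAN_REPLACE
come from the 5.1 code.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins for the real handler API (assumption, not 5.1 code).
enum ExtraOp { EXTRA_WRITE_CAN_REPLACE, EXTRA_WRITE_CANNOT_REPLACE };
enum WriteResult { WRITE_OK, WRITE_DUP_KEY };

class ToyEngine {
 public:
  // The server tells the engine which mode the current statement is in.
  void extra(ExtraOp op) { can_replace_ = (op == EXTRA_WRITE_CAN_REPLACE); }

  // Insert a row keyed by primary key. With the flag set, an existing row
  // is overwritten in place; without it, a duplicate key is reported.
  WriteResult write_row(int pk, const std::string& value) {
    auto it = rows_.find(pk);
    if (it != rows_.end()) {
      if (!can_replace_) return WRITE_DUP_KEY;
      it->second = value;  // replace the old row with the new row
      return WRITE_OK;
    }
    rows_.emplace(pk, value);
    return WRITE_OK;
  }

  const std::string& get(int pk) const { return rows_.at(pk); }

 private:
  std::map<int, std::string> rows_;
  bool can_replace_ = false;  // "Off by default", as the comment says.
};
```

The replace branch is the one my engine cannot take today, since the
slave has no way of knowing that the write was allowed to overwrite.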
On Tue, May 11, 2010 at 2:41 PM, Mats Kindahl <mats@stripped> wrote:
> On 05/11/2010 01:23 PM, Zardosht Kasheff wrote:
>> Hello Mats and Sergei,
>> Thanks for responding to my question. I will file a bug. The bug seems
>> to be the following. As Mats said, the flag means "I'm executing a
>> replace right now, so you can optimize for that if you like." Storage
>> engines that want to replace the row in the write call (which is what
>> the name of the flag implies) cannot do so, because replication does
>> not work.
>> I do not understand the following statement: "we have no support for
>> allowing the engine to give this information". I do not see why the
>> engine needs to provide any information. This is how I thought the
>> flag should work. If the operation is "replace into" (and there are no
>> delete triggers on the table), then MySQL calls the storage engine
>> with HA_EXTRA_WRITE_CAN_REPLACE.
> What Serg was referring to is the fact that the flag is set on the
> slave only if slave-exec-mode=idempotent *or NDB engine is used*,
> which essentially means that the storage engine has decided to
> execute as if slave-exec-mode=idempotent; ideally, any storage engine
> should be allowed to make such a request, which means that it has to
> provide either a table flag or a handlerton flag.
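
To make sure I follow, the condition described above could be sketched
roughly like this. All names are hypothetical, and the
wants_idempotent_apply capability flag is an assumption standing in for
the table or handlerton flag that does not exist yet.

```cpp
// Hypothetical names throughout; this is not the server's actual code.
enum SlaveExecMode { MODE_STRICT, MODE_IDEMPOTENT };

struct EngineInfo {
  bool is_ndb;                  // the hardcoded special case described above
  bool wants_idempotent_apply;  // assumed opt-in flag, not present in 5.1
};

// Behavior as described in this thread: idempotent mode, or NDB.
bool sets_can_replace_today(SlaveExecMode mode, const EngineInfo& e) {
  return mode == MODE_IDEMPOTENT || e.is_ndb;
}

// Alternative: any engine may opt in through a table or handlerton flag.
bool sets_can_replace_with_flag(SlaveExecMode mode, const EngineInfo& e) {
  return mode == MODE_IDEMPOTENT || e.wants_idempotent_apply;
}
```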
>> If the write succeeds, the binary log
>> takes note that an insert with HA_EXTRA_WRITE_CAN_REPLACE has taken
>> place.
> Unfortunately, the binary log does not note which flags were used for
> replication, and this is what could be considered a bug. On the slave,
> there is no notion of REPLACE vs. INSERT: the only data that exists is
> write, delete, or update a row.
>> Then, on the slave, an insert is replayed with this flag set.
>> If the insert fails with a duplicate key error, then the slave should
>> call an update.
>> This is how I assumed it would work, and how I think it should work
>> (I would be happy to hear other opinions).
> Yes, that is an alternative.
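
A rough sketch of that alternative, under the assumption that the row
event carried the flag (which, per the above, it does not today).
WriteRowEvent, ToyTable and apply_on_slave are hypothetical names used
only for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical row event: the can_replace field is the piece the binary
// log does not record today.
struct WriteRowEvent {
  int pk;
  std::string value;
  bool can_replace;
};

enum ApplyResult { APPLIED_INSERT, APPLIED_UPDATE, APPLY_DUP_KEY_ERROR };

using ToyTable = std::map<int, std::string>;

// Replay a write on the slave: try the insert first, and only if it hits
// a duplicate key and the event allows it, fall back to an update.
ApplyResult apply_on_slave(ToyTable& table, const WriteRowEvent& ev) {
  auto inserted = table.emplace(ev.pk, ev.value);
  if (inserted.second) return APPLIED_INSERT;
  if (!ev.can_replace) return APPLY_DUP_KEY_ERROR;
  inserted.first->second = ev.value;  // update the existing row
  return APPLIED_UPDATE;
}
```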
>> Also, other questions:
>> - What does the IDEMPOTENT flag mean in this context?
> That some errors are ignored, such as duplicate key errors for a write
> or update, and a row not being found when executing a delete.
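
In other words, something like the following predicate. It covers only
the cases named in the reply above, the enum names are hypothetical,
and the real server may well ignore other errors too.

```cpp
// Hypothetical names; covers only the cases mentioned in this thread.
enum RowEventKind { EV_WRITE, EV_UPDATE, EV_DELETE };
enum ApplyError { ERR_NONE, ERR_DUP_KEY, ERR_ROW_NOT_FOUND, ERR_OTHER };

// True if the slave would skip this error when running in idempotent mode.
bool ignored_in_idempotent_mode(RowEventKind kind, ApplyError err) {
  switch (kind) {
    case EV_WRITE:
    case EV_UPDATE:
      return err == ERR_DUP_KEY;        // duplicate key on write or update
    case EV_DELETE:
      return err == ERR_ROW_NOT_FOUND;  // row to delete is missing
  }
  return false;
}
```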
>> - What happens if NDB is the master and slave? Does NDB use the flag
>> to optimize inserts on the master?
> No, but to replicate updates as "replace" (not a real replace, more
> similar to an update). However, this has proven to cause some
> problems, so it is not certain that the behavior will be supported in
> the future.
>> - What about if NDB is the master and MyISAM is the slave? How does
>> the master behave?
> NDB injects the rows itself into the binary log using a separate
> thread. Each epoch is written to the binary log as a single transaction.
> Just my few cents,
> Mats Kindahl
>> On Tue, May 11, 2010 at 4:55 AM, Mats Kindahl <Mats.Kindahl@stripped> wrote:
>>> On 05/11/2010 10:30 AM, Sergei Golubchik wrote:
>>>> Hi, Mats!
>>>> On May 11, Mats Kindahl wrote:
>>>>> Hi Zardosht,
>>>>> The flag HA_EXTRA_WRITE_CAN_REPLACE is set on the slave only for
>>>>> rows when --slave-exec-mode is IDEMPOTENT *or when the NDB Cluster
>>>>> engine is used*. In other words, the NDB engine is hardcoded to
>>>>> use IDEMPOTENT.
>>>> Which is, arguably, a bug. The flag is not HA_EXTRA_IS_NDB_CLUSTER,
>>>> other engines may make use of it too.
>>> Agree on that, but we have no support for allowing the engine to give
>>> this information right now.
>>> Note that the HA_EXTRA flags are a means for the server to communicate
>>> with the engine, not the other way around. This flag says "I'm executing
>>> a replace right now, so you can optimize for that if you like."
>>> Best wishes,
>>> Mats Kindahl
>>> Mats Kindahl
>>> Senior Software Engineer
>>> Database Technology Group
>>> Sun Microsystems
>>> MySQL Internals Mailing List
>>> For list archives: http://lists.mysql.com/internals
>>> To unsubscribe: http://lists.mysql.com/internals?unsub=1