From: Zardosht Kasheff Date: March 2 2010 1:11pm Subject: Re: question on HA_EXTRA_IGNORE_DUP_KEY List-Archive: http://lists.mysql.com/internals/37759 Message-Id: <2f9663ba1003020511l5ac53ce3p151e4daa1864d194@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Are there other problems besides what has been listed? I ask because the performance advantage we can gain from this optimization is huge. Thanks -Zardosht On Tue, Mar 2, 2010 at 4:08 AM, Sergey Vojtovich wrote: > Hi Zardosht, > > thanks for this explanation. Sounds like a nice idea, but still it will > not work with all MySQL subsystems correctly. > > Regards, > Sergey > > On Fri, Feb 26, 2010 at 10:20:40AM -0500, Zardosht Kasheff wrote: >> Hello Sergey and Sergei, >> >> In handler::write_row, the row is written, while the uniqueness check >> is deferred to the background or to query time (whichever comes >> first). So, if the deferred uniqueness check finds a duplicate, the >> inserted row is thrown away. >> >> If the row should not be there due to a duplicate, the user will never >> see that row in a query. >> >> We do a similar thing with replace into. Only, instead of throwing >> away the new row, we overwrite the old row with the new row. >> >> -Zardosht >> >> On Fri, Feb 26, 2010 at 6:41 AM, Sergey Vojtovich wrote: >> > Hi Zardosht, >> > >> > On Thu, Feb 25, 2010 at 02:31:11PM -0500, Zardosht Kasheff wrote: >> >> Can you please elaborate on the problems? >> > * http://dev.mysql.com/doc/refman/5.1/en/trigger-syntax.html >> > =A0 >> > =A0 =A0An AFTER trigger is executed only if the BEFORE trigger (if any= ) >> > =A0 =A0and the row operation both execute successfully. >> > =A0 >> > >> > =A0Assume we have AFTER INSERT trigger, which gets executed for a row >> > =A0that was not actually written. It is rather confusing. >> > >> > * Written rows counter will count duplicate (=3Dthrown away) rows. Use= rs >> > =A0may rely on this counter. >> > >> > * As for the auto increment - I can't provide qualified answer atm. Bu= t >> > =A0have a strong feeling that it will not work as expected. >> > >> >> Normally, with Insert ignore, a disk seek is required =A0to verify >> >> uniquness of the element (and thus the return of an error). If I do >> >> not need to return an error, the disk seek is unnecessary. >> > Basically I'm more interested in what happens with a record, within >> > ::write_row(). If a duplicate record gets written actually and thrown >> > away later. Why the above seek is absolutely required? And how the >> > decision to write or throw away a record is done? >> > >> >> It is a performance improvement. >> >> >> >> I realize there are the following downsides: >> >> =A0- the estimate of the number of rows of the table may get inaccura= te >> >> =A0- the statement will not properly return the number of duplicates = found >> >> >> >> Are there other downsides? >> > Yes, there're: >> > * Rather minor, but when there was actually no row written by INSERT >> > =A0IGNORE, query cache for this table will get invalidated. >> > >> > * If we get success from an engine, row will get binlogged, assuming >> > =A0binlog-format=3Drow. Will likely fail on slave. >> > >> > * There're more potential side-effects, since this looks like =A0API >> > =A0violation. >> > >> > Regards, >> > Sergey >> > >> >> -Zardosht >> >> >> >> On Thu, Feb 25, 2010 at 1:55 PM, Sergey Vojtovich wrot= e: >> >> > Hi Zardosht, >> >> > >> >> > I believe this is rather a fragile solution. I can think about prob= lems >> >> > at least with counters, wrong auto_increment. Also triggers will ge= t >> >> > processed for this non-inserted record. >> >> > >> >> > Just out of curiosity, what kind of optimization you're trying to >> >> > achieve? >> >> > >> >> > Regards, >> >> > Sergey >> >> > >> >> > On Wed, Feb 24, 2010 at 01:20:12PM -0500, Zardosht Kasheff wrote: >> >> >> Ok, so if I understand this correctly, the following if clause: >> >> >> >> >> >> thd->lex->ignore && thd->lex->duplicates =3D=3D SQL_ERROR >> >> >> >> >> >> if it evaluates to true, I can assume that the user is doing "inse= rt ignore". >> >> >> >> >> >> How I plan to use this is as follows. Suppose I have a table, (a i= nt, >> >> >> b int, primary key (a)), that has the element (1,1) inserted. >> >> >> >> >> >> Given the following statements: >> >> >> insert ignore into table values (2,2), (1,2); >> >> >> select * from table; >> >> >> >> >> >> The expected result of the select * is to return (1,1),(2,2). The >> >> >> insert will make two calls to handler::write_row. For (2,2), >> >> >> handler::write_row will insert the element, and return success. Wi= th >> >> >> other storage engines, for (1,2), handler::write_row will insert >> >> >> nothing and return HA_ERR_FOUND_DUPP_KEY. What my change will do i= s >> >> >> still insert nothing, but instead of returning HA_ERR_FOUND_DUPP_K= EY, >> >> >> it will return success. >> >> >> >> >> >> I assume there is nothing wrong with this? >> >> >> >> >> >> Thanks >> >> >> -Zardosht >> >> >> >> >> >> On Wed, Feb 24, 2010 at 1:05 PM, Venu Kalyan wr= ote: >> >> >> > Hi ! >> >> >> > >> >> >> > On Tue, Feb 23, 2010 at 3:23 PM, Zardosht Kasheff wrote: >> >> >> >> But does this distinguish the commands "insert ignore" and "ins= ert on >> >> >> >> duplicate key update"? >> >> >> > >> >> >> > Actually its contrary and unknown behavior even though MySQL all= ows >> >> >> > the synax to use IGNORE and ON DUPLICATE KEY / REPLACE together = in >> >> >> > single statement, and it should in general through an error when= used >> >> >> > together as server does not which one to prefer. >> >> >> > >> >> >> > In this case (Ignore and ON DUPLICATE KEY together in single ins= ert) >> >> >> > ON DUPLICATE KEY takes precedence in the code due to the followi= ng >> >> >> > condition.. >> >> >> > >> >> >> > if (duplic !=3D DUP_ERROR || ignore) >> >> >> > =A0 =A0 =A0table->file->extra(HA_EXTRA_NO_IGNORE_DUP_KEY); >> >> >> > >> >> >> > For your case; =A0you can validate this using : >> >> >> > >> >> >> > if (thd->lex->ignore && thd->lex->duplicates =3D=3D SQL_ERROR) { >> >> >> > =A0// safe to ignore my dups >> >> >> > } >> >> >> > >> >> >> > if (thd->lex->ignore && thd->lex->duplicates !=3D SQL_ERROR) { >> >> >> > // used ignore together with IGNORE and ON KEY DUPLICATE / REPLA= CE >> >> >> > statements, I will decide what needs to be done here >> >> >> > } >> >> >> > >> >> >> > Hope this helps. >> >> >> > >> >> >> > Thanks >> >> >> > Venu Anuganti >> >> >> > >> >> >> >> >> >> >> >> On Tue, Feb 23, 2010 at 6:20 PM, Venu Kalyan = wrote: >> >> >> >>> You should look for both sql_command type and lex->ignore valu= e (both are >> >> >> >>> part of thd->lex->sql_command and thd->lex->ignore) to determi= ne if the >> >> >> >>> storage engine can safely return success on dup cases. But rem= ember, doing >> >> >> >>> so will result in wrong stats (number of rows inserted/updated= ) >> >> >> >>> >> >> >> >>> Apart from insert/replace, load data also uses the same logic. >> >> >> >>> >> >> >> >>> On Tue, Feb 23, 2010 at 12:07 PM, Zardosht Kasheff >> >> >> >>> wrote: >> >> >> >>>> >> >> >> >>>> Hello, >> >> >> >>>> >> >> >> >>>> Does anyone have any guidance here? >> >> >> >>>> >> >> >> >>>> Thanks >> >> >> >>>> -Zardosht >> >> >> >>>> >> >> >> >>>> On Wed, Feb 17, 2010 at 3:47 PM, Zardosht Kasheff >> >> >> >>>> wrote: >> >> >> >>>> > So it seems that just looking at this flag will not work, b= ecause >> >> >> >>>> > INSERT IGNORE and INSERT...ON DUPLICATE KEY UPDATE both set= this flag. >> >> >> >>>> > >> >> >> >>>> > I guess what I am wondering is the following, is there any = way for the >> >> >> >>>> > handler to know for sure that the user has called "insert i= gnore", and >> >> >> >>>> > if so, would it be ok for the handler to return success eve= n if >> >> >> >>>> > internally a duplicate key is found? >> >> >> >>>> > >> >> >> >>>> > Thanks >> >> >> >>>> > -Zardosht >> >> >> >>>> > >> >> >> >>>> > On Fri, Feb 12, 2010 at 5:08 PM, Zardosht Kasheff >> >> >> >>>> > wrote: >> >> >> >>>> >> Hello all, >> >> >> >>>> >> >> >> >> >>>> >> I am trying to implement an optimization in our storage en= gine if >> >> >> >>>> >> HA_EXTRA_IGNORE_DUP_KEY is set. Would it be ok if handler:= :write_row >> >> >> >>>> >> returns success if this flag is set, the sql statement is = NOT >> >> >> >>>> >> SQLCOM_REPLACE or SQLCOM_REPLACE_SELECT, and the row is no= t inserted >> >> >> >>>> >> if a duplicate is otherwise there. >> >> >> >>>> >> >> >> >> >>>> >> Basically, the question is if this flag is set, is it ok t= o not return >> >> >> >>>> >> HA_ERR_FOUND_DUPP_KEY in the cases where the statement is = not "replace >> >> >> >>>> >> into". >> >> >> >>>> >> >> >> >> >>>> >> Thanks >> >> >> >>>> >> -Zardosht >> >> >> >>>> >> >> >> >> >>>> > >> >> >> >>>> >> >> >> >>>> -- >> >> >> >>>> MySQL Internals Mailing List >> >> >> >>>> For list archives: http://lists.mysql.com/internals >> >> >> >>>> To unsubscribe: =A0 =A0http://lists.mysql.com/internals?unsub= =3Dmydb08@stripped >> >> >> >>>> >> >> >> >>> >> >> >> >>> >> >> >> >> >> >> >> > >> >> >> >> >> >> -- >> >> >> MySQL Internals Mailing List >> >> >> For list archives: http://lists.mysql.com/internals >> >> >> To unsubscribe: =A0 =A0http://lists.mysql.com/internals?unsub=3Dsv= oj@stripped >> >> >> >> >> > >> >> > -- >> >> > Sergey Vojtovich >> >> > MySQL AB, Software Engineer >> >> > Izhevsk, Russia, www.mysql.com >> >> > >> >> >> >> -- >> >> MySQL Internals Mailing List >> >> For list archives: http://lists.mysql.com/internals >> >> To unsubscribe: =A0 =A0http://lists.mysql.com/internals?unsub=3Dsvoj@= sun.com >> >> >> > >> > -- >> > Sergey Vojtovich >> > MySQL AB, Software Engineer >> > Izhevsk, Russia, www.mysql.com >> > > > -- > Sergey Vojtovich > MySQL AB, Software Engineer > Izhevsk, Russia, www.mysql.com >