Hi Serg,
On 10/20/08 2:01 PM, Sergei Golubchik wrote:
> Hi, Davi!
>
> On Oct 16, Davi Arnaut wrote:
>> 2707 Davi Arnaut 2008-10-16
>> Bug#28323: Server crashed in xid cache operations
>>
>> The problem was that the server did not robustly handle a
>> unilateral roll back issued by the Resource Manager (RM)
>> due to a resource deadlock within the transaction branch.
>> By not acknowledging the roll back, the server (TM) would
>> eventually corrupt the XA transaction state and crash.
>>
>> The solution is to mark the transaction as rollback-only
>> if the RM indicates that it rolled back its branch of the
>> transaction.
>
>> === modified file 'sql/sql_parse.cc'
>> --- a/sql/sql_parse.cc 2008-10-02 11:57:52 +0000
>> +++ b/sql/sql_parse.cc 2008-10-16 14:50:38 +0000
>> @@ -90,9 +90,54 @@ const char *command_name[]={
>> };
>>
>> const char *xa_state_names[]={
>> - "NON-EXISTING", "ACTIVE", "IDLE", "PREPARED"
>> + "NON-EXISTING", "ACTIVE", "IDLE", "PREPARED", "ROLLBACK ONLY"
>> };
>>
>> +/**
>> + Mark a XA transaction as rollback-only if the RM unilaterally
>> + rolled back the transaction branch.
>> +
>> + @note If a rollback was requested by the RM, this function sets
>> + the appropriate rollback error code and transits the state
>> + to XA_ROLLBACK_ONLY.
>
> please explain also the return values here.
OK.
>> +*/
>> +static bool xa_trans_rolled_back(THD *thd, XID_STATE *xid_state)
>
> Do you need THD in here ?
No, it's a left over.
>> +{
>> + if (xid_state->rm_error)
>> + {
>> + switch (xid_state->rm_error) {
>> + case ER_LOCK_WAIT_TIMEOUT:
>> + my_error(ER_XA_RBTIMEOUT, MYF(0));
>> + break;
>> + case ER_LOCK_DEADLOCK:
>> + my_error(ER_XA_RBDEADLOCK, MYF(0));
>> + break;
>> + default:
>> + my_error(ER_XA_RBROLLBACK, MYF(0));
>> + }
>> + xid_state->xa_state= XA_ROLLBACK_ONLY;
>> + }
>> +
>> + return (xid_state->xa_state == XA_ROLLBACK_ONLY);
>> +}
>> +
>> +/**
>> + Rollback work done on behalf of at ransaction branch.
>> +*/
>> +static bool xa_trans_rollback(THD *thd)
>> +{
>> + bool status= test(ha_rollback(thd));
>> +
>> + thd->options&= ~(ulong) OPTION_BEGIN;
>> + thd->transaction.all.modified_non_trans_table= FALSE;
>> + thd->server_status&= ~SERVER_STATUS_IN_TRANS;
>> + xid_cache_delete(&thd->transaction.xid_state);
>> + thd->transaction.xid_state.xa_state= XA_NOTR;
>> + thd->transaction.xid_state.rm_error= 0;
>> +
>> + return status;
>> +}
>> +
>> #ifndef EMBEDDED_LIBRARY
>> static bool do_command(THD *thd);
>> #endif // EMBEDDED_LIBRARY
>> @@ -5017,6 +5062,8 @@ create_sp_error:
>> if (thd->transaction.xid_state.xa_state == XA_IDLE&&
>> thd->lex->xa_opt == XA_RESUME)
>> {
>> + if (xa_trans_rolled_back(thd,&thd->transaction.xid_state))
>> + break;
>
> Move this check down, after the check for xid.eq(...).
>
> I don't think you should return ER_XA_RBROLLBACK if someone did XA START
> for some arbitrary, unrelated XID.
>
> On the second thought... Can you reach XA_IDLE state at all with
> non-zero xid.rm_error ? Doesn't xid.rm_error guarantee that the state
> will be XA_ROLLBACK_ONLY ?
True. I guess we can drop the check...
>
>> if (! thd->transaction.xid_state.xid.eq(thd->lex->xid))
>> {
>> my_error(ER_XAER_NOTA, MYF(0));
>> @@ -5074,6 +5122,8 @@ create_sp_error:
>> my_error(ER_XAER_NOTA, MYF(0));
>> break;
>> }
>> + if (xa_trans_rolled_back(thd,&thd->transaction.xid_state))
>> + break;
>
> That's in case SQLCOM_XA_END as far as I understand. Ok.
>
>> thd->transaction.xid_state.xa_state=XA_IDLE;
>> send_ok(thd);
>> break;
>> @@ -5105,6 +5155,11 @@ create_sp_error:
>> XID_STATE *xs=xid_cache_search(thd->lex->xid);
>
> This is SQLCOM_XA_COMMIT. Don't you need to add this if(xa_trans_rolled_back
> also to case SQLCOM_XA_PREPARE ?
>
>> if (!xs || xs->in_thd)
>> my_error(ER_XAER_NOTA, MYF(0));
>> + else if (xa_trans_rolled_back(thd, xs))
>> + {
>> + xa_trans_rollback(thd);
>> + break;
>> + }
>
> Hmm, you're inside
>
> if (!thd->transaction.xid_state.xid.eq(thd->lex->xid))
>
> that is, for the case when one does XA COMMIT for another connection.
> You cannot use xa_trans_rollback() here, you need to use
> ha_commit_or_rollback_by_xid().
Right.
>> else
>> {
>> ha_commit_or_rollback_by_xid(thd->lex->xid, 1);
>> @@ -5157,6 +5217,11 @@ create_sp_error:
>> XID_STATE *xs=xid_cache_search(thd->lex->xid);
>> if (!xs || xs->in_thd)
>> my_error(ER_XAER_NOTA, MYF(0));
>> + else if (xa_trans_rolled_back(thd, xs))
>> + {
>> + xa_trans_rollback(thd);
>> + break;
>
> Same here about ha_commit_or_rollback_by_xid()
Actually, I should just eliminate this xa_trans_rollback as the usual
path will rollback anyway.
-- Davi