Hi, Davi!
On Oct 16, Davi Arnaut wrote:
> 2707 Davi Arnaut 2008-10-16
> Bug#28323: Server crashed in xid cache operations
>
> The problem was that the server did not robustly handle a
> unilateral roll back issued by the Resource Manager (RM)
> due to a resource deadlock within the transaction branch.
> By not acknowledging the roll back, the server (TM) would
> eventually corrupt the XA transaction state and crash.
>
> The solution is to mark the transaction as rollback-only
> if the RM indicates that it rolled back its branch of the
> transaction.
> === modified file 'sql/sql_parse.cc'
> --- a/sql/sql_parse.cc 2008-10-02 11:57:52 +0000
> +++ b/sql/sql_parse.cc 2008-10-16 14:50:38 +0000
> @@ -90,9 +90,54 @@ const char *command_name[]={
> };
>
> const char *xa_state_names[]={
> - "NON-EXISTING", "ACTIVE", "IDLE", "PREPARED"
> + "NON-EXISTING", "ACTIVE", "IDLE", "PREPARED", "ROLLBACK ONLY"
> };
>
> +/**
> + Mark a XA transaction as rollback-only if the RM unilaterally
> + rolled back the transaction branch.
> +
> + @note If a rollback was requested by the RM, this function sets
> + the appropriate rollback error code and transits the state
> + to XA_ROLLBACK_ONLY.
please explain also the return values here.
> +*/
> +static bool xa_trans_rolled_back(THD *thd, XID_STATE *xid_state)
Do you need THD in here ?
> +{
> + if (xid_state->rm_error)
> + {
> + switch (xid_state->rm_error) {
> + case ER_LOCK_WAIT_TIMEOUT:
> + my_error(ER_XA_RBTIMEOUT, MYF(0));
> + break;
> + case ER_LOCK_DEADLOCK:
> + my_error(ER_XA_RBDEADLOCK, MYF(0));
> + break;
> + default:
> + my_error(ER_XA_RBROLLBACK, MYF(0));
> + }
> + xid_state->xa_state= XA_ROLLBACK_ONLY;
> + }
> +
> + return (xid_state->xa_state == XA_ROLLBACK_ONLY);
> +}
> +
> +/**
> + Rollback work done on behalf of at ransaction branch.
> +*/
> +static bool xa_trans_rollback(THD *thd)
> +{
> + bool status= test(ha_rollback(thd));
> +
> + thd->options&= ~(ulong) OPTION_BEGIN;
> + thd->transaction.all.modified_non_trans_table= FALSE;
> + thd->server_status&= ~SERVER_STATUS_IN_TRANS;
> + xid_cache_delete(&thd->transaction.xid_state);
> + thd->transaction.xid_state.xa_state= XA_NOTR;
> + thd->transaction.xid_state.rm_error= 0;
> +
> + return status;
> +}
> +
> #ifndef EMBEDDED_LIBRARY
> static bool do_command(THD *thd);
> #endif // EMBEDDED_LIBRARY
> @@ -5017,6 +5062,8 @@ create_sp_error:
> if (thd->transaction.xid_state.xa_state == XA_IDLE &&
> thd->lex->xa_opt == XA_RESUME)
> {
> + if (xa_trans_rolled_back(thd, &thd->transaction.xid_state))
> + break;
Move this check down, after the check for xid.eq(...).
I don't think you should return ER_XA_RBROLLBACK if someone did XA START
for some arbitrary, unrelated XID.
On the second thought... Can you reach XA_IDLE state at all with
non-zero xid.rm_error ? Doesn't xid.rm_error guarantee that the state
will be XA_ROLLBACK_ONLY ?
> if (! thd->transaction.xid_state.xid.eq(thd->lex->xid))
> {
> my_error(ER_XAER_NOTA, MYF(0));
> @@ -5074,6 +5122,8 @@ create_sp_error:
> my_error(ER_XAER_NOTA, MYF(0));
> break;
> }
> + if (xa_trans_rolled_back(thd, &thd->transaction.xid_state))
> + break;
That's in case SQLCOM_XA_END as far as I understand. Ok.
> thd->transaction.xid_state.xa_state=XA_IDLE;
> send_ok(thd);
> break;
> @@ -5105,6 +5155,11 @@ create_sp_error:
> XID_STATE *xs=xid_cache_search(thd->lex->xid);
This is SQLCOM_XA_COMMIT. Don't you need to add this if(xa_trans_rolled_back
also to case SQLCOM_XA_PREPARE ?
> if (!xs || xs->in_thd)
> my_error(ER_XAER_NOTA, MYF(0));
> + else if (xa_trans_rolled_back(thd, xs))
> + {
> + xa_trans_rollback(thd);
> + break;
> + }
Hmm, you're inside
if (!thd->transaction.xid_state.xid.eq(thd->lex->xid))
that is, for the case when one does XA COMMIT for another connection.
You cannot use xa_trans_rollback() here, you need to use
ha_commit_or_rollback_by_xid().
> else
> {
> ha_commit_or_rollback_by_xid(thd->lex->xid, 1);
> @@ -5157,6 +5217,11 @@ create_sp_error:
> XID_STATE *xs=xid_cache_search(thd->lex->xid);
> if (!xs || xs->in_thd)
> my_error(ER_XAER_NOTA, MYF(0));
> + else if (xa_trans_rolled_back(thd, xs))
> + {
> + xa_trans_rollback(thd);
> + break;
Same here about ha_commit_or_rollback_by_xid()
> + }
> else
> {
> ha_commit_or_rollback_by_xid(thd->lex->xid, 0);
Regards / Mit vielen Grüßen,
Sergei
--
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@stripped>
/ /|_/ / // /\ \/ /_/ / /__ Principal Software Engineer/Server Architect
/_/ /_/\_, /___/\___\_\___/ Sun Microsystems GmbH, HRB München 161028
<___/ Sonnenallee 1, 85551 Kirchheim-Heimstetten
Geschäftsführer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Häring