List:Commits« Previous MessageNext Message »
From:Chuck Bell Date:August 12 2008 1:22pm
Subject:RE: bzr commit into mysql-6.0-backup branch (cbell:2676) Bug#33414
View as plain text  
Hi Oystein,

My replies to your questions... 

> Maybe not the right time to raise this question, but why do 
> we need a separate timeout for the ddl blocker?  Why not use 
> the existing table_lock_wait_timeout for this?  Seems a bit 
> cumbersome to have to set several different timeout variables 
> in order to control when a query times out.

This timeout -- as Peter has so eloquently stated -- affects only DDL
operations that occur during backup. Hence the name change from
ddl_wait_timeout. These DDL operations are blocked by the DDL blocker so any
other timeout does not apply -- the DDL block occurs before tables are
locked so not even the table lock timeout will work. Since the DDL blocker
(currently) has no timeout feature, a DDL operation blocked by backup is
blocked for as long as backup runs. Thus, if a backup runs for a very long
time, the DDL operations are stuck. We needed a way to allow the DDL
operations to fail if backup runs for a long time (or is hung).

> I have not been able to understand the mechanism used for 
> variables, but since you are following the same pattern as 
> for existing variables, I guess it is OK.  However, it 
> puzzles me a bit that I am not able to find any code that 
> instantiate the new sys_var_backup_wait_timeout class nor 
> implements the declared sys_backup_wait_timeout function.  
 
Welcome to one of the most treacherous sections of the MySQL code! The
variable code is second to the optimizer for developer fatalities.

Ok, all jocularity aside...

Instantiation is in set_var.cc @ 230.

The function sys_backup_wait_timeout is in the same place. It is of type
sys_var_backup_wait_timeout class.

> In addition, the following note in set_var.cc concerns me:
>    @note
>      Be careful with var->save_result: sys_var::check() only updates
>      ulonglong_value; so other members of the union are 
> garbage then; to use
>      them you must first assign a value to them (in specific 
> ::check() for
>      example).
> 
> I see that sys_var_backup_wait_timeout::update accesses 
> save_result.ulong_value.  Is the above note relevant in that context?

I don't think so since we do not use it for anything else. Some variables
could be used for more than one type. We only use backup time out as ulong.

> Another concern is that the ddl_blocker seems very brittle 
> with respect to errors.  It seems to me that in case of 
> errors in DDL operations, it may happen that only one of the 
> start_DDL()/end_DDL() functions are called.  This is a more 
> general problem than related to this patch, but I think this 
> patch adds new risks in this area (see comments below).

Care was taken when the DDL blocker was written to ensure the pair of
start/end methods were called. Take a look in sql_parse where most of these
occur and you will see the end method is called in the event of an error.

It is also true that the ddl blocker is not intended to be long lived. Once
the runtime team gets their longer term plans in motion, the ddl blocker
will be made obsolete and removed. So we need to strive for a compromise of
what is good enough for backup and perfection for the server general
purpose.

> > === added file 'mysql-test/t/backup_timeout.test'
> > --- a/mysql-test/t/backup_timeout.test	1970-01-01 
> 00:00:00 +0000
> > +++ b/mysql-test/t/backup_timeout.test	2008-08-08 
> 14:07:00 +0000
> > @@ -0,0 +1,210 @@
> > +#
> > +# This test is for the DDL blocker timeout feature.
> > +# TODO : Add a native driver to the test when one becomes 
> available # 
> > +# Notes
> > +#   You must use a dedicated connection for getting and 
> releasing locks.
> > +# Do not issue a get_lock() or release_lock() in the same 
> connection 
> > +# (thread) as code that calls BACKUP_BREAKPOINT(). Using the same 
> > +connection # to get/release locks and run code that issues 
> > +BACKUP_BREAKPOINTs will result # in an assertion using 
> > +DEBUG_ASSERT(thd->ull == NULL) from debug_sync_point() # 
> in item_func.cc.
> > +#
> 
> I think it would be better if the above note was written with 
> the context of the operations performed in this test script.  
> People reading and writing test scripts do not necessarily 
> know how get_lock(), release_lock(), BACKUP_BREAKPOINT(), and
> debug_sync_point() maps to operations in this script.

These comments are obsolete. I will remove them.

> > +--source include/have_innodb.inc
> > +--source include/have_debug.inc
> > +--source include/not_embedded.inc
> 
> Don't you neeed have_debug_sync.inc too?

Yep. Will add.

> > +SET DEBUG_SYNC= 'reset';
> > +
> > +#
> > +# Remove backup files (if they exist) #
> > +
> > +--error 0,1
> > +--remove_file 
> $MYSQLTEST_VARDIR/master-data/bup_ddl_blocker_orig.bak;
> 
> Why delete this file?  It is not used in this test.

Obsolete. Will remove.

> > +--error 0,1
> > +--remove_file $MYSQLTEST_VARDIR/master-data/bup_ddl_blocker.bak;
> 
> This file is deleted before each backup.  Why delete it here too?

As a safety measure in case something fails. It ensures a complete cleanup.

> > +#
> > +# Connections used in this test
> > +#
> > +# con1       used to create data, load data, and run the backup 
> > +# con2-con5  used for DDL statements: 2 before backup and 
> 2 during backup
> > +# con6       used for setting and releasing breakpoints
> > +#
> 
> It is con5 that is used for setting and releasing breakpoints 
> in this test.

Ok, will corrct.

> > +connect (con1,localhost,root,,);
> > +connect (con2,localhost,root,,);
> > +connect (con3,localhost,root,,);
> > +connect (con4,localhost,root,,);
> > +connect (con5,localhost,root,,);
> > +connect (con6,localhost,root,,);
> > +
> > +connection con1;
> > +
> > +# Create data for this test and tailor it to the test.
> > +--source include/backup_ddl_create_data.inc
> > +
> > +--disable_warnings
> > +DROP TABLE IF EXISTS bup_ddl_blocker.t1, bup_ddl_blocker.t2,
> > +                     bup_ddl_blocker.t3, bup_ddl_blocker.t4; 
> > +--enable_warnings
> 
> What is the point in first running a script to create some 
> tables and then drop them right afterwards?

The script is the same for all of the test cases. It saves time and much
duplication. Some tests require all tables but this one doesn't need them at
all (a special case). If you are completely turned off by this I can copy
and paste but that will only add to confusion if someone needs to modify the
script.

> > +--echo Set ddl timeout to 1 second
> > +SET backup_wait_timeout = 1;
> > +SHOW VARIABLES LIKE 'backup_wait%';
> > +
> > +--echo con2: Try a ddl operation and watch it expire --error 
> > +ER_DDL_TIMEOUT CREATE TABLE bup_ddl_blocker.t2 (col_a CHAR(40)) 
> > +ENGINE=MEMORY;
> 
> Would it be possible to check that the timing is right?  That 
> is, that it times out after approx. 1 sec. and not 10?

I really don't think so. The timing aspect in mysql testing is really
difficult to predict. On some machines, it works correctly but on others the
timing can be so bad several minutes could go by before the processor
(simulated in VM) gets time from the clock.

> > +# Part C - test default behavior: timeout if 
> backup_wait_timeout = 0
> 
> What do you mean by default behavior?  Default for 
> backup_wait_timeout is 50.

Obsolete comment. At one time the default was 0. I will change this.

> >    pthread_mutex_lock(&THR_LOCK_DDL_is_blocked);
> >    thd->enter_cond(&COND_DDL_blocker, &THR_LOCK_DDL_is_blocked,
> >                    "DDL blocker: DDL is blocked");
> > -  while (DDL_blocked && !thd->DDL_exception)
> > -    pthread_cond_wait(&COND_DDL_blocker, &THR_LOCK_DDL_is_blocked);
> > -  start_DDL();
> > -  thd->exit_cond("DDL blocker: Ok to run DDL");
> > -  DEBUG_SYNC(thd, "after_start_ddl");
> > -  DBUG_RETURN(TRUE);
> > +  while (DDL_blocked && !thd->DDL_exception && (ret == 0)) 
> {
> > +    if (thd->backup_wait_timeout == 0)
> > +      ret = -1;
> > +    else
> > +      ret= pthread_cond_timedwait(&COND_DDL_blocker, 
> &THR_LOCK_DDL_is_blocked,
> > +                                  &ddl_timeout);  }  
> > + thd->exit_cond("DDL blocker: DDL is not blocked");  if 
> (ret == 0)  {
> > +    start_DDL();
> > +    DEBUG_SYNC(thd, "after_start_ddl");  }  else
> > +    my_error(ER_DDL_TIMEOUT, MYF(0), thd->query);
> > +
> 
> This assumes that all errors in pthread_cond_timedwait is timeouts.
> At least in theory, pthread_cond_timedwait may return other 
> errors than ETIMEDOUT, but I guess it is not a big problem 
> that any such errors are reported as a timeout.

Ok

> > === modified file 'sql/sql_parse.cc'
> > --- a/sql/sql_parse.cc	2008-07-14 14:56:49 +0000
> > +++ b/sql/sql_parse.cc	2008-08-08 14:07:00 +0000
> > @@ -2326,7 +2326,11 @@ mysql_execute_command(THD *thd)
> >        TABLE in the same way. That way we avoid that a new table is
> >        created during a gobal read lock.
> >      */
> > -    DDL_blocker->check_DDL_blocker(thd);
> > +    if (!DDL_blocker->check_DDL_blocker(thd))
> > +    {
> > +      res= 1;
> > +      goto end_with_restore_list;
> > +    }
> 
> The got leads to code where end_DDL is called.  However, if 
> DDL op timed out, start_DDL has not been called.  This can 
> lead to accounting problems wrt DDL_blocks counter.

No, in check_ddl_blocker(), it either returns success or it fails with a
timeout error. There are no other error conditions that could result in a
failed return.

See:
  if (ret -- 0)
  ...
  else
    my_error()

> >      if (!thd->locked_tables_mode &&
> >          !(need_start_waiting= 
> !wait_if_global_read_lock(thd, 0, 1)))
> >      {
> > @@ -2467,7 +2471,8 @@ end_with_restore_list:
> >      table without having to do a full rebuild.
> >    */
> >    {
> > -    DDL_blocker->check_DDL_blocker(thd);
> > +    if (!DDL_blocker->check_DDL_blocker(thd))
> > +      goto error;
> 
> This error handling is OK for the call to check_DDL_blocker, 
> but if later errors causes goto error, end_DDL will not be 
> called even if check_DDL_blocker called start_DDL.  (This is 
> not introduced by this patch.  Maybe another bug report 
> should be created for this problem.)

I see what you are talking about, but in this case there is an error and the
query is going to fail. I agree this is outside the scope of this bug
(BUG#33414). However, if feel strongly this is wrong then please open a new
bug on the problem with a test that shows how to produce the error
conditions.

Chuck

Thread
bzr commit into mysql-6.0-backup branch (cbell:2676) Bug#33414Chuck Bell8 Aug
  • Re: bzr commit into mysql-6.0-backup branch (cbell:2676) Bug#33414Jørgen Løland8 Aug
  • Re: bzr commit into mysql-6.0-backup branch (cbell:2676) Bug#33414Øystein Grøvlen12 Aug
    • RE: bzr commit into mysql-6.0-backup branch (cbell:2676) Bug#33414Chuck Bell12 Aug
      • Re: bzr commit into mysql-6.0-backup branch (cbell:2676) Bug#33414Peter Gulutzan12 Aug
      • Re: bzr commit into mysql-6.0-backup branch (cbell:2676) Bug#33414Øystein Grøvlen13 Aug
  • RE: bzr commit into mysql-6.0-backup branch (cbell:2676) Bug#33414Chuck Bell12 Aug