List:Commits« Previous MessageNext Message »
From:Konstantin Osipov Date:July 25 2007 10:44pm
Subject:Re: bk commit into 5.1 tree (malff:1.2516) BUG#25422
View as plain text  
Hello Marc,

Please find my review inline.

* marc.alff@stripped <marc.alff@stripped> [07/07/20 06:03]:

> ChangeSet@stripped, 2007-07-19 19:43:15-06:00, malff@weblab.(none) +31 -0
>   WL#3984 (Revise locking of mysql.general_log and mysql.slow_log)
>   Bug#25422 (Hang with log tables)
>   Bug 29129 (Resetting general_log while the GLOBAL READ LOCK is set causes
>              a deadlock)
>   
>   NOTE TO REVIEWER:
>   There is still an opened issue with this patch, see mysql-tests/t/disabled.def
>   The test rpl_optimize dead locks, but it's unclear if this is because:
>   - a new bug was introduced by this patch
>   - an existing unrelated bug was exposed by this patch
>   To discuss during review, this point needs further investigation.
>   
>   ------
>   
>   Prior to this fix, the server would hang when performing concurrent
>   ALTER TABLE or TRUNCATE TABLE statements against the LOG TABLES,
>   which are mysql.general_log and mysql.slow_log.
>   
>   The root cause traces to the following code:
>   in sql_base.cc, open_table()
>     if (table->in_use != thd)
>     {
>       /* wait_for_condition will unlock LOCK_open for us */
>       wait_for_condition(thd, &LOCK_open, &COND_refresh);
>     }
>   The problem with this code is that the current implementation of the
>   LOGGER creates 'fake' THD objects, like
>   - Log_to_csv_event_handler::general_log_thd
>   - Log_to_csv_event_handler::slow_log_thd
>   which are not associated to a real thread running in the server,
>   so that waiting for these non-existing threads to release table locks
>   cause the dead lock.
>   
>   In general, the design of Log_to_csv_event_handler does not fit into the
>   general architecture of the server, so that the concept of general_log_thd
>   and slow_log_thd has to be abandoned:
>   - this implementation does not work with table locking
>   - it will not work with commands like SHOW PROCESSLIST
>   - having the log tables always opened does not integrate well with DDL
>   operations / FLUSH TABLES / SET GLOBAL READ_ONLY
>   
>   With this patch, the fundamental design of the LOGGER has been changed to:
>   - always open and close a log table when writing a log
>   - remove totally the usage of fake THD objects
>   - clarify how locking of log tables is implemented in general.

OK, below is a very relevant and useful information for the reviewer --
it's very nice that you catalogued all significant changes
performed in this patch. But pushed as is it can bloat 'bk
changes' history. 
Please move this part of the comment to the worklog task. In the main
changeset comment we should keep the description of the problem, the
principles of the solution, and incompatible changes, if any.

>   Changing the locking design for log tables had some impact in several areas.
>   The total change consist of:
>   
>   Part I: Technical changes
>   

<cut>

Just a remark, since you're going to be moving this out anyway,
there is no part II.

> +# check that FLUSH LOGS does not flush the log tables
>  #

I don't know why it should not, it perhaps should.
But it's OK to leave it as is in your patch, since it contains
a different significant change that we should focus on (so agree
with your change and the comment for it).

We can change this later.

> +# check that FLUSH TABLES does flush the log tables
>  #

OK, that's fine.

> +reap;
> +connection addconroot3;
> +reap;
> +connection addconroot4;
> +reap;
> +connection addconroot5;
> +reap;
> +connection addconroot6;
> +reap;
> +connection addconroot7;
> +reap;
> +connection addconroot8;
> +reap;
> +
> +connection default;
> +
> +drop procedure proc25422_truncate_slow;
> +drop procedure proc25422_truncate_general;
> +drop procedure proc25422_alter_slow;
> +drop procedure proc25422_alter_general;
> +

Please close the connections (addconrootX) here.

> +# make the output deterministic:
> +# the order used in SHOW OPEN TABLES
> +# is too much implementation dependent
> +--disable_ps_protocol
>  flush tables;
>  select Host, User from mysql.user limit 0;
>  select Host, Db from mysql.host limit 0;
>  show open tables from mysql;
> +--enable_ps_protocol

Have you considered disabling the logging temporarily 
instead, so that the general_log table is not present in the
output?

This is ps.test, it tests prepared statements, so we should
leave the test in place if we can.

> --- a/mysql-test/t/show_check.test	2007-06-16 05:14:06 -06:00
> +++ b/mysql-test/t/show_check.test	2007-07-19 19:43:10 -06:00
> @@ -402,6 +402,11 @@ drop table if exists t1;
>  CREATE TABLE txt1(a int);
>  CREATE TABLE tyt2(a int);
>  CREATE TABLE urkunde(a int);
> +
> +# make the output deterministic:
> +# the order used in SHOW OPEN TABLES
> +# is too much implementation dependent
> +--disable_ps_protocol

Same here.

> +
> +
> +int handler::ha_write_row_no_binlog(uchar *buf)
> +{
> +  return write_row(buf);
> +}
> +

Please write a doxygen comment here, some triviality is fine:

/** 
  Write a record to the engine bypassing row-level binary logging.
*/

This API extension should be fine for now, later, when
ha_write_row() does more on top of write_row() than just
binlogging, we will be in trouble. But that's a soluble problem,
so this change is fine.

I discussed it briefly with Sergey, he agrees.

> --- a/sql/lock.cc	2007-06-18 14:13:18 -06:00
> +++ b/sql/lock.cc	2007-07-19 19:43:10 -06:00
> @@ -117,10 +117,84 @@ MYSQL_LOCK *mysql_lock_tables(THD *thd, 
>    MYSQL_LOCK *sql_lock;
>    TABLE *write_lock_used;
>    int rc;
> +  uint i;
> +  bool must_honor_global_lock;
> +  uint system_count;
> +
>    DBUG_ENTER("mysql_lock_tables");
>  
>    *need_reopen= FALSE;
>  
> +  must_honor_global_lock= false;

With this change you:
 * add an additional loop over all the tables
 * check for a log table twice - first when processing the
   command, i.e. in mysql_alter_table, mysql_truncate_table, where
   you set the internal privilege, and the second time here, to check if
   the privilege is set and should be applied.

Please leave the check for the system table in get_lock_data, that
will save one loop, since, as you will see below, I suggest to get
rid of this loop completely.

> +  system_count= 0;
> +  for (i=0 ; i<count; i++)
> +  {
> +    TABLE *t= tables[i];
> +
> +    /* Protect against 'fake' partially initialized TABLE_SHARE */
> +    DBUG_ASSERT(t->s->table_category != TABLE_UNKNOWN_CATEGORY);
> +
> +    /*
> +      Table I/O to performance schema tables is performed
> +      only internally by the server implementation.
> +      When a user is requesting a lock, the following
> +      constraints are enforced:
> +    */
> +    if (t->s->require_write_privileges() &&
> +        !(thd->options & OPTION_INTERNAL_PRIVILEGE))
> +    {

This code leads to duplication of checks:
 * first time you check if it is a log table when you set the
   privilege. The privilege is set only in case the statement
   accesses a log table.
 * the second part of the check is performed here.

The internal privilege can be set based solely on the SQL command
code. Or, we can simply check for the SQL command code here, and
then we don't need to set the privilege at all.

In MySQL we traditionally do not protect against programmers that
violate internal invariants.
Our general architecture is data flow oriented: the
parser and semantic analysis pass prepared objects over to the
locking and the execution subsystem. The locking and execution
subsystem is not made to deal with arbitrary input -- if
misused, e.g. fed semantically invalid input, it blows up, and
not only in case of 'system' or 'performance/log' tables, but also
in case of pre-locking, views, and so on.
The advantage of this approach is a faster system. The
disadvantage is tight coupling, higher maintenance cost.
In any case you can not change this design that was adopted years
ago in a patch, if ever.
As far as I understand, the main reason to keep the checks in the
locking subsystem is a more modular design -- but this argument,
is, unfortunately, diminished by the current state of the art.

Let me try to re-iterate through the list of requirements and see
what alternative solutions to checking are available.

All commands that read-lock log tables (except explicit LOCK
TABLES <log_table> READ) should be supported:

CREATE TABLE LIKE <log table> CREATE TABLE ... SELECT <log_table>,
SHOW FIELDS FROM <log table>, SHOW CREATE TABLE <log table>,
SELECT FROM <log table>, INSERT INTO <t1> SELECT ....

They should be available "online", that is, when logging to tables
is "ON".

Additionally, the following DDL commands should be allowed, as
a mechanism for the system administrator to maintain
log tables:

RENAME, CREATE, DROP, TRUNCATE, ALTER.

Everything else can be prohibited.

Note, that ALTER, strictly speaking, is not necessary -- it can be
substituted with CREATE TABLE LIKE + RENAME, and this approach
should be preferred, since it doesn't keep the log tables locked
for a long time.
But ALTER still may be useful for ALTER TABLE <log_table> ADD/DROP
PARTITION.

RENAME and TRUNCATE have to be allowed online - they will be 
used for log rotation. The same is true for ALTER <partitioning
operation>. 

CREATE and DROP don't need to be available online.

I think we have agreed that this is a minimal set that cannot be
reduced. Except maybe ALTER, if we decide to not pursue the
partitioning capability in this version. The current
implementation also supports REPAIR, OPTIMIZE, ANALYZE - we can
consider adding them to this list, but this is not critical.

To process DDL statements, or, in other words, those statements
that require an exclusive lock, we currently do the
following:

 * acquire an exclusive lock on the logging subsystem
 * if the operation can be performed only when logs are disabled,
   check that this is the case. 
 * perform the operation 
 * release the lock.

With this in mind, I can count three sources of input that
influence the behavior of the server with regard to log tables:

 * whether the query on hand accesses a log table or not
 * whether logging is currently enabled or not
 * whether the SQL command on hand is allowed for log tables or
   not, or, by implication, whether the SQL command requires a
   write lock or not.

We can collect this input solely based on the parsed tree,
or, by querying the parsed tree in conjunction with querying the
data dictionary.

The input is needed to:
 * yield an error in case a combination is not supported
 * yield an error in case a combination is not supported online
 * temporarily acquire a lock on the logging subsystem if the
   operation can only be performed offline, to prevent sudden
   switches of logs to "ON" while the operation is in progress.

My strong preference is that, regardless of what is the final set
of supported operations, or the semantics of the performed checks, 
we analyze the environment and perform checks and actions only in
one place -- i.e.  either in mysql_lock_tables, or
in, e.g., mysql_execute_command.

Moreover, if we have to discard some operations in online mode, we
at least need the check for the current mode to be
performed before the tables are locked -- since there we also 
should acquire a lock on the logging subsystem for the duration of
the operation.

To me that means that, provided we decide to allow some statements in
offline mode only, all the checks should reside in
mysql_execute_command. 

But alternatively, we can choose to allow all operations online.

Consider this, with the new design of log tables locking, locking
the entire logging subsystem to perform a TRUNCATE or DROP is
unnecessary -- the existing isolation guarantees provided by
table-level locking should be satisfactory.

Since this is the case, I believe we should remove all the code
that detects that the operation is performed on the log table
and locks the logging subsystem, and let the operation proceed
fully online.

RENAME and DROP have some specificity: if a log table is dropped,
we'll get an error in open_performance_table on attempt to perform 
a write. In this case I suggest to automatically disable logging
to this table for subsequent statements.
We will also have to automatically disable logging-to-tables when
write to a log table fails, e.g. when a table is corrupted, so
this functionality is necessary anyway.

If we decide to choose always-online alternative, the check in
get_lock_data may look as follows:

 if (table->s->table_category == TABLE_CATEGORY_PERFORMANCE &&
     !(sql_command_flags[thd->sql_command] & ALLOWED_WITH_PERF_TABLE))
 {
   my_error(...);
   goto err;
 }

 This approach is both compact and robust, but only applicable if
we decide to support all operations online. Otherwise, I'd prefer
to implement the checks outside the locking subsystem.

In both cases the extra loop can and should be removed.

> +      /*
> +        A user should not be able to prevent writes,
> +        since this would be a DOS attack.
> +      */
> +      if (t->reginfo.lock_type == TL_READ_NO_INSERT)
> +      {
> +        my_error(ER_CANT_READ_LOCK_LOG_TABLE, MYF(0));
> +        DBUG_RETURN(0);
> +      }
> +      /*
> +        A user should not be able to insert arbitrary
> +        data into log tables.
> +      */
> +      if (t->reginfo.lock_type >= TL_READ_NO_INSERT)
> +      {
> +        my_error(ER_CANT_WRITE_LOCK_LOG_TABLE, MYF(0));
> +        DBUG_RETURN(0);
> +      }
> +      /*
> +        A user should not be able to hold a lock of any
> +        kind in the session, this would be a DOS attack.
> +      */
> +      if (thd->lex->sql_command == SQLCOM_LOCK_TABLES)
> +      {
> +        my_error(ER_CANT_READ_LOCK_LOG_TABLE, MYF(0));
> +        DBUG_RETURN(0);
> +      }

We don't have to return a different error message each time
someone misuses a log table. One error message and one check is
enough.

> +    }
> +
> +    if ((t->s->table_category == TABLE_CATEGORY_SYSTEM) &&
> +        (t->reginfo.lock_type >= TL_WRITE_ALLOW_WRITE))
> +    {
> +      system_count++;
> +    }
> +
> +    if (t->s->honor_global_locks())
> +    {
> +      must_honor_global_lock= true;
> +    }

Please use an existing mechanism to allow logging under the global
read lock - MYSQL_LOCK_IGNORE_GLOBAL_READ_LOCK flag passed to
mysql_lock_tables from open_performance_table. That would also
allow reuse of open_ltable in open_proc_table in future.

> diff -Nrup a/sql/log.cc b/sql/log.cc
> --- a/sql/log.cc	2007-06-15 11:36:14 -06:00
> +++ b/sql/log.cc	2007-07-19 19:43:10 -06:00
> @@ -57,6 +57,27 @@ static int binlog_commit(handlerton *hto
>  static int binlog_rollback(handlerton *hton, THD *thd, bool all);
>  static int binlog_prepare(handlerton *hton, THD *thd, bool all);
>  

Please provide a comment for this class, when it can be used.
Alternatively, you can move here the comment that explains what
errors may happen and why they should be silenced.

> +class Silence_log_table_errors : public Internal_error_handler
> +{
> +public:
> +  Silence_log_table_errors()
> +  {}
> +
> +  virtual ~Silence_log_table_errors() {}
> +
> +  virtual bool handle_error(uint sql_errno,
> +                            MYSQL_ERROR::enum_warning_level level,
> +                            THD *thd);
> +};
> +
> +bool
> +Silence_log_table_errors::handle_error(uint /* sql_errno */,
> +                                       MYSQL_ERROR::enum_warning_level /* level */,
> +                                       THD * /* thd */)
> +{
> +  return true;
> +}

Please stick to the coding style.

BTW, here is a syntax highlighting tip for Vim:

kostja@bodhi:~$ grep -r TRUE .vim
.vim/after/syntax/c.vim:syn keyword cConstant TRUE FALSE

> +
>  /* Check if a given table is opened log table */
>  int check_if_log_table(uint db_len, const char *db, uint table_name_len,
>                         const char *table_name, uint check_if_opened)
> @@ -200,211 +235,38 @@ int check_if_log_table(uint db_len, cons
>      if (table_name_len == 11 && !(lower_case_table_names ?
>                                    my_strcasecmp(system_charset_info,
>                                                  table_name, "general_log") :
> -                                  strcmp(table_name, "general_log")) &&
> -        (!check_if_opened || logger.is_log_table_enabled(QUERY_LOG_GENERAL)))
> -      return QUERY_LOG_GENERAL;
> -    else
> -      if (table_name_len == 8 && !(lower_case_table_names ?
> -        my_strcasecmp(system_charset_info, table_name, "slow_log") :
> -        strcmp(table_name, "slow_log")) &&
> -          (!check_if_opened ||logger.is_log_table_enabled(QUERY_LOG_SLOW)))
> +                                  strcmp(table_name, "general_log")))
> +    {
> +      if (!check_if_opened || logger.is_log_table_enabled(QUERY_LOG_GENERAL))
> +        return QUERY_LOG_GENERAL;
> +      return 0;
> +    }
> +
> +    if (table_name_len == 8 && !(lower_case_table_names ?
> +      my_strcasecmp(system_charset_info, table_name, "slow_log") :
> +      strcmp(table_name, "slow_log")))
> +    {
> +      if (!check_if_opened || logger.is_log_table_enabled(QUERY_LOG_SLOW))
>          return QUERY_LOG_SLOW;
> +      return 0;
> +    }

If we pursue the always-online approach, we can further simplify
the design of logs-to-tables and, in particular, remove flag
'check_if_opened' from LOGGER::check_if_log_table()

>  bool Log_to_csv_event_handler::
> -  log_general(time_t event_time, const char *user_host,
> +  log_general(THD *thd, time_t event_time, const char *user_host,
>                uint user_host_len, int thread_id,
>                const char *command_type, uint command_type_len,
>                const char *sql_text, uint sql_text_len,
>                CHARSET_INFO *client_cs)
>  {
> -  TABLE *table= general_log.table;
> +  TABLE_LIST table_list;
> +  TABLE *table;
> +  bool result= true;
> +  bool need_close= false;
> +  bool need_pop= false;
> +  bool need_rnd_end= false;

Please stick to the coding style.

> +  table->file->ha_write_row_no_binlog(table->record[0]);
> +  table->file->ha_rnd_end();
> +  table->file->ha_release_auto_increment();

I believe you should check for the return value of
ha_write_row_no_binlog() and disable logging-to-tables if it
fails. 

Please take a look at how this situation is handled in INSERT
statement code.

> +  tmp_disable_binlog(thd);
> +  table->file->ha_write_row_no_binlog(table->record[0]);
> +  reenable_binlog(thd);

Why do you do both, tmp_disable_binlog and ha_write_row_no_binlog?

> --- a/sql/mysql_priv.h	2007-06-21 13:02:10 -06:00
> +++ b/sql/mysql_priv.h	2007-07-19 19:43:10 -06:00
> @@ -355,6 +355,13 @@ MY_LOCALE *my_locale_by_number(uint numb
>  */
>  #define TMP_TABLE_FORCE_MYISAM          (ULL(1) << 32)
>  
> +/*
> +  Privileged thread, that can perform operations on
> +  privileged tables (as defined by require_write_privileges()).
> +  Currently, this flag is used to control locking / writing
> +  to the mysql.general_log and mysql.slow_log tables.
> +*/
> +#define OPTION_INTERNAL_PRIVILEGE (ULL(1) << 33) // THD, intern

Since this is used specifically to split the checks between two
subsystems, I believe this should be removed.

>        }
>      }
> +
> +    LEX_STRING db;
> +    LEX_STRING alias;
> +    db.str= table_list->db;
> +    db.length= strlen(table_list->db);
> +    alias.str= table_list->alias;
> +    alias.length= strlen(table_list->alias);
> +    TABLE_CATEGORY table_category= get_table_category(& db, & alias);
> +
>      /*
> -      No table in the locked tables list. In case of explicit LOCK TABLES
> -      this can happen if a user did not include the able into the list.
> -      In case of pre-locked mode locked tables list is generated automatically,
> -      so we may only end up here if the table did not exist when
> -      locked tables list was created.
> +      Since we don't want to allow explicit LOCK TABLES on performance schema
> +      tables, the code falls through and we open these tables from the cache,
> +      even under a LOCK TABLES. This does not cause a dead lock, because
> +      manipulating TABLE_CATEGORY_PERFORMANCE tables is very restricted.
>      */
> -    if (thd->prelocked_mode == PRELOCKED)
> -      my_error(ER_NO_SUCH_TABLE, MYF(0), table_list->db, table_list->alias);
> -    else
> -      my_error(ER_TABLE_NOT_LOCKED, MYF(0), alias);
> -    DBUG_RETURN(0);
> +    if (table_category != TABLE_CATEGORY_PERFORMANCE)
> +    {
> +      /*
> +        No table in the locked tables list. In case of explicit LOCK TABLES
> +        this can happen if a user did not include the able into the list.
> +        In case of pre-locked mode locked tables list is generated automatically,
> +        so we may only end up here if the table did not exist when
> +        locked tables list was created.
> +      */
> +
> +      if (thd->prelocked_mode == PRELOCKED)
> +        my_error(ER_NO_SUCH_TABLE, MYF(0), table_list->db,
> table_list->alias);
> +      else
> +        my_error(ER_TABLE_NOT_LOCKED, MYF(0), alias);
> +      DBUG_RETURN(0);
> +    }
>    }

thd->locked_tables should always be NULL here if you've done
reset_and_backup_open_tables_state, therefore this change is
unnecessary and the new code never works. Please remove.


> @@ -2484,7 +2500,7 @@ TABLE *open_table(THD *thd, TABLE_LIST *
>        c1: name lock t2; -- blocks
>        c2: open t1; -- blocks
>      */
> -    if (table->needs_reopen_or_name_lock() &&
> !table->s->log_table)
> +    if (table->needs_reopen_or_name_lock())
>      {
>        DBUG_PRINT("note",
>                   ("Found table '%s.%s' with different refresh version",
> @@ -2960,10 +2976,9 @@ void close_old_data_files(THD *thd, TABL
>    for (; table ; table=table->next)
>    {
>      /*
> -      Reopen marked for flush. But close log tables. They are flushed only
> -      explicitly on FLUSH LOGS
> +      Reopen marked for flush.
>      */
> -    if (table->needs_reopen_or_name_lock() &&
> !table->s->log_table)
> +    if (table->needs_reopen_or_name_lock())


All changes of this sort are excellent.

> @@ -4127,11 +4135,6 @@ int lock_tables(THD *thd, TABLE_LIST *ta
>      DBUG_ASSERT(thd->lock == 0);	// You must lock everything at once
>      TABLE **start,**ptr;
>      uint lock_flag= MYSQL_LOCK_NOTIFY_IF_NEED_REOPEN;
> -    
> -    /* Ignore GLOBAL READ LOCK and GLOBAL READ_ONLY if called from a logger */
> -    if (logger.is_privileged_thread(thd))
> -      lock_flag|= (MYSQL_LOCK_IGNORE_GLOBAL_READ_LOCK |
> -                   MYSQL_LOCK_IGNORE_GLOBAL_READ_ONLY);

It's a good idea to remove this code from lock_tables. But instead
of replacing it with a similar check in mysql_lock_tables,
the code should be moved to open_performance_table -- the only place
where we need a special rule to ignore the global read lock.

That will simplify open_proc_table/open_system_table as well,
since there we'll be able to start using open_ltable.

> +
> +TABLE *
> +open_performance_schema_table(THD *thd, TABLE_LIST *one_table,
> +                              Open_tables_state *backup)
> +{

Please add a doxygen comment.

> +  DBUG_ENTER("open_performance_schema_table");
> +
> +  thd->reset_n_backup_open_tables_state(backup);
> +
> +  DBUG_ASSERT((thd->options & OPTION_INTERNAL_PRIVILEGE) == 0);
> +  thd->options|= OPTION_INTERNAL_PRIVILEGE;

This should be gone.

> +  TABLE *table= open_ltable(thd, one_table, one_table->lock_type);
> +  if (table)
> +  {
> +    DBUG_ASSERT(table->s->table_category == TABLE_CATEGORY_PERFORMANCE);
>      table->use_all_columns();
> +    table->no_replicate= TRUE;
> +    table->s->cached_row_logging_check= 0;

Please add a comment what these three lines are here for.
>    }
>  
> +  thd->options&= ~OPTION_INTERNAL_PRIVILEGE;
>    DBUG_RETURN(table);
>  }
> +
> +void close_performance_schema_table(THD *thd, Open_tables_state *backup)
> +{

Missing doxygen comments.


> +  bool found_old_table;
> +
> +  if (thd->lock)
> +  {
> +    /*
> +      Note:
> +      We do not create explicitly a separate transaction for the
> +      performance table I/O, but borrow the current transaction.
> +      lock + unlock will autocommit the change done in the
> +      performance schema table: this is the expected result.
> +      The current transaction should not be affected by this code.
> +      TODO: Consider doing a backup of THD::transaction and THD::ha_data
> +      in Open_tables_state, to better isolate this.
> +    */
> +    mysql_unlock_tables(thd, thd->lock);
> +    thd->lock= 0;
> +  }

Why do you check for thd->lock? Isn't it always set here?
We cannot simply back up THD::transaction in Open_tables_state,
a transaction is tied to a thread, and storage
engines expect to find their data in THD::transaction and
THD::ha_data.

In other words, while I agree with outlining the problem in the
comment, I think the suggestion is incorrect.

> +
> +  safe_mutex_assert_not_owner(&LOCK_open);
> +
> +  pthread_mutex_lock(&LOCK_open);
> +
> +  mysql_ha_mark_tables_for_reopen(thd, thd->open_tables);

Please add a comment clarifying what this call is here for.

> +  found_old_table= false;
> +  while (thd->open_tables)
> +    found_old_table|= close_thread_table(thd, &thd->open_tables);
> +
> +  if (found_old_table)
> +    broadcast_refresh();
> +
> +  pthread_mutex_unlock(&LOCK_open);
> +
> +  thd->restore_backup_open_tables_state(backup);
> +}

OK.

> +bool mysql_truncate_impl(THD *thd, TABLE_LIST *table_list, bool dont_send_ok)
>  {
>    HA_CREATE_INFO create_info;
>    char path[FN_REFLEN];
>    TABLE *table;
>    bool error;
> -  uint closed_log_tables= 0, lock_logger= 0;
>    uint path_length;
> -  uint log_type;
> -  DBUG_ENTER("mysql_truncate");
> +  DBUG_ENTER("mysql_truncate_impl");
>  
>    bzero((char*) &create_info,sizeof(create_info));
>    /* If it is a temporary table, close and regenerate it */
> @@ -960,18 +958,6 @@ bool mysql_truncate(THD *thd, TABLE_LIST
>        DBUG_RETURN(TRUE);
>    }
>  
> -  log_type= check_if_log_table(table_list->db_length, table_list->db,
> -                               table_list->table_name_length,
> -                               table_list->table_name, 1);
> -  /* close log tables in use */
> -  if (log_type)
> -  {
> -    lock_logger= 1;
> -    logger.lock();
> -    logger.close_log_table(log_type, FALSE);
> -    closed_log_tables= closed_log_tables | log_type;
> -  }
> -
>    // Remove the .frm extension AIX 5.2 64-bit compiler bug (BUG#16155): this
>    // crashes, replacement works.  *(path + path_length - reg_ext_length)=
>    // '\0';
> @@ -997,14 +983,6 @@ end:
>      VOID(pthread_mutex_lock(&LOCK_open));
>      unlock_table_name(thd, table_list);
>      VOID(pthread_mutex_unlock(&LOCK_open));
> -
> -    if (opt_slow_log && (closed_log_tables & QUERY_LOG_SLOW))
> -      logger.reopen_log_table(QUERY_LOG_SLOW);
> -
> -    if (opt_log && (closed_log_tables & QUERY_LOG_GENERAL))
> -      logger.reopen_log_table(QUERY_LOG_GENERAL);
> -    if (lock_logger)
> -      logger.unlock();
>    }
>    else if (error)
>    {
> @@ -1030,3 +1008,29 @@ trunc_by_del:
>    thd->current_stmt_binlog_row_based= save_binlog_row_based;
>    DBUG_RETURN(error);
>  }
> +
> +
> +bool mysql_truncate(THD *thd, TABLE_LIST *table_list, bool dont_send_ok)
> +{
> +  uint log_type;
> +  bool result;
> +
> +  log_type= check_if_log_table(table_list->db_length, table_list->db,
> +                               table_list->table_name_length,
> +                               table_list->table_name, 0);
> +  if (log_type)
> +  {
> +    DBUG_ASSERT((thd->options & OPTION_INTERNAL_PRIVILEGE) == 0);
> +    thd->options|= OPTION_INTERNAL_PRIVILEGE;
> +  }
> +
> +  result= mysql_truncate_impl(thd, table_list, dont_send_ok);
> +
> +  if (log_type)
> +  {
> +    thd->options&= ~OPTION_INTERNAL_PRIVILEGE;
> +  }
> +
> +  return result;
> +}

Instead of juggling with thd->options, please add a flag to
sql_command_flags 'ALLOWED_WITH_PERF_TABLE'.

This change, as well as the change in mysql_alter_table, won't be
needed then.

> -  /* enable logging back if needed */
> -  if (disable_logs)
> -  {
> -    if (logger.reopen_log_tables())
> -      error= TRUE;
> -    logger.unlock();
> -  }

Good.

> --- a/sql/sql_table.cc	2007-06-19 05:01:05 -06:00
> +++ b/sql/sql_table.cc	2007-07-19 19:43:11 -06:00
> @@ -1565,7 +1565,7 @@ int mysql_rm_table_part2(THD *thd, TABLE
>        table->db_type= share->db_type();
>  
>      /* Disable drop of enabled log tables */
> -    if (share && share->log_table &&
> +    if (share && (share->table_category == TABLE_CATEGORY_PERFORMANCE)
> &&
>          check_if_log_table(table->db_length, table->db,
>                             table->table_name_length, table->table_name, 1))

This branch is not needed if we allow to drop the log table when
the logs are online.

> +++ b/sql/table.cc	2007-07-19 19:43:11 -06:00
> @@ -21,6 +21,18 @@
>  #include <m_ctype.h>
>  #include "md5.h"
>  
> +/* INFORMATION_SCHEMA name */
> +LEX_STRING INFORMATION_SCHEMA_NAME= {C_STRING_WITH_LEN("information_schema")};
> +
> +/* MYSQL_SCHEMA name */
> +LEX_STRING MYSQL_SCHEMA_NAME= {C_STRING_WITH_LEN("mysql")};
> +
> +/* GENERAL_LOG name */
> +LEX_STRING GENERAL_LOG_NAME= {C_STRING_WITH_LEN("general_log")};
> +
> +/* SLOW_LOG name */
> +LEX_STRING SLOW_LOG_NAME= {C_STRING_WITH_LEN("slow_log")};
> +

Good.
>  
> +TABLE_CATEGORY get_table_category(const LEX_STRING *db, const LEX_STRING *name)
> +{
> +  DBUG_ASSERT(db != NULL);
> +  DBUG_ASSERT(name != NULL);
> +
> +  if ((db->length == INFORMATION_SCHEMA_NAME.length) &&
> +      (my_strcasecmp(system_charset_info,
> +                    INFORMATION_SCHEMA_NAME.str,
> +                    db->str) == 0))
> +  {
> +    return TABLE_CATEGORY_INFORMATION;
> +  }
> +
> +  if ((db->length == MYSQL_SCHEMA_NAME.length) &&
> +      (my_strcasecmp(system_charset_info,
> +                    MYSQL_SCHEMA_NAME.str,
> +                    db->str) == 0))
> +  {
> +    if (is_system_table_name(name->str, name->length))
> +    {
> +      return TABLE_CATEGORY_SYSTEM;
> +    }
> +
> +    if ((name->length == GENERAL_LOG_NAME.length) &&
> +        (my_strcasecmp(system_charset_info,
> +                      GENERAL_LOG_NAME.str,
> +                      name->str) == 0))
> +    {
> +      return TABLE_CATEGORY_PERFORMANCE;
> +    }
> +
> +    if ((name->length == SLOW_LOG_NAME.length) &&
> +        (my_strcasecmp(system_charset_info,
> +                      SLOW_LOG_NAME.str,
> +                      name->str) == 0))
> +    {
> +      return TABLE_CATEGORY_PERFORMANCE;
> +    }
> +  }
> +
> +  return TABLE_CATEGORY_USER;
> +}

OK.

>    share->reclength = uint2korr((head+16));
>    if (*(head+26) == 1)
> -    share->system= 1;				/* one-record-database */
> +    share->one_record_table= 1;                     /* one-record-database */

As discussed on IRC, this name is present in EXPLAIN, please
restore the old name.

> +/**
> +  Category of table found in the table share.
> +*/
> +enum enum_table_category
> +{
> +  /**
> +    Unknown value.
> +  */
> +  TABLE_UNKNOWN_CATEGORY=0,
> +
> +  /**
> +    Temporary table.
> +    The table is visible only in the session.
> +    Therefore,
> +    - LOCK TABLE <t> FOR READ/WRITE
> +    - FLUSH TABLES WITH READ LOCK
> +    - SET GLOBAL READ_ONLY = ON
> +    do not apply to this table.
> +    Temporary tables are not part of the table cache.
> +  */
> +  TABLE_CATEGORY_TEMPORARY=1,

LOCK TABLE applies to temporary tables. See my patch for Bug#24918
and the comments there. Please update this comment.

> +
> +  /**
> +    User table.
> +    These tables do honor:
> +    - LOCK TABLE <t> FOR READ/WRITE
> +    - FLUSH TABLES WITH READ LOCK
> +    - SET GLOBAL READ_ONLY = ON
> +    User tables are cached in the table cache.
> +  */
> +  TABLE_CATEGORY_USER=2,

Our decision for global read only and flush tables with read lock
is currently not based solely on the "type" of a table. Quite
frankly, it's a new way to look at things.
Which is a fine way, but the comment should mention that READ_ONLY does
nothing on the slave or when the operation is performed by the
super user.


> +
> +  /**
> +    System table, maintained by the server.
> +    These tables do honor:
> +    - LOCK TABLE <t> FOR READ/WRITE
> +    - FLUSH TABLES WITH READ LOCK
> +    - SET GLOBAL READ_ONLY = ON
> +    Typically, writes to system tables are performed by
> +    the server implementation, not explicitly be a user.
> +    System tables are cached in the table cache.
> +  */
> +  TABLE_CATEGORY_SYSTEM=3,

OK.

> +
> +  /**
> +    Information schema tables.
> +    These tables are an interface provided by the system
> +    to inspect the system metadata.
> +    These tables do *not* honor:
> +    - LOCK TABLE <t> FOR READ/WRITE
> +    - FLUSH TABLES WITH READ LOCK
> +    - SET GLOBAL READ_ONLY = ON
> +    as there is no point in locking explicitely
> +    an INFORMATION_SCHEMA table.
> +    Nothing is directly written to information schema tables.
> +    Note that this value is not used currently,
> +    since information schema tables are not shared,
> +    but implemented as session specific temporary tables.
> +  */
> +  /*
> +    TODO: Fixing the performance issues of I_S will lead
> +    to I_S tables in the table cache, which should use
> +    this table type.
> +  */
> +  TABLE_CATEGORY_INFORMATION=4,
> +
> +  /**
> +    Performance schema tables.
> +    These tables are an interface provided by the system
> +    to inspect the system performance data.
> +    These tables do *not* honor:
> +    - LOCK TABLE <t> FOR READ/WRITE
> +    - FLUSH TABLES WITH READ LOCK
> +    - SET GLOBAL READ_ONLY = ON
> +    as there is no point in locking explicitely
> +    a PERFORMANCE_SCHEMA table.
> +    An example of PERFORMANCE_SCHEMA tables are:
> +    - mysql.slow_log
> +    - mysql.general_log,
> +    which *are* updated even when there is either
> +    a GLOBAL READ LOCK or a GLOBAL READ_ONLY in effect.
> +    User queries do not write directly to these tables
> +    (there are exceptions for log tables).
> +    The server implementation perform writes.
> +    Performance tables are cached in the table cache.
> +  */
> +  TABLE_CATEGORY_PERFORMANCE=5
> +};
> +typedef enum enum_table_category TABLE_CATEGORY;
> +
> +TABLE_CATEGORY get_table_category(const LEX_STRING *db,
> +                                  const LEX_STRING *name);
> +
>  /*
>    This structure is shared between different table objects. There is one
>    instance of table share per one table in the database.
> @@ -116,6 +210,9 @@ class Table_triggers_list;
>  typedef struct st_table_share
>  {
>    st_table_share() {}                    /* Remove gcc warning */
> +
> +  TABLE_CATEGORY table_category;
> +
>    /* hash of field names (contains pointers to elements of field array) */
>    HASH	name_hash;			/* hash of field names */
>    MEM_ROOT mem_root;
> @@ -209,7 +306,10 @@ typedef struct st_table_share
>    uint column_bitmap_size;
>    uchar frm_version;
>    bool null_field_first;
> -  bool system;                          /* Set if system table (one record) */
> +
> +  /** Set if system table (one record) */
> +  bool one_record_table;
> +
>    bool crypted;                         /* If .frm file is crypted */
>    bool db_low_byte_first;		/* Portable row format */
>    bool crashed;
> @@ -227,18 +327,6 @@ typedef struct st_table_share
>    */
>    int cached_row_logging_check;
>  
> -  /*
> -    TRUE if this is a system table like 'mysql.proc', which we want to be
> -    able to open and lock even when we already have some tables open and
> -    locked. To avoid deadlocks we have to put certain restrictions on
> -    locking of this table for writing. FALSE - otherwise.
> -  */

It's an important comment, I think it should be preserved in some
form.

> -  bool system_table;
> -  /*
> -    This flag is set for the log tables. Used during FLUSH instances to skip
> -    log tables, while closing tables (since logs must be always available)
> -  */
> -  bool log_table;

OK.
>  #ifdef WITH_PARTITION_STORAGE_ENGINE
>    bool auto_partitioned;
>    const char *partition_info;
> @@ -302,6 +390,16 @@ typedef struct st_table_share
>      set_table_cache_key(key_buff, key_length);
>    }
>  
> +  inline bool honor_global_locks()
> +  {
> +    return ((table_category == TABLE_CATEGORY_USER)
> +            || (table_category == TABLE_CATEGORY_SYSTEM));
> +  }
> +
> +  inline bool require_write_privileges()
> +  {
> +    return (table_category == TABLE_CATEGORY_PERFORMANCE);
> +  }

I believe we should use the existing mechanisms to ignore the
global read lock.
All this code should be gone then.

>  } TABLE_SHARE;
>  

One item that your patch does not address and should address, as part of
the new locking design, is holding the lock on the logging
subsystem while writing into a table. This lock can and should be
removed to provide performance and avoid extra possibilities for deadlocks.
This will also allow us to adopt the 'always-online' approach,
that I described above in the mail, if we choose to.

This is all I have for your patch.  Thank you so much for working
on this. I understand you perceive the question how to implement
the checks as controversial. Please focus then on implementing the
rest of the comments -- and we can decide what to do with checks
separately.

-- 
-- Konstantin Osipov              Software Developer, Moscow, Russia
-- MySQL AB, www.mysql.com   The best DATABASE COMPANY in the GALAXY
Thread
bk commit into 5.1 tree (malff:1.2516) BUG#25422marc.alff20 Jul
  • Re: bk commit into 5.1 tree (malff:1.2516) BUG#25422Konstantin Osipov25 Jul