List:Commits« Previous MessageNext Message »
From:Guilhem Bichot Date:November 27 2007 2:49pm
Subject:Re: Review II of MyISAM backup [Re: bk commit into 5.2 tree
(guilhem:1.2611)]
View as plain text  
Hello,

On Wed, Nov 21, 2007 at 10:21:28PM +0100, Ingo Strüwing wrote:
> Hi Guilhem,
> 
> Thank you for adopting some of my ideas!

And thanks for spending the time to find and explain them.

> I do now agree with the design of the patch.

And if we replaced 'w' with 't' in that sentence, we would have a big
problem...

> Please see below for some small issues. Most of them are comments about
> comments.
> 
> Guilhem Bichot, 06.11.2007 20:56:
> ...
> > ChangeSet@stripped, 2007-11-06 20:56:05+01:00, guilhem@stripped +59 -0
> >   WL#866 - Online backup driver for MyISAM.
> ...
> >   include/my_global.h@stripped, 2007-11-06 20:55:59+01:00, guilhem@stripped
> +5 -3
> >     When testing a C++ file against "gcc -Wold-style-cast", these casts
> >     surfaced. static_cast_C_or_CPP means "a static cast which works in
> >     C and in C++, without warnings".
> 
> Or "... works in C and C PreProcessor". ;-)
> Seriously. Why a long name for such a commonly needed macro?
> IMHO, CAST(type) would do the trick. Upper case suggest it's a macro.
> Hence one can guess it's for compatibility as many macros are. I have
> never seen a dynamic_cast in MySQL, so I would also guess it's static.

mmmm how about STATIC_CAST? My reasons:
- there are static_cast, const_cast, reinterpret_cast, all used in
MySQL (and dynamic_cast, unused)
- if I introduce CAST(), some colleagues are maybe going to use it in
their new C++ code, it would be some sort of regression, because I
find it less explicit that static_cast (you have to remember that CAST
is like static_cast).

> >   sql/sql_bitmap.h@stripped, 2007-11-06 20:56:00+01:00, guilhem@stripped +2
> -2
> >     When mysqld starts I get:
> >     ==25833== Conditional jump or move depends on uninitialised value(s)
> >     ==25833==    at 0x8185A2C: Bitmap<64>::is_set(unsigned) const
> (sql_bitmap.h:189)
> ...
> >     this difference does not look normal; I add here an initialization in
> >     Bitmap<64>'s default constructor, which removes the problem. This
> >     problem is specific of 5.2 as ha_myisam::keys_with_parts appeared in 5.2.
> 
> As a reviewer I am thankful for the reasoning, but a file comment like
> this should not be in the final push.
> 
> IMHO a file comment should say _what_ was changed. The reason, _why_ the
> new code is required, is better explained in a code comment. A plain
> initialization doesn't require much reasoning.

I cannot put a comment in the C++ code: this initialization is
probably merely something forgotten (the other Bitmap class in this
file has it), there is nothing more to say.
But if one day one wonders why we need to init, or who added this
line, it may look into "bk grep" / "bk revtool" and find some info
about the problem.
I changed the file comment to:

Fix for valgrind error (unrelated to MyISAM online backup driver):
Bitmap<64> needs to initialize in its constructor, like the any-other-size
Bitmap does. Bug specific of 6.0.

> ...
> >   storage/myisam/mi_log.c@stripped, 2007-11-06 20:56:01+01:00,
> guilhem@stripped +768 -80
> ...
> >     If _myisam_log_command() uses small numbers (file descriptor,
> >     file offset buffer length) it stores them in few bytes; if they are big
> >     it stores them in more bytes; how many bytes are used is denoted
> >     by the highest bit of the 'command' (MI_LOG_BIG_NUMBERS).
> 
> This is something we could tune. And if we do, we must do it before the
> first push.
> 
> You have:
>   +  if (info->dfile >= UINT_MAX16 || filepos >= UINT_MAX32 ||
>   +      length >= UINT_MAX16)
>   +  {
>   +    buff[0]= ((uchar) command) | MI_LOG_BIG_NUMBERS;
> 
> I think dfile >= UINT_MAX16 applies quite seldom. But how do we estimate
> the probability of the other two conditions? Do they have about the same
> probability? It is just a feeling, but I feel that 'filepos' might be
> above 4G much more often that reclen > 64K. Our default data pointer
> size is 6 bytes, because the former default of 4 bytes was too small way
> too often. When writes happen to a table, I believe it will go to the
> highest positions with higher probability.
> 
> OTOH we do not have much time to tune this value well. But a quick poll
> of knowlageable people might be better than nothing.
> 
> Just to exclude any doubts what I mean: I want to tune the number of
> bytes to write for 'filepos' and/or 'length' in a "small" log header to
> keep the avarage header length as small as possible.
> 
> My vote is 5 bytes for filepos.

I asked on #support and they suggested different ideas:
- that we should use 3 / 8
- that we should use 4 / 8
- that we should use myisam_data_pointer_size / 8.
- that we should always use 8.

But I see myisam_data_pointer_size as a separate problem. It has to be
large enough to accomodate most customers' tables (because when those
tables grow and reach the limit they get a real problem, so the
default has to be large enough).
Here we can choose the default to be any value which provides the best
storage for most tables.

One Support person said that most tables are <4GB.
I like 4 bytes: if a table is >4GB then it can considered large and
making a slightly big log is ok, as the table is large anyway.
For 3 bytes, one Support person suggested that storing in 3 is less
CPU-efficient than storing in 4; though true in theory, it is not in
our case, as mi_int4store() has a similar un-optimized implementation
as mi_int3store(). If that is later optimized, 4 could then become
better.

How about 4 / 8 ?

> ...
> >   storage/myisam/myisamdef.h@stripped, 2007-11-06 20:56:01+01:00,
> guilhem@stripped +111 -18
> ...
> >     It is read and set with atomic operations: this provides the needed
> >     synchronization between the table-writer thread (if the table is being
> >     written now) (which is the reader of "physical_logging") and the thread
> >     turning physical logging on or off (which is the writer of
> >     "physical_logging").
> 
> This is very good information. It's a pity that the code comment
> contains just a small summary of it. Perhaps I can learn something
> again. How do you proceed when trying to understand foreign code? Do you
> search for the changeset that introduced it and hope it's explained
> there?

yes but also I hope for a big introduction comment in the file.

> And what helps you most when you brood about a complicated merge
> of foreign code?

I understand your point (though even the code comments don't help me
much, I'd rather ask the author on irc :)

> What I did up to now was to hope for good code comments and, if
> available, read worklog specifications (what the community cannot always
> do).

and WL specs can be out-of-date.

> During merges I am always thankful when I find comments like "I added",
> "I removed", ... Excessive explanations in file comments don't help much
> in this situation.

well file comments are shown when I do a merge with "bk resolve" and "f".
 
> When searching for a changeset relating to certain places in a file, a
> few buzzwords like function or variable names are sufficient too.
> 
> In summary, from my way of work it would be better to have terse
> comments in the file comments and exhaustive comments in the code. But
> perhaps there is a better way of work?

I see your point.
I will try to rework the distribution a bit when I make the commit
against 6.0.

> ...
> > -typedef int (*IO_CACHE_CALLBACK)(struct st_io_cache*);
> > +typedef int (*IO_CACHE_CALLBACK)(struct st_io_cache *, const uchar *, uint,
> > +                                 my_off_t);
> 
> >From the patch that extends this type, I can guess what the new
> parameters are for. But if I would look at the io_cache independently
> from the patch some time later, I probably would have a a hard time to
> guess the same.
> 
> Can you please give parameter names and perhaps a comment?

done:
/** Function called when certain events happen to an IO_CACHE */
typedef int (*IO_CACHE_CALLBACK)(struct st_io_cache *cache,
                                 const uchar *buffert, uint length,
                                 my_off_t filepos);


> ...
> > +      int ret= my_pwrite(info->file, Buffer, Count, pos,
> > +                         info->myflags | MY_NABP);
> > +      if (unlikely(ret))
> > +        set_hard_write_error(info);
> 
> Just a hint. At this place I remember that I often had to add comments
> at such places after doing coverage testing. Do you have coverage
> testing on your todo list?

I have asked Lars how he viewed coverage testing for this patch. He
said we'd add more tests in December.

> ...
> >        if (!append_cache)
> >        {
> > +        /*
> > +          This post_write is really POST-write; callers depend on this! So
> > +          always call it after writing to the file, not before.
> > +        */
> 
> The comment is correct. However you call post_write() at other places
> too, without commenting it there. I suggest either to comment the same
> there too, or wipe it out completely. The name "post_write" is clear
> enough, I think. Or did you want to say something that I missed (the
> word "really" makes me suspect I could have missed something here).

I meant that if post_write was called before doing the write, the
physical logging algorithm would break (you remember, we must check
physical_logging_state *after* doing the write).

I removed that comment; I put:
      /*
        this is a post_write: physical_logging_state has to be checked after
        doing the table write (see mi_log_start_physical()).
      */
      info->rec_cache.post_write= log_flushed_write_cache_physical;

in mi_extra.c, and underlined "after" in my_sys.h:
  /** Called _after_ writing to disk; not honoured by SEQ_READ_APPEND */
  IO_CACHE_CALLBACK post_write;


> ...
> > -/*
> > -  These functions handle keyblock cacheing for ISAM and MyISAM tables.
> ...
> > -  as such gives the maximum number of in-use blocks at any time.
> > +/**
> > +   @file
> > +   These functions handle keyblock cacheing for ISAM and MyISAM tables.
> ...
> > +   as such gives the maximum number of in-use blocks at any time.
> > +*/
> 
> You changed the indentation from 2 to 3. This violates our coding
> guidelines. This does also apply to many function comments, but I don't
> repeat this complaint ans more.

It's not me, it's my xemacs :)
Now I have a problem, as I wrote to dev-public last week-end: I use
the recommended xemacs setting, but it puts 3 spaces on the line
after /**. I have asked colleagues for help. Meanwhile, I fixed by
hand all these occurences of 3 spaces that I could find, I hope I
missed none.

> ...
> > +/**
> > +   Does a my_pwrite() to the file and then calls callback. Arguments are those
> ...
> > +     @retval !=0    write or callback failed
> > +*/
> > +static inline int key_cache_pwrite(int Filedes, const uchar *Buffer,
> 
> The coding guidelines say that there shall be an empty line between
> function comment and its definition.

fixed (hope I missed none).

> ...
> > +  int ret, physical_logging= mi_get_physical_logging_state(share);
> > +  DBUG_ASSERT(!physical_logging ||
> > +              my_seek(share->kfile, 0L, MY_SEEK_END, MYF(0)) ==
> new_length);
> > +  ret= my_chsize(share->kfile, new_length, 0, MYF(0));
> > +  if (physical_logging && mi_log_index_pages_physical)
> > +    myisam_log_chsize_kfile_physical(share, new_length);
> 
> Shouldn't you have an atomic read here? Doesn't storing the value in a
> local variable invalidate any attempt to assure that you can never read
> the wrong value at the right moment?

yes! thank you!
I have to test my_seek==new_length before doing my_chsize, and read
the physical logging's state after my_chsize, no other way. I changed
to:

static int chsize_kfile(MI_INFO *info)
{
  MYISAM_SHARE *share= info->s;
  my_off_t new_length= info->state->key_file_length;
  int ret;
#ifndef DBUG_OFF
  my_bool no_length_change=
    (my_seek(share->kfile, 0L, MY_SEEK_END, MYF(0)) == new_length);
#endif

  ret= my_chsize(share->kfile, new_length, 0, MYF(0));

  if (unlikely(mi_log_index_pages_physical &&
               mi_get_physical_logging_state(share)))
  {
    DBUG_ASSERT(no_length_change);
    myisam_log_chsize_kfile_physical(share, new_length);
  }

  return ret;


> ...
> > +   Function to display and apply a MyISAM logical or physical log to tables.
> > +
> > +   Prints what is in a MyISAM (logical or physical) log, optionally applies
> > +   the changes to tables (all tables or only a set specified on the command
> > +   line). Is used both by the standalone program myisamlog and by the restore
> > +   code of the MyISAM online backup driver.
> 
> Er? The command line? The function reads it from 'param', I think. For
> the myisamlog program it comes from the commandline. But for the server?

right. I removed most of this comment, as it duplicates what is said
in the function's comment.

> ...
> > +    command=(uint) head_ptr[0];
> > +    command-= (big_numbers= (command & MI_LOG_BIG_NUMBERS));
> > +    if (my_b_read(&cache, head, head_len[command][big_numbers] - 1))
> 
> Did you test with big records? I would assume that 'big_numbers' becomes
> 128 from the expression "command & MI_LOG_BIG_NUMBERS". And you subtract
> this number from 'command' which is even more an evidence for this fact.

yes, bug. I hadn't tested with big records when I submitted the patch,
but since, I did. And I just added a .test for that, which fails
without the fix below.

> I suggest to use head_len[command][big_numbers ? 1 : 0]. At all other
> places, the exact value of 'big_numbers' doesn't matter.

I changed to:
  command-= (big_numbers= (command & MI_LOG_BIG_NUMBERS));
  big_numbers= test(big_numbers);
  if (my_b_read(etc))

> ...
> > +   Both logs:
> > +   - are idempotent
> 
> Hm. You are using this term quite frequently. Unfortunately I don't
> exactly get the meaning. A vague guess might be that applying a log
> multiple times reproduces the same table contents again. But there might
> be more to it.

yes, it's that.
Idempotent comes from Maths, a function F is idempotent if
F o F = F
I put an explanation:

  Both logs:
  - are idempotent (if you apply such log to a table T, then applying it a
  second time has no effect).

> ...
> > +/** the log_type global variable is probably legacy, it's always 0 now */
> 
> I would use the word 'obsolete' here, but I'm not native English either.

changed to obsolete. And added "const", so that compiler's
optimizer can replace:
#define GETPID() (log_type == 1 ? (long) myisam_pid : (long) my_thread_dbug_id())
by
#define GETPID() ((long) my_thread_dbug_id())
somehow.

> >  static int log_type=0;
> 
> ...
> > +#ifndef HAVE_MYISAM_PHYSICAL_LOGGING
> > +  DBUG_ASSERT(0);
> > +  DBUG_RETURN(1);
> > +#endif
> 
> Wouldn't it be better to put the rest of the function in the #else
> branch? It might avoid a compiler warning about unused code.

Normally, this symbol is always defined to be 1 (in myisam.h),
one needs to change it by hand if it wants it undefined. So if there
is a compiler warning it means dev changed it by hand, dev can live
with a compiler warning in that case.
I prefer to keep it like above: it is localized; it is a trap for
something which should not happen (mi_log() functions shouldn't be
called if the symbol is defined, indeed the backup driver is then
disabled, see in ha_myisam.cc:
#if !defined(EMBEDDED_LIBRARY) && defined(HAVE_MYISAM_PHYSICAL_LOGGING)
  myisam_hton->get_backup_engine= myisam_backup_engine;
#endif).
If I put a #else I have to put a
#endif /*HAVE_MYISAM_PHYSICAL_LOGGING*/ far down and I find it
clutters the reading.

> > +
> > +  /* starting/stopping are complex operations so split in functions */
> > +  switch (action)
> >    {
> > -    error=my_close(myisam_log_file,MYF(0)) ? my_errno : 0 ;
> > -    myisam_log_file= -1;
> > +  case MI_LOG_ACTION_OPEN:
> > +    error= mi_log_start_physical(log_filename, tables);
> > +    break;
> > +  case MI_LOG_ACTION_CLOSE_CONSISTENT:
> > +  case MI_LOG_ACTION_CLOSE_INCONSISTENT:
> > +    error= mi_log_stop_physical(action);
> > +    break;
> > +  default:
> > +    DBUG_ASSERT(0);
> >    }
> >    DBUG_RETURN(error);
> >  }
> 
> ...
> > +    log         = &myisam_logical_log;
> > +  else
> > +  {
> > +    DBUG_ASSERT(type == MI_LOG_PHYSICAL);
> > +    log         = &myisam_physical_log;
> 
> Leftspace at assignment operator, twice.

fixed

> ...
> > +  if (likely(my_b_inited(log) != NULL))
> > +  {
> > +    if (!logical)
> > +    {
> ...
> > +    }
> > +    /*
> > +      Don't be fooled: physical logging does not work with --external-locking
> > +      (the physical log is private to the one mysqld). The my_lock() is
> > +      relevant only for the logical log.
> > +    */
> 
> ... and thus we could place an 'else' here? Save the overhead of a
> function call which isn't needed anyway?

ok.
And I also put the corresponding file-unlock inside the if(logical).
 
> > +    error=my_lock(log->file,F_WRLCK,0L,F_TO_EOF,MYF(MY_SEEK_NOT_DONE));
> 
> ...
> > +    VOID(my_b_write(log, buff, bufflen));
> > +    if (buffert)
> > +      VOID(my_b_write(log, buffert, length));
> 
> Sigh. If 'buff' would be called 'header', it would have saved me ~ 10
> seconds. ;-) But it's from old code, so feel free to ignore this sigh.

changed all "buff" in that function and similar ones, to "header".

> ...
> > +   All writes (my_write, my_pwrite, memcpy to mmap'ed area, my_chsize) to the
> > +   data or index file are done this way:
> > +   @code
> > +   {
> > +     write_to_data_or_index_file;
> > +     if ((atomic read of MYISAM_SHARE::physical_logging)==1)
> 
> It is just a comment. But wouldn't != 0 be more correct anyway?

changed to != 0
 
> > +       write log record to physical log;
> 
> ...
> > +  if (action == MI_LOG_ACTION_CLOSE_CONSISTENT)
> > +  {
> > +    for (list_item= myisam_open_list; list_item; list_item=
> list_item->next)
> > +    {
> > +      MI_INFO *info= (MI_INFO*)list_item->data;
> > +      MYISAM_SHARE *share= info->s;
> > +      /*
> > +        Setting of the variable below always happens under THR_LOCK_myisam,
> > +        which we have here, so we don't need atomic operations to read here.
> > +      */
> > +      if (!info->s->physical_logging)
> 
> You could use 'share' here.

fixed

> > +        continue;
> > +      /*
> > +        Must take intern_lock, at least because key cache isn't safe if two
> > +        calls to flush_key_blocks_int() run concurrently on the same file.
> > +      */
> ...
> > +      pthread_mutex_unlock(&share->intern_lock);
> > +    } /* ... for (list_item=...) */
> > +  } /* ... if (action == MI_LOG_ACTION_CLOSE_CONSISTENT) */
> 
> This loop is correct and necessary IMHO. We should not change the patch
> here for the first version. But fo a later optimization we might
> consider to repeat the same loop before taking THR_LOCK_myisam. This
> could flush most of the data without blocking all tables. The above loop
> could then run faster and thus shorten the big lock. Of course this
> depends on how we estimate the "traffic" on the tables during the backup.

This is a loop over all tables, so it needs to walk the list of all
tables, which is protected by THR_LOCK_myisam...
In Maria, when I need to flush all tables, I have indeed worked around
that: in a first loop under THR_LOCK_maria I collect a list of
MARIA_SHARE*, in a second loop without THR_LOCK_maria I call
flush_pagecache_blocks_int()+ma_state_info_write() on those shares. To
make sure that those shares don't go away (my_free()) while I'm
looking at them, I put a "pin" on them in the first loop, a pin which
says "even if you close this table please don't my_free() the share",
and if a concurrent maria_close() closes this table, it notices the
pin, changes the pin to "please free this share", and then I free it
in the second loop.
But I'm still unsure of the above new code. So I'm a bit reluctant to
copy it to MyISAM now.
So I agree with your suggestion to leave it in first version, and
consider optimizing later. I put a note:

  /**
    Go through all open MyISAM tables.
    @todo consider an algorithm which would not keep THR_LOCK_myisam for the
    time of flushing all these tables' indices; we could do a first loop with
    THR_LOCK_myisam to collect shares and "pin" them; then a second loop
    without THR_LOCK_myisam, flushing and unpinning them.
  */
  for (list_item= myisam_open_list; list_item; list_item= list_item->next)


By the way, the similar loop where we flush_key_cache() during
mi_log_start_physical() is not needed, as log_key_cache_flush_physical
is registered with the table even from before the backup's start
(thus, any flush will reach the log, even though the block was put in
the key cache before backup started; and in mi_log_stop_physical() we
force a flush so we get all late blocks in the log).

> ...
> >        if (info->opt_flag & READ_CACHE_USED)
> >        {
> > +        /* QQ Why do we flush a READ_CACHE? it's a no-op */
> 
> I don't know either.

I left the comment in place then, for posterity.

> >  	if (flush_io_cache(&info->rec_cache))
> 
> ...
> > +   Here is how the MyISAM online backup works.
> > +   It is online because we dirtily copy the data and index files,
> > +   and the tables maintain a physical idempotent log of changes done to them
> > +   during the copy process, applying this log to the dirty copy yields a clean
> > +   table corresponding to how the original table was when logging ended.
> > +
> > +   A condition for this to work is that any update done to a table after the
> > +   copy process started must be present in the log. So if an update is
> > +   proceeding, we must wait for it to go to the log before starting the copy
> > +   process.
> 
> Tha last sentence doesn't make much sense to me. Things that happen to
> the table "before starting the copy process" do _not_ need to be in the
> log if I understand correctly.
> 
> Only after finishing with an update we must check if the copy process
> started already. Then it could be that not all of the update is in the
> copy. Hence we must log it then.
> 
> Or did you mean "if an update is  proceeding, we must wait for it to go
> to the log before ending the backup process" ?

The last sentence is indeed wrong, it comes from a previous version of
the patch. I have removed it, the paragraph now reads:

  A condition for this to work is that any update done to a table after the
  copy process started must be present in the log. See the comment of
  mi_log_start_physical() for how this is ensured.

because that function's comment has all the details about atomic read.
 
> ...
> > +retry:
> > +  pthread_mutex_lock(&THR_LOCK_myisam);
> > +  if (lock_state != LOCK_ERROR) // thread not already dead, kill it
> > +  {
> > +    /*
> > +      If the locking thread had no time to create THD (very unlikely), wait
> > +      for it.
> > +    */
> > +    if (unlikely(!lock_thd))
> > +    {
> > +      pthread_mutex_unlock(&THR_LOCK_myisam);
> > +      sleep(1);
> > +      goto retry;
> 
> So either the thread must be created or lock_state become LOCK_ERROR. If
> there is any chance that both do not happen, then it could be a last
> resort to escape the loop with thd->killed.

Rafal suggested the same, here's what I replied:

<quote>
There was a bug, the test should be:
  /* If thread started and not already dead, kill it */
  if ((lock_state != LOCK_NOT_STARTED) & (lock_state != LOCK_ERROR))

Indeed, if an error happens early (for example in Backup::begin());
the locking thread has not been started and will not be, we should not
wait for it to exist.

There was another bug: prelock() was:
  lock_state= LOCK_IN_PROGRESS;
  {
    pthread_t th;
    if (pthread_create(&th, &connection_attrib,
                       separate_thread_for_locking, this))
    {
      SET_STATE_TO_ERROR_AND_DBUG_RETURN;
    }
  }

But if pthread_create() fails, again we should not wait for the thread
to exist; this is fixed by putting
      lock_state= LOCK_ERROR;
in the {} of the if(pthread_create())) above.

With these two fixes in place, I don't think we need to test "killed"
in the sleep(1) loop (I mean the master's thread THD::killed).
Indeed:
we have entered the if(), so lock_state is LOCK_IN_PROGRESS or
LOCK_ACQUIRED.
If it is LOCK_IN_PROGRESS, the locking thread has successfully been
created, its function is running. By looking at that function we can
see that it always comes to setting lock_thd to !=NULL or to setting
lock_state to LOCK_ERROR.
If it is LOCK_ACQUIRED, lock_thd is !=NULL.

Testing THD::killed in the sleep(1) loop would even be dangerous:
if one does a KILL on the master thread (which is the client thread
doing BACKUP DATABASE), and the loop looked like (I added a test after
"retry:"):

retry:
  if (our_thd->killed) return;
  pthread_mutex_lock(&THR_LOCK_myisam);
  /* If thread started and not already dead, kill it */
  if ((lock_state != LOCK_IN_PROGRESS) & (lock_state != LOCK_ERROR))
  {
    /*
      If the locking thread has not yet created THD (very unlikely), wait
      for it.
    */
    if (unlikely(lock_thd == NULL))
    {
      pthread_mutex_unlock(&THR_LOCK_myisam);
      sleep(1);
      goto retry;
    }

then we may see our_thd->killed be !=0, return without killing the
locking thread. Backup::free() may be soon called by the backup kernel
but that does not help, as again kill_locking_thread() would return
without killing.
In other words, we must always try to kill the locking thread if it
exists, and wait for it to die. No matter the time it takes, as long
as we know it will happen (and we know it will, per the analysis
above).. Leaving it in existence risks that it holds table locks
forever.

</quote>


> ...
> > +#if 0
> > +    /*
> > +      Of course driver shouldn't pollute thd's state like this, it's just for
> > +      debugging now.
> > +    */
> > +    thd->proc_info="MyISAM dump going to start";
> > +#endif
> 
> Could you please replace '0' by something?

I removed this block.

> ...
> > +/**
> > +   Launches a separate thread ("locking thread") which will lock
> > +   tables. Locking in a separate thread is needed to have a non-blocking
> > +   prelock() (given that thr_lock() is blocking).
> 
> Here I would appreciate to see a brief explanation why we do not need to
> wait here or a reference to another place like in your other email:

I added
  prelock() is indeed not
  allowed to block, or it would block the entire backup kernel (see "HOW THE
  BACKUP WORKS" at the start of this file).

> It is explained in the beginning of myisam_backup_driver.cc:
> 
>    Now the Backup::prelock() request comes.
>    To finish the backup, we need to synchronize (=read-lock) all tables
>    ...
>    thread, and when finally that thread has managed to get its locks, we
> stop
>    logging and reply backup::READY.
>    ...
> 
> ...
> > +  /*
> > +    Note: we are copying an index file of a table, which may have instances in
> > +    the MySQL table cache, so after restore it will show up as
> > +    "warning: 1 client is using or hasn't closed the table properly".
> > +    Maybe do a quick index update on the table at the end of restore to
> > +    remove this warning. But how to know if the problem pre-dates backup ?
> 
> @todo If this is a problem, one could compare TABLE_SHARE::ref_count and
> MYISAM_SHARE::state.open_count at backup begin and log the difference.
> This may be more complicated though. Not all cached TABLE objects need
> to have open handlers.
> 
> But fixing this after restore seems more appropriate. When rebuilding
> indexes, we repair anyway. So we could do a quick repair even when
> copying indexes.

If there was a crash while the table was open, mysqld was restarted,
backup was taken, then it is important info that open_count>0 (it is a
hint that the user should check its table, backup or restore should
just not set open_count to 0).

On the other hand, if the open_count>0 merely comes from backup
copying an open table, we certainly must not leave it like this,
because, as I tested, this triggers, if running with --myisam-recover,
a table check (long operation) when one does:

create table t(a int);
insert into t values(1);
select * from t;
backup database test to "test.ba";
drop database test;
restore from "test.ba";
shut mysqld down and restart it with --myisam-recover
select * from t => causes table check as explained at
http://dev.mysql.com/doc/refman/5.1/en/myisam-start.html.
Doing a check after restore is certainly good DBA policy, but
sometimes time does not allow a check because the application must be
brought back online ASAP.

I asked Monty and implemented the solution he suggested:
- when in mi_open(), a client does the first open of a table (creates
MYISAM_SHARE), if open_count>0 (sign of a problem) it sets a flag
STATE_BAD_OPEN_COUNT in state.changed and writes state to disk.
- in backup, we copy the table, as usual
- in restore, if we see STATE_BAD_OPEN_COUNT we don't touch
open_count, otherwise we set it to 0 (because no flag means the table
is ok).
- a successful check or repair sets the flag down.

So, for second case above ("On the other hand"), restore resets
open_count to 0.
For the first case ("If there was a crash"), backup (as it opens
tables) sees open_count>0, sets the flag; restore leaves
open_count>0.

> > +result_t File_backup::get_data(Buffer &buf)
> > +{
> > +  int       res;
> > +  uint      howmuch= buf.size;
> > +  result_t  ret= backup::OK;
> > +
> > +  DBUG_ENTER("myisam_backup::File_backup::get_data");
> > +
> > +  buf.size= 1;
> > +  *buf.data= static_cast<uchar>(file_code);
> > +  howmuch--;
> > +  DBUG_ASSERT(howmuch > 0);
> 
> Too late? If buf.size was 0, it is now quite > 0.

thanks, fixed. Now I assert howmuch>=2 at the beginning.

> ...
> > +  // for when we have "end of stream" notifications:
> > +#if 0
> 
> Please replace '0' by something. E.g. TODO_HAVE_END_OF_STREAM

done

> ...
> > +  DBUG_ENTER("myisam_backup::Table_restore::post_restore");
> > +
> > +  if (!rebuild_index)
> > +    DBUG_RETURN(backup::OK); // nothing to do
> 
> @todo Perhaps a quick repair to get rid of the "open_count"?

I replied above to that one.

> ...
> > +  check_opt.init();
> > +  check_opt.flags|= T_VERY_SILENT | T_CALC_CHECKSUM | T_QUICK;
> 
> I suggest to consider T_REP_BY_SORT. And perhaps even
> thd->variables.myisam_repair_threads.

Table_restore::post_restore() calls
int ha_myisam::repair(THD* thd, HA_CHECK_OPT *check_opt)
which automatically adds T_REP_BY_SORT (because we don't ask for
T_EXTEND).
Then this calls
int ha_myisam::repair(THD *thd, MI_CHECK &param, bool do_optimize)
which does:

    if (mi_test_if_sort_rep(file,file->state->records,key_map,0) &&
	(local_testflag & T_REP_BY_SORT))
    {
      ...
      if (thd->variables.myisam_repair_threads>1)
      {
        error = mi_repair_parallel(&param, file, fixed_name,
            param.testflag & T_QUICK);
      }
      else
      {
        error = mi_repair_by_sort(&param, file, fixed_name,
            param.testflag & T_QUICK);
      }
    }
    else
    {
      error=  mi_repair(&param, file, fixed_name,
			param.testflag & T_QUICK);
    }


So, we can get to repair_parallel and repair_by_sort, and I verified
with --debug: if the table has an index, repair by sort is used by my
restore; and if I add --myisam-repair-threads=3 then parallel repair
is used. My restore seems to do the same thing as REPAIR TABLE QUICK:
- with REPAIR TABLE QUICK we get check_opt->flags=T_QUICK,
- with my restore we get
check_opt->flags=T_QUICK|T_CALC_CHECKSUM|T_VERY_SILENT; the last flag
is because restore shouldn't inform the user of non-critical problems
fixed by repair; the T_CALC_CHECKSUM I just removed, because
mi_repair*() add it anyway if needed (i.e. if the table has live
checksum).

I am committing a new patch in a few minutes.
The assertion in mi_reset() is still incorrect (it fires).

-- 
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /    Mr. Guilhem Bichot <guilhem@stripped>
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Lead Software Engineer
/_/  /_/\_, /___/\___\_\___/   Bordeaux, France
       <___/   www.mysql.com   
Thread
bk commit into 5.2 tree (guilhem:1.2611)Guilhem Bichot6 Nov
  • Review II of MyISAM backup [Re: bk commit into 5.2 tree (guilhem:1.2611)]Ingo Strüwing21 Nov
    • Re: Review II of MyISAM backup [Re: bk commit into 5.2 tree(guilhem:1.2611)]Guilhem Bichot27 Nov
      • Re: Review II of MyISAM backup [Re: bk commit into 5.2 tree (guilhem:1.2611)]Ingo Strüwing27 Nov
  • Re: bk commit into 5.2 tree (guilhem:1.2611) WL#866Guilhem Bichot22 Nov