List:Maria Storage Engine« Previous MessageNext Message »
From:Sanja Byelkin Date:May 30 2008 6:53pm
Subject:Re: review of (sanja:2632) WL#4401
View as plain text  
On Fri, May 30, 2008 at 07:07:30PM +0300, Michael Widenius wrote:
> 
> Hi!
> 
> >>>>> "sanja" == sanja  <sanja@stripped> writes:
[skip]
> sanja> @@ -244,6 +250,10 @@
> sanja>    File directory_fd;
> sanja>    /* buffers for log writing */
> sanja>    struct st_translog_buffer buffers[TRANSLOG_BUFFERS_NO];
> sanja> +  /* Mask where 1 in position N mean that buffer N is not flushed */
> sanja> +  uint dirty_buffer_mask;
> 
> How many buffers do we need for max ?
> (Size of uint is not well defined)

We have 5 buffers, but compile time assert make us sure that we are
safe.

[skip]
> sanja> @@ -2044,7 +2068,6 @@
>  
> sanja>    if (new_file)
> sanja>    {
> sanja> -
> sanja>      /* move the horizon to the next file and its header page */
> sanja>      (*horizon)+= LSN_ONE_FILE;
> sanja>      (*horizon)= LSN_REPLACE_OFFSET(*horizon, TRANSLOG_PAGE_SIZE);
> sanja> @@ -2063,6 +2086,13 @@
> sanja>      translog_start_buffer(new_buffer, cursor, new_buffer_no);
> sanja>    }
> sanja>    log_descriptor.buffers[old_buffer_no].next_buffer_offset=
> new_buffer->offset;
> sanja> +  new_buffer->prev_last_lsn=
> sanja> +    ((log_descriptor.buffers[old_buffer_no].last_lsn != LSN_IMPOSSIBLE) ?
> sanja> +     log_descriptor.buffers[old_buffer_no].last_lsn :
> sanja> +     log_descriptor.buffers[old_buffer_no].prev_last_lsn);
> 
> You could introduce a new variable 'old_buffer' that is set to
> log_descriptor.buffers[old_buffer_no]. This would make the above lines
> a bit shorter and easier to read.
 
it will not work, because we will need to lock that buffer, and main
thing the buffer can be filled already with new data.

[skip]

> sanja> @@ -2390,7 +2421,6 @@
> sanja>    TRANSLOG_FILE *file= buffer->file;
> sanja>    uint8 ver= buffer->ver;
> sanja>    DBUG_ENTER("translog_buffer_flush");
> sanja> -  DBUG_ASSERT(buffer->file != NULL);
> sanja>    DBUG_PRINT("enter",
> sanja>               ("Buffer: #%u 0x%lx file: %d  offset: (%lu,0x%lx)  size:
> %lu",
> sanja>                (uint) buffer->buffer_no, (ulong) buffer,
> sanja> @@ -2399,6 +2429,9 @@
> sanja>                (ulong) buffer->size));
> sanja>    translog_buffer_lock_assert_owner(buffer);
>  
> sanja> +  if (buffer->file == NULL)
> sanja> +    goto out;
> sanja> +
> 
> Why do you need to unmark a buffer with file == NULL ?

Why no? we should be sure that we will not write buffer which is
written.

> Who could have marked this for flush ?

Usually thread which finished the buffer writes it to disk, the process
only delayed until the thread will unlock handler so flush process is
not main buffer writers usually.

> At the time file was set to NULL, wasn't the file already unmarked ?

it is just reference to the file which buffer used to write its content.

> If this is correct, please add a comment that answers the above.
> It would be nice to get rid of the above becasue:
> - It will be common that someone else flushes what you are going to
>   look at.
> - Taking an extra mutex + signal when not needed is expensive

Sorry I do not understand you.

[skip]
> sanja> +
> sanja> +  /*
> sanja> +    We will recheck information when will lock buffers one by
> sanja> +    one so we can use unprotected read here (this is just for
> sanja> +    speed up buffers processing)
> sanja> +  */
> sanja> +  dirty_buffer_mask= log_descriptor.dirty_buffer_mask;
> sanja> +  DBUG_PRINT("info", ("Dirty buffer mask: %lx  current buffer: %u",
> sanja> +                      (ulong) dirty_buffer_mask,
> sanja> +                      (uint) log_descriptor.bc.buffer_no));
> sanja> +  for (i= (log_descriptor.bc.buffer_no + 1) % TRANSLOG_BUFFERS_NO;
> sanja> +       i != log_descriptor.bc.buffer_no && !(dirty_buffer_mask
> & (1 << i));
> sanja> +       i= (i + 1) % TRANSLOG_BUFFERS_NO) {}
> sanja> +  start_buffer_no= i;
> 
> For the above operations to be fast, you should ensure that
> TRANSLOG_BUFFERS_NO is a power of 2, like 8 !

Currently it is 5 but increasing it to 8 will not have ani bad influence
except some memory wasting.

[skip]

> sanja> +    translog_unlock();
> sanja> +  }
> sanja> +  sent_to_disk= translog_get_sent_to_disk();
> sanja> +  if (cmp_translog_addr(lsn, sent_to_disk) > 0)
> sanja> +  {
> sanja> +
> sanja> +    DBUG_PRINT("info", ("Start buffer #: %u  last buffer #: %u",
> sanja> +                        (uint) start_buffer_no, (uint) last_buffer_no));
> sanja> +    last_buffer_no= (last_buffer_no + 1) % TRANSLOG_BUFFERS_NO;
> sanja> +    i= start_buffer_no;
> sanja> +    do
> sanja> +    {
> sanja> +      translog_buffer_lock(log_descriptor.buffers + i);
> sanja> +      DBUG_PRINT("info", ("Check buffer: 0x%lx  #: %u  "
> sanja> +                          "prev last LSN: (%lu,0x%lx)  "
> sanja> +                          "last LSN: (%lu,0x%lx)  status: %s",
> sanja> +                          (ulong)(log_descriptor.buffers + i),
> sanja> +                          (uint) i,
> sanja> +                         
> LSN_IN_PARTS(log_descriptor.buffers[i].prev_last_lsn),
> sanja> +                         
> LSN_IN_PARTS(log_descriptor.buffers[i].last_lsn),
> sanja> +                          (log_descriptor.buffers[i].file ?
> sanja> +                           "dirty" : "closed")));
> sanja> +      if (log_descriptor.buffers[i].prev_last_lsn <= lsn &&
> sanja> +          log_descriptor.buffers[i].file != NULL)
> sanja> +      {
> sanja> +        struct st_translog_buffer *buffer= log_descriptor.buffers + i;
> sanja> +        DBUG_ASSERT(flush_horizon <= buffer->offset +
> buffer->size);
> sanja> +        flush_horizon= buffer->offset + buffer->size;
> sanja> +        translog_buffer_flush(log_descriptor.buffers + i);
> sanja> +      }
> sanja> +      translog_buffer_unlock(log_descriptor.buffers + i);
> sanja> +      i= (i + 1) % TRANSLOG_BUFFERS_NO;
> sanja> +    } while (i != last_buffer_no);
> 
> Here you loop over start_buffer_no to last_buffer_no. What happens if
> start_buffer_no was wrong because you read the dirty buffers without a
> mutex and you read the value 0 when the real value should be 1?
> 
> In other words, how do you guarantee that all buffers that needs to be
> flushed are flushed?

Buffers can be filled only in circle in one direction.

1) read 1 instead 0 nothimng terrible - we just skipped it (somebody else
flushed the buffer we will recheck it when lock the buffer in any case
(it can be chenged even after our calculation)

2) read 0 instead of 1 even good for us. 1 can be appeared only when
we start new buffer and we ceternly are not interested in LSNs appeared
after we started flush process. But situation whan we read that appeared
1 of new buffer is rare (filleng 1 Mb bufer while we preparing for flush
has quite low possibiolity), and will need just to iterating through more
buffers.

> 
> <cut>
> 
> sanja>  out:
> sanja> -  pthread_mutex_unlock(&log_descriptor.log_flush_lock);
> sanja> +  pthread_mutex_lock(&log_descriptor.log_flush_lock);
> sanja> +  if (sent_to_disk != LSN_IMPOSSIBLE)
> sanja> +    log_descriptor.flushed= sent_to_disk;
> sanja> +  log_descriptor.flush_in_progress= 0;
> sanja> +  DBUG_PRINT("info", ("flush_in_progress is dropped"));
> sanja> +  pthread_cond_broadcast(&log_descriptor.log_flush_cond);
> sanja> +  pthread_mutex_unlock(&log_descriptor.log_flush_lock);\
> 
> Please move broadcast out of mutex (faster)
> 
> sanja> +++ b/storage/maria/unittest/ma_test_loghandler_multithread-t.c	2008-05-22
> 21:53:17 +0000
> <cut>
> 
> sanja> +static void *test_thread_flusher(void *arg)
> sanja> +{
> sanja> +  int param= *((int*) arg);
> sanja> +  int i;
> sanja> +
> sanja> +  my_thread_init();
> sanja> +
> sanja> +  for(i= 0; i < FLUSH_ITERATIONS; i++)
> sanja> +  {
> sanja> +    translog_flush(last_lsn);
> sanja> +    pthread_mutex_lock(&LOCK_thread_count);
> sanja> +    ok(1, "-- flush %d", param);
> sanja> +    pthread_mutex_unlock(&LOCK_thread_count);
> sanja> +  }
> 
> I don't see why you are testing by calling with same last_lsn many
> times, but I assume it doesn't make any harm.

yes. this variable is unprotected and go with some delay aftyer
appearing real LSN but our flush is resistance to incorrect LSNs and
delay is really what we have in real life. I also tested it with reading
and flushiong it up to "horizon" but think that this variant is better.

[skip]

-- 
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /    Mr. Oleksandr Byelkin <sanja@stripped>
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Full-Time Developer
/_/  /_/\_, /___/\___\_\___/   Lugansk, Ukraine
       <___/   www.mysql.com
Thread
bzr commit into file:///home/bell/mysql/bzr/work-maria-newflush/ tree(sanja:2632) WL#4401sanja22 May
  • re: review of (sanja:2632) WL#4401Michael Widenius30 May
    • Re: review of (sanja:2632) WL#4401Sanja Byelkin30 May