List:Commits« Previous MessageNext Message »
From:Ingo Strüwing Date:September 30 2008 4:01pm
Subject:Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2693)
WL#4534
View as plain text  
Hi Rafal,

thank you for your comments.

Rafal Somla, 30.09.2008 14:54:
...
>   Actually the only callbacks which you
> need for reading are:
...
>   For all the other callbacks it is enough to provide some
> trivial implementations.

Yes, you mentioned it before. I just didn't have the full list. Anyway,
I put effort in those functions only, which are called during my tests.
The rest I will clean up, if necessary.

...
>   I
> use dynamic lists to store C++ object instances representing each item
> in the catalogue and I have arrays of pointers to these objects for fast
> indexing.

I seem to remember that you mentioned this before. But I forgot, as it
happened too early in my learning process. :)

>   This can be arranged as follows:
> 
> 1. When backup catalogue is read, store objects on dynamic lists as you
> do it now and count them.
> 2. After reading the catalogue, for example inside bcat_close()
> callback, allocate arrays of pointers of appropriate sizes and store
> indexes in these arrays.

Thanks for elaborating the good idea.

...
>>       - Why header.version is not filled with the library's verion.
> 
> This is a design flaw again. The library is written with the assumption
> that its user knows that he is reading correct version of the image.
> Thus the header should not have version member I think.

Right. This would relieve the confusion. :)
For a moment I wondered if the caller should fill this in. But I agree
that removing the element would be better.

OTOH, in an architecture that I personally would have preferred, the 10
bytes magic number and image version would be part of the image and we
would not have one library per version. Instead I would handle the
(expectedly) small differences in the image format within the few
affected places in the library functions. Then the version number in the
header would make real sense.

> 
>>       - Why snapshot data belong to two tables (in my backup image
>> example)
>>       while there is just one table.
> 
> Tables are numbered starting from 1. There is a special table number 0.
> Data marked as "table 0" is a common data which is not tied to any
> particular table. I'm not sure if native MyISAM driver sends any common
> data. If not, then having a data block for table 0 (hopefully empty)
> would indicate a flaw in backup kernel code writing table data.

In my test case, the data block for table 0 is 1025 bytes. I'll try to
investigate on it later.

> 
>>       - Why snapshot data is read in 512 + 15872 byte chunks instead
>> of 16K.
> 
> Not sure what do you mean. Does it mean that there are 2 low-level
> read() calls, one which returns 512 and the following which reads 15872
> bytes?

When enabling debug, I see

...
| >str_read
| | stream: want bytes: 512
| | stream: read bytes: 512
| <str_read
| >str_read
| | stream: want bytes: 15872
| | stream: read bytes: 15872
| <str_read
| >str_read
| | stream: want bytes: 512
| | stream: read bytes: 512
| <str_read
| >str_read
| | stream: want bytes: 15872
| | stream: read bytes: 15872
| <str_read
...

> 
> In any case, the library does not try to read data in 16K blocks since
> it assumes that OS is doing this in the background. What the library
> tries to do, is to keep as few data as possible inside internal buffers
> to avoid double buffering.

This does not exactly match my analyze of stream_v1_transport.c. There
seems to be an input buffer, which is filled with load_buffer() when
insufficient data is in the buffer.

The first read is 512 bytes, which is sufficient for the header and part
of the catalog. In my small test case it does even span into the first
table data chunk.

It looks like you read 512 bytes minimum and then, if requested the rest
of a block_size. That could explain the above numbers.

> 
> Inside the library it is assumed that it is ok and efficient to call
> low-level I/O for arbitrary amounts of data. E.g. to read single bytes.

It does not cost a disk access each time, but a system call has pretty
much overhead. It almost always pays off to buffer in the application.

> Making this efficient is the task of the as_read_m and as_write_m
> callbacks which can implement internal buffering if needed.

Ok. But then we would end up with triple buffering.

...
> I think it would be nicer to use the library, instead of compiling
> library sources. That is:
> 
> mysqlbackup_SOURCES = mysqlbackup.cc
> 
> mysqlbackup_LDADD = $(LDADD) $(CXXLDFLAGS)
> $(top_stcdir)/sql/backup/libbackupstream.la

Ok. Will fix.

Regards
Ingo
-- 
Ingo Strüwing, Database Group
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Geschäftsführer: Thomas Schröder, Wolfgang Engels, Dr. Roland Bömer
Vorsitzender des Aufsichtsrates: Martin Häring   HRB München 161028
Thread
bzr commit into mysql-6.0-backup branch (ingo.struewing:2693) WL#4534Ingo Struewing30 Sep
  • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2693)WL#4534Rafal Somla30 Sep
    • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2693)WL#4534Ingo Strüwing30 Sep
      • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2693)WL#4534Rafal Somla1 Oct
    • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2693)WL#4534Ingo Strüwing30 Sep
      • Re: bzr commit into mysql-6.0-backup branch (ingo.struewing:2693)WL#4534Rafal Somla1 Oct