Hi Ingo,
See my replies below. I have one general comment, and I'm sorry that it appears
only after you did all the coding. The fact is that you don't have to truly
implement many of the _v1_services.h functions if you are going only to read a
backup image. Actually the only callbacks which you need for reading are:
bcat_reset()
bcat_close()
bcat_add_item()
bcat_create_item()
and
bcat_iterator_get(...,BSTREAM_IT_DB)
i.e. the iterator enumerating all databases (used when reading metadata). For
all the other callbacks it is enough to provide some trivial implementations.
E.g. bcat_iterator_get(...) can simply return NULL for all other types of
iterators or it can always return the database iterator, regardless of the
requested one.
Ingo Struewing wrote:
> #At file:///home2/mydev/bzrroot/mysql-6.0-wl4534/
>
> 2693 Ingo Struewing 2008-09-30
> WL#4534 - Backup client program
>
> Quick and dirty prototype for early design review.
>
> First attempt is to implement catalog in C++.
> Uncertainty is about whether it may later be necessary to associate
> table numbers from the table data chunks with catalog data. This won't
> work efficiently with the list implementation. If it turns out that we
> need arrays in the catalog, we may not implement this in C++.
> As far as I know, we do not use the standard C++ library for
> portability reasons. And we don't have an equivalent for std::vector.
> So dynamic arrays (mysys/array.c) come to mind. But then mixing of
> C and C++ in the catalog doesn't seem right. An extension of the
> type-inheritance model of the stream library seems more appropriate.
> Especially as one could use these types directly.
> However, IMHO, this would prove that the catalog representation cannot
> be chosen freely when using the stream library.
Actually yes, information stored inside backup image refers to catalogue objects
by their "index" which I call "object coordinates within the catalogue". For
example, entries containing metadata, refer to the object which this metadata
describes by its coordinates. Similar, blocks of table data refer to the table
by its number within given snapshot, i.e. its coordinates.
This means that to fully present the contents of the image, you must be able to
find an object given its coordinates.
If we want to use arrays for efficiency, one issue is whether we need to have
dynamic arrays which can grow, as we don't know object counts in advance.
Although we know how many tables belong to each snapshot (this information is
read from the header and stored in st_bstream_snapshot_info) we don't have this
information for many other object types. For example, we don't know in advance
how many databases are there in the image, or how many non-table objects are
within each database.
One way of handling this is the way I use in the backup kernel code. I use
dynamic lists to store C++ object instances representing each item in the
catalogue and I have arrays of pointers to these objects for fast indexing. This
can be arranged as follows:
1. When backup catalogue is read, store objects on dynamic lists as you do it
now and count them.
2. After reading the catalogue, for example inside bcat_close() callback,
allocate arrays of pointers of appropriate sizes and store indexes in these arrays.
Btw. I don't share your concerns about mixing C and C++. I think it is perfectly
ok to use C code from C++. For example sql_string.cc defines String class which
is just a wrapper around string library written in C.
>
> Open questions:
>
> - Why do we need to bstream_next_chunk() before some bstream_rd_*()
> calls, but not others.
This is a design flaw. I agree that it should be consistent through the
interface and I think the user should not have to call bstream_next_chunk().
> - Why header.version is not filled with the library's verion.
This is a design flaw again. The library is written with the assumption that its
user knows that he is reading correct version of the image. Thus the header
should not have version member I think.
> - Why snapshot data belong to two tables (in my backup image example)
> while there is just one table.
Tables are numbered starting from 1. There is a special table number 0. Data
marked as "table 0" is a common data which is not tied to any particular table.
I'm not sure if native MyISAM driver sends any common data. If not, then having
a data block for table 0 (hopefully empty) would indicate a flaw in backup
kernel code writing table data.
> - Why snapshot data is read in 512 + 15872 byte chunks instead of 16K.
Not sure what do you mean. Does it mean that there are 2 low-level read() calls,
one which returns 512 and the following which reads 15872 bytes?
In any case, the library does not try to read data in 16K blocks since it
assumes that OS is doing this in the background. What the library tries to do,
is to keep as few data as possible inside internal buffers to avoid double
buffering.
Inside the library it is assumed that it is ok and efficient to call low-level
I/O for arbitrary amounts of data. E.g. to read single bytes. Making this
efficient is the task of the as_read_m and as_write_m callbacks which can
implement internal buffering if needed.
> - Why Backup-finish time and binlog positions are not available after
> reading summary.
>
Not saving binlog position was a bug which I just fixed and pushed to the
6.0-backup tree (BUG#35240). Need to check if my patch fixes also the finish
time problem.
> === modified file 'client/Makefile.am'
> --- a/client/Makefile.am 2008-08-20 16:21:14 +0000
> +++ b/client/Makefile.am 2008-09-30 09:14:57 +0000
> @@ -42,6 +42,7 @@ CLEANFILES = $(BUILT_SOURCES)
>
> bin_PROGRAMS = mysql \
> mysqladmin \
> + mysqlbackup \
> mysqlbinlog \
> mysqlcheck \
> mysqldump \
> @@ -57,6 +58,11 @@ mysql_LDADD = @readline_link@ @TERMCAP
> $(LDADD) $(CXXLDFLAGS)
> mysqladmin_SOURCES = mysqladmin.cc
>
> +mysqlbackup_SOURCES = mysqlbackup.cc \
> + $(top_srcdir)/sql/backup/stream_v1.c \
> + $(top_srcdir)/sql/backup/stream_v1_transport.c
> +mysqlbackup_LDADD = $(LDADD) $(CXXLDFLAGS)
> +
I think it would be nicer to use the library, instead of compiling library
sources. That is:
mysqlbackup_SOURCES = mysqlbackup.cc
mysqlbackup_LDADD = $(LDADD) $(CXXLDFLAGS)
$(top_stcdir)/sql/backup/libbackupstream.la
However, I do realize that the library is not properly versioned yet.
Rafal