Rafal,
My response and rebuttal. Explanations and proposed resolutions not
commented on are fully agreed.
Chuck
> To be implemented
> =================
>
> Handling duplicate database names when populating backup catalogue.
> -------------------------------------------------------------------
>
> > * I would like to see this method changed to eliminate
> duplicates. This
> > would solve BUG#34480.
> > +Image_info::add_db(const String &db_name, ulong pos)
>
> Adding test for duplicates to add_db() method would make
> sense, but there will
> be slight problem how to communicate this information to the
> caller. Currently
> method returns a pointer to Image_info::Db object, with NULL
> meaning that the
> database could not be added because of some error. If, in
> case of duplicate, we
> also return NULL, we would need a way to distinguish it from
> other error
> conditions. This would possibly lead to change of signature
> of the method and
> need to rewrite other parts of the code. At the same time, it
> is more flexible
> to have a separate method for checking if a given database is
> already in the
> catalogue. Such method, say Image_info::has_db(String name),
> could be called
> before inserting a database into the catalogue to avoid
> duplicates. The method
> could be useful in other context as well.
>
> I'll add the has_db() method to Image_info class.
>
ok. Good alternative.
> Fix index types
> ---------------
> > * Inconsistent use of data type for indexes. See below.
> Please choose one
> > and stick with it -- ulong seems too large but probably the
> better choice.
> > In this case, add_db uses ulong but get_db uses uint. This
> could be the
> > source of a unique bug. Indeed, later uint pos; is one of
> the attributes of
> > the db iterator. Likewise, table_count is uint.
> > + Db* get_db(uint pos) const;
> > + Table* get_table(uint snap_no, ulong pos) const;
>
> > * You're using uint for counts but ulong for position.
> Please change all to
> > ulong.
> > + uint table_count() const;
> > + uint db_count() const;
>
> Yes, need to fix that. Thanks for spotting the problem. My
> thinking here is as
> follows:
> - there can be a lots of tables, even more that 1M;
> - the possible number of databases is smaller and will never
> exceed 64K;
> - the possible number of backup/restore drivers used is even
> smaller - it is
> also limited to 256 by the backup stream format which uses 1
> byte for storing it.
>
> Therefore, the following types can be used for indexing these
> kinds of objects:
> - ulong for tables (and other per database objects),
> - uint for databases,
> - ushort for snapshots,
>
> I'll make code consistent wrt. using these types for indexes.
Ok. Please use #define or something for these values when used inline, e.g.
array[MAX_SNAP] or something.
> Remove SHOW_ARCHIVE command
> ---------------------------
>
> Good point - I'll wipe it out from the code.
>
> Remove TEST_ERROR_IF macros
> ---------------------------
> > * What is this? This cannot be correct. Table names may
> start with 'b' in
> > any database. Please remove this comment.
> > + FIXME: add_table should generate error if table name
> starts with 'b' (?)
> >
> > * I think I see why you have arbitrary starts with 'a' 'b'
> checks. Please
> > change the comments for all of these to be certain they are clearly
> > identified as debugging statements. My recommendation is to
> either remove
> > them completely or guard them with a DEBUG_EXECUTE_IF() macro.
>
> OK, I'll remove these. This is remnant from some early
> version of the code. I
> tried once to use these macros to create artificial error
> situations for testing
> error conditions. But I think we must find a better solution
> for that. I've
> checked that this code is not used in our tests any more.
>
> Names for IStream and OStream classes
> -------------------------------------
>
> > * Please add a note that your OStream and IStream are not
> to be confused
> > with ostream and istream respectfully. Also, 'OStream'
> violates our coding
> > guidelines. It should be 'Ostream'.
>
> I'll rename them to backup::Input_stream and
> backup::Output_stream, respectively.
>
Thank you!!! :))
> Use object services API for DDL blocker
> ---------------------------------------
> > * Although I do not like the reorganization of the code WRT calls to
> > progress tables functions and DDL operations, I must
> request that you change
> > the DDL calls to use the new si_objects abstractions for
> DDL blocker.
>
> Good point. We are not using the si_objects abstractions in
> the current code and
> this is a good occasion to change that - I'll add this to the patch.
>
> Fixes in documentation
> ----------------------
>
> - Wrong WL number in cset comments.
>
> > * Please restore this comment (or something like it). I
> think this is a very
> > important implementation detail that should not be overlooked.
> > - Note: The block_ddl() call must occur before the
> information_schema
> > - is read so that any new tables (e.g. CREATE
> in progress) can
> > - be counted. Waiting until after this step
> caused backup to
> > - skip new or dropped tables.
>
> > * Please make this doxygen format (kernel.cc):
> > +/*
> > + How to use backup kernel API to perform backup and
> restore operations
> > +
> =====================================================================
> > [...]
>
> > * Watch out for name misuse. The comment 'Path to the file
> where backup
> > image is located' should read 'Path to where the backup
> image file is
> > located'.
>
> > * Error in comments: obs:;Obj should be obs::Obj.
>
> > Also, mixed terms -- 'start' should 'end',
> > 'begin' should 'done'. Please repair.
>
> > * Watch references to methods, attributes, classes in
> comments, e.g.,
> > Backup_crate_ctx.
>
> - remove comment
> > * We determined in Orlando that this is correct. Please
> strike comment.
> > + TODO: check if this is correct
> > + */
> > + DBUG_PRINT("backup",("version %d",MYSQL_VERSION_ID));
> > + server_version.major= MYSQL_VERSION_ID / 10000;
> > + server_version.minor= (MYSQL_VERSION_ID % 10000) / 100;
> > + server_version.release= MYSQL_VERSION_ID % 100;
> > + server_version.extra.begin= (byte*)MYSQL_SERVER_VERSION;
> > + server_version.extra.end= server_version.extra.begin
>
> > * Why is this comment in the code? Please remove.
> > + field_list.push_back(new
> > Item_empty_string(STRING_WITH_LEN("backup_id")));
> > //op_str.c_ptr(), op_str.length()));
> > + protocol->send_fields(&field_list, Protocol::SEND_NUM_ROWS |
>
> I'll fix above issues and re-consult my spell checker.
>
> > * Wording issues in comments.
> > + Instances of this class store name of the object and
> other relevant
> > + information. For each type of objects a subclass of this
> class is derived
> > + which is specialized in storing information about that
> kind of objects.
> > ...reads strangely. It should read:
> > + Instances of this class store the name of the object and
> other relevant
> > + information. For each type of object, a subclass is
> derived which is
> > + specialized in storing specific information about an object type.
> > In this particular comment, what is 'other relevant
> information' and why
> > isn't this information in the subclasses?
>
> I'll reformulate the comment as you suggest. The 'other
> relevant information' is
> for example the number of snapshot containing data of a given
> table or a
> database to which it belongs.
>
> Misc issues
> -----------
>
> > * Please do not use hard coded values in code. Please
> change all such
> > references to #defines or other tunable mechanisms. Also, why 256?
> > + Snapshot_info *m_snap[256];
> > + if (info.snap_count() > 256)
>
> > * Here's another hard coded seemingly arbitrary value.
> Please use #define or
> > similar.
> > + for (uint no=0; no<256; ++no)
>
> Good point - I'll define the upper limit on number of
> snapshots as a constant.
>
> > * Why implement this function? Allow it to be virtual so
> that we do not need
> > to define a 'base' operation. The compiler is our friend
> (sometimes). ;)
> > +const char* Snapshot_info::name() const
> > +{ return "<Unknown>"; }
>
> I'll make it pure virtual so that each derived class must
> define it explicitly.
>
> > * Why was this line changed?
> > - bzero(m_snap, sizeof(m_snap));
> > + bzero(m_snap,sizeof(m_snap));
>
> > * More spacing issues.
> > + if (info.snap_count()==0 || info.table_count()==0) //
> nothing to backup
>
> > * Spacing violations for operators.
> > + return no+1;
>
> Right, I'll revise spacing and also cut the too long lines.
>
> > (...) I would request that 'no' be changed to something
> > more meaningful ('num' perhaps) -- 'no' is not a normal
> abbreviation for
> > number -- 'no.' is the correct abbreviation. Likewise,
> every m_no should be
> > changed.
>
> Would it be correct to replace 'no' with 'nr'? E.g.
> Buffer::m_no will become
> Buffer::m_nr.
Actually, what about num? m_num, your_num, my_num, etc.?
> Explanations
> ============
>
> Order of selecting backup drivers.
> ------------------------------------
>
> > * Are we certain of the order in which the default and
> consistent snapshot
> > drivers will be searched for a given table? Consistent
> snapshot should
> > always be 'asked' first.
>
> The order is as expected. In general it is determined by
> order in which
> Snapshot_info objects are stored in the
> Backup_info::snapshots list. When
> Backup_info instance is created, snapshots of the CS and
> default driver are
> pushed at the end of that list, in this order (see
> constructor of Backup_info).
> Later, when a native Snapshot_info object is added to the
> list, it is pushed in
> front of other drivers (see
> Backup_info::find_backup_engine()). Thus the order
> of Snaphsot_info objects on the snapshots list is as follows:
>
> - snapshots for native drivers,
> - snapshot for the CS driver,
> - snapshot for the default driver.
>
> This guarantees that CS driver will be used before the
> default driver, and
> native drivers (if exist) before the CS driver.
>
Ok, just checking. If not already there, can you add some comment to the
code supporting this so that 8 months from now we don't second guess
ourselves?
> Timing of DDL blocker calls
> ---------------------------
>
> > * Are the DDL blocker calls on restore setup the way the
> original code was?
> > Specifically, is the DDL blocker's exception code turned
> on? If not, restore
> > will fail and block itself. Sorry, I cannot tell from the
> patch -- too much
> > code reorganization.
>
> I think a correct order is ensured. The DDL blocker is enabled inside
> Backup_restore_ctx::prepare() which is called inside
> Backup_restore_ctx::prepare_for_backup/restore() before
> Backup/Restore_info
> object is created. This means that DDL blocker is active when
> manipulating the
> backup/restore catalogue. The blocker is de-activated inside
> Backup_restore_ctx::close() which is called at the end of
> operation (and also
> from destructor).
>
> The thd->DDL_exception flag is set up inside the object
> services code. I can
> agree that it is not the best place to do this. Makes more
> sense to me if the
> caller of object services sets this up, so that object
> services code doesn't
> make arbitrary decision about ignoring the DDL blocker. I'll
> chagne the code to
> a) not turn on thd->DDL_exception inside si_objects.cc and b)
> turn it on in
> Backup_restore_ctx::prepare() just after the DDL_blocker
> being activated.
Again, I think this information is vital to our mainainability. I think
these comments need to be in the code -- perhaps even in the documentation.
Please add them where you feel it is appropriate.
>
> Changes in the backup kernel and future multi thread code
> ---------------------------------------------------------
> > * Are any of your reorganizations of the code going to
> impede making this
> > code multithreaded? If so, please consider either fixing
> the problem now or
> > annotating the code itself with comments on how to mitigate
> any issues you
> > see.
>
> I don't think my changes will cause any problems when
> introducing multi-threaded
> code. Most of the changes are in the part of the backup
> kernel which handles
> backup/restore catalogue and object meta-data. There is no
> need to make this
> part multi-threaded. We will need several threads in the
> other part of the
> backup process, when table data is backed up using backup
> drivers. This is the
> code in data_backup.cc and I haven't introduced much changes there.
>
> backup::Logger class design and rationale
> -----------------------------------------
> > * Why were the backup progress calls moved to the logger?
> Does the logger
> > really log now (ref: comments/dialog about error
> reporting)? I am confused.
>
> > * I don't understand why the backup progress methods were
> abstracted and
> > pushed to inner classes. Can you please explain the
> rationale for this? To
> > me, this complicates the code as we would have to search for every
> > occurrence of the calls to work on them. Having them in the
> upper-level
> > method makes much more sense to me.
>
> > * I really don't like some of the abstractions. For
> example, these calls to
> > the progress reporting function do nothing more than make
> the calls a class.
> > If you want a class for the progress reporting, please
> issue requests to
> > change the progress code and do not add artificial
> organization to the code.
> > I really must insist on this as it needlessly complicates
> the already
> > complex code.
> > +void Logger::report_state(enum_backup_state state)
> > +{
> > + DBUG_ASSERT(m_state == RUNNING);
> > +
> > + // TODO: info about state change in the log?
> > + report_ob_state(m_op_id, state);
> > +}
>
> This class was introduced as part of implementation of the
> design for WL#3904
> (Error handling) and is described there. In particular, the HLS says:
>
> " - A special "backup context" object will be used for
> reporting errors. It must
> be passed to all functions which are supposed to report
> errors (or log any other
> backup related messages). Messages will be created using
> methods of this object.
> How the messages are delivered to the user will be hidden
> behind this interface.
>
> - In alpha version the messages will be written to error
> log. In future relases
> it is planned to add to INFORMATION_SCHEMA a table
> describing an ongoing
> backup/restore process and the messages will be stored
> there (and still written
> to the error log)."
>
> The idea is to abstract the functionality required by backup
> kernel to report
> information about on-going backup/restore operation.
> Originally, the kernel was
> reporting only errors, but now days there is more things
> which we want to report:
> - errors and warnings,
> - timing information (start/stop/vp),
> - operation progress information (changes of state),
> - information about decisions made in the process (e.g. which
> drivers were chosen),
> - statistical information (e.g. number of bytes read/written)
>
> It makes sense to put all this functionality into one place because:
> a) all these functions are related to reporting information
> about backup/restore
> process;
> b) we have overview of what types of information we need to report;
> c) we can define an interface for this functionality and
> separate if from
> implementation details;
> d) once the interface is defined, we can change the way
> information is reported
> (e.g. add new destinations) without changing existing calls
> in the backup kernel
> -- only the implementation of Logger class needs to be changed.
> e) it helps to maintain consistency in the way all kinds of
> information are
> reported.
>
> I think the abstraction is justified and will actually make
> the code much better
> and easier to maintain. The second issue raised also relates
> to the next subject.
>
> > * What does this mean? "When writing messages, cover the
> case when the type
> > of the operation (backup or restore) has not been decided yet."
>
> Normally, the information reported by Logger class concerns
> an ongoing backup or
> restore operation. This, for example, influences format of
> error/warning
> messages which are prefixed with "Backup:" or "Restore:". The
> type of operation
> (backup or restore) is decided when Logger::init() method is
> called. But it
> might happen that there is an error to report before we call
> init(). For example
> we do preparations like checking privileges or backup
> progress tables before we
> proceed with backup or restore. This is why it is important
> that error reporting
> methods of Logger work even before the init() method is
> called and type of
> operation is decided.
>
>
> Why there are one-line functions and methods
> ----------------------------------------------
>
> > * Why are there calls like this? Isn't this adding
> unnecessary complexity?
> > Are they transformations? If so why do we need them? If
> not, I recommend
> > removing these calls.
> > inline const char* se_name(storage_engine_ref se) {
> return se->name.str; }
>
> The purpose of such functions is to hide implementation
> details behind a clearly
> defined interface. This makes code much easier to maintain.
> Consider these two
> code snippets, where se is of type storeage_engine_ref:
>
> const char *name= se_name(se);
>
> and
>
> const char *name= se->name.str;
>
> The second alternative introduces more dependencies between
> this code and the
> rest of the server. It makes several assumptions:
> - value se is a pointer,
> - it is a pointer to a structure,
> - this structure has member 'name' which is a structure as well,
> - the se->name structure has member str of type char*,
>
> If any of these assumptions is violated by future changes in
> the server code, we
> need to change *all* places in the code where we used
> "se->name.str" construct
> to get to storage engine's name. While with alternative 1 we
> just need to update
> implementation of se_name() function.
>
> Another argument for solution 1 is that the code reads nicer
> and is easier to
> understand, especially if the se_name() function is
> documented properly.
>
> Note that using such functions bears no efficiency penalty as
> inline functions
> have zero call costs.
>
I partially agree, but it is messy and difficult to follow for reviewers. I
would suggest we limit the use of this technique or else the entire code
will be nothing but one-line functions. Where I disagree is that we are
sacrificing too much readability for very little gains in maintainability.
>
> Backup drivers and engines
> --------------------------
>
> > * We are mixing names again! We decided to call them
> 'drivers' but the patch
> > changes them back to 'engines'. Why? Please remove this and
> related changes.
> > - NATIVE_SNAPSHOT= BI_NATIVE, ///< snapshot created by
> native backup
> > driver
> > - DEFAULT_SNAPSHOT= BI_DEFAULT, ///< snapshot created by
> built-in,
> > blocking driver
> > - CS_SNAPSHOT= BI_CS ///< snapshot created by
> CS backup driver
> > + NATIVE_SNAPSHOT= BI_NATIVE, ///< snapshot created by
> native backup
> > engine
> > + DEFAULT_SNAPSHOT= BI_DEFAULT, ///< snapshot created by
> built-in,
> > blocking backup engine
> > + CS_SNAPSHOT= BI_CS ///< snapshot created by
> built-in CS
> > backup engine
> >
> > * Here is another example of using the wrong term. Shouldn't this be
> > 'find_backup_driver'?
> > + the best available backup engine for the table using @c
> > find_backup_engine()
>
> We (I) agreed to simplyfy user documentation by hiding the
> implementation
> details and not mention backup engines. If needed, it should
> be enough to refer
> to backup drivers only in the external documentation.
>
> However, backup drivers and backup engines are still two
> different things and it
> is not possible to ignore this difference in the code
> documentation. To recap,
> backup driver is an object which is created to backup data of
> a given list of
> tables. Similar, restore driver is an object used to restore
> data for given
> tables. Backup engine, is an object which provides
> backup/restore drivers when
> needed.
>
> One should think that backup engine is a static object which
> exists as long as
> the server is running, while drivers are temporary objects
> created for the
> duration of a single backup/restore operation. Also, each
> backup/restore driver
> instance is tied to a particular list of tables it is going
> to process. This
> list of tables is specified when driver is created using
> get_backup() or
> get_restore() method of backup driver. To talk about a driver
> we should have a
> list of tables in mind (because a driver exists only to
> backup/restore
> particular list of tables). When the list of tables is not in
> the context, it is
> better to talk about backup engine, as no drivers have been
> created yet.
>
> What is used to backup table data - backup driver or backup
> engine? The most
> precise answer is that this is done by backup engine using a
> backup driver. At
> the time when backup method is selected for a given table, no
> backup drivers are
> created yet because table lists are not completed yet. Thus
> saying that we
> search for a backup driver for a table is not entirely
> correct - we don't have
> any drivers to choose from at that time. We rather search for
> a backup engine
> which we will use to backup given table. When we decide which
> backup engine is
> used for each of the tables, then we will know the list of
> tables handled by
> each of the backup engines. Only then backup engines will
> create backup drivers
> to handle their lists.
I think the distinction is splitting hairs and not worth the struggle to
maintain that an engine exists. The boundary is artificial and induces
confusion if not used properly -- much like 'object' vs. 'class' in UML. I
request that we drop the concept of engine altogether in the comments and
documentation. In my mind it is an unimportant detail (for users) a small
technical detail of how a driver instance is created.
>
> Define constants for values used in the code
> --------------------------------------------
> > * Please do not use hard coded values in code. Please
> change all such
> > references to #defines or other tunable mechanisms. Also, why 256?
> > + Snapshot_info *m_snap[256];
> > + if (info.snap_count() > 256)
>
> > * Please make version number a #define or something similar
> which can be
> > changed without respect to hard coded numbers in the code itself.
>
> I don't think it is a good idea in these particular cases.
> The way I see it,
> defining a constant like this
>
> #define SOME_PARAM 47
>
> means that the code is parametrized by this value. Thus one
> can change the value
> of the parameter, recompile the code and get a fully
> functional program which
> uses the new value.
>
> But this is not the case with the format version number. The
> code which is
> written now supports only version 1 of the format and no
> other. Having definition
>
> #define FORMAT_VERSION 1
>
> if someone changes it to 2 and recompiles, he will get an
> incorrect code! This
> code will write backup image in format 1 but store in it
> information that it is
> using format version 2 - that is very bad.
>
> Therefore I think the format version number should be hard
> coded. If we want to
> change the format we must change the code anyway.
>
> > * Here's another hard coded seemingly arbitrary value.
> Please use #define or
> > similar.
> > + for (uint no=0; no<256; ++no)
>
> Similar, the maximal number of snapshots supported by the
> kernel is not a
> tunable parameter. It is determined by the format of backup
> image which uses one
> byte for storing this number. Therefore the 256 limit is hard
> coded and should
> not be changed.
>
> > * Here also.
> > + char buf[255]; // buffer for llstr
>
> Above code is taken from send_reply() function. It declares a
> buffer which is
> local to that function and used only for converting a long
> int number into its
> decimal representation. I think it is an over kill to
> parametrize the size of
> this buffer. It is ok to fix the size of a temporary buffer
> like this since it
> is well known how big it must be (to accommodate a decimal
> representation of a
> long int number) and there is no reason for changing its size.
>
I will agree with the temporary buffer statements you made, but I must
insist on #defines for the other values. It is the same argument you used
for the one-line functions. It's a maintainability issue. In this case, it
also improves readibility.
>
> Other issues
> ------------
>
> >> * What is this? "Use base POD structure fd_stream..."
>
> POD means "Plain Old Datastructure" as opposed to C++ classes
> which can have
> virtual tables. In plain C it is common to cast a pointer to
> re-interpret it as
> a pointer to a different structure. The backup stream library
> uses this a lot as
> a replacement of abstraction mechanisms absent in plain C.
> But when we have a
> class with virtual members, we cast it down to a POD type and
> then cast back,
> the result can be incorrect because C casts ignore virtual
> table pointers. To
> avoid this problem, I intraduced a POD base class for the
> stream classes.
> Casting back to the POD base is avoiding the problems with
> virtual table.
>
Ok, but please spell out POD. It isn't clear what that means without an
explanation.
> > * Does the CREATED state replace the READY and INIT states for the
> > Backup_info class?
>
> Yes, Backup_info has fewer states now because preparations
> for backup have been
> moved to Backup_restore_ctx class. When Backup_instance is
> created inside
> Backup_restore_ctx::prepare_for_backup() it is ready for
> selecting databases to
> be backed up. Then one needs to call Backup_info::close() and
> the object moves
> to CLOSED state and becomes ready for backup operation.
>
> > * Please remove references to online backup being a plugin
> until such time
> > as it is a plugin.
> > + @note This function is called when server shuts down its plugins.
>
> The note doesn't refer to online backup as a plugin. It only
> informs that the
> function initializing the online backup system is called
> during the server
> initialization sequence at the time when server loads its
> plugins (such as
> storage engines and such). This is important to know since
> the time at which the
> function is called determines what can be done inside it (how
> much of the server
> is already (de-)initialized). Do you see a way of
> re-formulating the note in a
> less ambiguous way?
>
Take out 'it's plugins'.
> > * The changes to the cmake file and the makefile do not
> agree. Please make
> > sure all changes to the makefile are reflected in the cmake
> file. It appears
> > that the restore_info is different. I could verify this if
> I could compile.
>
> I think this is OK. Restore_info is special because it has
> only header file --
> no .cc file. This is why it appears in Makefile, where header
> files are listed,
> but not in the cmake file where only source files are listed.
>
> > * Will the code continue to pollute the output to the
> console? If yes,
> > please change all such statements to DBUG_PRINT calls or
> possibly some other
> > logging method.
>
> It is more complex than that. We always use DBUG_PRINT to
> output diagnostic
> messages. Why these messages end up in the console instead of
> the trace file is
> still a mystery for me. When I tested this patch this never
> happened to me but I
> would be not surprised if the problems surfaces in the
> pushbuild. Since there is
> a separate bug registered for that issue (BUG#33173) I
> suggest that it will be
> addressed in the bug, not as part of this patch.
>
Ok. Good.
> > * This is another hard coded value that is sure to bite us
> later. Especially
> > if a third party vendor adds their own driver for their engine or an
> > alternative for one of ours. Please change this to a
> #define or something.
> > Why eight? Why not 11 or 20 or ...?
> > + native_snapshots(8)
>
> This value is not as hard coded as it might seem. This is
> only the initial size
> of the native_snapshots map, internally used to initialize
> the HASH structure.
> The map can grow dynamically as needed (because HASH does).
>
Please add comments to the code to explain this. I really detest using hard
coded values that are not explained away (as you have above) or are #defines
(and explained accordingly). I'll spot you the argument for temporary
buffers though (see above).
> Where we disagree
> =================
>
> > * Are we finished with the code shuffling? This sort of massive
> > reorganization causes a great deal of problems for
> simultaneous work. I
> > would recommend the existing kernel and associated drivers
> et. al. be frozen
> > for any changes other than adding new functionality (like dependency
> > checking, synch with replication, etc.) and bug fixes.
>
> While I agree that code shuffling and big refactoring causes
> a lot of troubles I
> don't agree with the suggestion that we freeze the code at
> the current stage. I
> keep reminding that parts of the backup kernel are prototype
> code which is not
> of production quality and needs to be rewritten. If we don't
> do that, it will
> hit us badly later both during development of new features
> and when we maintain
> the code. This is probably the last moment when we can try to
> improve our future
> situation by making the quality of released code as good as
> possible. Or at
> least remove the code which we know is bad.
>
Ok. Then what I would like to see is for us to define which parts you feel
are still prototype and the rest become identified as production code and
therefore more emphasis placed on minimizing changes to those portions that
are 'release' code. I just don't like this build-redesign-repeat loop that
we seem to get trapped into.
> I believe this patch is one of the last steps in the effort
> to remove the
> prototype code from the kernel. What remains is the code in
> data_backup.cc which
> talks to backup/restore drivers - it might need some
> refactoring. Perhaps
> rewrite of that code can be joined with making backup kernel
> multithreaded.
>
Ok. Let's not loose this bit of information (see above).
> Since big code changes cause a lot of problems when merging
> etc. I think it is
> fair to put the burden of that on the person who introduces
> these changes. We
> could think about correct push order to minimize problems.
> For example, when
> this patch is going to be pushed then:
>
> - first, I will wait until you finish and push any big
> patches you are working
> on,
> - then you will wait until I merge and push my patch,
> - then everyone updates his local trees and starts working
> from the new tree
>
This plan will work.
> This will slow down development a bit by creating a
> synchronization point, but
> will minimize troubles for you when I push my big patch.
>
> > * I have confirmed in other works that the TABLE_LIST is
> freed when the
> > thread is destroyed. In fact, I have had issues when I
> freed the table list
> > manually. I recommend removing the code to free the TABLE_LIST.
>
> I assume that you talk about free_table_list() function here.
> I think this
> function must stay as a counterpart to build_table_list().
> The latter function
> allocates memory using my_malloc() and this memory should be
> freed using
> my_free(). Responsibility for doing this lies on the caller
> who becomes an owner
> of the created list. It is possible that the caller will pass
> ownership of the
> list to some other code (e.g. the THD structure) but it does
> not have to happen
> necessarily. Thus, at least potentially, the caller of
> build_table_list() will
> have to free that list, and the free_table_list() function
> relieves him from the
> need to know how the list is organized internally. This
> ensures correct
> separation of the code increasing its maintainability. Note
> also, that you still
> use free_table_list() in the destructor of the default backup driver.
>
Ok, please check that there aren't any paths in the code where we delete
memory that is already deleted. That was my biggest worry.
> > * Please combine assertions into one statement. Debugging
> code with multiple
> > assertions together like the following can be annoying to
> the extreme when
> > debugging the code on Windows (not your fault -- it's a
> platform issue).
> > + DBUG_ASSERT(is_valid());
> > + DBUG_ASSERT(m_state == PREPARED_FOR_BACKUP);
> > + DBUG_ASSERT(m_thd);
> > + DBUG_ASSERT(m_stream);
> > + DBUG_ASSERT(m_catalog);
>
> There is a big advantage in having asserts organizing that
> way. When one of them
> fails, the error log of the server will point at the
> assertion which failed and
> give us more valuable information about the cause of the
> failure. Note that we
> don't always have access to the core file to examine variable
> values with a
> debugger. Knowing how hard it is to trace some bugs, I think
> that every piece of
> additional information is extremely important. Therefore I
> disagree to change
> the layout here.
>
Realize this will cause a lot problems for me should I have to debug the
code. I respectfully request you reconsider your position and/or limit the
use of the assertions. There are two questions I did not ask; 1) Are all of
these are needed? and 2) Are they really assertions or are these exceptions
that should be handled as gates? Please remove those that are not
specifically critical to the method (if possible). I won't budge on this one
because of the tremendous pains I have already suffered working on bugs in
the server code. Life is not grand on Windows in mysql and using assertions
like this makes it far worse.
> > * Remove conditional error statements from constructors.
> Constructors should
> > never have code that can produce errors that need to be
> acted on. Besides,
> > constructors should be fast. Example:
> > +Backup_restore_ctx::Backup_restore_ctx(THD *thd):
> > + m_state(CREATED), m_thd(thd), m_thd_options(thd->options),
> > + m_error(0), m_remove_loc(FALSE), m_stream(NULL), m_catalog(NULL)
> > +{
> > + /*
> > + Check for progress tables.
> > + */
> > + if (check_ob_progress_tables(thd))
> > + m_error= ER_BACKUP_PROGRESS_TABLES;
> > +}
> > In this case, the call to the progress tables should occur
> prior to creating
> > the instance of the class. If this is not possible, please
> move this code to
> > the prepare() method. The code that checks for the progress
> tables is not
> > fast and can potentially take a long time (it performs and
> open and lock
> > table command).
>
> Well, I never heard about a rule that a constructor should
> not check for errors
> or that it should be fast. I'm sure such rules exist but I'm
> not going to follow
> them because I don't agree with them. In fact, one of the
> main reasons why C++
> has exception mechanism is to have a nice way of reacting to
> errors inside
> constructors. We are not allowed to use exceptions, so we
> must work-around by
> setting a flag inside object and checking it after object has
> been constructed -
> just an annoying inconvenience.
>
> One reason for having complex constructors is to avoid having
> "partially
> constructed objects" around. The idea is to ensure that if
> construction of an
> object has been successful, then this object is fully
> operational, with all
> necessary resources allocated and all preparations done.
> Another approach would
> be to first construct a non-functional "stub" of an object
> and then explicitly
> initialize it before it can be used. But then we must be
> careful to not use the
> object before it was initialized (as compiler doesn't prevent
> that). This
> creates one more possibility to introduce errors into the
> code. In the first
> approach we are safer because it is not possible to use an
> object before it was
> created.
>
I think this may be a case where I will have to suggest we agree to disagree
because I strongly oppose such practices. For now, I will mark it down as
simply a difference in style or preferred practice. Leave it the way it is
but realize I will change it if tasked to work on this part of the code
because I think it is wrong to do it this way.
> > * I really don't like the continued use of the local
> version of the string
> > class. Why can we not use the calls in the strings library?
> Recommend using
> > the strings library and removing the duplication.
>
> This is not duplication - we are using the strings library, but via a
> convenience class String. This class is nothing more than a
> wrapper which calls
> functions from the strings library. Using such wrappers is
> not only convenient
> but it also avoids repeating the same code over and over
> again, decreasing the
> number of mistakes that can be made. Another advantage is to
> increase code
> maintainability. For example consider the convenience constructor:
>
> String(byte *begin, byte *end):
> ::String((char*)begin,end-begin,&::my_charset_bin) //
> FIXME: charset info
> {
> if (!begin)
> set((char*)NULL,0,NULL);
> }
>
> It not only contains safety checks, which otherwise would
> have to be done
> manually each time, but it also encapsulates the arbitrary
> decision to use
> binary character set for such strings. If, one day, we decide
> that we must use
> the system character set instead, we would have to change
> only this constructor
> instead of going and changing every place in the code where
> we create such a
> string (risking that we missed some of them).
>
> Finally, if your only argument is that you don't like it,
> then my counter
> argument is that I *do* like it, and very much indeed.
>
Ok, I understand now. I accept.
> Code formatting issues
> ----------------------
> > * Please fix spacing issues on assignments (there are a lot
> of these).
> > + virtual bool is_valid() =0;
>
> Our coding guidelines say nothing about spacing for pure
> virtual methods -
> declaring pure virtual method is not a variable assignment!
> Therefore I see no
> reason to not use this convention which I prefer.
>
> > * Format of doxygen comment incorrect.
> > +/** @file
>
> Why is it incorrect? I think it is perfectly valid format of
> doxygen comment.
> And our coding guidelines don't put any constraints on this format.
>
Ok, but all the others are:
/**
@file
That's all I was complaining about. It shouldn't be a big deal to change
that.
> > * Non-standard comments found.
>
> I have seen this comment from you before. When I saw it last
> time I have checked
> my code and couldn't find any comment I wrote which would
> violate our coding
> guidelines. However, I could find couple of your comments
> which do. For example
> this comment in be_default.cc
>
> > /**
> > * @brief Prelock call to setup locking.
> > *
> > * Launches a separate thread ("locking thread") which will lock
> > * tables. Locking in a separate thread is needed to have
> a non-blocking
> > * prelock() (given that thr_lock() is blocking).
> > */
> > result_t Backup::prelock()
> > {
> > DBUG_ENTER("Default_backup::prelock()");
> > DBUG_RETURN(locking_thd->start_locking_thread("default
> driver locking thread"));
> > }
>
> which breaks this rule from the guidelines:
>
> * When writing multi-line comments please put the '/*' and
> '*/' on their own
> lines, put the '*/' below the '/*', put a line break and a
> two-space indent
> after the '/*', do not use additional asterisks on the left
> of the comment.
>
> I suggest that either we all follow the guidelines strictly
> or stop raising such
> issues in our reviews. Needles to say that I'm strongly for
> the second
> alternative...
Agreed. I will endeavor to do a better job of listing exactly what I find
wrong and restrain from blanket statements of the critical nature. I went
back through the patches and found one that I think spurred my comment:
+/********************************************************************
+
+ Classes for representing various object types.
+
+ ********************************************************************/
Which does violate the coding guidelines.
Chuck