List: Internals
From:Mats Kindahl Date:June 11 2009 7:41pm
Subject:RFC: draft of physical structure of server code
Hi all,

As part of the reengineering project, we are focusing on supporting
the new server development model, eliminating some sources of errors
that we are aware of, and allowing the server to be developed in a
modular fashion. Toward these goals, we aim to define a physical
structure of the server that is intended to be future-proof.

The physical structure is concerned with organization of files in such
a manner that it is easy to add new features and easy to modify
existing code with a minimum risk of introducing new bugs.

In order to accomplish this, we want to have a structure that supports
hierarchical testing, that is, directly testing individual components of
the server as well as testing subsystems of the server.

In addition, the structure shall be such that feature additions and
bug fixes are normally isolated from each other physically, that is,
placed in different files and/or different parts of the system. This
will reduce the risk of merges causing inadvertent changes to the
code.

Note that this proposal mentions a few coding rules that are needed to
support the proposed structure, but the goal of this work is *not* to
produce a set of coding guidelines. Coding guidelines are a separate
issue, maintained by the coding guidelines committee (which is
currently headed by Kostja).

Part of the overall goal is also to allow this restructuring to
proceed without affecting existing development, and to make it
possible to implement the transition iteratively.

This draft has already been through several revision rounds, trying
to take many aspects into account while not imposing rules that are
too strict to allow future development of the server. We would very
much like to have feedback from the MySQL development community on
this draft.

Several projects have already been affected by the lack of a
specification of an internal structure. In order not to stall
development any further, we have to roll this out no later than June
25, 2009, so we may or may not be able to accommodate feedback that
arrives after this date.


High-level structure
====================

We envision that the system consists of a number of *packages* that
together make up the code of the system. In order to build the server,
and associated components, we have a *build frame* or *build system*
that is used to manage and, especially, build the system.

In order to support easy addition and removal of features, we assume
that each feature is contained in a separate package (see below) and
that a minimum of changes (preferably none) shall be required to code
outside this package to introduce the feature. To support this
convention, the build frame has to be independent of the number and
type of packages that are available, and has to use generic methods
for deciding which packages are to be included in the build. This in
turn requires the packages to provide the necessary information so
that the build frame can do its job.
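To make the idea of a package-independent build frame more concrete,
here is a minimal C++ sketch, assuming each package supplies a small
descriptor with its name and the packages it depends on. The
`PackageDescriptor` type and the `packages_to_build` function are
hypothetical and not part of any existing build system; the point is
only that the selection logic never names a specific package.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical descriptor that each package would supply to the build frame.
struct PackageDescriptor {
  std::string name;                     // package (and directory) name
  std::vector<std::string> depends_on;  // packages this package needs
};

// Returns the requested packages plus, transitively, everything they depend
// on. Nothing here refers to a particular package, so adding or removing a
// package only changes the descriptors passed in, not this code.
std::set<std::string> packages_to_build(
    const std::vector<PackageDescriptor> &available,
    const std::vector<std::string> &requested) {
  std::map<std::string, const PackageDescriptor *> by_name;
  for (const auto &pkg : available) by_name[pkg.name] = &pkg;

  std::set<std::string> selected;
  std::vector<std::string> pending(requested);
  while (!pending.empty()) {
    const std::string name = pending.back();
    pending.pop_back();
    const auto it = by_name.find(name);
    if (it == by_name.end())
      throw std::runtime_error("unknown package: " + name);
    if (!selected.insert(name).second) continue;  // already visited
    for (const auto &dep : it->second->depends_on) pending.push_back(dep);
  }
  return selected;
}
```

For example, with hypothetical descriptors in which ``query_model``
depends on ``registry`` and ``common``, requesting ``query_model``
selects those three packages and leaves an unrelated ``storage``
package out.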


Components
==========

A component consists of a set of header files and a set of associated
C/C++ files. The component is the smallest unit of the physical
design.

Typically, each component consists of a header file and a C/C++ file
with a common base name, for example "parser.h" and
"parser.cc". However, there are some cases where it makes sense to
have multiple header files for a component and cases when it makes
sense to have multiple source files.
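As an illustration, here is a sketch of a hypothetical ``parser``
component, with the contents of ``parser.h`` and ``parser.cc`` shown
in one listing so that it compiles on its own. The package name
``query_model``, the function `count_tokens`, and the include guard
are assumptions made for the example.

```cpp
#include <cassert>

// ---- parser.h (hypothetical): the interface of the component ----
#ifndef QUERY_MODEL_PARSER_INCLUDED
#define QUERY_MODEL_PARSER_INCLUDED

#include <cstddef>
#include <string>

namespace query_model {
// Returns the number of whitespace-separated tokens in `statement`.
std::size_t count_tokens(const std::string &statement);
}  // namespace query_model

#endif  // QUERY_MODEL_PARSER_INCLUDED

// ---- parser.cc (hypothetical): defines what parser.h declares ----
// As a separate file it would begin with: #include "parser.h"
namespace query_model {
std::size_t count_tokens(const std::string &statement) {
  std::size_t tokens = 0;
  bool in_token = false;
  for (char c : statement) {
    const bool is_space = (c == ' ' || c == '\t' || c == '\n');
    if (!is_space && !in_token) ++tokens;  // a new token starts here
    in_token = !is_space;
  }
  return tokens;
}
}  // namespace query_model
```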

- Several header files can be used to present multiple interfaces
  into a single component.

- Several source files may be needed when the linker is file-based,
  that is, when it maps symbols at file level (loading/linking entire
  files, not individual functions).

In these cases, the files of each component shall have a common
prefix distinguishable from other components.


    =========== =================================================
    Component   Files
    =========== =================================================
    rpl_filter  rpl_filter.h rpl_filter.cc
    reg_main    reg_main_internal.h reg_main_public.h reg_main.cc
    =========== =================================================



Packages
========

Packages are collections of components that serve a common purpose.
This formulation is deliberately not exact, since what actually makes
sense to turn into a package varies from case to case. However, the
following issues should be considered when deciding whether a
candidate package makes sense as a package:

- Can the candidate package be released independently of the rest of
  the server? If not, i.e., changes to this package are likely to
  require changes to other packages, then maybe it should not be a
  package.

  Releasing here does *not* mean distributing the code in isolation,
  it means releasing, e.g., a new version of the package for use with
  the rest of the server.

- Is the candidate package very small, e.g., a single component? In
  this case it might make sense to group several such candidate
  packages with similar purpose into a single package.

  A typical example would be support for individual character sets,
  which does not make sense to place in a separate package each, but
  is sensible as a single package of "character set information".


Package naming and structure
----------------------------

Each package is represented as a directory. The basic assumption is
that everything related to a package should be placed in the
directory. This includes, but is not limited to: header files, source
files, and unit tests.

Basic goals and assumptions are:

- Changes in the package internals should not inadvertently affect
  other packages that use the package.

- It shall be possible to support third-party solutions as packages
  in the package structure, without requiring them to be re-organized
  to fit the package structure.

The package directories will be placed in a directory alongside the
``sql/`` directory. Apart from that, all packages are placed at the
same level. We are placing the packages in a new directory to be able
to distinguish between "unorganized" and "organized" code.

The following package directories are proposed (some of these
directories already exist and largely follow the proposed structure):

    ========== ==============================================
    Package    Purpose
    ========== ==============================================
    storage/   Storage engines
    server/    Server modules
    common/    Common utilities
    ========== ==============================================

There are some other directories that are being considered, such as
``mysys/``, and the above list will be extended as needed.

Package names shall consist of lowercase letters only, with
underscores separating the individual words in the name. A package
name may not start with an underscore. This naming scheme allows the
package name to be used as a file name, as a C/C++ symbol, and as an
identifier in other tools (such as Doxygen).

Examples: registry, query_model
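Because the naming rule is purely lexical, a tool can check it. Below
is a minimal sketch, assuming ASCII names; `is_valid_package_name` is
a hypothetical helper, and it rejects digits since the rule as stated
only mentions lowercase letters and underscores.

```cpp
#include <cassert>
#include <string>

// Checks the package naming rule: lowercase ASCII letters and underscores
// only, and the name may not start with an underscore. Digits are rejected
// because the rule as written does not mention them.
bool is_valid_package_name(const std::string &name) {
  if (name.empty() || name[0] == '_') return false;
  for (char c : name) {
    if (!((c >= 'a' && c <= 'z') || c == '_')) return false;
  }
  return true;
}
```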


File names
~~~~~~~~~~

The choice of and restrictions on file names are governed by the
current coding style.

The coding style takes into account operating system restrictions and
restrictions imposed by tools such as the compiler, linker, and other
processing tools. However, the physical structure itself does not
impose any special requirements on the file names.


Package namespace
~~~~~~~~~~~~~~~~~

All symbols of a package shall be placed in a single namespace, and
the namespace name shall be the same as the name of the package.
Since package names as specified above are legal C/C++ symbol names,
this will always be possible.
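A minimal illustration of the rule, using the example package names
``registry`` and ``query_model`` from above: since each package keeps
its symbols in its own namespace, both can define a function with the
same name without a clash. The `describe` functions are hypothetical.

```cpp
#include <cassert>
#include <string>

// Every symbol of package "registry" lives in namespace registry.
namespace registry {
std::string describe() { return "registry"; }
}  // namespace registry

// The same name in package "query_model" does not collide with the above.
namespace query_model {
std::string describe() { return "query_model"; }
}  // namespace query_model
```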


Package interfaces
------------------

For each package, there is a set of interfaces into the package. Each
interface is represented physically as a header file, meaning that
each package has one or more interfaces but may also have header
files that are not package interfaces. Note that an interface into a
package is *all* that is in a header file, meaning that we do not
place any specific requirement on the form of the interface *in this
proposal*.

The package owner shall be able to decide which interface files are
available for use, but initially we will not be able to enforce this
for practical reasons: it requires support from the build frame.
However, the rules outlined below on interface usage will not have to
change when the transition is made to a build frame that supports
this.


Interface usage
~~~~~~~~~~~~~~~

In order to use an interface of a package, the header file is included
using the form:

    #include "package_name/interface.h"

The include path is set up by the build system so that this is
possible. Note that it is an error to include a file that is neither
a package interface nor a header file of the same package. Ideally,
the build frame will not allow this, but until that feature is
implemented in the build frame, it will be possible to do so by
mistake.

Header files of the same package are included using the form:

    #include "header.h"

This is required since the build frame eventually will not support
inclusion of files that are not package interfaces.

For package interface files from the same package, this form should
also be used, so that a change in the role of a header file (from
package interface to internal file or the opposite) does not require
changing the include directives.
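The include rules above lend themselves to an automatic check, which
the text says the build frame will eventually provide. The following
C++ sketch shows one way such a check could decide whether an include
target is allowed; the `PackageHeaders` model and the
`include_allowed` function are hypothetical. It also rejects the
``package_name/`` form for a header of the including file's own
package, reading the "should" above as a hard rule, which is an
assumption.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of what the build frame knows about one package.
struct PackageHeaders {
  std::set<std::string> headers;     // all header files in the package
  std::set<std::string> interfaces;  // headers published as interfaces
};

using PackageMap = std::map<std::string, PackageHeaders>;

// Returns true if `target` (the text between the quotes of an #include
// directive) may be included from a file in `current_package`.
bool include_allowed(const PackageMap &packages,
                     const std::string &current_package,
                     const std::string &target) {
  const auto slash = target.find('/');
  if (slash == std::string::npos) {
    // "header.h": must be a header of the including file's own package.
    const auto self = packages.find(current_package);
    return self != packages.end() && self->second.headers.count(target) > 0;
  }
  // "package_name/interface.h": must be a published interface of another
  // package; the package's own headers use the "header.h" form instead.
  const std::string pkg = target.substr(0, slash);
  const std::string file = target.substr(slash + 1);
  if (pkg == current_package) return false;
  const auto it = packages.find(pkg);
  return it != packages.end() && it->second.interfaces.count(file) > 0;
}
```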

Coding requirements
===================

This section outlines some basic rules that are meant to avoid common
problems associated with developing for a package structure as well as
allowing tool-support for checking and manipulating components and
packages. Tool support is necessary to allow the system to grow,
since resolving such issues manually wastes effort on maintaining
inconsistencies.

The aim is to keep the rules to a bare minimum and specifically only
consider issues that can potentially cross package boundaries or that
cause problems when maintaining or operating the build frame. Issues
of what is "good coding style" are maintained separately and are not
part of this proposal. This is done to restrict the scope of the
proposal.


Every header file should be self-sufficient
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For every header file "header.h", the following translation unit
shall compile without errors:

    #include "header.h"

The reason is that when using a header file "header.h", it should be
sufficient to include "header.h" to get the functionality sought. If
it is necessary to include other files before "header.h" because
"header.h" requires definitions from them, we have two problems:

1. It is hard to find out what dependencies are needed, which
   eventually leads to the trial-and-error approach that we are seeing
   now.

2. If the dependencies of "header.h" change, files that include it
   might end up including more files than necessary.
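Checking the rule can be automated by compiling one generated
translation unit per header. The sketch below only generates those
units; the ``check_`` file naming and the `self_sufficiency_checks`
function are hypothetical, and it assumes header names are plain file
names within the package directory.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One generated translation unit: a file name and its contents.
struct CheckUnit {
  std::string file_name;  // e.g. "check_parser.cc"
  std::string contents;   // e.g. "#include \"parser.h\"\n"
};

// Produces, for each header, a translation unit consisting only of an
// #include of that header. A (hypothetical) build step would compile each
// unit separately; a failure means the header is not self-sufficient.
std::vector<CheckUnit> self_sufficiency_checks(
    const std::vector<std::string> &headers) {
  std::vector<CheckUnit> units;
  for (const auto &header : headers) {
    // Drop the extension; a name without '.' is kept whole.
    const std::string stem = header.substr(0, header.rfind('.'));
    units.push_back({"check_" + stem + ".cc", "#include \"" + header + "\"\n"});
  }
  return units;
}
```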



Every header file should have an include guard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For a header file "header.h" in package "package", the include guard
should have the name ``PACKAGE_HEADER_INCLUDED``. We choose to
standardize the include guard so that we can use external include
guards if the need should arise. We omit the extension from the name,
since header files may have a number of different extensions and we do
not want to standardize on any one of them.
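The guard name can be derived mechanically from the package name and
the header file name. This is a minimal sketch of that derivation;
`include_guard_name` is a hypothetical helper, and it assumes the file
name stem, like a package name, contains only letters, digits and
underscores.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Builds PACKAGE_HEADER_INCLUDED for `header_file` in `package`.
// The extension (".h", ".hh", ...) is dropped, as the rule requires.
std::string include_guard_name(const std::string &package,
                               const std::string &header_file) {
  const std::string stem = header_file.substr(0, header_file.rfind('.'));
  std::string guard = package + "_" + stem + "_INCLUDED";
  for (char &c : guard)
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  return guard;
}
```

For a hypothetical header ``reg_main_public.h`` in package
``registry`` it yields ``REGISTRY_REG_MAIN_PUBLIC_INCLUDED``, which
the header would test with ``#ifndef`` and define with ``#define``
before its declarations.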

Existing include guards that are not violating the C/C++ standard will
not be changed initially, but developers are encouraged to make the
change if they are changing the header file.


Source and header files should only include definitions it needs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For header files, it is critical to use forward declarations when
they suffice. Including definitions that are not needed causes two
problems:

1. It introduces unnecessary dependencies, since each definition
   refers to things that *it* needs. Note that dependencies are not
   only on header files; unintended symbols may also be pulled into
   the system.

2. It unnecessarily increases compile time, since it requires opening
   *at least* one more file (usually several). This problem is,
   however, secondary.
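The difference is easiest to see in a small case. In this sketch (the
`Table` class and the `row_count` function are hypothetical), the
header part only needs to name `Table`, so a forward declaration
replaces an include of its definition; only the implementation part
needs the full class.

```cpp
#include <cassert>
#include <cstddef>

// ---- hypothetical header: Table is only used by reference, so a forward
// declaration is enough and Table's definition is not pulled in ----
class Table;
std::size_t row_count(const Table &table);

// ---- hypothetical implementation file: needs the full definition ----
class Table {
 public:
  explicit Table(std::size_t rows) : rows_(rows) {}
  std::size_t rows() const { return rows_; }

 private:
  std::size_t rows_;
};

std::size_t row_count(const Table &table) { return table.rows(); }
```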


There shall be no convenience include files inside the server
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Convenience include files are include files whose only purpose is to
bundle other include files.

The reason why we want to avoid this *inside* the server is that it
introduces unnecessary dependencies between packages. If an include
file is added to the convenience include file because *one* component
needs it, *all* components that include this convenience include file
are affected.

In addition, it has been observed that "common" definitions have been
added to such convenience include files and have introduced at least
one circular include dependency.

However, convenience include files that serve to maintain interfaces
*into* the server are accepted (for example, to make it easier to work
with the client interface). For these files it is critical that they
are pure convenience includes and do not contain separate definitions
of their own.


No ``using`` directives in header files at namespace level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Placing using directives at namespace level in header files will force
any file that includes the header file to resolve symbols in a
namespace they have no control over. This can lead to subtle and hard
to find bugs, and should therefore not be used.
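The following sketch shows how a using directive can silently change
which function a call resolves to. The names `lib` and `classify` are
hypothetical. To keep it in one file, the directive is placed at
function scope; placed at namespace level in a header, it would have
the same effect on every file that includes the header.

```cpp
#include <cassert>

namespace lib {
inline int classify(int) { return 1; }
}  // namespace lib

inline int classify(double) { return 2; }

// Without a using directive only ::classify(double) is visible, so the
// int argument is converted and the call returns 2.
inline int call_without_directive() { return classify(0); }

// With the directive, lib::classify(int) joins the overload set and wins
// as an exact match, so the very same call returns 1.
inline int call_with_directive() {
  using namespace lib;
  return classify(0);
}
```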


No ``using`` declarations before #include directives
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

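Placing a using declaration before including another file makes the
names it introduces visible inside the included file, so that name
lookup in the included header depends on the includer and may
silently change the meaning of the header. Using declarations should
therefore not be placed before ``#include`` directives.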
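A single-file sketch of the problem, with the text of a hypothetical
header ``report.h`` pasted where the ``#include`` would be. The names
`lib`, `scale` and `report_value` are assumptions for the example.
Because of the using declaration, the call inside the "header"
resolves to `lib::scale(int)` and returns 20; without it, the same
call would convert the argument and call `::scale(double)`, returning
2.

```cpp
#include <cassert>

namespace lib {
inline int scale(int x) { return x * 10; }
}  // namespace lib

inline int scale(double x) { return static_cast<int>(x); }

// The including file places a using declaration before the #include...
using lib::scale;

// ---- ...so this text, standing in for "report.h", now sees lib::scale ----
inline int report_value() { return scale(2); }
```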


Entities declared in a component shall be defined in the component
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An object or function declared in the header file of a component shall
be defined in the same component (usually in an implementation
file). The reason for this rule is that it shall be easy to know
which components need to be linked in order to use the component. If
some definition is in another component, it will be hard to find and
manage the right dependencies between components in the system.
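Given a table of where each symbol is declared and where it is
defined (which a tool could extract from the sources or object files),
the rule becomes a simple comparison. This sketch assumes such a table
already exists; `SymbolLocation` and `misplaced_definitions` are
hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical tool input: the component whose header declares a symbol
// and the component whose source file defines it.
struct SymbolLocation {
  std::string declared_in;
  std::string defined_in;
};

// Returns, in symbol-name order, the symbols defined outside the
// component that declares them.
std::vector<std::string> misplaced_definitions(
    const std::map<std::string, SymbolLocation> &symbols) {
  std::vector<std::string> result;
  for (const auto &entry : symbols) {
    if (entry.second.declared_in != entry.second.defined_in)
      result.push_back(entry.first);
  }
  return result;
}
```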


No gratuitous link-time dependencies between components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Such dependencies can occur if a component, for example, declares an
``extern`` variable itself instead of including the header file that
declares it. All dependencies shall be explicit, in the sense that
they shall be visible in the file as an ``#include`` directive. This
makes dependencies between components clearly visible and also allows
them to be detected and tracked automatically by tools.
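A rough heuristic for spotting the pattern described above is to look
for ``extern`` declarations written directly in a source file. This is
a text-based sketch only (`local_extern_lines` is hypothetical); a
real tool would need a proper C/C++ parser, and this one deliberately
skips ``extern "C"`` linkage specifications.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based numbers of lines that start (after indentation)
// with "extern ", ignoring lines starting with extern "C".
std::vector<int> local_extern_lines(const std::string &source) {
  std::vector<int> hits;
  std::istringstream in(source);
  std::string line;
  int number = 0;
  while (std::getline(in, line)) {
    ++number;
    const auto start = line.find_first_not_of(" \t");
    if (start == std::string::npos) continue;  // blank line
    if (line.compare(start, 7, "extern ") != 0) continue;
    if (line.compare(start, 10, "extern \"C\"") == 0) continue;
    hits.push_back(number);
  }
  return hits;
}
```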

-- 
Mats Kindahl
Senior Software Engineer
Database Technology Group
Sun Microsystems