Let me play devil's advocate and throw in my grumpy 2 cents.
First, let us clarify what exactly constitutes the public handler API.
It is much more than the set of functions defined in the
sql/handler.h header file. Presently, you have to include the
entire transitive closure of all public methods of all classes used as
arguments. There are many storage engines out there, especially the
transactional and the column-oriented ones, that use THD member
variables and member functions. Download and take a look at the
Infobright Community Edition source code to see what I mean.
Second, the entire MySQL code base is glued in hundreds of different
ways by using the same classes and structures. In software
architecture terms, MySQL has an absolutely monolithic domain model.
To decouple different processing areas, you will have to create
independent domain models for each, so that you could consistently
implement separation of concerns. Presently, MySQL lacks separation of
concerns: the same parse tree is used at every stage of query
processing. On the other hand, in Postgres, a parse tree is
represented by a Query structure (a tree of C structures), the query
optimizer generates and considers a bunch of throwaway Path tree
structures, then the query planner generates a tree of Plan nodes
(Plan tree). The query execution is a state machine; Init...(_)
functions generate various PlanState structures/nodes, which are
ultimately responsible for runtime query execution. In the Postgres
world, only Query tree and Plan tree need to be copyable and
serializable. This makes source tree organization simpler and
cleaner. Without implementing such separation of concerns, MySQL
architecture will remain monolithic. [To be fair, Postgres code is
monolithic as well, but in a very different way]
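
To make the staged design concrete, here is a minimal C++ sketch of one
domain model per processing stage. Every name in it (ParsedQuery,
CandidatePath, ExecPlan, ExecState and the three functions) is
hypothetical and invented for illustration; none of them are MySQL or
Postgres types, and the costs are made-up numbers. It only shows the
shape of the idea: each stage consumes the previous stage's output and
never reaches back into earlier structures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// All types below are hypothetical stand-ins, not MySQL or Postgres code.

// Stage 1 output: what the user asked for.
struct ParsedQuery {
    std::string table;
    std::string filter_column;  // empty means "no WHERE clause"
};

// Stage 2 scratch data: throwaway candidates the optimizer compares.
struct CandidatePath {
    std::string access_method;  // "seq_scan" or "index_scan"
    double estimated_cost;      // made-up cost units
};

// Stage 3 output: the chosen strategy, with no link back to the candidates.
struct ExecPlan {
    std::string table;
    std::string access_method;
};

// Stage 4: mutable per-execution state, built from the plan and nothing else.
struct ExecState {
    const ExecPlan* plan;
    long rows_returned;
};

// Optimizer: enumerate candidate paths for a parsed query.
std::vector<CandidatePath> enumerate_paths(const ParsedQuery& q, bool has_index) {
    std::vector<CandidatePath> paths;
    paths.push_back(CandidatePath{"seq_scan", 100.0});
    if (has_index && !q.filter_column.empty())
        paths.push_back(CandidatePath{"index_scan", 10.0});
    return paths;
}

// Planner: pick the cheapest path and turn it into a plan.
ExecPlan choose_plan(const ParsedQuery& q, const std::vector<CandidatePath>& paths) {
    assert(!paths.empty());
    const CandidatePath* best = &paths.front();
    for (const CandidatePath& p : paths)
        if (p.estimated_cost < best->estimated_cost) best = &p;
    return ExecPlan{q.table, best->access_method};
}

// Executor start-up: derive runtime state from the plan only.
ExecState init_exec(const ExecPlan& plan) {
    return ExecState{&plan, 0};
}
```

In this sketch only ParsedQuery and ExecPlan would ever need to be
copied or serialized; the candidate paths can be discarded as soon as
choose_plan returns.
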
Third, for many storage engines, there will be extra software
engineering cost of chasing your refactoring changes, especially if
some of the changes go against storage engine's architectural
I wonder if the MySQL refactoring news may indicate a tug of war
between US-based and EU-based Sun / MySQL teams? We already have one
MySQL refactoring project under way. It is called Drizzle. Is this a
"Drizzle Reloaded" project? By the way, Drizzle can be used as an
illustration of how easy it is to go too far: views and prepared
statements are no longer supported in Drizzle.
On Tue, May 12, 2009 at 12:39 PM, Jay Pipes <Jay.Pipes@stripped> wrote:
> Jay Pipes wrote:
>> Mats Kindahl wrote:
>>> Alaric Snell-Pym wrote:
>>>> Excellent news!
>>>> One word of warning, though: make sure it's a series of small steps.
>>>> It's far too easy, with this sort of thing, ending up going off on
>>>> huge yak-shaving tangents. By all means take lots of small steps
>>>> towards a lofty distant goal, but make sure each step is useful in its
>>>> own right (even if just by allowing other steps to happen), or you can
>>>> get lost on a branch that will never merge ;-)
>>> Yes, we don't want to do the work in macro steps (at least not at this
>>> we want to proceed carefully.
>>>> I see that a few macroscopic tasks have appeared on the Forge already,
>>>> but I'd like to add something I think could be changed for the better,
>>>> on a grass-roots level throughout the codebase:
>>>> I see a lot of methods that are called with arguments and return a
>>>> value, but most of their input and output is actually through member
>>>> fields of the object - not that the method is operating on its object
>>>> per se, but that the caller actually puts things into the member
>>>> fields, then calls the method, then inspects the results in member
>>>> fields. For example, in the storage engine API, update_row is called
>>>> with a buffer in unireg format, which it almost universally ignores
>>>> and instead uses the array of Field objects set up in the handler
>>>> object by the caller. And we spent some time in debugging our storage
>>>> engine - it would return rows fine when you did selects on the table,
>>>> but when you did certain types of join, it would fail to return any
>>>> rows, despite our logging clearly showing we'd returned rows to MySQL
>>>> - because it seems that sometimes MySQL not only looks at the return
>>>> value of rnd_next, but also checks the 'status' member of the table
>>>> object to see if the current row of the table is valid or not. So our
>>>> rnd_next had to assign success/failure to table->status as well as
>>>> returning success, and then everything worked OK. Doh.
>>>> Making the inputs and outputs of every method/function explicit,
>>>> rather than sneaking stuff in and out via members, will make the
>>>> calling interfaces between things a lot easier to read, which will
>>>> reduce the chances of developers working on a module introducing bugs!
>>>> Plus, it'll simplify the classes a lot, and make them easier to read,
>>>> as they will end up with only members that really relate to the actual
>>>> domain object - eg, the table or whatever - rather than members that
>>>> are part of the calling protocol of particular operations on the
>>>> class. Less short-term mutable state in classes means they can be
>>>> shared between threads in more and more contexts, too, as function
>>>> arguments and return values live only on the thread-local stacks!
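
[Inline note on the rnd_next story above: a minimal C++ sketch of what
"explicit inputs and outputs" could look like. ScanStatus, ScanResult,
ToyScanner and sum_all are hypothetical names made up for this example;
this is not the real handler API, and rows are plain ints to keep it
short. The point is only that the caller learns everything from the
return value, so there is no second place such as table->status that
has to be kept in sync.]

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not the real MySQL handler or TABLE classes.

enum class ScanStatus { RowReady, EndOfTable };

// Everything the caller needs comes back in the return value.
struct ScanResult {
    ScanStatus status;
    int row;  // meaningful only when status == RowReady
};

class ToyScanner {
public:
    explicit ToyScanner(std::vector<int> rows) : rows_(std::move(rows)), pos_(0) {}

    // Returns the next row together with its validity; no side-channel member
    // has to be inspected by the caller afterwards.
    ScanResult next() {
        if (pos_ >= rows_.size()) return ScanResult{ScanStatus::EndOfTable, 0};
        return ScanResult{ScanStatus::RowReady, rows_[pos_++]};
    }

private:
    std::vector<int> rows_;
    std::size_t pos_;
};

// Caller side: the loop condition depends on the return value alone.
int sum_all(ToyScanner& scanner) {
    int total = 0;
    for (ScanResult r = scanner.next(); r.status == ScanStatus::RowReady;
         r = scanner.next())
        total += r.row;
    return total;
}
```
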
>>> Yes; the fact that the handler interface doesn't really honor the arguments
>>> been a major bummer for me several times. This is actually because some
>>> that we support internally ignore the argument and use the stored records
>>> record[0] and record[1] instead, which means that every engine (with the
>>> exception of a few) started doing that. So now you have to both pass the
>>> argument and the record to make sure that all engines work.
>>> Just getting clear semantics on how this part of the handler interface
>>> and add assertion to weed out the bad usages, would simplify the code
>>> significantly and improve the speed for all.
>>> However, do you know of any other interfaces that work this way? I am
>>> not aware of any other, but then I don't know every corner of the code like
>>> does. :)
>> "External" interfaces? See all the plugin "interfaces". There's no
>> enforcement of types really at all. Just passing void *'s around.
>> As for the internal interfaces, I would suggest cleaning up the class
>> interfaces of THD, JOIN, and other major classes to enforce public
>> accessors and getters, protecting private member variables behind a
>> clean API. This would, eventually, make some of these classes
>> semi-usable in public interfaces. Right now, the passing of the THD*
>> everywhere, and THD having basically a bunch of public member
>> variables, means that there is no enforcement of state changes through
>> an interface. This leads to serious problems where the "internal"
>> state of a THD is actually public and cannot be seen as reliable for
>> the lifetime of a session's requests.
>> Another thing to think about in your refactoring efforts is detaching
>> the THD from its current inheritance from Statement, Query_arena and
>> ilink. Without doing this, and using encapsulation so that a THD can
>> have multiple Statements, it will be very difficult to work on any
>> future parallelization efforts.
> MySQL Internals Mailing List
> For list archives: http://lists.mysql.com/internals