From: Jay Pipes Date: May 12 2009 4:39pm Subject: Re: MySQL Reengineering Project List-Archive: http://lists.mysql.com/internals/36642 Message-Id: <4A09A652.5010901@sun.com> MIME-Version: 1.0 Content-Type: text/plain; CHARSET=US-ASCII; format=flowed Content-Transfer-Encoding: 7BIT Jay Pipes wrote: > Mats Kindahl wrote: >> >> Alaric Snell-Pym wrote: >>> Excellent news! >>> >>> One word of warning, though: make sure it's a series of small steps. >>> It's far too easy, with this sort of thing, ending up going off on >>> huge yak-shaving tangents. By all means take lots of small steps >>> towards a lofty distant goal, but make sure each step is useful in its >>> own right (even if just by allowing other steps to happen), or you can >>> get lost on a branch that will never merge ;-) >> >> Yes, we don't want to do the work in macro steps (at least not at this >> point), >> we want to proceed carefully. >> >>> I see that a few macroscopic tasks have appeared on the Forge already, >>> but I'd like to add something I think could be changed for the better, >>> on a grass-roots level throughout the codebase: >>> >>> I see a lot of methods that are called with arguments and return a >>> value, but most of their input and output is actually through member >>> fields of the object - not that the method is operating on its object >>> per se, but that the caller actually puts things into the member >>> fields, then calls the method, then inspects the results in member >>> fields. For example, in the storage engine API, update_row is called >>> with a buffer in unireg format, which it almost universally ignores >>> and instead uses the array of Field objects set up in the handler >>> object by the caller. And we spent some time in debugging our storage >>> engine - it would return rows fine when you did selects on the table, >>> but when you did certain types of join, it would fail to return any >>> rows, despite our logging clearly showing we'd returned rows to MySQL >>> - because it seems that sometimes MySQL not only looks at the return >>> value of rnd_next, but also checks the 'status' member of the table >>> object to see if the current row of the table is valid or not. So our >>> rnd_next had to assign success/failure to table->status as well as >>> returning success, and then everything worked OK. Doh. >>> >>> Making the inputs and outputs of every method/function explicit, >>> rather than sneaking stuff in and out via members, will make the >>> calling interfaces between things a lot easier to read, which will >>> reduce the chances of developers working on a module introducing bugs! >>> Plus, it'll simplify the classes a lot, and make them easier to read, >>> as they will end up with only members that really relate to the actual >>> domain object - eg, the table or whatever - rather than members that >>> are part of the calling protocol of particular operations on the >>> class. Less short-term mutable state in classes means they can be >>> shared between threads in more and more contexts, too, as function >>> arguments and return values live only on the thread-local stacks! >> >> Yes; the fact that the handler interface doesn't really honor the >> arguments has >> been a major bummer for me several times. This is actually because >> some engines >> that we support internally ignore the argument and use the stored records >> record[0] and record[1] instead, which means that every engine (with the >> exception of a few) started doing that. So now you have to both pass the >> argument and the record to make sure that all engines work. >> >> Just getting clear semantics on how this part of the handler interface >> works, >> and add assertion to weed out the bad usages, would simplify the code >> significantly and improve the speed for all. > > ++ > >> However, do you know of any other interfaces that work this way? I am >> personally >> not aware of any other, but then I don't know every corner of the code >> like Serg >> does. :) > > "External" interfaces? See all the plugin "interfaces". There's no > enforcement of types really at all. Just passing void *'s around. > > As for the internal interfaces, I would suggest cleaning up the class > interfaces of THD, JOIN, and other major classes to enforce public > accessors and getters, protecting private member variables behind a > clean API. This would, eventually, make some of these classes > semi-usable in public interfaces. Right now, the passing of the THD* > everywhere, and THD having basically a bunch of public member variables, > means that there is no enforcement of state changes through an > interface. This leads to serious problems where the "internal" state of > a THD is actually public and cannot be seen as reliable for the lifetime > of a session's requests. > > Another thing to think about in your refactoring efforts is detaching > the THD from its current inheritance from Statement, Query_arena and > ilink. Without doing this, and using encapsulation so that a THD can s/encapsulation/composition/ > have multiple Statements, it will be very difficult to work on any > future parallelization efforts. > > Cheers, > > Jay >