Mats Kindahl wrote:
>
> Alaric Snell-Pym wrote:
>> Excellent news!
>>
>> One word of warning, though: make sure it's a series of small steps.
>> It's far too easy, with this sort of thing, ending up going off on
>> huge yak-shaving tangents. By all means take lots of small steps
>> towards a lofty distant goal, but make sure each step is useful in its
>> own right (even if just by allowing other steps to happen), or you can
>> get lost on a branch that will never merge ;-)
>
> Yes, we don't want to do the work in macro steps (at least not at this point),
> we want to proceed carefully.
>
>> I see that a few macroscopic tasks have appeared on the Forge already,
>> but I'd like to add something I think could be changed for the better,
>> on a grass-roots level throughout the codebase:
>>
>> I see a lot of methods that are called with arguments and return a
>> value, but most of their input and output is actually through member
>> fields of the object - not that the method is operating on its object
>> per se, but that the caller actually puts things into the member
>> fields, then calls the method, then inspects the results in member
>> fields. For example, in the storage engine API, update_row is called
>> with a buffer in unireg format, which it almost universally ignores
>> and instead uses the array of Field objects set up in the handler
>> object by the caller. And we spent some time in debugging our storage
>> engine - it would return rows fine when you did selects on the table,
>> but when you did certain types of join, it would fail to return any
>> rows, despite our logging clearly showing we'd returned rows to MySQL
>> - because it seems that sometimes MySQL not only looks at the return
>> value of rnd_next, but also checks the 'status' member of the table
>> object to see if the current row of the table is valid or not. So our
>> rnd_next had to assign success/failure to table->status as well as
>> returning success, and then everything worked OK. Doh.
>>
>> Making the inputs and outputs of every method/function explicit,
>> rather than sneaking stuff in and out via members, will make the
>> calling interfaces between things a lot easier to read, which will
>> reduce the chances of developers working on a module introducing bugs!
>> Plus, it'll simplify the classes a lot, and make them easier to read,
>> as they will end up with only members that really relate to the actual
>> domain object - eg, the table or whatever - rather than members that
>> are part of the calling protocol of particular operations on the
>> class. Less short-term mutable state in classes means they can be
>> shared between threads in more and more contexts, too, as function
>> arguments and return values live only on the thread-local stacks!
>
> Yes; the fact that the handler interface doesn't really honor the arguments has
> been a major bummer for me several times. This is actually because some engines
> that we support internally ignore the argument and use the stored records
> record[0] and record[1] instead, which means that every engine (with the
> exception of a few) started doing that. So now you have to both pass the
> argument and the record to make sure that all engines work.
>
> Just getting clear semantics on how this part of the handler interface works,
> and add assertion to weed out the bad usages, would simplify the code
> significantly and improve the speed for all.
++
> However, do you know of any other interfaces that work this way? I am personally
> not aware of any other, but then I don't know every corner of the code like Serg
> does. :)
"External" interfaces? See all the plugin "interfaces". There's no
enforcement of types really at all. Just passing void *'s around.
As for the internal interfaces, I would suggest cleaning up the class
interfaces of THD, JOIN, and other major classes to enforce public
accessors and getters, protecting private member variables behind a
clean API. This would, eventually, make some of these classes
semi-usable in public interfaces. Right now, the passing of the THD*
everywhere, and THD having basically a bunch of public member variables,
means that there is no enforcement of state changes through an
interface. This leads to serious problems where the "internal" state of
a THD is actually public and cannot be seen as reliable for the lifetime
of a session's requests.
Another thing to think about in your refactoring efforts is detaching
the THD from its current inheritance from Statement, Query_arena and
ilink. Without doing this, and using encapsulation so that a THD can
have multiple Statements, it will be very difficult to work on any
future parallelization efforts.
Cheers,
Jay