Øystein Grøvlen wrote:
> Jay,
>
> Thanks for taking time to share your ideas. I agree that we could gain
> a lot if we could pass value objects around instead of going through all
> the transformations you describe. However, this will be a pretty large
> task, but I think what I have done so far is a step in the right
> direction, and hopefully this work can evolve further.
Agreed :)
> Thanks,
>
> --
> Øystein
>
> Jay Pipes wrote:
>> Hi! I've looked through your code and have a few comments inline.
>> Please understand these are just suggestions; I'm actually very
>> supportive of the concept of Value objects in the runtime, and I'll
>> try to explain some alternate strategies to ponder...
>>
>> I will say ahead of time that I was not surprised to see this thread
>> immediately devolve into a discussion on the memory and performance
>> ramifications of the code change. It seems that the performance of
>> the bits most often takes precedence over everything else, especially
>> at the cost of maintainability and code readability. A Value Object
>> system is designed to increase code maintainability and ease of use.
>> It is *not* designed around performance. It may well be true that a
>> Value object system may add additional memcpy's. It may also be true
>> that a Value object system may reduce the *total* number of
>> conversions/copies that need to be done in the runtime.
>>
>> Hopefully the comments below will shed some light on this, but I will
>> state now that my fondness for the Value object system stems not from
>> a desire for increased performance, but from a need for greater system
>> maintainability.
>>
>> :)
>>
>> Øystein Grøvlen wrote:
>>> Hi,
>>>
>>> I am working on extracting much duplicated type conversion code from
>>> Item classes and Field classes into a Value class. (See
>>> http://forge.mysql.com/worklog/task.php?id=4760). Note that the
>>> architectural review for this worklog is still pending, but a
>>> preliminary patch for the Field classes part can be found at
>>> http://lists.mysql.com/commits/74564.
>>> Comments are very welcomed.
>>>
>>> I am currently looking at the interaction between Field and Protocol,
>>> and would appreciate some input.
>>>
>>> Each Field subclass has a send_binary() method (a bit misleading name
>>> since it depends on the protocol whether what it sends is binary or
>>> textual.) Currently, these methods calls a corresponding
>>> Protocol::store_xxx() method, and in my preliminary patch this looks
>>> something like this:
>>>
>>> bool Field_short::send_binary(Protocol *protocol)
>>> {
>>> return protocol->store_short(value().get_int());
>>> }
>>>
>>> I think it would be a good idea to add a Protocol::store(Value)
>>> method that will be called from Field::send_binary(Protocol*). Then
>>> all the overriding send_binary methods in the Field subclasses could
>>> be dropped.
>>> Protocol_text::store(Value) could just call Value::to_string() to get
>>> the textual representation of the Value. The question is what to do
>>> for Protocol_binary and other protocols.
>>
>> May I present an alternate strategy?
>>
>> Currently, you have one large mysql::Value class (kudos on using
>> namespacing BTW!) which "understands" how to convert its value into a
>> number of different native "types" (custom String class, integers,
>> my_decimal, etc... using the to_xxx() methods).
>>
>> Up until now, this is basically what the Item class and its derived
>> classes already do via the val_str(), val_int(), etc methods.
>>
>> So, you've actually just added an additional layer of abstraction on
>> top of the already complex Item tree. The benefit of a Value object
>> system has yet to be realized since you have, to now, only really
>> duplicated the current system of type conversion within the server.
>>
>> The true advantage of a Value object system is in two respects:
>>
>> * the immutability of Value objects, once constructed
>> * the ability of Value objects to transform themselves into another
>> Value object via construction
>>
>> By this last bullet point I mean something like the following:
>>
>> Number a= Number(20080911123059);
>> DateTime b= DateTime(a);
>>
>> This may look like a trivial piece of code...but in there is some
>> beauty which can remove or encapsulate a large chunk of type
>> conversion code in the server and allow code to be written in a more
>> natural (IMHO) manner.
>>
>> Assume that Number and DateTime are "Value objects". Perhaps they
>> inherit from a base mysql::Value class, perhaps they don't. By saying
>> they are "Value objects", I mean that:
>>
>> a) Number and DateTime objects, once constructed, are immutable
>> b) Number and DateTime can be constructed from instances of each other
>>
>> Ignore, for now, the implementation of the above Value objects Number
>> and DateTime (operator and constructor overloads). If we take the
>> above interface (of construction via another Value object instance),
>> then we can begin to refactor two distinct parts of the parser and
>> runtime which revolve around:
>>
>> * Construction of constant things
>> * Transformation and reduction of constant things into other constant
>> things
>>
>> An example of the former would be in the Item tree constructed via a
>> simple statement such as:
>>
>> CREATE TABLE t1 (
>> b DATETIME NOT NULL
>> );
>>
>> SELECT b FROM t1 WHERE b BETWEEN '20080911123059' AND '20090911123059';
>>
>> In the current server, the SELECT statement above is parsed into a
>> Select_lex structure which contains a series of Item, Table and Field
>> objects. The Item objects represent the constant strings
>> '20080911123059' and '20090911123059' as Item_string objects. The
>> Field object represents the "b" field as a Field_datetime.
>>
>> Because Field_datetime is evaluated by the MyISAM and HEAP storage
>> engines and runtime as 64-bit integers, each Item_string's val_int()
>> method is called to return a signed 64-bit integer number that is
>> passed to the runtime during its evaluation of the COND representing
>> the between condition on the "b" field. (Transformation #1)
>>
>> These 64-bit signed integer are then passed to the
>> Field_datetime::store() method in the runtime and the runtime asks the
>> storage engine to give it an appropriate key to use in reading the
>> data from the t1 table.
>>
>> Before it can pass the data to the storage engine, however,
>> Field_datetime::store() must first verify that the passed integer is
>> indeed a valid datetime. So, it "parses" the signed 64-bit integer
>> into a timestamp-like number (see sql/time.cc:number_to_datetime())
>> represented by the MYSQL_TIME struct. This struct contains temporal
>> information like year, month, day, etc. (Transformation #2)
>>
>> Field_datetime::store() then places the pieces of this MYSQL_TIME
>> struct into a series of raw uchar* bytes, pointed to by the
>> Field::ptr member. (Transformation #3)
>>
>> The storage engine can then use these raw uchar* bytes to find and
>> retrieve the records needed by the call from the runtime to the record
>> pointer needed during send of the record back to the client over the
>> Protocol.
>>
>> However, the Protocol itself will see (in Item::send()) that the
>> returned Field data type is of type MYSQL_TYPE_DATETIME, and will
>> construct a MYSQL_TIME instance by calling
>> Field_datetime::get_date(*MYSQL_TIME).
>>
>> The get_date() method then takes the value of the retrieved uchar*
>> records, converts the uchar* into a signed 64-bit integer (via macros
>> in sql/korr.h) (Transformation #4). These 64-bit integers are then
>> passed to the number_to_datetime() method to construct the MYSQL_TIME
>> structs. (Transformation #5) and these are passed back to the
>> Item::send() method.
>>
>> Item::send() then calls Protocol::store(*MYSQL_TIME), passing in the
>> MYSQL_TIME structs. This method then transforms each MYSQL_TIME
>> struct into a series of char* via the datetime_to_str() methods, which
>> calls sprintf() to change the data into a textual format of
>> "YYYY-MM-DD HH:MM:SS" (Transformation #6) which is then sent along the
>> wire in text format...
>>
>> As you can see, there are quite a few transformations which occur for
>> both the constant strings as well as the datetime data sent/retrieved
>> from the storage engine.
>>
>> Once functions such as DATE_FORMAT(), CAST(), STR_TO_DATE() and others
>> get involved, the whole process can be downright overwhelming! :)
>>
>> Currently, when you follow the code in the server, you weave in and
>> out of the various val_xxx(), store(xxx), get_xxx() methods, and in
>> the case of decimal and datetime conversions, all their associated
>> routines. It can be very difficult to follow at times.
>>
>> A Value object implementation can simplify much of this code spaghetti
>> and even get rid of many of the Item classes entirely, namely all of
>> the Item_xxx_typecast classes and the Cached_item classes.
>>
>> So, how to go about implementing a Value object system in the runtime?
>>
>> 1) Create a hierarchy of Value classes subclassing from a Value class.
>> You'll need classes for temporal objects like Date and Timestamp as
>> well as classes for a Number (don't have to break it into Decimal and
>> Natural, I'd encapsulate all that in one class) and a String class of
>> course...but not like the current String class; you'll want an
>> immutable one.
>>
>> 2) Instead of the parser creating Item_string or Item_num objects, it
>> would create immutable Value objects of type String and Number.
>>
>> 3) For each column in a SELECT's result and each WHERE condition you
>> might create a vector<> of Value object pointers. Instead of calling
>> val_int(), val_str() and likewise, simply use the Session's vector<>
>> of Value objects as needed. Push and pop objects off the vector as
>> needed, and construct new Value objects from other ones.
>>
>> For example, Let's take the example above. The parser would construct
>> a new String value object from the string "20080911123059".
>>
>> After parsing, some analysis of the parsed nodes is done, including a
>> name resolution step. During this step, the "b" column would be
>> determined to be of type DATETIME. A DateTime value object for each
>> side of the condition would then be pushed onto the Session's object
>> vector<> like so:
>>
>> String *left_side= <call to get the left String value of the condition>
>> session.values.push_back((Value *) new DateTime(left_side));
>>
>> Within the optimizer, instead of using Field_datetime::store() to both
>> validate and transform the datetime-string data, the optimizer would
>> simply access the condition's constants like so:
>>
>> DateTime *left_side= (DateTime *) session.values[0];
>> DateTime *right_side= (DateTime *) session.values[1];
>>
>> If temporal functions were used in a statement (say, DATE_FORMAT()),
>> it could work with DateTimes as DateTimes, with calendrical
>> calculations native to a DateTime object, instead of constantly having
>> to convert args[0] to a MYSQL_TIME or an integer or a string (just see
>> the amount of code in the temporal built-in functions for doing
>> conversion...)
>>
>> For implementing DATE_ADD(), for instance, there's a ton of code which
>> could be encapsulated in a Value object system that understands how to
>> add and subtract dates and times properly...
>>
>> Anyway, these are all just thoughts to get the ideas flowing, and
>> nothing more. In Drizzle-land, we're constantly thinking about this
>> very problem and how to tackle it. It's not an easy one to address
>> with the current architecture, but hopefully we can share our
>> successes and failures and collaborate together on this :)
>>
>> Cheers!
>>
>> Jay
>>