Hi! I've looked through your code and have a few comments inline.
Please understand these are just suggestions; I'm actually very
supportive of the concept of Value objects in the runtime, and I'll try
to explain some alternate strategies to ponder...
I will say ahead of time that I was not surprised to see this thread
immediately devolve into a discussion on the memory and performance
ramifications of the code change. It seems that the performance of the
bits most often takes precedence over everything else, especially at the
cost of maintainability and code readability. A Value Object system is
designed to increase code maintainability and ease of use. It is *not*
designed around performance. It may well be true that a Value object
system may add additional memcpy's. It may also be true that a Value
object system may reduce the *total* number of conversions/copies that
need to be done in the runtime.
Hopefully the comments below will shed some light on this, but I will
state now that my fondness for the Value object system stems not from a
desire for increased performance, but from a need for greater system
maintainability.
:)
Øystein Grøvlen wrote:
> Hi,
>
> I am working on extracting much duplicated type conversion code from
> Item classes and Field classes into a Value class. (See
> http://forge.mysql.com/worklog/task.php?id=4760). Note that the
> architectural review for this worklog is still pending, but a
> preliminary patch for the Field classes part can be found at
> http://lists.mysql.com/commits/74564.
> Comments are very welcomed.
>
> I am currently looking at the interaction between Field and Protocol,
> and would appreciate some input.
>
> Each Field subclass has a send_binary() method (a bit misleading name
> since it depends on the protocol whether what it sends is binary or
> textual.) Currently, these methods calls a corresponding
> Protocol::store_xxx() method, and in my preliminary patch this looks
> something like this:
>
> bool Field_short::send_binary(Protocol *protocol)
> {
> return protocol->store_short(value().get_int());
> }
>
> I think it would be a good idea to add a Protocol::store(Value) method
> that will be called from Field::send_binary(Protocol*). Then all the
> overriding send_binary methods in the Field subclasses could be dropped.
> Protocol_text::store(Value) could just call Value::to_string() to get
> the textual representation of the Value. The question is what to do for
> Protocol_binary and other protocols.
May I present an alternate strategy?
Currently, you have one large mysql::Value class (kudos on using
namespacing BTW!) which "understands" how to convert its value into a
number of different native "types" (custom String class, integers,
my_decimal, etc... using the to_xxx() methods).
Up until now, this is basically what the Item class and its derived
classes already do via the val_str(), val_int(), etc methods.
So, you've actually just added an additional layer of abstraction on top
of the already complex Item tree. The benefit of a Value object system
has yet to be realized since you have, to now, only really duplicated
the current system of type conversion within the server.
The true advantage of a Value object system is in two respects:
* the immutability of Value objects, once constructed
* the ability of Value objects to transform themselves into another
Value object via construction
By this last bullet point I mean something like the following:
Number a= Number(20080911123059);
DateTime b= DateTime(a);
This may look like a trivial piece of code...but in there is some beauty
which can remove or encapsulate a large chunk of type conversion code in
the server and allow code to be written in a more natural (IMHO) manner.
Assume that Number and DateTime are "Value objects". Perhaps they
inherit from a base mysql::Value class, perhaps they don't. By saying
they are "Value objects", I mean that:
a) Number and DateTime objects, once constructed, are immutable
b) Number and DateTime can be constructed from instances of each other
Ignore, for now, the implementation of the above Value objects Number
and DateTime (operator and constructor overloads). If we take the above
interface (of construction via another Value object instance), then we
can begin to refactor two distinct parts of the parser and runtime which
revolve around:
* Construction of constant things
* Transformation and reduction of constant things into other constant things
An example of the former would be in the Item tree constructed via a
simple statement such as:
CREATE TABLE t1 (
b DATETIME NOT NULL
);
SELECT b FROM t1 WHERE b BETWEEN '20080911123059' AND '20090911123059';
In the current server, the SELECT statement above is parsed into a
Select_lex structure which contains a series of Item, Table and Field
objects. The Item objects represent the constant strings
'20080911123059' and '20090911123059' as Item_string objects. The Field
object represents the "b" field as a Field_datetime.
Because Field_datetime is evaluated by the MyISAM and HEAP storage
engines and runtime as 64-bit integers, each Item_string's val_int()
method is called to return a signed 64-bit integer number that is passed
to the runtime during its evaluation of the COND representing the
between condition on the "b" field. (Transformation #1)
These 64-bit signed integer are then passed to the
Field_datetime::store() method in the runtime and the runtime asks the
storage engine to give it an appropriate key to use in reading the data
from the t1 table.
Before it can pass the data to the storage engine, however,
Field_datetime::store() must first verify that the passed integer is
indeed a valid datetime. So, it "parses" the signed 64-bit integer into
a timestamp-like number (see sql/time.cc:number_to_datetime())
represented by the MYSQL_TIME struct. This struct contains temporal
information like year, month, day, etc. (Transformation #2)
Field_datetime::store() then places the pieces of this MYSQL_TIME struct
into a series of raw uchar* bytes, pointed to by the Field::ptr
member. (Transformation #3)
The storage engine can then use these raw uchar* bytes to find and
retrieve the records needed by the call from the runtime to the record
pointer needed during send of the record back to the client over the
Protocol.
However, the Protocol itself will see (in Item::send()) that the
returned Field data type is of type MYSQL_TYPE_DATETIME, and will
construct a MYSQL_TIME instance by calling
Field_datetime::get_date(*MYSQL_TIME).
The get_date() method then takes the value of the retrieved uchar*
records, converts the uchar* into a signed 64-bit integer (via macros in
sql/korr.h) (Transformation #4). These 64-bit integers are then passed
to the number_to_datetime() method to construct the MYSQL_TIME structs.
(Transformation #5) and these are passed back to the Item::send() method.
Item::send() then calls Protocol::store(*MYSQL_TIME), passing in the
MYSQL_TIME structs. This method then transforms each MYSQL_TIME struct
into a series of char* via the datetime_to_str() methods, which calls
sprintf() to change the data into a textual format of "YYYY-MM-DD
HH:MM:SS" (Transformation #6) which is then sent along the wire in text
format...
As you can see, there are quite a few transformations which occur for
both the constant strings as well as the datetime data sent/retrieved
from the storage engine.
Once functions such as DATE_FORMAT(), CAST(), STR_TO_DATE() and others
get involved, the whole process can be downright overwhelming! :)
Currently, when you follow the code in the server, you weave in and out
of the various val_xxx(), store(xxx), get_xxx() methods, and in the case
of decimal and datetime conversions, all their associated routines. It
can be very difficult to follow at times.
A Value object implementation can simplify much of this code spaghetti
and even get rid of many of the Item classes entirely, namely all of the
Item_xxx_typecast classes and the Cached_item classes.
So, how to go about implementing a Value object system in the runtime?
1) Create a hierarchy of Value classes subclassing from a Value class.
You'll need classes for temporal objects like Date and Timestamp as well
as classes for a Number (don't have to break it into Decimal and
Natural, I'd encapsulate all that in one class) and a String class of
course...but not like the current String class; you'll want an immutable
one.
2) Instead of the parser creating Item_string or Item_num objects, it
would create immutable Value objects of type String and Number.
3) For each column in a SELECT's result and each WHERE condition you
might create a vector<> of Value object pointers. Instead of calling
val_int(), val_str() and likewise, simply use the Session's vector<> of
Value objects as needed. Push and pop objects off the vector as needed,
and construct new Value objects from other ones.
For example, Let's take the example above. The parser would construct a
new String value object from the string "20080911123059".
After parsing, some analysis of the parsed nodes is done, including a
name resolution step. During this step, the "b" column would be
determined to be of type DATETIME. A DateTime value object for each
side of the condition would then be pushed onto the Session's object
vector<> like so:
String *left_side= <call to get the left String value of the condition>
session.values.push_back((Value *) new DateTime(left_side));
Within the optimizer, instead of using Field_datetime::store() to both
validate and transform the datetime-string data, the optimizer would
simply access the condition's constants like so:
DateTime *left_side= (DateTime *) session.values[0];
DateTime *right_side= (DateTime *) session.values[1];
If temporal functions were used in a statement (say, DATE_FORMAT()), it
could work with DateTimes as DateTimes, with calendrical calculations
native to a DateTime object, instead of constantly having to convert
args[0] to a MYSQL_TIME or an integer or a string (just see the amount
of code in the temporal built-in functions for doing conversion...)
For implementing DATE_ADD(), for instance, there's a ton of code which
could be encapsulated in a Value object system that understands how to
add and subtract dates and times properly...
Anyway, these are all just thoughts to get the ideas flowing, and
nothing more. In Drizzle-land, we're constantly thinking about this
very problem and how to tackle it. It's not an easy one to address with
the current architecture, but hopefully we can share our successes and
failures and collaborate together on this :)
Cheers!
Jay
> One option for Protocol_binary would be to introduce a
> Value::to_binary() method. However, I feel that the Value object may
> take on too much responsibility if it is to know about value
> representation in different protocols and have a to_xxx() method for each.
>
> An alternative is to add a Value::type() method and let each protocol do
> its own conversion based on the type. This creates another dependency
> since the type representation used by Value will have to be exposed.
>
> Both alternatives means that Value needs to know the exact SQL type of a
> Value. For the current patch, it is sufficient for Value to know it is
> an integer, whether it is a 1-byte integer or a 4-byte is not relevant
> for conversion to real, string, or decimal.
>
> Opinions?
>
> --
> Øystein Grøvlen, Senior Staff Engineer
> Sun Microsystems, Database Group
> Trondheim, Norway
>
>
>
>
>
>