List:Internals« Previous MessageNext Message »
From:Jay Pipes Date:May 24 2009 3:28pm
Subject:Re: Value objects and Protocol (WL#4760)
View as plain text  
Hi!  I've looked through your code and have a few comments inline. 
Please understand these are just suggestions; I'm actually very 
supportive of the concept of Value objects in the runtime, and I'll try 
to explain some alternate strategies to ponder...

I will say ahead of time that I was not surprised to see this thread 
immediately devolve into a discussion on the memory and performance 
ramifications of the code change.  It seems that the performance of the 
bits most often takes precedence over everything else, especially at the 
cost of maintainability and code readability. A Value Object system is 
designed to increase code maintainability and ease of use.  It is *not* 
designed around performance.  It may well be true that a Value object 
system may add additional memcpy's.  It may also be true that a Value 
object system may reduce the *total* number of conversions/copies that 
need to be done in the runtime.

Hopefully the comments below will shed some light on this, but I will 
state now that my fondness for the Value object system stems not from a 
desire for increased performance, but from a need for greater system 
maintainability.

:)

Øystein Grøvlen wrote:
> Hi,
> 
> I am working on extracting much duplicated type conversion code from 
> Item classes and Field classes into a Value class. (See 
> http://forge.mysql.com/worklog/task.php?id=4760).  Note that the 
> architectural review for this worklog is still pending, but a 
> preliminary patch for the Field classes part can be found at 
> http://lists.mysql.com/commits/74564.
> Comments are very welcomed.
> 
> I am currently looking at the interaction between Field and Protocol, 
> and would appreciate some input.
> 
> Each Field subclass has a send_binary() method (a bit misleading name 
> since it depends on the protocol whether what it sends is binary or 
> textual.)  Currently, these methods calls a corresponding 
> Protocol::store_xxx() method, and in my preliminary patch this looks 
> something like this:
> 
> bool Field_short::send_binary(Protocol *protocol)
> {
>   return protocol->store_short(value().get_int());
> }
> 
> I think it would be a good idea to add a Protocol::store(Value) method 
> that will be called from Field::send_binary(Protocol*).  Then all the 
> overriding send_binary methods in the Field subclasses could be dropped.
> Protocol_text::store(Value) could just call Value::to_string() to get 
> the textual representation of the Value.  The question is what to do for 
> Protocol_binary and other protocols.

May I present an alternate strategy?

Currently, you have one large mysql::Value class (kudos on using 
namespacing BTW!) which "understands" how to convert its value into a 
number of different native "types" (custom String class, integers, 
my_decimal, etc... using the to_xxx() methods).

Up until now, this is basically what the Item class and its derived 
classes already do via the val_str(), val_int(), etc methods.

So, you've actually just added an additional layer of abstraction on top 
of the already complex Item tree.  The benefit of a Value object system 
has yet to be realized since you have, to now, only really duplicated 
the current system of type conversion within the server.

The true advantage of a Value object system is in two respects:

* the immutability of Value objects, once constructed
* the ability of Value objects to transform themselves into another 
Value object via construction

By this last bullet point I mean something like the following:

Number a= Number(20080911123059);
DateTime b= DateTime(a);

This may look like a trivial piece of code...but in there is some beauty 
which can remove or encapsulate a large chunk of type conversion code in 
the server and allow code to be written in a more natural (IMHO) manner.

Assume that Number and DateTime are "Value objects".  Perhaps they 
inherit from a base mysql::Value class, perhaps they don't.  By saying 
they are "Value objects", I mean that:

a) Number and DateTime objects, once constructed, are immutable
b) Number and DateTime can be constructed from instances of each other

Ignore, for now, the implementation of the above Value objects Number 
and DateTime (operator and constructor overloads).  If we take the above 
interface (of construction via another Value object instance), then we 
can begin to refactor two distinct parts of the parser and runtime which 
revolve around:

* Construction of constant things
* Transformation and reduction of constant things into other constant things

An example of the former would be in the Item tree constructed via a 
simple statement such as:

CREATE TABLE t1 (
   b DATETIME NOT NULL
);

SELECT b FROM t1 WHERE b BETWEEN '20080911123059' AND '20090911123059';

In the current server, the SELECT statement above is parsed into a 
Select_lex structure which contains a series of Item, Table and Field 
objects.  The Item objects represent the constant strings 
'20080911123059' and '20090911123059' as Item_string objects.  The Field 
object represents the "b" field as a Field_datetime.

Because Field_datetime is evaluated by the MyISAM and HEAP storage 
engines and runtime as 64-bit integers, each Item_string's val_int() 
method is called to return a signed 64-bit integer number that is passed 
to the runtime during its evaluation of the COND representing the 
between condition on the "b" field. (Transformation #1)

These 64-bit signed integer are then passed to the 
Field_datetime::store() method in the runtime and the runtime asks the 
storage engine to give it an appropriate key to use in reading the data 
from the t1 table.

Before it can pass the data to the storage engine, however, 
Field_datetime::store() must first verify that the passed integer is 
indeed a valid datetime.  So, it "parses" the signed 64-bit integer into 
a timestamp-like number (see sql/time.cc:number_to_datetime()) 
represented by the MYSQL_TIME struct. This struct contains temporal 
information like year, month, day, etc. (Transformation #2)

Field_datetime::store() then places the pieces of this MYSQL_TIME struct 
  into a series of raw uchar* bytes, pointed to by the Field::ptr 
member. (Transformation #3)

The storage engine can then use these raw uchar* bytes to find and 
retrieve the records needed by the call from the runtime to the record 
pointer needed during send of the record back to the client over the 
Protocol.

However, the Protocol itself will see (in Item::send()) that the 
returned Field data type is of type MYSQL_TYPE_DATETIME, and will 
construct a MYSQL_TIME instance by calling 
Field_datetime::get_date(*MYSQL_TIME).

The get_date() method then takes the value of the retrieved uchar* 
records, converts the uchar* into a signed 64-bit integer (via macros in 
sql/korr.h) (Transformation #4).  These 64-bit integers are then passed 
to the number_to_datetime() method to construct the MYSQL_TIME structs. 
(Transformation #5) and these are passed back to the Item::send() method.

Item::send() then calls Protocol::store(*MYSQL_TIME), passing in the 
MYSQL_TIME structs.  This method then transforms each MYSQL_TIME struct 
into a series of char* via the datetime_to_str() methods, which calls 
sprintf() to change the data into a textual format of "YYYY-MM-DD 
HH:MM:SS" (Transformation #6) which is then sent along the wire in text 
format...

As you can see, there are quite a few transformations which occur for 
both the constant strings as well as the datetime data sent/retrieved 
from the storage engine.

Once functions such as DATE_FORMAT(), CAST(), STR_TO_DATE() and others 
get involved, the whole process can be downright overwhelming! :)

Currently, when you follow the code in the server, you weave in and out 
of the various val_xxx(), store(xxx), get_xxx() methods, and in the case 
of decimal and datetime conversions, all their associated routines.  It 
can be very difficult to follow at times.

A Value object implementation can simplify much of this code spaghetti 
and even get rid of many of the Item classes entirely, namely all of the 
Item_xxx_typecast classes and the Cached_item classes.

So, how to go about implementing a Value object system in the runtime?

1) Create a hierarchy of Value classes subclassing from a Value class. 
You'll need classes for temporal objects like Date and Timestamp as well 
as classes for a Number (don't have to break it into Decimal and 
Natural, I'd encapsulate all that in one class) and a String class of 
course...but not like the current String class; you'll want an immutable 
one.

2) Instead of the parser creating Item_string or Item_num objects, it 
would create immutable Value objects of type String and Number.

3) For each column in a SELECT's result and each WHERE condition you 
might create a vector<> of Value object pointers.  Instead of calling 
val_int(), val_str() and likewise, simply use the Session's vector<> of 
Value objects as needed. Push and pop objects off the vector as needed, 
and construct new Value objects from other ones.

For example, Let's take the example above.  The parser would construct a 
new String value object from the string "20080911123059".

After parsing, some analysis of the parsed nodes is done, including a 
name resolution step.  During this step, the "b" column would be 
determined to be of type DATETIME.  A DateTime value object for each 
side of the condition would then be pushed onto the Session's object 
vector<> like so:

String *left_side= <call to get the left String value of the condition>
session.values.push_back((Value *) new DateTime(left_side));

Within the optimizer, instead of using Field_datetime::store() to both 
validate and transform the datetime-string data, the optimizer would 
simply access the condition's constants like so:

DateTime *left_side= (DateTime *) session.values[0];
DateTime *right_side= (DateTime *) session.values[1];

If temporal functions were used in a statement (say, DATE_FORMAT()), it 
could work with DateTimes as DateTimes, with calendrical calculations 
native to a DateTime object, instead of constantly having to convert 
args[0] to a MYSQL_TIME or an integer or a string (just see the amount 
of code in the temporal built-in functions for doing conversion...)

For implementing DATE_ADD(), for instance, there's a ton of code which 
could be encapsulated in a Value object system that understands how to 
add and subtract dates and times properly...

Anyway, these are all just thoughts to get the ideas flowing, and 
nothing more.  In Drizzle-land, we're constantly thinking about this 
very problem and how to tackle it.  It's not an easy one to address with 
the current architecture, but hopefully we can share our successes and 
failures and collaborate together on this :)

Cheers!

Jay

> One option for Protocol_binary would be to introduce a 
> Value::to_binary() method. However, I feel that the Value object may 
> take on too much responsibility if it is to know about value 
> representation in different protocols and have a to_xxx() method for each.
> 
> An alternative is to add a Value::type() method and let each protocol do 
> its own conversion based on the type.  This creates another dependency 
> since the type representation used by Value will have to be exposed.
> 
> Both alternatives means that Value needs to know the exact SQL type of a 
> Value.  For the current patch, it is sufficient for Value to know it is 
> an integer, whether it is a 1-byte integer or a 4-byte is not relevant 
> for conversion to real, string, or decimal.
> 
> Opinions?
> 
> -- 
> Øystein Grøvlen, Senior Staff Engineer
> Sun Microsystems, Database Group
> Trondheim, Norway
> 
> 
> 
> 
> 
> 

Thread
Value objects and Protocol (WL#4760)Øystein Grøvlen22 May
  • Re: Value objects and Protocol (WL#4760)Konstantin Osipov22 May
    • Re: Value objects and Protocol (WL#4760)Alex Esterkin22 May
      • Re: Value objects and Protocol (WL#4760)Brian Aker22 May
        • Re: Value objects and Protocol (WL#4760)Alex Esterkin22 May
          • Re: Value objects and Protocol (WL#4760)Brian Aker22 May
          • Re: Value objects and Protocol (WL#4760)Sergei Golubchik23 May
        • Re: Value objects and Protocol (WL#4760)Konstantin Osipov23 May
          • Re: Value objects and Protocol (WL#4760)Brian Aker23 May
    • Re: Value objects and Protocol (WL#4760)Øystein Grøvlen22 May
      • Re: Value objects and Protocol (WL#4760)Konstantin Osipov22 May
        • Re: Value objects and Protocol (WL#4760)Alex Esterkin22 May
          • Re: Value objects and Protocol (WL#4760)Øystein Grøvlen25 May
        • Re: Value objects and Protocol (WL#4760)Øystein Grøvlen24 May
          • Re: Value objects and Protocol (WL#4760)Konstantin Osipov24 May
      • Re: Value objects and Protocol (WL#4760)Konstantin Osipov23 May
        • Re: Value objects and Protocol (WL#4760)Øystein Grøvlen25 May
          • Re: Value objects and Protocol (WL#4760)Michael Widenius6 Jun
            • Re: Value objects and Protocol (WL#4760)Øystein Grøvlen8 Jun
            • Re: Value objects and Protocol (WL#4760)Jay Pipes8 Jun
              • Re: Value objects and Protocol (WL#4760)Jay Pipes8 Jun
  • Re: Value objects and Protocol (WL#4760)Jay Pipes24 May
    • Re: Value objects and Protocol (WL#4760)Øystein Grøvlen25 May
      • Re: Value objects and Protocol (WL#4760)Jay Pipes26 May
    • Re: [Drizzle-discuss] Value objects and Protocol (WL#4760)Jim Starkey28 May
Re: Value objects and Protocol (WL#4760)Brian Aker22 May