List:Internals« Previous MessageNext Message »
From:Øystein Grøvlen Date:May 25 2009 12:08pm
Subject:Re: Value objects and Protocol (WL#4760)
View as plain text  
Jay,

Thanks for taking time to share your ideas.  I agree that we could gain 
a lot if we could pass value objects around instead of going through all 
the transformations you describe.  However, this will be a pretty large 
task, but I think what I have done so far is a step in the right 
direction, and hopefully this work can evolve further.

Thanks,

--
Øystein

Jay Pipes wrote:
> Hi!  I've looked through your code and have a few comments inline. 
> Please understand these are just suggestions; I'm actually very 
> supportive of the concept of Value objects in the runtime, and I'll try 
> to explain some alternate strategies to ponder...
> 
> I will say ahead of time that I was not surprised to see this thread 
> immediately devolve into a discussion on the memory and performance 
> ramifications of the code change.  It seems that the performance of the 
> bits most often takes precedence over everything else, especially at the 
> cost of maintainability and code readability. A Value Object system is 
> designed to increase code maintainability and ease of use.  It is *not* 
> designed around performance.  It may well be true that a Value object 
> system may add additional memcpy's.  It may also be true that a Value 
> object system may reduce the *total* number of conversions/copies that 
> need to be done in the runtime.
> 
> Hopefully the comments below will shed some light on this, but I will 
> state now that my fondness for the Value object system stems not from a 
> desire for increased performance, but from a need for greater system 
> maintainability.
> 
> :)
> 
> Øystein Grøvlen wrote:
>> Hi,
>>
>> I am working on extracting much duplicated type conversion code from 
>> Item classes and Field classes into a Value class. (See 
>> http://forge.mysql.com/worklog/task.php?id=4760).  Note that the 
>> architectural review for this worklog is still pending, but a 
>> preliminary patch for the Field classes part can be found at 
>> http://lists.mysql.com/commits/74564.
>> Comments are very welcomed.
>>
>> I am currently looking at the interaction between Field and Protocol, 
>> and would appreciate some input.
>>
>> Each Field subclass has a send_binary() method (a bit misleading name 
>> since it depends on the protocol whether what it sends is binary or 
>> textual.)  Currently, these methods calls a corresponding 
>> Protocol::store_xxx() method, and in my preliminary patch this looks 
>> something like this:
>>
>> bool Field_short::send_binary(Protocol *protocol)
>> {
>>   return protocol->store_short(value().get_int());
>> }
>>
>> I think it would be a good idea to add a Protocol::store(Value) method 
>> that will be called from Field::send_binary(Protocol*).  Then all the 
>> overriding send_binary methods in the Field subclasses could be dropped.
>> Protocol_text::store(Value) could just call Value::to_string() to get 
>> the textual representation of the Value.  The question is what to do 
>> for Protocol_binary and other protocols.
> 
> May I present an alternate strategy?
> 
> Currently, you have one large mysql::Value class (kudos on using 
> namespacing BTW!) which "understands" how to convert its value into a 
> number of different native "types" (custom String class, integers, 
> my_decimal, etc... using the to_xxx() methods).
> 
> Up until now, this is basically what the Item class and its derived 
> classes already do via the val_str(), val_int(), etc methods.
> 
> So, you've actually just added an additional layer of abstraction on top 
> of the already complex Item tree.  The benefit of a Value object system 
> has yet to be realized since you have, to now, only really duplicated 
> the current system of type conversion within the server.
> 
> The true advantage of a Value object system is in two respects:
> 
> * the immutability of Value objects, once constructed
> * the ability of Value objects to transform themselves into another 
> Value object via construction
> 
> By this last bullet point I mean something like the following:
> 
> Number a= Number(20080911123059);
> DateTime b= DateTime(a);
> 
> This may look like a trivial piece of code...but in there is some beauty 
> which can remove or encapsulate a large chunk of type conversion code in 
> the server and allow code to be written in a more natural (IMHO) manner.
> 
> Assume that Number and DateTime are "Value objects".  Perhaps they 
> inherit from a base mysql::Value class, perhaps they don't.  By saying 
> they are "Value objects", I mean that:
> 
> a) Number and DateTime objects, once constructed, are immutable
> b) Number and DateTime can be constructed from instances of each other
> 
> Ignore, for now, the implementation of the above Value objects Number 
> and DateTime (operator and constructor overloads).  If we take the above 
> interface (of construction via another Value object instance), then we 
> can begin to refactor two distinct parts of the parser and runtime which 
> revolve around:
> 
> * Construction of constant things
> * Transformation and reduction of constant things into other constant 
> things
> 
> An example of the former would be in the Item tree constructed via a 
> simple statement such as:
> 
> CREATE TABLE t1 (
>   b DATETIME NOT NULL
> );
> 
> SELECT b FROM t1 WHERE b BETWEEN '20080911123059' AND '20090911123059';
> 
> In the current server, the SELECT statement above is parsed into a 
> Select_lex structure which contains a series of Item, Table and Field 
> objects.  The Item objects represent the constant strings 
> '20080911123059' and '20090911123059' as Item_string objects.  The Field 
> object represents the "b" field as a Field_datetime.
> 
> Because Field_datetime is evaluated by the MyISAM and HEAP storage 
> engines and runtime as 64-bit integers, each Item_string's val_int() 
> method is called to return a signed 64-bit integer number that is passed 
> to the runtime during its evaluation of the COND representing the 
> between condition on the "b" field. (Transformation #1)
> 
> These 64-bit signed integer are then passed to the 
> Field_datetime::store() method in the runtime and the runtime asks the 
> storage engine to give it an appropriate key to use in reading the data 
> from the t1 table.
> 
> Before it can pass the data to the storage engine, however, 
> Field_datetime::store() must first verify that the passed integer is 
> indeed a valid datetime.  So, it "parses" the signed 64-bit integer into 
> a timestamp-like number (see sql/time.cc:number_to_datetime()) 
> represented by the MYSQL_TIME struct. This struct contains temporal 
> information like year, month, day, etc. (Transformation #2)
> 
> Field_datetime::store() then places the pieces of this MYSQL_TIME struct 
>  into a series of raw uchar* bytes, pointed to by the Field::ptr member. 
> (Transformation #3)
> 
> The storage engine can then use these raw uchar* bytes to find and 
> retrieve the records needed by the call from the runtime to the record 
> pointer needed during send of the record back to the client over the 
> Protocol.
> 
> However, the Protocol itself will see (in Item::send()) that the 
> returned Field data type is of type MYSQL_TYPE_DATETIME, and will 
> construct a MYSQL_TIME instance by calling 
> Field_datetime::get_date(*MYSQL_TIME).
> 
> The get_date() method then takes the value of the retrieved uchar* 
> records, converts the uchar* into a signed 64-bit integer (via macros in 
> sql/korr.h) (Transformation #4).  These 64-bit integers are then passed 
> to the number_to_datetime() method to construct the MYSQL_TIME structs. 
> (Transformation #5) and these are passed back to the Item::send() method.
> 
> Item::send() then calls Protocol::store(*MYSQL_TIME), passing in the 
> MYSQL_TIME structs.  This method then transforms each MYSQL_TIME struct 
> into a series of char* via the datetime_to_str() methods, which calls 
> sprintf() to change the data into a textual format of "YYYY-MM-DD 
> HH:MM:SS" (Transformation #6) which is then sent along the wire in text 
> format...
> 
> As you can see, there are quite a few transformations which occur for 
> both the constant strings as well as the datetime data sent/retrieved 
> from the storage engine.
> 
> Once functions such as DATE_FORMAT(), CAST(), STR_TO_DATE() and others 
> get involved, the whole process can be downright overwhelming! :)
> 
> Currently, when you follow the code in the server, you weave in and out 
> of the various val_xxx(), store(xxx), get_xxx() methods, and in the case 
> of decimal and datetime conversions, all their associated routines.  It 
> can be very difficult to follow at times.
> 
> A Value object implementation can simplify much of this code spaghetti 
> and even get rid of many of the Item classes entirely, namely all of the 
> Item_xxx_typecast classes and the Cached_item classes.
> 
> So, how to go about implementing a Value object system in the runtime?
> 
> 1) Create a hierarchy of Value classes subclassing from a Value class. 
> You'll need classes for temporal objects like Date and Timestamp as well 
> as classes for a Number (don't have to break it into Decimal and 
> Natural, I'd encapsulate all that in one class) and a String class of 
> course...but not like the current String class; you'll want an immutable 
> one.
> 
> 2) Instead of the parser creating Item_string or Item_num objects, it 
> would create immutable Value objects of type String and Number.
> 
> 3) For each column in a SELECT's result and each WHERE condition you 
> might create a vector<> of Value object pointers.  Instead of calling 
> val_int(), val_str() and likewise, simply use the Session's vector<> of 
> Value objects as needed. Push and pop objects off the vector as needed, 
> and construct new Value objects from other ones.
> 
> For example, Let's take the example above.  The parser would construct a 
> new String value object from the string "20080911123059".
> 
> After parsing, some analysis of the parsed nodes is done, including a 
> name resolution step.  During this step, the "b" column would be 
> determined to be of type DATETIME.  A DateTime value object for each 
> side of the condition would then be pushed onto the Session's object 
> vector<> like so:
> 
> String *left_side= <call to get the left String value of the condition>
> session.values.push_back((Value *) new DateTime(left_side));
> 
> Within the optimizer, instead of using Field_datetime::store() to both 
> validate and transform the datetime-string data, the optimizer would 
> simply access the condition's constants like so:
> 
> DateTime *left_side= (DateTime *) session.values[0];
> DateTime *right_side= (DateTime *) session.values[1];
> 
> If temporal functions were used in a statement (say, DATE_FORMAT()), it 
> could work with DateTimes as DateTimes, with calendrical calculations 
> native to a DateTime object, instead of constantly having to convert 
> args[0] to a MYSQL_TIME or an integer or a string (just see the amount 
> of code in the temporal built-in functions for doing conversion...)
> 
> For implementing DATE_ADD(), for instance, there's a ton of code which 
> could be encapsulated in a Value object system that understands how to 
> add and subtract dates and times properly...
> 
> Anyway, these are all just thoughts to get the ideas flowing, and 
> nothing more.  In Drizzle-land, we're constantly thinking about this 
> very problem and how to tackle it.  It's not an easy one to address with 
> the current architecture, but hopefully we can share our successes and 
> failures and collaborate together on this :)
> 
> Cheers!
> 
> Jay
> 
Thread
Value objects and Protocol (WL#4760)Øystein Grøvlen22 May
  • Re: Value objects and Protocol (WL#4760)Konstantin Osipov22 May
    • Re: Value objects and Protocol (WL#4760)Alex Esterkin22 May
      • Re: Value objects and Protocol (WL#4760)Brian Aker22 May
        • Re: Value objects and Protocol (WL#4760)Alex Esterkin22 May
          • Re: Value objects and Protocol (WL#4760)Brian Aker22 May
          • Re: Value objects and Protocol (WL#4760)Sergei Golubchik23 May
        • Re: Value objects and Protocol (WL#4760)Konstantin Osipov23 May
          • Re: Value objects and Protocol (WL#4760)Brian Aker23 May
    • Re: Value objects and Protocol (WL#4760)Øystein Grøvlen22 May
      • Re: Value objects and Protocol (WL#4760)Konstantin Osipov22 May
        • Re: Value objects and Protocol (WL#4760)Alex Esterkin22 May
          • Re: Value objects and Protocol (WL#4760)Øystein Grøvlen25 May
        • Re: Value objects and Protocol (WL#4760)Øystein Grøvlen24 May
          • Re: Value objects and Protocol (WL#4760)Konstantin Osipov24 May
      • Re: Value objects and Protocol (WL#4760)Konstantin Osipov23 May
        • Re: Value objects and Protocol (WL#4760)Øystein Grøvlen25 May
          • Re: Value objects and Protocol (WL#4760)Michael Widenius6 Jun
            • Re: Value objects and Protocol (WL#4760)Øystein Grøvlen8 Jun
            • Re: Value objects and Protocol (WL#4760)Jay Pipes8 Jun
              • Re: Value objects and Protocol (WL#4760)Jay Pipes8 Jun
  • Re: Value objects and Protocol (WL#4760)Jay Pipes24 May
    • Re: Value objects and Protocol (WL#4760)Øystein Grøvlen25 May
      • Re: Value objects and Protocol (WL#4760)Jay Pipes26 May
    • Re: [Drizzle-discuss] Value objects and Protocol (WL#4760)Jim Starkey28 May
Re: Value objects and Protocol (WL#4760)Brian Aker22 May