List: Internals
From: Jay Pipes
Date: May 26 2009 5:20pm
Subject: Re: Value objects and Protocol (WL#4760)
Øystein Grøvlen wrote:
> Jay,
> Thanks for taking time to share your ideas.  I agree that we could gain 
> a lot if we could pass value objects around instead of going through all 
> the transformations you describe.  However, this will be a pretty large 
> task, but I think what I have done so far is a step in the right 
> direction, and hopefully this work can evolve further.

Agreed :)

> Thanks,
> -- 
> Øystein
> Jay Pipes wrote:
>> Hi!  I've looked through your code and have a few comments inline. 
>> Please understand these are just suggestions; I'm actually very 
>> supportive of the concept of Value objects in the runtime, and I'll 
>> try to explain some alternate strategies to ponder...
>> I will say ahead of time that I was not surprised to see this thread 
>> immediately devolve into a discussion on the memory and performance 
>> ramifications of the code change.  It seems that the performance of 
>> the bits most often takes precedence over everything else, especially 
>> at the cost of maintainability and code readability. A Value Object 
>> system is designed to increase code maintainability and ease of use.  
>> It is *not* designed around performance.  It may well be true that a 
>> Value object system may add additional memcpy's.  It may also be true 
>> that a Value object system may reduce the *total* number of 
>> conversions/copies that need to be done in the runtime.
>> Hopefully the comments below will shed some light on this, but I will 
>> state now that my fondness for the Value object system stems not from 
>> a desire for increased performance, but from a need for greater system 
>> maintainability.
>> :)
>> Øystein Grøvlen wrote:
>>> Hi,
>>> I am working on extracting much duplicated type conversion code from 
>>> Item classes and Field classes into a Value class. (See 
>>>  Note that the 
>>> architectural review for this worklog is still pending, but a 
>>> preliminary patch for the Field classes part can be found at 
>>> Comments are very welcome.
>>> I am currently looking at the interaction between Field and Protocol, 
>>> and would appreciate some input.
>>> Each Field subclass has a send_binary() method (a somewhat misleading 
>>> name, since whether it sends binary or textual data depends on the 
>>> protocol).  Currently, these methods call a corresponding 
>>> Protocol::store_xxx() method, and in my preliminary patch this looks 
>>> something like this:
>>> bool Field_short::send_binary(Protocol *protocol)
>>> {
>>>   return protocol->store_short(value().get_int());
>>> }
>>> I think it would be a good idea to add a Protocol::store(Value) 
>>> method that will be called from Field::send_binary(Protocol*).  Then 
>>> all the overriding send_binary methods in the Field subclasses could 
>>> be dropped.
>>> Protocol_text::store(Value) could just call Value::to_string() to get 
>>> the textual representation of the Value.  The question is what to do 
>>> for Protocol_binary and other protocols.
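A minimal sketch of the proposed Protocol::store(Value) entry point, under stated assumptions: the Value class here is a hypothetical stand-in that models only an integer, Protocol_text_sketch is an invented name, the tab separator is not the real wire format, and the "false means success" return value is assumed. The binary protocol question is deliberately left open.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the worklog's Value class; only an integer
// payload and its textual conversion are modelled.
class Value {
public:
  explicit Value(std::int64_t v) : int_value_(v) {}
  std::string to_string() const { return std::to_string(int_value_); }
private:
  std::int64_t int_value_;
};

class Protocol {
public:
  virtual ~Protocol() = default;
  // Assumption: false means success, true means error.
  virtual bool store(const Value &value) = 0;
  const std::string &buffer() const { return buffer_; }
protected:
  std::string buffer_;  // stands in for the outgoing packet
};

// Text protocol: asks the Value for its textual form and appends it.
class Protocol_text_sketch : public Protocol {
public:
  bool store(const Value &value) override {
    buffer_ += value.to_string();
    buffer_ += '\t';  // illustrative separator only
    return false;
  }
};

// One send_binary() in the base class; no per-type overrides needed.
class Field {
public:
  explicit Field(Value v) : value_(v) {}
  const Value &value() const { return value_; }
  bool send_binary(Protocol *protocol) const {
    assert(protocol != nullptr);
    return protocol->store(value());
  }
private:
  Value value_;
};
```

With this shape, a Field_short would inherit send_binary() unchanged, and only the Protocol subclasses decide how a Value is encoded.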
>> May I present an alternate strategy?
>> Currently, you have one large mysql::Value class (kudos on using 
>> namespacing BTW!) which "understands" how to convert its value into a 
>> number of different native "types" (custom String class, integers, 
>> my_decimal, etc... using the to_xxx() methods).
>> Up until now, this is basically what the Item class and its derived 
>> classes already do via the val_str(), val_int(), etc methods.
>> So, you've actually just added an additional layer of abstraction on 
>> top of the already complex Item tree.  The benefit of a Value object 
>> system has yet to be realized since you have, so far, only really 
>> duplicated the current system of type conversion within the server.
>> The true advantage of a Value object system is in two respects:
>> * the immutability of Value objects, once constructed
>> * the ability of Value objects to transform themselves into another 
>> Value object via construction
>> By this last bullet point I mean something like the following:
>> Number a= Number(20080911123059);
>> DateTime b= DateTime(a);
>> This may look like a trivial piece of code...but there is some 
>> beauty in it: it can remove or encapsulate a large chunk of type 
>> conversion code in the server and allow code to be written in a more 
>> natural (IMHO) manner.
>> Assume that Number and DateTime are "Value objects".  Perhaps they 
>> inherit from a base mysql::Value class, perhaps they don't.  By saying 
>> they are "Value objects", I mean that:
>> a) Number and DateTime objects, once constructed, are immutable
>> b) Number and DateTime can be constructed from instances of each other
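Purely as an illustration of properties (a) and (b), here is one possible shape for such types. Everything beyond the names Number and DateTime is an assumption: the accessor names, the YYYYMMDDHHMMSS integer layout taken from the example literal, and the choice to signal invalid input with an exception.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

class DateTime;

// Immutable: the only data member is const and there are no setters.
class Number {
public:
  explicit Number(std::int64_t v) : value_(v) {}
  explicit Number(const DateTime &dt);  // defined after DateTime
  std::int64_t as_int() const { return value_; }
private:
  const std::int64_t value_;
};

class DateTime {
public:
  // Construction from a Number validates once; afterwards the object
  // can never hold an invalid datetime.
  explicit DateTime(const Number &n) : parts_(split(n.as_int())) {}
  int year() const { return parts_.year; }
  int month() const { return parts_.month; }
  int day() const { return parts_.day; }
  int hour() const { return parts_.hour; }
  int minute() const { return parts_.minute; }
  int second() const { return parts_.second; }
private:
  struct Parts { int year, month, day, hour, minute, second; };
  // Splits YYYYMMDDHHMMSS into fields (range checks only; days per
  // month are not checked in this sketch).
  static Parts split(std::int64_t v) {
    if (v < 0) throw std::invalid_argument("negative datetime number");
    Parts p{};
    p.second = static_cast<int>(v % 100); v /= 100;
    p.minute = static_cast<int>(v % 100); v /= 100;
    p.hour = static_cast<int>(v % 100); v /= 100;
    p.day = static_cast<int>(v % 100); v /= 100;
    p.month = static_cast<int>(v % 100); v /= 100;
    if (v > 9999) throw std::invalid_argument("year out of range");
    p.year = static_cast<int>(v);
    if (p.month < 1 || p.month > 12 || p.day < 1 || p.day > 31 ||
        p.hour > 23 || p.minute > 59 || p.second > 59)
      throw std::invalid_argument("not a valid datetime");
    return p;
  }
  const Parts parts_;
};

// The reverse construction packs the fields back into one integer.
inline Number::Number(const DateTime &dt)
    : value_(((((static_cast<std::int64_t>(dt.year()) * 100 + dt.month()) *
                    100 + dt.day()) * 100 + dt.hour()) * 100 + dt.minute()) *
                 100 + dt.second()) {
  assert(value_ >= 0);  // DateTime never holds negative fields
}
```

Because the members are const, the types are also non-assignable, which is one way (not the only one) to enforce immutability in C++.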
>> Ignore, for now, the implementation of the above Value objects Number 
>> and DateTime (operator and constructor overloads).  If we take the 
>> above interface (of construction via another Value object instance), 
>> then we can begin to refactor two distinct parts of the parser and 
>> runtime which revolve around:
>> * Construction of constant things
>> * Transformation and reduction of constant things into other constant 
>> things
>> An example of the former would be in the Item tree constructed via a 
>> simple statement such as:
>> );
>> SELECT b FROM t1 WHERE b BETWEEN '20080911123059' AND '20090911123059';
>> In the current server, the SELECT statement above is parsed into a 
>> Select_lex structure which contains a series of Item, Table and Field 
>> objects.  The Item objects represent the constant strings 
>> '20080911123059' and '20090911123059' as Item_string objects.  The 
>> Field object represents the "b" field as a Field_datetime.
>> Because Field_datetime is evaluated by the MyISAM and HEAP storage 
>> engines and runtime as 64-bit integers, each Item_string's val_int() 
>> method is called to return a signed 64-bit integer number that is 
>> passed to the runtime during its evaluation of the COND representing 
>> the between condition on the "b" field. (Transformation #1)
>> These 64-bit signed integers are then passed to the 
>> Field_datetime::store() method in the runtime and the runtime asks the 
>> storage engine to give it an appropriate key to use in reading the 
>> data from the t1 table.
>> Before it can pass the data to the storage engine, however, 
>> Field_datetime::store() must first verify that the passed integer is 
>> indeed a valid datetime.  So, it "parses" the signed 64-bit integer 
>> into a timestamp-like number (see sql/ 
>> represented by the MYSQL_TIME struct. This struct contains temporal 
>> information like year, month, day, etc. (Transformation #2)
>> Field_datetime::store() then places the pieces of this MYSQL_TIME 
>> struct into a series of raw uchar* bytes, pointed to by the 
>> Field::ptr member. (Transformation #3)
>> The storage engine can then use these raw uchar* bytes to find the 
>> records requested by the runtime and retrieve them into the record 
>> buffer that is later needed when sending the record back to the 
>> client over the Protocol.
>> However, the Protocol itself will see (in Item::send()) that the 
>> returned Field data type is of type MYSQL_TYPE_DATETIME, and will 
>> construct a MYSQL_TIME instance by calling 
>> Field_datetime::get_date(*MYSQL_TIME).
>> The get_date() method then takes the retrieved uchar* record bytes 
>> and converts them into a signed 64-bit integer (via macros in 
>> sql/korr.h) (Transformation #4).  These 64-bit integers are then 
>> passed to the number_to_datetime() method to construct the MYSQL_TIME 
>> structs (Transformation #5), and these are passed back to the 
>> Item::send() method.
>> Item::send() then calls Protocol::store(*MYSQL_TIME), passing in the 
>> MYSQL_TIME structs.  This method then transforms each MYSQL_TIME 
>> struct into a series of char* via the datetime_to_str() method, which 
>> calls sprintf() to change the data into a textual format of 
>> "YYYY-MM-DD HH:MM:SS" (Transformation #6), which is then sent along 
>> the wire in text format...
>> As you can see, there are quite a few transformations which occur for 
>> both the constant strings as well as the datetime data sent/retrieved 
>> from the storage engine.
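To make the chain concrete, here is a toy model of three of the hops described above: string constant to 64-bit integer, integer to broken-down fields, and fields to "YYYY-MM-DD HH:MM:SS" text. The struct and function names are hypothetical and are not the server's routines; validation and the raw-byte record format are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a broken-down time struct.
struct TimeParts {
  unsigned year, month, day, hour, minute, second;
};

// Hop 1: the string constant becomes a 64-bit integer.
inline std::int64_t string_to_number(const std::string &s) {
  return std::stoll(s);
}

// Hop 2: the integer (YYYYMMDDHHMMSS) is split into fields.
// No range validation in this sketch.
inline TimeParts number_to_parts(std::int64_t v) {
  assert(v >= 0);
  TimeParts t{};
  t.second = static_cast<unsigned>(v % 100); v /= 100;
  t.minute = static_cast<unsigned>(v % 100); v /= 100;
  t.hour = static_cast<unsigned>(v % 100); v /= 100;
  t.day = static_cast<unsigned>(v % 100); v /= 100;
  t.month = static_cast<unsigned>(v % 100); v /= 100;
  t.year = static_cast<unsigned>(v);
  return t;
}

// Final hop: the fields are formatted as text for the wire.
inline std::string parts_to_text(const TimeParts &t) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "%04u-%02u-%02u %02u:%02u:%02u",
                t.year, t.month, t.day, t.hour, t.minute, t.second);
  return std::string(buf);
}
```

Each function is small on its own; the cost described in the message comes from the same value crossing these representations repeatedly on its way through the runtime.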
>> Once functions such as DATE_FORMAT(), CAST(), STR_TO_DATE() and others 
>> get involved, the whole process can be downright overwhelming! :)
>> Currently, when you follow the code in the server, you weave in and 
>> out of the various val_xxx(), store(xxx), get_xxx() methods, and in 
>> the case of decimal and datetime conversions, all their associated 
>> routines.  It can be very difficult to follow at times.
>> A Value object implementation can simplify much of this code spaghetti 
>> and even get rid of many of the Item classes entirely, namely all of 
>> the Item_xxx_typecast classes and the Cached_item classes.
>> So, how to go about implementing a Value object system in the runtime?
>> 1) Create a hierarchy of Value classes subclassing from a Value class. 
>> You'll need classes for temporal objects like Date and Timestamp as 
>> well as classes for a Number (don't have to break it into Decimal and 
>> Natural, I'd encapsulate all that in one class) and a String class of 
>> course...but not like the current String class; you'll want an 
>> immutable one.
>> 2) Instead of the parser creating Item_string or Item_num objects, it 
>> would create immutable Value objects of type String and Number.
>> 3) For each column in a SELECT's result and each WHERE condition you 
>> might create a vector<> of Value object pointers.  Instead of calling 
>> val_int(), val_str() and likewise, simply use the Session's vector<> 
>> of Value objects as needed. Push and pop objects off the vector as 
>> needed, and construct new Value objects from other ones.
>> For example, take the statement above.  The parser would construct 
>> a new String value object from the string "20080911123059".
>> After parsing, some analysis of the parsed nodes is done, including a 
>> name resolution step.  During this step, the "b" column would be 
>> determined to be of type DATETIME.  A DateTime value object for each 
>> side of the condition would then be pushed onto the Session's object 
>> vector<> like so:
>> String *left_side= <call to get the left String value of the condition>
>> session.values.push_back((Value *) new DateTime(left_side));
>> Within the optimizer, instead of using Field_datetime::store() to both 
>> validate and transform the datetime-string data, the optimizer would 
>> simply access the condition's constants like so:
>> DateTime *left_side= (DateTime *) session.values[0];
>> DateTime *right_side= (DateTime *) session.values[1];
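A compilable variant of the snippets above, as a sketch only. It swaps the raw pointers and C-style casts for std::unique_ptr and a checked dynamic_cast, which is a substitution on my part and not what the message proposes. The helper names bind_between() and datetime_at(), the packed() accessor, and the 14-digit input rule are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class Value {
public:
  virtual ~Value() = default;
};

// Immutable string value, as the parser would create it.
class String : public Value {
public:
  explicit String(std::string text) : text_(std::move(text)) {}
  const std::string &text() const { return text_; }
private:
  const std::string text_;
};

// DateTime constructed from a String; validation happens exactly once.
class DateTime : public Value {
public:
  explicit DateTime(const String &s) : packed_(parse(s.text())) {}
  std::int64_t packed() const { return packed_; }  // YYYYMMDDHHMMSS
private:
  static std::int64_t parse(const std::string &t) {
    if (t.size() != 14) throw std::invalid_argument("expected 14 digits");
    std::int64_t v = 0;
    for (char c : t) {
      if (c < '0' || c > '9') throw std::invalid_argument("non-digit");
      v = v * 10 + (c - '0');
    }
    return v;
  }
  const std::int64_t packed_;
};

struct Session {
  std::vector<std::unique_ptr<Value>> values;
};

// After name resolution: both sides of the BETWEEN become DateTimes.
inline void bind_between(Session &session, const String &left,
                         const String &right) {
  session.values.push_back(std::make_unique<DateTime>(left));
  session.values.push_back(std::make_unique<DateTime>(right));
}

// In the optimizer: fetch a constant back as a DateTime.
// Throws std::bad_cast if the slot holds another Value type.
inline const DateTime &datetime_at(const Session &session, std::size_t i) {
  assert(i < session.values.size());
  return dynamic_cast<const DateTime &>(*session.values[i]);
}
```

The owning vector means the Session controls the lifetime of every constant, so nothing downstream needs to free or copy them.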
>> If temporal functions were used in a statement (say, DATE_FORMAT()), 
>> it could work with DateTimes as DateTimes, with calendrical 
>> calculations native to a DateTime object, instead of constantly having 
>> to convert args[0] to a MYSQL_TIME or an integer or a string (just see 
>> the amount of code in the temporal built-in functions for doing 
>> conversion...)
>> For implementing DATE_ADD(), for instance, there's a ton of code which 
>> could be encapsulated in a Value object system that understands how to 
>> add and subtract dates and times properly...
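As a rough illustration of date arithmetic living inside a value object, the sketch below gives an immutable Date an add_days() that returns a new Date. It is not the server's DATE_ADD() code: the class and method names are hypothetical, only whole days are handled, and the conversion is the widely used proleptic-Gregorian day-count algorithm.

```cpp
#include <cassert>
#include <cstdint>

class Date {
public:
  Date(int year, int month, int day) : y_(year), m_(month), d_(day) {
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);
  }
  int year() const { return y_; }
  int month() const { return m_; }
  int day() const { return d_; }

  // Returns a new Date; the receiver is never modified.
  Date add_days(std::int64_t n) const { return from_days(to_days() + n); }

private:
  // Days since 1970-01-01 in the proleptic Gregorian calendar.
  std::int64_t to_days() const {
    const std::int64_t y = static_cast<std::int64_t>(y_) - (m_ <= 2 ? 1 : 0);
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const std::int64_t yoe = y - era * 400;    // year of era, 0..399
    const std::int64_t mp = (m_ + 9) % 12;     // March = 0
    const std::int64_t doy = (153 * mp + 2) / 5 + d_ - 1;
    const std::int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
  }

  // Inverse of to_days().
  static Date from_days(std::int64_t z) {
    z += 719468;
    const std::int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    const std::int64_t doe = z - era * 146097; // day of era
    const std::int64_t yoe =
        (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const std::int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    const std::int64_t mp = (5 * doy + 2) / 153;
    const std::int64_t d = doy - (153 * mp + 2) / 5 + 1;
    const std::int64_t m = mp < 10 ? mp + 3 : mp - 9;
    const std::int64_t y = yoe + era * 400 + (m <= 2 ? 1 : 0);
    return Date(static_cast<int>(y), static_cast<int>(m),
                static_cast<int>(d));
  }

  const int y_, m_, d_;
};
```

A function like DATE_ADD() could then call add_days() on its argument directly, without converting through integers or strings first.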
>> Anyway, these are all just thoughts to get the ideas flowing, and 
>> nothing more.  In Drizzle-land, we're constantly thinking about this 
>> very problem and how to tackle it.  It's not an easy one to address 
>> with the current architecture, but hopefully we can share our 
>> successes and failures and collaborate together on this :)
>> Cheers!
>> Jay
