List:Commits« Previous MessageNext Message »
From:Martin Hansson Date:December 30 2010 9:21am
Subject:Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)
Bug#58165
View as plain text  
Hi again!

Alexander Barkov wrote:
> Hi Martin,
>
>
> Martin Hansson wrote:
>> Hi Alexander and Sergey!
>>
>> Alexander Barkov skrev 2010-12-29 11.14:
>>> Hi Sergey, Martin,
>>>
>>> Sergey Glukhov wrote:
>>>> Hi Martin,
>>>>
>>>> On 12/28/2010 11:33 AM, Martin Hansson wrote:
>>>>> Hi Sergey,
>>>>>
>>>>> Sergey Glukhov skrev 2010-12-23 12.44:
>>>>>> Hi Martin, see comments below
>>>>>>
>>>>>> On 12/22/2010 03:30 PM, Martin Hansson wrote:
>>>>>>> #At file:///data0/martin/bzrroot/bug58165/5.1bt-lazy_copy/
> based 
>>>>>>> on
> revid:sergey.glukhov@stripped
>>>>>>>
>>>>>>>
>>>>>>>      modified:
>>>>>>>        mysql-test/r/func_str.result
>>>>>>>        mysql-test/t/func_str.test
>>>>>>>        sql/item_strfunc.cc
>>>>>>>        sql/sql_string.cc
>>>>>> ...
>>>>>>> === modified file 'sql/item_strfunc.cc'
>>>>>>> --- a/sql/item_strfunc.cc    2010-11-11 10:25:23 +0000
>>>>>>> +++ b/sql/item_strfunc.cc    2010-12-22 12:30:13 +0000
>>>>>>> @@ -1305,8 +1305,15 @@ String
> *Item_func_substr_index::val_str(
>>>>>>>     null_value=0;
>>>>>>>     uint delimiter_length= delimiter->length();
>>>>>>>     if (!res->length() || !delimiter_length || !count)
>>>>>>> -    return&my_empty_string;        // Wrong parameters
>>>>>>> -
>>>>>>> +  {
>>>>>>> +    if (str->copy("", 0, default_charset_info))        //
> Wrong 
>>>>>>> parameters
>>>>>>> +    {
>>>>>>> +      /* Out of memory */
>>>>>>> +      null_value= true;
>>>>>>> +      return NULL;
>>>>>>> +    }
>>>>>>> +    return str;
>>>>>>> +  }
>>>>>>>     res->set_charset(collation.collation);
>>>>>>>
>>>>>>>   #ifdef USE_MB
>>>>>>>
>>>>>>
>>>>>> Using my_empty_string is not safe, the bug shows it.
>>>>>> So we should remove my_empty_string from all places
>>>>>> where it is used. 
>>>>> Removing my_empty_string altogether is way too large to be done as 
>>>>> an MRU bug fix (Believe me, I tried.) I suggest filing a separate 
>>>>> report about it. Let's make this customer happy first and then 
>>>>> we'll deal with the other problems we discovered while working on
> it.
>>>> I was too optimistic with the request about removing my_empty_string
>>>> from all places. As far as I can see problematic places are
>>>> set_var.cc, sql_class.cc. However we could fix all places in
>>>> item_strfunc.cc at least. it should be not a problem.
>>>> It seems we should introduce new method, say
>>>>
>>>> Item::empty_string()
>>>> {
>>>>   str_value.set("", 0, collation.collation);
>>>>   return &str_value;
>>>> }
>>>
>>> Sergey, I like your idea about introducing this new method.
>>> Please check if it can be moved to Item_str_func:
>>>
>>> String *Item_str_func::empty_string()
>>> {
>>>   str_value.set("", 0, collation.collation);
>>>   return &str_value;
>>> }
>> Nice idea, will save us from cluttering up the code.
>>>
>>>>
>>>> and replace &my_empty_string with empty_string().
>>>>
>>>
>>> It should be safe to replace all &my_empty_string instances
>>> to the new method calls in item_str_func.cc,
>>> even under terms of a MRU bug. Looks like a fairly small change:
>>> "grep my_empty_string item_strfunc.cc" reports about 22 lines.
>> done. I've added a comment that it should be removed.
>>>
>>>
>>> Note, my_empty_string is a reminder from 4.0 times,
>>> when we had a single global character set for the entire server.
>>> my_empty_string should be eventually removed everywhere from the
>>> sources. Fixing item_strfunc.cc looks like a good start.
>> Thank you for the background. It's really strange that the bug has 
>> stayed under the radar for that long.
>
> I think this is because it shows up under very rare conditions,
> e.g. wrong arguments, or end-of-memory.
I can agree about out-of-memory, that doesn't happen all the time :-), 
but wrong arguments should happen more often that every 10 years ;-) I 
guess most people would discover the data type was wrong and didn't file 
a report.
>
>>
>> I also have a question for you: Why do we even return a *pointer* to 
>> a String object from the Item_*::val_str() methods? I see no good 
>> reason for this. The String class is really small, only 5 small 
>> members and we could almost always exploit the return value 
>> optimization anyways. We lose all of the safety benefits of the 
>> shared string buffers by passing pointers around. For instance this 
>> bug would not exists if we did that. To a C++ programmer this is 
>> completely insane.
>>
>> Some historical reason?
>
> This is done for performance reasons.
But it doesn't boost performance. Quite on the contrary, see below.
>
> If we returned objects instead of pointers, then for every row
> we'd have to do "new String(...)", then allocate Ptr,
Nope. "new String()" is done only if you want to heap allocate strings 
(or MEM_ROOT, to be precise). By returning String objects, they are 
allocated on the stack instead, which is cheaper.

This is how to receive a string on the stack:

void foo() {
  ...
  String myString = getMeAString();

  // Calls the copy constructor.
  // myString.Ptr is now the same as myStringToo.Ptr.
  // Nothing is allocated.
  String myStringToo = myString;
  ...
}

This is how to declare a function returning a string:

String getMeAString() {

  // This string will actually be allocated on the stack in foo's stack 
frame
  // since it is constructed as part of the return statement.
  // This is called the return value optimization.
  return String();

}

> then send result to the client side in protocol.cc, and then
> free the memory allocated for the String and its Ptr.
Stack memory is freed automatically. And note that I'm not talking about 
the string buffer pointed to by String::Ptr. That buffer would be 
allocated and deallocated exactly the same way. However, we wouldn't 
have to allocate as many copies as we have to now because we could 
benefit from the safety that the String class provides, see below.
>
> Now String is created only one time per Item, but what's more important,
> Ptr() is allocated one time! So val_str() reuses the same buffer
> for every row.
The buffer has very little to do with the String object. Only when 
realloc() is called is a new buffer allocated.

Thus, the current implementation is actually *worse* for performance. 
First and foremost: We have to care about allocating and freeing String 
objects in various MEM_ROOTs. Secondly, since we reuse the same String 
*object*, we cannot quite be sure whether the same String object, and 
hence buffer, is used in different places.

The String class is designed to be very efficient if several String 
objects point to the same buffer. Every String object knows if it is 
sharing the buffer (*Ptr) with another object. So unless you are going 
to write to the buffer, you can just use it. And if you are going to 
write to it, you only copy it if another String object owns it. If you 
own it yourself, you don't have to copy it. That is the purpose if 
copy_if_not_alloced().

If you instead return pointers, all these nice things go out the window. 
The String object keeps track of whether it owns the buffer or if some 
other String object owns it. But if you use the same String *object*, 
nothing can be taken for granted.

For instance, using my_empty_string would be fine if we didn't return 
pointers. A val_str() function would return a copy of the 
my_empty_string object, all of its fields would be copies from the 
original instance, hence pointing to the same buffer, except alloced == 
false. Then if you needed to write to the buffer you would do 
copy_if_not_alloced(). If we didn't have to write to it, no allocation 
whatsoever would occur. What happens now is we _always_ allocate a 
buffer, we _always_ allocate at least one String object, then we write 
an empty string to the buffer.

Besides, Instead of sharing a char*, there should also be a 
String_buffer class that would be shared by String objects. Then the 
fields Alloced_length, str_charset and alloced should be moved into that 
class making the String class even smaller; the size of two ints. This 
would give a substantial perfomance gain.
>
>>
>> Thank you for the review!
>
> Will you commit the new version soon?
Done.

Best Regards

Martin

Thread
bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534) Bug#58165Martin Hansson28 Dec
Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Martin Hansson28 Dec
Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Alexander Barkov29 Dec
  • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Martin Hansson29 Dec
    • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Alexander Barkov29 Dec
      • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Martin Hansson30 Dec
        • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Alexander Barkov12 Jan
          • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Martin Hansson12 Jan
            • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Alexander Barkov14 Jan