List:Commits« Previous MessageNext Message »
From:Alexander Barkov Date:January 12 2011 9:21am
Subject:Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)
View as plain text  
Hi Martin,

I'm sorry for being slow. We've been on holiday.

Martin Hansson wrote:
> Hi again!
> Alexander Barkov wrote:
>> Hi Martin,
>> Martin Hansson wrote:
>>> Hi Alexander and Sergey!
>>> Alexander Barkov skrev 2010-12-29 11.14:
>>>> Hi Sergey, Martin,
>>>> Sergey Glukhov wrote:
>>>>> Hi Martin,
>>>>> On 12/28/2010 11:33 AM, Martin Hansson wrote:
>>>>>> Hi Sergey,
>>>>>> Sergey Glukhov skrev 2010-12-23 12.44:
>>>>>>> Hi Martin, see comments below
>>>>>>> On 12/22/2010 03:30 PM, Martin Hansson wrote:
>>>>>>>> #At
> file:///data0/martin/bzrroot/bug58165/5.1bt-lazy_copy/ based 
>>>>>>>> on
> revid:sergey.glukhov@stripped
>>>>>>>>      modified:
>>>>>>>>        mysql-test/r/func_str.result
>>>>>>>>        mysql-test/t/func_str.test
>>>>>>>>        sql/
>>>>>>>>        sql/
>>>>>>> ...
>>>>>>>> === modified file 'sql/'
>>>>>>>> --- a/sql/    2010-11-11 10:25:23 +0000
>>>>>>>> +++ b/sql/    2010-12-22 12:30:13 +0000
>>>>>>>> @@ -1305,8 +1305,15 @@ String
> *Item_func_substr_index::val_str(
>>>>>>>>     null_value=0;
>>>>>>>>     uint delimiter_length= delimiter->length();
>>>>>>>>     if (!res->length() || !delimiter_length ||
> !count)
>>>>>>>> -    return&my_empty_string;        // Wrong
> parameters
>>>>>>>> -
>>>>>>>> +  {
>>>>>>>> +    if (str->copy("", 0, default_charset_info))      
>  // Wrong 
>>>>>>>> parameters
>>>>>>>> +    {
>>>>>>>> +      /* Out of memory */
>>>>>>>> +      null_value= true;
>>>>>>>> +      return NULL;
>>>>>>>> +    }
>>>>>>>> +    return str;
>>>>>>>> +  }
>>>>>>>>     res->set_charset(collation.collation);
>>>>>>>>   #ifdef USE_MB
>>>>>>> Using my_empty_string is not safe, the bug shows it.
>>>>>>> So we should remove my_empty_string from all places
>>>>>>> where it is used. 
>>>>>> Removing my_empty_string altogether is way too large to be done
> as 
>>>>>> an MRU bug fix (Believe me, I tried.) I suggest filing a separate
>>>>>> report about it. Let's make this customer happy first and then 
>>>>>> we'll deal with the other problems we discovered while working on
> it.
>>>>> I was too optimistic with the request about removing my_empty_string
>>>>> from all places. As far as I can see problematic places are
>>>>>, However we could fix all places in
>>>>> at least. it should be not a problem.
>>>>> It seems we should introduce new method, say
>>>>> Item::empty_string()
>>>>> {
>>>>>   str_value.set("", 0, collation.collation);
>>>>>   return &str_value;
>>>>> }
>>>> Sergey, I like your idea about introducing this new method.
>>>> Please check if it can be moved to Item_str_func:
>>>> String *Item_str_func::empty_string()
>>>> {
>>>>   str_value.set("", 0, collation.collation);
>>>>   return &str_value;
>>>> }
>>> Nice idea, will save us from cluttering up the code.
>>>>> and replace &my_empty_string with empty_string().
>>>> It should be safe to replace all &my_empty_string instances
>>>> to the new method calls in,
>>>> even under terms of a MRU bug. Looks like a fairly small change:
>>>> "grep my_empty_string" reports about 22 lines.
>>> done. I've added a comment that it should be removed.
>>>> Note, my_empty_string is a reminder from 4.0 times,
>>>> when we had a single global character set for the entire server.
>>>> my_empty_string should be eventually removed everywhere from the
>>>> sources. Fixing looks like a good start.
>>> Thank you for the background. It's really strange that the bug has 
>>> stayed under the radar for that long.
>> I think this is because it shows up under very rare conditions,
>> e.g. wrong arguments, or end-of-memory.
> I can agree about out-of-memory, that doesn't happen all the time :-), 
> but wrong arguments should happen more often that every 10 years ;-) I 
> guess most people would discover the data type was wrong and didn't file 
> a report.
>>> I also have a question for you: Why do we even return a *pointer* to 
>>> a String object from the Item_*::val_str() methods? I see no good 
>>> reason for this. The String class is really small, only 5 small 
>>> members and we could almost always exploit the return value 
>>> optimization anyways. We lose all of the safety benefits of the 
>>> shared string buffers by passing pointers around. For instance this 
>>> bug would not exists if we did that. To a C++ programmer this is 
>>> completely insane.
>>> Some historical reason?
>> This is done for performance reasons.
> But it doesn't boost performance. Quite on the contrary, see below.
>> If we returned objects instead of pointers, then for every row
>> we'd have to do "new String(...)", then allocate Ptr,
> Nope. "new String()" is done only if you want to heap allocate strings 
> (or MEM_ROOT, to be precise). By returning String objects, they are 
> allocated on the stack instead, which is cheaper.
> This is how to receive a string on the stack:
> void foo() {
>  ...
>  String myString = getMeAString();
>  // Calls the copy constructor.
>  // myString.Ptr is now the same as myStringToo.Ptr.
>  // Nothing is allocated.
>  String myStringToo = myString;
>  ...
> }
> This is how to declare a function returning a string:
> String getMeAString() {
>  // This string will actually be allocated on the stack in foo's stack 
> frame
>  // since it is constructed as part of the return statement.
>  // This is called the return value optimization.
>  return String();
> }
>> then send result to the client side in, and then
>> free the memory allocated for the String and its Ptr.
> Stack memory is freed automatically. And note that I'm not talking about 
> the string buffer pointed to by String::Ptr. That buffer would be 
> allocated and deallocated exactly the same way. However, we wouldn't 
> have to allocate as many copies as we have to now because we could 
> benefit from the safety that the String class provides, see below.
>> Now String is created only one time per Item, but what's more important,
>> Ptr() is allocated one time! So val_str() reuses the same buffer
>> for every row.
> The buffer has very little to do with the String object. Only when 
> realloc() is called is a new buffer allocated.
> Thus, the current implementation is actually *worse* for performance. 
> First and foremost: We have to care about allocating and freeing String 
> objects in various MEM_ROOTs. Secondly, since we reuse the same String 
> *object*, we cannot quite be sure whether the same String object, and 
> hence buffer, is used in different places.
> The String class is designed to be very efficient if several String 
> objects point to the same buffer. Every String object knows if it is 
> sharing the buffer (*Ptr) with another object. So unless you are going 
> to write to the buffer, you can just use it. And if you are going to 
> write to it, you only copy it if another String object owns it. If you 
> own it yourself, you don't have to copy it. That is the purpose if 
> copy_if_not_alloced().
> If you instead return pointers, all these nice things go out the window. 
> The String object keeps track of whether it owns the buffer or if some 
> other String object owns it. But if you use the same String *object*, 
> nothing can be taken for granted.

Buffers are still shared. For example, if you do


Item_func_left returns pointer to the same buffer with what
Item_func_concat has returned.

Also, String objects are also shared.
They are created only at fix_fields time and then reused for each row.

If we returned objects instead of pointers, we'd have to allocate
objects per row, which I believe would slow down performance.

> For instance, using my_empty_string would be fine if we didn't return 
> pointers. A val_str() function would return a copy of the 
> my_empty_string object, all of its fields would be copies from the 
> original instance, hence pointing to the same buffer, except alloced == 
> false. Then if you needed to write to the buffer you would do 
> copy_if_not_alloced(). If we didn't have to write to it, no allocation 
> whatsoever would occur. What happens now is we _always_ allocate a 
> buffer, we _always_ allocate at least one String object, then we write 
> an empty string to the buffer.
> Besides, Instead of sharing a char*, there should also be a 
> String_buffer class that would be shared by String objects. Then the 
> fields Alloced_length, str_charset and alloced should be moved into that 
> class making the String class even smaller; the size of two ints. This 
> would give a substantial perfomance gain.

This is a good idea. But I'm afraid we cannot afford this change now -
too much code is affected.

>>> Thank you for the review!
>> Will you commit the new version soon?
> Done.
> Best Regards
> Martin

bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534) Bug#58165Martin Hansson28 Dec
Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Martin Hansson28 Dec
Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Alexander Barkov29 Dec
  • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Martin Hansson29 Dec
    • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Alexander Barkov29 Dec
      • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Martin Hansson30 Dec
        • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Alexander Barkov12 Jan
          • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Martin Hansson12 Jan
            • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Alexander Barkov14 Jan