List:Commits« Previous MessageNext Message »
From:Martin Hansson Date:January 12 2011 11:13am
Subject:Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)
View as plain text  
Alexander Barkov skrev 2011-01-12 10.21:
> Hi Martin,
> I'm sorry for being slow. We've been on holiday.
> Martin Hansson wrote:
>> Hi again!
>> Alexander Barkov wrote:
>>> Hi Martin,
>>> Martin Hansson wrote:
>>>> Hi Alexander and Sergey!
>>>> Alexander Barkov skrev 2010-12-29 11.14:
>>>>> Hi Sergey, Martin,
>>>>> Sergey Glukhov wrote:
>>>>>> Hi Martin,
>>>>>> On 12/28/2010 11:33 AM, Martin Hansson wrote:
>>>>>>> Hi Sergey,
>>>>>>> Sergey Glukhov skrev 2010-12-23 12.44:
>>>>>>>> Hi Martin, see comments below
>>>>>>>> On 12/22/2010 03:30 PM, Martin Hansson wrote:
>>>>>>>>> #At
> file:///data0/martin/bzrroot/bug58165/5.1bt-lazy_copy/ 
>>>>>>>>> based on 
> revid:sergey.glukhov@stripped
>>>>>>>>>      modified:
>>>>>>>>>        mysql-test/r/func_str.result
>>>>>>>>>        mysql-test/t/func_str.test
>>>>>>>>>        sql/
>>>>>>>>>        sql/
>>>>>>>> ...
>>>>>>>>> === modified file 'sql/'
>>>>>>>>> --- a/sql/    2010-11-11 10:25:23
> +0000
>>>>>>>>> +++ b/sql/    2010-12-22 12:30:13
> +0000
>>>>>>>>> @@ -1305,8 +1305,15 @@ String
> *Item_func_substr_index::val_str(
>>>>>>>>>     null_value=0;
>>>>>>>>>     uint delimiter_length= delimiter->length();
>>>>>>>>>     if (!res->length() || !delimiter_length ||
> !count)
>>>>>>>>> -    return&my_empty_string;        // Wrong
> parameters
>>>>>>>>> -
>>>>>>>>> +  {
>>>>>>>>> +    if (str->copy("", 0, default_charset_info))  
>      // 
>>>>>>>>> Wrong parameters
>>>>>>>>> +    {
>>>>>>>>> +      /* Out of memory */
>>>>>>>>> +      null_value= true;
>>>>>>>>> +      return NULL;
>>>>>>>>> +    }
>>>>>>>>> +    return str;
>>>>>>>>> +  }
>>>>>>>>>     res->set_charset(collation.collation);
>>>>>>>>>   #ifdef USE_MB
>>>>>>>> Using my_empty_string is not safe, the bug shows it.
>>>>>>>> So we should remove my_empty_string from all places
>>>>>>>> where it is used. 
>>>>>>> Removing my_empty_string altogether is way too large to be
> done 
>>>>>>> as an MRU bug fix (Believe me, I tried.) I suggest filing a 
>>>>>>> separate report about it. Let's make this customer happy
> first 
>>>>>>> and then we'll deal with the other problems we discovered
> while 
>>>>>>> working on it.
>>>>>> I was too optimistic with the request about removing
> my_empty_string
>>>>>> from all places. As far as I can see problematic places are
>>>>>>, However we could fix all places in
>>>>>> at least. it should be not a problem.
>>>>>> It seems we should introduce new method, say
>>>>>> Item::empty_string()
>>>>>> {
>>>>>>   str_value.set("", 0, collation.collation);
>>>>>>   return &str_value;
>>>>>> }
>>>>> Sergey, I like your idea about introducing this new method.
>>>>> Please check if it can be moved to Item_str_func:
>>>>> String *Item_str_func::empty_string()
>>>>> {
>>>>>   str_value.set("", 0, collation.collation);
>>>>>   return &str_value;
>>>>> }
>>>> Nice idea, will save us from cluttering up the code.
>>>>>> and replace &my_empty_string with empty_string().
>>>>> It should be safe to replace all &my_empty_string instances
>>>>> to the new method calls in,
>>>>> even under terms of a MRU bug. Looks like a fairly small change:
>>>>> "grep my_empty_string" reports about 22 lines.
>>>> done. I've added a comment that it should be removed.
>>>>> Note, my_empty_string is a reminder from 4.0 times,
>>>>> when we had a single global character set for the entire server.
>>>>> my_empty_string should be eventually removed everywhere from the
>>>>> sources. Fixing looks like a good start.
>>>> Thank you for the background. It's really strange that the bug has 
>>>> stayed under the radar for that long.
>>> I think this is because it shows up under very rare conditions,
>>> e.g. wrong arguments, or end-of-memory.
>> I can agree about out-of-memory, that doesn't happen all the time 
>> :-), but wrong arguments should happen more often that every 10 years 
>> ;-) I guess most people would discover the data type was wrong and 
>> didn't file a report.
>>>> I also have a question for you: Why do we even return a *pointer* 
>>>> to a String object from the Item_*::val_str() methods? I see no 
>>>> good reason for this. The String class is really small, only 5 
>>>> small members and we could almost always exploit the return value 
>>>> optimization anyways. We lose all of the safety benefits of the 
>>>> shared string buffers by passing pointers around. For instance this 
>>>> bug would not exists if we did that. To a C++ programmer this is 
>>>> completely insane.
>>>> Some historical reason?
>>> This is done for performance reasons.
>> But it doesn't boost performance. Quite on the contrary, see below.
>>> If we returned objects instead of pointers, then for every row
>>> we'd have to do "new String(...)", then allocate Ptr,
>> Nope. "new String()" is done only if you want to heap allocate 
>> strings (or MEM_ROOT, to be precise). By returning String objects, 
>> they are allocated on the stack instead, which is cheaper.
>> This is how to receive a string on the stack:
>> void foo() {
>>  ...
>>  String myString = getMeAString();
>>  // Calls the copy constructor.
>>  // myString.Ptr is now the same as myStringToo.Ptr.
>>  // Nothing is allocated.
>>  String myStringToo = myString;
>>  ...
>> }
>> This is how to declare a function returning a string:
>> String getMeAString() {
>>  // This string will actually be allocated on the stack in foo's 
>> stack frame
>>  // since it is constructed as part of the return statement.
>>  // This is called the return value optimization.
>>  return String();
>> }
>>> then send result to the client side in, and then
>>> free the memory allocated for the String and its Ptr.
>> Stack memory is freed automatically. And note that I'm not talking 
>> about the string buffer pointed to by String::Ptr. That buffer would 
>> be allocated and deallocated exactly the same way. However, we 
>> wouldn't have to allocate as many copies as we have to now because we 
>> could benefit from the safety that the String class provides, see below.
>>> Now String is created only one time per Item, but what's more 
>>> important,
>>> Ptr() is allocated one time! So val_str() reuses the same buffer
>>> for every row.
>> The buffer has very little to do with the String object. Only when 
>> realloc() is called is a new buffer allocated.
>> Thus, the current implementation is actually *worse* for performance. 
>> First and foremost: We have to care about allocating and freeing 
>> String objects in various MEM_ROOTs. Secondly, since we reuse the 
>> same String *object*, we cannot quite be sure whether the same String 
>> object, and hence buffer, is used in different places.
>> The String class is designed to be very efficient if several String 
>> objects point to the same buffer. Every String object knows if it is 
>> sharing the buffer (*Ptr) with another object. So unless you are 
>> going to write to the buffer, you can just use it. And if you are 
>> going to write to it, you only copy it if another String object owns 
>> it. If you own it yourself, you don't have to copy it. That is the 
>> purpose if copy_if_not_alloced().
>> If you instead return pointers, all these nice things go out the 
>> window. The String object keeps track of whether it owns the buffer 
>> or if some other String object owns it. But if you use the same 
>> String *object*, nothing can be taken for granted.
> Buffers are still shared. 
That is the whole point. Buffers should be shared, but with the current 
design we don't know whether they are shared or not. Hence when we're in 
doubt we must always copy.
> For example, if you do
>   SELECT LEFT(CONCAT(...))...;
> Item_func_left returns pointer to the same buffer with what
> Item_func_concat has returned.
Do you mean this?

String *Item_func_left::val_str(String *str)

It returns a pointer to a String object. Without analyzing the code, I 
suspect that Item_func_concat::val_int() returns a pointer to a String 
object, and then Item_func_left probably returns a pointer to the same 
String object. Not good.
> Also, String objects are also shared.
This is the problem.
> They are created only at fix_fields time and then reused for each row.
I'm not familiar with this, but this sounds like a bad idea. The buffers 
should be reused, not the String objects.
> If we returned objects instead of pointers, we'd have to allocate
> objects per row, which I believe would slow down performance.
Seriously, how much could this cost, given that they are always 
allocated on the stack? The price would be the same as for 5 local 
>> For instance, using my_empty_string would be fine if we didn't return 
>> pointers. A val_str() function would return a copy of the 
>> my_empty_string object, all of its fields would be copies from the 
>> original instance, hence pointing to the same buffer, except alloced 
>> == false. Then if you needed to write to the buffer you would do 
>> copy_if_not_alloced(). If we didn't have to write to it, no 
>> allocation whatsoever would occur. What happens now is we _always_ 
>> allocate a buffer, we _always_ allocate at least one String object, 
>> then we write an empty string to the buffer.
>> Besides, Instead of sharing a char*, there should also be a 
>> String_buffer class that would be shared by String objects. Then the 
>> fields Alloced_length, str_charset and alloced should be moved into 
>> that class making the String class even smaller; the size of two 
>> ints. This would give a substantial perfomance gain.
> This is a good idea. But I'm afraid we cannot afford this change now -
> too much code is affected.
Of course. The Eternal Argument. :-) But it is good to identify what the 
real problem is. Then you can make better choices in the future. For 
instance we could make sure that any new functions returning String's 
will return objects rather than pointers.

Best Regards

bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534) Bug#58165Martin Hansson28 Dec
Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Martin Hansson28 Dec
Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Alexander Barkov29 Dec
  • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Martin Hansson29 Dec
    • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Alexander Barkov29 Dec
      • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Martin Hansson30 Dec
        • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Alexander Barkov12 Jan
          • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Martin Hansson12 Jan
            • Re: bzr commit into mysql-5.1-bugteam branch (martin.hansson:3534)Bug#58165Alexander Barkov14 Jan