List:Commits« Previous MessageNext Message »
From:V Narayanan Date:November 18 2008 6:33am
Subject:Re: bzr commit into mysql-5.0-bugteam branch (v.narayanan:2712)
Bug#39616
View as plain text  
Here is the correct output from dgcov

narayanan@narayanan-lt:~/Work/mysql/W-M/mysql-5.0-bugteam-39616$ perl 
~/internals/dev/dgcov/dgcov.pl --local
.
-------------------------------------------------------------------------------

85 line(s) in 1 source file(s) modified in revision(s).
0 line(s) not covered by tests.
For documentation, see http://forge.mysql.com/wiki/DGCov_doc

I pasted the --uncommitted output previously, sorry about that

Narayanan

V Narayanan wrote:
> Hi Ingo,
>
> Thank you for the detailed comments/reviews,
>
> Please find some inline answers
>
> Ingo Strüwing wrote:
>> Hi V,
>>
>> V Narayanan, 11.11.2008 13:53:
>> ...
>>  
>>>  2712 V Narayanan    2008-11-11
>>>       Bug#39616: Missing quotes from .CSV crashes server
>>>             When a CSV file contained comma separated elements       
>>> that were not enclosed in quotes, it was causing the
>>>       mysql server to crash.
>>>             The algorithm that parsed the content of a row in
>>>       mysql 5.0 was assuming that the values of the fields
>>>       in a .CSV file will be enclosed in quotes and will be
>>>       separated by commas.
>>>             This was causing the algorithm to fail when the content
>>>       of the file resembled the following
>>>       3,"with quotes"
>>>     
>>
>> That's ok, but the bug reporter made it even more clear by using 3,"sans
>> quotes". ;-)
>>   
> ok! reverted it back to sans quotes!
>
>>  
>>>       The CSV engine that is part of mysql 5.0 was expecting
>>>       the above to be
>>>       "3","with quotes"
>>>             The above is just one example of where the engine was
>>>       failing for what would be recognized as a valid .CSV       
>>> file content otherwise.
>>>             The proposed fix changes the previous algorithm being used
>>>       to parse rows from the .CSV file to handle two separate
>>>       cases
>>>             1) When the current field of the row is enclosed in quotes
>>>       2) When the current field of the row is not enclosed in quotes
>>> modified:
>>>   sql/examples/ha_tina.cc
>>>
>>> per-file messages:
>>>   sql/examples/ha_tina.cc
>>>     The function ha_tina::find_current_row(byte *buf)
>>>     contains the logic used to parse rows from the     .CSV file.
>>>         The proposed fix uses the following algorithm to handle
>>>     the parsing of a row in the .CSV file
>>>         BEGIN
>>>     1) Store the EOL (end of line) for the current row
>>>     2) Until all the fields in the current query have not been 
>>>        filled
>>>        2.1) If the current character begins with a quote
>>>             2.1.1) Until EOL has not been reached
>>>                    a) If end of current field is reached, move
>>>                       to next field and jump to step 2.3
>>>                    b) If current character begins with \\ handle
>>>                       \\n, \\r, \\, \\"
>>>                    c) else append the current character into the buffer
>>>                       before checking that EOL has not been reached.
>>>         2.2) If the current character does not begin with a quote
>>>              2.2.1) Until EOL has not been reached
>>>                     a) If the end of field has been reached move to the
>>>                        next field and jump to step 2.3
>>>                     b) append the current character into the buffer
>>>         2.3) Store the current field value and jump to 2)
>>>     TERMINATE
>>>     
>>
>> It is nice to have this comment. But I would rather have this as a code
>> comment, e.g. in the function comment of ha_tina::find_current_row(),
>> than in a revision file comment. IMHO the file comments should tell what
>> has been changed, not how it was done, nor why it was done, let alone a
>> description of the algorithm. How and why belong to the global revision
>> comment, at least a brief version. If more seems appropriate, I'd add it
>> to the bug report and point to it from the revision comment.
>> Descriptions of algorithms belong to code comments.
>>   
>
> I agree! Moved the algorithm to the code comments.
>
>> These questions may be unrelated to the problem, but please clarify:
>> What is "end of field"? Is it a comma in the row? What do we do with
>> leading/trailing/intermediate spaces if the field is not quoted?
>>   
>
> Sorry about my ambiguity,
>
> In the parsing logic used in the CSV engine EOL is \n
>
> End of field is a "(quotes) followed by a  ,(comma) if the given field 
> begins with
> "(quotes)
>
> Or it is just a ,(comma) (for the fields that do not begin with a 
> "(quotes))
>
> We do not do anything about leading or trailing spaced if the field is 
> not
> quoted. It becomes part of the field. This behavior matches that of 
> 5.1 and
> 6.0.
>
>>  
>>>         The current algorithm basically separates the parsing of the 
>>> row into
>>>     two cases
>>>     
>>
>> What is the "current" algorithm here? The one that currently exists in
>> 5.0? Or the new one as described above?
>>
>> In the first case, I'd prefer to have it called "the old algorithm". In
>> the second case, I'd prefer "the proposed algorithm".
>>   
>
> Sorry!
>
> Changed as advised!
>>  
>>>         1) when the current field has quotes (step 2.1)
>>>     2) when the current field does not have quotes (step 2.2)
>>>         1) is similar to the previous algorithm except that it is 
>>> now handled as
>>>     as special case of parsing with quotes and includes a check for 
>>> testing
>>>     that EOL has not been reached before writing into the field 
>>> buffer in
>>>     step 2.1.1->c)
>>>     
>>
>> Does this mean that EOL markers cannot be part of a quoted string? Won't
>> it be better to collect characters until the field length is reached,
>> before reporting HA_ERR_END_OF_FILE? While I write this, I notice that
>> HA_ERR_WRONG_IN_RECORD (Record-file is crashed) may be a better error to
>> return in this case.
>>   
>
> EOL is just \n. I have again taken the clue for this behavior from
> 5.1 and 6.0 here.
>
> I put it as HA_ERR_END_OF_FILE  because the absence of a trailing quote
> actually results in reaching End of file. The for loop for the case 
> when the
> field is enclosed in quotes keeps on looping until EOF is reached.
>
> Ideally speaking the errors returned in 5.1 and 6.0 should be 
> replicated here
> but this I thought would be beyond this bug issue. I would have had to 
> copy
> the goto err: being used in 5.1 and 6.0 here, it would have almost 
> been close
> to adding functionality and would have moved away from the scope of this
> bug.
>> Hm. Forget about this EOL remark. I found that CSV never allowed EOL in
>> quoted strings and doesn't even in 5.1.
>>
>>  
>>> ...
>>> === modified file 'sql/examples/ha_tina.cc'
>>> --- a/sql/examples/ha_tina.cc    2008-03-29 15:50:46 +0000
>>> +++ b/sql/examples/ha_tina.cc    2008-11-11 12:53:48 +0000
>>>     
>>
>> The code changes look ok on the first glance. But according to my
>> experience, mere code reading does not always reveal all bugs.
>>
>> The code must be tested. I strongly recommend coverage testing. Please
>> see http://forge.mysql.com/wiki/DGCov and its link to the documentation.
>> Even though coverage testing can by far not detect every problem either,
>> it is frequently amazing to watch what the code is doing when I try to
>> add tests so that it runs through all code lines.
>>
>> For this patch you might need a hand-crafted CSV file that contains all
>> situations you can think of. If this doesn't run through all code lines,
>> you may need to add more rows with more cases, until you either have
>> full coverage, or are sure that certain lines cannot be used. Then you
>> can either delete them or attach "purecov" annotations (see dgcov doc).
>>   
>
> Done!
>
> narayanan@narayanan-lt:~/Work/mysql/W-M/mysql-5.0-bugteam-39616$ perl 
> ~/internals/dev/dgcov/dgcov.pl --uncommitted
>
> ------------------------------------------------------------------------------- 
>
>
> 0 line(s) in 0 source file(s) modified in revision(s).
> 0 line(s) not covered by tests.
> For documentation, see http://forge.mysql.com/wiki/DGCov_doc
>
>> You can put that CSV file in mysql-test/std-data and copy it over the
>> table file by using the mysqltest commands --remove_file and
>> --copy_file. There are other test cases that use files from std-data for
>> your reference.
>>   
>
> Using an external file was not necessary,
>
> I used --write_file
>
>> So in summary I have given some remarks about commenting and about
>> testing. Due to the latter I set the bug report back to "In progress"
>> for now.
>>
>> Regards
>> Ingo
>>   
>
> Thanks again!
>
> Narayanan
>

Thread
bzr commit into mysql-5.0-bugteam branch (v.narayanan:2712) Bug#39616V Narayanan11 Nov
  • Re: bzr commit into mysql-5.0-bugteam branch (v.narayanan:2712)Bug#39616Ramil Kalimullin12 Nov
    • Re: bzr commit into mysql-5.0-bugteam branch (v.narayanan:2712)Bug#39616V Narayanan18 Nov
  • Re: bzr commit into mysql-5.0-bugteam branch (v.narayanan:2712)Bug#39616Ingo Strüwing12 Nov
    • Re: bzr commit into mysql-5.0-bugteam branch (v.narayanan:2712)Bug#39616V Narayanan18 Nov
      • Re: bzr commit into mysql-5.0-bugteam branch (v.narayanan:2712)Bug#39616V Narayanan18 Nov