Here is the correct output from dgcov
narayanan@narayanan-lt:~/Work/mysql/W-M/mysql-5.0-bugteam-39616$ perl
~/internals/dev/dgcov/dgcov.pl --local
.
-------------------------------------------------------------------------------
85 line(s) in 1 source file(s) modified in revision(s).
0 line(s) not covered by tests.
For documentation, see http://forge.mysql.com/wiki/DGCov_doc
I pasted the --uncommitted output previously, sorry about that
Narayanan
V Narayanan wrote:
> Hi Ingo,
>
> Thank you for the detailed comments/reviews,
>
> Please find some inline answers
>
> Ingo Strüwing wrote:
>> Hi V,
>>
>> V Narayanan, 11.11.2008 13:53:
>> ...
>>
>>> 2712 V Narayanan 2008-11-11
>>> Bug#39616: Missing quotes from .CSV crashes server
>>> When a CSV file contained comma separated elements
>>> that were not enclosed in quotes, it was causing the
>>> mysql server to crash.
>>> The algorithm that parsed the content of a row in
>>> mysql 5.0 was assuming that the values of the fields
>>> in a .CSV file will be enclosed in quotes and will be
>>> separated by commas.
>>> This was causing the algorithm to fail when the content
>>> of the file resembled the following
>>> 3,"with quotes"
>>>
>>
>> That's ok, but the bug reporter made it even more clear by using 3,"sans
>> quotes". ;-)
>>
> ok! reverted it back to sans quotes!
>
>>
>>> The CSV engine that is part of mysql 5.0 was expecting
>>> the above to be
>>> "3","with quotes"
>>> The above is just one example of where the engine was
>>> failing for what would be recognized as a valid .CSV
>>> file content otherwise.
>>> The proposed fix changes the previous algorithm being used
>>> to parse rows from the .CSV file to handle two separate
>>> cases
>>> 1) When the current field of the row is enclosed in quotes
>>> 2) When the current field of the row is not enclosed in quotes
>>> modified:
>>> sql/examples/ha_tina.cc
>>>
>>> per-file messages:
>>> sql/examples/ha_tina.cc
>>> The function ha_tina::find_current_row(byte *buf)
>>> contains the logic used to parse rows from the .CSV file.
>>> The proposed fix uses the following algorithm to handle
>>> the parsing of a row in the .CSV file
>>> BEGIN
>>> 1) Store the EOL (end of line) for the current row
>>> 2) Until all the fields in the current query have not been
>>> filled
>>> 2.1) If the current character begins with a quote
>>> 2.1.1) Until EOL has not been reached
>>> a) If end of current field is reached, move
>>> to next field and jump to step 2.3
>>> b) If current character begins with \\ handle
>>> \\n, \\r, \\, \\"
>>> c) else append the current character into the buffer
>>> before checking that EOL has not been reached.
>>> 2.2) If the current character does not begin with a quote
>>> 2.2.1) Until EOL has not been reached
>>> a) If the end of field has been reached move to the
>>> next field and jump to step 2.3
>>> b) append the current character into the buffer
>>> 2.3) Store the current field value and jump to 2)
>>> TERMINATE
>>>
>>
>> It is nice to have this comment. But I would rather have this as a code
>> comment, e.g. in the function comment of ha_tina::find_current_row(),
>> than in a revision file comment. IMHO the file comments should tell what
>> has been changed, not how it was done, nor why it was done, let alone a
>> description of the algorithm. How and why belong to the global revision
>> comment, at least a brief version. If more seems appropriate, I'd add it
>> to the bug report and point to it from the revision comment.
>> Descriptions of algorithms belong to code comments.
>>
>
> I agree! Moved the algorithm to the code comments.
>
>> These questions may be unrelated to the problem, but please clarify:
>> What is "end of field"? Is it a comma in the row? What do we do with
>> leading/trailing/intermediate spaces if the field is not quoted?
>>
>
> Sorry about my ambiguity,
>
> In the parsing logic used in the CSV engine EOL is \n
>
> End of field is a "(quotes) followed by a ,(comma) if the given field
> begins with
> "(quotes)
>
> Or it is just a ,(comma) (for the fields that do not begin with a
> "(quotes))
>
> We do not do anything about leading or trailing spaced if the field is
> not
> quoted. It becomes part of the field. This behavior matches that of
> 5.1 and
> 6.0.
>
>>
>>> The current algorithm basically separates the parsing of the
>>> row into
>>> two cases
>>>
>>
>> What is the "current" algorithm here? The one that currently exists in
>> 5.0? Or the new one as described above?
>>
>> In the first case, I'd prefer to have it called "the old algorithm". In
>> the second case, I'd prefer "the proposed algorithm".
>>
>
> Sorry!
>
> Changed as advised!
>>
>>> 1) when the current field has quotes (step 2.1)
>>> 2) when the current field does not have quotes (step 2.2)
>>> 1) is similar to the previous algorithm except that it is
>>> now handled as
>>> as special case of parsing with quotes and includes a check for
>>> testing
>>> that EOL has not been reached before writing into the field
>>> buffer in
>>> step 2.1.1->c)
>>>
>>
>> Does this mean that EOL markers cannot be part of a quoted string? Won't
>> it be better to collect characters until the field length is reached,
>> before reporting HA_ERR_END_OF_FILE? While I write this, I notice that
>> HA_ERR_WRONG_IN_RECORD (Record-file is crashed) may be a better error to
>> return in this case.
>>
>
> EOL is just \n. I have again taken the clue for this behavior from
> 5.1 and 6.0 here.
>
> I put it as HA_ERR_END_OF_FILE because the absence of a trailing quote
> actually results in reaching End of file. The for loop for the case
> when the
> field is enclosed in quotes keeps on looping until EOF is reached.
>
> Ideally speaking the errors returned in 5.1 and 6.0 should be
> replicated here
> but this I thought would be beyond this bug issue. I would have had to
> copy
> the goto err: being used in 5.1 and 6.0 here, it would have almost
> been close
> to adding functionality and would have moved away from the scope of this
> bug.
>> Hm. Forget about this EOL remark. I found that CSV never allowed EOL in
>> quoted strings and doesn't even in 5.1.
>>
>>
>>> ...
>>> === modified file 'sql/examples/ha_tina.cc'
>>> --- a/sql/examples/ha_tina.cc 2008-03-29 15:50:46 +0000
>>> +++ b/sql/examples/ha_tina.cc 2008-11-11 12:53:48 +0000
>>>
>>
>> The code changes look ok on the first glance. But according to my
>> experience, mere code reading does not always reveal all bugs.
>>
>> The code must be tested. I strongly recommend coverage testing. Please
>> see http://forge.mysql.com/wiki/DGCov and its link to the documentation.
>> Even though coverage testing can by far not detect every problem either,
>> it is frequently amazing to watch what the code is doing when I try to
>> add tests so that it runs through all code lines.
>>
>> For this patch you might need a hand-crafted CSV file that contains all
>> situations you can think of. If this doesn't run through all code lines,
>> you may need to add more rows with more cases, until you either have
>> full coverage, or are sure that certain lines cannot be used. Then you
>> can either delete them or attach "purecov" annotations (see dgcov doc).
>>
>
> Done!
>
> narayanan@narayanan-lt:~/Work/mysql/W-M/mysql-5.0-bugteam-39616$ perl
> ~/internals/dev/dgcov/dgcov.pl --uncommitted
>
> -------------------------------------------------------------------------------
>
>
> 0 line(s) in 0 source file(s) modified in revision(s).
> 0 line(s) not covered by tests.
> For documentation, see http://forge.mysql.com/wiki/DGCov_doc
>
>> You can put that CSV file in mysql-test/std-data and copy it over the
>> table file by using the mysqltest commands --remove_file and
>> --copy_file. There are other test cases that use files from std-data for
>> your reference.
>>
>
> Using an external file was not necessary,
>
> I used --write_file
>
>> So in summary I have given some remarks about commenting and about
>> testing. Due to the latter I set the bug report back to "In progress"
>> for now.
>>
>> Regards
>> Ingo
>>
>
> Thanks again!
>
> Narayanan
>