List: Commits
From: V Narayanan
Date: November 11 2008 12:53pm
Subject: bzr commit into mysql-5.0-bugteam branch (v.narayanan:2712) Bug#39616
#At file:///home/narayanan/Work/mysql/W-M/mysql-5.0-bugteam-39616/

 2712 V Narayanan	2008-11-11
      Bug#39616: Missing quotes from .CSV crashes server
      
      When a CSV file contained comma-separated values that were
      not enclosed in quotes, the mysql server crashed.
      
      The algorithm that parsed the content of a row in
      mysql 5.0 assumed that every field value in a .CSV
      file is enclosed in quotes and that the fields are
      separated by commas.
      
      The algorithm therefore failed when the content of the
      file resembled the following:
      3,"with quotes"
      The CSV engine that is part of mysql 5.0 expected
      the above to be:
      "3","with quotes"
      
      This is just one example of content that would otherwise
      be recognized as valid .CSV but that the engine failed
      to parse.
      
      The proposed fix changes the algorithm that parses rows
      from the .CSV file so that it handles two separate
      cases:
      
      1) The current field of the row is enclosed in quotes
      2) The current field of the row is not enclosed in quotes
modified:
  sql/examples/ha_tina.cc

per-file messages:
  sql/examples/ha_tina.cc
    The function ha_tina::find_current_row(byte *buf)
    contains the logic used to parse rows from the 
    .CSV file.
    
    The proposed fix uses the following algorithm to handle
    the parsing of a row in the .CSV file
    
    BEGIN
    1) Store the EOL (end of line) for the current row
    2) While fields in the current query remain to be filled
       2.1) If the current character is a quote
            2.1.1) While EOL has not been reached
                   a) If the end of the current field is reached, move
                      to the next field and jump to step 2.3
                   b) If the current character is a backslash, handle
                      the escapes \n, \r, \\ and \"
                   c) Otherwise check that EOL has not been reached
                      (reaching it here is a parse error), then append
                      the current character to the buffer
       2.2) If the current character is not a quote
            2.2.1) While EOL has not been reached
                   a) If the end of the field has been reached, move to
                      the next field and jump to step 2.3
                   b) Append the current character to the buffer
       2.3) Store the current field value and jump to 2)
    TERMINATE
    
    The new algorithm separates the parsing of the row into
    two cases
    
    1) when the current field has quotes (step 2.1)
    2) when the current field does not have quotes (step 2.2)
    
    Case 1) is similar to the previous algorithm, except that it is now
    handled as a special case of parsing with quotes and checks that EOL
    has not been reached before writing into the field buffer in
    step 2.1.1->c)
    
    Case 2) covers a field that is not enclosed in quotes; its
    characters are written directly into the field buffer.
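
The two cases described above can be sketched outside the storage engine
as follows. This is a minimal standalone illustration under stated
assumptions, not the code in ha_tina.cc: the function name parse_csv_row,
its signature, and the use of std::string and std::vector are hypothetical
and chosen only for the example. Unlike the patch, it checks bounds before
looking at the character that follows a quote.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of ha_tina.cc): splits one CSV row, given
// without its line terminator, into field_count values. Returns false when
// a quoted field is never closed, which the sketch treats as a parse error.
bool parse_csv_row(const std::string &row, std::size_t field_count,
                   std::vector<std::string> &fields)
{
  fields.clear();
  std::size_t pos = 0;
  const std::size_t end = row.size();

  for (std::size_t i = 0; i < field_count; ++i)
  {
    std::string value;
    if (pos < end && row[pos] == '"')
    {
      // Case 1: the field is enclosed in quotes.
      ++pos;                            // skip the opening quote
      bool closed = false;
      while (pos < end)
      {
        const char c = row[pos];
        // A quote followed by a comma, or a quote that is the last
        // character of the row, ends the field.
        if (c == '"' && (pos + 1 == end || row[pos + 1] == ','))
        {
          pos = (pos + 1 == end) ? end : pos + 2;  // skip quote and comma
          closed = true;
          break;
        }
        if (c == '\\' && pos + 1 < end)
        {
          const char e = row[++pos];    // character after the backslash
          if (e == 'r')
            value += '\r';
          else if (e == 'n')
            value += '\n';
          else if (e == '\\' || e == '"')
            value += e;
          else                          // unknown escape: keep both characters
          {
            value += '\\';
            value += e;
          }
        }
        else
          value += c;
        ++pos;
      }
      if (!closed)
        return false;                   // row ended inside a quoted field
    }
    else
    {
      // Case 2: the field is not enclosed in quotes; copy up to the comma.
      while (pos < end && row[pos] != ',')
        value += row[pos++];
      if (pos < end)
        ++pos;                          // skip the comma
    }
    fields.push_back(value);
  }
  return true;
}
```

Called with the row 3,"with quotes" and a field count of 2, the sketch
yields the values 3 and with quotes; the fully quoted row
"3","with quotes" yields the same two values.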
=== modified file 'sql/examples/ha_tina.cc'
--- a/sql/examples/ha_tina.cc	2008-03-29 15:50:46 +0000
+++ b/sql/examples/ha_tina.cc	2008-11-11 12:53:48 +0000
@@ -419,34 +419,68 @@ int ha_tina::find_current_row(byte *buf)
   for (Field **field=table->field ; *field ; field++)
   {
     buffer.length(0);
-    mapped_ptr++; // Increment past the first quote
-    for(;mapped_ptr != end_ptr; mapped_ptr++)
+    /* Handle the case where the first character begins with a quote */
+    if (*mapped_ptr == '"')
     {
-      //Need to convert line feeds!
-      if (*mapped_ptr == '"' && 
-          (((mapped_ptr[1] == ',') && (mapped_ptr[2] == '"')) || (mapped_ptr == end_ptr -1 )))
+      /* Increment past the first quote */
+      mapped_ptr++;
+      /* Loop through the row to extract the values for the current field */
+      for(;mapped_ptr != end_ptr; mapped_ptr++)
       {
-        mapped_ptr += 2; // Move past the , and the "
-        break;
-      } 
-      if (*mapped_ptr == '\\' && mapped_ptr != (end_ptr - 1)) 
-      {
-        mapped_ptr++;
-        if (*mapped_ptr == 'r')
-          buffer.append('\r');
-        else if (*mapped_ptr == 'n' )
-          buffer.append('\n');
-        else if ((*mapped_ptr == '\\') || (*mapped_ptr == '"'))
-          buffer.append(*mapped_ptr);
-        else  /* This could only happed with an externally created file */
+        /* check for end of the current field */
+        if (*mapped_ptr == '"' && 
+            (mapped_ptr[1] == ',' || mapped_ptr == end_ptr -1 ))
+        {
+          /* Move past the , and the " */
+          mapped_ptr += 2;
+          break;
+        } 
+        if (*mapped_ptr == '\\' && mapped_ptr != (end_ptr - 1)) 
         {
-          buffer.append('\\');
+          mapped_ptr++;
+          if (*mapped_ptr == 'r')
+            buffer.append('\r');
+          else if (*mapped_ptr == 'n' )
+            buffer.append('\n');
+          else if ((*mapped_ptr == '\\') || (*mapped_ptr == '"'))
+            buffer.append(*mapped_ptr);
+          else  /* This could only happed with an externally created file */
+          {
+            buffer.append('\\');
+            buffer.append(*mapped_ptr);
+          }
+        } 
+        else
+        {
+          /* 
+             If end of row occurs here, it means there has been an error
+             in parsing
+           */
+          if (mapped_ptr == end_ptr -1) DBUG_RETURN(HA_ERR_END_OF_FILE);
+          /* Store current character in the buffer for the field */
           buffer.append(*mapped_ptr);
         }
-      } 
-      else
+      }
+    }
+    else
+    {
+      /* Handle the case where the current row does not start with quotes */
+        
+      /* Loop through the row to extract the values for the current field */
+      for (;mapped_ptr != end_ptr; mapped_ptr++)
+      {
+        /* check for end of current field */
+        if (*mapped_ptr == ',')
+        {
+          /* Increment past the current comma */
+          mapped_ptr++;
+          break;
+        }
+        /* store the current character in the buffer for the field */
         buffer.append(*mapped_ptr);
+      }
     }
+    /* Store the field value from the buffer */
     (*field)->store(buffer.ptr(), buffer.length(), buffer.charset());
   }
   next_position= (end_ptr - share->mapped_file)+1;
