List:Commits« Previous MessageNext Message »
From:paul Date:March 22 2007 7:17pm
Subject:svn commit - mysqldoc@docsrva: r5504 - in trunk: . internals refman-4.1 refman-5.0 refman-5.1
View as plain text  
Author: paul
Date: 2007-03-22 20:17:05 +0100 (Thu, 22 Mar 2007)
New Revision: 5504

Log:
 r22049@polar:  paul | 2007-03-22 13:42:50 -0500
 filesort clarification. (James)


Modified:
   trunk/internals/algorithms.xml
   trunk/refman-4.1/optimization.xml
   trunk/refman-5.0/optimization.xml
   trunk/refman-5.1/optimization.xml

Property changes on: trunk
___________________________________________________________________
Name: svk:merge
   - 4767c598-dc10-0410-bea0-d01b485662eb:/mysqldoc-local/mysqldoc/trunk:22048
7d8d2c4e-af1d-0410-ab9f-b038ce55645b:/mysqldoc-local/mysqldoc:18011
b5ec3a16-e900-0410-9ad2-d183a3acac99:/mysqldoc-local/mysqldoc/trunk:14218
bf112a9c-6c03-0410-a055-ad865cd57414:/mysqldoc-local/mysqldoc/trunk:14593
   + 4767c598-dc10-0410-bea0-d01b485662eb:/mysqldoc-local/mysqldoc/trunk:22049
7d8d2c4e-af1d-0410-ab9f-b038ce55645b:/mysqldoc-local/mysqldoc:18011
b5ec3a16-e900-0410-9ad2-d183a3acac99:/mysqldoc-local/mysqldoc/trunk:14218
bf112a9c-6c03-0410-a055-ad865cd57414:/mysqldoc-local/mysqldoc/trunk:14593


Modified: trunk/internals/algorithms.xml
===================================================================
--- trunk/internals/algorithms.xml	2007-03-22 19:13:55 UTC (rev 5503)
+++ trunk/internals/algorithms.xml	2007-03-22 19:17:05 UTC (rev 5504)
Changed blocks: 5, Lines Added: 31, Lines Deleted: 14; 3586 bytes

@@ -196,7 +196,9 @@
 
     <title>How MySQL Does Sorting (<literal>filesort</literal>)</title>
 
-<!--  NOTE: This description is also present in refman -->
+    <remark role="note">
+      This description is also present in refman
+    </remark>
 
     <indexterm>
       <primary>filesort optimization</primary>

@@ -208,16 +210,32 @@
     </indexterm>
 
     <para>
-      In those cases where MySQL must sort the result, it uses the
-      following <literal>filesort</literal> algorithm before MySQL 4.1:
+      MySQL has two <literal>filesort</literal> algorithms for sorting
+      and retrieving results. The original method uses only the
+      <literal>ORDER BY</literal> columns. The modified method uses not
+      just the <literal>ORDER BY</literal> columns, but all the columns
+      used in the query.
     </para>
 
+    <para>
+      The optimizer selects which <literal>filesort</literal> algorithm
+      to use. Prior to MySQL 4.1, it uses the original algorithm. As of
+      MySQL 4.1, it normally uses the modified algorithm except when
+      <literal>BLOB</literal> or <literal>TEXT</literal> columns are
+      involved, in which case it uses the original algorithm.
+    </para>
+
+    <para>
+      The original <literal>filesort</literal> algorithm works as
+      follows:
+    </para>
+
     <orderedlist>
 
       <listitem>
         <para>
           Read all rows according to key or by table scanning. Rows that
-          don't match the <literal>WHERE</literal> clause are skipped.
+          do not match the <literal>WHERE</literal> clause are skipped.
         </para>
       </listitem>
 

@@ -290,19 +308,18 @@
     </para>
 
     <para>
-      In MySQL 4.1 and up, a <literal>filesort</literal> optimization is
-      used that records not only the sort key value and row position,
-      but also the columns required for the query. This avoids reading
-      the rows twice. The modified <literal>filesort</literal> algorithm
-      works like this:
+      The modified <literal>filesort</literal> algorithm incorporates an
+      optimization such that it records not only the sort key value and
+      row position, but also the columns required for the query. This
+      avoids reading the rows twice. The modified
+      <literal>filesort</literal> algorithm works like this:
     </para>
 
     <orderedlist>
 
       <listitem>
         <para>
-          Read the rows that match the <literal>WHERE</literal> clause,
-          as before.
+          Read the rows that match the <literal>WHERE</literal> clause.
         </para>
       </listitem>
 

@@ -341,7 +358,7 @@
       exceed the value of the
       <literal>max_length_for_sort_data</literal> system variable. (A
       symptom of setting the value of this variable too high is that you
-      will see high disk activity and low CPU activity.)
+      should see high disk activity and low CPU activity.)
     </para>
 
   </section>

@@ -2790,8 +2807,8 @@
         <para>
           Set this if we should abort the current statement (and any
           multi-line statements) because something went fatally wrong.
-          (for example, a stored procedure should be able to catch this).
-          This is reset by
+          (for example, a stored procedure should be able to catch
+          this). This is reset by
           <literal>mysql_reset_thd_for_next_command()</literal>.
         </para>
       </listitem>


Modified: trunk/refman-4.1/optimization.xml
===================================================================
--- trunk/refman-4.1/optimization.xml	2007-03-22 19:13:55 UTC (rev 5503)
+++ trunk/refman-4.1/optimization.xml	2007-03-22 19:17:05 UTC (rev 5504)
Changed blocks: 2, Lines Added: 23, Lines Deleted: 7; 1995 bytes

@@ -3163,11 +3163,27 @@
       </indexterm>
 
       <para>
-        Prior to MySQL 4.1, in those cases where MySQL must sort the
-        result, it uses the following <literal>filesort</literal>
-        algorithm:
+        MySQL has two <literal>filesort</literal> algorithms for sorting
+        and retrieving results. The original method uses only the
+        <literal>ORDER BY</literal> columns. The modified method uses
+        not just the <literal>ORDER BY</literal> columns, but all the
+        columns used in the query.
       </para>
 
+      <para>
+        The optimizer selects which <literal>filesort</literal>
+        algorithm to use. Prior to MySQL 4.1, it uses the original
+        algorithm. As of MySQL 4.1, it normally uses the modified
+        algorithm except when <literal>BLOB</literal> or
+        <literal>TEXT</literal> columns are involved, in which case it
+        uses the original algorithm.
+      </para>
+
+      <para>
+        The original <literal>filesort</literal> algorithm works as
+        follows:
+      </para>
+
       <orderedlist>
 
         <listitem>

@@ -3248,10 +3264,10 @@
       </para>
 
       <para>
-        In MySQL 4.1 and up, a <literal>filesort</literal> optimization
-        is used that records not only the sort key value and row
-        position, but also the columns required for the query. This
-        avoids reading the rows twice. The modified
+        The modified <literal>filesort</literal> algorithm incorporates
+        an optimization such that it records not only the sort key value
+        and row position, but also the columns required for the query.
+        This avoids reading the rows twice. The modified
         <literal>filesort</literal> algorithm works like this:
       </para>
 


Modified: trunk/refman-5.0/optimization.xml
===================================================================
--- trunk/refman-5.0/optimization.xml	2007-03-22 19:13:55 UTC (rev 5503)
+++ trunk/refman-5.0/optimization.xml	2007-03-22 19:17:05 UTC (rev 5504)
Changed blocks: 2, Lines Added: 115, Lines Deleted: 16; 6178 bytes

@@ -4581,17 +4581,116 @@
       </indexterm>
 
       <para>
-        A <literal>filesort</literal> optimization is used that records
-        not only the sort key value and row position, but the columns
-        required for the query as well. This avoids reading the rows
-        twice. The <literal>filesort</literal> algorithm works like
-        this:
+        MySQL has two <literal>filesort</literal> algorithms for sorting
+        and retrieving results. The original method uses only the
+        <literal>ORDER BY</literal> columns. The modified method uses
+        not just the <literal>ORDER BY</literal> columns, but all the
+        columns used in the query.
       </para>
 
+      <para>
+        The optimizer selects which <literal>filesort</literal>
+        algorithm to use. It normally uses the modified algorithm except
+        when <literal>BLOB</literal> or <literal>TEXT</literal> columns
+        are involved, in which case it uses the original algorithm.
+      </para>
+
+      <para>
+        The original <literal>filesort</literal> algorithm works as
+        follows:
+      </para>
+
       <orderedlist>
 
         <listitem>
           <para>
+            Read all rows according to key or by table scanning. Rows
+            that do not match the <literal>WHERE</literal> clause are
+            skipped.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            For each row, store a pair of values in a buffer (the sort
+            key and the row pointer). The size of the buffer is the
+            value of the <literal>sort_buffer_size</literal> system
+            variable.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            When the buffer gets full, run a qsort (quicksort) on it and
+            store the result in a temporary file. Save a pointer to the
+            sorted block. (If all pairs fit into the sort buffer, no
+            temporary file is created.)
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            Repeat the preceding steps until all rows have been read.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            Do a multi-merge of up to <literal>MERGEBUFF</literal> (7)
+            regions to one block in another temporary file. Repeat until
+            all blocks from the first file are in the second file.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            Repeat the following until there are fewer than
+            <literal>MERGEBUFF2</literal> (15) blocks left.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            On the last multi-merge, only the pointer to the row (the
+            last part of the sort key) is written to a result file.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            Read the rows in sorted order by using the row pointers in
+            the result file. To optimize this, we read in a big block of
+            row pointers, sort them, and use them to read the rows in
+            sorted order into a row buffer. The size of the buffer is
+            the value of the <literal>read_rnd_buffer_size</literal>
+            system variable. The code for this step is in the
+            <filename>sql/records.cc</filename> source file.
+          </para>
+        </listitem>
+
+      </orderedlist>
+
+      <para>
+        One problem with this approach is that it reads rows twice: One
+        time when evaluating the <literal>WHERE</literal> clause, and
+        again after sorting the pair values. And even if the rows were
+        accessed successively the first time (for example, if a table
+        scan is done), the second time they are accessed randomly. (The
+        sort keys are ordered, but the row positions are not.)
+      </para>
+
+      <para>
+        The modified <literal>filesort</literal> algorithm incorporates
+        an optimization such that it records not only the sort key value
+        and row position, but also the columns required for the query.
+        This avoids reading the rows twice. The modified
+        <literal>filesort</literal> algorithm works like this:
+      </para>
+
+      <orderedlist>
+
+        <listitem>
+          <para>
             Read the rows that match the <literal>WHERE</literal>
             clause.
           </para>

@@ -4622,20 +4721,20 @@
       </orderedlist>
 
       <para>
-        This algorithm represents a significant improvement over that
-        used in some older versions of MySQL.
+        Using the modified <literal>filesort</literal> algorithm, the
+        tuples are longer than the pairs used in the original method,
+        and fewer of them fit in the sort buffer (the size of which is
+        given by <literal>sort_buffer_size</literal>). As a result, it
+        is possible for the extra I/O to make the modified approach
+        slower, not faster. To avoid a slowdown, the optimization is
+        used only if the total size of the extra columns in the sort
+        tuple does not exceed the value of the
+        <literal>max_length_for_sort_data</literal> system variable. (A
+        symptom of setting the value of this variable too high is that
+        you should see high disk activity and low CPU activity.)
       </para>
 
       <para>
-        To avoid a slowdown, this optimization is used only if the total
-        size of the extra columns in the sort tuple does not exceed the
-        value of the <literal>max_length_for_sort_data</literal> system
-        variable. (A symptom of setting the value of this variable too
-        high is that you should see high disk activity and low CPU
-        activity.)
-      </para>
-
-      <para>
         If you want to increase <literal>ORDER BY</literal> speed, check
         whether you can get MySQL to use indexes rather than an extra
         sorting phase. If this is not possible, you can try the


Modified: trunk/refman-5.1/optimization.xml
===================================================================
--- trunk/refman-5.1/optimization.xml	2007-03-22 19:13:55 UTC (rev 5503)
+++ trunk/refman-5.1/optimization.xml	2007-03-22 19:17:05 UTC (rev 5504)
Changed blocks: 2, Lines Added: 115, Lines Deleted: 16; 6178 bytes

@@ -4611,17 +4611,116 @@
       </indexterm>
 
       <para>
-        A <literal>filesort</literal> optimization is used that records
-        not only the sort key value and row position, but the columns
-        required for the query as well. This avoids reading the rows
-        twice. The <literal>filesort</literal> algorithm works like
-        this:
+        MySQL has two <literal>filesort</literal> algorithms for sorting
+        and retrieving results. The original method uses only the
+        <literal>ORDER BY</literal> columns. The modified method uses
+        not just the <literal>ORDER BY</literal> columns, but all the
+        columns used in the query.
       </para>
 
+      <para>
+        The optimizer selects which <literal>filesort</literal>
+        algorithm to use. It normally uses the modified algorithm except
+        when <literal>BLOB</literal> or <literal>TEXT</literal> columns
+        are involved, in which case it uses the original algorithm.
+      </para>
+
+      <para>
+        The original <literal>filesort</literal> algorithm works as
+        follows:
+      </para>
+
       <orderedlist>
 
         <listitem>
           <para>
+            Read all rows according to key or by table scanning. Rows
+            that do not match the <literal>WHERE</literal> clause are
+            skipped.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            For each row, store a pair of values in a buffer (the sort
+            key and the row pointer). The size of the buffer is the
+            value of the <literal>sort_buffer_size</literal> system
+            variable.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            When the buffer gets full, run a qsort (quicksort) on it and
+            store the result in a temporary file. Save a pointer to the
+            sorted block. (If all pairs fit into the sort buffer, no
+            temporary file is created.)
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            Repeat the preceding steps until all rows have been read.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            Do a multi-merge of up to <literal>MERGEBUFF</literal> (7)
+            regions to one block in another temporary file. Repeat until
+            all blocks from the first file are in the second file.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            Repeat the following until there are fewer than
+            <literal>MERGEBUFF2</literal> (15) blocks left.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            On the last multi-merge, only the pointer to the row (the
+            last part of the sort key) is written to a result file.
+          </para>
+        </listitem>
+
+        <listitem>
+          <para>
+            Read the rows in sorted order by using the row pointers in
+            the result file. To optimize this, we read in a big block of
+            row pointers, sort them, and use them to read the rows in
+            sorted order into a row buffer. The size of the buffer is
+            the value of the <literal>read_rnd_buffer_size</literal>
+            system variable. The code for this step is in the
+            <filename>sql/records.cc</filename> source file.
+          </para>
+        </listitem>
+
+      </orderedlist>
+
+      <para>
+        One problem with this approach is that it reads rows twice: One
+        time when evaluating the <literal>WHERE</literal> clause, and
+        again after sorting the pair values. And even if the rows were
+        accessed successively the first time (for example, if a table
+        scan is done), the second time they are accessed randomly. (The
+        sort keys are ordered, but the row positions are not.)
+      </para>
+
+      <para>
+        The modified <literal>filesort</literal> algorithm incorporates
+        an optimization such that it records not only the sort key value
+        and row position, but also the columns required for the query.
+        This avoids reading the rows twice. The modified
+        <literal>filesort</literal> algorithm works like this:
+      </para>
+
+      <orderedlist>
+
+        <listitem>
+          <para>
             Read the rows that match the <literal>WHERE</literal>
             clause.
           </para>

@@ -4652,20 +4751,20 @@
       </orderedlist>
 
       <para>
-        This algorithm represents a significant improvement over that
-        used in some older versions of MySQL.
+        Using the modified <literal>filesort</literal> algorithm, the
+        tuples are longer than the pairs used in the original method,
+        and fewer of them fit in the sort buffer (the size of which is
+        given by <literal>sort_buffer_size</literal>). As a result, it
+        is possible for the extra I/O to make the modified approach
+        slower, not faster. To avoid a slowdown, the optimization is
+        used only if the total size of the extra columns in the sort
+        tuple does not exceed the value of the
+        <literal>max_length_for_sort_data</literal> system variable. (A
+        symptom of setting the value of this variable too high is that
+        you should see high disk activity and low CPU activity.)
       </para>
 
       <para>
-        To avoid a slowdown, this optimization is used only if the total
-        size of the extra columns in the sort tuple does not exceed the
-        value of the <literal>max_length_for_sort_data</literal> system
-        variable. (A symptom of setting the value of this variable too
-        high is that you should see high disk activity and low CPU
-        activity.)
-      </para>
-
-      <para>
         If you want to increase <literal>ORDER BY</literal> speed, check
         whether you can get MySQL to use indexes rather than an extra
         sorting phase. If this is not possible, you can try the


Thread
svn commit - mysqldoc@docsrva: r5504 - in trunk: . internals refman-4.1 refman-5.0 refman-5.1paul22 Mar