Author: malff
Date: 2007-04-12 20:32:41 +0200 (Thu, 12 Apr 2007)
New Revision: 5853
Log:
Initial revision
Added:
trunk/internals/stored-program.xml
Added: trunk/internals/stored-program.xml
===================================================================
--- trunk/internals/stored-program.xml (rev 0)
+++ trunk/internals/stored-program.xml 2007-04-12 18:32:41 UTC (rev 5853)
Changed blocks: 1, Lines Added: 2512, Lines Deleted: 0; 82023 bytes
@@ -0,0 +1,2512 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
+"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd"
+[
+ <!ENTITY % fixedchars.entities SYSTEM "../common/fixedchars.ent">
+ %fixedchars.entities;
+ <!ENTITY % urls.entities SYSTEM "../refman-common/urls.ent">
+ %urls.entities;
+]>
+<!--
+ Copyright (c) 2007, MySQL AB.
+
+ This file is NOT distributed under a GNU GPL or GNU GDL license.
+ See the file {mysqldoc}/trunk/internals/legalnotice.en.xml for terms of use.
+-->
+<chapter id="stored-program">
+
+ <title>Stored Programs</title>
+
+ <section id="sp-overview">
+
+ <title>Overview</title>
+
+ <para>
+ Stored Programs in general refers to:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ <literal>PROCEDURE</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>FUNCTION</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>TABLE TRIGGER</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>EVENT</literal> (Starting with 5.1)
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ When developing, there are a couple of tools available in the
+ server itself that are helpful. These tools are only available in
+ 'DEBUG' builds:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ <literal>SHOW PROCEDURE CODE</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>SHOW FUNCTION CODE</literal>
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ The equivalent for triggers or events is not available at this
+ point.
+ </para>
+
+ <para>
+ The internal implementation of Stored Programs in the server
+ depends on several components:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ The storage layer, used to store in the database itself a
+ program (hence the name <emphasis>stored</emphasis> program),
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The internal memory representation of a Stored Program, used
+ within the server implementation,
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The SQL parser, used to convert a Stored Program from it's
+ persistent representation to it's internal form,
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ A flow analyser, used to optimize the code representing a
+ stored program,
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Various caches, used to improve performances by avoiding the
+ need to load and parse a stored program at every invocation.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The Stored Program runtime execution itself, which interprets
+ the code of the program and executes its statements.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ </section>
+
+ <section id="sp-storage">
+
+ <title>Persistent representation</title>
+
+ <para>
+ Storage of Stored Programs is implemented using either tables in
+ the database (in the <literal>mysql</literal> schema), or physical
+ files.
+ </para>
+
+ <section id="sp-storage-sp_sf">
+
+ <title>Stored Procedure and Stored Function</title>
+
+ <para>
+ The table <literal>mysql.proc</literal> contains one record per
+ Stored Procedure or Stored Function. Note that this table design
+ is a mix of relational and non relational (blob) content:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ Attributes that are part of the interface of a stored
+ procedure or function (like its name, return type, etc), or
+ that are global to the object (implementation language,
+ deterministic properties, security properties, sql mode,
+ etc) are stored with a dedicated column in the
+ <literal>mysql.proc</literal> table.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The body of a stored procedure or function, which consists
+ of the original code expressed in SQL, including user
+ comments if any, is stored as-is preserving the original
+ indentation in blob column 'body'.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ This design choice allows to represent the various attributes in
+ a format which is easy to work with (relational model), while
+ allowing a lot of flexibility for the content of the body.
+ </para>
+
+ <para>
+ A minor exception to this is the storage of the parameters of a
+ stored procedure or function (which are part of its interface)
+ inside the blob column <literal>param_list</literal> (instead of
+ using a child <quote>proc_param</quote> table).
+ </para>
+
+ <para>
+ Table <literal>mysql.procs_priv</literal> describes privileges
+ granted to a given Stored Procedure or Stored Function in
+ <literal>mysql.proc</literal>.
+ </para>
+
+ <para>
+ The code used to encapsulate database access is:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ <literal>db_create_routine()</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>db_load_routine()</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>db_drop_routine()</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>mysql_routine_grant()</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>grant_load()</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>grant_reload()</literal>
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ </section>
+
+ <section id="sp-storage-trigger">
+
+ <title>Table Trigger</title>
+
+ <para>
+ Trigger definitions are stored in plain text files in the
+ directory that contains the schema objects.
+ </para>
+
+ <para>
+ The file <filename><schema>/<trigger>.TRN</filename>
+ is the <literal>TRIGGERNAME</literal> file. It represents the
+ fact that the object named <quote>trigger</quote> is a table
+ trigger, and point to the table the trigger is attached to.
+ Every trigger has a dedicated <filename>*.TRN</filename> file.
+ This design decision is used to facilitate operating system
+ filesystem services to enforce the SQL standard requirement that
+ all triggers in a given schema must be unique.
+ </para>
+
+ <para>
+ The file <filename><schema>/<table>.TRG</filename>
+ is the <literal>TRIGGERS</literal> file. It represents all the
+ table triggers attached to a given table, so this file can
+ contain triggers for multiple events (<literal>BEFORE</literal>
+ / <literal>AFTER</literal>, <literal>INSERT</literal> /
+ <literal>UPDATE</literal> / <literal>DELETE</literal>).
+ Currently it is impossible to have more than one trigger per
+ table for a given trigger action time and type, hence this file
+ may contain at most six trigger definitions.
+ </para>
+
+ <para>
+ The code used to encapsulate file access is:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ <literal>Table_triggers_list::create_trigger()</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>Table_triggers_list::drop_trigger()</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>Table_triggers_list::check_n_load()</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>Table_triggers_list::drop_all_triggers()</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>Table_triggers_list::change_table_name()</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ See the C++ class <literal>Table_triggers_list</literal> in
+ general.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ Using files for triggers is due to historical reasons, and
+ follows the same design as <filename>*.FRM</filename> files for
+ table metadata. This approach has several drawbacks:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ Each file has yet another text file format, which is
+ necessary to print and parse back correctly. Custom code has
+ to be implemented, which is consuming in terms of resources,
+ and introduces technical risk or code duplication.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Tables are replicated, values in columns are checked for
+ data validity, integrity constraints can be defined ...
+ where none of the above is available with a file based
+ implementation.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ With tables, the default locking, transaction and isolation
+ mechanism used by the server in general can be leveraged,
+ but the same is not available with files.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Cluster support for any new metadata operation that operates
+ on files will require a custom solution. E.g. to propagate
+ CREATE TABLE statement across MySQL Cluster mysqld nodes we
+ use a so-called <quote>.FRM shipping</quote> technique.
+ There is no similar solution implemented for triggers at
+ this point, and thus a trigger created in one mysqld node
+ does not automatically become visible on other nodes.
+ Potentially, if data is stored in tables, cluster support
+ may be added as simply as by issuing <literal>ALTER TABLE
+ mysql.triggers ENGINE=NDB</literal>;
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <note>
+ <para>
+ Various drawbacks of filesystem based solution are provided in
+ this chapter only for a sake of example. Other advantages and
+ disadvantages of two approaches may be found in relevant
+ worklog entries and design documents.
+ </para>
+ </note>
+
+ <warning>
+ <para>
+ The current implementation of the storage layer for table
+ triggers is considered <emphasis>private</emphasis> to the
+ server, and might change without warnings in future releases.
+ </para>
+ </warning>
+
+ </section>
+
+ <section id="sp-storage-event">
+
+ <title>Event</title>
+
+ <para>
+ Events storage is very similar to Stored Procedure and Stored
+ Function storage, and shares the same design. Since more
+ attributes are needed to represent an event, a different table
+ is used: <literal>mysql.event</literal>.
+ </para>
+
+ <para>
+ The code used to encapsulate the database access is:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ <literal>Event_db_repository::create_event()</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>Event_db_repository::update_event()</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>Event_db_repository::drop_event()</literal>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ See the C++ class <literal>Event_db_repository</literal> in
+ general.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ </section>
+
+ <section id="sp-storage-derived">
+
+ <title>Derived attributes</title>
+
+ <para>
+ Some critical attributes, like <literal>SQL_MODE</literal>, are
+ explicitly part of the storage format.
+ </para>
+
+ <para>
+ Other attributes, that also impact significantly the behavior in
+ general of Stored Programs, can be <emphasis>implicitly
+ derived</emphasis> from other properties of the storage layer.
+ In particular:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ The <literal>USE <database></literal> in effect for a
+ stored program is the <literal>schema</literal> the stored
+ object belongs to.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The statement <literal>DECLARE v CHAR(10)</literal> does not
+ intrinsically convey any notion of character set or
+ collation. The character set and collation of this local
+ variable, in a stored program, derives from the character
+ set and collation of the <literal>schema</literal> the
+ stored object belongs to.
+ </para>
+ </listitem>
+
+<!--
+ SELECT 'text_constant'; uses the system charset and collation ?
+ TODO: clarify if this is really intended, and document it,
+ or if it's a bug. To check with Alik
+ -->
+
+ </itemizedlist>
+
+ </section>
+
+ </section>
+
+ <section id="sp-internal">
+
+ <title>Internal representation</title>
+
+ <para>
+ A Stored Program is represented in memory by two major parts:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ The code of the stored program, including SQL statements and
+ control flow logic (IF, WHILE, ...),
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ A symbol table, that describes all the local variables,
+ cursors, labels, conditions ... declared in the code.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ Individual instructions of various kind are implemented by all the
+ C++ classes that inherit from class <literal>sp_instr</literal>.
+ The symbol table ('symbol table' is a term used in conjunction
+ with compilers or interpreters, in <literal>MySQL</literal> the
+ term 'Parsing Context' is used instead) is implemented by the C++
+ class <literal>sp_pcontext</literal>. A Stored Program as a whole
+ is represented by the C++ class <literal>sp_head</literal>, which
+ contains the instructions (array <literal>m_instr</literal>) and
+ the root parsing context (member <literal>m_pcont</literal>).
+ </para>
+
+ <caution>
+ <para>
+ Class <literal>sp_head</literal> contains concepts from
+ different areas: it represents both what a stored program
+ <emphasis>is</emphasis>, which is the topic of this section, and
+ how a stored program logic <emphasis>is used </emphasis>during
+ runtime interpretation, which is the subject of other sections.
+ </para>
+ </caution>
+
+ <section id="sp-internal-instr">
+
+ <title>Instructions</title>
+
+ <para>
+ Data Definition Language and Data Manipulation Language SQL
+ statements are represented as-is, by a single instruction. For
+ flow control statements and exception handlers, several
+ instructions are used to implement in the low level
+ <literal>sp_instr</literal> language the semantic of the SQL
+ construct.
+ </para>
+
+ <para>
+ Let's see an example with a stored procedure:
+ </para>
+
+<programlisting>
+delimiter $$
+
+CREATE PROCEDURE proc_1(x int)
+BEGIN
+ IF x < 0 THEN
+ INSERT INTO t1 VALUES ("negative");
+ ELSEIF x = 0 THEN
+ INSERT INTO t1 VALUES ("zero");
+ ELSE
+ INSERT INTO t1 VALUES ("positive");
+ END IF;
+END$$
+</programlisting>
+
+ <para>
+ The resulting code, displayed by SHOW PROCEDURE CODE, is:
+ </para>
+
+<programlisting>
+SHOW PROCEDURE CODE proc_1;
+Pos Instruction
+0 jump_if_not 3(7) (x@0 < 0)
+1 stmt 5 "INSERT INTO t1 VALUES ("negative")"
+2 jump 7
+3 jump_if_not 6(7) (x@0 = 0)
+4 stmt 5 "INSERT INTO t1 VALUES ("zero")"
+5 jump 7
+6 stmt 5 "INSERT INTO t1 VALUES ("positive")"
+</programlisting>
+
+ <para>
+ Instructions are numbered sequentially. Position 0 is the start
+ of the code, while the position 7 in this example, that is 1
+ past the last instruction, represents the end of the code.
+ </para>
+
+ <para>
+ Note that the instruction <literal>jump_if_not 3(7)</literal> at
+ position 0 can actually jump to <emphasis>three</emphasis>
+ locations:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ When the evaluation of the condition "x < 0" is true, the
+ next instruction will be position 1 (the "then" branch),
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ When the evaluation of the condition "x < 0" is false,
+ the next instruction will be position 3 (the "else" branch),
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ When the evaluation of the condition "x < 0" results in
+ an error, and when a continue handler exists for the error,
+ the next instruction will be position 7, known as the
+ "continuation" destination.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ Now, let's see how exception handlers are represented. The
+ following code contains just a very basic handler, protecting a
+ BEGIN/END block in the SQL logic:
+ </para>
+
+<programlisting>
+CREATE PROCEDURE proc_2(x int)
+BEGIN
+ SELECT "Start";
+
+ INSERT INTO t1 VALUES (1);
+
+ BEGIN
+ DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
+ BEGIN
+ SELECT "Oops";
+ END;
+
+ INSERT INTO t1 VALUES (2);
+ INSERT INTO t1 VALUES (2);
+ END;
+
+ INSERT INTO t1 VALUES (3);
+ SELECT "Finish";
+END$$
+</programlisting>
+
+ <para>
+ The internal instructions for this stored procedure are:
+ </para>
+
+<programlisting>
+SHOW PROCEDURE CODE proc_2;
+Pos Instruction
+0 stmt 0 "SELECT "Start""
+1 stmt 5 "INSERT INTO t1 VALUES (1)"
+2 hpush_jump 5 1 CONTINUE
+3 stmt 0 "SELECT "Oops""
+4 hreturn 1
+5 stmt 5 "INSERT INTO t1 VALUES (2)"
+6 stmt 5 "INSERT INTO t1 VALUES (2)"
+7 hpop 1
+8 stmt 5 "INSERT INTO t1 VALUES (3)"
+9 stmt 0 "SELECT "Finish""
+</programlisting>
+
+ <para>
+ Note the flow of control in the code: there is not a single if.
+ The couple of <literal>hpush_jump</literal> /
+ <literal>hpop</literal> represent the installation and the
+ removal of the exception handler. The body of the exception
+ handler starts at position 3, while the code protected by the
+ handler starts at position 5. <literal>hpush_jump 5 1</literal>
+ means: add a handler for "1" condition
+ (<literal>sqlexception</literal>), where "1" stands for the
+ index of declared conditions in the parsing context, and execute
+ the code starting at position "5".
+ </para>
+
+ </section>
+
+ <section id="sp-internal-pcontext">
+
+ <title>Parsing Context</title>
+
+ <para>
+ A parsing context is a tree of nodes, where each node contains
+ symbols (variables, cursors, labels, ...) declared locally in
+ the same name visibility scope.
+ </para>
+
+ <para>
+ For example, with the following SQL code:
+ </para>
+
+<programlisting>
+CREATE PROCEDURE proc_3(x int, y int)
+BEGIN
+ -- This is the root parsing context
+ DECLARE v1 INT;
+ DECLARE v2 INT;
+ DECLARE v3 INT;
+
+ IF (x > 0) THEN
+ BEGIN
+ -- This is the child context A
+ DECLARE v1 INT;
+ DECLARE v4 INT DEFAULT 100;
+
+ set v4:= 1;
+ set v1:= x;
+ END;
+ ELSE
+ BEGIN
+ -- This is the child context B
+ DECLARE v2 INT;
+ DECLARE v4 INT DEFAULT 200;
+
+ set v4:= 2;
+ set v2:= y;
+ set v3:= 3;
+ END;
+ END IF;
+
+ set v1 := 4;
+END$$
+</programlisting>
+
+ <para>
+ The parsing contexts match exactly the nesting of BEGIN/END
+ blocks:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ The root parsing context contains parameters
+ <quote>x</quote>, <quote>y</quote>, and local variables
+ <quote>v1</quote>, <quote>v2</quote>, <quote>v3</quote>,
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The BEGIN / END block in the "THEN" part defines a child
+ parsing context (let's call it 'A'), that contains local
+ variables <quote>v1</quote> and <quote>v4</quote>,
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Likewise, the "ELSE" block defines a parsing context (let's
+ call it 'B') which is a child of the root, and contains
+ local variables <quote>v2</quote> and <quote>v4</quote>.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ The total number of symbols is 9: 5 for the root + 2 for A + 2
+ for B. All the symbols are numbered internally (starting at
+ offset 0), by walking the parsing context tree in a depth first
+ manner, resulting in the following:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ Root:x --> 0, Root:y --> 1, Root:v1 --> 2, Root:v2 --> 3,
+ Root:v3 --> 4,
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ A:v1 --> 5, A:v4 --> 6,
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ B:v2 --> 7, B:v4 --> 8,
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ There is no tool to dump the parsing context tree explicitly.
+ However, the internal numbering of symbols is apparent when
+ printing the code:
+ </para>
+
+<programlisting>
+SHOW PROCEDURE CODE proc_3;
+Pos Instruction
+0 set v1@2 NULL
+1 set v2@3 NULL
+2 set v3@4 NULL
+3 jump_if_not 9(14) (x@0 > 0)
+4 set v1@5 NULL
+5 set v4@6 100
+6 set v4@6 1
+7 set v1@5 x@0
+8 jump 14
+9 set v2@7 NULL
+10 set v4@8 200
+11 set v4@8 2
+12 set v2@7 y@1
+13 set v3@4 3
+14 set v1@2 4
+</programlisting>
+
+ <para>
+ The points of interest are that:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ There are <emphasis>two</emphasis> variables named
+ <quote>v1</quote>, where the variable v1 from block A
+ (represented as v1@5) eclipses the variable v1 from the root
+ block (represented as v1@2).
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ There are <emphasis>two</emphasis> variables named
+ <quote>v4</quote>, which are independent. The variable v4
+ from block A is represented as v4@6, while the variable v4
+ from block B is represented as v4@8.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ The parsing context C++ class, <literal>sp_pcontext</literal>,
+ contains much more information related to each symbol, like
+ notably <emphasis>data types</emphasis> of variables
+ (unfortunately not printable with SHOW PROCEDURE CODE).
+ </para>
+
+ </section>
+
+ </section>
+
+ <section id="sp-parser">
+
+ <title>Stored Programs Parser</title>
+
+ <para>
+ There is no 'Stored Program Parser' as such, there is only one
+ parser in the SQL layer in the server. This parser is capable of
+ understanding every SQL statement, including statements related to
+ Stored Programs. The Parser is implemented as an ascendant parser,
+ using bison. The source code is located in the file
+ <filename>sql/sql_yacc.yy</filename>.
+ </para>
+
+ <para>
+ The parts of the parser dedicated more specially to Stored
+ Programs are starting at the following rules:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ <literal>CREATE PROCEDURE</literal>: see rule
+ <literal>sp_tail</literal>,
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>CREATE FUNCTION</literal>: see rule
+ <literal>sp_tail</literal>,
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>CREATE TRIGGER</literal>: see rule
+ <literal>trigger_tail</literal>,
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>CREATE EVENT</literal>: see rule
+ <literal>event_tail</literal>.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ In every case, the parser reads the SQL text stream that
+ represents the code as input, and creates an internal
+ representation of the Stored Program as output, with one C++
+ object of type <literal>sp_head</literal>. A limiting consequence
+ of this approach is that a stored program does not support
+ nesting: it is impossible to embed one <literal>CREATE
+ PROCEDURE</literal> into another, since the parser currently may
+ only support one <literal>sp_head</literal> object at a time.
+ </para>
+
+ <section id="sp-parser-struct">
+
+ <title>Parser Structure</title>
+
+ <para>
+ Conceptually, there are many different layers involved during
+ parsing:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ Lexical analysis (making words or tokens from a character
+ stream),
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Syntactic analysis (making "sentences" or an abstract syntax
+ tree from the tokens),
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Semantic analysis (making sure these sentences do make
+ sense),
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Code generation (for compilers) or evaluation (for
+ interpreters).
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ From the implementation point or view, many different concepts
+ from different layers actually collide in the same code base, so
+ that the actual code organization is as follows:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ The lexical analysis is implemented in
+ <filename>sql/sql_lex.cc</filename>, as when parsing regular
+ statements.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Syntactic analysis, semantic analysis, and code generation
+ -- <emphasis>all of them</emphasis> -- are done at
+ <emphasis>once</emphasis>, during parsing of the code. From
+ that perspective, the parser behaves as a single pass
+ compiler. In other words, both the code and the symbol table
+ for the final result are generated on the fly, interleaved
+ with syntactic analysis.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ This is both very efficient from a performance point of view,
+ but difficult to understand, from a maintenance point of view.
+ </para>
+
+ <para>
+ Let's illustrate for example how the following SQL statement is
+ parsed:
+ </para>
+
+<programlisting>
+ DECLARE my_cursor CURSOR FOR SELECT col1 FROM t1;
+</programlisting>
+
+ <para>
+ The corresponding part of the grammar in the parser for DECLARE
+ CURSOR statements is the following (with annotated line
+ numbers):
+ </para>
+
+<programlisting>
+[ 1] sp_decl:
+[ 2] DECLARE_SYM ident CURSOR_SYM FOR_SYM sp_cursor_stmt
+[ 3] {
+[ 4] LEX *lex= Lex;
+[ 5] sp_head *sp= lex->sphead;
+[ 6] sp_pcontext *ctx= lex->spcont;
+[ 7] uint offp;
+[ 8] sp_instr_cpush *i;
+[ 9]
+[10] if (ctx->find_cursor(&$2, &offp, TRUE))
+[11] {
+[12] my_error(ER_SP_DUP_CURS, MYF(0), $2.str);
+[13] delete $5;
+[14] MYSQL_YYABORT;
+[15] }
+[16] i= new sp_instr_cpush(sp->instructions(), ctx, $5,
+[17] ctx->current_cursor_count());
+[18] sp->add_instr(i);
+[19] ctx->push_cursor(&$2);
+[20] $$.vars= $$.conds= $$.hndlrs= 0;
+[21] $$.curs= 1;
+[22] }
+[23] ;
+</programlisting>
+
+ <para>
+ The lines [1], [2] and [23] are bison code that express the
+ structure of the grammar. These lines belong to the syntactic
+ parsing realm.
+ </para>
+
+ <para>
+ The lines [3] and [22] are bison delimiter for the associated
+ action to execute, when parsing of the syntax succeeds.
+ Everything between lines [3] and [22] is C++ code, executed when
+ the parser finds a syntactically correct DECLARE CURSOR
+ statement.
+ </para>
+
+ <para>
+ The lines [4] to [8] could be considered syntactic parsing: what
+ the code does is find what is the current Stored Program being
+ parsed, find the associated part of the syntax tree under
+ construction (<literal>sp_head</literal>), and find the
+ associated current context in the symbol table
+ (<literal>sp_pcontext</literal>).
+ </para>
+
+ <para>
+ Note that there is some black magic here: since we are still
+ currently parsing the <emphasis>content</emphasis> of a Stored
+ Program (the DECLARE CURSOR statement), the final "syntax" tree
+ for the Stored Program (<literal>sp_head</literal>) is not
+ supposed to exist yet. The reason the <literal>sp_head</literal>
+ object is already available is that the actions in the CREATE
+ PROCEDURE / CREATE FUNCTION / CREATE TRIGGER / CREATE EVENT are
+ implemented as a <emphasis>descendant</emphasis> parser (it
+ created an empty <literal>sp_head</literal> object first,
+ filling the content later). Mixing code that way (descendant
+ actions with ascendant parsing) is extremely sensitive to
+ changes.
+ </para>
+
+ <para>
+ The line [10] is a semantic check. The statement might be
+ syntactically correct (it parsed), but to be semantically
+ correct, the <emphasis>name</emphasis> or the cursor must be
+ unique in the symbol table.
+ </para>
+
+ <para>
+ Line [12] is reporting a semantic error back to the client
+ (duplicate cursor), while the code at line [14] forces the
+ syntactic parser (bison) to abort.
+ </para>
+
+ <para>
+ By line [16], we have verified that the code is syntactically
+ valid, and semantically valid: it's now time for code
+ generation, implemented by creating a new
+ <literal>sp_instr_cpush</literal> to represent the cursor in the
+ compiled code. Note that variable allocation is done on the fly,
+ by looking up the current cursor count in the symbol table
+ (<literal>sp_pcontext::current_cursor_count()</literal>).
+ </para>
+
+ <para>
+ Line [18] adds the generated code to the object representing the
+ stored program (code generation).
+ </para>
+
+ <para>
+ Line [19] maintains the symbol table (semantic parsing) by
+ adding the new cursor in the current local context.
+ </para>
+
+ <para>
+ Lines [20] and [21] return to bison a fragment of a fake syntax
+ tree, indicating that one cursor was found.
+ </para>
+
+ <para>
+ By looking at the complete implementation of this action in
+ bison, one should note that the target code was generated, the
+ symbol table for the Stored Program was looked up and updated,
+ while at no point in time a syntax node was even created. Note
+ that the <literal>sp_instr_cpush</literal> object should really
+ be considered generated code: the fact that there is a
+ one-to-one correspondence with the syntax is incidental.
+ </para>
+
+ </section>
+
+ <section id="sp-parser-codegen">
+
+ <title>Single pass code generation</title>
+
+ <para>
+ All the code generated by the parser is emitted in a
+ <emphasis>single pass</emphasis>. For example, consider the
+ following SQL logic:
+ </para>
+
+<programlisting>
+CREATE FUNCTION func_4(i int)
+RETURNS CHAR(10)
+BEGIN
+ DECLARE str CHAR(10);
+
+ CASE i
+ WHEN 1 THEN SET str="1";
+ WHEN 2 THEN SET str="2";
+ WHEN 3 THEN SET str="3";
+ ELSE SET str="unknown";
+ END CASE;
+
+ RETURN str;
+END$$
+</programlisting>
+
+ <para>
+ The compiled program for this Stored Function is:
+ </para>
+
+<programlisting>
+SHOW FUNCTION CODE func_4;
+Pos Instruction
+0 set str@1 NULL
+1 set_case_expr (12) 0 i@0
+2 jump_if_not 5(12) (case_expr@0 = 1)
+3 set str@1 _latin1'1'
+4 jump 12
+5 jump_if_not 8(12) (case_expr@0 = 2)
+6 set str@1 _latin1'2'
+7 jump 12
+8 jump_if_not 11(12) (case_expr@0 = 3)
+9 set str@1 _latin1'3'
+10 jump 12
+11 set str@1 _latin1'unknown'
+12 freturn 254 str@1
+</programlisting>
+
+ <para>
+ Note the instruction at position 4: <literal>jump 12</literal>.
+ How can the compiler generate this instruction in a single pass,
+ when the destination (12) is not known yet ? This instruction is
+ a <emphasis>forward</emphasis> jump. What happens during code
+ generation is that, by the time the compiler has generated the
+ code for positions [0] to [11], the generated code looks like
+ this:
+ </para>
+
+<programlisting>
+Pos Instruction
+0 set str@1 NULL
+1 set_case_expr ( ?? ) 0 i@0
+2 jump_if_not 5( ?? ) (case_expr@0 = 1)
+3 set str@1 _latin1'1'
+4 jump ??
+5 jump_if_not 8( ?? ) (case_expr@0 = 2)
+6 set str@1 _latin1'2'
+7 jump ??
+8 jump_if_not 11( ?? ) (case_expr@0 = 3)
+9 set str@1 _latin1'3'
+10 jump ??
+11 set str@1 _latin1'unknown'
+...
+</programlisting>
+
+ <para>
+ The final destination of the label for the END CASE is not known
+ yet, and the list of all the instructions (1, 2, 4, 5, 7, 8 and
+ 10) that need to point to this unknown destination (represented
+ as "??") is maintained in a temporary structure used during code
+ generation only. This structure is called the context back patch
+ list.
+ </para>
+
+ <para>
+ When the destination label is finally resolved to a destination
+ (12), all the instructions pointing to that label, which have
+ been already generated (but with a bogus destination) are
+ <emphasis>back patched</emphasis> to point to the correct
+ location. See the comments marked <literal>BACKPATCH</literal>
+ in the code for more details.
+ </para>
+
+ <para>
+ As a side note, this generated code also shows that some
+ temporary variables can be generated implicitly, like the
+ operand of the CASE expression, labeled "case_expr@0".
+ </para>
+
+ <caution>
+ <para>
+ Numbering of case expressions in the symbol table uses a
+ different name space than variables, so that "case_expr@0" and
+ "i@0" are two different variables, even when both internally
+ numbered with offset zero.
+ </para>
+ </caution>
+
+ </section>
+
+ </section>
+
+ <section id="sp-optimizer">
+
+ <title>Flow analysis optimizations</title>
+
+ <para>
+ After code is generated, the low level <literal>sp_instr</literal>
+ instructions are optimized. The optimization focuses on two areas:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ Dead code removal,
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Jump shortcut resolution.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ These two optimizations are performed together, as they both are a
+ problem involving <emphasis>flow analysis</emphasis> in the
+ <emphasis>graph</emphasis> that represents the generated code.
+ </para>
+
+ <para>
+ The code that implements these optimizations is
+ <literal>sp_head::optimize()</literal>.
+ </para>
+
+ <caution>
+ <para>
+ Do not confuse <literal>sp_head::optimize()</literal> with the
+ component named the <emphasis>optimizer</emphasis>, as they are
+ very different. The former is specific to Stored Programs, and
+ focuses on improving the flow of statements, while the later is
+ general to queries, and focuses on finding the best execution
+ plan when executing a single statement. For the optimizer, see
+ <xref linkend="optimizer"/>.
+ </para>
+ </caution>
+
+ <para>
+ The (Stored Program) optimizer is invoked from only one place, in
+ the following code:
+ </para>
+
+<programlisting>
+db_load_routine(..., sp_head **sphp, ...)
+{
+ ...
+ (*sphp)->optimize();
+ ...
+}
+</programlisting>
+
+ <tip>
+ <para>
+ By disabling the call to <literal>sp_head::optimize()</literal>
+ and recompiling the code, <literal>SHOW PROCEDURE CODE</literal>
+ will display the code <emphasis>before</emphasis> flow
+ optimization.
+ </para>
+ </tip>
+
+ <caution>
+ <para>
+ When investigating issues related to this area, you may want to
+ use a <literal>DBUG_EXECUTE_IF</literal> to avoid recompiling
+ the server with or without flow optimization every time. Be
+ careful to shutdown and restart the server with or without the
+ call to <literal>sp_head::optimize()</literal> for each test, or
+ you will find that <emphasis>caching</emphasis> of a Stored
+ Program code does interfere.
+ </para>
+ </caution>
+
+ <section id="sp-optimizer-dead">
+
+ <title>Dead code removal</title>
+
+ <para>
+ <emphasis>Dead code</emphasis> is also known as
+ <emphasis>unreachable code</emphasis>: code that can not
+ possibly be executed, because no path in the logic leads to it.
+ </para>
+
+ <para>
+ For example, consider the following SQL code:
+ </para>
+
+<programlisting>
+CREATE PROCEDURE proc_5()
+BEGIN
+ DECLARE i INT DEFAULT 0;
+
+ again:
+ WHILE TRUE DO
+ BEGIN
+ set i:= i+1;
+
+ SELECT "This code is alive";
+
+ IF (i = 100) THEN
+ LEAVE again;
+ END IF;
+
+ ITERATE again;
+
+ SELECT "This code is dead";
+ END;
+ END WHILE;
+END$$
+</programlisting>
+
+ <para>
+ Before flow optimization, the compiled code is:
+ </para>
+
+<programlisting>
+SHOW PROCEDURE CODE proc_5;
+Pos Instruction
+0 set i@0 0
+1 jump_if_not 10(10) 1
+2 set i@0 (i@0 + 1)
+3 stmt 0 "SELECT "This code is alive""
+4 jump_if_not 7(7) (i@0 = 100)
+5 jump 10
+6 jump 7
+7 jump 1
+8 stmt 0 "SELECT "This code is dead""
+9 jump 1
+</programlisting>
+
+ <para>
+ Note the instruction at position 8: the previous instruction is
+ an <emphasis>unconditional</emphasis> jump, so the flow of
+ control can never reach 8 by coming from 7. Since there exist no
+ jump in the entire code that leads to 8 either, the instruction
+ at 8 is unreachable. By looking further in the flow, since 8 is
+ unreachable and there are no jumps to position 9, the
+ instruction at position 9 is also unreachable.
+ </para>
+
+ <para>
+ The instruction at position 6 is also unreachable, for a similar
+ reason: the <literal>THEN</literal> part of the if contains a
+ jump, due to the statement <literal>LEAVE again;</literal>, so
+ that the code never executes the jump generated by the compiler
+ to go from the end of the <literal>THEN</literal> block to the
+ statement following the <literal>IF</literal>.
+ </para>
+
+ <para>
+ After detecting all the unreachable instructions, and
+ simplifying the code, the result after flow optimization is:
+ </para>
+
+<programlisting>
+SHOW PROCEDURE CODE proc_5;
+Pos Instruction
+0 set i@0 0
+1 jump_if_not 10(10) 1
+2 set i@0 (i@0 + 1)
+3 stmt 0 "SELECT "This code is alive""
+4 jump_if_not 1(1) (i@0 = 100)
+5 jump 10
+</programlisting>
+
+ <para>
+ The flow optimizer is good at detecting
+ <emphasis>most</emphasis> of the dead code, but has limitations.
+ For example, coding in SQL <literal>IF FALSE THEN ... END
+ IF;</literal> leads to code that can never be executed, but
+ since the flow optimizer does neither propagate constants nor
+ consider impossible conditional jumps, this code will not be
+ removed.
+ </para>
+
+ <para>
+ The goal of the flow optimizer is mostly to perform
+ <emphasis>simple</emphasis> local optimizations with a low cost.
+ It's not a fully featured code optimizer, and does not guard
+ against poor SQL.
+ </para>
+
+ </section>
+
+ <section id="sp-optimizer-jump">
+
+ <title>Jump shortcut resolution</title>
+
+ <para>
+ The term <emphasis>jump shortcut</emphasis> refers to the
+ following optimization: when instruction A is a jump
+ (conditional or not) that goes to position B, and when B is an
+ unconditional jump to position C, the code can be changed so
+ that A can jump to C directly, taking a
+ <emphasis>shortcut</emphasis> to avoid the unnecessary B.
+ Consider the following SQL code:
+ </para>
+
+<programlisting>
+CREATE PROCEDURE proc_6(x int, y int, z int)
+BEGIN
+ SELECT "Start";
+
+ IF (x > 0)
+ THEN
+ BEGIN
+ SELECT "x looks ok";
+ IF (y > 0)
+ THEN
+ BEGIN
+ SELECT "so does y";
+ IF (z > 0)
+ THEN
+ SELECT "even z is fine";
+ ELSE
+ SELECT "bad z";
+ END IF;
+ END;
+ ELSE
+ SELECT "bad y";
+ END IF;
+ END;
+ ELSE
+ SELECT "bad x";
+ END IF;
+
+ SELECT "Finish";
+END$$
+</programlisting>
+
+ <para>
+ Before flow optimization, the compiled code is:
+ </para>
+
+<programlisting>
+SHOW PROCEDURE CODE proc_6;
+Pos Instruction
+0 stmt 0 "SELECT "Start""
+1 jump_if_not 12(13) (x@0 > 0)
+2 stmt 0 "SELECT "x looks ok""
+3 jump_if_not 10(11) (y@1 > 0)
+4 stmt 0 "SELECT "so does y""
+5 jump_if_not 8(9) (z@2 > 0)
+6 stmt 0 "SELECT "even z is fine""
+7 jump 9
+8 stmt 0 "SELECT "bad z""
+9 jump 11
+10 stmt 0 "SELECT "bad y""
+11 jump 13
+12 stmt 0 "SELECT "bad x""
+13 stmt 0 "SELECT "Finish""
+</programlisting>
+
+ <para>
+ Note the <literal>jump 9</literal> at position 7: since the
+ instruction at position 9 is <literal>jump 11</literal>, the
+ code at position 7 can be simplified to <literal>jump
+ 11</literal>. The optimization is also recursive: since the
+ instruction 11 is <literal>jump 13</literal>, the final jump
+ destination for the instruction at position 7 is <literal>jump
+ 13</literal>. Conditional jumps are optimized also, so that the
+ instruction 5: <literal>jump_if_not 8(9)</literal> can be
+ optimized to <literal>jump_if_not 8(13)</literal>.
+ </para>
+
+ <para>
+ After flow optimization, the compiled code is:
+ </para>
+
+<programlisting>
+SHOW PROCEDURE CODE proc_6;
+Pos Instruction
+0 stmt 0 "SELECT "Start""
+1 jump_if_not 12(13) (x@0 > 0)
+2 stmt 0 "SELECT "x looks ok""
+3 jump_if_not 10(13) (y@1 > 0)
+4 stmt 0 "SELECT "so does y""
+5 jump_if_not 8(13) (z@2 > 0)
+6 stmt 0 "SELECT "even z is fine""
+7 jump 13
+8 stmt 0 "SELECT "bad z""
+9 jump 13
+10 stmt 0 "SELECT "bad y""
+11 jump 13
+12 stmt 0 "SELECT "bad x""
+13 stmt 0 "SELECT "Finish""
+</programlisting>
+
+ <para>
+ Note the differences with every jump instruction.
+ </para>
+
+ <caution>
+ <para>
+ For clarity, this example has been designed to
+ <emphasis>not</emphasis> involve dead code. Note that in
+ general, an instruction that was reachable
+ <emphasis>before</emphasis> taking a shortcut might become
+ unreachable <emphasis>after</emphasis> the shortcut, so that
+ the optimizations for jump shortcuts and dead code are tightly
+ intertwined.
+ </para>
+ </caution>
+
+ </section>
+
+ </section>
+
+ <section id="sp-cache">
+
+ <title>Stored Programs Caches</title>
+
+ <para>
+ The <emphasis>goal</emphasis> of the Stored Program cache is to
+ keep a parsed <literal>sp_head</literal> in memory, for future
+ reuse. Reuse means:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ To be able to execute concurrently the same Stored Program in
+ different <literal>THD</literal> threads,
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ To be able to execute the same Stored Program multiple times
+ (for recursive calls) in the same <literal>THD</literal>
+ thread.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ To achieve this, the implementation of <literal>sp_head</literal>
+ must be both thread safe and state less. Unfortunately, it is
+ neither:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ The class <literal>sp_head</literal> is composed of
+ <literal>sp_instr</literal> instructions to represent the
+ code, and these instructions in turn depend on
+ <literal>Item</literal> objects, used to represent the
+ internal structure of a statement. The various C++
+ <literal>Item</literal> classes are <emphasis>not</emphasis>
+ currently thread safe, since the evaluation of an
+ <literal>Item</literal> at runtime involves methods like
+ <literal>Item::fix_fields()</literal>, which modify the
+ internal state of items, making them impossible to safely
+ evaluate concurrently.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The class <literal>sp_head</literal> itself contains
+ attributes that describe the <emphasis>SQL logic</emphasis> of
+ a Stored Program (which are safe to share), mixed with
+ attributes that relate to the <emphasis>evaluation</emphasis>
+ of this logic in a given instance to a Stored Program call
+ (mostly the <literal>MEM_ROOT</literal> memory pool used
+ during execution), which by definition can not be shared.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ The consequences of these restrictions are less that optimal code.
+ What is <emphasis>currently</emphasis> implemented in the server
+ is detailed in the following sub sections, to help maintenance.
+ </para>
+
+ <warning>
+ <para>
+ Needless to say, the current implementation of Stored Program
+ caching is by no mean final, and could be re factored in future
+ releases.
+ </para>
+ </warning>
+
+ <section id="sp-cache-sp">
+
+ <title>Stored Procedure cache</title>
+
+ <para>
+ The <literal>PROCEDURE</literal> cache is maintained on a
+ <emphasis>per thread</emphasis> basis, in
+ <literal>THD::sp_proc_cache</literal>.
+ </para>
+
+ <para>
+ The function used to lookup the cache is
+ <literal>sp_find_routine</literal>. It relies on the C++ class
+ <literal>sp_cache</literal> for the low level implementation.
+ </para>
+
+ <para>
+ There is a global mechanism to invalidate all the caches of all
+ the <literal>THD</literal> threads at once, implemented with the
+ variable <literal>Cversion</literal> in file
+ <filename>sp_cache.cc</filename>, which is incremented by
+ function <literal>sp_cache_invalidate()</literal>. This global
+ invalidation is used when the server executes <literal>DROP
+ PROCEDURE</literal> or <literal>UPDATE PROCEDURE</literal>
+ statements.
+ </para>
+
+ <para>
+ Each entry in the cache is keyed by name, and consists of a
+ linked list of stored procedure instances which are all
+ duplicates of the same object. The reason for the list is
+ recursion, when the runtime needs to evaluate several calls to
+ the same procedure at once.
+ </para>
+
+ <para>
+ The runtime behavior of this caching mechanism has some
+ limitations, and in particular:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ Since <emphasis>each</emphasis> <literal>THD</literal> has
+ its own cache, <emphasis>each</emphasis> separate client
+ connection to the server uses its own cache. Multiple client
+ connections calling the same Stored Procedure will cause the
+ parser to be invoked multiple times, and memory to be
+ consumed multiple times.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ If a given client constantly opens and closes a new
+ connection to the server, and invokes Stored Procedures, the
+ cache will be always empty, causing excessive parsing of
+ used stored procedures on every invocation.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ If a given client constantly keeps an existing connection to
+ the server for a long time, and invokes Stored Procedures,
+ the cache size will grow, consuming and retaining memory. In
+ other words, memory limits or expulsion of cold members of
+ the stored procedure cache is not implemented.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Calling <literal>sp_cache_invalidate()</literal> does
+ <emphasis>not</emphasis> reclaim the cache memory. This
+ memory will be reclaimed only <emphasis>if</emphasis> a
+ Stored Procedure is looked up in the cache again, causing
+ the cache to flush.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ </section>
+
+ <section id="sp-cache-sf">
+
+ <title>Stored function cache</title>
+
+ <para>
+ The <literal>FUNCTION</literal> cache is implemented exactly
+ like the <literal>PROCEDURE</literal> cache, in the thread
+ member in <literal>THD::sp_func_cache</literal>.
+ </para>
+
+ <para>
+ Note that because <literal>THD::sp_proc_cache</literal> and
+ <literal>THD::sp_func_cache</literal> are both invalidated based
+ on the <emphasis>same</emphasis> <literal>Cversion</literal>
+ counter, executing <literal>DROP PROCEDURE</literal> happens to
+ invalidate the <emphasis>FUNCTION</emphasis> cache as well,
+ while <literal>DROP FUNCTION</literal> also invalidates the
+ <emphasis>PROCEDURE</emphasis> cache. In practice, this has no
+ consequences since DDL statements like this are not executed
+ typically while an application is running, only when it is
+ deployed.
+ </para>
+
+ </section>
+
+ <section id="sp-cache-trg">
+
+ <title>Table Trigger cache</title>
+
+ <para>
+ For table triggers, all the triggers that relate to a given
+ table are grouped in the C++ class
+ <literal>Table_triggers_list</literal>, which in particular
+ contains the member <literal>sp_head
+ *bodies[TRG_EVENT_MAX][TRG_ACTION_MAX]</literal>.
+ </para>
+
+ <para>
+ Note that at most one trigger per event
+ (<literal>BEFORE</literal>, <literal>AFTER</literal>) and per
+ action (<literal>INSERT</literal>, <literal>UPDATE</literal>,
+ <literal>DELETE</literal>) can be defined currently.
+ </para>
+
+ <para>
+ The <literal>Table_triggers_list</literal> itself is a part of
+ the structure <literal>struct st_table</literal>, which is
+ better known as a <literal>TABLE</literal> handle.
+ </para>
+
+ <para>
+ As a result, each table trigger body is duplicated in each table
+ handle, which is necessary to properly evaluate them. Since
+ <literal>TABLE</literal> handles are globally cached and reused
+ across threads, the table triggers are effectively reused across
+ different clients connections manipulating the same physical
+ table.
+ </para>
+
+ </section>
+
+ <section id="sp-cache-evt">
+
+ <title>Event</title>
+
+ <para>
+ For events, the <literal>sp_head</literal> object that
+ represents the body of an <literal>EVENT</literal> is part of
+ the C++ class <literal>Event_parse_data</literal>.
+ </para>
+
+ <para>
+ There is no caching of <literal>sp_head</literal> for multiple
+ scheduling of an event. The method
+ <literal>Event_job_data::execute()</literal> invokes the parser
+ every time an event is executed.
+ </para>
+
+ </section>
+
+ </section>
+
+ <section id="sp-execution">
+
+ <title>Stored Programs Execution</title>
+
+ <para>
+ <emphasis>Executing</emphasis> a Stored Program consist of
+ <emphasis>interpreting</emphasis> the low level
+ <literal>sp_instr</literal> code. The runtime interpreter itself
+ is implemented in the method
+ <literal>sp_head::execute()</literal>. Wrappers for different
+ kinds of Stored Programs are implemented in the following methods:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ <literal>PROCEDURE</literal>: see
+ <literal>sp_head::execute_procedure()</literal>,
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>FUNCTION</literal>: see
+ <literal>sp_head::execute_function()</literal>,
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>TRIGGER</literal>: see
+ <literal>sp_head::execute_trigger()</literal>,
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>EVENT</literal>: see
+ <literal>Event_job_data::execute()</literal>.
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <section id="sp-exec-rcont">
+
+ <title>Runtime context</title>
+
+ <para>
+ An interpretor needs to be able to represent the
+ <emphasis>state</emphasis> of the SQL program being executed:
+ this is the role of the C++ class
+ <literal>sp_rcontext</literal>, or runtime context.
+ </para>
+
+ <section id="sp-exec-rcont-var">
+
+ <title>Local variables</title>
+
+ <para>
+ Values of local variables in a SQL Stored Program are stored
+ within the <literal>sp_rcontext</literal>. When the code
+ enters a new scope, the <literal>sp_instr</literal> contains
+ explicit statements to initialize the local variable
+ <literal>DEFAULT</literal> value, if any. Since initialization
+ of values is done in the <emphasis>code</emphasis>, and since
+ no logic needs to be executed when a SQL variable goes out of
+ scope, space allocation to represent the
+ <emphasis>data</emphasis> does not need to follow the nesting
+ of BEGIN-END blocks during runtime.
+ </para>
+
+ <para>
+ Another important point regarding the representation of local
+ SQL variables is that, conceptually, a local variable can be
+ considered to be a SQL table with a single column (of the
+ variable type), with a single row (to represent the value).
+ </para>
+
+ <para>
+ As a result, <emphasis>all</emphasis> the local variables of a
+ Stored Program are represented by a row in a table internally.
+ For example, consider the following SQL code:
+ </para>
+
+<programlisting>
+CREATE PROCEDURE proc_7(x int)
+BEGIN
+ DECLARE v1 INT;
+ DECLARE v2 VARCHAR(10);
+ DECLARE v3 TEXT;
+
+ IF (x > 0) THEN
+ BEGIN
+ DECLARE v4 BLOB;
+ DECLARE v5 VARCHAR(20);
+ END;
+ ELSE
+ BEGIN
+ DECLARE v6 DECIMAL(10, 2);
+ DECLARE v7 BIGINT;
+ END;
+ END IF;
+END$$
+</programlisting>
+
+ <para>
+ Internally, a temporary table is created, with the following
+ structure:
+ </para>
+
+<programlisting>
+CREATE TEMPORARY TABLE `proc_7_vars` (
+ `v1` int(11) DEFAULT NULL,
+ `v2` varchar(10) DEFAULT NULL,
+ `v3` text,
+ `v4` blob,
+ `v5` varchar(20) DEFAULT NULL,
+ `v6` decimal(10,2) DEFAULT NULL,
+ `v7` bigint(20) DEFAULT NULL
+) ENGINE=MyISAM DEFAULT CHARSET=latin1
+</programlisting>
+
+ <para>
+ The real name of the table and the columns are purely
+ internal, and the table is not accessible to regular
+ statements for DDL or DML operations:
+ <quote>proc_7_vars</quote> and <quote>v1</quote> ...
+ <quote>v7</quote> are just a notation used in this example.
+ The <literal>TABLE</literal> handle that implements all the
+ local variable storage is the member
+ <literal>sp_rcontext::m_var_table</literal>
+ </para>
+
+ <para>
+ Inside a statement, local variables in a Stored Program are
+ represented by the dedicated C++ class
+ <literal>Item_splocal</literal>. What
+ <literal>Item_splocal</literal> really is, is a proxy exposing
+ the interface needed to suport <literal>Item</literal>, which
+ delegates to the underlying <literal>sp_rcontext</literal> for
+ reading / writing local variables values. The coupling between
+ <literal>Item_splocal</literal> and
+ <literal>sp_rcontext</literal> is based on
+ <literal>Item_splocal::m_var_idx</literal>, which is the
+ variable <emphasis>index</emphasis> in the symbol table
+ computed by the <emphasis>parser</emphasis>, and maintained in
+ <literal>sp_pcontext</literal>.
+ </para>
+
+ </section>
+
+ <section id="sp-exec-rcont-cursor">
+
+ <title>Cursors</title>
+
+ <para>
+ Unlike local variables, some action is needed in the
+ interpreter when a <literal>CURSOR</literal> goes out of
+ scope: the cursor must be closed, to prevent leaks of the
+ underlying <literal>TABLE</literal> resources.
+ </para>
+
+ <para>
+ As a result, cursors allocation (and really,
+ <emphasis>de-allocation</emphasis> so they can be properly
+ <emphasis>closed</emphasis>) needs to follow tightly the
+ BEGIN-END block structure of the code, so a
+ <emphasis>stack</emphasis> is used, implemented by
+ <literal>sp_rcontext::m_cstack</literal> and
+ <literal>sp_rcontext::m_ccount</literal>.
+ </para>
+
+ </section>
+
+ <section id="sp-exec-rcont-case">
+
+ <title>Case expressions</title>
+
+ <para>
+ For <literal>CASE</literal> expressions, temporary variables
+ are generated automatically. Like <literal>CURSOR</literal>,
+ there are some constraints that prevent treating these special
+ local variables like regular local variables.
+ </para>
+
+ <para>
+ The difficulty with <literal>CASE</literal> is that the real
+ type of the expression is only known when the case statement
+ is executed, so that allocating space in a statically computed
+ <literal>TABLE</literal> is not practical. For example,
+ <literal>CASE (SELECT col1 FROM t1 WHERE ...)</literal> is a
+ case expression that involves a single row subselect. During
+ parsing, the table might not even exists, so evaluating the
+ type of <quote>col1</quote> is impossible. Creation of the
+ table can be delayed until execution, with statements like
+ <literal>CREATE TEMPORARY TABLE</literal>.
+ </para>
+
+ <para>
+ Instead, a <emphasis>array</emphasis> of <literal>Item
+ *</literal> is used, implemented by
+ <literal>sp_rcontext::m_case_expr_holders</literal>. The size
+ of the array is static (it's the total number of cases), but
+ the content of each element is dynamic (to account for the
+ type of the case expression).
+ </para>
+
+ <caution>
+ <para>
+ Note the wording used here: <quote>static</quote> means
+ something that can be evaluated when compiling the code, in
+ the parser, where <quote>dynamic</quote> means something
+ that can only be evaluated when interpreting the code,
+ during runtime. Of course, from a C++ coding point of view,
+ everything is <emphasis>dynamic</emphasis>.
+ </para>
+ </caution>
+
+ <para>
+ Inside a <literal>CASE</literal> statement, temporary local
+ variables in a Stored Program are represented by the dedicated
+ C++ class <literal>Item_case_expr</literal>. The class
+ <literal>Item_case_expr</literal> is also a proxy, similar in
+ nature to <literal>Item_splocal</literal>, and delegates to
+ <literal>sp_rcontext</literal> for accessing the underlying
+ case expression value. The coupling between
+ <literal>Item_case_expr</literal> and
+ <literal>sp_rcontext</literal> is based on
+ <literal>Item_case_expr::m_case_expr_id</literal>, which is
+ the case expression <emphasis>index</emphasis> in the symbol
+ table (see <literal>sp_pcontext</literal>).
+ </para>
+
+ </section>
+
+ <section id="sp-exec-rcont-handler">
+
+ <title>Exception handlers</title>
+
+ <warning>
+ <para>
+ As of <literal>MySQL 5.1</literal>, which is the code base
+ described by this documentation, the statements
+ <literal>SIGNAL</literal>, <literal>RESIGNAL</literal> and
+ <literal>GET DIAGNOSTICS</literal> are not supported.
+ Implementing these features will have some impact on the
+ material described here, so changes in later versions are
+ anticipated.
+ </para>
+ </warning>
+
+ <para>
+ When the code <emphasis>enters</emphasis> a block of logic
+ guarded by an SQL exception handler, the
+ <emphasis>state</emphasis> or the runtime context in the
+ interpreter <emphasis>changes</emphasis>, to represent this
+ fact. The state change is not apparent immediately, it will
+ only become apparent if an exception is raised. The internal
+ runtime state of the engine also changes when the code
+ <emphasis>leaves</emphasis> a block that contains an exception
+ handler.
+ </para>
+
+ <para>
+ How exception handlers work during runtime is the subject of
+ another section (See <xref linkend="sp-exec-handler"/>). What
+ is described here is the state maintained internally, to
+ represent <emphasis>which</emphasis>
+ <literal>HANDLER</literal> is currently <quote>active</quote>,
+ and <emphasis>what</emphasis> <literal>CONDITION</literal> is
+ protected against.
+ </para>
+
+ <para>
+ The SQL precedence rules for <literal>HANDLER</literal>
+ dictates that the last installed (inner most) handler is
+ always considered first, so the natural structure to represent
+ what handler is active is a <emphasis>stack</emphasis>,
+ implemented by <literal>sp_rcontext::m_handler</literal> and
+ <literal>sp_rcontext::m_hcount</literal>.
+ </para>
+
+ <para>
+ In addition, some extra information is required for
+ <literal>CONTINUE</literal> handlers: the
+ <quote>address</quote> in the code, or instruction pointer in
+ the <literal>sp_instr</literal> array, of where to resume
+ execution when the handler returns. This data is maintained in
+ <literal>sp_rcontext::m_hstack</literal> and
+ <literal>sp_rcontext::m_hsp</literal>, which again is a
+ <emphasis>stack</emphasis> because exception handlers can be
+ <emphasis>nested</emphasis> (exceptions can be raised and
+ trapped <emphasis>during</emphasis> the execution of the body
+ of an exception handler, too).
+ </para>
+
+ </section>
+
+ </section>
+
+ <section id="sp-exec-instr">
+
+ <title>Executing one instruction</title>
+
+ <para>
+ Executing an instruction consist of calling the virtual method
+ <literal>sp_instr::execute()</literal>, which is implemented for
+ each instruction.
+ </para>
+
+ <para>
+ For instructions that can be executed directly, and don't depend
+ on the evaluation of a general SQL statement or expression, the
+ execution is very simple. See for example
+ <literal>sp_instr_jump::execute()</literal>,
+ <literal>sp_instr_hpush_jump::execute()</literal> or
+ <literal>sp_instr_hpop::execute()</literal>. In all cases, the
+ implementation of the <literal>execute()</literal> method is
+ purely internal to the runtime interpreter.
+ </para>
+
+ <para>
+ For instructions that need to evaluate a general expression,
+ like <literal>sp_instr_jump_if_not::execute()</literal>, or
+ general instructions that need to execute a SQL statement, like
+ <literal>sp_instr_stmt::execute()</literal>, things are more
+ complex. The implementation needs to leverage the existing code
+ that is already capable of evaluating an expression or executing
+ a <emphasis>query</emphasis>, and is implemented by the function
+ <literal>mysql_execute_command()</literal>.
+ </para>
+
+ <para>
+ The function <literal>mysql_execute_command()</literal>, for
+ historical reasons (it was implemented
+ <emphasis>before</emphasis> Stored Programs), is mostly designed
+ to consume directly the result of the parser, which is passed as
+ input in <literal>THD::lex</literal>.
+ </para>
+
+ <para>
+ To comply with this interface, the runtime for stored program
+ has to provide a <literal>THD::lex</literal> structure before
+ executing each instruction, to prepare an execution environment
+ which <emphasis>looks as if</emphasis> the statement to execute
+ was just parsed. Dealing with the existing interface for
+ re-entrant execution of SQL statements is the role of the C++
+ class <literal>sp_lex_keeper</literal>. The wrapper method to
+ used to execute instructions is
+ <literal>sp_lex_keeper::reset_lex_and_exec_core()</literal>,
+ which ultimately invokes the
+ <literal>sp_instr::exec_core()</literal> instructions
+ implementation.
+ </para>
+
+ </section>
+
+ <section id="sp-exec-flow">
+
+ <title>Flow control</title>
+
+ <para>
+ Instructions are numbered sequentially, and the current position
+ in the code is represented by an <quote>instruction
+ pointer</quote>, which is just an integer. In the main execution
+ loop in <literal>sp_head::execute()</literal>, this instruction
+ pointer is represented by the local variable
+ <literal>ip</literal>.
+ </para>
+
+ <para>
+ When executing each instruction, the method
+ <literal>sp_head::execute()</literal> is also responsible to
+ return the address of the next instruction to execute. Most of
+ the time, this corresponds to the <quote>next</quote>
+ instruction (implemented by <literal>m_ip+1</literal>), except
+ for absolute jumps (see
+ <literal>sp_instr_jump::execute()</literal>) or conditional
+ jumps (see <literal>sp_instr_jump_if_not::execute()</literal>).
+ </para>
+
+ </section>
+
+ <section id="sp-exec-handler">
+
+ <title>Exception handling</title>
+
+ <para>
+ When the code <emphasis>enters</emphasis> a block protected by a
+ <literal>HANDLER</literal>, the execution leads to
+ <literal>sp_instr_hpush_jump::execute()</literal>, which
+ installs the exception handler in the runtime handler stack, by
+ calling <literal>sp_rcontext::push_handler()</literal>.
+ </para>
+
+ <para>
+ In a similar way, when the code <emphasis>leaves</emphasis> a
+ block protected by a <literal>HANDLER</literal>,
+ <literal>sp_instr_hpop::execute()</literal> removes the handlers
+ installed by the matching
+ <literal>sp_instr_hpush_jump</literal>, by calling
+ <literal>sp_rcontext::pop_handlers()</literal>.
+ </para>
+
+ <para>
+ During the execution of <emphasis>any</emphasis> statement,
+ different <literal>CONDITION</literal> can be raised at runtime,
+ which are reported by the implementation of each statement by
+ calling <literal>push_warning()</literal>,
+ <literal>my_error()</literal> or similar functions. All these
+ entry points ultimately leads to the error handler hook callback
+ function implemented by <literal>error_handler_hook</literal> in
+ <filename>mysys/my_error.c</filename>. In case of the server
+ itself, this hook points to the function
+ <literal>my_message_sql()</literal>.
+ </para>
+
+ <para>
+ Under normal circumstances, <literal>my_message_sql()</literal>
+ just reports a warning or an error to the client application,
+ and for errors cause the query to abort.
+ </para>
+
+ <para>
+ When executing a stored program, <literal>THD::spcont</literal>
+ points to the runtime context of the program currently executed.
+ When a <literal>HANDLER</literal> is active, the runtime context
+ contains in it's handler stack the list of all the
+ <literal>CONDITION</literal>s currently trapped, giving a chance
+ to the call to <literal>sp_rcontext::handle_error()</literal> to
+ intercept error handling.
+ </para>
+
+ <para>
+ If the condition reported does not match any of the conditions
+ for which an exception handler is active,
+ <literal>sp_rcontext::handle_error()</literal> returns false,
+ and <literal>my_message_sql()</literal> raises the error or
+ warning as usual.
+ </para>
+
+ <para>
+ When the condition reported <emphasis>does</emphasis> match an
+ active <literal>HANDLER</literal>, that handler is
+ <emphasis>called</emphasis>, but the technical nature of this
+ call is special: the call is <emphasis>asynchronous</emphasis>.
+ Instead of invoking the exception handler
+ <emphasis>directly</emphasis>,
+ <literal>sp_rcontext::handle_error()</literal> marks which
+ exception handler is <emphasis>to be called</emphasis>, by
+ setting the member variable
+ <literal>sp_rcontext::m_hfound</literal>, and then returns true,
+ so that <literal>my_message_sql()</literal> returns without
+ reporting anything: at this point, the error condition has been
+ totally masked, except for the fact that
+ <literal>sp_rcontext::m_hfound</literal> is set.
+ </para>
+
+ <para>
+ Once <literal>my_message_sql()</literal> returns, the
+ implementation of a given statement continues, either by
+ proceeding if only a warning was reported, or by aborting the
+ current execution if an error was raised. The execution of code
+ in the server will eventually return from the implementation of
+ a statement, and return from the call to
+ <literal>sp_instr::execute()</literal> for that statement,
+ returning control to the loop located in
+ <literal>sp_head::execute()</literal>. Note that during the
+ execution of the code that follows a call to
+ <literal>my_message_sql()</literal>, error conditions are
+ propagated in the call stack though the functions return value.
+ Whether an exception was caught by an error handler or not is
+ transparent to the implementation of statements in general.
+ </para>
+
+ <para>
+ After an instruction is executed in
+ <literal>sp_head::execute()</literal>, the main interpreter loop
+ checks for any pending exception handler code to call, by
+ invoking <literal>sp_rcontext::found_handler()</literal>. If an
+ exception was caught, the handler to call has been found by
+ <literal>sp_rcontext::handle_error()</literal>, and is invoked.
+ </para>
+
+ <para>
+ In case of <literal>CONTINUE HANDLER</literal>, the instruction
+ to return to after the handler code is executed needs to be
+ saved in the runtime context. Finding the continuation
+ destination is accomplished by the call to
+ <literal>sp_instr::get_cont_dest()</literal> for the current
+ instruction, while preserving this destination is done with
+ <literal>sp_rcontext::push_hstack()</literal>. The matching call
+ to <literal>sp_rcontext::pop_hstack()</literal>, which is
+ executed when the exception handler is done, is located in
+ <literal>sp_instr_hreturn::execute()</literal>.
+ </para>
+
+ <caution>
+ <para>
+ To integrate properly with exception handling in general, the
+ code should avoid testing for
+ <literal>thd->net.report_error</literal>, or worse inspecting
+ the content of the error stack (displayed by <literal>SHOW
+ ERRORS</literal>), since doing this actually assumes not only
+ that an error was raised, but also that is was not caught.
+ Instead, the proper way to implement error handling in the
+ server is to return error statuses and check for them.
+ </para>
+ </caution>
+
+ </section>
+
+ <section id="sp-exec-nest">
+
+ <title>Nesting calls</title>
+
+ <para>
+ In the following example, the Stored Procedure
+ <quote>proc_1</quote> makes a nested call to
+ <quote>proc_2</quote>.
+ </para>
+
+<programlisting>
+CREATE TABLE my_debug(
+ seq int NOT NULL AUTO_INCREMENT,
+ msg varchar(80),
+ PRIMARY KEY(seq)
+);
+
+delimiter $$
+
+CREATE PROCEDURE proc_1()
+BEGIN
+ INSERT INTO my_debug(msg) VALUES ("entering p1");
+ CALL proc_2();
+ INSERT INTO my_debug(msg) VALUES ("leaving p1");
+END$$
+
+CREATE PROCEDURE proc_2()
+BEGIN
+ INSERT INTO my_debug(msg) VALUES ("inside p2");
+END$$
+
+delimiter ;
+
+CALL proc_1();
+</programlisting>
+
+ <tip>
+ <para>
+ Oh, yes, we don't have a debugger yet, so this is old school
+ <quote>printf-like</quote> debugging into a table.
+ </para>
+ </tip>
+
+ <para>
+ By setting a breakpoint in <literal>mysql_insert</literal> in
+ the server, the current thread stack at the first insert will
+ look like:
+ </para>
+
+<programlisting>
+#0 mysql_insert () at sql_insert.cc:351
+#1 in mysql_execute_command () at sql_parse.cc:2643
+#2 in sp_instr_stmt::exec_core () at sp_head.cc:2609
+#3 in sp_lex_keeper::reset_lex_and_exec_core () at sp_head.cc:2455
+#4 in sp_instr_stmt::execute () at sp_head.cc:2560
+#5 in sp_head::execute () at sp_head.cc:1077
+#6 in sp_head::execute_procedure () at sp_head.cc:1726
+#7 in mysql_execute_command () at sql_parse.cc:3807
+#8 in mysql_parse () at sql_parse.cc:5274
+#9 in dispatch_command () at sql_parse.cc:896
+#10 in do_command () at sql_parse.cc:662
+#11 in handle_one_connection () at sql_connect.cc:1089
+#12 in start_thread () from /lib/libpthread.so.0
+#13 in clone () from /lib/libc.so.6
+</programlisting>
+
+ <para>
+ By the time the second <literal>INSERT</literal> is executed,
+ the stack will look like:
+ </para>
+
+<programlisting>
+#0 mysql_insert () at sql_insert.cc:351
+#1 in mysql_execute_command () at sql_parse.cc:2643
+#2 in sp_instr_stmt::exec_core () at sp_head.cc:2609
+#3 in sp_lex_keeper::reset_lex_and_exec_core () at sp_head.cc:2455
+#4 in sp_instr_stmt::execute () at sp_head.cc:2560
+#5 in sp_head::execute () at sp_head.cc:1077
+#6 in sp_head::execute_procedure () at sp_head.cc:1726
+#7 in mysql_execute_command () at sql_parse.cc:3807
+#8 in sp_instr_stmt::exec_core () at sp_head.cc:2609
+#9 in sp_lex_keeper::reset_lex_and_exec_core () at sp_head.cc:2455
+#10 in sp_instr_stmt::execute () at sp_head.cc:2560
+#11 in sp_head::execute () at sp_head.cc:1077
+#12 in sp_head::execute_procedure () at sp_head.cc:1726
+#13 in mysql_execute_command () at sql_parse.cc:3807
+#14 in mysql_parse () at sql_parse.cc:5274
+#15 in dispatch_command () at sql_parse.cc:896
+#16 in do_command () at sql_parse.cc:662
+#17 in handle_one_connection () at sql_connect.cc:1089
+#18 in start_thread () from /lib/libpthread.so.0
+#19 in clone () from /lib/libc.so.6
+</programlisting>
+
+ <para>
+ In this stack trace,
+ <literal>sp_head::execute_procedure()</literal> at
+ <literal>#12</literal> corresponds to <literal>CALL
+ proc_1();</literal>, while
+ <literal>sp_head::execute_procedure()</literal> at
+ <literal>#6</literal> corresponds to <literal>CALL
+ proc_2();</literal>. In other words, recursive calls in the
+ <emphasis>user</emphasis> SQL code are implemented by performing
+ matching recursive calls in the <emphasis>system</emphasis> C++
+ code (the server).
+ </para>
+
+ <para>
+ This is actually a severe limitation of the implementation,
+ which causes problems for the following reasons:
+ </para>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ User logic can be arbitrarily nested, with a long chain of
+ Stored Programs calling other Stored Programs. The total
+ depth of calls can be greater than one would expect,
+ especially considering that a <literal>VIEW</literal> can
+ invoke a <literal>FUNCTION</literal>, and that a
+ <literal>TRIGGER</literal> can also invoke other
+ <literal>PROCEDURE</literal>, <literal>FUNCTION</literal> or
+ <literal>TRIGGER</literal>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The amount of memory that can be consumed in the
+ <emphasis>stack</emphasis> for a thread is not infinite. In
+ fact, it's quiet limited, since: <literal>{MAX NUMBER OF
+ THREADS} * {MAX THREAD STACK} = {TOTAL STACK}</literal>.
+ Note the catch in the equation here:
+ <emphasis>MAX</emphasis> thread stack, which is dependent of
+ the nesting of stored program in the user SQL code, for the
+ worst case. Since MySQL currently does not use a thread pool
+ but is compiled sometime with a <emphasis>BIG</emphasis>
+ number of threads, this can be a problem affecting
+ scalability.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ As a result, the Stored Program interpreter has to protect
+ itself against stack overflow. This is implemented by
+ <literal>check_stack_overrun()</literal>
+ </para>
+ </listitem>
+
+ </itemizedlist>
+
+ <para>
+ What <emphasis>should</emphasis> be implemented instead, is
+ representing the user <emphasis>SQL stack</emphasis> on the
+ <emphasis>C++ heap</emphasis>, and have the interpreter
+ <emphasis>loop</emphasis> instead of making
+ <emphasis>recursive</emphasis> calls.
+ </para>
+
+ <para>
+ There are also other good reasons to use the heap: for error
+ reporting for example, the current implementation has no way to
+ tell that <quote>proc_2</quote> was called from
+ <quote>proc_1</quote>, since this data is not available to the
+ code: it's hidden in the C++ stack.
+ </para>
+
+ <para>
+ Nesting calls also has some impact on SQL exception handlers.
+ The member <literal>THD::spcont</literal> for the current thread
+ is not pointing to a single <literal>sp_rcontext</literal>, but
+ to a <emphasis>stack</emphasis> of runtime contexts. This is
+ implemented internally as a linked list, with
+ <literal>sp_rcontext::m_prev_runtime_ctx</literal>.
+ </para>
+
+ <para>
+ With the example used, when the code is executing
+ <quote>proc_1</quote>, <literal>THD::spcont</literal> points to
+ the runtime context for <quote>proc_1</quote>. When the code is
+ inside <quote>proc_2</quote>, the current thread
+ <literal>THD::spcont</literal> points to
+ <literal>sp_rcontext{proc_2}</literal>, which member
+ <literal>m_prev_runtime_ctx</literal> points to
+ <literal>sp_rcontext{proc_1}</literal>. This chain allows a
+ parent Stored Program to catch exceptions raised by children
+ Stored Programs.
+ </para>
+
+ </section>
+
+ </section>
+
+</chapter>
| Thread |
|---|
| • svn commit - mysqldoc@docsrva: r5853 - trunk/internals | malff | 12 Apr |