From: Jay Pipes
Date: May 13 2009 12:09pm
Subject: Re: MySQL Reengineering Project
List-Archive: http://lists.mysql.com/internals/36669
Message-Id: <4A0AB86E.5000609@sun.com>
MIME-Version: 1.0
Content-Type: text/plain; CHARSET=US-ASCII; format=flowed
Content-Transfer-Encoding: 7BIT

Alex Esterkin wrote:
> Jay,
>
> I don't think intra-statement parallelism implementation belongs in a
> database server microkernel. However, this aspect should be discussed
> at the Drizzle-discuss forum.

You're absolutely correct, Alex. I mistakenly wrote intra-statement where
I meant to say *intra-Session*.

> In general, having worked on intra-statement parallelism architecture
> and implementation at Dataupia (in the Postgres server grid context),
> I find this objective to be unrealistic for MySQL. For N shards you
> will need N*(N-1)/2 exchange nodes - asynchronous, with real time load
> balancing, capable of dealing with data skew, and a zillion more
> issues to solve, such as not being able to rely on the determinism of
> demand-driven execution.
>
> Drizzle has a better shot at intra-statement parallelism - leveraging
> Gearman as opposed to doing it inside a Session.
>
> Regards,
>
> Alex Esterkin
>
> On Tue, May 12, 2009 at 4:58 PM, Jay Pipes wrote:
>> Sergei Golubchik wrote:
>>
>>>>>> Without doing this, and using encapsulation so that a THD can have
>>>>>> multiple Statements, it will be very difficult to work on any future
>>>>>> parallelization efforts.
>>>>> You mean, a Statement can have multiple THDs, I suppose :)
>>>> No he means what he is saying. I don't know why you would want a
>>>> statement shared across multiple THD, but having a THD be able to handle
>>>> multiple statements means that you can do asynchronous queries within a
>>>> single connection.
>>> Ah, okay. I see.
>>>
>>> I thought that "parallelization", that Jay mentioned, means executing
>>> parts of a single statement in different threads - which, indeed, may
>>> need two THDs sharing the same Statement.
>> Yes, what Brian says...once the THD is distinguished from a pthread,
>> intra-statement parallelization is possible. We've only just begun this
>> step in Drizzle (a Session is now no longer intricately linked to a pthread
>> in Drizzle) but there's clearly a ton more to do. :) I know Alex Esterkin
>> thinks there is no reason to do such a thing, but perhaps he just needs an
>> example :)
>>
>> Imagine a Session which sends a few long-running SELECTs in the same client
>> connection. Currently, because the THD is linked to a single pthread in
>> MySQL, these SELECTs will not only block each other, but will be executed in
>> order. What if there was no reason to do so? The three Statements, if a
>> Session contained a vector of Statement objects, could parse and optimize
>> all three statements and decide to send two of them off to other scheduler
>> threads for execution, essentially parallelizing the operation (particularly
>> on a distributed node architecture...)
>>
>> Sure, like Alex mentioned, Drizzle's protocol supports parallel operations
>> (well, not really, it is non-blocking operations, but still...similar
>> concept). But, separating the notion of a thread from a Statement means
>> that we can achieve similar theoretical points for parallel operations. The
>> more options/points for parallel opportunities, the better we can scale, no?
>>
>> Cheers!
>>
>> Jay
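
The quoted example of a Session holding a vector of Statement objects can be
sketched roughly as follows. This is only an illustration in Python, not
MySQL or Drizzle code: the Statement and Session classes, the prepare() and
execute() methods, and the fake cost estimate are all hypothetical names
invented here, and a ThreadPoolExecutor stands in for the "other scheduler
threads". The sketch prepares every statement first, hands all but one to the
pool, and runs the remaining one on the session's own thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One statement owned by a Session (hypothetical, not a Drizzle class)."""
    sql: str
    cost: float = 0.0  # pretend optimizer estimate, in seconds

    def prepare(self) -> None:
        # Stand-in for parse + optimize: pretend every SELECT is slow.
        is_select = self.sql.lstrip().upper().startswith("SELECT")
        self.cost = 0.05 if is_select else 0.0

    def execute(self) -> str:
        time.sleep(self.cost)  # simulate the long-running work
        return f"done: {self.sql}"


@dataclass
class Session:
    """A client session that is not tied to a single worker thread."""
    statements: list = field(default_factory=list)

    def run(self, scheduler: ThreadPoolExecutor) -> list:
        # Parse and optimize everything before deciding where to run it.
        for stmt in self.statements:
            stmt.prepare()
        if not self.statements:
            return []
        first, rest = self.statements[0], self.statements[1:]
        # Send all but the first statement to other scheduler threads.
        futures = [scheduler.submit(stmt.execute) for stmt in rest]
        # Run the first statement on the session's own thread meanwhile.
        results = [first.execute()]
        # Collect results in the order the statements were submitted.
        results.extend(f.result() for f in futures)
        return results
```

With three SELECTs and a pool of two workers, run() returns the three results
in statement order while the simulated work overlaps instead of running
back to back. Python threads are used here only to show the dispatch
structure; real scheduling, result delivery over the client protocol, and
transaction semantics are outside what this sketch covers.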