From: Alex Esterkin
Date: May 13 2009 12:25am
Subject: Re: MySQL Reengineering Project
List-Archive: http://lists.mysql.com/internals/36663
Message-Id: <81f5410f0905121725t48694dacm96d29fb90b3e380@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Jay,

I don't think intra-statement parallelism implementation belongs in a
database server microkernel. However, this aspect should be discussed at
the Drizzle-discuss forum.

In general, having worked on intra-statement parallelism architecture and
implementation at Dataupia (in the Postgres server grid context), I find
this objective to be unrealistic for MySQL. For N shards you will need
N*(N-1)/2 exchange nodes - asynchronous, with real-time load balancing,
capable of dealing with data skew - and there are a zillion more issues to
solve, such as not being able to rely on the determinism of demand-driven
execution.

Drizzle has a better shot at intra-statement parallelism - leveraging
Gearman as opposed to doing it inside a Session.

Regards,

Alex Esterkin

On Tue, May 12, 2009 at 4:58 PM, Jay Pipes wrote:
> Sergei Golubchik wrote:
>
>>>>> Without doing this, and using encapsulation so that a THD can have
>>>>> multiple Statements, it will be very difficult to work on any future
>>>>> parallelization efforts.
>>>>
>>>> You mean, a Statement can have multiple THDs, I suppose :)
>>>
>>> No he means what he is saying. I don't know why you would want a
>>> statement shared across multiple THD, but having a THD be able to handle
>>> multiple statements means that you can do asynchronous queries within a
>>> single connection.
>>
>> Ah, okay. I see.
>>
>> I thought that "parallelization", that Jay mentioned, means executing
>> parts of a single statement in different threads - which, indeed, may
>> need two THDs sharing the same Statement.
>
> Yes, what Brian says...once the THD is distinguished from a pthread,
> intra-statement parallelization is possible.
> We've only just begun this step in Drizzle (a Session is now no longer
> intricately linked to a pthread in Drizzle) but there's clearly a ton more
> to do. :) I know Alex Esterkin thinks there is no reason to do such a
> thing, but perhaps he just needs an example :)
>
> Imagine a Session which sends a few long-running SELECTs in the same client
> connection. Currently, because the THD is linked to a single pthread in
> MySQL, these SELECTs will not only block each other, but will be executed in
> order. What if there was no reason to do so? If a Session contained a
> vector of Statement objects, it could parse and optimize all three
> statements and decide to send two of them off to other scheduler threads
> for execution, essentially parallelizing the operation (particularly on a
> distributed node architecture...)
>
> Sure, like Alex mentioned, Drizzle's protocol supports parallel operations
> (well, not really, it is non-blocking operations, but still...similar
> concept). But, separating the notion of a thread from a Statement means
> that we can achieve similar theoretical points for parallel operations. The
> more options/points for parallel opportunities, the better we can scale, no?
>
> Cheers!
>
> Jay
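
Two points from this exchange can be sketched in a few lines of Python. The
sketch is only an illustration under stated assumptions, not MySQL or Drizzle
code: `exchange_pairs`, `Statement`, and `Session` are hypothetical names, the
N*(N-1)/2 figure is read here as one exchange per unordered pair of shards, and
a sleep stands in for a long-running SELECT.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
import time


def exchange_pairs(n_shards):
    """List every unordered pair of shards.

    Assumption: one exchange per pair, which is one way to arrive at
    the N*(N-1)/2 count quoted in the message.
    """
    return list(combinations(range(n_shards), 2))


class Statement:
    """Hypothetical stand-in for a parsed and optimized statement."""

    def __init__(self, sql, cost_seconds):
        self.sql = sql
        self.cost_seconds = cost_seconds  # pretend execution time

    def execute(self):
        time.sleep(self.cost_seconds)  # simulate a long-running SELECT
        return f"done: {self.sql}"


class Session:
    """Hypothetical session that owns several statements, not one thread."""

    def __init__(self, workers=3):
        self.statements = []
        self.workers = workers

    def add(self, sql, cost_seconds):
        self.statements.append(Statement(sql, cost_seconds))

    def run_serial(self):
        # One thread per connection: statements run strictly in order.
        return [s.execute() for s in self.statements]

    def run_parallel(self):
        # Statements are handed to worker threads; results are collected
        # in submission order, so the client sees the same ordering.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            futures = [pool.submit(s.execute) for s in self.statements]
            return [f.result() for f in futures]
```

With 8 shards, `len(exchange_pairs(8))` is 28, which matches 8*7/2. A `Session`
holding three statements returns the same results from `run_parallel()` as from
`run_serial()`; only the wall-clock time differs. The sketch leaves out
everything Alex lists as hard: load balancing, data skew, and ordering
guarantees across exchanges.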