I don't think an implementation of intra-statement parallelism belongs
in a database server microkernel. However, this question should be
discussed at the Drizzle-discuss forum.
In general, having worked on intra-statement parallelism architecture
and implementation at Dataupia (in the Postgres server grid context),
I find this objective to be unrealistic for MySQL. For N shards you
will need N*(N-1)/2 exchange nodes (one per pair of shards) - asynchronous,
with real-time load balancing, capable of dealing with data skew. And
there are a zillion more
issues to solve, such as not being able to rely on the determinism of
Drizzle has a better shot at intra-statement parallelism - leveraging
Gearman as opposed to doing it inside a Session.
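To put a number on the N*(N-1)/2 figure above, here is a minimal sketch
that only counts one exchange node per unordered pair of shards. The name
exchange_nodes is made up for illustration; it is not part of MySQL,
Drizzle, or Postgres.

```cpp
#include <cstdint>

// Hypothetical helper: number of exchange nodes needed for n shards,
// assuming exactly one exchange node per unordered pair of shards.
std::uint64_t exchange_nodes(std::uint64_t n) {
    // Fewer than two shards means there is nothing to exchange.
    if (n < 2) {
        return 0;
    }
    return n * (n - 1) / 2;
}
```

With 8 shards that is already 28 exchange nodes, and with 64 shards it
is 2016, so the count grows quadratically with the number of shards.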
On Tue, May 12, 2009 at 4:58 PM, Jay Pipes <Jay.Pipes@stripped> wrote:
> Sergei Golubchik wrote:
>>>>> Without doing this, and using encapsulation so that a THD can have
>>>>> multiple Statements, it will be very difficult to work on any future
>>>>> parallelization efforts.
>>>> You mean, a Statement can have multiple THDs, I suppose :)
>>> No he means what he is saying. I don't know why you would want a
>>> statement shared across multiple THD, but having a THD be able to handle
>>> multiple statements means that you can do asynchronous queries within a
>>> single connection.
>> Ah, okay. I see.
>> I thought that "parallelization", that Jay mentioned, means executing
>> parts of a single statement in different threads - which, indeed, may
>> need two THDs sharing the same Statement.
> Yes, what Brian says...once the THD is distinguished from a pthread,
> intra-statement parallelization is possible. We've only just begun this
> step in Drizzle (a Session is now no longer intricately linked to a pthread
> in Drizzle) but there's clearly a ton more to do. :) I know Alex Esterkin
> thinks there is no reason to do such a thing, but perhaps he just needs an
> example :)
> Imagine a Session which sends a few long-running SELECTs in the same client
> connection. Currently, because the THD is linked to a single pthread in
> MySQL, these SELECTs will not only block each other, but will be executed in
> order. What if there was no reason to do so? If a Session contained a
> vector of Statement objects, it could parse and optimize all three
> statements and decide to send two of them off to other scheduler
> threads for execution, essentially parallelizing the operation (particularly
> on a distributed node architecture...)
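To make the scenario above concrete, here is a rough sketch. It is not
Drizzle or MySQL code: Session, Statement, Result, execute() and run_all()
are all hypothetical names, and std::async merely stands in for whatever
scheduler threads a real server would use. Each queued statement runs as
its own task, and results are collected in submission order.

```cpp
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT the MySQL THD or Drizzle Session types.
struct Statement {
    std::string sql;
};

struct Result {
    std::string sql;
    bool ok;
};

// Placeholder for parsing, optimizing, and executing a single statement.
// Here it only reports success for any non-empty statement text.
Result execute(const Statement& stmt) {
    return Result{stmt.sql, !stmt.sql.empty()};
}

class Session {
public:
    // Queue a statement on this session without running it yet.
    void add(std::string sql) {
        statements_.push_back(Statement{std::move(sql)});
    }

    // Launch every queued statement as an asynchronous task, then wait for
    // all of them. std::launch::async asks for each task to run as if on
    // its own thread, so the statements do not have to run one by one.
    std::vector<Result> run_all() const {
        std::vector<std::future<Result>> pending;
        for (const Statement& stmt : statements_) {
            pending.push_back(std::async(std::launch::async,
                                         [&stmt] { return execute(stmt); }));
        }
        std::vector<Result> results;
        for (auto& f : pending) {
            results.push_back(f.get());  // blocks until that task finishes
        }
        return results;
    }

private:
    std::vector<Statement> statements_;
};
```

This only shows statements on one connection running independently of
each other. It says nothing about splitting a single statement across
threads, which is the harder problem.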
> Sure, like Alex mentioned, Drizzle's protocol supports parallel operations
> (well, not really, it is non-blocking operations, but still...similar
> concept). But, separating the notion of a thread from a Statement means
> that we can achieve similar theoretical points for parallel operations. The
> more options/points for parallel opportunities, the better we can scale, no?
> MySQL Internals Mailing List
> For list archives: http://lists.mysql.com/internals