Alex Esterkin wrote:
> Jay,
>
> I don't think intra-statement parallelism implementation belongs in a
> database server microkernel. However, this aspect should be discussed
> at the Drizzle-discuss forum.
You're absolutely correct, Alex. I mistakenly wrote intra-statement
when I meant to say *intra-Session*.
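
To make the intra-Session idea concrete, here is a rough sketch of a
Session that owns several Statements and hands each one to its own
task instead of running them serially on one thread. Everything in it
is an assumption made for illustration: the Session and Statement
classes, add(), execute_all() and the fake execute() are hypothetical
and are not Drizzle or MySQL code. It uses std::async as a stand-in
for real scheduler threads.

```cpp
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed and optimized statement.
struct Statement {
    std::string sql;

    // Placeholder for real execution; it only returns a tagged string.
    std::string execute() const { return "done: " + sql; }
};

// Hypothetical Session that owns several Statements and is not tied
// to a single thread of execution.
class Session {
public:
    void add(std::string sql) {
        statements_.push_back(Statement{std::move(sql)});
    }

    // Launch every statement as its own asynchronous task, then
    // collect the results in submission order.
    std::vector<std::string> execute_all() const {
        std::vector<std::future<std::string>> pending;
        pending.reserve(statements_.size());
        for (const Statement& stmt : statements_) {
            // stmt outlives the task: all futures are joined below,
            // before this function returns.
            pending.push_back(std::async(std::launch::async,
                                         [&stmt] { return stmt.execute(); }));
        }

        std::vector<std::string> results;
        results.reserve(pending.size());
        for (auto& task : pending) {
            results.push_back(task.get());  // blocks until that task finishes
        }
        return results;
    }

private:
    std::vector<Statement> statements_;
};
```

Adding three SELECTs to one Session and calling execute_all() gives
back three results in the order they were submitted, even though the
statements may have run concurrently. A real server would hand the
work to its own scheduler rather than std::async, and would have to
deal with locking and transaction state, which this sketch ignores.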
> In general, having worked on intra-statement parallelism architecture
> and implementation at Dataupia (in the Postgres server grid context),
> I find this objective to be unrealistic for MySQL. For N shards you
> will need N*(N-1)/2 exchange nodes - asynchronous, with real time load
> balancing, capable of dealing with data skew, and a zillion more
> issues to solve, such as not being able to rely on the determinism of
> demand-driven execution.
>
> Drizzle has a better shot at intra-statement parallelism - leveraging
> Gearman as opposed to doing it inside a Session.
>
> Regards,
>
> Alex Esterkin
>
> On Tue, May 12, 2009 at 4:58 PM, Jay Pipes <Jay.Pipes@stripped> wrote:
>> Sergei Golubchik wrote:
>>
>>>>>> Without doing this, and using encapsulation so that a THD can have
>>>>>> multiple Statements, it will be very difficult to work on any future
>>>>>> parallelization efforts.
>>>>> You mean, a Statement can have multiple THDs, I suppose :)
>>>> No he means what he is saying. I don't know why you would want a
>>>> statement shared across multiple THD, but having a THD be able to handle
>>>> multiple statements means that you can do asynchronous queries within a
>>>> single connection.
>>> Ah, okay. I see.
>>>
>>> I thought that "parallelization", that Jay mentioned, means executing
>>> parts of a single statement in different threads - which, indeed, may
>>> need two THDs sharing the same Statement.
>> Yes, what Brian says...once the THD is distinguished from a pthread,
>> intra-statement parallelization is possible. We've only just begun this
>> step in Drizzle (a Session is now no longer intricately linked to a pthread
>> in Drizzle) but there's clearly a ton more to do. :) I know Alex Esterkin
>> thinks there is no reason to do such a thing, but perhaps he just needs an
>> example :)
>>
>> Imagine a Session which sends a few long-running SELECTs in the same client
>> connection. Currently, because the THD is linked to a single pthread in
>> MySQL, these SELECTs will not only block each other, but will be executed in
>> order. What if there was no reason to do so? The three Statements, if a
>> Session contained a vector of Statement objects, could parse and optimize
>> all three statements and decide to send two of them off to other scheduler
>> threads for execution, essentially parallelizing the operation (particularly
>> on a distributed node architecture...)
>>
>> Sure, like Alex mentioned, Drizzle's protocol supports parallel operations
>> (well, not really, it is non-blocking operations, but still...similar
>> concept). But, separating the notion of a thread from a Statement means
>> that we can achieve similar theoretical points for parallel operations. The
>> more options/points for parallel opportunities, the better we can scale, no?
>>
>> Cheers!
>>
>> Jay
>>
>> --
>> MySQL Internals Mailing List
>> For list archives: http://lists.mysql.com/internals
>> To unsubscribe:
>> http://lists.mysql.com/internals?unsub=1