Hi, Eric!
On Feb 18, Eric Jensen wrote:
> Sergei,
>
> I do still want to implement this, although I haven't had time to dedicate
> to it recently. If you could shed any light on my unanswered questions
> that would be great.
yes, I just want to understand the solution you propose (it'd be a pity
to spend a lot of time and efforts implementing something, only to find
out that it doesn't work as expected).
> Interleaving results of unordered component selects was just a proof of
> concept to show myself there wouldn't be any non-starter issues with
> keeping multiple selects "open" within a single query. It had the added
> bonus of me being able to run most of the test suite against those changes.
Ah, okay. I see.
> The reason to do a merge is the case where we want the entire union sorted
> equivalently to the component selects, especially when we only want the
> first n, i.e.:
Exactly.
But note that even when component selects are not explicitly ordered, in
the sense that there is no ORDER BY clause, you can still tell MySQL to
order them as it'll enable you to do the k-way merge. Ultimately, it
depends on the cost of sorting vs. the cost of using a temp table, and
it's up to the optimizer to make the best choice.
> (select blah from foo where x order by z limit n
> union
> select blah from foo where y order by z limit n) order by z limit n
>
> You are of course correct that the optimal solution is not to use a
> temporary table, but to do a proper k-way merge either with pairwise
> comparisons or a heap of the top elements for each of the k
> components. Either of those are O(n log k), as compared to the O(kn
> log kn) sort mysql currently does for the global order by. Even if we
> can achieve an order-by-index optimization on the components, mysql
> can only do a filesort for the global order by.
order by index isn't necessarily better than a filesort, as it often
involves much more random I/O, while filesort tries to access the disk
sequentially.
> The current implementation reads all the component selects into a
> temporary table and then runs an internal select against that to
> produce the result set.
Current *your* implementation ?
If yes - how is it different from what a vanilla MySQL is doing ?
> This is somewhat convenient, as that internal select just has the
> parsed global order-by from the query. Doing a proper merge would
> require the ability to perform the comparisons related to that
> order-by in code, as opposed to just applying the sort like any other
> select. It would also still require some way of exposing the results
> of the merge as a select_result (or whatever those are
> called...forgetting), which is done as a temporary table everywhere
> else we have intermediate results anyways. Both of these can be
> overcome, but I think the best plan is to do something simpler for my
> first version of this.
>
> My plan for using a temporary table with a btree index to do the top-n
> sorting avoids these difficulties, at the expense of building that
> index across all results in that temporary table versus popping them
> off of a heap which limits its size to k. In the average case, that
> is still much better than doing an O(kn log kn) sort.
You want to create a temporary table with a btree index, put all union
data into it, and then retrieve rows in the index order ?
Regards / Mit vielen Grüßen,
Sergei
--
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Sergei Golubchik <serg@stripped>
/ /|_/ / // /\ \/ /_/ / /__ Principal Software Engineer/Server Architect
/_/ /_/\_, /___/\___\_\___/ Sun Microsystems GmbH, HRB München 161028
<___/ Sonnenallee 1, 85551 Kirchheim-Heimstetten
Geschäftsführer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Häring