> From: mos <mos99@stripped>
>
>
> At 10:34 AM 10/24/2011, you wrote:
>> select id from table order by rand() limit 1;
>> is doing as example a dumb temporary table with the full size
>
> Because it has to sort the entire table, then it returns the one row. This of course
> is extremely inefficient. :)
That is absolutely incredible and counter-intuitive, and (as you say) extremely
inefficient!
This is used everywhere. Perhaps it is one of the biggest "anti-patterns" in SQL. I just
checked two different SQL "cookbook" sites, and they both recommend ORDER BY RAND().
I just googled around a bit, and found that putting RAND() in the WHERE clause is very
efficient:
SELECT id FROM table WHERE RAND() < 0.01 LIMIT 1
The comparison constant can be optimized for the number of rows you have. The above
returns the first record of 1% of the table. If you have a million rows, you might want
to bump that to something like 100 parts per million or so.
But really, folks, this is something so ubiquitous and so recommended, why can't the query
optimizer look out for ORDER BY RAND() and simply skip the table sort and just grab some
record? (Hopefully using something better than Knuth's LCRNG...)
----------------
Learning to think wholistically requires an overriding, or reversal, of much of the
cultural heritage of the last few hundred years. -- David Holmgren
:::: Jan Steinman, EcoReality Co-op ::::