From: Jan Steinman Date: October 31 2011 1:51am Subject: Re: large temp files created by mysql List-Archive: http://lists.mysql.com/mysql/226200 Message-Id: <9D71EFE2-F169-41C8-BE7C-A547E8712255@Bytesmiths.com> MIME-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable > From: mos >=20 >=20 > At 10:34 AM 10/24/2011, you wrote: >> select id from table order by rand() limit 1; >> is doing as example a dumb temporary table with the full size >=20 > Because it has to sort the entire table, then it returns the one row. = This of course is extremely inefficient. :) That is absolutely incredible and counter-intuitive, and (as you say) = extremely inefficient! This is used everywhere. Perhaps it is one of the biggest = "anti-patterns" in SQL. I just checked two different SQL "cookbook" = sites, and they both recommend ORDER BY RAND(). I just googled around a bit, and found that putting RAND() in the WHERE = clause is very efficient: SELECT id FROM table WHERE RAND() < 0.01 LIMIT 1 The comparison constant can be optimized for the number of rows you = have. The above returns the first record of 1% of the table. If you have = a million rows, you might want to bump that to something like 100 parts = per million or so. But really, folks, this is something so ubiquitous and so recommended, = why can't the query optimizer look out for ORDER BY RAND() and simply = skip the table sort and just grab some record? (Hopefully using = something better than Knuth's LCRNG...) ---------------- Learning to think wholistically requires an overriding, or reversal, of = much of the cultural heritage of the last few hundred years. -- David = Holmgren :::: Jan Steinman, EcoReality Co-op ::::