At 05:31 PM 10/24/2011, Reindl Harald wrote:
>Am 24.10.2011 23:31, schrieb mos:
> > At 11:32 AM 10/24/2011, Reindl Harald wrote:
> >> Am 24.10.2011 18:02, schrieb mos:
> >> > At 10:34 AM 10/24/2011, you wrote:
> >> >> select id from table order by rand() limit 1;
> >> >> is doing as example a dumb temporary table with the full size
> >> >
> >> > Because it has to sort the entire table, then it returns the
> one row. This of course is extremely inefficient. :)
> >> > You need to choose a random row by using an auto-inc field.
> Something like:
> >> >
> >> > select id from table where id>=myrandomnum limit 1
> >> but this is TOTALLY braindead if "id" is a primary-KEY with auto-increment
> > It all depends on how many holes you have in the sequence and how
> random you want the selections to be. If there
> > are no holes then it will work. You need of course to get the
> first and last id and generate "myrandomnum" within
> > that range. If there are a lot of holes in the sequence then
> build another table with the columns bin and an
> > autoinc column and pick one of those rows randomly. Regenerate
> the table once an hour or once a day.
> > Either way it is going to be a LOT FASTER than sorting the entire table
>and why in the world is with the query above the WHOLE table
>copied in a temp-table while fecth the whole id-list in a
>php-array and take a random one is more than 1000 times faster?
>the implementation if "order by rand()" is totally braindead
It is not "braindead". You told MySQL to sort by rand() which is a
non-indexed column. It needs to assign a value to each row of the
result set (all ids of the table) and sort it to get the lowest
random number. This is very inefficient for large tables.
That is why I recommended using:
select id from table where id>=myrandomnum limit 1
provided id is uniformly distributed (not too many holes). If there
are a lot of holes in the distribution then copy the id's to a
separate table with its own auto-in rcd id. This will take about 1-2
seconds. Now you can select a random row from this table using the
same SQL statement above and get a random id in about 1 ms. I didn't
say to load all of the ids into the separate table (or a PHP array)
every time you wanted a random id. That would be "braindead". You
will need to update this secondary table whenever your table gets
rows inserted or deleted (or reload the contents of the secondary
table every few minutes).
Now there is a 3rd way to get a random row that is easier to
implement. Add an extra column to your table called RndNum Float.
When a row is added to the table, assign a random number to the
column. This column needs to be indexed. Now you can execute:
select id from table where RndNum >= rand() limit 1
and this will return a random id in a couple of ms. You should use a
stored procedure and handle the occasions when no row is returned
because rand() that was generated was larger than the largest RndNum
in the table. Depending on how random you want the results, you can
make the index unique and handle the occurrences when a duplicate
rand() number is generated.
>SELECT pri_key FROM table ORDER BY RAND() LIMIT 1;
>let's say 'pri_key' as a autoincrement integer
>let's say the table has some blob-fields and is 4 GB large
>fetch ONE random id via mysql will explode your webserver
>because 4 GB data is copied - there is no valid reason for
>such crappy implementation, really no single reason
The blob field is not included in the query so it won't be copied and
the server will not explode. If it worked the way you said it did,
then we would have exploding web servers with so much blob material
encircling every small that not even Steve McQueen could eradicate it.
What does happen is a temporary table is created with all of the
pri_key values and is sorted on Rand() and a single pri_key is
returned. Since pri_key is an index, it only needs to access the
index file and won't even bother accessing the data in the table.
And that's the way it is.