List:General Discussion« Previous MessageNext Message »
From:mos Date:October 25 2011 3:45am
Subject:Re: large temp files created by mysql
View as plain text  
At 05:31 PM 10/24/2011, Reindl Harald wrote:


>Am 24.10.2011 23:31, schrieb mos:
> > At 11:32 AM 10/24/2011, Reindl Harald wrote:
> >
> >
> >> Am 24.10.2011 18:02, schrieb mos:
> >> > At 10:34 AM 10/24/2011, you wrote:
> >> >> select id from table order by rand() limit 1;
> >> >> is doing as example a dumb temporary table with the full size
> >> >
> >> > Because it has to sort the entire table, then it returns the 
> one row. This of course is extremely inefficient. :)
> >> > You need to choose a random row by using an auto-inc field. 
> Something like:
> >> >
> >> > select id from table where id>=myrandomnum limit 1
> >>
> >> but this is TOTALLY braindead if "id" is a primary-KEY with auto-increment
> >
> > It all depends on how many holes you have in the sequence and how 
> random you want the selections to be. If there
> > are no holes then it will work. You need of course to get the 
> first and last id and generate "myrandomnum" within
> > that range. If there are a lot of holes in the sequence then 
> build another table with the columns bin and an
> > autoinc column and pick one of those rows randomly. Regenerate 
> the table once an hour or once a day.
> >
> > Either way it is going to be a LOT FASTER than sorting the entire table
>
>and why in the world is with the query above the WHOLE table
>copied in a temp-table while fecth the whole id-list in a
>php-array and take a random one is more than 1000 times faster?
>
>the implementation if "order by rand()" is totally braindead

It is not "braindead". You told MySQL to sort by rand() which is a 
non-indexed column. It needs to assign a value to each row of the 
result set (all ids of the table) and sort it to get the lowest 
random number. This is very inefficient for large tables.

That is why I recommended using:

select id from table where id>=myrandomnum limit 1

provided id is uniformly distributed (not too many holes).  If there 
are a lot of holes in the distribution then copy the id's to a 
separate table with its own auto-in rcd id. This will take about 1-2 
seconds. Now you can select a random row from this table using the 
same SQL statement above and get a random id in about 1 ms. I didn't 
say to load all of the ids into the separate table (or a PHP array) 
every time you wanted a random id.  That would be "braindead". You 
will need to update this secondary table whenever your table gets 
rows inserted or deleted (or reload the contents of the secondary 
table every few minutes).

Now there is a 3rd way to get a random row that is easier to 
implement. Add an extra column to your table called RndNum Float. 
When a row is added to the table, assign a random number to the 
column. This column needs to be indexed. Now you can execute:

select id from table where RndNum >= rand() limit 1

and this will return a random id in a couple of ms. You should use a 
stored procedure and handle the occasions when no row is returned 
because rand() that was generated was larger than the largest RndNum 
in the table. Depending on how random you want the results, you can 
make the index unique and handle the occurrences when a duplicate 
rand() number is generated.


>_______________
>
>SELECT pri_key FROM table ORDER BY RAND() LIMIT 1;
>
>let's say 'pri_key' as a autoincrement integer
>let's say the table has some blob-fields and is 4 GB large
>
>fetch ONE random id via mysql will explode your webserver
>because 4 GB data is copied - there is no valid reason for
>such crappy implementation, really no single reason

The blob field is not included in the query so it won't be copied and 
the server will not explode. If it worked the way you said it did, 
then we would have exploding web servers with so much blob material 
encircling every small that not even Steve McQueen could eradicate it.

What does happen is a temporary table is created with all of the 
pri_key values and is sorted on Rand() and a single pri_key is 
returned. Since pri_key is an index, it only needs to access the 
index file and won't even bother accessing the data in the table.

And that's the way it is.

Mike


Thread
large temp files created by mysqlJoey L24 Oct
  • Re: large temp files created by mysqlJoey L24 Oct
    • Re: large temp files created by mysqlReindl Harald24 Oct
      • Re: large temp files created by mysqlmos24 Oct
        • Re: large temp files created by mysqlJoey L24 Oct
          • Re: large temp files created by mysqlReindl Harald24 Oct
        • Re: large temp files created by mysqlReindl Harald24 Oct
          • Re: large temp files created by mysqlmos24 Oct
            • Re: large temp files created by mysqlReindl Harald25 Oct
              • Re: large temp files created by mysqlmos25 Oct
                • Re: large temp files created by mysqlReindl Harald25 Oct
                  • Re: large temp files created by mysqlmos26 Oct
                    • Re: large temp files created by mysqlReindl Harald26 Oct
Re: large temp files created by mysqlAntony T Curtis24 Oct
  • Re: large temp files created by mysqlJoey L24 Oct
    • Re: large temp files created by mysqlJohan De Meersman24 Oct
Re: large temp files created by mysqlJan Steinman31 Oct
  • Re: large temp files created by mysqlJan Steinman31 Oct
    • Re: large temp files created by mysqlLuis Daniel Lucio Quiroz24 May
      • RE: large temp files created by mysqlRick James24 May
        • RE: large temp files created by mysqlLuis Daniel Lucio Quiroz24 May