List:General Discussion« Previous MessageNext Message »
From:Jerry Schwartz Date:July 14 2011 2:50pm
Subject:RE: How to Shuffle data
View as plain text  
There are a couple of problems with using any database for doing this.

- Rows in a table are inherently in no particular order. That means they are 
neither sorted nor random.
- Depending upon the keys you are using (an auto-increment field for example), 
you might be able to select a random row; but that wouldn't guarantee that you 
wouldn't get the same row twice.

There is a way around this that sounds tempting, but it would be hideously 
impractical:

1. Create a copy of your original table, `t2`, with an auto-increment index 
`aui` (either from the original table or created at this point.
2. Create yet another table, `t3`, with the same structure but no data in it.
3. Select a random record from `t2` using
   SELECT ROUND(RAND()*MAX(`t2`.`aui`)) FROM `t2` INTO @RANDREC;
4. Copy that randomly-selected record into `t3` using
   INSERT INTO `t3` SELECT * FROM `t2` WHERE `t2`.`aui` = @RANDREC;
5. Remove the selected record from `t2` using
   DELETE FROM `t2` WHERE `t2`.`aui` = @RANDREC;
6. Loop back to step 3 until `t2` is empty.

Here's why, tempting as it is, it wouldn't work: as the number of records in 
`t2` dwindles, it will become less and less likely that @RANDREC will actually 
match an existing value of `t2`.`aui`. By the time you got down to a handful 
of records, it might take years to find one.

Now if the LIMIT clause accepted variables instead of constants, you could 
replace steps 3 and 4 with something like this:

   SELECT ROUND(RAND()*COUNT(`t2`.`aui`)) INTO @RANDSTART;
   SELECT FROM `t2` LIMIT @RANDSTART, 1;

And that would actually work (give or take some corner cases I haven't thought 
about very hard).

In point of fact, that's basically what you need to do whether you do it in 
MySQL or in an external program. The difference is that in an external program 
you can effectively renumber your rows as you remove them, so that it won't 
get harder and harder to find an actual row. Even then, a random shuffle is 
likely to be time-consuming.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: jerry@stripped
Web site: www.the-infoshop.com

>-----Original Message-----
>From: Adarsh Sharma [mailto:adarsh.sharma@stripped]
>Sent: Wednesday, July 13, 2011 9:59 AM
>To: walter@stripped
>Cc: mysql@stripped
>Subject: Re: How to Shuffle data
>
>Do i need to move output to a CSV file & then write a Java or C code for it.
>
>I think a procedure can do it easily , If I know the proper function for
>that.
>
>Thanks
>
>walter@stripped wrote:
>> What you want to do is not shuffle, but reverse the domain order, starting
>> from the TLD (top level domain). You may pick the string functions you
>> need from the string functions page of the manual, but the problem is that
>> there may be any number of subdomains, so that in principle you would need
>> a loop construct. If I were in your position, I wouldn't use MySQL to do
>> this.
>> ciao
>> Walter
>>
>>
>>> Dear all,
>>>
>>> I have million of sites stored in url column of a mysql table.
>>>
>>> Some few examples are :-
>>>
>>> www.facebook.com/home
>>> adelaide.yourguide.com/news/local/news/entertainment/cd-review-day-and-age-
>the-killers/1401702.aspx
>>> abclive.in/abclive_business/2393.html
>>> abclive.in/abclive_business/assocham_manufacturing_companies.html
>>> abclive.in/abclive_business/b-ramalinga-raju-satyam-financial-
>irregularities.html
>>> aktualne.centrum.cz/report/krimi/clanek.phtml?id=635342
>>> aktualne.centrum.cz/report/krimi/clanek.phtml?id=635306
>>>
>>> I want to take the output in a tsv file the sites url in the below forms :
>>>
>>> com.faebook.com/home
>>> com.yourguide.adelaide/news/local/news/entertainment/cd-review-day-and-age-
>the-killers/1401702.aspx
>>> in.abclive/abclive_business/2393.html
>>> in.abclive/abclive_business/assocham_manufacturing_companies.html
>>> in.abclive/abclive_business/b-ramalinga-raju-satyam-financial-
>irregularities.html
>>> cz.centrum.aktualne/report/krimi/clanek.phtml?id=635306
>>> cz.centrum.aktualne/report/krimi/clanek.phtml?id=635342
>>>
>>> I need to shuffle the . words . Is there any in built function in mysql
>>> to achieve this.
>>>
>>> Thanks
>>>
>>>
>>> --
>>> MySQL General Mailing List
>>> For list archives: http://lists.mysql.com/mysql
>>> To unsubscribe:
>>> http://lists.mysql.com/mysql?unsub=1
>>>
>>>
>>
>>
>>
>>




Thread
How to Shuffle dataAdarsh Sharma13 Jul
  • Re: How to Shuffle dataReindl Harald13 Jul
  • Re: How to Shuffle datawalter13 Jul
    • Re: How to Shuffle dataAdarsh Sharma13 Jul
      • RE: How to Shuffle dataJerry Schwartz14 Jul
        • Re: How to Shuffle datashawn wilson14 Jul
      • Re: How to Shuffle datahsv15 Jul
  • Re: How to Shuffle datahsv13 Jul
  • Re: How to Shuffle datashawn wilson14 Jul
  • Re: How to Shuffle dataRaj Shekhar14 Jul