List:General Discussion« Previous MessageNext Message »
From:Li Li Date:November 6 2013 3:33am
Subject:Re: how to create unique key for long varchar?
View as plain text  
I came up with a solution: using special "md5" to deal with conflicts
steps to insert a url
1. insert into table(md5,url) values('md5 of url', url)
2. if get a duplicate entry for primary key
    2.1 select md5,url from table where md5 like 'xxxx%' for update
    2.2 if url really exists, don't need insert anything
    2.3 generate a special md5 for this url starts with xxxx,maybe
xxxx00001 and insert it
    2.4 commit transaction



On Wed, Nov 6, 2013 at 10:56 AM, Li Li <fancyerii@stripped> wrote:
> I prefer your solution in that it's something like Optimistic Locking.
> but the problem is that if I define md5 as unique key and there exists
> 2 different urls with the same md5. I can't insert the second url
> anymore
>
> On Tue, Nov 5, 2013 at 11:55 PM, Dan Nelson <dnelson@stripped> wrote:
>> In the last episode (Nov 05), Li Li said:
>>>     I want to create a table with a long varchar column, maybe it's the url.
>>>     according to dns spec, the url's max length is fixed. but I have
>>> to deal with url having long params such as
>>> a.html?q=aaaaaaaaaaaaaaaaaaaaaaaaaaaa&fl=bbbbbbbbbbbbbbbbbbbbbbbb....
>>>     I want the url is unique when inserting it.
>>>     I googled and found
>>> http://stackoverflow.com/questions/6800866/how-to-store-urls-in-mysql
>>>     this post suggests use md5 of url. But in theory, there will be
>>> conflict that two different urls will have the same md5(even it's
>>> probablitiy is very small). I want to a absolute correct solution.
>>>     one method i can come up with is using select ... for update
>>>     1. begin transaction
>>>     2. select url from tb where md5='xxxx' for update
>>>     3. if the url is not exist, insert into this url; else do nothing
>>
>> It might be more efficient to optimize for the common case here.  The
>> assumption is that an md5 (or sha1 or sha2) hash collision is extremely
>> unlikely, so you could just insert your new row, and if you get a "duplicate
>> entry for primary key" error, then you can "select url from tb where
>> md5='xxxx' ", and compare the retreived url with the one you want to insert.
>>
>> --
>>         Dan Nelson
>>         dnelson@stripped
Thread
how to create unique key for long varchar?Li Li5 Nov
  • Re: how to create unique key for long varchar?Dan Nelson5 Nov
    • RE: how to create unique key for long varchar?Rick James5 Nov
    • Re: how to create unique key for long varchar?Li Li6 Nov
      • Re: how to create unique key for long varchar?Michael Dykman6 Nov
      • Re: how to create unique key for long varchar?Li Li6 Nov