I came up with a solution: using special "md5" to deal with conflicts
steps to insert a url
1. insert into table(md5,url) values('md5 of url', url)
2. if get a duplicate entry for primary key
2.1 select md5,url from table where md5 like 'xxxx%' for update
2.2 if url really exists, don't need insert anything
2.3 generate a special md5 for this url starts with xxxx,maybe
xxxx00001 and insert it
2.4 commit transaction
On Wed, Nov 6, 2013 at 10:56 AM, Li Li <fancyerii@stripped> wrote:
> I prefer your solution in that it's something like Optimistic Locking.
> but the problem is that if I define md5 as unique key and there exists
> 2 different urls with the same md5. I can't insert the second url
> On Tue, Nov 5, 2013 at 11:55 PM, Dan Nelson <dnelson@stripped> wrote:
>> In the last episode (Nov 05), Li Li said:
>>> I want to create a table with a long varchar column, maybe it's the url.
>>> according to dns spec, the url's max length is fixed. but I have
>>> to deal with url having long params such as
>>> I want the url is unique when inserting it.
>>> I googled and found
>>> this post suggests use md5 of url. But in theory, there will be
>>> conflict that two different urls will have the same md5(even it's
>>> probablitiy is very small). I want to a absolute correct solution.
>>> one method i can come up with is using select ... for update
>>> 1. begin transaction
>>> 2. select url from tb where md5='xxxx' for update
>>> 3. if the url is not exist, insert into this url; else do nothing
>> It might be more efficient to optimize for the common case here. The
>> assumption is that an md5 (or sha1 or sha2) hash collision is extremely
>> unlikely, so you could just insert your new row, and if you get a "duplicate
>> entry for primary key" error, then you can "select url from tb where
>> md5='xxxx' ", and compare the retreived url with the one you want to insert.
>> Dan Nelson