It could even be done in a mixed way: store, say, 512 bits of the key and
512 bits of its hash.
That way locality is preserved in most cases, while the index stays
unique, compact, and fast.
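As a rough sketch of that mixed scheme (using a smaller 64-bit prefix plus a 64-bit hash for brevity, rather than the 512/512 split mentioned above; the function name and sizes are illustrative, not from any MySQL internals):

```python
import hashlib

def mixed_key(value: bytes, prefix_len: int = 8) -> bytes:
    """Illustrative sketch: a fixed-width key built from a prefix of the
    original value (preserving sort locality) plus part of a hash of the
    full value (keeping the key unique with high probability)."""
    # Pad short values so every key has the same width.
    prefix = value[:prefix_len].ljust(prefix_len, b"\x00")
    digest = hashlib.md5(value).digest()[:8]  # 64 bits of the hash
    return prefix + digest                    # 16 bytes total

k1 = mixed_key(b"http://example.com/page1")
k2 = mixed_key(b"http://example.com/page2")
# Nearby values share the prefix, so they sort near each other...
assert k1[:8] == k2[:8]
# ...yet the hash half keeps the full keys distinct.
assert k1 != k2
```

Values that share a prefix land next to each other in the index, which is where the locality comes from; the hash half handles uniqueness among them.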
Rick James wrote:
> I've seen it done. In fact, they stood on their head to turn half of an MD5
> (of an arbitrarily long blob) into a BIGINT. For under a million rows, there
> is not too much risk of duplicate BIGINTs when the data actually differs.
> The full MD5 is virtually foolproof. (If you had 9e12 blobs, there would be
> only about one chance in 9e12 of an invalid collision.)
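Folding half an MD5 into a BIGINT as described might look like this (an illustrative sketch, not the exact code referred to above):

```python
import hashlib

def md5_bigint(data: bytes) -> int:
    """Fold the first 8 bytes of an MD5 digest into a signed 64-bit
    integer, so it fits a MySQL BIGINT column (illustrative sketch)."""
    digest = hashlib.md5(data).digest()        # 16 bytes (128 bits)
    # Interpret the first half as a signed big-endian 64-bit integer.
    return int.from_bytes(digest[:8], "big", signed=True)

h = md5_bigint(b"some arbitrarily long blob ...")
assert -2**63 <= h < 2**63  # within signed BIGINT range
```

By the usual birthday estimate, with n rows and a 64-bit hash the collision probability is roughly n^2 / 2^65, so for a million rows it is on the order of 1e-8, which matches the "not too much risk" caveat above.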
> Several times I have used the MD5 (I like the 22-byte base64 variant) in a
> secondary UNIQUE key for locating a blob, then used the auto_increment
> PRIMARY KEY for joins.
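The 22-byte base64 variant works out like this: an MD5 digest is 16 bytes, which base64-encodes to 24 characters, the last two of which are always "=" padding and can be dropped. A sketch (function name is illustrative):

```python
import base64
import hashlib

def md5_b64(data: bytes) -> str:
    """22-character base64 form of an MD5 digest: 16 bytes encode to 24
    base64 chars, and the trailing '==' padding is dropped."""
    return base64.b64encode(hashlib.md5(data).digest())[:22].decode("ascii")

key = md5_b64(b"http://example.com/some/long/url")
assert len(key) == 22
```

In the schema, that value would go into something like a CHAR(22) column with a UNIQUE index, alongside the auto_increment PRIMARY KEY used for joins.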
> Locality -- with the MD5 (or its BIGINT variant), you will literally be
> jumping randomly around the db. If it is too big for cache, count on the
> cache being virtually useless. On the other hand, with the mapping from MD5
> to auto_inc, you may get some locality -- the rows come in chronologically,
> and [usually] you tend to access recent rows more often than the oldies.
>> -----Original Message-----
>> From: Andrew Prendergast [mailto:ap@stripped]
>> Sent: Friday, October 20, 2006 4:10 PM
>> To: internals@stripped
>> Subject: why hash indexes only for memory storage engine
>> Hi folks,
>> Has anyone considered making hash indexes available for the InnoDB &
>> MyISAM storage engines?
>> Currently the only way of enforcing a unique-key constraint on a
>> VARCHAR field is with a btree; however, for VARCHAR fields over 256
>> chars this is impractical for large numbers of rows.
>> Examples of where this has been a problem for me:
>> - a table of 20 million rows with a VARCHAR(1024) field containing a URL.
>> - a VARCHAR(1024) column containing sparse vectors for use with apps
>>   like Weka & Yale, again on the order of tens of millions of rows.
>> - storing VARCHAR UUIDs/GUIDs, email addresses & email message IDs in
>>   the hundreds-of-millions-of-rows range.
>> The issue is that there is no way of enforcing a unique constraint
>> without massive index overhead.
>> The normal workaround of indexing only the first few chars of a field
>> does not work for unique indexes.
>> MySQL Internals Mailing List
>> For list archives: http://lists.mysql.com/internals