Q & A: Risk of Duplicates When Using MD5?
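Yes, MD5 can produce hash collisions. For ordinary, non-malicious input the chance of two values sharing the same 128-bit digest is negligible, so for many uses this shouldn’t be significant. For security purposes, however, practical techniques for deliberately constructing MD5 collisions have been published, and there are better options.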
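As a rough way to put a number on the accidental case, the standard birthday approximation can be sketched in PHP. This is only an estimate: it assumes MD5 output is uniformly distributed over 2^128 values, it says nothing about deliberately crafted collisions, and birthday_collision_probability() is a hypothetical helper name rather than a built-in.

```php
<?php
// Birthday-bound estimate: probability that at least two of $n random
// inputs share a digest of $bits bits, assuming uniformly distributed output.
// birthday_collision_probability() is a hypothetical helper, not a PHP built-in.
function birthday_collision_probability($n, $bits)
{
    $pairs = $n * ($n - 1) / 2.0;   // number of distinct input pairs
    $space = pow(2.0, $bits);       // number of possible digests
    // 1 - exp(-pairs / space), using expm1() to keep precision for tiny values
    return -expm1(-$pairs / $space);
}

// Example: one billion distinct inputs hashed with MD5 (128-bit digest).
$p = birthday_collision_probability(1e9, 128);
printf("%.3e\n", $p);
```

Under these assumptions, a billion hashed values give a collision probability on the order of 10^-21, and the odds only approach even once the number of inputs nears 2^64.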
I prefer the SHA-2 series, referred to as SHA-224/256/384/512, because the algorithms are strong and widely supported.
If you need the hashes to be un-guessable, I’d recommend hashing more than just the input data. A well-accepted strategy is to include a secret key in the computation, resulting in a keyed-hash message authentication code (HMAC). Another useful technique is to concatenate a “salt”, which may or may not be secret, with the input.
PHP versions >= 5.1.2 have the hash_hmac and hash functions:
$hmac = hash_hmac('sha256', $data, $key); // hex string output
$hmac = base64_encode(hash_hmac('sha256', $data, $key, TRUE)); // force binary output before encoding
$hash = hash('sha256', $data . $salt);
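On the receiving side, an HMAC is usually checked by recomputing it and comparing the two values. The sketch below assumes a newer PHP than the versions discussed here (hash_equals needs PHP >= 5.6 and random_bytes needs PHP >= 7.0). The key, the message, and the verify_hmac() helper are hypothetical and for illustration only.

```php
<?php
// Hypothetical key and message, for illustration only.
$key  = 'replace-with-a-long-random-secret';
$data = 'message to protect';

// Sender: compute the HMAC (hex string output).
$hmac = hash_hmac('sha256', $data, $key);

// Receiver: recompute the HMAC and compare in constant time.
// verify_hmac() is a hypothetical helper, not a PHP built-in.
function verify_hmac($data, $key, $received)
{
    $expected = hash_hmac('sha256', $data, $key);
    return hash_equals($expected, $received);
}

var_dump(verify_hmac($data, $key, $hmac));              // bool(true)
var_dump(verify_hmac($data . 'tampered', $key, $hmac)); // bool(false)

// Salted hash with a random salt; the salt has to be stored next to the
// hash so the same value can be recomputed later.
$salt = bin2hex(random_bytes(16));
$hash = hash('sha256', $data . $salt);
```

Using hash_equals rather than == avoids leaking, through timing differences, how many leading characters of a guessed HMAC were correct.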
Older PHP versions (< 5.3) built with the mhash extension have the mhash function:
$hmac = base64_encode(mhash(MHASH_SHA256, $data, $key)); // mhash produces binary output
$hash = bin2hex(mhash(MHASH_SHA256, $data . $salt));
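If the same code has to run on both kinds of installation, one option is to pick the API at runtime. This is a minimal sketch, assuming hex output is wanted in both cases; portable_hmac_sha256() is a hypothetical wrapper name, not part of PHP.

```php
<?php
// portable_hmac_sha256() is a hypothetical wrapper, not part of PHP.
// It returns a hex-encoded HMAC-SHA256 using whichever API is available.
function portable_hmac_sha256($data, $key)
{
    if (function_exists('hash_hmac')) {
        // hash extension: hex string output by default
        return hash_hmac('sha256', $data, $key);
    }
    if (function_exists('mhash')) {
        // mhash produces binary output, so convert it to hex
        return bin2hex(mhash(MHASH_SHA256, $data, $key));
    }
    throw new RuntimeException('Neither hash_hmac nor mhash is available');
}

echo portable_hmac_sha256('some data', 'some key'), "\n";
```

Checking for hash_hmac first means the mhash branch is only reached on installations that lack the hash extension.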
There’s a nice table of algorithms and their properties on Wikipedia.
Original email discussion was on the DC PHP Developers Group list.
May 9th, 2008 at 8:16 am
Thanks for mentioning this. I wonder if there’s a way to compute the chances of duplication in the MD5 algorithm, though.
Vladimir
May 9th, 2008 at 9:24 am
Vladimir,
More detailed discussions are available at http://eprint.iacr.org/2005/425.pdf and http://en.wikipedia.org/wiki/MD5