Yes, MD5 can produce hash collisions. Accidental collisions are extremely unlikely, so for many uses this shouldn’t be significant. However, colliding inputs can be constructed deliberately, so for security there are better options.
I prefer the SHA-2 family (SHA-224, SHA-256, SHA-384 and SHA-512) because the algorithms are strong and widely supported.
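The number in each SHA-2 name is the digest length in bits. The short sketch below checks that with PHP’s `hash()` and `hash_algos()`. It assumes the hash extension is enabled, and the input string is an arbitrary placeholder. MD5’s 128-bit digest is included for comparison.

```php
<?php
// Sketch: the SHA-2 names encode the digest length in bits.
// Assumes the hash extension (bundled since PHP 5.1.2) is enabled.
$digestBits = array();
foreach (array('sha224', 'sha256', 'sha384', 'sha512') as $algo) {
    if (!in_array($algo, hash_algos())) {
        echo "$algo is not supported by this build\n";
        continue;
    }
    // Passing TRUE as the third argument returns raw binary output,
    // so strlen() gives the digest size in bytes.
    $digestBits[$algo] = strlen(hash($algo, 'example input', TRUE)) * 8;
    echo $algo, ': ', $digestBits[$algo], " bits\n";
}
// For comparison, md5() with raw output TRUE returns a 16-byte (128-bit) digest.
echo 'md5: ', strlen(md5('example input', TRUE)) * 8, " bits\n";
```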
If you need the hashes to be unguessable, I’d recommend hashing more than just the input data. One well-accepted strategy is to include a secret key in the computation, which yields a keyed-hash message authentication code (HMAC). Another useful technique is to concatenate a “salt”, which may or may not be secret, with the input.
PHP versions >= 5.1.2 have the hash_hmac() and hash() functions:
```php
$hmac = hash_hmac('sha256', $data, $key); // hex string output
$hmac = base64_encode(hash_hmac('sha256', $data, $key, TRUE)); // force binary output before encoding
$hash = hash('sha256', $data . $salt);
```
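To make the snippet above self-contained, the sketch below fills in placeholder values for `$data`, `$key` and `$salt`. These values are assumptions for illustration only. It also shows that the hex and Base64 HMAC forms carry the same 32 bytes, so the choice between them is only about encoding and length.

```php
<?php
// Placeholder inputs for illustration only; in practice $key must be kept
// secret and $salt would be chosen per item and stored alongside the hash.
$data = 'message to protect';
$key  = 'example-secret-key';
$salt = 'example-salt';

// HMAC-SHA256 as a 64-character hexadecimal string (the default output).
$hmacHex = hash_hmac('sha256', $data, $key);

// The same HMAC as raw bytes (fourth argument TRUE), then Base64-encoded:
// 32 bytes become a 44-character string, shorter than the hex form.
$hmacB64 = base64_encode(hash_hmac('sha256', $data, $key, TRUE));

// Salted but unkeyed hash: anyone who knows $salt can recompute it.
$saltedHash = hash('sha256', $data . $salt);

// Both HMAC encodings decode to the same bytes.
$sameBytes = (bin2hex(base64_decode($hmacB64)) === $hmacHex);

echo $hmacHex, "\n", $hmacB64, "\n", $saltedHash, "\n";
echo $sameBytes ? "hex and Base64 forms match\n" : "mismatch\n";
```

The hex form is easier to read and compare by eye. The Base64 form is about a third shorter, which can matter when the value is stored in a database column or sent in a URL (where `+` and `/` may need escaping).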
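PHP builds that include the mhash extension have the mhash() function (since PHP 5.3 it is emulated by the hash extension, and it is deprecated as of PHP 8.1):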
```php
$hmac = base64_encode(mhash(MHASH_SHA256, $data, $key)); // mhash produces binary output
$hash = bin2hex(mhash(MHASH_SHA256, $data . $salt));
```
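When moving code from mhash() to the hash functions, a quick sanity check like the one below may help. It assumes, as the PHP manual describes, that mhash() with a key returns a standard HMAC in raw binary form. The comparison is skipped when mhash() is not compiled in. The input values are placeholders.

```php
<?php
// Sketch: on builds where mhash() exists, its keyed form is HMAC, so it should
// produce the same raw bytes as hash_hmac(..., TRUE). Placeholder inputs only.
$data = 'message to protect';
$key  = 'example-secret-key';
$salt = 'example-salt';

$viaHash = hash_hmac('sha256', $data, $key, TRUE);

if (function_exists('mhash')) {
    // mhash() is deprecated as of PHP 8.1; @ suppresses that notice here.
    $viaMhash = @mhash(MHASH_SHA256, $data, $key);
    echo ($viaMhash === $viaHash) ? "keyed outputs agree\n" : "keyed outputs differ\n";

    // Unkeyed mhash() output, hex-encoded, should equal hash()'s default hex output.
    $unkeyed = bin2hex(@mhash(MHASH_SHA256, $data . $salt));
    echo ($unkeyed === hash('sha256', $data . $salt))
        ? "unkeyed outputs agree\n" : "unkeyed outputs differ\n";
} else {
    echo "mhash() is not available in this build; use hash_hmac() and hash() instead\n";
}
```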
There’s a nice table of algorithms and their properties on Wikipedia.
Original email discussion was on the DC PHP Developers Group list.
Thanks for mentioning this. I wonder if there’s a way to compute the chances of duplication in the MD5 algorithm, though.
Vladimir
Vladimir,
More detailed discussions are available at http://eprint.iacr.org/2005/425.pdf and http://en.wikipedia.org/wiki/MD5
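To get a rough sense of the odds of an *accidental* collision, you can evaluate the standard birthday approximation directly. The sketch below assumes idealized, uniformly distributed digests. It does not model deliberately constructed MD5 collisions, which is the real security concern. `collisionProbability()` is a hypothetical helper written for this example, not a PHP built-in.

```php
<?php
// Birthday-bound estimate of the chance that at least one pair among $n
// random, independent inputs shares the same $bits-bit digest:
//   p ~ 1 - exp(-n(n-1) / 2^(bits+1))
// Assumes ideal, uniformly distributed digests (accidental collisions only).
function collisionProbability($n, $bits)
{
    $pairs = $n * ($n - 1) / 2;   // number of distinct input pairs
    $x = $pairs / pow(2, $bits);  // expected number of colliding pairs
    // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -expm1(-$x);
}

foreach (array(1e6, 1e9, 1e12) as $n) {
    printf("MD5, %.0e inputs: p ~ %.3e\n", $n, collisionProbability($n, 128));
}
// At about 2^64 inputs the estimate reaches roughly 39% for a 128-bit digest.
printf("MD5, 2^64 inputs: p ~ %.3f\n", collisionProbability(pow(2, 64), 128));
```

Calling `expm1()` instead of computing `1 - exp(-$x)` matters here. For realistic input counts, `$x` is so small that `exp(-$x)` rounds to exactly 1.0, and the naive formula would report a probability of zero.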