Hashing of usernames in syslog

Sun Sep 29 06:11:56 UTC 2002

> -----Original Message-----
> From: inn-workers-bounce at isc.org [mailto:inn-workers-bounce at isc.org]On
> Behalf Of Forrest J. Cavalier III
> Sent: Saturday, September 28, 2002 10:46 PM
> To: inn-workers at isc.org
> Cc: mibsoft at epix.net
> Subject: Re: Hashing of usernames in syslog
>
> If you are hashing usernames in order to obfuscate them, be aware
> that it can be trivial to create a dictionary of user-ids and the
> corresponding hashes.
>
> One way to limit this ability is to use a fixed secret (or "salt")
> which is hashed along with the username.
>
> But you must be sure to use a high entropy secret, and you must
> keep it secret.
>
> How you incorporate the secret into the md5 matters also: You
> should calculate md5("secret" "username" "secret"), and not
> md5("username" "secret") or md5("secret" "username")  (This
> is to disrupt known plaintext attempts to determine the secret.)

Baloney.  If somebody is methodically guessing at the salt, what prevents
them from making the same guesses against the form md5("secret" "username"
"secret")?  The form makes no difference at all, assuming enough entropy in
the salt.

The recommendation assumes differential weaknesses in md5 that may not
exist.  From one spot on the web:

<BEGIN>
Differential cryptanalysis has proven to be effective against one round of
MD5, but not against all 4 (differential cryptanalysis looks at ciphertext
pairs whose plaintexts have specific differences and analyzes these
differences as they propagate through the cipher).
<END>

> The "known plaintext" means that an attacker with
> a valid user ID and access to the output can identify which
> log entries are theirs.  This gives them the plaintext and
> the hashed value.  (And makes it easier to search for the
> secret.)

As I said above, assuming an attacker can find a hashed entry that
corresponds to them, they can also guess at the secret and use md5("secret"
"username" "secret") to see if they can duplicate the hashed value.
md5("username" "secret") will be just as good as md5("secret" "username"
"secret") unless md5 has a mathematical weakness, and none has been
demonstrated.
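
To make that concrete, here is a minimal Python sketch.  It is not INN code:
the function names, the toy candidate list, and the example username and
secret are all made up for illustration.  An attacker who holds one known
username/hash pair runs the same guessing loop against either construction;
only the function that builds the md5 input changes.

```python
import hashlib


def md5_suffix(username: str, secret: str) -> str:
    """Hex digest of md5("username" "secret")."""
    return hashlib.md5((username + secret).encode()).hexdigest()


def md5_sandwich(username: str, secret: str) -> str:
    """Hex digest of md5("secret" "username" "secret")."""
    return hashlib.md5((secret + username + secret).encode()).hexdigest()


def guess_secret(construct, username, observed, candidates):
    """Try each candidate secret; return the first one that reproduces
    the observed hash for the known username, or None if none match."""
    for guess in candidates:
        if construct(username, guess) == observed:
            return guess
    return None


# Hypothetical guess list; a real search would be far larger.
candidates = ["hunter2", "swordfish", "lterego"]

for construct in (md5_suffix, md5_sandwich):
    observed = construct("dave", "lterego")  # the attacker's own log entry
    found = guess_secret(construct, "dave", observed, candidates)
    print(construct.__name__, "->", found)
```

The cost of the loop is one md5 call per guess in both cases, so what
protects the secret is the size of the space the guesses must cover.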

And, if you're going to take stabs in the dark without any mathematical
evidence, what makes you think that md5("secret" "username" "secret") might
not lead to some other way to attack md5 because of the symmetry of the
input?  In other words, if your recommendation is without a mathematical
basis, then a guess like that might actually make things worse.

Let's assume that somebody has a short username, such as "a", and a salt of
"lterego".

If they know the MD5 of "a*******", this won't give them any information
for guessing "lterego".  MD5 is designed explicitly so that no attack except
brute force is feasible.  Without a differential weakness in MD5, doubling
up on the "secret" is just superstition.
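
A rough Python illustration of that example, assuming the
md5("username" "secret") form and hypothetical helper names.  It compares
the hash for the real salt with the hash for a near-miss guess that is one
character off.  If MD5 behaves as designed, a near miss should look no
closer to the target than any other wrong guess, so the known "a" gives a
searcher nothing to steer by.

```python
import hashlib


def md5_bytes(text: str) -> bytes:
    """Raw 16-byte MD5 digest of a string."""
    return hashlib.md5(text.encode()).digest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


target = md5_bytes("a" + "lterego")     # what appears in the log
near_miss = md5_bytes("a" + "lteregp")  # guess with one character wrong

print(differing_bits(target, near_miss), "of", len(target) * 8, "bits differ")
```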

> For cryptographic purposes, you should know that there are some
> who consider md5's "reversible", and that SHS1 is better.  If true,
> then recovering the secret is not impossible, although the
> md5("secret" "username" "secret") form makes it much harder.

I think that MD5 has lived up to its executive summary statement by Ron
Rivest.  I don't believe that it is computationally practical to create any
string of bytes with the same md5 as another one.

Those "some" who consider md5 "reversible": is there a published weakness in
MD5, or is it because the digest is only 128 bits versus 160, and computing
technology is evolving quickly enough that one may soon be able to find
another input with the same hash in a reasonable amount of compute time?  Is
this assessment based on a mathematical weakness or on the 128-bit size?
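
For the size half of that question, the generic brute-force costs follow
from the digest length alone.  A small sketch, assuming an ideal hash with
no structural weakness (the function name is made up): matching one given
hash takes on the order of 2^n tries for an n-bit digest, and finding any
two inputs that collide takes on the order of 2^(n/2) tries by the birthday
bound.

```python
def generic_costs(bits: int):
    """Return (preimage_tries, collision_tries) for an ideal n-bit hash.

    These are order-of-magnitude brute-force bounds only; they say nothing
    about attacks that exploit the internals of a particular hash.
    """
    return 2 ** bits, 2 ** (bits // 2)


for name, bits in (("128-bit digest", 128), ("160-bit digest", 160)):
    preimage, collision = generic_costs(bits)
    print(f"{name}: preimage ~2^{bits} ({preimage:.2e} tries), "
          f"collision ~2^{bits // 2} ({collision:.2e} tries)")
```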

> Again, don't rely on my advice.  I am not an expert and this
> is all from memory.

Your advice is fine.  It just seems superstitious.  I want to see papers
which point out vulnerabilities in MD5.  I also want to see proof that an
input to MD5 with symmetry doesn't make things worse.

Best regards, Dave.