So you might have read about the recent leak of 6.5 million LinkedIn passwords onto the internet. This comes at a fitting time for me since, having just completed the Udacity CS387 Applied Cryptography course, I’ve developed a new love for all things cryptographic (in fact, I have some interesting ideas for a crypto-spatial library that encodes secret messages in the coordinate values of geographic instances, but that’s for a separate post).
Rather than just rely on the newspaper reports of the leak, I thought I’d practice some of my newly-acquired cryptography knowledge by examining the set of leaked passwords first-hand. It didn’t take long to acquire the 118 MB combo_not.zip source file. I’m not going to post a direct link here, but I’m assuming you know how to search the internet, right? The file contains the leaked LinkedIn passwords not in plaintext, but hashed using the SHA-1 algorithm. This means that, while you can’t browse the list of passwords directly, it’s very easy to check whether a particular password is on the list: generate the SHA-1 hash of that password yourself, and test whether that hash appears in the file.
There’s one other thing to note, which is that the first five hex digits of many (if not most) of the hashed passwords in the combo_not.zip file have been overwritten with five zeroes: ‘00000’. So, although the SHA-1 hash of the password “password” is not listed:
5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8
The following hash is on the list:
000001e4c9b93f3f0682250b6cf8331b7ee68fd8
Anecdotal evidence suggests that “00000” is a marker indicating that the password has already been cracked by the hackers (which, if you set your password to “password”, it probably deserves to be!).
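The two forms of a hash, original and zero-prefixed, can be reproduced with a few lines of Python. This is only a minimal sketch: the helper names sha1_hex and masked are my own, chosen for illustration, and the five-character prefix is simply what is described above.

```python
import hashlib

def sha1_hex(password):
    """Return the SHA-1 digest of a password as 40 lowercase hex characters."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def masked(hex_digest, n=5):
    """Overwrite the first n hex characters with zeroes (n=5 is assumed from the file)."""
    return "0" * n + hex_digest[n:]

original = sha1_hex("password")
print(original)          # the full hash
print(masked(original))  # the same hash with its first five digits zeroed
```

Only the prefix changes, so a search has to try both strings for each candidate password.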
So, first things first, I wrote a small Python program to check whether the hash of any given plaintext password was included in the datafile, either in its original hash form or in the overwritten form:
import hashlib

def check(filename, pass_to_search):
    """Test for occurrence of password hash within the file."""
    # Match either the original hash or the form with its first five digits zeroed
    masked = '0' * 5 + pass_to_search[5:]
    with open(filename) as datafile:
        return any(pass_to_search in line or masked in line for line in datafile)

# Create the hash of the password to check (a bytes literal, so this also runs on Python 3)
hashed = hashlib.sha1(b"secretpassword").hexdigest()

# Look for the hash in the LinkedIn datafile
if check('combo_not.txt', hashed):
    print("password hash found!")
else:
    print("password hash not found.")
Running this script reveals that, indeed, at least one LinkedIn user has chosen “secretpassword” as their password. To test other passwords, just replace the string passed to hashlib.sha1() in the line that assigns hashed. Other “secure” passwords chosen by LinkedIn users that you can find in the file include “abc123”, “mylinkedinpassword”, “opensesame”, “startrek” and “bigcock”.
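Editing the script for every new guess gets tedious, so here is a rough sketch of checking several candidates in a single pass over the data. It assumes each line of the file holds exactly one lowercase hex hash, which is stricter than the substring test used above. The function names candidate_forms and find_leaked are hypothetical, and the sample list stands in for the real file.

```python
import hashlib

def candidate_forms(password):
    """Return the full SHA-1 hex digest and its zero-prefixed variant."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest()
    return digest, "00000" + digest[5:]

def find_leaked(lines, passwords):
    """Return the set of passwords whose full or masked hash appears in lines."""
    # Map every hash form we are looking for back to its plaintext
    wanted = {}
    for pw in passwords:
        for form in candidate_forms(pw):
            wanted[form] = pw
    found = set()
    for line in lines:
        pw = wanted.get(line.strip().lower())
        if pw is not None:
            found.add(pw)
    return found

# Tiny in-memory stand-in for the datafile: one masked hash and one filler line
sample = [candidate_forms("abc123")[1] + "\n", "f" * 40 + "\n"]
print(find_leaked(sample, ["abc123", "opensesame"]))
```

Against the real data you would pass the open file object instead of sample, since iterating over a file yields its lines one at a time.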
Google recently ran the ad campaign shown below to encourage users to pick better passwords. Probably best not to copy Google’s suggestion on this issue either – 2bon2btitq appears on the list of leaked passwords too:
Worryingly for me, my own password (which is semi-random, alphanumeric, and over 10 characters in length) also features on the list, and appears to be one of those that has been cracked. I briefly contemplated the advice to reset my LinkedIn password but decided it would be more effective to simply cancel my LinkedIn account altogether. Any organisation that can’t take even basic steps to protect my information, such as salting hashes, isn’t worth dealing with. Perhaps they should subscribe to the next Udacity cryptography course to find out more…
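For anyone wondering what salting looks like in practice, here is a minimal sketch. It goes a step beyond plain salting by using PBKDF2 from Python’s standard library, which also makes each guess slower. The function names and the iteration count are illustrative assumptions of mine, not a description of what LinkedIn or anyone else actually runs.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value only, not a recommendation

def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

# The same password hashed twice gets two different salts, hence two different digests
salt_a, hash_a = hash_password("secretpassword")
salt_b, hash_b = hash_password("secretpassword")
print(hash_a != hash_b)
print(verify_password("secretpassword", salt_a, hash_a))
```

Because every user gets a different salt, two people who pick the same password no longer share a hash, and the simple lookup used in this post stops working without the salt for each entry.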
Great post Alistair. I checked the data file and my password was there too. I can’t believe LinkedIn didn’t inform me.
Also, thanks for sharing the link to Udacity.
Thanks Lee – I’ve heard nothing from LinkedIn either and there’s no announcement on the site – I’d be more forgiving of them making the mistake in the first place if it looked like they were taking any action to admit, investigate, or resolve it!