June 9, 2012

Cryptography, Secure Passwords, and why I’m no longer on LinkedIn

So you might have read about the recent leak of 6.5 million LinkedIn passwords onto the internet. This comes at a fitting time for me since, having just completed the Udacity CS387 Applied Cryptography course, I’ve developed a new love for all things cryptographic (in fact, I have some interesting ideas for a crypto-spatial library – encoding secret messages in the coordinate values of geographic instances – but that’s for a separate post).

Rather than just rely on the newspaper reports of the leak, I thought I’d practice some of my newly-acquired cryptography knowledge by examining the set of leaked passwords first-hand. It didn’t take long to acquire the 118 MB combo_not.zip source file – I’m not going to post a direct link here but I’m assuming you know how to search the internet, right? The file contains the leaked LinkedIn passwords not in plaintext, but hashed using the SHA-1 algorithm. This means that, while you can’t browse the list of passwords directly, it’s very easy to check whether a particular password is on the list: just generate the SHA-1 hash of that password yourself and test whether that hash appears in the file.

There’s one other thing to note, which is that the first five hex digits of many (if not most) of the hashed passwords in the combo_not.zip file have been overwritten with five zeroes: ‘00000’. So, although the SHA-1 hash of the password “password” is not listed:

5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8

The following hash is on the list:

000001e4c9b93f3f0682250b6cf8331b7ee68fd8

Anecdotal evidence suggests that “00000” is a marker to indicate that this password has already been cracked by the hackers (and if you set your password to “password”, it probably deserves to be!).
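To make the overwriting concrete, here is a minimal sketch that reproduces the two hashes shown above. It is written for Python 3, and the helper names `sha1_hex` and `mask_prefix` are my own for illustration, not anything taken from the leaked file or its tooling.

```python
import hashlib

def sha1_hex(password):
    """Return the SHA-1 digest of a password as 40 hex characters."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def mask_prefix(digest, width=5):
    """Overwrite the first `width` hex digits with zeroes (hypothetical helper)."""
    return "0" * width + digest[width:]

original = sha1_hex("password")
masked = mask_prefix(original)

print(original)  # 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8
print(masked)    # 000001e4c9b93f3f0682250b6cf8331b7ee68fd8
```

The remaining 35 hex digits are untouched, which is why a masked entry can still be matched against a candidate password.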

So, first things first, I wrote a small Python program to check whether the hash of any given plaintext password was included in the datafile, either in its original hash form or in the overwritten form:

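```
import hashlib

def check(filename, pass_to_search):
    """Test for occurrence of password hash within the file."""
    # Entries marked as cracked have their first five hex digits zeroed
    masked = '0' * 5 + pass_to_search[5:]
    with open(filename) as datafile:
        return any((pass_to_search in line or masked in line) for line in datafile)

# Create the hash of the password to check
# (Python 2 syntax; Python 3 needs bytes, e.g. b"secretpassword")
hashed = hashlib.sha1("secretpassword").hexdigest()

# Look for the hash in the LinkedIn datafile
if check('combo_not.txt', hashed):
    print "password hash found!"
else:
    print "password hash not found"
```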

Running this script reveals that, indeed, at least one LinkedIn user has chosen “secretpassword” as their password. To test other passwords, just replace the string in the line `hashed = hashlib.sha1("secretpassword").hexdigest()`. Other secure passwords chosen by LinkedIn users that you can find in the file include “abc123”, “mylinkedinpassword”, “opensesame”, “startrek” and “bigcock”.
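If you want to test a whole list of candidates rather than editing the script each time, something like the following sketch would do it in a single pass. It is Python 3, the function name `find_leaked` is hypothetical, and it assumes the file holds one 40-character hash per line (it matches whole entries rather than substrings, unlike the script above). The sample lines here are built in memory purely for demonstration.

```python
import hashlib

def sha1_hex(password):
    """Return the SHA-1 digest of a password as 40 hex characters."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def find_leaked(lines, candidates):
    """Return the set of candidate passwords whose hash appears in `lines`.

    An entry matches if its last 35 hex digits equal the candidate's and
    its first five are either the real prefix or the '00000' marker.
    """
    by_suffix = {sha1_hex(p)[5:]: p for p in candidates}
    found = set()
    for line in lines:
        entry = line.strip()
        password = by_suffix.get(entry[5:])
        if password is None:
            continue
        if entry[:5] in ("00000", sha1_hex(password)[:5]):
            found.add(password)
    return found

# In-memory stand-in for the datafile: one masked entry, one unmasked, one unrelated
sample_lines = [
    "000001e4c9b93f3f0682250b6cf8331b7ee68fd8\n",
    sha1_hex("opensesame") + "\n",
    "f" * 40 + "\n",
]
print(sorted(find_leaked(sample_lines, ["password", "opensesame", "not-in-file"])))
# ['opensesame', 'password']
```

With the real file you would pass the open file object as `lines`, for example `with open('combo_not.txt') as f: find_leaked(f, [...])`.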

Google recently ran the ad campaign shown below to encourage users to pick better passwords. Probably best not to copy Google’s suggestion on this issue either – 2bon2btitq appears on the list of leaked passwords too:

Concerningly for me, my own password (which is semi-random, alphanumeric, and over 10 characters in length) also features on the list, and appears to be one of those that has been cracked. I briefly contemplated the advice to reset my LinkedIn password but decided it would be more effective to simply cancel my LinkedIn account altogether. Any organisation that can’t take even basic steps to protect my information, such as salting hashes, isn’t worth dealing with. Perhaps they should subscribe to the next Udacity cryptography course to find out more…
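For anyone wondering what salting looks like in practice, here is a rough sketch in Python 3. The idea is that each password gets its own random salt, so two users with the same password end up with different stored hashes and a precomputed list of SHA-1 values is no help. Note that this goes a step beyond plain salting: it uses PBKDF2 from the standard library, which also repeats the hash many times to slow down guessing. The function names and the iteration count are my own illustrative choices, not a recommendation and certainly not anything LinkedIn uses.

```python
import hashlib
import hmac
import os

ITERATIONS = 100000  # illustrative work factor only

def hash_password(password, salt=None):
    """Return (salt, digest), generating a fresh random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest

def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

# The same password hashed twice gets two different salts and digests
salt_a, digest_a = hash_password("secretpassword")
salt_b, digest_b = hash_password("secretpassword")
print(digest_a != digest_b)
print(verify_password("secretpassword", salt_a, digest_a))
```

The salt is stored alongside the digest; it does not need to be secret, it only needs to be different for each user.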

April 18, 2012

Further Education (or, get yourself free university-level computing knowledge)

The notion of “online distance learning” conjures up many different associations. The Open University, for example, is (rightly) regarded as a world-leading educational establishment, and has been offering distance-learning degree courses to students since 1971. OU lectures, originally broadcast on BBC2 in the middle of the night, are now commonly distributed over the internet. At the other extreme, there’s the non-accredited American College of Holistic Nutrition – the institution from which disgraced TV nutritionist “Dr” Gillian McKeith claims to have received her PhD via the internet…

The first Open University lecture, broadcast on 3rd January 1971. Unusually, not hosted by a man with a beard.

Whereas the courses offered by the OU were always designed with distance-learning in mind, in recent years there has been growing momentum behind the idea of Open Courseware. Basically, this involves “regular” universities allowing their course materials to be downloaded by anyone over the internet, for free. And the idea has been taken up by some top-class (mostly American) institutions – M.I.T., Yale, and the University of Michigan, for example, offer a wide range of videos, downloadable lecture notes, and past exam questions that offer anybody access to the same material as if they were enrolled in that course in person. Click the links above to browse their course catalogues and learn something new today.

Whilst much of the OpenCourseware material is pretty amazing, it suffers from one problem: the courses are very one-directional. As an internet student, you sit and watch a pre-recorded hour-long lecture video, but with no ability to interact with the lecturer. Although you can sit the exams, nobody will mark your paper (but you can get model answers and mark your own paper). And, even if you attend and “pass” all the required units, you won’t get any qualification at the end of it.

Or, will you?

Enter Udacity. Founded in January 2012 by Sebastian Thrun (Professor, Google Fellow, former director of the Stanford Artificial Intelligence Laboratory… amongst others), it offers free online computing courses. Where it differs from some Open Courseware courses is that the material has been explicitly designed to be studied over the internet. This means that the video clips are divided into nice short chunks, hosted on YouTube, and can easily be watched on a phone or tablet device, say. The Udacity courses use Python as their language of choice, but you don’t need to download any software to your computer – there’s an interactive, browser-hosted environment for you to write your code in. There’s a forum where you can discuss the material with other students enrolled on the course, and occasionally the lecturers will contribute there too. You don’t just sit and passively watch a video – there are plenty of interactive quizzes and homework assignments, and these are graded (albeit automatically). And, at the end of it all, you do get a certificate of completion of the course.

Sure, at the moment, a certificate from Udacity might not look as impressive as a degree from Stanford, but five years down the line I’m not so sure that will still be true. I’m enrolled on two courses at the moment, and I have to say that I think the quality of the material is fantastic.

If you have any interest in learning how to program, or learning to program better, I highly recommend you look at the courses they have available – the introductory level 1 course requires no previous programming experience, while the level 3 courses include Applied Cryptography and how to build a robotic car.

These clever men want to teach you how to program. For free. Why not let them?