10 million passwords: quick analysis underlines users' need for usability in security

A few days ago Mark Burnett released 10 million passwords, captured for research, to the public. Captured password lists are rarely that long - hence, this is a good opportunity to take a look at users' password usage and to test whether our findings are in line with common sense and previous research on password usage. First, we need to convert the file to UTF-8 (you can download the converted file here). We then do a quick analysis of the data using a small Bash and R script. Specifically, we look at password lengths and the most frequently used passwords.

Distribution of password length

When looking at the distribution of password length, we see that passwords of length 6 and 8 are used most. Passwords of length 3 or less are rare, and usage declines drastically above length 8.
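A length distribution like this can be computed with a few lines of shell. The following is only a minimal sketch, not the Bash and R script used for this post: the function name length_distribution is made up for illustration, and it assumes that every line of the file has the form username<TAB>password (which is what the grep filters further down suggest).

```shell
# Sketch: count how many passwords exist for each password length.
# Assumption: one "username<TAB>password" pair per line.
# Note: length() counts characters in gawk with a UTF-8 locale,
# but bytes in some other awk implementations.
length_distribution() {
  # $1: path to the tab-separated file
  awk -F '\t' 'NF >= 2 { count[length($2)]++ }
       END { for (len in count) print len "\t" count[len] }' "$1" |
    sort -n
}
```

Calling length_distribution 10-million-combos-utf8.txt prints one line per length (length, tab, number of passwords), which can then be plotted with any tool.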

Overall most frequently used passwords

The most frequently used passwords could easily be guessed or derived by dictionary attacks - or are related to patterns users type on their keyboards (e.g. keys aligned horizontally or vertically on the keyboard that get pressed one after another, like 1qaz2wsx). To underline that this is not only true for the 10 most frequently used passwords, we show excerpts all the way down to the #3000 to #3010 most used passwords here. And - as expected - the overall most frequently used passwords are combinations of 123456... (different lengths and orders).
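Such a ranking can be reproduced roughly as follows. This is again a hedged sketch: ranked_passwords is a hypothetical helper name, the username<TAB>password line format is an assumption, and ties between equally frequent passwords are simply broken alphabetically.

```shell
# Sketch: print passwords ranked by how often they occur.
# Assumption: one "username<TAB>password" pair per line.
ranked_passwords() {
  # $1: file, $2: first rank to print (default 1), $3: last rank (default 10)
  awk -F '\t' 'NF >= 2 { count[$2]++ }
       END { for (pw in count) print count[pw] "\t" pw }' "$1" |
    sort -t "$(printf '\t')" -k1,1nr -k2,2 |
    sed -n "${2:-1},${3:-10}p"
}
```

For example, ranked_passwords 10-million-combos-utf8.txt prints the ten most used passwords, and ranked_passwords 10-million-combos-utf8.txt 3000 3010 prints the ranks #3000 to #3010.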

Most frequently used passwords per password length

When looking at the most frequently used passwords for specific lengths, we see: the longer passwords get, the more users seem to relate them to patterns on the keyboard layout (such as the vertically aligned 1qaz2wsx).
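Restricting the ranking to a single password length only needs one extra condition. As before, this is a sketch under assumptions: top_passwords_of_length is a made-up name, and the file is assumed to hold one username<TAB>password pair per line.

```shell
# Sketch: most frequently used passwords of one specific length.
# Assumption: one "username<TAB>password" pair per line.
top_passwords_of_length() {
  # $1: file, $2: password length, $3: number of entries (default 10)
  awk -F '\t' -v len="$2" 'NF >= 2 && length($2) == len { count[$2]++ }
       END { for (pw in count) print count[pw] "\t" pw }' "$1" |
    sort -t "$(printf '\t')" -k1,1nr -k2,2 |
    sed -n "1,${3:-10}p"
}
```

For example, top_passwords_of_length 10-million-combos-utf8.txt 8 lists the ten most used passwords of length 8.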

While average users might already be aware that passwords like 123456 are insecure, this might not be the case for passwords related to other keyboard layout patterns. Such patterns are even visible in passwords connected to possibly important accounts. Try some of the following grep filters on the password file to see examples:

grep $'root.*\t' 10-million-combos-utf8.txt
grep $'admin.*\t' 10-million-combos-utf8.txt
grep $'default.*\t' 10-million-combos-utf8.txt
grep $'config.*\t' 10-million-combos-utf8.txt
grep $'sql.*\t' 10-million-combos-utf8.txt
grep $'apache.*\t' 10-million-combos-utf8.txt
grep $'server.*\t' 10-million-combos-utf8.txt
grep $'wordpress.*\t' 10-million-combos-utf8.txt
grep $'bank.*\t' 10-million-combos-utf8.txt
grep $'research.*\t' 10-million-combos-utf8.txt
grep $'\.com\t' 10-million-combos-utf8.txt
grep $'\.at\t' 10-million-combos-utf8.txt
grep $'\.edu\t' 10-million-combos-utf8.txt

But most notably, these findings underline how users struggle to remember their secrets - and hence use "tricks" that make remembering them easier, but that also lower the associated security.