Comprehensive Statistical Analysis on the Crackability of Real-World Passwords

Hello everyone. My name is Aikaterini Kanta and I’m a PhD student at University College Dublin and the European Commission’s Joint Research Centre. I’m here today at DFRWS APAC 2021 to present a comprehensive statistical analysis on the crackability of real-world passwords.

The average number of passwords a user needs to remember was shown to be 27 in one study, and a whopping 191 in another. And this is because we use passwords for everything: we use them when we go to the bank, we use them when we log into our social media. They can be of different types: they can be numerical, they can be phrases and they can be patterns. And lately we’ve seen that for critical websites, multifactor authentication is being used more and more, but the single password still remains the most popular method of authentication.

This is why we decided to look into leaked passwords and, more specifically, the “Have I Been Pwned” list, which contains passwords from 3.9 billion real-world accounts stemming from various data breaches. For this list, we have statistics on the length, makeup and strength of these passwords, to assess how long it would take to crack them and what the building components of these passwords are. Our paper, which contains this analysis, is currently under review. One part of the analysis in the paper is a fragment analysis, which was performed using a tool called [inaudible], which can segment the passwords into fragments, and then analyze them and classify them according to their semantic meaning, with the help of WordNet.

The types of fragments that can be found in a password are letter fragments, number fragments, and special fragments. For example, the password “manchester.2019” is split into “manchester”, which is classified as a city, “.”, which is classified as a special character, and “2019”, which is classified as a year. You can also see on the slide the number of fragments per category, and the total number of fragments, which is about 1.5 billion.
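The split into letter, number, and special fragments can be illustrated with a minimal Python sketch. This is not the tool used in the analysis; it only separates runs of characters by class, and it does not attempt the WordNet-based semantic labels such as “city” or “year”. The function name and the three labels are assumptions made for illustration.

```python
import re

# Coarse character-class segmentation only: runs of ASCII letters, runs of
# digits, and runs of anything else. Semantic categories (city, year, name)
# are NOT produced here; that step needs a lexical resource such as WordNet.
FRAGMENT_RE = re.compile(
    r"(?P<letters>[A-Za-z]+)|(?P<number>[0-9]+)|(?P<special>[^A-Za-z0-9]+)"
)


def split_fragments(password):
    """Return a list of (fragment, kind) pairs for one password."""
    # Exactly one named group matches per fragment, so lastgroup is its kind.
    return [(m.group(), m.lastgroup) for m in FRAGMENT_RE.finditer(password)]


print(split_fragments("manchester.2019"))
# [('manchester', 'letters'), ('.', 'special'), ('2019', 'number')]
```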

It’s worth mentioning that the original “Have I Been Pwned” list contained about 500 million passwords, which means that the number of fragments is about three times the number of passwords. You can also see the 10 most popular fragments per category. And there are some entries that are not very surprising, such as keyboard [inaudible], sequences of numbers, single digits, and single symbols.

Here you can see the most common fragment categories. We can see that the three most popular categories contain numbers, with the category “common-number” describing numbers that are either sequences of numbers, or numbers that are meaningful to us, such as “314”, which is the number for pi. We can also see that categories containing names, like masculine name and feminine name, are very popular, as are categories with cities, animals, computers, food, colors, and emotions. We can also see the most frequent fragment combinations, where again a combination of common numbers is the most popular one. Another combination that is very interesting is names, like masculine and feminine names, followed by a number, a digit, or a year.
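Counting the most frequent fragment combinations can be sketched in a few lines of Python, assuming each password has already been reduced to an ordered list of category labels. The sample input below is invented toy data, not figures from the analysis, and the label spellings are illustrative.

```python
from collections import Counter


def count_combinations(labelled_passwords):
    """Count how often each ordered sequence of fragment categories occurs."""
    return Counter(tuple(categories) for categories in labelled_passwords)


# Invented toy input: one category sequence per password.
sample = [
    ["masculine-name", "year"],
    ["feminine-name", "number"],
    ["masculine-name", "year"],
    ["common-number"],
]

counts = count_combinations(sample)
print(counts.most_common(1))
# [(('masculine-name', 'year'), 2)]
```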

The results of this analysis show that even in a data set as large and as diverse as “Have I Been Pwned”, there is indeed context to be found in passwords. Demographic factors, like whether someone is male or female, whether they’re English or non-English speaking, what their age is, and what their profession is, play an important role in password selection. For example, one study of Chinese-speaking users showed that about 50% of them usually include numbers in their password. Users also tend to include familiar models in their passwords: personal information, such as male and female names, birth dates, city names and pet names. This is why one of the most popular combinations was a male or female name followed by a number. Users also draw on their own interests, like animals, food, and emotions, when they create their passwords.

But all of this is useful to the investigator. If you want to gain access to the device of a suspect, you can look into their digital life: the information from their local devices, their online presence, and their previous passwords. All of that information can then be used to create a personalized dictionary of keywords and then, in conjunction with mangling rules, a smarter dictionary list that is tailored to the suspect. This means that during the password cracking process, the candidates that are most likely to be the password will be checked first, thus expediting the process. Thank you for watching this short talk, and please don’t hesitate to reach out to us with any questions.
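The idea of combining a personalized keyword list with mangling rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the method from the paper: the rules (lowercase and capitalized forms, a few common suffixes, optional years) and the function names are hypothetical, and real cracking tools ship far richer rule sets.

```python
def mangle(keyword):
    """Yield simple variants of one keyword (illustrative rules only)."""
    bases = [keyword.lower(), keyword.capitalize()]
    suffixes = ["", "1", "123", "!"]
    for base in bases:
        for suffix in suffixes:
            yield base + suffix


def build_candidates(keywords, years=()):
    """Return an ordered, de-duplicated candidate list from suspect keywords."""
    seen = set()
    candidates = []
    for keyword in keywords:
        variants = list(mangle(keyword))
        # Keyword-plus-year candidates, echoing the "name followed by a year"
        # combination mentioned in the talk.
        variants += [keyword.lower() + str(year) for year in years]
        for candidate in variants:
            if candidate not in seen:
                seen.add(candidate)
                candidates.append(candidate)
    return candidates


# Hypothetical keyword (for example, a pet name found on a suspect's profile).
wordlist = build_candidates(["Rex"], years=[2019])
print(wordlist)
```

Keywords placed earlier in the input are tried first, so ordering them by how strongly they relate to the suspect is one simple way to check the likeliest candidates early.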
