Motivation

The Windows 2008 R2 installation on our build server has been reminding me to change my password for quite some time now. For as long as I can remember, I’ve mostly used relatively short (8-10 characters) randomly generated passwords. While it is not too hard (although it takes time) to remember one of those, there’s a much better way to generate a password, as illustrated in this comic.

Of course, I didn’t want to follow that suggestion directly, so I came up with my own requirements for a perfect password (and a perfect password generator) and figured it was time to finally do the thing I had wanted to do for a long time: become one of the many who’ve created an xkcd-style password generator.

Development

It’s been over a year since I created this gist:

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import random
>>> random.seed()
>>> import nltk
>>> from nltk.corpus import wordnet as wn

>>> nouns, verbs, adjectives, adverbs = [list(wn.all_synsets(pos=POS)) for POS in [wn.NOUN, wn.VERB, wn.ADJ, wn.ADV]]

>>> def gen_phrase(*pattern): return [random.choice(i) for i in pattern]
>>> def phrase_to_string(phrase): return ' '.join(s.lemmas[0].name for s in phrase)
>>> def gen_password(): return phrase_to_string(gen_phrase(adverbs, adjectives, adjectives, nouns))

>>> gen_password()
'decidedly Noachian autoplastic bag_lady'
>>> gen_password()
'ad_val faithful regimented accordance'
>>> gen_password()
'familiarly votive empty Gadiformes'
>>> gen_password()
'universally surmountable wiry field_mouse'
>>> gen_password()
'deadpan bilingual grandiose oxygen_deficit'
>>> gen_password()
'flip-flap heterozygous unenlightened solitaire'
>>> gen_password()
'only piquant apart Massawa'
>>> gen_password()
'continuously forty-two coarse-haired Israel'
>>> gen_password()
'actually trigger-happy tactful cobia'
>>> 

This gist is basically the essence of what I wanted to achieve: generating passwords from a given template.

Here I used nltk, a natural language processing toolkit for Python which, among other things, gives you access to a dozen or so structured corpora, dictionaries and whatnot. I used it to pick a random word of a specific part of speech. As WordNet was the better-structured dictionary and comes with a permissive license, I decided to use that.

Now, how do I make a webapp out of it, so I can access it from anywhere? I decided that I should finally give Google App Engine a try. Overall, registering an application, downloading the example code, deploying it and seeing it run on the App Engine server took about 5 minutes, which I think is pretty impressive.

Using nltk and WordNet directly on App Engine seemed like overkill, so I didn’t do that. Instead, I flattened the dictionary and all the metadata to a much simpler structure using the following code:

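import nltk
from nltk.corpus import wordnet

# Map each part-of-speech name to a flat list of lemma names and print the
# result as Python source, which can be saved to a .py file and imported.
# (Python 2 and the NLTK of that time; in NLTK 3 and later, name and lemmas
# are methods: lemma.name() and synset.lemmas().)
print 'DICTIONARY =', repr(
    { pos: [lemma.name
        for synset in wordnet.all_synsets(pos=getattr(wordnet, pos))
        for lemma in synset.lemmas]
        for pos in ['NOUN', 'VERB', 'ADJ', 'ADV'] })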

As a result, I get a Python dictionary with one entry per part of speech (nouns, verbs, adjectives and adverbs). Each entry contains a list of words of that particular part of speech. The dictionary isn’t stored in any kind of database; instead, it is imported into main.py. The source code for the generated dictionary is a massive 2.8 MB .py file.

By storing the dictionary in a Python file I also simplified deployment a lot, since downloading the WordNet dictionary is a separate step which I wasn’t sure could be automated. I also think this makes the web app load faster, as parsing a huge source file is probably faster than querying WordNet via nltk.
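To make the "dictionary as a Python module" idea concrete, here is a minimal sketch of the round trip: write a dict out as Python source, then import it back as a module. It is written for Python 3 (unlike the Python 2 snippets above), uses a tiny stand-in dictionary instead of the real WordNet data, and the module name wordlist is a hypothetical one chosen for illustration, not necessarily what the app uses.

```python
import importlib.util
import os
import tempfile

# Tiny stand-in for the flattened WordNet data (the real file is ~2.8 MB).
flat = {"NOUN": ["field_mouse", "solitaire"], "ADJ": ["wiry", "votive"]}

with tempfile.TemporaryDirectory() as tmp:
    # "wordlist.py" is a hypothetical module name used only for this sketch.
    path = os.path.join(tmp, "wordlist.py")
    with open(path, "w") as f:
        f.write("DICTIONARY = " + repr(flat) + "\n")

    # Import the generated file as a module, much as main.py would import it.
    spec = importlib.util.spec_from_file_location("wordlist", path)
    wordlist = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(wordlist)

print(wordlist.DICTIONARY["NOUN"])
```

Because repr() of a dict of string lists is valid Python source, no parser or database is needed on the server; the interpreter does all the loading.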

Result

A couple of hours of work, and voila, the app is ready and you can access it here. I realize that the writing is goofy, but my excuse is that I just had to fill all the space that this Twitter Bootstrap example template had.

One cool feature my generator has that I haven’t seen anywhere else is that the user can set the template from which the password is generated. For example, the default template is <adj><noun><00>, which means that it concatenates a random adjective, a random noun and a two-digit number to create a password. Like this: distributionaldisplaycase83. So far the templating capabilities are limited, but I will probably work on expanding them in the future.
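A template like this could be expanded along the following lines. This is only a sketch of the idea in Python 3, not the app's actual code: the word lists are tiny stand-ins, the expand function and the rule of leaving unknown tokens untouched are my assumptions for illustration, and it uses the secrets module rather than the random module from the gist, since secrets is the standard library's choice for security-sensitive randomness.

```python
import re
import secrets

# Tiny stand-in word lists; the real app draws from the flattened WordNet data.
WORDS = {
    "adj": ["faithful", "wiry", "grandiose"],
    "adv": ["actually", "universally"],
    "noun": ["solitaire", "cobia", "accordance"],
    "verb": ["regiment", "surmount"],
}

# Matches one <token> at a time, e.g. <adj>, <noun>, <00>.
TOKEN = re.compile(r"<([^<>]+)>")


def expand(template, words=WORDS):
    """Replace each <token> in the template with a random word or digits."""
    def substitute(match):
        name = match.group(1)
        if name in words:
            return secrets.choice(words[name])
        if set(name) == {"0"}:
            # <00> becomes two random digits, <000> three, and so on.
            return "".join(secrets.choice("0123456789") for _ in name)
        # Unknown token: leave it untouched (an assumption of this sketch).
        return match.group(0)
    return TOKEN.sub(substitute, template)


print(expand("<adj><noun><00>"))
print(expand("<adv>-<adj>-<noun>.<000>"))
```

Everything outside the angle brackets is copied through as-is, which is what lets separators such as - and . appear in the output.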

Of course, in the spirit of all modern things, I’ve also made an API for the generator available here. The available parameters are template (default is <adj><noun><00>) and count (default is 5). For example, a GET request to https://sparemaranta.appspot.com/api/1/generate?template=<adv>-<adj>-<noun>.<000>&count=3 yielded this for me:

{
  "passwords": [
    "bad-ineligible-staripomoea.837", 
    "astern-shortened-conflictofinterest.857", 
    "unscientifically-indurate-cithern.760"
  ], 
  "template": "<adv>-<adj>-<noun>.<000>"
}

A successful reply is a JSON object with two properties: passwords, which contains a list of generated passwords, and template, which simply echoes the received template for debugging purposes.
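For a client, the request and reply described above might be handled roughly like this. It is a Python 3 sketch that only builds the URL and parses a reply; it makes no network call. The helper names build_url and parse_reply are hypothetical, and percent-encoding the angle brackets is my assumption about what a well-behaved client should send, not something the API documents.

```python
import json
from urllib.parse import urlencode

API_URL = "https://sparemaranta.appspot.com/api/1/generate"  # from the post


def build_url(template="<adj><noun><00>", count=5):
    """Return the request URL with template and count percent-encoded."""
    return API_URL + "?" + urlencode({"template": template, "count": count})


def parse_reply(body):
    """Extract the list of passwords from a JSON reply body."""
    reply = json.loads(body)
    return reply["passwords"]


print(build_url("<adv>-<adj>-<noun>.<000>", count=3))

# A shortened copy of the reply shown above, used here instead of a live call.
sample = """
{
  "passwords": ["bad-ineligible-staripomoea.837", "unscientifically-indurate-cithern.760"],
  "template": "<adv>-<adj>-<noun>.<000>"
}
"""
print(parse_reply(sample))
```

The actual fetch could be done with urllib.request.urlopen on the built URL, with the response body passed to parse_reply.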

Only after I was done did I find out that there’s already a generator like that: see here (and it’s even been covered on Lifehacker). Interestingly, although it’s been over two years, googling for a password generator still yields the usual mash-some-random-chars-together-style generators on the first page.

It is questionable whether an online password generator is safe enough, even though the data is sent over a secure connection, but I think it is at least good enough for me. I might make the generator work offline, but for now this project has served both its purposes: to create something useful and to have some fun.

Source code
