To generate human-readable and pronounceable passwords, I usually rely
on apg
or pwgen
, but I needed a solution that can be used from a
Python program, which should be portable so that I can’t rely on the
presence of apg
or pwgen
on the computer.
First, we need to download Ned Batchelder’s hyphenate.py that implements Frank Liang’s hyphenation algorithm (the one used in TeX). It allows to break words into syllables and to build a big set of elements that we’ll be able to combine later.
In the script below, we load a dictionary (typically
/usr/share/dict/words
), select only the words that are all lower
case alphabetic (to exclude proper nouns, words ending with 's
,
etc.), break them into syllables, keep only the short ones (some
syllables are really long, up to full words, perhaps because the
algorithm fails on some words), then we build a Python module with two
set objects words
and syllables
.
# gendict.py
import sys, pprint
from hyphenate import hyphenate_word
words = set()
syllables = set()
for dictfile in sys.argv[1:] :
for line in open(dictfile) :
word = line.strip()
if word.isalpha() and word.islower() :
words.add(word)
syllables.update(s for s in hyphenate_word(word)
if len(s) < 5)
print("words = %s\n\n" % pprint.pformat(words))
print("syllables = %s\n" % pprint.pformat(syllables))
This script can be invoked as, for instance:
$ python gendict.py /usr/share/dict/words > wordlist.py
Then, we are ready to create our password generator:
# pwgen.py
import sys, random
import wordlist
nonalpha = "0123456789-+/*=%,;.:!?$&#|@"
def pwgen (length) :
while True :
pw = []
for s in random.sample(wordlist.syllables, length) :
pw.append(s)
if random.randint(0, 1) :
pw[-1] = pw[-1].capitalize()
if random.randint(0, 1) :
pw[-1] = pw[-1].swapcase()
pw.extend(random.sample(nonalpha, random.randint(1, length)))
random.shuffle(pw)
attempt = "".join(pw)
word = attempt.translate(None, nonalpha).lower()
if word not in wordlist.words :
return attempt
if __name__ == "__main__" :
for length in sys.argv[1:] :
print(pwgen(int(length)))
The while
loop is used to check that we don’t generate a word that
actually exists in our list.
$ python pwgen.py 1 2 3 4 5
LUM!
6nateR4
GRAB6SHIMALMS4
?lAXpORSviva&vels
Ba/saraZels9,mAKEBREW2
Hopefully you will not have the same output. ;-)
Using module random
is actually a weakness because it uses a
pseudo-random generator that is not considered secure enough to be
used for cryptography. However, on systems where os.urandom
is
available, it is used to seed the pseudo-random generator, which
should result in decent passwords. A stronger implementation would
need to explicitly reimplement random.sample
and random.randint
using os.urandom
.