Simpler dictionary?

Mar 26, 2013 at 6:32 PM
Hi,
Fantastic project, really pleased to come across it.

But I'm finding the words in the built-in dictionary a little complex. Many of them aren't used in day-to-day language and are therefore difficult to spell (at least for me!).

It would be great to have a simplified list of words.

Or perhaps there is a way to filter the dictionary down to common words, or something similar?

Keep up the good work
Alex.
Mar 28, 2013 at 11:32 AM
Thanks for the feedback Alex.

My criterion for including words was: if it's in my vocabulary, I include it (plus a few "interesting" words even I didn't know). I also limit words to 9 characters (or maybe 10, can't remember now). I know my vocab is a bit bigger than average, but I figure it's a learning experience for you! (A little secret: I may have a wide vocab, but my spelling sucks, so I need to type a phrase for a few days before I get the hang of it.)

There isn't an easy "tick-a-box" style option to use easy words. But here are a few ideas to reduce the complexity (from easiest to hardest):
  • Generate 20 or so passphrases and pick an easy one from the list (I'm pushing the pick-from-a-list approach more in the current version).
  • Reduce the max length of the phrases generated; that should keep the really long words from appearing so often.
  • Download the console version and grab the raw dictionary from it. Then go through and delete any words you don't know. And then select your modded dictionary as a custom one from the option dialog.
  • Contribute some of your own "easier" words for the stock dictionary. Because all words are equally likely, the more easy words, the more often they'll come up.
  • Make your own dictionary without all the hard words.
Once I get through all the words I pulled together to make the dictionary (I'm only 15% of the way through!), I may add some sort of word-frequency attribute to each word. But that's a way off yet.
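
As a rough illustration of the "delete the words you don't know" and "make your own dictionary" ideas above, here is a minimal Python sketch. It assumes the raw dictionary has already been exported to a plain list of words, which may not match the real file format. The name `simplify_words` and its parameters are hypothetical and not part of the project.

```python
def simplify_words(words, max_length=7, known_words=None):
    """Keep words that are short enough and, optionally, in a set of known words.

    Assumption: `words` is a plain iterable of strings, one word each.
    The project's real dictionary format may differ.
    """
    kept = []
    for word in words:
        w = word.strip().lower()
        if not w or len(w) > max_length:
            continue  # skip blanks and long words
        if known_words is not None and w not in known_words:
            continue  # skip words the user has not approved
        kept.append(w)
    return kept


# Made-up sample words, purely for illustration.
stock = ["cat", "perspicacious", "garden", "obfuscate", "river"]
print(simplify_words(stock, max_length=6))
```

The surviving words would then be saved in whatever format the custom dictionary option expects.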

Murray
Mar 28, 2013 at 7:09 PM
Thanks for the tips Murray.

And thanks for the background info, your basis for the words makes sense.

I'm thinking of generating passwords for a default password on a site, so it would be really handy to have some sort of difficulty rating.
But I must admit I have no idea how to conceive such a thing. :-)

I wonder if the Oxford dictionary or something similar has a list of common spoken words.

Hopefully if I get some spare time I can help out and create a list of easy words.

I think I'll go with a max length, and perhaps a refresh button to cycle through the phrases.

Thanks again.
Mar 29, 2013 at 11:48 PM
You may want to try a specific PhraseStrength rather than the random one. Either Normal or Strong would keep the generated phrase shorter, although it wouldn't affect the difficulty of the words themselves.

If you were so inclined, you could make a "blacklist" of really hard words and filter out any phrases which contain them as you generate them. Or make your own list of allowable "easy" words and only allow phrases built from those words. Basically, some manual post-processing of the generated phrases.
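
The post-processing idea could be sketched roughly like this in Python. This is only an illustration: `generate` stands in for whatever call produces one phrase (it is not the library's API), and `is_acceptable` and `first_acceptable` are hypothetical helper names.

```python
import re


def words_in(phrase):
    """Split a phrase into lowercase words, ignoring digits and punctuation."""
    return re.findall(r"[a-z']+", phrase.lower())


def is_acceptable(phrase, blacklist=frozenset(), whitelist=None):
    """Reject phrases with a blacklisted word or, if a whitelist is given,
    with any word that is not on the whitelist."""
    words = words_in(phrase)
    if any(w in blacklist for w in words):
        return False
    if whitelist is not None and any(w not in whitelist for w in words):
        return False
    return True


def first_acceptable(generate, blacklist=frozenset(), whitelist=None, max_tries=1000):
    """Call generate() until it yields an acceptable phrase, or give up."""
    for _ in range(max_tries):
        phrase = generate()
        if is_acceptable(phrase, blacklist, whitelist):
            return phrase
    return None


# Stand-in generator: a fixed list of made-up phrases instead of the real one.
samples = iter(["the sesquipedalian fox", "the quick fox"])
print(first_acceptable(lambda: next(samples), blacklist={"sesquipedalian"}))
```

Keep in mind that throwing phrases away shrinks the pool of possible phrases, so a very aggressive filter makes the results somewhat easier to guess.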

Oh and I like the idea of having a passphrase for the default password!

I'm actually downloading Google's ngram corpus (http://books.google.com/ngrams/info). That should give me the word frequency statistics I need to rate word difficulty. But don't hold your breath for me to implement it for a while; there are a few other things on my plate! (You could use the 1-grams to derive your own list of commonly spoken words, but even they are several gigs to download!)
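
For anyone wanting to try the 1-gram route, here is a minimal Python sketch of totalling counts per word. It assumes each line is tab-separated with the word, the year, and the match count as the first three fields; check the corpus documentation before relying on that. The function name `common_words`, the cutoff year, and the other defaults are arbitrary choices for illustration.

```python
from collections import Counter


def common_words(lines, top_n=5000, min_year=1980, max_length=9):
    """Return the top_n most frequent words from 1-gram style lines.

    Assumption: each line is "word<TAB>year<TAB>match_count<TAB>...".
    Anything that does not fit that shape is skipped.
    """
    totals = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        word = fields[0].lower()
        # Skip entries with digits, punctuation or tags, and overly long words.
        if not word.isalpha() or len(word) > max_length:
            continue
        try:
            if int(fields[1]) < min_year:
                continue  # ignore older usage
            totals[word] += int(fields[2])
        except ValueError:
            continue
    return [w for w, _ in totals.most_common(top_n)]


# Made-up sample lines in the assumed format.
sample = [
    "the\t1990\t100\t50",
    "The\t1991\t50\t20",
    "zyzzyva\t1990\t1\t1",
    "old\t1900\t999\t1",
]
print(common_words(sample, top_n=2))
```

Since the files are large and compressed, they could be streamed line by line with something like `gzip.open(path, "rt", encoding="utf-8")` rather than loaded into memory in one go.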
Mar 30, 2013 at 1:30 AM
Cool, good idea with the black list.

I've not heard of the ngram corpus, I'll have to check that out. Thanks.

Keep up the good work.