Why Two Different Totals?

Word Count

Surely there's only one way to count the number of words in the dictionary!?! Right?

Well, as with all things statistical, it's not that simple.

Earlier versions of the Readable Passphrase Generator reported only the total number of words in the dictionary or, more correctly, the total number of root words. That's the first number. It is what you'd get if you counted the entries in a real dictionary.

From version 0.12, it also counts the number of unique forms of words. That's the second number. It is what you'd get if you counted the words in a word list (like the ones many password crackers use).

What's the difference? The root count treats run, running and will run as one word, because they all share the same root: run. The unique form count treats them as 3, because each is spelled differently. There are many more unique forms than roots.

Of course, some forms of run are the same: the singular and plural future tense forms are identical (will run), while the singular and plural past continuous forms differ (was running vs were running). Two forms only count as one if they are identical down to the letter.
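As a rough illustration of the two totals, here is a minimal Python sketch. The `forms_by_root` structure and both function names are hypothetical and made up for this example; they are not how the Readable Passphrase Generator actually stores or counts its dictionary.

```python
# Hypothetical mini-dictionary: one root mapped to some of its forms.
# Illustrative data only, not taken from the real dictionary.
forms_by_root = {
    "run": [
        "will run",      # future tense, singular
        "will run",      # future tense, plural (same spelling)
        "was running",   # past continuous, singular
        "were running",  # past continuous, plural
    ],
}


def count_roots(forms_by_root):
    """First total: one per root, like entries in a printed dictionary."""
    return len(forms_by_root)


def count_unique_forms(forms_by_root):
    """Second total: one per distinct spelling, like lines in a word list."""
    unique = set()
    for forms in forms_by_root.values():
        unique.update(forms)  # identical spellings collapse into one
    return len(unique)


root_total = count_roots(forms_by_root)
form_total = count_unique_forms(forms_by_root)
print(root_total, form_total)  # 1 3
```

The two "will run" entries collapse into one because the set compares exact strings, which is the "identical down to the letter" rule.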

And sometimes a word will appear as a verb and as an adjective and perhaps even as a noun too. In that case the root count adds one for each part of speech, while the unique form count adds only one in total, because the spellings are all identical.
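The part-of-speech case can be sketched the same way. The `entries` list below is a hypothetical example (the word "light" is chosen for illustration and is not confirmed to be in the real dictionary under all three parts of speech).

```python
# Hypothetical entries: the same spelling listed under three parts of speech.
entries = [
    ("light", "noun"),
    ("light", "verb"),
    ("light", "adjective"),
]

# Root-style total: one per (word, part of speech) entry.
entry_total = len(entries)

# Unique-form total: one per distinct spelling, whatever the part of speech.
spelling_total = len({word for word, _ in entries})

print(entry_total, spelling_total)  # 3 1
```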

Last edited May 2, 2013 at 10:41 AM by ligos, version 4
