Zipf's law:

Zipf's law /ˈzɪf/ states that, given a large sample of words, the frequency of any word is inversely proportional to its rank in the frequency table: the word of rank N occurs with a frequency proportional to 1/N. Thus the most frequent word will occur about twice as often as the second most frequent word, three times as often as the third most frequent word, and so on. For example, in the Brown Corpus of American English text, the most frequently occurring word, "the", accounts for nearly 7% of all word occurrences (69,971 out of slightly over 1 million). True to Zipf's law, the second-place word "of" accounts for slightly over 3.5% of words (36,411 occurrences), followed by "and" (28,852 occurrences). Only 135 vocabulary items are needed to account for half the sample of words.

The same relationship occurs in many other rankings unrelated to language, such as the population ranks of cities in various countries, corporation sizes, and income rankings. The appearance of the distribution in rankings of cities by population was first noticed by Felix Auerbach in 1913. It is not known why Zipf's law holds for most languages.
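To make the 1/N relationship concrete, here is a small illustrative calculation (Python, not part of the program described below) of the ideal Zipf shares for a five-item vocabulary, normalized so the shares sum to 100%:

# Ideal Zipf shares for N ranked items: f(r) = (1/r) / H_N,
# where H_N = 1 + 1/2 + ... + 1/N is the N-th harmonic number.
N = 5
H = sum(1 / r for r in range(1, N + 1))
for rank in range(1, N + 1):
    print(f"rank {rank}: {(1 / rank) / H:.1%}")
# rank 1 gets exactly twice the share of rank 2,
# three times the share of rank 3, and so on.

With five items the ideal shares come out to roughly 44%, 22%, 15%, 11%, and 9%, so the rank-1 share is exactly twice the rank-2 share.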



This program tries to replicate this phenomenon using a context-based random generation algorithm. This means the string is randomly structured, but each letter is determined by the letters around it: just as in language, any sentence could be formed, but the words used in a sentence depend on the words preceding them. Given that we use such a small number of characters (5), the result may deviate somewhat from the ideal ratio; for example, the second most used character may occur 43% or 62% as often as the most used one instead of 50%. There may also be characters tied for second or third most used.
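Below is a minimal sketch of such a generator in Python, assuming a first-order dependence (each character's probability depends only on the character immediately before it). The transition weights here are made up purely for illustration; the actual program's rules are not reproduced in this text.

import random
from collections import Counter

ALPHABET = "VWXYZ"  # the five characters mentioned above

# Hypothetical transition weights: from each character, some
# successors are more likely than others, so each letter is
# "determined by" the letter preceding it.
WEIGHTS = {
    "V": [1, 5, 1, 1, 2],
    "W": [2, 1, 1, 3, 5],
    "X": [5, 2, 1, 1, 1],
    "Y": [1, 1, 5, 2, 1],
    "Z": [3, 1, 2, 5, 1],
}

def generate(length=100_000):
    """Build a random string where each character depends on the last."""
    out = [random.choice(ALPHABET)]
    for _ in range(length - 1):
        out.append(random.choices(ALPHABET, weights=WEIGHTS[out[-1]])[0])
    return "".join(out)

counts = Counter(generate())
top = counts.most_common(1)[0][1]
print("HISTOGRAM")
for ch, n in counts.most_common():
    print(ch, "#" * round(40 * n / top))  # bars scaled to the max count

Whether the resulting frequencies actually approximate a 1/rank curve depends on the transition weights chosen; the ones above merely demonstrate the context dependence.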

Output:

HISTOGRAM

[histogram bars for the characters W, Z, V, Y, X; bar lengths not preserved]
In general, a character's frequency will be inversely proportional to its rank, so bar height shrinks as you move down the chart, and the graph will follow a decaying 1/rank curve, such as this:

[idealized graph: bars for W, Z, V, Y, X decreasing in height, with the tallest bar marked Max and the second marked Med (50% of Max)]
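That idealized shape can be rendered directly with a short illustrative snippet (the character order W, Z, V, Y, X is taken from the graph above):

# Ideal 1/rank bars: the second bar (Med) is half the first (Max).
for rank, ch in enumerate("WZVYX", start=1):
    print(ch, "#" * round(40 / rank))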