The practice of programming
Chapter 3 Design and Implementation
In this section, we focus on one kind problem: generate random English text that reads well.
The solution is called 'The Markov Chain Algorithm', and its main idea is shown as follows:
First we scan an article and save it in a special chart. For example, we use two adjacent words as prefix, by searching which we can get all the possible following words. We use 'hashtable', 'list', 'vector' to realize this function. The details are left out.
Then we are to generate a paragraph based on the data above. The first several words are picked up randomly. And each time, we choose one of its suffix randomly. To ensure the random in just one scanning, we use a stractage, that is, 'use a variable "n" to count the current number of suffixes, use a string to save the previous chosen suffix. each time the present suffix can replace the previous suffix by the possibilition 1/n. When the scan is over, the suffix chosen is determined.'
Last, to limit to total length of the generated paragraph, we add a symbol of ending to each prefix's map. Also, use a variable to count to total length we have generated.
The book introduce the solving programs in C, JAVA, C++, Awk, Perl. Their different realizations are all anchored in the same main idea. By comparison, we can found that C has the longest code lines and the highest efficience. Awk and Perl are the easiest. The quickest is C, then is Perl. Their adjustence is also different. If the number of prefix is changed, C, C++, Java is easy to modefy, while Perl and Awk are not.