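8.11.5 Searching Strings (4)
Try It Out: Sorting Words from Text
This example reads a block of text, extracts the words from it, and outputs them in ascending sequence. It uses the rather inefficient bubble sort function that you saw in Ex8_14. Chapter 10 uses a much better library function for sorting, but you need to understand a few other things before you can use it. The program also counts how many times each word occurs and outputs the count for each word. This kind of analysis is called a collocation. Here is the code: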
- // Ex8_15.cpp
- // Extracting words from text
- #include <iostream>
- #include <iomanip>
- #include <string>
- using std::cin;
- using std::cout;
- using std::endl;
- using std::ios;
- using std::setiosflags;
- using std::resetiosflags;
- using std::setw;
- using std::string;
- // Sort an array of string objects
- string* sort(string* strings, size_t count)
- {
- bool swapped(false);
- while(true)
- {
- for(size_t i = 0 ; i+1 < count ; i++) // i+1 < count avoids unsigned wraparound when count is 0
- {
- if(strings[i] > strings[i+1])
- {
- swapped = true;
- strings[i].swap(strings[i+1]);
- }
- }
- if(!swapped)
- break;
- swapped = false;
- }
- return strings;
- }
- int main()
- {
- const size_t maxwords(100);
- string words[maxwords];
- string text;
- string separators(" \".,:;! ()\n");
- size_t nwords(0);
- size_t maxwidth(0);
- cout << "Enter some text on as many lines as you wish."
- << endl << "Terminate the input with an asterisk:" << endl;
- getline(cin, text, '*');
- size_t start(0), end(0), offset(0); // Record start & end of word & offset
- while(true)
- {
- // Find first character of a word
- start = text.find_first_not_of(separators, offset); // Find non-separator
- if(string::npos == start) // If we did not find it, we are done
- break;
- offset = start + 1; // Move past character found
- // Find first separator past end of current word
- end = text.find_first_of(separators,offset); // Find separator
- if(string::npos == end) // If it's the end of the string
- { // current word is last in string
- offset = end; // We use offset to end loop later
- end = text.length(); // Set end as 1 past last character
- }
- else
- offset = end + 1; // Move past character found
- words[nwords] = text.substr(start, end-start); // Extract the word
- // Keep track of longest word
- if(maxwidth < words[nwords].length())
- maxwidth = words[nwords].length();
- if(++nwords == maxwords) // Check for array full
- {
- cout << endl << "Maximum number of words reached."
- << endl << "Processing what we have." << endl;
- break;
- }
- if(string::npos == offset) // If we reached the end of the string
- break; // We are done
- }
- sort(words, nwords);
- cout << endl
- << "In ascending sequence, the words in the text are:"
- << endl;
- size_t count(0); // Count of duplicate words
- // Output words and number of occurrences
- for(size_t i = 0 ; i < nwords ; i++)
- {
- if(0 == count)
- count = 1;
- if(i+1 < nwords && words[i] == words[i+1]) // Compare with next word, including the last pair
- {
- ++count;
- continue;
- }
- cout << setiosflags(ios::left) // Output word left-justified
- << setw(maxwidth+2) << words[i];
- cout << resetiosflags(ios::left) // Clear left flag so the count is right-justified
- << setw(5) << count << endl;
- count = 0;
- }
- cout << endl;
- return 0;
- }
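The word-extraction loop in main() alternates two searches: find_first_not_of() locates the first character of the next word, and find_first_of() locates the separator that ends it. As a minimal sketch of the same idea in isolation, the function below collects the words in a std::vector rather than a fixed-size array. The name extract_words and the use of std::vector are assumptions made for illustration; they are not part of Ex8_15.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Ex8_15): split text into words, treating
// every character in separators as a delimiter.
std::vector<std::string> extract_words(const std::string& text,
                                       const std::string& separators)
{
  std::vector<std::string> words;

  // Find the first character of the first word, if there is one
  std::size_t start = text.find_first_not_of(separators);
  while (start != std::string::npos)
  {
    // Find the first separator past the start of the current word
    std::size_t end = text.find_first_of(separators, start + 1);
    if (end == std::string::npos)
      end = text.length();               // Last word runs to the end of the text

    words.push_back(text.substr(start, end - start));

    // Find the start of the next word; npos here ends the loop
    start = text.find_first_not_of(separators, end);
  }
  return words;
}
```

Calling extract_words("And be a crow. But I dunno.", " \".,:;!()\n") would yield the seven words And, be, a, crow, But, I, and dunno. Because the vector grows as needed, this sketch has no equivalent of the maxwords check in Ex8_15.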
Here is some sample output from the Ex8_15 program:
- Enter some text on as many lines as you wish.
- Terminate the input with an asterisk:
- I sometimes think I'd rather crow
- And be a rooster than to roost
- And be a crow. But I dunno.
- A rooster he can roost also,
- Which don't seem fair when crows can't crow
- Which may help some. Still I dunno.*
- In ascending sequence, the words in the text are:
- A 1
- And 2
- But 1
- I 3
- I'd 1
- Still 1
- Which 2
- a 2
- also 1
- be 2
- can 1
- can't 1
- crow 3
- crows 1
- don't 1
- dunno 2
- fair 1
- he 1
- help 1
- may 1
- rather 1
- roost 2
- rooster 2
- seem 1
- some 1
- sometimes 1
- than 1
- think 1
- to 1
- when 1
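Notice that every capitalized word precedes the lowercase words in the output. Comparing string objects compares character codes, and uppercase letters have lower codes than lowercase letters. The sort-then-count step can also be sketched with std::sort() from the <algorithm> header in place of the bubble sort. Whether this is the library function that Chapter 10 has in mind is an assumption, and count_sorted_words is a hypothetical name introduced only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Ex8_15): sort the words, then collapse
// each run of equal words into a (word, count) pair.
std::vector<std::pair<std::string, std::size_t>>
count_sorted_words(std::vector<std::string> words)
{
  std::sort(words.begin(), words.end());   // Ascending order by character code

  std::vector<std::pair<std::string, std::size_t>> counts;
  for (const std::string& word : words)
  {
    if (!counts.empty() && counts.back().first == word)
      ++counts.back().second;              // Same word as the previous one
    else
      counts.push_back(std::make_pair(word, std::size_t(1)));  // New word
  }
  return counts;
}
```

Since equal words are adjacent after sorting, a single pass that compares each word with the previous entry is enough to count them, and no special handling is needed for the last word.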