/*
tquery.c
The compilers I used are GNU C++ and VC2003. The following modifications are required to build this program:
1 Rename tquery.c -> tquery.cpp
2
3 Add #include
4 Delete allocator and the "," in front of it. Take care to leave a space between the two > of > >, because the compiler is not a mind reader: it will treat >> as an operator
5 Delete diff_type on line 250; it is outdated for current compilers
6 For GNU C++, run: g++ -o tquery.exe tquery.cpp [Enter]
7 For VC2003, run: cl tquery.cpp [Enter]
*/
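#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <iostream>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>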
using namespace std; // for convenience, make the names of the standard namespace visible without explicit qualification
typedef pair<short,short>   location;
typedef vector<location>    loc;
typedef vector<string>      text;
typedef pair<text*,loc*>    text_loc;
// Why put location, loc, text and text_loc together in one block like this? Partly it is personal preference,
// but it also has a benefit: with the definitions side by side, they are hard to confuse.
class TextQuery {
public:
    /* memset is a C library function. This call sets every byte of the object, all sizeof( TextQuery )
       of them starting at this, to 0, so on the usual platforms the three pointer members below start
       out as null pointers. This is only safe because the class has nothing but pointer data members
       and no virtual functions. memset takes and returns a void*, but it fills the memory byte by byte,
       as if it were an array of char. */
    TextQuery() { memset( this, 0, sizeof( TextQuery )); }

    // The static member function says: "I only work on filt_elems."
    static void filter_elements( string felems ) { filt_elems = felems; }

    void query_text();
    void display_map_text();
    void display_text_locations();
    void doit() {
        retrieve_text();
        separate_words();
        filter_text();
        suffix_text();
        strip_caps();
        build_word_map();
    }

private:
    void retrieve_text();
    void separate_words();
    void filter_text();
    void strip_caps();
    void suffix_text();
    void suffix_s( string& );
    void build_word_map();

private:
    vector<string>     *lines_of_text;
    text_loc           *text_locations;
    map<string,loc*>   *word_map;
    // I belong to the class rather than to any single object, so all objects share me. I exist even
    // when no object does, and there is exactly one of me: objects do not each carry a copy, which
    // saves space. Be careful not to initialize me inside the class. Why? Ask Bjarne Stroustrup.
    static string       filt_elems;
};
string TextQuery::filt_elems( "\",.;:!?)(\\/" );
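// A minimal sketch of the static data member rules described in the comment on filt_elems. The class
// Filter and its members are hypothetical and not part of this program. The sketch shows that the
// member is only declared inside the class, is defined once outside it, and can be used before any
// object exists.

```cpp
#include <string>

// Hypothetical class, used only to illustrate a static data member.
class Filter {
public:
    static void set_chars( const std::string &s ) { chars = s; }
    static const std::string& get_chars() { return chars; }
private:
    static std::string chars;   // declaration only: no initializer inside the class
};

// The single definition lives outside the class, at namespace scope.
// All Filter objects (and code that has no Filter object at all) share it.
std::string Filter::chars( ",." );
```

// Filter::get_chars() can be called without creating a Filter, just as TextQuery::filter_elements()
// can be called before a TextQuery is constructed.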
int main()
{
    TextQuery tq;
    tq.doit();
    tq.query_text();
    tq.display_map_text();
    return 0;
}
// Get the text.
/* Let me describe at some length what retrieve_text does and how it does it. The function reads every
   line of the input text file and records its contents. Think of it this way: it reads the file one
   line at a time into a string and appends that string to a vector with push_back, so element 0 of the
   vector is the first line of the text file, and so on. How does it do that? First it builds an
   ifstream object, which stands for the file you named, and then it calls getline. Note that there are
   two getlines: one is a member function of the stream classes, the other is a non-member provided for
   string in <string>. The string version is the one used here. */
void
TextQuery::
retrieve_text()
{
    string file_name;

    cout << "please enter file name: ";
    cin  >> file_name;

    // infile is an ifstream object. ifstream is a typedef for a class template with char fixed as its
    // character type. Its constructor takes a const char*, hence c_str(). ios is likewise a typedef
    // for a stream template with char fixed as its character type, and ios::in tells the stream to
    // open the file for input.
    ifstream infile( file_name.c_str(), ios::in );
    if ( !infile ) {
        cerr << "oops! unable to open file "
             << file_name << " - bailing out!\n";
        exit( -1 );
    }
    else cout << "\n";

    lines_of_text = new vector<string>;
    string textline;

    while ( getline( infile, textline, '\n' ))
        lines_of_text->push_back( textline ); // append the line to the string vector lines_of_text points to; the vector grows by 1
}
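// A minimal sketch of the line-reading loop described above, assuming a hypothetical helper named
// read_lines that takes any std::istream instead of opening a file itself. That way the same loop
// can be tried on an in-memory std::istringstream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect every line of `in` into a vector,
// so that element 0 is the first line, element 1 the second, and so on.
std::vector<std::string> read_lines( std::istream &in )
{
    std::vector<std::string> lines;
    std::string textline;

    // The <string> version of getline returns the stream, which
    // converts to false once no further line can be read.
    while ( std::getline( in, textline ) )
        lines.push_back( textline );

    return lines;
}
```

// Passing an std::ifstream works the same way, because ifstream is derived from istream.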
// Separate the individual words and record each one's position (line, word).
/*
Question 1: How is a single word separated out?
A: By the spaces between words.
Question 2: How does the iteration work?
A: The for loop iterates over the lines; the while loop looks for each space within a line.
*/
void
TextQuery::
separate_words()
{
    vector<string>   *words     = new vector<string>;
    vector<location> *locations = new vector<location>;

    for ( short line_pos = 0; line_pos < lines_of_text->size(); line_pos++ )
    {
        short  word_pos = 0;
        string textline = (*lines_of_text)[ line_pos ];

        string::size_type eol = textline.length();
        string::size_type pos = 0, prev_pos = 0;

        /* For example, for the first line:
               Alice Emma has long flowing red hair. Her Daddy says
           the first pass gives  eol: 52  pos: 5  line: 0  word: 0  substring: Alice
           That is: the line ends at 52, and the first space is found at position 5 (remember that
           Mr. Lippman says counting starts from 0). This is line 0, the word is number 0, and its
           content is Alice, which is pushed into words. words is a string vector. Why a vector?
           Because every separated word needs a number of its own: 0, 1, 2, ... */
        while (( pos = textline.find_first_of( ' ', pos )) != string::npos )
        {
            words->push_back( textline.substr( prev_pos, pos - prev_pos ));
            /* locations is a vector of pairs; after this statement it holds the position of each
               word so far, e.g. (0,0) for words[0] */
            locations->push_back( make_pair( line_pos, word_pos ));

            word_pos++; pos++; prev_pos = pos;
        }

        // the last word of the line has no trailing space
        words->push_back( textline.substr( prev_pos, pos - prev_pos ));
        locations->push_back( make_pair( line_pos, word_pos ));
    }

    text_locations = new text_loc( words, locations );
}
/*
The complete type of text_loc is:
pair< vector<string>*, vector< pair<short,short> >* >
*/
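// A minimal sketch of the word-splitting loop on a single line, assuming a hypothetical free function
// split_line that appends to two vectors passed by reference. It follows the same find_first_of /
// substr pattern as separate_words(), but without the pointers.

```cpp
#include <string>
#include <utility>
#include <vector>

typedef std::pair<short, short> location;   // (line, word within the line)

// Hypothetical stand-alone version of the loop body of separate_words().
void split_line( const std::string &textline, short line_pos,
                 std::vector<std::string> &words,
                 std::vector<location> &locations )
{
    short word_pos = 0;
    std::string::size_type pos = 0, prev_pos = 0;

    // Each space ends one word: [prev_pos, pos) is that word.
    while ( ( pos = textline.find_first_of( ' ', pos ) ) != std::string::npos ) {
        words.push_back( textline.substr( prev_pos, pos - prev_pos ) );
        locations.push_back( location( line_pos, word_pos ) );
        ++word_pos; ++pos; prev_pos = pos;
    }

    // The last word has no trailing space: take everything from prev_pos on.
    words.push_back( textline.substr( prev_pos ) );
    locations.push_back( location( line_pos, word_pos ) );
}
```

// Note that two consecutive spaces would produce an empty word, exactly as in the loop above.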
// Filter out the punctuation
void
TextQuery::
filter_text()
{
    if ( filt_elems.empty() )
        return;

    vector<string> *words = text_locations->first;

    vector<string>::iterator iter     = words->begin();
    vector<string>::iterator iter_end = words->end();

    while ( iter != iter_end )
    {
        string::size_type pos = 0;
        // find_first_of returns string::npos when it finds no match, so string::npos serves as the end flag
        while (( pos = (*iter).find_first_of( filt_elems, pos )) != string::npos )
            (*iter).erase( pos, 1 );
        iter++;
    }
}
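// A minimal sketch of the filtering loop applied to one word, assuming a hypothetical helper
// strip_chars that returns the cleaned copy. It shows why pos is not advanced after erase: the
// character that followed the erased one has moved into position pos.

```cpp
#include <string>

// Hypothetical helper: remove every character of `filt` from `word`.
std::string strip_chars( std::string word, const std::string &filt )
{
    std::string::size_type pos = 0;

    while ( ( pos = word.find_first_of( filt, pos ) ) != std::string::npos )
        word.erase( pos, 1 );   // pos now indexes the next character, so search from pos again

    return word;
}
```

// With the filter string "\",." the word "\"Daddy," (quote, Daddy, comma) loses its leading quote
// and its trailing comma.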
// Handle the suffixes
void
TextQuery::
suffix_text()
{
    vector<string> *words = text_locations->first;

    vector<string>::iterator iter     = words->begin();
    vector<string>::iterator iter_end = words->end();

    while ( iter != iter_end )
    {
        // if 3 or fewer characters, let it be
        if ( (*iter).size() <= 3 ) { iter++; continue; }
        if ( (*iter)[ (*iter).size() - 1 ] == 's' )
            suffix_s( *iter );

        // additional suffix handling goes here ...

        iter++;
    }
}
// Auxiliary function: handle words that end in 's'
void
TextQuery::
suffix_s( string &word )
{
    string::size_type spos = 0;
    string::size_type pos3 = word.size() - 3;

    // "ous", "ss", "is", "ius"
    string suffixes( "oussisius" );

    // compare returns 0 on a match, so each !compare(...) is true when the word ends in that suffix
    if ( ! word.compare( pos3,     3, suffixes, spos,     3 ) ||
         ! word.compare( pos3,     3, suffixes, spos + 6, 3 ) ||
         ! word.compare( pos3 + 1, 2, suffixes, spos + 2, 2 ) ||
         ! word.compare( pos3 + 1, 2, suffixes, spos + 4, 2 ))
        return;

    string ies( "ies" );
    if ( ! word.compare( pos3, 3, ies )) {
        word.replace( pos3, 3, 1, 'y' );
        return;
    }

    string ses( "ses" );
    if ( ! word.compare( pos3, 3, ses ))
    {
        word.erase( pos3 + 1, 2 );
        return;
    }

    // erase ending 's'
    word.erase( pos3 + 2 );

    // watch out for "'s"
    if ( word[ pos3 + 1 ] == '\'' )
        word.erase( pos3 + 1 );
}
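// A worked sketch of the suffix rules, assuming a hypothetical free function stem_s that returns the
// result instead of modifying its argument. It spells out where each protected ending sits inside
// the packed string "oussisius". As in suffix_text(), the word is assumed to have more than 3
// characters and to end in 's'.

```cpp
#include <string>

// Hypothetical value-returning copy of the suffix_s logic.
std::string stem_s( std::string word )
{
    std::string::size_type pos3 = word.size() - 3;   // start of the last three characters
    const std::string suffixes( "oussisius" );       // "ous" at 0, "ss" at 2, "is" at 4, "ius" at 6

    // Words ending in -ous, -ius, -ss or -is are left alone.
    // compare returns 0 on a match, hence the negation.
    if ( ! word.compare( pos3,     3, suffixes, 0, 3 ) ||
         ! word.compare( pos3,     3, suffixes, 6, 3 ) ||
         ! word.compare( pos3 + 1, 2, suffixes, 2, 2 ) ||
         ! word.compare( pos3 + 1, 2, suffixes, 4, 2 ) )
        return word;

    if ( ! word.compare( pos3, 3, "ies" ) ) {   // -ies becomes -y
        word.replace( pos3, 3, 1, 'y' );
        return word;
    }
    if ( ! word.compare( pos3, 3, "ses" ) ) {   // -ses loses its final "es"
        word.erase( pos3 + 1, 2 );
        return word;
    }

    word.erase( pos3 + 2 );                     // otherwise drop the final 's'
    if ( word[ pos3 + 1 ] == '\'' )             // and the apostrophe left over from "'s"
        word.erase( pos3 + 1 );
    return word;
}
```

// Under these rules "blows" becomes "blow", "flies" becomes "fly", "glasses" becomes "glass", and
// "famous" is returned unchanged.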
// Convert uppercase letters to lowercase
void
TextQuery::
strip_caps()
{
    vector<string> *words = text_locations->first;

    vector<string>::iterator iter     = words->begin();
    vector<string>::iterator iter_end = words->end();

    string caps( "ABCDEFGHIJKLMNOPQRSTUVWXYZ" );

    while ( iter != iter_end ) {
        string::size_type pos = 0;
        while (( pos = (*iter).find_first_of( caps, pos )) != string::npos )
            (*iter)[ pos ] = tolower( (*iter)[ pos ] );
        ++iter;
    }
}
void
TextQuery::
build_word_map()
{
    word_map = new map<string,loc*>;

    typedef map<string,loc*>::value_type value_type;
    typedef set<string>::difference_type diff_type;

    set<string> exclusion_set;

    ifstream infile( "exclusion_set" );
    if ( !infile )
    {
        static string default_excluded_words[25] = {
            "the", "and", "but", "that", "are", "been",
            "can", "can't", "cannot", "could", "did", "for",
            "had", "have", "him", "his", "her", "its", "into",
            "were", "which", "when", "with", "would"
        };

        cerr << "warning! unable to open word exclusion file! - "
             << "using default set\n";

        copy( default_excluded_words, default_excluded_words + 25,
              inserter( exclusion_set, exclusion_set.begin() ));
    }
    else {
        istream_iterator<string> input_set( infile ), eos;
        copy( input_set, eos, inserter( exclusion_set, exclusion_set.begin() ));
    }

    // iterate through the words, entering the key/value pairs
    vector<string>   *text_words = text_locations->first;
    vector<location> *text_locs  = text_locations->second;

    int elem_cnt = text_words->size();
    for ( int ix = 0; ix < elem_cnt; ++ix )
    {
        string textword = (*text_words)[ ix ];

        // exclusion strategies:
        // fewer than 3 characters, or in the exclusion set
        if ( textword.size() < 3 ||
             exclusion_set.count( textword ))
            continue;

        if ( ! word_map->count( (*text_words)[ ix ] ))
        {   // not present, add it:
            loc *ploc = new vector<location>;
            ploc->push_back( (*text_locs)[ ix ] );
            word_map->insert( value_type( (*text_words)[ ix ], ploc ));
        }
        else (*word_map)[ (*text_words)[ ix ] ]->push_back( (*text_locs)[ ix ] );
    }
}

// Handle the user's queries; this function works like a shell
void
TextQuery::
query_text()
{
    string query_text;

    do {
        cout << "enter a word against which to search the text.\n"
             << "to quit, enter a single character ==>  ";
        cin  >> query_text;

        if ( query_text.size() < 2 ) break;

        string caps( "ABCDEFGHIJKLMNOPQRSTUVWXYZ" );
        string::size_type pos = 0;
        while (( pos = query_text.find_first_of( caps, pos )) != string::npos )
            query_text[ pos ] = tolower( query_text[ pos ] );

        // if we index into the map, query_text is entered if absent:
        // not at all what we should wish for ...
        if ( ! word_map->count( query_text )) {
            cout << "\nSorry. There are no entries for "
                 << query_text << ".\n\n";
            continue;
        }

        loc *ploc = (*word_map)[ query_text ];

        set<short> occurrence_lines;
        loc::iterator liter     = ploc->begin(),
                      liter_end = ploc->end();

        while ( liter != liter_end ) {
            occurrence_lines.insert( occurrence_lines.end(), (*liter).first );
            ++liter;
        }

        int size = occurrence_lines.size();
        cout << "\n" << query_text
             << " occurs " << size
             << ( size == 1 ? " time:" : " times:" )
             << "\n\n";

        set<short>::iterator it = occurrence_lines.begin();
        for ( ; it != occurrence_lines.end(); ++it ) {
            int line = *it;

            cout << "\t( line "
                 // don't confound user with text lines starting at 0 ...
                 << line + 1 << " ) "
                 << (*lines_of_text)[ line ] << endl;
        }

        cout << endl;
    }
    while ( ! query_text.empty() );
    cout << "Ok, bye!\n";
}

void
TextQuery::
display_map_text()
{
    typedef map<string,loc*> map_text;
    map_text::iterator iter     = word_map->begin(),
                       iter_end = word_map->end();

    while ( iter != iter_end ) {
        cout << "word: " << (*iter).first << " (";

        int           loc_cnt   = 0;
        loc          *text_locs = (*iter).second;
        loc::iterator liter     = text_locs->begin(),
                      liter_end = text_locs->end();

        while ( liter != liter_end )
        {
            if ( loc_cnt )
                cout << ",";
            else ++loc_cnt;

            cout << "(" << (*liter).first
                 << "," << (*liter).second << ")";

            ++liter;
        }

        cout << ")\n";
        ++iter;
    }

    cout << endl;
}

void
TextQuery::
display_text_locations()
{
    vector<string>   *text_words = text_locations->first;
    vector<location> *text_locs  = text_locations->second;

    int elem_cnt = text_words->size();

    if ( elem_cnt != text_locs->size() )
    {
        cerr << "oops! internal error: word and position vectors "
             << "are of unequal size\n"
             << "words: " << elem_cnt << " "
             << "locs: "  << text_locs->size()
             << " - bailing out!\n";
        exit( -2 );
    }

    for ( int ix = 0; ix < elem_cnt; ix++ )
    {
        cout << "word: " << (*text_words)[ ix ] << "\t"
             << "location: ("
             << (*text_locs)[ ix ].first  << ","
             << (*text_locs)[ ix ].second << ")"
             << "\n";
    }

    cout << endl;
}

/*
Sample input text:
------------------
Alice Emma has long flowing red hair. Her Daddy says
when the wind blows through her hair, it looks almost alive,
like a fiery bird in flight. A beautiful fiery bird, he tells her,
magical but untamed. "Daddy, shush, there is no such thing,"
she tells him, at the same time wanting him to tell her more.
Shyly, she asks, "I mean, Daddy, is there?"

---------------------
Sample query session:
---------------------
please enter file name: alice_emma

warning! unable to open word exclusion file! - using default set

enter a word against which to search the text.
to quit, enter a single character ==>  alice

alice occurs 1 time:

        ( line 1 ) Alice Emma has long flowing red hair. Her Daddy says

enter a word against which to search the text.
to quit, enter a single character ==>  daddy

daddy occurs 3 times:

        ( line 1 ) Alice Emma has long flowing red hair. Her Daddy says
        ( line 4 ) magical but untamed. "Daddy, shush, there is no such thing,"
        ( line 6 ) Shyly, she asks, "I mean, Daddy, is there?"

enter a word against which to search the text.
to quit, enter a single character ==>  phoenix

Sorry. There are no entries for phoenix.

enter a word against which to search the text.
to quit, enter a single character ==>  .

Ok, bye!

--------------------------------------------------------------
Sample text map after: (a) stripping out punctuation,
(b) eliminating semantically neutral words such as `the`,
(c) suffixing, so that fixes and fix become fix, and
(d) removal of capitalization
--------------------------------------------------------------
word: alice ((0,0))
word: alive ((1,10))
word: almost ((1,9))
word: ask ((5,2))
word: beautiful ((2,7))
word: bird ((2,3),(2,9))
word: blow ((1,3))
word: daddy ((0,8),(3,3),(5,5))
word: emma ((0,1))
word: fiery ((2,2),(2,8))
word: flight ((2,5))
word: flowing ((0,4))
word: hair ((0,6),(1,6))
word: has ((0,2))
word: like ((2,0))
word: long ((0,3))
word: look ((1,8))
word: magical ((3,0))
word: mean ((5,4))
word: more ((4,12))
word: red ((0,5))
word: same ((4,5))
word: say ((0,9))
word: she ((4,0),(5,1))
word: shush ((3,4))
word: shyly ((5,0))
word: such ((3,8))
word: tell ((2,11),(4,1),(4,10))
word: there ((3,5),(5,7))
word: thing ((3,9))
word: through ((1,4))
word: time ((4,6))
word: untamed ((3,2))
word: wanting ((4,7))
word: wind ((1,2))
*/
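// A minimal sketch of the word-map idea behind build_word_map() and query_text(). The names
// word_index, add_word and lines_containing are hypothetical, and as a simplifying assumption the
// map stores each location vector by value instead of through a loc* as the program does. It shows
// the two exclusion rules on insertion and how a query collapses (line, word) locations into a set
// of distinct line numbers.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<short, short>    location;    // (line, word within the line)
typedef std::vector<location>      loc;
typedef std::map<std::string, loc> word_index;  // assumption: values, not loc* pointers

// Record one occurrence unless the word is too short or excluded.
void add_word( word_index &index, const std::set<std::string> &excluded,
               const std::string &word, location where )
{
    if ( word.size() < 3 || excluded.count( word ) )
        return;
    index[ word ].push_back( where );   // operator[] creates the entry on first use
}

// Distinct line numbers on which `word` occurs; empty if it is not indexed.
std::set<short> lines_containing( const word_index &index, const std::string &word )
{
    std::set<short> lines;
    word_index::const_iterator it = index.find( word );  // find, unlike operator[], never inserts
    if ( it == index.end() )
        return lines;

    for ( loc::const_iterator l = it->second.begin(); l != it->second.end(); ++l )
        lines.insert( l->first );
    return lines;
}
```

// Using find() for the lookup sidesteps the problem the comment in query_text() warns about:
// indexing a map with an absent key would silently enter that key.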