Thoughts on Regular Expressions
Origin
There is no doubt that string processing is one of the most common tasks in programming. At the same time, it may be one of the hardest problems we run into. A simple example is a text file I processed last week: the 02 student list. The file is in TXT format, with one record (tuple) per line and the fields strictly separated by whitespace. I wanted to load it into a database for querying. Since I had never used any conversion tool, I had to do it myself: turn each line into an SQL INSERT statement and execute it (a veteran: I won't say anything ~).
It seemed a simple task, and I implemented it with the C++ standard streams. During the implementation, however, I ran into several problems: 1) fields such as "political status" and "home phone" may be empty; 2) a single field may contain whitespace, for example some two-character student names have a space in the middle (heaven knows what format this text was exported from ~); 3) in extreme cases a student's information is incomplete, even missing the primary key. Cases like these are almost impossible to handle with plain standard streams.
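To make problems 1) and 2) concrete, here is a minimal sketch of what whitespace-delimited extraction with a standard stream does to such records. The helper name split_on_whitespace and the sample records in the note below are made up for illustration; they are not taken from the real file.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one record on whitespace, exactly as repeated operator>> calls would.
// Hypothetical helper, used only to illustrate the problem.
std::vector<std::string> split_on_whitespace(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> fields;
    std::string field;
    while (in >> field) {   // operator>> skips any run of blanks
        fields.push_back(field);
    }
    return fields;
}
```

With a hypothetical record such as "020101001 Li Ming 02512345678", where "Li Ming" is a single name, the helper returns four fields instead of three. With the phone missing, "020101002 Wang" yields two fields, and nothing in the result says which field was dropped.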
Regular expression
Naturally, we think of regular expressions. Regular expressions provide a way to describe string formats. For example, a student number is nine digits, so it can be described as [0-9]{9} or as [[:digit:]]{9}. The latter uses a POSIX character class (also accepted by Perl), which stays portable across machines with different character sets. This also lets us handle empty fields: the home phone can be described as [[:digit:]]{0,12} (some tools also accept the shorthand {,12}), meaning at most 12 digits. A stricter form is (\<[0][[:digit:]]{10,11})?, which means either a number of 11 or 12 digits whose first digit is 0, or an empty string. Information on regular expressions can be found in any book about Linux/UNIX. Boost and the other tools mentioned below also have their own documentation.
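The patterns above can be tried out with a short sketch. It assumes a C++11 compiler and uses std::regex from the standard library rather than Boost, so that it builds without extra dependencies. The function names are hypothetical. Because std::regex_match always matches the whole string, the \< word anchor is left out here (the default std::regex grammar does not accept it).

```cpp
#include <regex>
#include <string>

// Student number: exactly nine digits.
bool is_student_id(const std::string& s) {
    static const std::regex pattern("[[:digit:]]{9}");
    return std::regex_match(s, pattern);
}

// Home phone, loose form: zero to twelve digits, so an empty field passes.
bool is_phone_loose(const std::string& s) {
    static const std::regex pattern("[[:digit:]]{0,12}");
    return std::regex_match(s, pattern);
}

// Home phone, strict form: empty, or a leading 0 followed by 10 or 11 digits.
bool is_phone_strict(const std::string& s) {
    static const std::regex pattern("(0[[:digit:]]{10,11})?");
    return std::regex_match(s, pattern);
}
```

In this sketch an empty phone field passes both phone checks, while a number that does not start with 0 passes only the loose one.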
boost::regex
Today the best regular expression support for C++ comes from the Boost library (which probably goes without saying). If you don't care about the internal implementation, learning to use Boost's regular expressions is easy ~ Boost's regular expression library is boost::regex ("regex" is short for "regular expression", as it is everywhere). The header file is <boost/regex.hpp>.
boost::regex defines a class similar to the STL string class, like this:
namespace boost {
template <class charT, class traits = regex_traits<charT>, class Allocator = std::allocator<charT> > class basic_regex;
typedef basic_regex<char> regex;
typedef basic_regex<wchar_t> wregex;
}
Then we can match against it. The syntax is:
template <...> inline bool regex_match(const std::basic_string<...>& s, const reg_expression<...>& e, ...);
Well, the above is copied from the original documentation. In practice, just write boost::regex_match(strs, e), where strs is the source string and e is the regular expression; it returns a bool. Of course, the main point is usually to extract the matched parts, and Boost supports that too, through a class:
namespace boost {
typedef match_results<const char*> cmatch;
typedef match_results<const wchar_t*> wcmatch;
typedef match_results<std::string::const_iterator> smatch;
typedef match_results<std::wstring::const_iterator> wsmatch;
}
In addition, Boost provides other operations such as replacement. In short, Boost gives C++ the same regular expression functionality that is available on Linux/UNIX. I recently heard that several Boost libraries, the regular expression library among them, have been accepted into the standard; it really is the way things are heading. The introduction above is taken from Boost's own documentation, and interested readers can look it up themselves.
Perl
Two other tools that have to be mentioned are sed and grep on Linux/UNIX (and perhaps vi?). They are editing tools that fully support regular expressions. However, as popular opinion has it, regular expressions are at their "most fascinating, most exciting" in Perl. Perl's syntax is simple and easy to learn, and because it is a scripting language (or glue language), it can communicate with other systems and languages (apart from C++ with Boost, the popular programming languages do not seem to support regular expressions all that well, do they ~). However, speaking of Perl, one inevitably has to mention Python, which is extremely ... Forget it, this area ... I am still learning it myself (please don't look down on me), so I won't mislead anyone.
This little piece has run to quite a few words. It is my first time writing a technical article and I feel tired, heh. Some reference books are listed below.
Comments and suggestions are welcome; please send them to pmfrank@sina.com
Linux Programming Bible, John Goerzen, Electronic Industry Press
Python language programming (an introductory Python textbook, very easy to understand)
Boost Libraries manual