Introduction to Boost.Regex

xiaoxiao2021-03-06  28

Introduction [1]

Regular expressions are a form of pattern matching frequently used in text processing. Many readers will know them from the UNIX tools grep, sed and awk, or from the Perl language, all of which make extensive use of them. Traditionally, C++ users have been limited to the POSIX C API (Portable Operating System Interface) for working with regular expressions. Regex does provide that API, but it is not necessarily the best way to use the library. For example, Regex can process wide-character strings and perform search-and-replace operations (in a manner similar to sed or Perl), which the traditional C libraries cannot do.

The class boost::reg_expression (basic_regex in the declaration below) is the key class in the Regex library. It represents a "machine-readable" regular expression and is closely modelled on std::basic_string: think of it as a string plus the state machine required by the regular expression algorithms. Like std::basic_string, it comes with two typedefs, one for char and one for wchar_t:

    namespace boost {
    template <class charT, class traits = regex_traits<charT>, class Allocator = std::allocator<charT> >
    class basic_regex;
    typedef basic_regex<char> regex;
    typedef basic_regex<wchar_t> wregex;
    }
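The narrow and wide typedefs can be tried out with a minimal sketch such as the one below. It uses std::regex and std::wregex from the C++11 standard library, which were modelled on Boost.Regex, rather than the Boost classes themselves, so that it builds without Boost installed. The function names all_digits and all_digits_wide are made up for illustration.

```cpp
#include <regex>
#include <string>

// Narrow-character check: true if `text` consists solely of decimal digits.
// std::regex stands in for boost::regex here (an assumption of this sketch).
bool all_digits(const std::string& text)
{
    static const std::regex digits("[0-9]+");
    return std::regex_match(text, digits);
}

// Wide-character counterpart: same expression, built from std::wregex.
bool all_digits_wide(const std::wstring& text)
{
    static const std::wregex digits(L"[0-9]+");
    return std::regex_match(text, digits);
}
```

The only difference between the two functions is the character type, which is the point of providing one template with two typedefs.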

How might the Regex library be used? Imagine we are writing a credit card processing application. Credit card numbers usually come as a string of 16 digits, split into groups of four that are separated by spaces or hyphens. Before storing such a number in a database, we would want to check that it is in the correct format. To match any single digit we could use the regular expression [0-9]. Character ranges like this are locale dependent, however, so the POSIX standard form [[:digit:]] is preferable, and Regex and Perl abbreviate it to \d (many older libraries were hard-coded to the C locale, so this was not an issue for them). The following regular expression verifies the format of a credit card number:

    (\d{4}[- ]){3}\d{4}

Here the parentheses mark a sub-expression and {4} means "repeat exactly four times". This is an example of the extended regular expression syntax used by Perl, awk and egrep. Regex also supports the older "basic" syntax used by sed and grep, although it is rarely needed unless you have existing basic regular expressions to reuse.
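As a rough illustration of the three ways to write "a digit", the sketch below spells the card-number pattern with each of them and checks that they accept the same input. It assumes the separator class is meant to accept a hyphen or a space, and it uses std::regex from the C++11 standard library as a stand-in for boost::regex. The helper matches and the three constant names are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: does `text` match `pattern` in its entirety?
bool matches(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

// The same card-number pattern written with each digit notation.
// Backslashes are doubled because the C++ compiler consumes one level.
const std::string with_range = "([0-9]{4}[- ]){3}[0-9]{4}";
const std::string with_posix = "([[:digit:]]{4}[- ]){3}[[:digit:]]{4}";
const std::string with_perl  = "(\\d{4}[- ]){3}\\d{4}";
```

In the default C locale all three behave identically; the difference only shows up when the locale defines additional digit characters.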

Now let's put this expression into C++ code that verifies the format of a credit card number:

    #include <string>
    #include <boost/regex.hpp>

    bool validate_card_format(const std::string s)
    {
       static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
       return regex_match(s, e);
    }

Note the extra escape characters we had to add. An escape is seen once by the C++ compiler before the regular expression engine ever sees it, so every escape in a regular expression must be doubled (written twice) when the expression is embedded in C++ code. Also note that the examples assume your compiler supports Koenig lookup [2] (VC6, for example, does not); otherwise you need to add boost:: prefixes to some of the function calls.

Those familiar with credit card processing will realize that the format above suits human-readable card numbers but is not the format used by online credit card systems, which expect 16 (or possibly 15) digits with no separators. We need a simple way to convert between the two, and this is where search and replace comes in. It takes two strings: a regular expression, and a format string that describes the text to substitute for the match. In Regex, search and replace is performed by the regex_merge algorithm. The following two functions provide the format conversions:

    // match any format with the regular expression:
    const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
    const std::string machine_format("\\1\\2\\3\\4");
    const std::string human_format("\\1-\\2-\\3-\\4");

    std::string machine_readable_card_number(const std::string s)
    {
       return regex_merge(s, e, machine_format, boost::match_default | boost::format_sed);
    }

    std::string human_readable_card_number(const std::string s)
    {
       return regex_merge(s, e, human_format, boost::match_default | boost::format_sed);
    }
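The same two conversions can be sketched against the C++11 standard library, which is handy for experimenting when Boost is not installed. This is an approximation under stated assumptions, not the Boost code: std::regex_replace stands in for regex_merge, std::regex_constants::format_sed for boost::format_sed, and the anchors ^ and $ replace \A and \z, which the default std::regex grammar does not accept. The names card_pattern, to_machine_format and to_human_format are made up for illustration.

```cpp
#include <regex>
#include <string>

// 15 or 16 digits in four groups, each optionally followed by '-' or ' '.
// ^ and $ anchor the expression to the whole input.
static const std::regex card_pattern(
    "^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");

// Strips the separators: "1234-5678-9012-3456" becomes "1234567890123456".
std::string to_machine_format(const std::string& s)
{
    return std::regex_replace(s, card_pattern, "\\1\\2\\3\\4",
                              std::regex_constants::format_sed);
}

// Inserts hyphens between the four groups.
std::string to_human_format(const std::string& s)
{
    return std::regex_replace(s, card_pattern, "\\1-\\2-\\3-\\4",
                              std::regex_constants::format_sed);
}
```

Input that does not match the pattern is returned unchanged, so a caller that needs to reject bad numbers should validate first.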

Here the marked sub-expressions in the regular expression split the number into four fields, and the format string uses sed-like syntax to replace the matched text with the reformatted version.

In the examples above we have not manipulated the results of a match directly. In general, the result of a match contains the overall match plus a number of sub-expression matches. When the library needs to report a regular expression match, it does so with an instance of the class match_results. As before, there are typedefs for the most common cases:

    namespace boost {
    typedef match_results<const char*> cmatch;
    typedef match_results<const wchar_t*> wcmatch;
    typedef match_results<std::string::const_iterator> smatch;
    typedef match_results<std::wstring::const_iterator> wsmatch;
    }

The algorithms regex_search and regex_grep (which finds all matches in a string) use match_results to report what matched. Note that these algorithms are not limited to ordinary C strings: any bidirectional iterator type can be searched, which makes it possible to search almost any kind of data seamlessly.
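A short sketch of reading a match result follows. It again relies on the C++11 standard library as a stand-in: std::smatch plays the role of boost::smatch, and iterating with std::sregex_iterator is used in place of regex_grep to visit every match. The names card_re, card_groups and count_cards are hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Four groups of four digits, separated by a hyphen or a space.
static const std::regex card_re(
    "(\\d{4})[- ](\\d{4})[- ](\\d{4})[- ](\\d{4})");

// Returns the four digit groups of the first card number found in `text`,
// or an empty vector if there is none.
std::vector<std::string> card_groups(const std::string& text)
{
    std::vector<std::string> groups;
    std::smatch what;  // match_results over std::string::const_iterator
    if (std::regex_search(text, what, card_re)) {
        // what[0] is the overall match; what[1]..what[4] are the sub-expressions.
        for (std::size_t i = 1; i < what.size(); ++i)
            groups.push_back(what[i].str());
    }
    return groups;
}

// Counts every card number in `text` by walking all non-overlapping matches.
std::ptrdiff_t count_cards(const std::string& text)
{
    std::sregex_iterator first(text.begin(), text.end(), card_re);
    std::sregex_iterator last;  // default-constructed end-of-sequence iterator
    return std::distance(first, last);
}
```

Because the iterators are constructed from text.begin() and text.end(), the same approach should carry over to other bidirectional ranges of characters.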

For those who dislike templates there is the class RegEx, a high-level wrapper around the template code. It provides a simplified interface for people who do not need the full power of the library, and it supports only narrow characters and the "extended" regular expression syntax. Those who need POSIX compatibility can use the POSIX API functions regcomp, regexec, regfree and regerror, which are available in both narrow-character and Unicode versions.
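The call sequence of the POSIX functions can be sketched as below. The sketch includes the <regex.h> header supplied by a POSIX platform rather than Boost's own implementation, on the assumption that the compile, execute and free sequence is the same. The function name posix_validate_card is made up for illustration.

```cpp
#include <regex.h>   // POSIX header supplied by the platform, not by Boost
#include <string>

// Returns true if `s` is a card number in the human-readable format.
// POSIX extended syntax has no \d, so [[:digit:]] is used instead.
bool posix_validate_card(const std::string& s)
{
    const char* pattern = "^([[:digit:]]{4}[- ]){3}[[:digit:]]{4}$";
    regex_t re;
    // REG_EXTENDED selects the extended syntax; REG_NOSUB skips sub-match reporting.
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                      // the pattern failed to compile
    const int rc = regexec(&re, s.c_str(), 0, nullptr, 0);
    regfree(&re);                          // release the compiled expression
    return rc == 0;                        // 0 means a match was found
}
```

Compared with the class-based interface, the caller has to manage the lifetime of the compiled expression by hand, which is one reason the text above recommends the C++ classes.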

Finally, the library now supports run-time localization and recognizes the full POSIX regular expression syntax, including advanced features such as multi-character collating elements and equivalence classes. It also offers compatibility with other regular expression libraries, including the GNU and BSD4 regex packages.

Installation and configuration

First of all, when you extract this library from the ZIP file, you must preserve its internal directory structure. If you did not, delete the extracted files and extract them again. The library needs no configuration before use with most common compilers, standard libraries and platforms. If you run into a configuration problem, or want to test your compiler's configuration, refer to the configuration documentation (the procedure is the same as for all other Boost libraries).

Since this library mixes template code (in header files) with static code and data (in .cpp files), its support code must be built into library and archive files before you can use it. Platform-specific instructions exist for Borland C++ Builder and for Microsoft Visual C++ 6 and 7. The steps for Microsoft Visual C++ 6 and 7 are as follows.

If you use VC5, you may have to look for a previous version of this library. Open a command prompt in which the MSVC environment variables are defined (if they are not, run vcvars32.bat, found in the bin directory of the Visual C++ installation) and change to the libs/regex/build directory. Select the correct makefile: vc6.mak for plain VC6, or vc6-stlport.mak if you use STLport. Invoke it like this:

    nmake -fvc6.mak

To install the resulting files, with the .lib files going to /lib and the DLL files to /bin, use:

    nmake -fvc6.mak install

To delete all temporary files created during the build, use:

    nmake -fvc6.mak clean

After that, you only need to add the Boost root directory to your project's list of include directories. There is no need to add a .lib file to the project by hand, because the headers select the correct .lib file automatically.

Note: if you want to link statically to the Regex library, define BOOST_REGEX_STATIC_LINK (this only takes effect in release builds). If you want to add the source files directly to your project, define BOOST_REGEX_NO_LIB, which disables automatic library selection.

1. Introduction: http://www.boost.org/libs/regex/doc/introduction.html
2. Koenig lookup: when a function is called, in order to determine whether that function is visible in the current scope, the namespaces in which the function's parameter types reside must be taken into account.

When reposting, please cite the original address: https://www.9cbs.com/read-66449.html
