How to Use Regular Expressions in C
Author: Xiao Wenpeng
If you are familiar with sed, awk, grep, or vi on Linux, the concept of regular expressions will not be new to you. Because they greatly reduce the complexity of string handling, regular expressions are now used in many Linux utilities. They are not the exclusive property of scripting languages such as Perl, Python, and Bash: C programmers can use them in their own programs as well. Standard C has no regular expression support, and standard C++ had none until C++11 introduced <regex>, but several libraries fill the gap. The best known is Philip Hazel's Perl-Compatible Regular Expressions (PCRE) library, which ships with many Linux distributions. The functions described in this article, regcomp(), regexec(), regerror(), and regfree(), form the POSIX regular expression interface declared in <regex.h>.

Compiling regular expressions

For efficiency, a regular expression must be compiled before any string is compared against it. The regcomp() function performs the compilation and stores the result in a regex_t structure:
int regcomp(regex_t *preg, const char *regex, int cflags);
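As a small illustration of this prototype, the sketch below compiles a pattern and reports only whether compilation succeeded. The helper name pattern_compiles is hypothetical and exists only for this example; the flag values a caller passes (such as REG_EXTENDED or REG_NOSUB) are standard POSIX cflags, but which ones suit a given program is an assumption left to the caller.

```c
#include <regex.h>

/* Hypothetical helper (not part of any library):
 * returns 1 if `pattern` compiles with the given cflags, 0 otherwise. */
static int pattern_compiles(const char *pattern, int cflags)
{
    regex_t re;
    int rc = regcomp(&re, pattern, cflags);

    if (rc != 0) {
        /* Compilation failed; the contents of `re` are undefined,
         * so it must not be passed to regfree(). */
        return 0;
    }

    /* Compilation succeeded; release the compiled pattern again. */
    regfree(&re);
    return 1;
}
```

A call such as pattern_compiles("regex[a-z]*", 0) uses basic syntax, while passing REG_EXTENDED selects extended syntax, in which an unescaped parenthesis opens a group.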
The parameter regex is the string containing the regular expression to compile. The parameter preg points to a regex_t structure that receives the compiled result. The parameter cflags controls how the expression is interpreted; for example, REG_EXTENDED selects extended rather than basic syntax, and REG_ICASE makes matching case-insensitive. If regcomp() succeeds, it fills in preg and returns 0; any other return value indicates an error.

Matching regular expressions

Once a regular expression has been compiled successfully with regcomp(), it can be matched against a string by calling regexec():
int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags);

typedef struct {
    regoff_t rm_so;
    regoff_t rm_eo;
} regmatch_t;
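The rm_so and rm_eo fields are byte offsets into the searched string: rm_so is where the matched text starts and rm_eo is the offset of the first character after it. A minimal sketch of reading them follows. The helper name first_group is hypothetical, and compiling with REG_EXTENDED is an assumption made so that plain parentheses form a subexpression. Entry 0 of the array describes the whole match and entry 1 the first parenthesized subexpression.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copy the text matched by the first parenthesized
 * subexpression of `pattern` into `out`.
 * Returns 0 on success, -1 on a compile error, no match, no such
 * subexpression, or an output buffer that is too small. */
static int first_group(const char *pattern, const char *text,
                       char *out, size_t outsz)
{
    regex_t re;
    regmatch_t pm[2];   /* pm[0]: whole match, pm[1]: first subexpression */
    int result = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* rm_so is -1 for a subexpression that did not take part in the match */
    if (regexec(&re, text, 2, pm, 0) == 0 && pm[1].rm_so != -1) {
        size_t len = (size_t)(pm[1].rm_eo - pm[1].rm_so);
        if (len < outsz) {
            memcpy(out, text + pm[1].rm_so, len);
            out[len] = '\0';
            result = 0;
        }
    }

    regfree(&re);
    return result;
}
```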
The parameter preg points to the compiled regular expression, and string is the string to search. The parameters nmatch and pmatch return the match positions to the caller, and the last parameter, eflags, adjusts details of the matching. A single call reports the first match in string, together with the parts of that match that correspond to each parenthesized subexpression of the pattern. The pmatch array holds these positions, and nmatch tells regexec() how many entries of pmatch it may fill in. When regexec() returns successfully, the characters from string + pmatch[0].rm_so up to, but not including, string + pmatch[0].rm_eo are the text matched by the whole expression; those from string + pmatch[1].rm_so to string + pmatch[1].rm_eo are the text matched by the first subexpression, and so on. Unused entries have rm_so set to -1.

Releasing regular expressions

When a compiled regular expression is no longer needed, call regfree() to release it and avoid a memory leak:
void regfree(regex_t *preg);
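Putting compilation, matching, and release together, the sketch below counts every non-overlapping match in a string by calling regexec() again just past the end of the previous match, then frees the pattern. The helper name count_matches is hypothetical. Passing REG_NOTBOL on the later calls is a design choice made here so that '^' is not treated as matching at each restart position.

```c
#include <regex.h>

/* Hypothetical helper: count non-overlapping matches of `pattern`
 * (extended syntax) in `text`. Returns -1 if the pattern fails to compile. */
static int count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t pm[1];
    const char *p = text;
    int count = 0;
    int eflags = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, pm, eflags) == 0) {
        ++count;
        if (pm[0].rm_eo == pm[0].rm_so) {
            /* Empty match: step one character forward so the loop ends */
            if (p[pm[0].rm_eo] == '\0')
                break;
            p += pm[0].rm_eo + 1;
        } else {
            /* Offsets are relative to p, so resume right after this match */
            p += pm[0].rm_eo;
        }
        eflags = REG_NOTBOL;  /* p no longer points at the start of the line */
    }

    regfree(&re);  /* one regfree() for the one successful regcomp() */
    return count;
}
```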
regfree() returns nothing. Its only argument is a pointer to a regex_t that was filled in by regcomp(). If a program calls regcomp() several times on the same regex_t structure, the POSIX standard does not specify whether regfree() must be called between the calls. It is nevertheless advisable to call regfree() once after every successful regcomp(), so that the memory is released as early as possible.

Reporting error information

If regcomp() or regexec() returns a nonzero value, something went wrong while processing the regular expression (for regexec(), the value REG_NOMATCH simply means that the string did not match). A detailed error message can be obtained by calling regerror():

size_t regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size);
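A sketch of one way regerror() might be called follows. When the buffer size is 0, regerror() ignores the buffer argument and returns the size needed to hold the whole message, including the terminating null byte, which allows allocating exactly enough space before a second call. The helper name regex_error_message is hypothetical.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a heap-allocated error message for `errcode`,
 * or NULL if allocation fails. The caller is responsible for free(). */
static char *regex_error_message(int errcode, const regex_t *preg)
{
    /* First call: size 0, so only the required buffer size is returned */
    size_t need = regerror(errcode, preg, NULL, 0);
    char *msg = malloc(need);

    /* Second call: fill the buffer with the full message */
    if (msg != NULL)
        regerror(errcode, preg, msg, need);
    return msg;
}
```

A fixed-size buffer, as in the complete program later in this article, is simpler; the two-call form only matters when a truncated message is unacceptable.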
The parameter errcode is the error code returned by regcomp() or regexec(), and preg is the compiled result from regcomp(), which gives regerror() the context it needs to format the message. regerror() writes at most errbuf_size bytes of the formatted message, including the terminating null byte, into the buffer errbuf, and returns the buffer size needed to hold the complete message.

Applying regular expressions

Finally, here is a complete example showing how to use regular expressions in a C program.
#include <stdio.h>
#include <string.h>
#include <regex.h>

/* Return the substring str[start..end) in a static buffer */
static char *substr(const char *str, unsigned start, unsigned end)
{
    unsigned n = end - start;
    static char stbuf[256];
    strncpy(stbuf, str + start, n);
    stbuf[n] = 0;
    return stbuf;
}

/* Main program */
int main(int argc, char **argv)
{
    char *pattern;
    int x, z, lno = 0, cflags = 0;
    char ebuf[128], lbuf[256];
    regex_t reg;
    regmatch_t pm[10];
    const size_t nmatch = 10;

    /* The pattern is required on the command line */
    if (argc < 2) {
        fprintf(stderr, "usage: %s pattern\n", argv[0]);
        return 1;
    }

    /* Compile the regular expression */
    pattern = argv[1];
    z = regcomp(&reg, pattern, cflags);
    if (z != 0) {
        regerror(z, &reg, ebuf, sizeof(ebuf));
        fprintf(stderr, "%s: pattern '%s'\n", ebuf, pattern);
        return 1;
    }

    /* Process the input line by line */
    while (fgets(lbuf, sizeof(lbuf), stdin)) {
        ++lno;
        /* Strip the trailing newline, if any */
        if ((z = (int)strlen(lbuf)) > 0 && lbuf[z - 1] == '\n')
            lbuf[z - 1] = 0;

        /* Match the regular expression against the line */
        z = regexec(&reg, lbuf, nmatch, pm, 0);
        if (z == REG_NOMATCH)
            continue;
        else if (z != 0) {
            regerror(z, &reg, ebuf, sizeof(ebuf));
            fprintf(stderr, "%s: regexec('%s')\n", ebuf, lbuf);
            return 2;
        }

        /* Print the line once, then the whole match ($0) and each subexpression */
        for (x = 0; x < (int)nmatch && pm[x].rm_so != -1; ++x) {
            if (!x)
                printf("%04d: %s\n", lno, lbuf);
            printf("$%d='%s'\n", x, substr(lbuf, pm[x].rm_so, pm[x].rm_eo));
        }
    }

    /* Release the regular expression */
    regfree(&reg);
    return 0;
}

The program takes a regular expression from the command line, applies it to each line read from standard input, and prints the matches. Compile and run it with the following commands. Feeding the program its own source file produces output of this form (the exact line numbers depend on how the file is laid out):

# gcc regexp.c -o regexp
# ./regexp 'regex[a-z]*' < regexp.c
0003: #include <regex.h>
$0='regex'
0027: regex_t reg;
$0='regex'
0054: z = regexec(&reg, lbuf, nmatch, pm, 0);
$0='regexec'

Summary

For programs that need complex data processing, regular expressions are undoubtedly a very useful tool. This article has shown how to use regular expressions in C to simplify string processing and obtain flexibility in data handling similar to that of Perl.