Using the Boost regular expression library with C++ Builder 6
Written: aw Ready
A regular expression is a form of pattern matching, usually used in programs that process text. The grep tool we often use and the Perl language, for example, both rely on regular expressions. Handling regular expressions in traditional C++ has been very troublesome, which has drawn plenty of ridicule from fans of other languages. Now the situation is different, because there is Boost.
Boost is a template-based open source library. It contains many sub-libraries that efficiently handle all kinds of problems, such as string splitting, formatting, and threading. Every C++ enthusiast should get to know Boost: if you are already skilled with the VCL and can use Boost as well, you will be like a tiger that has grown wings.
In general, Boost is very simple to use, not much different from using the STL. The regular expression library is not quite so easy, however, because it has to be compiled separately. Below I describe in detail how to use it.
If you do not have Boost yet, you can download the latest version from www.boost.org; the author is using version 1.30. Unpack the downloaded ZIP package [1] into any directory you like, such as D:/boost.
Compiling the regular expression library
As mentioned earlier, this library needs to be compiled separately. Why is it not simply compiled along with your program? Mainly because different compilers need different link library files, and because the link libraries are too big. On the command line, go to the [%Boost]/libs/regex/build directory and run make -fbcb6.mak to start the build. Note that if BCB5 is also installed on your computer, you must make sure PATH points to the directory containing the BCB6 BCC32.exe program; otherwise the Make program from BCB5 may be used, and although the build will succeed, the resulting libraries cannot be used.
The build takes a while, so be patient. When it finishes, a bcb6 directory will have been created under [%Boost]/libs/regex/build, containing many LIB files and DLL files. Copy all the DLL files to the Windows system directory and all the LIB files to the BCB6 lib directory. If you would rather not copy the files by hand, you can add the install parameter when building, like this: make -fbcb6.mak install. The author prefers the first approach, because it shows exactly which files were generated. The build is now complete, and you can experience the magic of Boost.
A test program
Create a console program in BCB6 and write the following code:
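#include <algorithm>
#include <deque>
#include <iostream>
#include <iterator>
#include <string>
#include <boost/regex.hpp>
// Note: the header names, the input string and the regex_split call were lost
// from the original listing; they are reconstructed here as assumptions.
int main()
{
    using namespace boost;
    using namespace std;
    // Match an href attribute and capture its quoted value, ignoring case.
    regex expression("\\s+href\\s*=\\s*\"([^\"]*)\"", regbase::normal | regbase::icase);
    deque<string> result;
    // Assumed sample input, chosen to produce the output shown below.
    string text = "<a href=\"index.html\">home</a>";
    // regex_split stores each captured sub-expression in result.
    regex_split(back_inserter(result), text, expression);
    copy(result.begin(), result.end(), ostream_iterator<string>(cout, "\n"));
    // Wait for input so that the console window stays open.
    int c;
    cin >> c;
    return 0;
}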
Set the Lib path and Include path in the BCB6 project options to the directory where Boost is installed. Run the program and you will see the result:
index.html
You can see that index.html has been extracted from the string. Why is that?
The core part of the code is:
regex expression("\\s+href\\s*=\\s*\"([^\"]*)\"", regbase::normal | regbase::icase);
It specifies how the string is to be matched. The jumble above is very hard to understand: if you do not know the rules for writing regular expressions, it may as well be written in hieroglyphics.
regbase::normal | regbase::icase sets the parsing options (icase makes the match case-insensitive); for details, refer to the Boost help documentation.
Rules for writing regular expressions
For the full rules, see the Boost documentation; here is a brief summary:
. (dot)
Matches any single character, except a newline.
*
Closure: the preceding item is repeated any finite number of times, including zero.
+
The preceding item is repeated a finite number of times, but at least once.
{}
Specifies the allowed number of repetitions.
For example:
ba* matches b, ba, baa, baaa, and so on.
ba+ matches ba, baa, baaaaaaaa, and so on.
ba{1,5} matches ba, baa, baaa, baaaa and baaaaa.
\
Escape character. It has many uses, which vary with the parameter settings; the most common is similar to the use of \ in the C language.
\s
Matches a whitespace character.
\w
Matches a word character (a letter, digit or underscore).
\d
Matches a digit.
()
Has two uses:
1. Grouping: for example, (ab)* matches ab, abab, ababab, and so on.
2. Capturing: the characters matched inside () are what is finally extracted.
With the table above, the jumble shown earlier becomes easy to read.
An actual example
A while ago there was a post on 9CBS asking about a file with a structure similar to this:
@People {
AGE = 19
Speek = "hay, {name}, how are you"
}
The question was how to split the text to obtain the name after @, the attribute name and attribute value on either side of =, and the name inside {} within the quotation marks.
Regular expressions are a good fit for this problem.
From the analysis, we can construct the matching rules like this:
"@(.*?)\\s*\\{" matches the name that starts with @. How to construct the matching rules for the other two cases is left for the reader to think about.
In this way the example can easily be taken apart.
Performance analysis
The discussion above has shown how powerful Boost is, but what about its performance? To find out, we ran it against a complex piece of HTML code to see how much time it takes.
To save space, the HTML code is not listed here, but I can tell you that it is a 186K HTML file generated by Word, so I will use it here. A width property of all