Using the Boost Regular Expression Library with C++ Builder 6

zhaozj 2021-02-16

Written: aw Ready

Regular expressions are a form of pattern matching commonly used in text-processing programs. The grep tool we often use and the Perl language, for example, both rely on regular expressions. Handling regular expressions in traditional C++ was very troublesome, which drew plenty of ridicule from fans of other languages. Now the situation is different, because there is Boost.

Boost is a template-based open source library. It contains many sub-libraries that efficiently handle all kinds of problems, such as string splitting, formatting, and threading. Boost is something every C++ enthusiast should know. If you are already skilled with the VCL and can use Boost as well, you will be like a tiger that has grown wings.

In general, using Boost is very simple and differs little from using the STL. The Boost regular expression library is not quite so easy, however, because it has to be compiled separately. Below I explain in detail how to use it.

If you do not have Boost yet, you can download the latest version from www.boost.org; the author is using version 1.30. Extract the downloaded ZIP package [1] to any directory you like, such as D:\boost.

Compiling the regular expression library

As mentioned earlier, this library needs to be compiled separately. Why is it not compiled along with everything else? Mainly because different compilers require different link library files, and the link libraries are too large. On the command line, change to the [%boost]\libs\regex\build directory and run make -fbcb6.mak to start compiling. Note that if BCB5 is also installed on your computer, you must make sure PATH is set to the directory containing the BCB6 bcc32.exe. Otherwise the make program from BCB5 may be used, and although the build will succeed, the result cannot be used.

The compilation takes quite a while, so wait patiently. When it finishes, a bcb6 directory is created under [%boost]\libs\regex\build, containing many lib files and DLL files. Copy all the DLL files to the Windows system directory and all the lib files to the BCB6 lib directory. If you would rather not copy the files by hand, add the install target when compiling, like this: make -fbcb6.mak install. The author prefers the first approach, because it shows which files were generated. With the compilation complete, you can now experience the magic of Boost.

A test program

Create a console program in BCB6 and enter the following code:

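#include <algorithm>
#include <deque>
#include <iostream>
#include <iterator>
#include <string>
#include <boost/regex.hpp>

int main()
{
    using namespace boost;
    using namespace std;

    // Match an href attribute and capture the quoted value.
    regex expression("\\s+href\\s*=\\s*\"([^\"]*)\"", regbase::normal | regbase::icase);

    // Assumed sample input: the original literal is empty in this copy of the
    // article, so this string was chosen to agree with the output shown below.
    string s = "<a href=\"index.html\">";

    // regex_split appends the captured sub-expression of every match.
    deque<string> result;
    regex_split(std::back_inserter(result), s, expression);
    copy(result.begin(), result.end(), ostream_iterator<string>(cout, "\n"));

    // Wait for input so the console window stays open.
    int c;
    cin >> c;
    return 0;
}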

In the BCB6 project options, set the library path and the include path to your Boost installation directory. Run the program and you will see the result:

index.html

You can see that index.html has been extracted from the string. So how does this work?

The core part of the code is:

regex expression("\\s+href\\s*=\\s*\"([^\"]*)\"", regbase::normal | regbase::icase);

It defines how the string is to be matched. The jumble above is very hard to read: if you do not know the rules for writing regular expressions, it might as well be hieroglyphics.

regbase::normal | regbase::icase sets the parsing options; see the Boost documentation for details.

Writing rules for regular expressions

For the detailed rules, see Boost's documentation. Here is a brief description:

. (dot)

Matches any single character except a newline

*

Closure: the preceding item repeated any finite number of times, including zero

+

The preceding item repeated a finite number of times, but at least once

{}

Specifies the allowed number of repetitions

For example:

ba* matches b, ba, baa, baaa, and so on

ba+ matches ba, baa, baaaaaaaa, and so on

ba{1,5} matches ba, baa, baaa, baaaa, baaaaa

\

Escape character. It has many uses that vary with the option settings; the most common is similar to the use of \ in the C language

\s

Matches whitespace

\w

Matches a word character

\d

Matches a digit

()

There are two uses:

1. Grouping: for example, (ab)* matches ab, abab, ababab, and so on

2. Capturing: the characters matched inside () are extracted in the final result

With the table above, we can easily work out what the hieroglyphics shown earlier mean.

An actual example

A while ago there was a post on 9CBS about a file with a structure similar to this:

@People {

AGE = 19

Speek = "hay, {name}, how are you"

}

The question was how to split the string to obtain the name after the @, the attribute names and attribute values on either side of each =, and the name inside the {} within the quotation marks.

Regular expressions are exactly the right tool for this problem.

Based on this analysis, we can construct the matching rules as follows:

"@(.*?)\\s*\\{" matches the name that follows the @ at the start. How to construct the matching rules for the other two kinds of item is left for the reader to think about.

In this way we can easily take this example apart.

Performance analysis

From the discussion above, you have seen how powerful Boost is. But what about its performance? To find out, we parsed a complex piece of HTML to see how much time it takes.

To save space the HTML itself is not listed here; it is a 186K HTML file generated by Word. The test extracts the width attribute of the tags in this file. The test code is as follows:

#include <vcl.h>            // TStringList; in a BCB6 project this also brings in the Windows API
#include <algorithm>
#include <deque>
#include <iostream>
#include <iterator>
#include <string>
#include <boost/regex.hpp>

int main()
{
    using namespace boost;
    using namespace std;

    // Load the file line by line.
    TStringList *html = new TStringList();
    html->LoadFromFile("D:\\1.htm");

    regex expression("\\s+width=([^\"]*)\\s", regbase::normal | regbase::icase);

    DWORD start = GetTickCount();
    for (int n = 0; n < html->Count; n++)
    {
        string s = html->Strings[n].c_str();
        deque<string> result;
        regex_split(std::back_inserter(result), s, expression);
        copy(result.begin(), result.end(), ostream_iterator<string>(cout, "\n"));
        result.clear();
    }
    // Elapsed time in milliseconds.
    start = GetTickCount() - start;
    delete html;
    cout << start;

    // Wait for input so the console window stays open.
    int c;
    cin >> c;
    return 0;
}

The program reports 671 milliseconds, and the split yields 1072 width attribute values. We can see that Boost is quite efficient. It is still slower than some interpreted scripting languages, but it already meets most programming needs. In addition, the author's computer is not a high-end configuration, so any mainstream machine should do better than the author's result.

Conclusion

In fact, the power shown above is only the tip of the Boost iceberg. Unless you try it yourself, you cannot imagine how powerful Boost is. Boost contains many other libraries, for formatted output, string splitting, type conversion, and so on. These are more convenient to use, and you can refer to the Boost documentation for them. Two further libraries must be compiled separately: the Python and Thread libraries. They need a special tool, Jam, so Jam has to be built before they can be compiled. Building Jam is not a pleasant task, and further trouble arises if you have several compilers installed. Interested readers can try it for themselves.

However, BCB6 does not support all of the Boost libraries. According to Boost's compiler support table [2], quite a few libraries are still unsupported on BCB6. The best-supported compiler is GCC/G++, but even it does not support everything. I hope the next C++ compiler that Borland releases will support more of the C++ standard.

[1] Other package types are available, but on Windows you are better off downloading the zip package.
