There is an HTML file containing many links with relative addresses, such as /image/logo.gif, which need to be changed to absolute addresses, such as http://www.codeproject.com/image/logo.gif. Because the source file is fairly large, the solution has to be fast and use few resources.
My first reaction was to use regular expressions. In my view, compared with other approaches such as processing the string directly or using XmlReader/XmlWriter, regular expressions offer the best speed and resource usage. After careful consideration, I wrote the following code:
    using System;
    using System.IO;
    using System.Text.RegularExpressions;

    class EvalDemo
    {
        // Callback invoked once per match: replaces the slash in the matched
        // text (e.g. href="/) with the absolute site prefix.
        public static string GetResult(Match m)
        {
            return Regex.Replace(m.Value, @"[/]", @"http://www.codeproject.com/");
        }
    }

    class Demo
    {
        public static void Main()
        {
            // Read the whole HTML source into a string.
            string source;
            using (StreamReader reader = File.OpenText("Source.txt"))
            {
                source = reader.ReadToEnd();
            }

            // href, optional whitespace, '=', optional whitespace,
            // an optional " or ', then the leading '/'.
            string reg = @"href\s*=\s*[""']?/";
            string result = Regex.Replace(source, reg, new MatchEvaluator(EvalDemo.GetResult));
            Console.WriteLine(result);
        }
    }
Here, Source.txt, placed in the same directory as the program, contains the HTML source that needs to be modified. Of course, you can also keep it in another directory, or obtain the content in a different way, for example by reading the stream of an HTTP response.
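Because the rewriting only needs the HTML as a string, one way to support different sources is to accept any TextReader. The sketch below assumes that design; the class and method names (ReaderSource, RewriteFrom) are hypothetical. A file works via File.OpenText, an in-memory string via StringReader, and a network response stream can be wrapped in a StreamReader.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

class ReaderSource
{
    // Hypothetical helper: reads everything from any TextReader and applies
    // the same href rewrite as the article's code.
    public static string RewriteFrom(TextReader reader)
    {
        string source = reader.ReadToEnd();
        return Regex.Replace(source, @"href\s*=\s*[""']?/",
            delegate (Match m)
            {
                // Same callback logic as EvalDemo.GetResult, written inline.
                return Regex.Replace(m.Value, "[/]", "http://www.codeproject.com/");
            });
    }

    static void Main()
    {
        // A StringReader stands in for a file or an HTTP response stream.
        using (StringReader reader = new StringReader("<a href=\"/a.htm\">A</a>"))
        {
            Console.WriteLine(RewriteFrom(reader));
        }
    }
}
```

The caller stays responsible for opening and disposing the reader, so the same method serves files, strings, and network streams alike.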
After opening the file, I call StreamReader.ReadToEnd to read the whole stream into a string, and then process that string with a regular expression. You might ask: why not simply use String.Replace to turn every href="/ into href="http://www.codeproject.com/? That works, but it cannot handle HTML written as href=/ (without quotes) or href='/ (with single quotes). In addition, HTML allows whitespace around the =, so if someone writes href = /, you must account for that case too (even if it is rare). Your String.Replace logic would quickly become complicated, whereas a regular expression handles all of this simply. Consider this line of code:
    string reg = @"href\s*=\s*[""']?/";
If you have not worked with regular expressions before, this may look complicated. But once I explain what each part means, you will see there is nothing mysterious about it:
Character   Meaning
\s          Matches one whitespace character, such as a half-width space, a full-width space, or a tab.
*           The preceding item may occur zero or more times, so \s* allows no whitespace or any amount of it.
[ ]         Matches exactly one of the characters listed inside the brackets.
[""']       Matches one of two characters: a double quote or a single quote. A double quote is special inside a C# verbatim string (@"..."), so a single double quote is written as two ("").
?           The preceding item (here [""']) may occur zero times or once, so the quote is optional. Like *, it is a quantifier, but it allows at most one occurrence.
/           Matches the literal slash that begins the relative address.
With this explanation, isn't the regular expression simple after all? It just looks for text like href="/, allowing optional whitespace and an optional quote. OK, let's move on to the replacement call.
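To see exactly which spellings the pattern accepts, and what text each match contains, a small check like the following can help. The sample inputs and the names PatternCheck and FirstMatch are hypothetical; the pattern is the one explained above. Note that it is case-sensitive unless you pass RegexOptions.IgnoreCase.

```csharp
using System;
using System.Text.RegularExpressions;

class PatternCheck
{
    // The pattern explained above, compiled once for reuse (case-sensitive).
    public static readonly Regex Href = new Regex(@"href\s*=\s*[""']?/");

    // Returns the matched text, or null when the input contains no match.
    public static string FirstMatch(string input)
    {
        Match m = Href.Match(input);
        return m.Success ? m.Value : null;
    }

    static void Main()
    {
        string[] samples =
        {
            "<a href=\"/a.htm\">",  // double quote
            "<a href='/a.htm'>",    // single quote
            "<a href = /a.htm>",    // no quote, spaces around '='
            "<a href=\"a.htm\">",   // no leading slash: not matched
            "<a HREF=\"/a.htm\">"   // upper case: not matched without IgnoreCase
        };
        foreach (string s in samples)
        {
            string hit = FirstMatch(s);
            Console.WriteLine("{0,-22} -> {1}", s, hit ?? "(no match)");
        }
    }
}
```

The matched text differs from sample to sample, which is why the replacement in the next step cannot simply be a fixed string.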
    string result = Regex.Replace(source, reg, new MatchEvaluator(EvalDemo.GetResult));
This is the standard form of a regular-expression replacement. The first two arguments are straightforward: the input string and the pattern. The third is special: an instance of the MatchEvaluator delegate, wrapping the callback method EvalDemo.GetResult. Why this design? Usually a regex replacement needs nothing so elaborate; you just supply the input, the pattern, and the replacement text. In this case, however, you cannot know in advance exactly what text will be matched: it may be href="/, or href='/, or even href  =   / with arbitrary spaces. So a callback function processes each matched string, whatever form it takes, and returns its replacement. For example, href="/ becomes href="http://www.codeproject.com/, and href='/ becomes href='http://www.codeproject.com/. With that understood, take a look at this simple but interesting callback method:
    class EvalDemo
    {
        public static string GetResult(Match m)
        {
            return Regex.Replace(m.Value, @"[/]", @"http://www.codeproject.com/");
        }
    }
Here the callback is a static method of the custom EvalDemo class. It receives a Match instance, whose Value property holds the matched text, and returns the string that should replace that text. In GetResult I perform a basic regex replacement that finds the / (slash) character in the matched text, replaces it with http://www.codeproject.com/, and returns the result. This gives the Replace call in Main the third argument it needs, and the replacement can proceed.
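The callback does not have to be static. A MatchEvaluator can also wrap an instance method, which lets the site prefix be passed in rather than hard-coded. The sketch below assumes that variation; LinkRewriter, its constructor parameter, and the example.com prefix are all hypothetical. Because the matched text contains exactly one slash (neither "href", whitespace, nor a quote can contain one), a plain String.Replace does the same job as the Regex.Replace in GetResult, and it also avoids any special meaning of "$" in a regex replacement string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical variant of EvalDemo: the prefix is a constructor argument
// and the callback is an instance method.
class LinkRewriter
{
    private readonly string baseUrl;

    public LinkRewriter(string baseUrl)
    {
        this.baseUrl = baseUrl;
    }

    // Same job as EvalDemo.GetResult: swap the single '/' in the match for the prefix.
    public string GetResult(Match m)
    {
        return m.Value.Replace("/", baseUrl);
    }

    public string Rewrite(string html)
    {
        // The delegate is bound to this instance, so GetResult can read baseUrl.
        return Regex.Replace(html, @"href\s*=\s*[""']?/", new MatchEvaluator(GetResult));
    }

    static void Main()
    {
        LinkRewriter rewriter = new LinkRewriter("http://www.example.com/");
        Console.WriteLine(rewriter.Rewrite("<a href='/y.htm'>y</a>"));
    }
}
```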
Finally, for demonstration purposes, I print the final result (the modified HTML) to the console. You can print it, or save it to a file instead.
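If you prefer to save the result, File.WriteAllText can replace the Console.WriteLine call. The sketch below is one way to do it; the class name SaveDemo and the output file name Result.htm are hypothetical, and File.ReadAllText stands in for the StreamReader code above (both read the whole file into memory).

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

class SaveDemo
{
    // Reads inputPath, rewrites root-relative hrefs, and writes the result to outputPath.
    public static void RewriteFile(string inputPath, string outputPath)
    {
        string source = File.ReadAllText(inputPath);
        string result = Regex.Replace(source, @"href\s*=\s*[""']?/",
            delegate (Match m)
            {
                return Regex.Replace(m.Value, "[/]", "http://www.codeproject.com/");
            });
        File.WriteAllText(outputPath, result);
    }

    static void Main()
    {
        // "Result.htm" is a hypothetical output name; skip quietly if the input is absent.
        if (File.Exists("Source.txt"))
        {
            RewriteFile("Source.txt", "Result.htm");
        }
    }
}
```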
OK, that concludes this walkthrough of a fairly simple regular-expression application. I hope this article helps you get started with regular expressions. If you have any comments or suggestions about it, you are welcome to share them!