Regular expression

zhaozj2021-02-12  171

The first part: ----------------

Regular expressions (REs) are often mistakenly regarded as a mysterious language that only a few people understand. On the surface they do look messy: if you don't know the syntax, a pattern is just a jumble of characters. In fact, regular expressions are simple and easy to understand. After reading this article, you will know their general syntax.

Regular expressions, now supported on a wide variety of platforms, were first proposed by the mathematician Stephen Kleene in 1956, building on a growing body of research into natural languages. They began as a complete syntax for describing character patterns and were later applied to information technology. Since then, regular expressions have gone through several periods of development; the current standard has been approved by ISO (the International Organization for Standardization) and is defined by The Open Group.

Regular expressions are not a dedicated language, but they can be used to find and replace text in a file or string. There are two standards: basic regular expressions (BRE) and extended regular expressions (ERE). ERE includes the functionality of BRE plus additional concepts.

Regular expressions are used in many programs, including xsh, egrep, sed, vi, and other programs on UNIX platforms. They have also been adopted by many languages, such as HTML and XML, which usually implement only a subset of the full standard.

Regular expressions are more tightly woven into programming languages than you might imagine, and as they have been ported to cross-platform languages the feature has become increasingly complete and useful. Web search engines use them, and so do e-mail programs. Even if you are not a UNIX programmer, you can use regular expressions to simplify your programs and shorten your development time.
Regular expressions 101

Much of the syntax of regular expressions will look familiar even if you have never studied them. Wildcards, for example, are one structural type of RE: the repetition operator. Let's look at the most common basic syntax types of the ERE standard. To give examples suited to specific purposes, I will use several different programs.

Part II: ----------------------

Character matching

The key to a regular expression is deciding what you want to match; without a clear idea of that, REs are useless. Each expression contains instructions describing what to find, as shown in Table A.

Table A: Character-matching regular expressions

Operator  Explanation                                              Example                          Result
--------  -------------------------------------------------------  -------------------------------  ------------------------------------------------------
.         Match any one character                                  grep ".ord" sample.txt           Matches "ford", "lord", "2ord", etc. in sample.txt
[ ]       Match any one character listed between the brackets      grep "[cng]ord" sample.txt       Matches only "cord", "nord", and "gord"
[^ ]      Match any one character not listed between the brackets  grep "[^cn]ord" sample.txt       Matches "lord", "2ord", etc., but not "cord" or "nord"
                                                                   grep "[a-zA-Z]ord" sample.txt    Matches "aord", "bord", "Aord", "Bord", etc.
                                                                   grep "[^0-9]ord" sample.txt      Matches "Aord", "aord", etc., but not "2ord", etc.

Repetition operators

Repetition operators, or quantifiers, describe how many times to look for a particular element. They are often combined with character-matching syntax to find runs of several characters; see Table B.

Table B: Regular expression repetition operators

Operator  Explanation                                        Example                             Result
--------  -------------------------------------------------  ----------------------------------  -----------------------------------------------------------------
?         Match the preceding element zero or one time       egrep ".?erd" sample.txt            Matches "berd", "herd", etc., and "erd"
*         Match the preceding element zero or more times     egrep "n.*rd" sample.txt            Matches "nerd", "nrd", "neard", etc.
+         Match the preceding element one or more times      egrep "[n]+erd" sample.txt          Matches "nerd", "nnerd", etc., but not "erd"
{n}       Match the preceding element exactly n times        egrep "[a-z]{2}erd" sample.txt      Matches "cherd", "blerd", etc., but not "nerd", "erd", "buzzerd", etc.
{n,}      Match the preceding element at least n times       egrep ".{2,}erd" sample.txt         Matches "cherd" and "buzzerd", but not "nerd"
{n,m}     Match the preceding element between n and m times  egrep "n[e]{1,2}rd" sample.txt      Matches "nerd" and "neerd"

The results describe whole words that fit each pattern. Because egrep prints any line that contains a match, add the -w option if you want to restrict matches to whole words.

Part III: ----------------

Anchors

An anchor specifies where in the text a pattern must match, as shown in Table C. Anchors make it easy to find common combinations of characters.

For the examples, I use the vi line-editor command :s, which stands for substitute. The basic syntax of this command is:

    s/pattern_to_match/pattern_to_substitute/

Table C: Regular expression anchors

Operator  Explanation                               Example                       Result
--------  ----------------------------------------  ----------------------------  ----------------------------------------------
^         Match at the beginning of a line          s/^/blah/                     Inserts "blah" at the beginning of the line
$         Match at the end of a line                s/$/blah/                     Inserts "blah" at the end of the line
\>        Match at the end of a word                s/\>/blah/                    Inserts "blah" at the end of the word
                                                    egrep "blah\>" sample.txt     Matches "soupblah", etc.
\b        Match at the beginning or end of a word   egrep "\bblah" sample.txt     Matches "blahcake"
                                                    egrep "blah\b" sample.txt     Matches "countblah"
\B        Match in the middle of a word             egrep "\Bblah" sample.txt     Matches "sublahper", etc.

Alternation

Another RE feature is the alternation (or interval) symbol. It is equivalent to an OR statement and is written with the | symbol. The following statement returns the lines containing "nerd" and "merd" in the file sample.txt:

    egrep "(n|m)erd" sample.txt

Alternation is very powerful, especially when you are looking for different spellings of a word, but in this example you can get the same result with:

    egrep "[nm]erd" sample.txt

Alternation shows its real usefulness when you combine it with the more advanced features of REs.

Part IV: ----------------

Reserved characters

The last of the most important RE features is reserved characters (also known as special characters). For example, suppose you want to find the literal strings "ne*rd" and "ni*rd". The pattern "n[ei]*rd" matches "neeeeerd" and "nieieierd", which is not what you are looking for. Because '*' (asterisk) is a reserved character, you must escape it with a backslash: "n[ei]\*rd". The other reserved characters are:

    ^  (caret)
    .  (period)
    [  (left bracket)
    $  (dollar sign)
    (  (left parenthesis)
    )  (right parenthesis)
    |  (pipe)
    *  (asterisk)
    +  (plus symbol)
    ?  (question mark)
    {  (left curly bracket, or left brace)
    \  (backslash)

Once you start including these characters in your searches, REs undoubtedly become hard to read. For example, the following PHP eregi call, taken from search engine code, is difficult to read.

    eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$", $sendto)

As you can see, the intent of the code is hard to grasp. But if you skip over the reserved characters, you will often misread what the code means.

Summary

In this article, we lifted the veil of mystery from regular expressions and listed the general syntax of the ERE standard. For the full description of the rules, see The Open Group's specification: Regular Expressions. You are welcome to post your questions or comments in the discussion area.

Another article
----------------------------------------
Regular expressions and the Java programming language
----------------------------------------

Classes and methods

The following classes match character sequences against patterns specified by regular expressions.

The Pattern class

An instance of the Pattern class represents a regular expression specified in string form, in a syntax similar to the one used by Perl. A regular expression specified as a string must first be compiled into an instance of the Pattern class. The resulting pattern is then used to create a Matcher object, which matches arbitrary character sequences against the regular expression. Many matchers can share one pattern, because a pattern is immutable and holds no match state.

The compile method compiles a given regular expression into a pattern, and the matcher method then creates a matcher that will match the given input against that pattern. The pattern method returns the regular expression from which the pattern was compiled.

The split method is a convenience method that splits a given input sequence around the matches of the pattern. The following example demonstrates it:

    /*
     * Uses split to break up an input string that is
     * separated by commas and/or whitespace.
     */
    import java.util.regex.*;

    public class Splitter {
        public static void main(String[] args) throws Exception {
            // Create a pattern to match breaks
            Pattern p = Pattern.compile("[,\\s]+");
            // Split input with the pattern
            String[] result = p.split("one,two, three   four ,  five");
            // Print each piece on its own line
            for (int i = 0; i < result.length; i++)
                System.out.println(result[i]);
        }
    }
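The Pattern and Matcher workflow described above can be sketched in a few lines. This is a minimal illustration rather than code from the original article: the class name SharedPatternSketch and its two helper methods are hypothetical, and the pattern is borrowed from the {n,m} row of Table B. It shows one compiled Pattern being shared by several matchers, and the difference between matches (the whole input must match) and find (a match anywhere in the input).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch class; not part of the original article.
public class SharedPatternSketch {
    // Compile once. A Pattern is immutable, so any number of matchers can share it.
    static final Pattern NERD = Pattern.compile("n[e]{1,2}rd");

    // True only if the entire input matches the pattern.
    static boolean isWholeMatch(String input) {
        return NERD.matcher(input).matches();
    }

    // Counts non-overlapping matches found anywhere in the input.
    static int countOccurrences(String input) {
        Matcher m = NERD.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isWholeMatch("neerd"));
        System.out.println(isWholeMatch("a nerd"));
        System.out.println(countOccurrences("nerd, neerd, neeerd"));
    }
}
```

Each call to matcher creates a fresh Matcher with its own state, which is why the shared Pattern can safely live in a static field.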

For example, when "cat" is replaced with "dog" in the string blahcatblahcatblah, the first appendReplacement call appends blahdog. The second appendReplacement call appends blahdog again, and appendTail then appends blah, producing blahdogblahdogblah. See the simple word replacement example.

The CharSequence interface

The CharSequence interface provides uniform, read-only access to many different kinds of character sequences, so you can supply the data to be searched from different sources. String, StringBuffer, and CharBuffer all implement CharSequence, so it is easy to take the data to be searched from any of them. If none of these data sources is suitable, you can write your own input source by implementing the CharSequence interface.

Regex scenarios

The following code examples demonstrate how the java.util.regex package is used in a variety of common situations.

Simple word replacement

    /*
     * This code writes "One dog, two dogs in the yard."
     * to the standard output stream.
     */
    import java.util.regex.*;

    public class Replacement {
        public static void main(String[] args) throws Exception {
            // Create a pattern to match cat
            Pattern p = Pattern.compile("cat");
            // Create a matcher with an input string
            Matcher m = p.matcher("One cat," + " two cats in the yard.");
            StringBuffer sb = new StringBuffer();
            boolean result = m.find();
            // Loop through and create a new String
            // with the replacements
            while (result) {
                m.appendReplacement(sb, "dog");
                result = m.find();
            }
            // Add the last segment of input to
            // the new String
            m.appendTail(sb);
            System.out.println(sb.toString());
        }
    }

Email validation

The following code is an example of checking an email address for characters that should not be there. It is not a complete validation suitable for every possible email address, but you can extend it as needed.

    /*
     * Checks for invalid characters
     * in email addresses
     */
    import java.util.regex.*;

    public class EmailValidation {
        public static void main(String[] args) throws Exception {
            String input = "@sun.com";
            // Checks for email addresses starting with
            // inappropriate symbols like dots or @ signs.
            Pattern p = Pattern.compile("^\\.|^\\@");
            Matcher m = p.matcher(input);
            if (m.find())
                System.err.println("Email addresses don't start"
                        + " with dots or @ signs.");
            // Checks for email addresses that start with
            // www. and prints a message if it does.
            p = Pattern.compile("^www\\.");
            m = p.matcher(input);
            if (m.find()) {
                System.out.println("Email addresses don't start"
                        + " with \"www.\", only web pages do.");
            }
            // Strips every character outside the allowed set
            p = Pattern.compile("[^A-Za-z0-9\\.\\@_\\-~#]+");
            m = p.matcher(input);
            StringBuffer sb = new StringBuffer();
            boolean result = m.find();
            boolean deletedIllegalChars = false;
            while (result) {
                deletedIllegalChars = true;
                m.appendReplacement(sb, "");
                result = m.find();
            }
            // Add the last segment of input to the new String
            m.appendTail(sb);
            input = sb.toString();
            if (deletedIllegalChars) {
                System.out.println("It contained incorrect characters,"
                        + " such as spaces or commas.");
            }
        }
    }

Removing control characters from a file

    /* This class removes control characters from a named file. */
    import java.util.regex.*;
    import java.io.*;

    public class Control {
        public static void main(String[] args) throws Exception {
            // Create file objects for the input and output file names
            File fin = new File("filename1");
            File fout = new File("filename2");
            // Open an input and an output stream
            FileInputStream fis = new FileInputStream(fin);
            FileOutputStream fos = new FileOutputStream(fout);
            BufferedReader in = new BufferedReader(new InputStreamReader(fis));
            BufferedWriter out = new BufferedWriter(new OutputStreamWriter(fos));
            // The pattern matches control characters
            // (\p{Cntrl} is the POSIX control-character class)
            Pattern p = Pattern.compile("\\p{Cntrl}");
            Matcher m = p.matcher("");
            String aLine = null;
            while ((aLine = in.readLine()) != null) {
                m.reset(aLine);
                // Replaces control characters with an empty string
                String result = m.replaceAll("");
                out.write(result);
                out.newLine();
            }
            in.close();
            out.close();
        }
    }

File searching

    /* Prints out the comments found in a .java file. */
    import java.util.regex.*;
    import java.io.*;
    import java.nio.*;
    import java.nio.charset.*;
    import java.nio.channels.*;

    public class CharBufferExample {
        public static void main(String[] args) throws Exception {
            // Create a pattern to match comments
            Pattern p = Pattern.compile("//.*$", Pattern.MULTILINE);
            // Get a channel for the source file
            File f = new File("Replacement.java");
            FileInputStream fis = new FileInputStream(f);
            FileChannel fc = fis.getChannel();
            // Get a CharBuffer from the source file
            ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            Charset cs = Charset.forName("ISO-8859-1");
            CharsetDecoder cd = cs.newDecoder();
            CharBuffer cb = cd.decode(bb);
            // Run some matches
            Matcher m = p.matcher(cb);
            while (m.find())
                System.out.println("Found comment: " + m.group());
            fis.close();
        }
    }

Conclusion

Pattern matching in the Java programming language is now as flexible as in many other programming languages. Regular expressions can be used in applications to make sure data is correct before it enters a database or is sent on to another part of the application, and they can also be used for a wide variety of administrative tasks. In short, regular expressions can be used anywhere in Java programming that calls for pattern matching.
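To illustrate the data-checking use mentioned in the conclusion, here is a minimal sketch that ports the eregi pattern from the first article to java.util.regex. The class name EmailShapeCheck and the method looksLikeEmail are hypothetical, and using Pattern.CASE_INSENSITIVE to stand in for eregi's case-insensitive matching is an assumption of this sketch. Like the original pattern, it only checks the overall shape of an address and is not a complete email validator.

```java
import java.util.regex.Pattern;

// Hypothetical sketch class; not part of the original article.
public class EmailShapeCheck {
    // The eregi pattern from the first article, with backslashes doubled
    // for a Java string literal. CASE_INSENSITIVE is an assumed stand-in
    // for eregi's case-insensitive behavior.
    static final Pattern EMAIL_SHAPE = Pattern.compile(
            "^[_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*$",
            Pattern.CASE_INSENSITIVE);

    // Returns true when the whole candidate string has the expected shape.
    static boolean looksLikeEmail(String candidate) {
        return candidate != null && EMAIL_SHAPE.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("First.Last@example.com"));
        System.out.println(looksLikeEmail("@sun.com"));
        System.out.println(looksLikeEmail("a b@example.com"));
    }
}
```

A check like this would typically run before the value is stored or passed on, with rejected input reported back to the user.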

FLOATER Edited on 2003-04-21 23:16

Reposted: Regular expressions in JDK 1.4

Written by William Chen (06/19/2002)
--------------------------------------------------------------------------------
What is a regular expression? It is a very special notation for searching and replacing text in files and strings. Because many system settings on UNIX are stored in text files, network administrators and programmers often need to search and replace, so a dedicated notation, called the regular expression, was developed. For example, we can use "s/</&lt;/g" to convert every "<" in a string into "&lt;". JDK 1.4 therefore provides a regular expression package for you to use; users of versions earlier than JDK 1.4 can obtain one from http://jakarta.apache.org/oro. Take the string just listed, "s/

When reposting, please credit the original address: https://www.9cbs.com/read-7430.html
