Regular expression

zhaozj2021-02-16 93

Part I
----------------

Regular expressions (REs) are often mistakenly thought of as a mysterious language that only a few people understand. On the surface they do look messy: if you don't know the grammar, an expression is just a jumble of characters. In fact, regular expressions are simple and easy to understand. After reading this article you will know their general syntax.

Regular expressions, which are supported on a wide variety of platforms, were first proposed by the mathematician Stephen Kleene in 1956, building on a growing body of research on natural languages. Regular expressions with a complete syntax were used for character pattern matching and were later applied in information technology. Since then they have gone through several periods of development; the current standard has been ratified by ISO (the International Organization for Standardization) and defined by the Open Group.

Regular expressions are not a language in their own right, but they can be used to find and replace text in a file or string. There are two standards: basic regular expressions (BRE) and extended regular expressions (ERE). ERE includes the functionality of BRE plus additional concepts.

Regular expressions are used in many programs on UNIX platforms, including xsh, egrep, sed, and vi. Many languages, such as HTML and XML, adopt them as well, usually implementing only a subset of the full standard.

Regular expressions are more common than you might imagine. As they have been ported to cross-platform programming languages, their functionality has become increasingly complete and useful. Search engines on the web use them, e-mail programs use them, and even if you are not a UNIX programmer you can use them to simplify your programs and shorten your development time.

Regular expressions 101

Much of the syntax of regular expressions will look familiar, because you have probably seen pieces of it before even if you have never studied it. Wildcards, for instance, are one structural type of RE, namely the repetition operation. Let's take a look at the most common basic syntax of the ERE standard. To give concrete examples, I will use several different programs.

Part II: Character matching
----------------------

The key to a regular expression is deciding what you want to match; without that, REs are useless. Each expression contains an instruction describing what to look for, as shown in Table A.

Table A: Character-matching regular expressions

Operation  Explanation                     Example                         Result
---------  ------------------------------  ------------------------------  ------------------------------------------------------
.          Match any one character         grep .ord sample.txt            Matches "ford", "lord", "2ord", etc. in sample.txt
[ ]        Match any one character         grep "[cng]ord" sample.txt      Matches only "cord", "nord", and "gord"
           listed between the brackets
[^ ]       Match any one character not     grep "[^cn]ord" sample.txt      Matches "lord", "2ord", etc., but not "cord" or "nord"
           listed between the brackets
                                           grep "[a-zA-Z]ord" sample.txt   Matches "aord", "bord", "Aord", "Bord", etc.
                                           grep "[^0-9]ord" sample.txt     Matches "Aord", "aord", etc., but not "2ord", etc.

Repetition operators

Repetition operators, or quantifiers, describe how many times to look for a given element. They are used together with the character-matching syntax to search for runs of several characters; see Table B.
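Before moving on to the quantifiers, the operators of Table A can be tried outside grep as well. The following is a minimal Java sketch, assuming that java.util.regex treats `.`, `[ ]`, and `[^ ]` the same way ERE does for these simple cases. The class name `CharClassDemo`, the helper `found`, and the word list are made up for illustration; `find()` is used because, like grep, it looks for the pattern anywhere in the input.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Hypothetical helper: true if the pattern occurs anywhere in the word,
    // which mirrors how grep treats a line of input.
    static boolean found(String regex, String word) {
        return Pattern.compile(regex).matcher(word).find();
    }

    public static void main(String[] args) {
        // Illustrative word list, not taken from any real sample.txt.
        String[] words = {"ford", "cord", "nord", "2ord", "Aord"};
        String[] patterns = {".ord", "[cng]ord", "[^cn]ord", "[a-zA-Z]ord", "[^0-9]ord"};

        for (String p : patterns) {
            StringBuilder hits = new StringBuilder();
            for (String w : words) {
                if (found(p, w)) {
                    hits.append(w).append(' ');
                }
            }
            // Print each pattern followed by the words it matched.
            System.out.println(p + " -> " + hits.toString().trim());
        }
    }
}
```

Running the sketch prints one line per pattern, which makes it easy to compare the behaviour against the Result column of Table A.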

Table B: Regular expression repetition operators

Operation  Explanation                          Example                            Result
---------  -----------------------------------  ---------------------------------  ------------------------------------------------------
?          Match the preceding element zero     egrep ".?erd" sample.txt           Matches "berd", "herd", etc., and "erd"
           or one time
*          Match the preceding element zero     egrep "n.*rd" sample.txt           Matches "nerd", "nrd", "neard", etc.
           or more times
+          Match the preceding element one      egrep "[n]+erd" sample.txt         Matches "nerd", "nnerd", etc., but not "erd"
           or more times
{n}        Match the preceding element          egrep "[a-z]{2}erd" sample.txt     Matches "cherd", "blerd", etc., but not "nerd",
           exactly n times                                                         "erd", "buzzerd", etc.
{n,}       Match the preceding element at       egrep ".{2,}erd" sample.txt        Matches "cherd" and "buzzerd", but not "nerd"
           least n times
{n,N}      Match the preceding element at       egrep "n[e]{1,2}rd" sample.txt     Matches "nerd" and "neerd"
           least n times, but no more than
           N times

Note that grep reports every line that contains a match, so the results above describe each word taken as a whole. To restrict a pattern to whole words, use the anchors described in Part III or grep's -w option.

Part III: Anchors
----------------

Anchors specify where in the text a pattern must match, as shown in Table C. They make it easier to find common strings in a particular position. For the examples I use the vi line-editor command :s, which stands for substitute. The basic syntax of this command is:

    s/pattern_to_match/pattern_to_substitute/

Table C: Regular expression anchors

Operation  Explanation                          Example                            Result
---------  -----------------------------------  ---------------------------------  ------------------------------------------------------
^          Match at the beginning of a line     s/^/blah/                          Inserts "blah" at the beginning of the line
$          Match at the end of a line           s/$/blah/                          Inserts "blah" at the end of the line
\<         Match at the beginning of a word     s/\</blah/                         Inserts "blah" at the beginning of the word
                                                egrep "\<blah" sample.txt          Matches "blahfield", etc.
\>         Match at the end of a word           s/\>/blah/                         Inserts "blah" at the end of the word
                                                egrep "blah\>" sample.txt          Matches "soupblah", etc.
\b         Match at the beginning or end        egrep "\bblah" sample.txt          Matches "blahcake"
           of a word                            egrep "blah\b" sample.txt          Matches "countblah"
\B         Match in the middle of a word        egrep "\Bblah" sample.txt          Matches "sublahper", etc.

Alternation

Another convenient feature of REs is the alternation (or insertion) operator. It is in effect an OR statement and is written with the | symbol. The following statement returns the lines containing "nerd" and "merd" in the file sample.txt:

    egrep "(n|m)erd" sample.txt

Alternation is very powerful, especially when you are looking for different spellings of a word, but in this case you can get the same result with a character class:

    egrep "[nm]erd" sample.txt

The real usefulness of alternation shows when you combine it with the more advanced features of REs.

Part IV: Reserved characters
----------------

One last important feature of REs is reserved characters (also known as special characters). For example, suppose you want to find the strings "ne*rd" and "ni*rd". The pattern "n[ei]*rd" matches "neeeeerd" and "nieieierd", but not the strings you are looking for. Because '*' (asterisk) is a reserved character, you must escape it with a backslash: "n[ei]\*rd". The other reserved characters are:

    ^ (caret)
    . (period)
    [ (left bracket)
    $ (dollar sign)
    ( (left parenthesis)
    ) (right parenthesis)
    | (pipe)
    * (asterisk)
    + (plus symbol)
    ? (question mark)
    { (left curly bracket, or left brace)
    \ (backslash)

Once you start putting these characters into your searches, REs undoubtedly become hard to read. For example, the following eregi call from a PHP search engine is difficult to follow:

    eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$", $sendto)

As you can see, the intent of the code is hard to grasp.
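The effect of escaping a reserved character can be checked with a short Java sketch. It assumes Java's java.util.regex engine rather than egrep, so the backslash has to be doubled inside the string literal. The class name `EscapeDemo` and its helper methods are hypothetical; `Pattern.quote` is a standard library method that escapes a whole string at once.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // Unescaped: '*' is a quantifier that repeats the class [ei].
    static final Pattern UNESCAPED = Pattern.compile("n[ei]*rd");
    // Escaped: "\\*" in Java source is the regex \*, a literal asterisk.
    static final Pattern ESCAPED = Pattern.compile("n[ei]\\*rd");

    // Hypothetical helper: true if the whole string matches the pattern.
    static boolean matchesWhole(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    // Hypothetical helper: searches for a string literally by letting
    // Pattern.quote escape every reserved character in it.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"neeeeerd", "nieieierd", "ne*rd", "ni*rd"};
        for (String s : inputs) {
            System.out.println(s + ": unescaped=" + matchesWhole(UNESCAPED, s)
                    + ", escaped=" + matchesWhole(ESCAPED, s));
        }
        System.out.println(containsLiteral("a ne*rd in the text", "ne*rd"));
    }
}
```

The unescaped pattern accepts the repeated-letter spellings and rejects the strings that contain a real asterisk; the escaped pattern does the opposite.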
If you overlook the reserved characters, though, you will often misread what the code means.

Summary

In this article we have lifted the veil on regular expressions and gone through the common syntax of the ERE standard. If you want to read the full description of the rules, see the Open Group's Regular Expressions specification. You are welcome to post your questions in the discussion area.

Another article
--------------------------------------
Regular expressions and the Java programming language
---------------------------------------

Classes and methods

The following classes match character sequences against the pattern specified by a regular expression.

Pattern class

An instance of the Pattern class represents a regular expression that is specified in string form, in a syntax similar to the one used by Perl. A regular expression specified as a string must first be compiled into an instance of the Pattern class.

The resulting pattern is used to create a Matcher object, which matches arbitrary character sequences against the regular expression. Many matchers can share the same pattern because a pattern is stateless.

The compile method compiles a given regular expression into a pattern, and the matcher method then creates a matcher that will match the given input against that pattern. The pattern method returns the regular expression from which the pattern was compiled.

The split method is a convenience method that splits a given input sequence around the matches of the pattern. The following example demonstrates it:

    /** Splits an input string whose items are separated by commas and/or whitespace. */
    import java.util.regex.*;

    public class Splitter {
        public static void main(String[] args) throws Exception {
            // Create a pattern to match breaks
            Pattern p = Pattern.compile("[,\\s]+");
            // Split input with the pattern
            String[] result = p.split("one,two, three four, five");
            for (int i = 0; i < result.length; i++)
                System.out.println(result[i]);
        }
    }

Matcher class

Instances of the Matcher class are used to match character sequences against a given pattern. Input is supplied to a matcher through the CharSequence interface, so that characters from a wide variety of input sources can be matched.

A matcher is created from a pattern by calling the pattern's matcher method. Once created, a matcher can perform three different kinds of match operations:

    The matches method attempts to match the entire input sequence against the pattern.
    The lookingAt method attempts to match the input sequence against the pattern, starting at the beginning.
    The find method scans the input sequence, looking for the next subsequence that matches the pattern.

Each of these methods returns a boolean that indicates success or failure. If the match succeeds, more information can be obtained by querying the state of the matcher.

This class also defines methods for replacing matched subsequences with new strings, whose contents can be computed from the match result if necessary.

The appendReplacement method first appends to a string buffer all the characters between the current position and the next match, and then appends the replacement value. The appendTail method appends everything from the end of the last match to the end of the input.

For example, in the string blahcatblahcatblah, the first appendReplacement appends blahdog. The second appendReplacement appends blahdog, and appendTail then appends blah, producing blahdogblahdogblah. See the Simple word replacement example.

CharSequence interface

The CharSequence interface provides uniform, read-only access to many different types of character sequences, so the data to be searched can come from different sources. String, StringBuffer, and CharBuffer all implement CharSequence, so it is easy to obtain the data to search from them. If none of the available sources suits you, you can write your own input source by implementing the CharSequence interface.
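The difference between the three match operations is easiest to see side by side. The sketch below is only an illustration: the class name `MatchOps` and its helpers are made up, while `matches`, `lookingAt`, and `find` are the java.util.regex.Matcher methods described above. Every helper takes a CharSequence, so a String or a StringBuilder can be passed in unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchOps {
    static final Pattern CAT = Pattern.compile("cat");

    // matches(): the entire input must match the pattern.
    static boolean whole(CharSequence input) {
        return CAT.matcher(input).matches();
    }

    // lookingAt(): the input must start with a match.
    static boolean prefix(CharSequence input) {
        return CAT.matcher(input).lookingAt();
    }

    // find(): counts every match found while scanning the input.
    static int count(CharSequence input) {
        Matcher m = CAT.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        CharSequence[] inputs = {
            "cat", "catblahcat", new StringBuilder("blahcatblahcatblah")
        };
        for (CharSequence in : inputs) {
            System.out.println(in + ": matches=" + whole(in)
                    + ", lookingAt=" + prefix(in)
                    + ", find count=" + count(in));
        }
    }
}
```

Each helper creates a fresh matcher, so the state left behind by one operation never affects the next.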

Regex scenarios

The following code examples demonstrate how the java.util.regex package is used in a variety of common situations.

Simple word replacement

    /** This code writes "one dog, two dogs in the yard"
     * to the standard output stream. */
    import java.util.regex.*;

    public class Replacement {
        public static void main(String[] args) throws Exception {
            // Create a pattern to match cat
            Pattern p = Pattern.compile("cat");
            // Create a matcher with an input string
            Matcher m = p.matcher("one cat," + " two cats in the yard");
            StringBuffer sb = new StringBuffer();
            boolean result = m.find();
            // Loop through and create a new string
            // with the replacements
            while (result) {
                m.appendReplacement(sb, "dog");
                result = m.find();
            }
            // Add the last segment of input to
            // the new string
            m.appendTail(sb);
            System.out.println(sb.toString());
        }
    }

Email validation

The following code is an example of how you can check whether some characters form an email address. It is not a complete validation that covers every possible email address, but you can extend it as needed.

    /** Checks for invalid characters
     * in email addresses */
    import java.util.regex.*;

    public class EmailValidation {
        public static void main(String[] args) throws Exception {
            String input = "@sun.com";
            // Checks for email addresses starting with
            // inappropriate symbols like dots or @ signs.
            Pattern p = Pattern.compile("^\\.|^\\@");
            Matcher m = p.matcher(input);
            if (m.find())
                System.err.println("Email addresses don't start"
                        + " with dots or @ signs.");
            // Checks for email addresses that start with
            // www. and prints a message if it does.
            p = Pattern.compile("^www\\.");
            m = p.matcher(input);
            if (m.find()) {
                System.out.println("Email addresses don't start"
                        + " with \"www.\", only web pages do.");
            }
            p = Pattern.compile("[^A-Za-z0-9\\.\\@_\\-~#]+");
            m = p.matcher(input);
            StringBuffer sb = new StringBuffer();
            boolean result = m.find();
            boolean deletedIllegalChars = false;
            while (result) {
                deletedIllegalChars = true;
                m.appendReplacement(sb, "");
                result = m.find();
            }
            // Add the last segment of input to the new string
            m.appendTail(sb);
            input = sb.toString();
            if (deletedIllegalChars) {
                System.out.println("It contained incorrect characters"
                        + ", such as spaces or commas.");
            }
        }
    }

Removing control characters from a file

    /* This class removes control characters from a file. */
    import java.util.regex.*;
    import java.io.*;

    public class Control {
        public static void main(String[] args) throws Exception {
            // Create file objects for the input and output files
            File fin = new File("fileName1");
            File fout = new File("fileName2");
            // Open an input and an output stream
            FileInputStream fis = new FileInputStream(fin);
            FileOutputStream fos = new FileOutputStream(fout);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(fis));
            BufferedWriter out = new BufferedWriter(
                    new OutputStreamWriter(fos));
            // The pattern matches control characters
            Pattern p = Pattern.compile("\\p{Cntrl}");
            Matcher m = p.matcher("");
            String aLine = null;
            while ((aLine = in.readLine()) != null) {
                m.reset(aLine);
                // Replaces control characters with an empty string.
                String result = m.replaceAll("");
                out.write(result);
                out.newLine();
            }
            in.close();
            out.close();
        }
    }

File search

    /** Prints out the comments found in a .java file. */
    import java.util.regex.*;
    import java.io.*;
    import java.nio.*;
    import java.nio.charset.*;
    import java.nio.channels.*;

    public class CharBufferExample {
        public static void main(String[] args) throws Exception {
            // Create a pattern to match comments
            Pattern p = Pattern.compile("//.*$", Pattern.MULTILINE);
            // Get a channel for the source file
            File f = new File("Replacement.java");
            FileInputStream fis = new FileInputStream(f);
            FileChannel fc = fis.getChannel();
            // Get a CharBuffer from the source file
            ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, (int) fc.size());
            Charset cs = Charset.forName("ISO-8859-1");
            CharsetDecoder cd = cs.newDecoder();
            CharBuffer cb = cd.decode(bb);
            // Run some matches
            Matcher m = p.matcher(cb);
            while (m.find())
                System.out.println("Found comment: " + m.group());
        }
    }

Conclusion

Pattern matching in the Java programming language is now as flexible as it is in many other programming languages. Regular expressions can be used in applications to make sure that data is correctly formatted before it is entered into a database or sent on to another part of the application, and they can also be used for a wide variety of administrative tasks. In short, regular expressions can be used anywhere in Java programming that calls for pattern matching.

JDK 1.4 regular expressions
Written by William Chen (06/19/2002)
--------------------------------

What is a regular expression?

A regular expression is a special notation for searching and replacing text in files and strings. Because many system settings on UNIX are stored in text files, network administrators and programmers often need to search and replace, and the regular expression notation was developed for that purpose. JDK 1.4 provides a regular expression package for you to use. If you are using a JDK earlier than 1.4, you can get a regular expression package from http://jakarta.apache.org/oro.

Related syntax

"." represents any single character
"+" represents one or more occurrences
"*" represents zero or more occurrences
"()" groups a subexpression

Regular expression   Original string   Matching string
------------------   ---------------   ---------------
a.                   abc               ab
ab*                  abc               ab
(ab)*                aabab             abab

Character classes

Regular expression   Original string   Matching string
------------------   ---------------   ---------------
[a-dA-D0-9]*         abcza0            abc, a0
[^a-d]*              abe0              e0
[a-d]*               abcdefgh          abcd

Shorthand classes

\d is equal to [0-9]                   digits
\D is equal to [^0-9]                  non-digits
\s is equal to [ \t\n\x0B\f\r]         whitespace characters
\S is equal to [^ \t\n\x0B\f\r]        non-whitespace characters
\w is equal to [a-zA-Z_0-9]            digits or English letters
\W is equal to [^a-zA-Z_0-9]           anything other than digits and English letters

Beginning or end of a line

^ indicates the beginning of each line
$ indicates the end of each line

--------------------------------

java.util.regex related classes

Pattern - a compiled regular expression
Matcher - the result of matching a pattern against input
PatternSyntaxException - exception thrown while attempting to compile a regular expression

Example 1: replace every character in the string that matches "<" with "&lt;"

    import java.io.*;
    import java.util.regex.*;

    /**
     * Replaces every "<" character in the string with "&lt;"
     */
    public static void replace01() {
        // BufferedReader lets us read line-by-line
        Reader r = new InputStreamReader(System.in);
        BufferedReader br = new BufferedReader(r);
        // Search a string for every '<' character
        Pattern pattern = Pattern.compile("<");
        try {
            while (true) {
                String line = br.readLine();
                // Null line means input is exhausted
                if (line == null) break;
                Matcher a = pattern.matcher(line);
                while (a.find()) {
                    System.out.println("Found " + a.group());
                }
                // Replace every matching character with &lt;
                System.out.println(a.replaceAll("&lt;"));
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

Example 2:

    import java.io.*;
    import java.util.regex.*;

    /**
     * Works like StringTokenizer:
     * splits the string on "," and then finds the longest token
     */
    public static void search01() {
        // BufferedReader lets us read line-by-line
        Reader r = new InputStreamReader(System.in);
        BufferedReader br = new BufferedReader(r);
        // Search a string for every ","
        Pattern pattern = Pattern.compile(",\\s*");
        try {
            while (true) {
                String line = br.readLine();
                // Null line means input is exhausted
                if (line == null) break;
                String[] words = pattern.split(line);
                // -1 means we haven't found a word yet
                int longest = -1;
                int longestLength = 0;
                for (int i = 0; i < words.length; ++i) {
                    System.out.println("Token: " + words[i]);
                    if (words[i].length() > longestLength) {
                        longest = i;
                        longestLength = words[i].length();
                    }
                }
                // An empty line yields no token to report
                if (longest >= 0) {
                    System.out.println("Longest token: " + words[longest]);
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

--------------------------------

Other regular expression syntax

    ^\s*             # ignore leading whitespace
    (M(s|r|rs)\.)    # matches Ms., Mrs., and Mr. (titles)
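A pattern written with whitespace and # comments, like the titles pattern above, can be compiled in Java by passing the Pattern.COMMENTS flag, which tells the engine to ignore whitespace in the pattern and to treat # as the start of a comment. The sketch below is one possible way to use it; the class name `TitlePattern` and the helper `title` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitlePattern {
    // With Pattern.COMMENTS, whitespace in the pattern is ignored and
    // everything from '#' to the end of the line is a comment.
    static final Pattern TITLE = Pattern.compile(
            "^\\s*            # ignore leading whitespace\n"
          + "(M(s|r|rs)\\.)   # Ms., Mrs., or Mr. (titles)\n",
            Pattern.COMMENTS);

    // Hypothetical helper: returns the title at the start of the input,
    // or null if the input does not start with one.
    static String title(String input) {
        Matcher m = TITLE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] names = {"Ms. Lee", "  Mrs. Smith", "Mr. Jones", "Dr. Who"};
        for (String name : names) {
            System.out.println("[" + name + "] -> " + title(name));
        }
    }
}
```

Because the comments live inside the pattern string, each line of the string literal ends with \n so that a comment stops at the end of its own line.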
