Getting started 2

xiaoxiao 2021-03-06

In the first grade we took a fairly relaxed look at some basic usage of Java & Regex (regex is short for regular expression, and is independent of the Java package). The main tasks of the first grade were:

1 We wrote several runnable programs. Whatever we learn later can be tried out with similar programs.

2 We found the Java classes associated with regex: Pattern and Matcher, String, StringBuffer and StringTokenizer. These are the ones we focus on. (There are a few other things as well, PatternSyntaxException and java.util.Scanner, which we set aside for now.)

3 We understood the general character of a regex: it is a string that generates strings. However complicated it looks, it is just a special string.

In the second grade we want to study systematically and make the material stick. [yqj2065 discovered a great tool, The Regulator: an advanced, free regex testing and learning tool that lets you ... Go to http://regex.osherove.com/ and see for yourself; I went and downloaded it, ha ha.

Installation ... a problem? It needs the .NET Framework!!! Such a small tool wants such a big framework, and I had just reinstalled my system. Dizzying. It also made me grumble about M$. It does provide its C# source code.]

I then discovered a super-cool tool, EditPad Pro, an advanced regex testing and learning tool, available as a free demo, that lets you ... Look for "Download EditPad Pro Demo for Windows 95/98/ME/NT4/2000/XP (1.8 MB)"; I downloaded it, heh.

The problem is that its regex flavor is "almost identical to the one used in Perl 5". This makes me a little uneasy, because the documentation of Java's Pattern class lists a number of differences from Perl 5.

Originally I wanted to learn Java and regex together, but now I am starting to have doubts, and I cannot help feeling faint. [yqj2065 tip: learn what you want to learn, and skip the parts you don't like.]

Topic of the second grade: regular expression syntax

§1 Regular Expression: Angel | Devil

After the first grade I found that walking on two legs, one step for Java and one step for regex, is very uncomfortable. I want to walk on one leg. So I will hop, as in the triple jump, and concentrate on learning regex. Without the JVM to worry about, it may be much easier.

"Regular Expression" is translated into Chinese as 正则表达式, a term that sounds very scholarly at first glance. If I said that regular expressions started with Java, nobody would believe it; if someone says that regular expressions started on the UNIX system, we should not believe that either.

In 1956 the mathematician Stephen Kleene, building on the earlier work of Warren McCulloch and Walter Pitts on nervous systems, developed a mathematical notation called regular sets. This notation was quickly adopted by computer scientists for scanning, that is, lexical analysis. Regular expressions therefore originate in automata theory and formal language theory (we meet them in courses on formal languages and automata, which belong to theoretical computer science), and we may also encounter them when studying compiler principles. [Ref: "Compilation Principle and Practice"]

The powerful text-handling ability of regular expressions was soon brought to UNIX by Ken Thompson. Since then, regular expressions have been widely used in UNIX systems, in languages and development environments such as Perl, PHP, Delphi, JavaScript, C# (.NET), Java, Python and Ruby, and in many applications, especially text editors. It is worth mentioning that Perl regular expressions became a de facto standard: people often speak of PCRE (Perl Compatible Regular Expressions), much as they speak of IBM-compatible machines. [http://en.wikipedia.org/wiki] yqj2065

Why did Java provide no support for regex until JDK 1.4? This left many people dissatisfied. Before JDK 1.4 appeared there were several third-party libraries, which you may no longer need. For example:

• The package com.stevesoft.pat [http://www.javaregex.com/patfull.html], which has some interesting things, for example Regame, the Regular Expression Game.

• An open-source regular expression library: Jakarta-ORO, one of the most comprehensive regular expression APIs, fully compatible with Perl 5 regular expressions.

Even now, many people still learn and use StringTokenizer, partly because textbooks lag behind and partly because many people find regular expressions daunting. If we treat regular expressions as a monster, we will never understand them. In fact, the only real difficulty in learning regular expressions is that they are not intuitive.

§2 A regular expression is a language for defining languages

Now that people are familiar with XML, a language for defining languages is no longer mysterious. Intuitively, a regex is a string that generates strings. The strings generated by a regex form a language.

A regular expression r is completely defined by the set of strings it matches. This set is called the language generated by the regular expression, written L(r). "Language" here means nothing more than "a set of strings". For example:

L(a) = {a}

L(a+) = {a, aa, aaa, aaaa, ...}

A language depends first of all on the character set in use, typically the ASCII characters or a subset of them; Java uses the Unicode character set. This character set is the alphabet Σ from which regular expressions are built.

A regular expression r consists of:

• Characters of the alphabet. Note that the a in L(a+) is not simply the character a; it is a pattern (template).

• Characters with a special meaning: metacharacters. Some of them are themselves characters of the alphabet, such as *; others stand for characters that cannot be written directly, such as \n and \x09. Both cases are handled with the escape character \. In source code, for the former, putting a \ in front yields the literal character (\* matches a literal *); for the latter, dropping the \ gives back the ordinary characters.
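As a minimal sketch of the escaping rule in Java (the class name EscapeDemo and the helper fullMatch are made up for illustration): inside a Java string literal each regex backslash has to be doubled, so the regex \* is written "\\*". Pattern.quote is a standard-library shortcut that turns a whole string into a literal pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only Pattern.matches and Pattern.quote are real API.
class EscapeDemo {
    // True if the regex matches the entire input string.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped, * is a metacharacter: "a*" means zero or more a's.
        System.out.println(fullMatch("a*", "aaa"));               // true
        System.out.println(fullMatch("a*", "a*"));                // false
        // Escaped, \* is the literal star (written "\\*" in Java source).
        System.out.println(fullMatch("a\\*", "a*"));              // true
        // Pattern.quote escapes every character of its argument.
        System.out.println(fullMatch(Pattern.quote("a*"), "a*")); // true
        // \n and \x09 in a regex stand for the newline and tab characters.
        System.out.println(fullMatch("\\n", "\n"));               // true
        System.out.println(fullMatch("\\x09", "\t"));             // true
    }
}
```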

[The symbols below denote set operations.]

The minimal regular expressions take three forms:

L(a) = {a}. A single character a. Here a can be any character of the alphabet. For example, L(x) = {x}.

L(ε) = {ε}. ε (epsilon) denotes the empty string, a string that contains no characters. It is similar to String str = "".

L(ф) = {}. ф denotes the empty set, which does not match any string.

Note the difference between {ε} and {}: the set {} contains no strings at all, whereas {ε} contains one string, the string without any characters.
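The three minimal forms can be tried out in Java with a small sketch like the one below. The class and constant names are illustrative. Java has no dedicated syntax for ф, so the pattern (?!), an empty negative lookahead that can never succeed, is used here as one possible stand-in.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the three minimal regular expressions.
class MinimalForms {
    static final String SINGLE_CHAR = "x";   // L(x) = {x}
    static final String EPSILON = "";        // L(ε) = {ε}: the empty pattern
    static final String EMPTY_SET = "(?!)";  // assumed stand-in for ф: never matches

    // True if the regex matches the entire input string.
    static boolean accepts(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(accepts(SINGLE_CHAR, "x")); // true
        System.out.println(accepts(SINGLE_CHAR, ""));  // false
        System.out.println(accepts(EPSILON, ""));      // true: {ε} contains the empty string
        System.out.println(accepts(EPSILON, "x"));     // false
        System.out.println(accepts(EMPTY_SET, ""));    // false: {} contains nothing, not even ε
        System.out.println(accepts(EMPTY_SET, "x"));   // false
    }
}
```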

The three basic operations on regular expressions:

• Alternation (choice): expressed by the metacharacter | (vertical bar).

If r and s are regular expressions, then the regular expression r|s matches any string matched by r or by s. The language of r|s is the union of the language of r and the language of s.

For example: L(a) = {a}, L(c) = {c}

L(a|c) = L(a) ∪ L(c) = {a} ∪ {c} = {a, c}.

Another example: L(a|b|c|d) = {a, b, c, d}.

[In Java regular expressions, equivalent ways of writing this include a|c, [ac], [a-z], and so on.]
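A small Java sketch of the equivalence between the | form and the character-class forms (the class and method names are illustrative only):

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing a|c with [ac], and a|b|c|d with [a-d].
class AlternationDemo {
    // True if a|c and [ac] give the same verdict on the input.
    static boolean barAgreesWithClass(String input) {
        return Pattern.matches("a|c", input) == Pattern.matches("[ac]", input);
    }

    // True if a|b|c|d and the range [a-d] give the same verdict on the input.
    static boolean barAgreesWithRange(String input) {
        return Pattern.matches("a|b|c|d", input) == Pattern.matches("[a-d]", input);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "c", "b", "ac", ""}) {
            System.out.println("\"" + s + "\" a|c: " + Pattern.matches("a|c", s)
                    + ", [ac]: " + Pattern.matches("[ac]", s));
        }
    }
}
```

Only "a" and "c" are accepted by either pattern; "b", "ac" and the empty string are rejected by both, which is what L(a|c) = {a, c} says.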

• Concatenation: no metacharacter is used; the expressions are simply written one after the other.

The concatenation of regular expression r and regular expression s is written rs.

This is a good place to illustrate the role of the parentheses metacharacters ( ). With L(a|b) = {a, b} and L(c) = {c}, we get L((a|b)c) = L(a|b) ⊙ L(c) = {ac, bc}.

[In Java regular expressions, repeated concatenation can also be written as a{2,4}, and so on.]
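A minimal Java sketch of concatenation, of what the parentheses change, and of the a{2,4} shorthand (class and method names are illustrative):

```java
import java.util.regex.Pattern;

// Hypothetical demo class for concatenation and grouping.
class ConcatDemo {
    // True if the regex matches the entire input string.
    static boolean accepts(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // (a|b)c: choose a or b first, then append c. Language {ac, bc}.
        System.out.println(accepts("(a|b)c", "ac"));    // true
        System.out.println(accepts("(a|b)c", "bc"));    // true
        System.out.println(accepts("(a|b)c", "a"));     // false
        // Without parentheses, a|bc means a or bc. Language {a, bc}.
        System.out.println(accepts("a|bc", "a"));       // true
        System.out.println(accepts("a|bc", "ac"));      // false
        // a{2,4} stands for aa|aaa|aaaa.
        System.out.println(accepts("a{2,4}", "a"));     // false
        System.out.println(accepts("a{2,4}", "aaa"));   // true
        System.out.println(accepts("a{2,4}", "aaaaa")); // false
    }
}
```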

• Repetition, or "closure": expressed by the metacharacter *.

Let r be a regular expression. The regular expression r* matches any finite concatenation of strings matched by r. Although every such concatenation is finite, L(a*) = {ε} ∪ {a} ∪ {aa} ∪ {aaa} ∪ ... = {ε, a, aa, aaa, aaaa, ...} is an infinite set.

[Forms derived from closure in Java regular expressions:

L(a+) = L(a*) - {ε}; that is, a+ matches a, aa, aaa, aaaa, ...

L(a?) = L(ε|a); that is, a? matches ε and a.]
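The difference between *, + and ? shows up most clearly on the empty string. A short Java sketch (class and method names are illustrative):

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the closure operator and its derived forms.
class ClosureDemo {
    // True if the regex matches the entire input string.
    static boolean accepts(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(accepts("a*", ""));     // true:  ε is in L(a*)
        System.out.println(accepts("a+", ""));     // false: L(a+) = L(a*) - {ε}
        System.out.println(accepts("a?", ""));     // true:  L(a?) = {ε, a}
        System.out.println(accepts("a*", "aaaa")); // true
        System.out.println(accepts("a+", "aaaa")); // true
        System.out.println(accepts("a?", "aa"));   // false: at most one a
    }
}
```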

Operator precedence and parentheses

The precedence of the three basic operations is closure > concatenation > alternation, unless changed with ( ).

L(a|b*) = {a, ε, b, bb, bbb, ...}

L((a|b)*) = {ε, a, b, aa, ab, ba, bb, ...}
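To see the two languages side by side, one can enumerate every string over {a, b} up to a small length and keep those the regex accepts. The sketch below does this by brute force; the class PrecedenceDemo and the method language are invented for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class: brute-force listing of a small part of L(r).
class PrecedenceDemo {
    // Returns, shortest first, every string over {a, b} of length <= maxLen
    // that the regex matches in full.
    static List<String> language(String regex, int maxLen) {
        Pattern p = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        List<String> level = new ArrayList<>();
        level.add(""); // ε, the only string of length 0
        for (int len = 0; len <= maxLen; len++) {
            List<String> next = new ArrayList<>();
            for (String s : level) {
                if (p.matcher(s).matches()) {
                    result.add(s);
                }
                next.add(s + "a"); // build the strings of length len + 1
                next.add(s + "b");
            }
            level = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // The empty string ε prints as nothing before the first comma.
        System.out.println(language("a|b*", 2));   // [, a, b, bb]
        System.out.println(language("(a|b)*", 2)); // [, a, b, aa, ab, ba, bb]
    }
}
```

In a|b* the star binds only to b, so strings such as aa or ab are rejected; with the parentheses, every string over {a, b} is accepted.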

A regular expression is defined as one of the following:

1. A basic regular expression consists of a single character a (where a is in the alphabet Σ of legal characters), or the metacharacter ε, or the metacharacter ф. In the first case, L(a) = {a}; in the second case, L(ε) = {ε}; in the third case, L(ф) = {}. In the remaining cases, r and s are regular expressions:

2. An expression of the form r|s: L(r|s) = L(r) ∪ L(s).

3. An expression of the form rs: L(rs) = L(r)L(s).

4. An expression of the form r*: L(r*) = L(r)*.

5. An expression of the form (r): L((r)) = L(r). Parentheses do not change the language; they only adjust the precedence of the operations.

[These are basics you should know. Ref: "Compilation Principle and Practice"]

Exercise

2.1-1 Construct a language over the simple alphabet Σ = {a, b, c}: the set of all strings that contain exactly one b, such as aaba, abcca, and so on.

Answer: (a|c)*b(a|c)*

For example: take String str = "abaaabcccbabaaaaabc" and replace every match with Java. The output is JavaJavaJavaJavaJava. Why is the string divided into matches in this way and not some other way? Because of so-called maximal matching, that is, greedy matching: each match extends as far as it can.
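The replacement can be reproduced with String.replaceAll. The sketch below assumes the lowercase string "abaaabcccbabaaaaabc" as the intended input; the class and method names are illustrative. For contrast it also shows what happens when the trailing (a|c)* is made reluctant with *?, a Java quantifier that the text above does not cover.

```java
// Hypothetical demo class for greedy versus reluctant matching.
class GreedyDemo {
    static final String STR = "abaaabcccbabaaaaabc";

    // Greedy: each match is abaaa, bccc, ba, baaaaa, bc in turn.
    static String greedy() {
        return STR.replaceAll("(a|c)*b(a|c)*", "Java");
    }

    // Reluctant tail: each match stops right after its b
    // (ab, aaab, cccb, ab, aaaaab), so the final c is left over.
    static String reluctantTail() {
        return STR.replaceAll("(a|c)*b(a|c)*?", "Java");
    }

    public static void main(String[] args) {
        System.out.println(greedy());        // JavaJavaJavaJavaJava
        System.out.println(reluctantTail()); // JavaJavaJavaJavaJavac
    }
}
```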

2.1-2 Construct a language over the alphabet Σ = {a, b}: the set S of strings that consist of a single b with the same number of a's before and after it: S = {b, aba, aabaa, aaabaaa, ...}

Answer: It cannot be done. Regular expressions cannot describe this set. The only repetition operation is the closure operation *, and a*ba* cannot guarantee that the number of a's before the b equals the number after it. This is usually put as "regular expressions cannot count"; the mathematical argument in automata theory is the famous pumping lemma.
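A small Java sketch makes the gap concrete: a*ba* accepts strings such as aab that are not in S, while a hand-written check that compares positions gets it right. The class BalancedB and both method names are made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: a regex over-approximation of S versus an exact check.
class BalancedB {
    // What a regular expression can say: some a's, one b, some a's.
    static boolean regexGuess(String s) {
        return Pattern.matches("a*ba*", s);
    }

    // Exact membership test for S = {b, aba, aabaa, ...}, done by counting.
    static boolean inS(String s) {
        int b = s.indexOf('b');
        // The first b must sit exactly in the middle of the string.
        if (b < 0 || b != s.length() - 1 - b) {
            return false;
        }
        // Every other character must be an a.
        for (int i = 0; i < s.length(); i++) {
            if (i != b && s.charAt(i) != 'a') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(regexGuess("aabaa") + " " + inS("aabaa")); // true true
        System.out.println(regexGuess("aab") + " " + inS("aab"));     // true false
    }
}
```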

