Behavioral differences between ECMAScript and canonical regular expressions

xiaoxiao2021-03-06 144

Jun Jun published an article in his column on the difference between JavaScript regular expressions and .NET regular expressions. The original is at: http://blog.9cbs.net/ghj1976/76967.aspx.

In 9CBS user registration, the user name must match [a-zA-Z_0-9], so \w was used for the check, because \w is equivalent to [a-zA-Z_0-9] in JavaScript. In the server-side validation, however, \w in a .NET Framework regular expression is not equivalent to [a-zA-Z_0-9]. Because .NET regular expressions follow canonical behavior and support Unicode character categories, \w is equivalent to [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}], that is, any character in the Unicode categories lowercase letter (Ll), uppercase letter (Lu), titlecase letter (Lt), other letter (Lo), decimal digit (Nd), or connector punctuation (Pc).
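The mismatch can be reproduced from the JavaScript side alone. The sketch below is only an illustration: it assumes an engine that supports Unicode property escapes (the `u` flag, which is newer than the article) and uses them to spell out the .NET-style class quoted above. The names `asciiWord`, `unicodeWord`, and the sample user names are made up for this example.

```javascript
// ECMAScript meaning of \w (no "u" flag): ASCII letters, digits and underscore.
const asciiWord = /^\w+$/;

// Hypothetical stand-in for the .NET default \w, written out with the
// Unicode categories listed in the text. Needs the "u" flag to parse.
const unicodeWord = /^[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]+$/u;

// Sample names (assumed for illustration): ASCII, CJK, and Latin-1 letters.
const names = ["user_01", "\u7528\u6237\u540D", "\u00DCnal"];

const results = names.map((name) => ({
  name,
  ascii: asciiWord.test(name),     // what the browser-side check accepts
  unicode: unicodeWord.test(name), // what a Unicode-aware \w would accept
}));

console.log(results);
```

Only the first name passes both checks. The other two are rejected by the ASCII-only class but accepted by the Unicode-aware one, which is the kind of disagreement between browser and server that the article describes.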

In fact, this is not just a difference between JavaScript regular expressions and .NET regular expressions; it is a difference between ECMAScript and canonical regular expressions in general. ECMAScript and canonical regular expressions behave differently in three respects, all of which are described in the reference documentation provided by MSDN.

1. Character classes specified in a matching expression are handled differently. By default, canonical regular expressions support Unicode character categories; ECMAScript does not support Unicode.
2. A regular expression capture class with a backreference to itself must be updated at each capture iteration.
3. Ambiguities between octal escapes and backreferences are resolved differently.

The documentation gives a table that explains how octal escapes are distinguished from backreferences.

Case 1: \ is followed by 0 and then by 0 to 2 octal digits.
Canonical behavior: Interpreted as an octal escape. For example, \044 always represents $.
ECMAScript behavior: The same.

Case 2: \ is followed by a digit from 1 to 9 and no further decimal digits.
Canonical behavior: Interpreted as a backreference. For example, \9 always means backreference 9, even if capture 9 does not exist. If the capture does not exist, the regular expression parser throws a syntax exception.
ECMAScript behavior: If a capture with that single decimal digit exists, it is a backreference to that capture. Otherwise it is interpreted as a literal.

Case 3: \ is followed by a digit from 1 to 9 and then by further decimal digits.
Canonical behavior: The digits are converted to a decimal value. If that capture exists, the expression is interpreted as a backreference. Otherwise the leading octal digits, up to octal 377, are interpreted as an octal escape and the remaining digits as literals. For example, for \400, if capture 400 exists it is interpreted as backreference 400; if capture 400 does not exist, \400 is interpreted as octal \40 followed by a literal 0.
ECMAScript behavior: Interpreted as a backreference by converting as many digits as possible to a decimal value that refers to an existing capture. If no digits can be converted, the leading octal digits, up to octal 377, are interpreted as an octal escape and the remaining digits as literals.
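The ECMAScript column can be checked directly in JavaScript. The following is a small sketch, not a full test of the table: it assumes a web-compatible engine (for example a browser or Node.js) that implements the legacy octal and backreference rules for patterns without the `u` flag. The variable names are made up for this example.

```javascript
// Case 1: \0 followed by octal digits is an octal escape.
// Octal 044 is decimal 36, the character code of "$".
const octalDollar = /^\044$/.test("$");

// Case 2 with an existing capture: \1 refers back to group 1.
const backref = /^(a)\1$/.test("aa");

// Case 2 without a matching capture: there is no group 9, so the
// engine does not raise an error and matches the digit "9" literally.
const literalNine = /^\9$/.test("9");

// Case 3 without a matching capture: there is no group 400, so \400 is
// read as octal 40 (a space character) followed by a literal "0".
const octalThenZero = /^\400$/.test(" 0");

console.log({ octalDollar, backref, literalNine, octalThenZero });
```

Under the stated assumption, all four checks should report a match. The third one is where canonical behavior differs: a canonical parser would reject \9 when capture 9 does not exist.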

.NET Framework regular expressions offer a richer set of options than ECMAScript. The options are listed in the RegexOptions enumeration, which includes one that selects ECMAScript behavior. Setting the server-side regular expression option to RegexOptions.ECMAScript makes the server-side validation follow ECMAScript behavior, so that the browser-side and server-side checks agree.

When reprinting, please cite the original address: https://www.9cbs.com/read-125937.html