Flex2.5 User Manual (2)
Format of the input file
The flex input file, which describes the scanner to be generated, consists of three sections, separated by lines containing only '%%':
definitions
%%
rules
%%
user code
The definitions section contains declarations of simple name definitions, which act like macros to simplify the scanner specification, and declarations of start conditions, which are explained later.
Name definitions have the form:
name definition
The name is a word beginning with a letter or an underscore ('_'), followed by zero or more letters, digits, '_', or '-' (dash). The definition is taken to begin at the first non-white-space character following the name and to continue to the end of the line. The definition can subsequently be referred to as "{name}", which expands to "(definition)". For example:
DIGIT [0-9]
ID [a-z][a-z0-9]*
Here "DIGIT" is defined as a regular expression that matches a single digit, and "ID" as a regular expression that matches a letter followed by zero or more letters or digits.
A subsequent reference by name such as
{DIGIT}+"."{DIGIT}*
is identical to
([0-9])+"."([0-9])*
and matches one or more digits, followed by a '.', followed by zero or more digits.
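As a rough illustration of how name definitions are used, the following sketch puts DIGIT and ID (assuming the lower-case form [a-z][a-z0-9]* for ID) into a small but complete input file. The '%option noyywrap' line, the printf actions, the catch-all rule, and the main() routine are assumptions added to make the sketch self-contained; they are not part of the definitions being discussed.

```lex
%{
#include <stdio.h>
%}
%option noyywrap

DIGIT    [0-9]
ID       [a-z][a-z0-9]*

%%

{DIGIT}+"."{DIGIT}*   printf("number: %s\n", yytext);
{ID}                  printf("identifier: %s\n", yytext);
.|\n                  /* ignore everything else */

%%

int main(void)
{
    yylex();   /* scan standard input until end of file */
    return 0;
}
```

Assuming the file is saved under a hypothetical name such as 'names.l', it could be built with 'flex names.l' followed by 'cc lex.yy.c -o names'. Note that the first rule requires the '.', so an input such as '42' is not reported as a number by this sketch; its digits fall through to the catch-all rule.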
The rules section contains a series of rules of the form:
pattern action
The pattern must be unindented (it starts in the first column), and the action must begin on the same line. Patterns and actions are described in more detail later.
Finally, the user code section is copied verbatim to 'lex.yy.c'. It is used for companion routines that call or are called by the scanner. This section is optional; if it is omitted, the second '%%' in the input file may be omitted as well.
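The three sections can be seen together in a short sketch such as the one below, which counts lines and characters on standard input. The counter names num_lines and num_chars, the '%option noyywrap' line, and the main() routine are illustrative assumptions, not something this manual section prescribes.

```lex
%{
#include <stdio.h>

/* Hypothetical counters used only by this sketch. */
static int num_lines = 0;
static int num_chars = 0;
%}
%option noyywrap

%%

\n      { ++num_lines; ++num_chars; }
.       { ++num_chars; }

%%

/* User code section: everything from here on is copied verbatim
   to lex.yy.c. */
int main(void)
{
    yylex();   /* returns when end of input is reached */
    printf("lines: %d, characters: %d\n", num_lines, num_chars);
    return 0;
}
```

Each rule starts in the first column with its pattern, and its action begins on the same line. If the scanner needed no companion routines (for example, when it is linked with a separately written main program), the second '%%' and everything after it could be left out.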
In the definitions and rules sections, any indented text or text enclosed in '%{' and '%}' is copied verbatim to the output (with the '%{' and '%}' delimiters removed). The '%{' and '%}' must appear unindented on lines by themselves.
In the rules section, any indented or %{ %} text appearing before the first rule may be used to declare variables local to the scanning routine and, after the declarations, code to be executed whenever the scanning routine is entered. Other indented or %{ %} text in the rules section is still copied to the output, but its meaning is not well defined and it may cause compile-time errors (this feature exists for POSIX compliance; other such features are discussed later).
In the definitions section (but not in the rules section), an unindented comment (that is, a line beginning with "/*") is also copied verbatim to the output, up to the next "*/".
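The copying rules above might be exercised as in the following sketch. The variable name words_this_call, the '%option noyywrap' line, and the main() routine are hypothetical additions for illustration only.

```lex
/* Unindented comment in the definitions section: copied to lex.yy.c. */
%{
/* Text between %{ and %} is copied verbatim, without the delimiters. */
#include <stdio.h>
%}
%option noyywrap

%%
    /* Indented text before the first rule ends up inside the scanning
       routine: declarations first, then code run on every entry. */
    int words_this_call = 0;   /* hypothetical local variable */

[a-zA-Z]+   { ++words_this_call; printf("word %d: %s\n", words_this_call, yytext); }
.|\n        { /* skip anything that is not a word */ }

%%

int main(void)
{
    yylex();
    return 0;
}
```

Because none of the actions return, the scanning routine is entered only once in this sketch, so words_this_call keeps counting until end of input. In a scanner whose actions return tokens, the variable would be reinitialized on every call.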
Patterns
The patterns in the input are written using an extended set of regular expressions. These are:
x              match the character 'x'
.              any character (byte) except newline
[xyz]          a "character class"; in this case the pattern matches an 'x', a 'y', or a 'z'
[abj-oZ]       a "character class" containing a range; matches an 'a', a 'b', any letter from 'j' through 'o', or a 'Z'
[^A-Z]         a "negated character class", that is, any character except those in the class; in this case, any character except an uppercase letter
[^A-Z\n]       any character except an uppercase letter or a newline
r*             zero or more r's, where r is any regular expression
r+             one or more r's
r?             zero or one r (that is, an optional r)
r{2,5}         from two to five r's
r{2,}          two or more r's
r{4}           exactly four r's
{name}         the expansion of the "name" definition (see above)
"[xyz]\"foo"   the literal string [xyz]"foo; inside the quotes, '\' is the escape character
\x             if x is 'a', 'b', 'f', 'n', 'r', 't', or 'v', the ANSI-C interpretation of \x; otherwise a literal 'x' (used to escape operators such as '*')
\0             a NUL character (ASCII code 0)
\123           the character with octal value 123
\x2a           the character with hexadecimal value 2a
(r)            match an r; parentheses are used to override precedence (see below)
rs             the regular expression r followed by the regular expression s; this is called "concatenation"
r|s            either an r or an s
r/s            an r, but only if it is followed by an s. The '/' itself matches no characters. The text matched by s is included when determining whether this rule is the longest match, but it is returned to the input before the action is executed, so the action sees only the text matched by r. (Translator's note: this means that yytext does not contain the characters matched by s, and they are not counted in yyleng.) This type of pattern is called "trailing context". (Flex cannot match some combinations of r/s correctly; see the notes on "dangerous trailing context" in the section on flex's deficiencies and bugs.)
^r             an r, but only at the beginning of a line (that is, when scanning has just started, or right after a newline has been scanned). (Translator's note: '^' itself matches no characters; this kind of pattern is sometimes called a left-context pattern.)
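Several of the pattern forms listed above can be combined in one small scanner, as in the sketch below. The HEX definition, the printed labels, the choice of the "px" suffix, and the skip rules are all assumptions made up for this illustration.

```lex
%{
#include <stdio.h>
%}
%option noyywrap

HEX    [0-9a-fA-F]

%%

^"#"[^\n]*            printf("comment line: %s\n", yytext);   /* ^r and a negated class */
0[xX]{HEX}{1,8}       printf("hex literal: %s\n", yytext);    /* {name} and r{m,n} */
[0-9]+/"px"           printf("pixel count: %s\n", yytext);    /* r/s, trailing context */
"px"                  /* the unit itself is matched and dropped here */
[a-z]+("-"[a-z]+)?    printf("word: %s\n", yytext);           /* r+, (r), r? */
"*"|\+                printf("operator: %s\n", yytext);       /* quoting, \x, r|s */
[^a-zA-Z0-9\n]        /* skip other punctuation and blanks */
[A-Z0-9]              /* skip stray uppercase letters and digits */
\n                    /* skip newlines */

%%

int main(void)
{
    yylex();
    return 0;
}
```

With the trailing-context rule, an input such as '12px' leaves only '12' in yytext; the 'px' is returned to the input and then consumed by the following rule. A '#' is treated as the start of a comment only at the beginning of a line; elsewhere it falls through to the punctuation rule.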