Chapter 7 Lexical Conventions
This chapter describes the basic elements of a C++ program. You use these elements, called "lexical elements" or "tokens", to construct statements, definitions, declarations, and so on, which in turn make up complete programs. This chapter discusses the following lexical elements:
* Tokens
* Comments
* Identifiers
* Keywords
* Punctuators
* Operators
* Literals
This chapter also includes Table 1.1, which gives the precedence and associativity of the C++ operators (from highest to lowest precedence). See Chapter 4, "Expressions", for a complete discussion of operators.
Overview of File Translation
C++ programs, like C programs, consist of one or more files. Each file is translated in the following conceptual order (the actual order follows the "as if" rule: translation must behave as if these steps had been followed):
1. Lexical tokenizing: This translation phase performs character mapping and trigraph processing, line splicing, and tokenizing.
2. Preprocessing: This translation phase brings in the ancillary source files referenced by #include directives, handles the "stringizing" and "charizing" directives, and performs token pasting and macro expansion (for more information, see "Preprocessor Directives" in the Preprocessor Reference). The result of the preprocessing phase is a single ordered sequence of tokens that constitutes a "translation unit".
Preprocessor directives always begin with the number-sign (#) character (that is, the first nonwhite-space character on the line must be a number sign). Only one preprocessor directive can appear on a given line.
For example:
#include <iostream.h> // Include the text of iostream.h in the translation unit.
#define NDEBUG // Define NDEBUG (NDEBUG contains an empty text string).
3. Code generation: This translation phase uses the tokens produced by the preprocessing phase to generate object code. Syntactic and semantic checking of the source code is performed during this phase.
For more information, see "Phases of Translation" in the Preprocessor Reference later in this manual. The C++ preprocessor is a strict superset of the ANSI C preprocessor, but it differs in certain instances.
The following list describes several differences between the ANSI C and C++ preprocessors:
* Single-line comments are supported. See "Comments" for details.
* One predefined macro, __cplusplus, is defined only for C++. For details, see "Predefined Macros" in the Preprocessor Reference later in this volume.
* The C preprocessor does not recognize the C++ operators .*, ->*, and ::. For details on operators, see "Operators" later in this chapter and Chapter 4, "Expressions".
Tokens
A token is the smallest element of a C++ program that is meaningful to the compiler. The C++ parser recognizes these kinds of tokens: identifiers, keywords, literals, operators, punctuators, and other separators. A stream of these tokens makes up a translation unit.
Tokens are usually separated by "white space". White space can be one or more of the following:
* Blanks
* Horizontal or vertical tabs
* New lines
* Formfeeds
* Comments
Syntax
token:
    keyword
    identifier
    constant
    operator
    punctuator
preprocessing-token:
    header-name
    identifier
    pp-number
    character-constant
    string-literal
    operator
    punctuator
    each nonwhite-space character that cannot be one of the above
The parser scans the input characters from left to right and separates tokens by forming the longest token possible from the input stream. Consider this code fragment:
a = i+++j;
The programmer who wrote the code might have intended either of these two statements:
a = i + (++j)
a = (i++) + j
Because the parser creates the longest token possible from the input stream, it chooses the second interpretation, producing the tokens i++, +, and j.
Comments
A comment is text that the compiler ignores but that is useful for programmers. Comments are normally used to annotate code for future reference. The compiler treats them as white space. You can use comments during debugging to make certain lines of code inactive; however, the #if/#endif preprocessor directives work better for this purpose because they can surround code that itself contains comments, whereas comments cannot be nested.
A C++ comment is written in one of the following ways:
> The /* (slash, asterisk) characters, followed by any sequence of characters (including new lines), followed by the */ characters. This syntax is the same as ANSI C.
> The // (two slashes) characters, followed by any sequence of characters. A new line not immediately preceded by a backslash terminates this form of comment. Therefore, it is commonly called a "single-line comment".
The comment characters (/*, */, and //) have no special meaning within a character constant, string literal, or comment. Comments using the first syntax, therefore, cannot be nested. Consider this example:
/* Intent: Comment out this block of code.
Problem: Nested comments on each line of code are illegal.
FileName = String( "hello.dat" ); /* Initialize file string */
cout << "File: " << FileName << "\n"; /* Print status message */
*/
The preceding code will not compile because the compiler scans the input stream from the first /* to the first */ and considers that span a comment. In this case, the first */ occurs at the end of the "Initialize file string" comment. The last */, then, is no longer paired with an opening /*.
Note: The single-line form (//) of a comment followed by the line-continuation character (\) can have surprising effects. Consider this code:
#include <stdio.h>
void main()
{
printf( "This is a number %d", // \
5 );
}
After preprocessing, the preceding code contains errors and appears as follows:
#include <stdio.h>
void main()
{
printf( "This is a number %d",
}
Identifiers
An identifier is a sequence of characters used to denote one of the following:
* Object or variable name
* Class, structure, or union name
* Enumerated type name
* Member of a class, structure, union, or enumeration
* Function or class-member function
* typedef name
* Label name
* Macro name
* Macro parameter
Syntax
identifier:
    nondigit
    identifier nondigit
    identifier digit
nondigit: one of
    _ a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
digit: one of
    0 1 2 3 4 5 6 7 8 9
Microsoft Specific
Only the first 247 characters of Microsoft C++ identifiers are significant. This restriction is complicated by the fact that names for user-defined types are "decorated" by the compiler to preserve type information. The resulting name, including the type information, cannot be longer than 247 characters. (For more information, see "Decorated Names" in the Microsoft Visual C++ 6.0 Programmer's Guide.) Factors that can influence the length of a decorated identifier are:
* Whether the identifier denotes an object of a user-defined type or of a type derived from a user-defined type.
* Whether the identifier denotes a function or a type derived from a function.
* The number of arguments to a function.
END Microsoft Specific
The first character of an identifier must be an alphabetic character, either uppercase or lowercase, or an underscore (_). Because C++ identifiers are case sensitive, fileName is different from FileName.
An identifier cannot have exactly the same spelling and case as a keyword. Identifiers that contain keywords are legal. For example, Pint is a legal identifier, even though it contains int, which is a keyword.
Use of two sequential underscore characters (__) at the beginning of an identifier, or of a single leading underscore followed by a capital letter, is reserved for C++ implementations in all scopes. You should also avoid using one leading underscore followed by a lowercase letter for names with file scope, because such names may conflict with current or future reserved identifiers.
Keywords
Keywords are predefined reserved identifiers that have special meanings. They cannot be used as identifiers in your program. The following keywords are reserved for C++:
Syntax
keyword: one of
    asm* auto bad_cast bad_typeid bool break case catch
    char class const const_cast continue default delete do
    double dynamic_cast else enum except explicit extern false
    finally float for friend goto if inline int
    long mutable namespace new operator private protected public
    register reinterpret_cast return short signed sizeof static static_cast
    struct switch template this throw true try type_info
    typedef typeid typename union unsigned using virtual void
    volatile while
* Reserved for compatibility with other C++ implementations, but not implemented. Use __asm.
Microsoft Specific
In Microsoft C++, identifiers with two leading underscores are reserved for compiler implementations. Therefore, the Microsoft convention is to precede Microsoft-specific keywords with a double underscore. These words cannot be used as identifier names.
    allocate (3)      __inline                  property (3)
    __asm (1)         __int8                    selectany (3)
    __based (2)       __int16                   __single_inheritance
    __cdecl           __int32                   __stdcall
    __declspec        __int64                   thread (3)
    dllexport (3)     __leave                   __try
    dllimport (3)     __multiple_inheritance    uuid (3)
    __except          naked (3)                 __uuidof
    __fastcall        nothrow (3)               __virtual_inheritance
    __finally
(1) Replaces the C++ asm syntax.
(2) The __based keyword has limited uses for 32-bit target compilations.
(3) These are special identifiers when used with __declspec; their use in other contexts is not restricted.
Microsoft extensions are enabled by default. To ensure that your programs are fully portable, you can disable Microsoft extensions by specifying the ANSI-compatible /Za command-line option during compilation. When you do this, Microsoft-specific keywords are disabled.
When Microsoft extensions are enabled, you can use the previously listed keywords in your programs. For ANSI compliance, these keywords are prefaced by a double underscore. For backward compatibility, single-underscore versions of all the keywords except __except, __finally, __leave, and __try are supported. In addition, __cdecl is available with no leading underscore.
END Microsoft Specific
Punctuators
Punctuators in C++ have syntactic and semantic meaning to the compiler but do not, of themselves, specify an operation that yields a value. Some punctuators, either alone or in combination, can also be C++ operators or be significant to the preprocessor.
Syntax
punctuator: one of
    ! % ^ & * ( ) - + = { } | ~
    [ ] \ ; ' : " < > ? , . / #
The punctuators [ ], ( ), and { } must appear in pairs after translation phase 4.
Operators
Operators specify an evaluation to be performed on one of the following:
* One operand (unary operator)
* Two operands (binary operator)
* Three operands (ternary operator)
The C++ language includes all C operators and adds several new operators. Table 1.1 lists the operators available in Microsoft C++.
Operators follow a strict precedence, which determines the evaluation order of expressions that contain them. Operators associate with either the expression on their left or the expression on their right; this is called "associativity".
Operators in the same group have equal precedence and are evaluated from left to right in an expression unless parentheses explicitly force another order. Table 1.1 shows the precedence and associativity of the C++ operators (from highest to lowest precedence).
Table 1.1 C++ Operator Precedence and Associativity
Operator            Name or Meaning                              Associativity
::                  Scope resolution                             None
::                  Global                                       None
[ ]                 Array subscript                              Left to right
( )                 Function call                                Left to right
( )                 Conversion                                   None
->                  Member selection (pointer)                   Left to right
.                   Member selection (object)                    Left to right
++                  Postfix increment                            None
--                  Postfix decrement                            None
new                 Allocate object                              None
delete              Deallocate object                            None
delete[ ]           Deallocate object                            None
++                  Prefix increment                             None
--                  Prefix decrement                             None
*                   Dereference                                  None
&                   Address-of                                   None
+                   Unary plus                                   None
-                   Arithmetic negation (unary)                  None
!                   Logical NOT                                  None
~                   Bitwise complement                           None
sizeof              Size of object                               None
sizeof ( )          Size of type                                 None
typeid( )           Type name                                    None
(type)              Type cast (conversion)                       Right to left
const_cast          Type cast (conversion)                       None
dynamic_cast        Type cast (conversion)                       None
reinterpret_cast    Type cast (conversion)                       None
static_cast         Type cast (conversion)                       None
.*                  Apply pointer to class member (objects)      Left to right
->*                 Dereference pointer to class member          Left to right
*                   Multiplication                               Left to right
/                   Division                                     Left to right
%                   Remainder (modulus)                          Left to right
+                   Addition                                     Left to right
-                   Subtraction                                  Left to right
<<                  Left shift                                   Left to right
>>                  Right shift                                  Left to right
Literals
Invariant program elements are called "literals" or "constants". The terms "literal" and "constant" are used interchangeably here.
Literals fall into four major categories: integer, character, floating-point, and string literals.
Syntax
literal:
    integer-constant
    character-constant
    floating-constant
    string-literal
Integer Constants
Integer constants are constant data elements that have no fractional part or exponent. They always begin with a digit. You can specify integer constants in decimal, octal, or hexadecimal form. They can specify signed or unsigned types and long or short types.
Syntax
integer-constant:
    decimal-constant integer-suffix opt
    octal-constant integer-suffix opt
    hexadecimal-constant integer-suffix opt
    'c-char-sequence'
decimal-constant:
    nonzero-digit
    decimal-constant digit
octal-constant:
    0
    octal-constant octal-digit
hexadecimal-constant:
    0x hexadecimal-digit
    0X hexadecimal-digit
    hexadecimal-constant hexadecimal-digit
nonzero-digit: one of
    1 2 3 4 5 6 7 8 9
octal-digit: one of
    0 1 2 3 4 5 6 7
hexadecimal-digit: one of
    0 1 2 3 4 5 6 7 8 9
    a b c d e f
    A B C D E F
integer-suffix:
    unsigned-suffix long-suffix opt
    long-suffix unsigned-suffix opt
unsigned-suffix: one of
    u U
long-suffix: one of
    l L
64-bit integer-suffix:
    i64
To specify integer constants using octal or hexadecimal notation, use a prefix that denotes the base. To specify an integer constant of a given integral type, use a suffix that denotes the type.
To specify a decimal constant, begin the specification with a nonzero digit. For example:
int i = 157; // Decimal constant
int j = 0198; // Not a decimal number; erroneous octal constant
int k = 0365; // Leading zero specifies an octal constant, not a decimal one
To specify an octal constant, begin the specification with 0, followed by a sequence of digits in the range 0 through 7. The digits 8 and 9 are errors in an octal constant. For example:
int i = 0377; // Octal constant
int j = 0397; // Error: 9 is not an octal digit
To specify a hexadecimal constant, begin the specification with 0x or 0X (the case of the "x" does not matter), followed by a sequence of digits in the range 0 through 9 and a (or A) through f (or F). The hexadecimal digits a (or A) through f (or F) represent values in the range 10 through 15. For example:
int i = 0x3fff; // Hexadecimal constant
int j = 0X3FFF; // Equal to i
To specify an unsigned type, use the u or U suffix. To specify a long type, use the l or L suffix. For example:
unsigned uVal = 328u; // Unsigned value
long lVal = 0x7FFFFL; // Long value specified as a hexadecimal constant
unsigned long ulVal = 0776745ul; // Unsigned long value
Character Constants
Character constants are one or more members of the "source character set", the character set in which a program is written, surrounded by single quotation marks ('). They are used to represent characters in the "execution character set", the character set of the machine on which the program executes.
Microsoft Specific
For Microsoft C++, the source and execution character sets are both ASCII.
END Microsoft Specific
There are three kinds of character constants:
* Normal character constants
* Multicharacter constants
* Wide-character constants
Note: Use wide-character constants in place of multicharacter constants to ensure portability.
Character constants are specified as one or more characters enclosed in single quotation marks. For example:
char ch = 'x'; // Specify a normal character constant.
int mbch = 'ab'; // Specify a system-dependent multicharacter constant.
wchar_t wcch = L'ab'; // Specify a wide-character constant.
Note that mbch is of type int. If it were declared as type char, the second byte would not be retained. A multicharacter constant has four meaningful characters; specifying more than four generates an error message.
Syntax
character-constant:
    'c-char-sequence'
    L'c-char-sequence'
c-char-sequence:
    c-char
    c-char-sequence c-char
c-char:
    any member of the source character set except the single quotation mark ('), backslash (\), or newline character
    escape-sequence
escape-sequence:
    simple-escape-sequence
    octal-escape-sequence
    hexadecimal-escape-sequence
simple-escape-sequence: one of
    \' \" \? \\
    \a \b \f \n \r \t \v
octal-escape-sequence:
    \ octal-digit
    \ octal-digit octal-digit
    \ octal-digit octal-digit octal-digit
hexadecimal-escape-sequence:
    \x hexadecimal-digit
    hexadecimal-escape-sequence hexadecimal-digit
Microsoft C++ supports normal, multicharacter, and wide-character constants. Use wide-character constants to specify members of the extended execution character set (for example, to support an international application). Normal character constants have type char, multicharacter constants have type int, and wide-character constants have type wchar_t. (The type wchar_t is defined in the standard include files STDDEF.H, STDLIB.H, and STRING.H. The wide-character functions, however, are prototyped only in STDLIB.H.)
The only difference in specification between normal and wide-character constants is that wide-character constants are preceded by the letter L. For example:
char schar = 'x'; // Normal character constant
wchar_t wchar = L'\x81\x19'; // Wide-character constant
Table 1.2 shows reserved or nongraphic characters that are system dependent or not allowed within character constants. These characters should be represented with escape sequences.
Table 1.2 C++ Reserved or Nongraphic Characters
Character                 ASCII Representation    ASCII Value     Escape Sequence
Newline                   NL (LF)                 10 or 0x0a      \n
Horizontal tab            HT                      9               \t
Vertical tab              VT                      11 or 0x0b      \v
Backspace                 BS                      8               \b
Carriage return           CR                      13 or 0x0d      \r
Formfeed                  FF                      12 or 0x0c      \f
Alert                     BEL                     7               \a
Backslash                 \                       92 or 0x5c      \\
Question mark             ?                       63 or 0x3f      \?
Single quotation mark     '                       39 or 0x27      \'
Double quotation mark     "                       34 or 0x22      \"
Octal number              ooo                     -               \ooo
Hexadecimal number        hhh                     -               \xhhh
Null character            NUL                     0               \0
If the character following the backslash does not specify a legal escape sequence, the result is implementation defined. In Microsoft C++, the character following the backslash is taken literally, as though the escape were not present, and a level 1 warning ("unrecognized character escape sequence") is issued. Octal escape sequences, specified in the form \ooo, consist of a backslash and one, two, or three octal digits. Hexadecimal escape sequences, specified in the form \xhhh, consist of the characters \x followed by a sequence of hexadecimal digits. Unlike octal escape sequences, hexadecimal escape sequences have no limit on the number of digits.
Octal escape sequences are terminated by the first character that is not an octal digit, or when three digits have been seen. For example:
wchar_t och = L'\076a'; // Sequence terminates at a
char ch = '\233'; // Sequence terminates after 3 characters
Similarly, hexadecimal escape sequences terminate at the first character that is not a hexadecimal digit.
Because hexadecimal digits include the letters a through f (and A through F), make sure the escape sequence terminates at the intended digit. Because the single quotation mark (') encloses character constants, use the escape sequence \' to represent an enclosed single quotation mark. The double quotation mark (") can be represented without an escape sequence. The backslash character (\) is a line-continuation character when placed at the end of a line. If you want a backslash character to appear within a character constant, you must type two backslashes in a row (\\). (For more information about line continuation, see "Phases of Translation" in the Preprocessor Reference later in this volume.)
Floating-Point Constants
Floating-point constants specify values that must have a fractional part. These values contain a decimal point (.) and can contain an exponent.
Syntax
floating-constant:
    fractional-constant exponent-part opt floating-suffix opt
    digit-sequence exponent-part floating-suffix opt
fractional-constant:
    digit-sequence opt . digit-sequence
    digit-sequence .
exponent-part:
    e sign opt digit-sequence
    E sign opt digit-sequence
sign: one of
    + -
digit-sequence:
    digit
    digit-sequence digit
floating-suffix: one of
    f l F L
Floating-point constants have a "mantissa", which specifies the value of the number, an "exponent", which specifies the magnitude of the number, and an optional suffix that specifies the constant's type. The mantissa is specified as a sequence of digits followed by a period, followed by an optional sequence of digits representing the fractional part of the number. For example:
18.46
38.
The exponent, if present, specifies the magnitude of the number as a power of 10, as shown in the following example:
18.46e0 // 18.46
18.46e1 // 184.6
If an exponent is present, the trailing decimal point is unnecessary in whole numbers such as 18E0. Floating-point constants default to type double. By using the suffix f or l (or F or L; the suffix is not case sensitive), the constant can be specified as float or long double, respectively.
Although long double and double have the same representation, they are not the same type. For example, you can have overloaded functions such as:
void func( double );
and
void func( long double );
String Literals
A string literal consists of zero or more characters from the source character set surrounded by double quotation marks ("). A string literal represents a sequence of characters that, taken together, form a null-terminated string.
Syntax
string-literal:
    "s-char-sequence opt"
    L"s-char-sequence opt"
s-char-sequence:
    s-char
    s-char-sequence s-char
s-char:
    any member of the source character set except the double quotation mark ("), backslash (\), or newline character
    escape-sequence
C++ strings have these types:
* Array of char[n], where n is the length of the string (in characters) plus 1 for the terminating '\0' that marks the end of the string.
* Array of wchar_t, for wide-character strings.
The result of modifying a string constant is undefined. For example:
char *szStr = "1234";
szStr[2] = 'A'; // Results undefined
Microsoft Specific
In some cases, identical string literals can be "pooled" to save space in the executable file. With string-literal pooling, the compiler makes all references to a particular string literal point to the same location in memory, instead of having each reference point to a separate instance of the string literal. The /Gf compiler option enables string pooling.
END Microsoft Specific
When string literals are specified, adjacent strings are concatenated. Therefore, this declaration:
char szStr[] = "12" "34";
is identical to this declaration:
char szStr[] = "1234";
This concatenation of adjacent strings makes it easy to specify long strings across multiple lines:
cout << "Four score and seven years "
"ago, our forefathers brought forth "
"upon this continent a new nation.";
In the preceding example, the entire string "Four score and seven years ago, our forefathers brought forth upon this continent a new nation." is spliced together. This string can also be specified using line splicing as follows:
cout << "Four score and seven years \
ago, our forefathers brought forth \
upon this continent a new nation.";
After all adjacent strings in the constant have been concatenated, the NULL character, '\0', is appended to provide an end-of-string marker for C string-handling functions.
When the first string contains an escape character, string concatenation can yield surprising results. Consider the following two declarations:
char szStr1[] = "\01" "23";
char szStr2[] = "\0123";
Although it is natural to assume that szStr1 and szStr2 contain the same values, the values they actually contain are shown in Figure 1.1.
Figure 1.1 Escapes and String Concatenation
Microsoft Specific
The maximum length of a string literal is approximately 2,048 bytes. This limit applies to strings of type char[] and wchar_t[]. If a string literal consists of parts enclosed in double quotation marks, the preprocessor concatenates the parts into a single string, and for each line concatenated it adds an extra byte to the total number of bytes.
For example, suppose a string consists of 40 lines with 50 characters per line (2,000 characters) and one line with 7 characters, and each line is surrounded by double quotation marks. This adds up to 2,007 bytes plus one byte for the terminating null character, for a total of 2,008 bytes. On concatenation, an extra character is added to the total for each of the first 40 lines, which makes a total of 2,048 bytes. (The extra characters are not actually written to the string.) Note, however, that if line continuations (\) are used instead of double quotation marks, the preprocessor does not add an extra character for each line.
END Microsoft Specific
Determine the size of a string object by counting the number of characters and adding 1 for the terminating '\0', or 2 for type wchar_t.
Because the double quotation mark (") encloses strings, use the escape sequence (\") to represent an enclosed double quotation mark. The single quotation mark (') can be represented without an escape sequence. The backslash character (\) is a line-continuation character when placed at the end of a line. If you want a backslash character to appear within a string, you must type two backslashes (\\). (For more information about line continuation, see "Phases of Translation" in the Preprocessor Reference later in this volume.)
To specify a wide-character string (wchar_t[]), precede the opening double quotation mark with the character L. For example:
wchar_t wszStr[] = L"lalg";
All normal escape codes listed in "Character Constants" are valid in string constants. For example:
cout << "First line\nSecond line";
cout << "Error! Take corrective action\a";
Because a hexadecimal escape code terminates at the first character that is not a hexadecimal digit, string constants with embedded hexadecimal escape codes can produce unexpected results. The following example is intended to create a string literal containing ASCII 5, followed by the characters five:
"\x05five"
The actual result is a hexadecimal 5F, which is the ASCII code for an underscore, followed by the characters ive.
The following example produces the desired result:
"\005five" // Use an octal constant.