C language: traps and defects

xiaoxiao, 2021-03-06

Original: Andrew Koenig, AT&T Bell Laboratories, Murray Hill, New Jersey 07974
Translation: LOVER_P

0 Introduction

The C language and its typical implementations are designed to be used easily by experts. The language is terse and expressive. There are few restrictions to keep the user from blundering, and a user who has blundered is often rewarded by an effect that is not obviously related to the cause. In this article, we will look at some of these unexpected rewards. Because they are unexpected, it may be impossible to classify them completely. Nevertheless, we make a rough attempt to do so by looking at what has to happen in order to run a C program. We assume that the reader has at least a passing acquaintance with the C language.

The first section studies the problems that occur while the program is being broken into tokens. The second section looks at what happens when the program's tokens are grouped by the compiler into declarations, expressions, and statements. The third section looks at C programs made up of several parts that are compiled separately and bound together. The fourth section deals with misconceptions of meaning: things that happen while the program is actually running. The fifth section examines the relationship between our programs and the library routines they use. In the sixth section, we note that the program we write is not really the program we run; the preprocessor gets at it first. Finally, the seventh section discusses portability problems: reasons why a program that runs on one implementation may not run on another.

1 Lexical pitfalls

The first part of a compiler is usually called a lexical analyzer. It looks at the sequence of characters that make up the program and breaks them into tokens. A token is a sequence of one or more characters that has a (relatively) uniform meaning in the language being compiled. In C, for instance, the token -> has a meaning that is quite distinct from that of either of the characters that make it up, and that meaning is independent of the context in which the -> appears.
For another example, consider the statement:

    if (x > big) big = x;

Each non-blank character in this statement is a separate token, except for the keyword if and the two instances of the identifier big.

In fact, C programs are broken into tokens twice. First the preprocessor reads the program. It must tokenize the program so that it can find the identifiers, some of which may represent macros. It must then replace each macro invocation by the result of evaluating that macro. Finally, the result of the macro replacement is reassembled into a character stream that is given to the compiler proper. The compiler then breaks the stream into tokens a second time.

In this section, we explore some common misunderstandings about the meanings of tokens and the relationship between tokens and the characters that make them up. We will talk about the preprocessor later.

1.1 = is not ==

Programming languages derived from Algol, such as Pascal and Ada, use := for assignment and = for comparison. C, on the other hand, uses = for assignment and == for comparison. Assignment is more frequent than comparison, so the shorter symbol is given to it. Moreover, C treats assignment as an operator, so that multiple assignments (such as a = b = c) can be written easily and assignments can be embedded in larger expressions.

This convenience causes a potential problem: one can inadvertently write an assignment where a comparison was intended. Thus, this statement, which looks as if it checks whether x is equal to y:

    if (x = y) foo();

actually sets x to the value of y and then checks whether that value is nonzero. Or consider the following loop, which is intended to skip blanks, tabs, and newlines in a file:

    while (c == ' ' || c = '\t' || c == '\n') c = getc (f);

The programmer mistakenly used = instead of == in the comparison with '\t'.

This "comparison" actually assigns '\t' to c and then tests whether the (new) value of c is zero. Because '\t' is not zero, the "comparison" is always true, so the loop will eat the entire file. What happens after that depends on whether the particular implementation allows a program to keep reading after it has reached end of file. If it does, the loop will run forever. (Strictly speaking, = binds less tightly than ||, so a conforming compiler rejects the loop condition exactly as written; the behavior described here is what results when the assignment stands on its own, as in (c = '\t').)

Some C compilers try to help the user by giving a warning message for conditions of the form e1 = e2. To avoid warning messages from such compilers when you really do want to assign a value to a variable and then check whether it is nonzero, consider making the comparison explicit. In other words, instead of:

    if (x = y) foo();

write:

    if ((x = y) != 0) foo();

This also makes your intentions plain.

1.2 & and | are not && and ||

It is easy to write = where == was meant because so many other languages use = for comparison. It is also easy to interchange & and &&, or | and ||, especially because the & and | operators in C differ from their counterparts in some other languages. We will look at these operators more closely in section 4.

1.3 Multi-character tokens

Some C tokens, such as /, *, and =, are only one character long. Other C tokens, such as /* and ==, and identifiers, are several characters long. When the C compiler encounters a / followed by an *, it must be able to decide whether to treat the two characters as two separate tokens or as one single token. The C reference manual tells how to decide: "If the input stream has been parsed into tokens up to a given character, the next token is taken to include the longest string of characters which could possibly constitute a token." Thus, if a / is the first character of a token, and the / is immediately followed by a *, the two characters begin a comment, regardless of any other context.
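As a small illustration of the longest-token rule quoted above, here is a minimal sketch that is not taken from the original paper. The function names longest_token_demo and forced_split_demo are hypothetical, chosen only for this example. It shows how a run of three minus signs is split: the first two characters form the token --, and only then is a lone - recognized.

```c
/* Returns the value of a---b for the given inputs.
   The lexer forms the longest token first, so "---" is split into
   "--" followed by "-", and the expression parses as (a--) - b. */
int longest_token_demo(int a, int b)
{
    return a---b;
}

/* Same operands, with a space forcing the other split: a - (--b). */
int forced_split_demo(int a, int b)
{
    return a - --b;
}
```

Called with a = 5 and b = 2, the first function computes 5 - 2 and the second computes 5 - 1, so white space alone changes the result.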
The following statement looks as if it sets y to the value of x divided by the value pointed to by p:

    y = x/*p /* p points at the divisor */;

In fact, /* begins a comment, so the compiler simply gobbles up the program text until the */ appears. In other words, the statement just sets y to the value of x and never looks at p. Rewriting the statement as:

    y = x / *p /* p points at the divisor */;

or even:

    y = x/(*p) /* p points at the divisor */;

makes it do the division that the comment suggests.

This sort of near-ambiguity can cause trouble in other contexts. For example, older versions of C use =+ to mean what present versions mean by +=. Such a compiler treats

    a=-1;

as meaning the same thing as

    a =- 1;

or

    a = a - 1;

which will surprise a programmer who intended

    a = -1;

On the other hand, compilers for these older versions of C would interpret

    a=/*b;

as

    a =/ *b;

even though the /* looks like a comment.

1.4 Exceptions

Compound assignment operators such as += are really two tokens. Thus,

    a + /* strange */ = 1

means the same as

    a += 1

These operators are the only cases in which things that look like single tokens are really multiple tokens. In particular,

    p - > a

is illegal. It is not a synonym for

    p -> a

On the other hand, some older compilers still treat =+ as a single token and as a synonym for +=.

1.5 Strings and characters

Single and double quotes mean very different things in C, and there are contexts in which confusing them results in surprises rather than error messages. A character enclosed in single quotes is just another way of writing an integer. The integer is the one that corresponds to the given character in the implementation's collating sequence.

Thus, in an ASCII implementation, 'a' means exactly the same thing as 0141 or 97. A string enclosed in double quotes, on the other hand, is a shorthand way of writing a pointer to a nameless array that has been initialized with the characters between the quotes and an extra character whose binary value is zero. The following two program fragments are equivalent:

    printf ("Hello world\n");

    char hello[] = {'H', 'e', 'l', 'l', 'o', ' ',
                    'w', 'o', 'r', 'l', 'd', '\n', 0};
    printf (hello);

Using a pointer instead of an integer (or vice versa) will often cause a warning message, so using double quotes instead of single quotes (or vice versa) is usually caught. The major exception is in function calls, where many compilers do not check argument types. Thus, writing

    printf ('\n');

instead of

    printf ("\n");

will usually result in a surprise at run time.

Because an integer is usually large enough to hold several characters, some C compilers permit multiple characters in a character constant. This means that writing 'yes' instead of "yes" may well go undetected. The latter means "the address of the first of four consecutive memory locations containing y, e, s, and a null character, respectively." The former means "an integer that is composed of the values of the characters y, e, and s in some implementation-defined manner." Any similarity between the two is purely coincidental.

2 Syntactic pitfalls

To understand a C program, it is not enough to understand the tokens that make it up. One must also understand how the tokens combine to form declarations, expressions, statements, and programs. While these combinations are usually well defined, the definitions are sometimes counter-intuitive or confusing. In this section, we look at some syntactic constructions that are less than obvious.

2.1 Understanding declarations

I once talked with someone who was writing a C program that was going to run stand-alone on a small microprocessor. When the machine was switched on, the hardware would call the subroutine at location 0.
To simulate turning the power on, we had to devise a C statement that would call this subroutine explicitly. After some thought, we came up with the following:

    (*(void(*)())0)();

Expressions like this strike terror into the hearts of C programmers. They needn't, though, because they can usually be constructed quite easily with the help of a single, simple rule: declare it the way you use it.

Every C variable declaration has two parts: a type and a list of stylized expressions that are expected to evaluate to that type. The simplest such expression is a variable:

    float f, g;

indicates that the expressions f and g, when evaluated, will be of type float. Because the thing declared is an expression, parentheses may be used freely:

    float ((f));

means that ((f)) evaluates to a float and therefore, by inference, that f is also a float.

Similar logic applies to function and pointer types. For example,

    float ff();

means that the expression ff() is a float, and therefore that ff is a function that returns a float. Analogously,

    float *pf;

means that *pf is a float and therefore that pf is a pointer to a float.

These forms combine in declarations the same way they do in expressions. Thus

    float *g(), (*h)();

says that *g() and (*h)() are float expressions. Since () binds more tightly than *, *g() means the same thing as *(g()): g is a function that returns a pointer to a float, and h is a pointer to a function that returns a float.
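To make the "declare it the way you use it" rule concrete, here is a minimal sketch, not taken from the original paper. It keeps the names g and h from the text but uses modern prototypes with (void) instead of the old-style empty parentheses; the helpers stored, two, use_g, and use_h are hypothetical names introduced only for this illustration.

```c
/* Storage for g to point at. */
static float stored = 1.5f;

/* g is a function returning a pointer to float. */
float *g(void) { return &stored; }

/* An ordinary function returning float, for h to point at. */
float two(void) { return 2.0f; }

/* h is a pointer to a function returning float. */
float (*h)(void) = two;

/* *g() parses as *(g()): call g, then dereference the result. */
float use_g(void) { return *g(); }

/* (*h)() needs the parentheses: dereference h, then call the function. */
float use_h(void) { return (*h)(); }
```

In each case the expression used to reach the float has the same shape as the declaration of the name, which is the whole point of the rule.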

Once we know how to declare a variable of a given type, it is easy to write a cast for that type: just remove the variable name and the semicolon from the declaration and enclose the whole thing in parentheses. Thus, since

    float *g();

declares g to be a function returning a pointer to a float, (float *()) is a cast to that type.

Armed with this knowledge, we are now prepared to tackle (*(void(*)())0)(). We can analyze this statement in two parts. First, suppose that we have a variable fp that contains a function pointer, and we want to call the function to which fp points. That is done this way:

    (*fp)();

If fp is a pointer to a function, *fp is the function itself, so (*fp)() is the way to invoke it. The parentheses in (*fp) are essential, because the expression would otherwise be interpreted as *(fp()). We have now reduced the problem to that of finding an appropriate expression to replace fp.

This problem is the second part of our analysis. If C could read our minds about types, we could write:

    (*0)();

This doesn't work, because the * operator insists on having a pointer as its operand. Furthermore, the operand must be a pointer to a function, so that the result of * can be called. Thus, we need to cast 0 into a type loosely described as "pointer to function returning void."

If fp is a pointer to a function returning void, then (*fp)() is a void value, and its declaration looks like this:

    void (*fp)();

Thus, we could write:

    void (*fp)();
    (*fp)();

at the cost of declaring a dummy variable. But once we know how to declare the variable, we know how to cast a constant to that type: just drop the name from the variable declaration. Thus, we cast 0 to a "pointer to function returning void" by writing:

    (void(*)())0

and we can now replace fp by (void(*)())0:

    (*(void(*)())0)();

The semicolon at the end turns the expression into a statement. We solved this problem without using a typedef declaration.
Using typedef, we could have solved the problem more clearly:

    typedef void (*funcptr)();
    (*(funcptr)0)();

2.2 Operators don't always have the precedence you want

Suppose that the manifest constant FLAG is an integer with exactly one bit turned on in its binary representation (in other words, a power of two), and you want to test whether the integer variable flags has that bit turned on. The usual way to write this is:

    if (flags & FLAG) ...

The meaning of this is plain to most C programmers: an if statement tests whether the expression in the parentheses evaluates to 0 or not. It might be nice to make the test more explicit for documentation purposes:

    if (flags & FLAG != 0) ...

The statement is now easier to understand. It is also wrong, because != binds more tightly than &, so the interpretation is now:

    if (flags & (FLAG != 0)) ...

This will work (by coincidence) if FLAG is 1 or 0 (!), but not for any other power of two (footnote 2).

Suppose you have two integer variables, h and l, whose values are between 0 and 15 inclusive, and you want to set r to an 8-bit value whose low-order bits are those of l and whose high-order bits are those of h.

The natural way to do this is to write:

    r = h << 4 + l;

Unfortunately, this is wrong. Addition binds more tightly than shifting, so this example is equivalent to:

    r = h << (4 + l);

Here are two ways to get it right:

    r = (h << 4) + l;
    r = h << 4 | l;

One way to avoid these problems is to parenthesize everything, but expressions with too many parentheses are hard to understand, so it is probably useful to try to remember the precedence levels in C. Unfortunately, there are fifteen of them, so this is not always easy to do. It can be made easier, though, by classifying them into groups.

The operators that bind the most tightly are the ones that aren't really operators: subscripting, function calls, and structure selection. These all associate to the left.

Next come the unary operators. These have the highest precedence of any of the true operators. Because function calls bind more tightly than unary operators, you must write (*p)() to call a function pointed to by p; *p() implies that p is a function that returns a pointer. Casts are unary operators and have the same precedence as any other unary operator. Unary operators are right-associative, so *p++ is interpreted as *(p++) and not as (*p)++.

Next come the true binary operators. The arithmetic operators have the highest precedence, then the shift operators, the relational operators, the logical operators, the conditional operator, and finally the assignment operators. The two most important things to keep in mind are:

1. Every logical operator has lower precedence than every relational operator.
2. The shift operators bind more tightly than the relational operators, but less tightly than the arithmetic operators.

Within the various operator classes, there are few surprises. Multiplication, division, and remainder have the same precedence, addition and subtraction have the same precedence, and the two shift operators have the same precedence. One small surprise is that the six relational operators do not all have the same precedence: == and != bind less tightly than the other relational operators. This allows us, for instance, to see whether a and b are in the same relative order as c and d with the expression a < b == c < d

The example above is not hard to write correctly:

    while ((c = getc (in)) != EOF) putc (c, out);

However, errors of this sort can be hard to spot in more complicated expressions. For example, the lint program distributed with the UNIX system used to contain the following erroneous line:

    if( ( t=BTYPE(pt1->aty)==STRTY ) || t==UNIONTY ){

This was intended to assign a value to t and then see whether t is equal to STRTY or UNIONTY. The actual effect is quite different.

The precedence of the C logical operators comes about for historical reasons. B, the predecessor of C, had logical operators that corresponded roughly to C's & and | operators. Although they were defined to act on bits, the compiler would treat them as && and || when they appeared in a conditional context. When the two usages were split apart in C, it was deemed too dangerous to change the precedence much (footnote 4).

2.3 Watch those semicolons!

An extra semicolon in a C program usually makes little difference: either it is a null statement, which has no effect, or it elicits a diagnostic message from the compiler, which makes it easy to remove. One important exception is after an if or while clause, which must be followed by exactly one statement. Consider this example:

    if (x[i] > big);
        big = x[i];

The semicolon on the first line will not upset the compiler, but this program fragment means something quite different from:

    if (x[i] > big)
        big = x[i];

The first one is equivalent to:

    if (x[i] > big) { }
    big = x[i];

which is, of course, equivalent to:

    big = x[i];

(unless x, i, or big is a macro with side effects).

Another place where a semicolon can make a big difference is at the end of a declaration that comes just before a function definition. Consider the following fragment:

    struct foo {
        int x;
    }

    f()
    {
        ...
    }

A semicolon is missing between the first } and the f that immediately follows it. The effect is to declare that the function f returns a struct foo, which is defined as part of this declaration. If the semicolon were present, f would be defined by default as returning an integer (footnote 5).
2.4 The switch statement

C is unusual in that the cases in its switch statement can flow into one another. Consider, for example, the following program fragments in C and Pascal:

    switch (color) {
    case 1: printf ("red");
            break;
    case 2: printf ("yellow");
            break;
    case 3: printf ("blue");
            break;
    }

    case color of
    1: write ('red');
    2: write ('yellow');
    3: write ('blue')
    end

Both program fragments do the same thing: print red, yellow, or blue (without starting a new line), depending on whether the variable color is 1, 2, or 3. The fragments are exactly analogous, with one exception: the Pascal program has nothing that corresponds to the C break statement. The reason is that case labels in C behave as true labels: control can flow unimpeded right through a case label.

Looking at it another way, suppose the C fragment looked more like the Pascal fragment:

    switch (color) {
    case 1: printf ("red");
    case 2: printf ("yellow");
    case 3: printf ("blue");
    }

and suppose further that color is equal to 2. Then the program prints yellowblue, because control passes naturally from the second printf call to the statement after it.

This is both a strength and a weakness of C switch statements. It is a weakness because leaving out a break statement is easy to do and often gives rise to obscure program misbehavior. It is a strength because, by leaving out a break statement deliberately, one can readily express a control structure that is inconvenient to implement otherwise. Specifically, in large switch statements, one often finds that the processing for one of the cases reduces to some other case after a relatively small amount of special handling.

For example, consider a program that is an interpreter for some kind of imaginary machine. Such a program might contain a switch statement to handle each of the various operation codes. On such a machine, it is often true that a subtract operation is identical to an add operation after the sign of the second operand has been inverted. Thus, it is nice to be able to write something like this:

    case SUBTRACT:
        opnd2 = -opnd2;
        /* no break */
    case ADD:
        ...

As another example, consider the part of a compiler that skips white space while looking for a token. Here, one would want to treat spaces, tabs, and newlines identically, except that a newline should cause a line counter to be incremented:

    case '\n':
        linecount++;
        /* no break */
    case '\t':
    case ' ':
        ...

2.5 Calling functions

Unlike some other programming languages, C requires a function call to have an argument list, even if there are no arguments. Thus, if f is a function,

    f();

is a statement that calls the function, but

    f;

does nothing at all. More precisely, it evaluates the address of the function but does not call it (footnote 6).

2.6 The dangling else problem

We would be remiss in leaving any discussion of syntactic pitfalls without mentioning this one. Although it is not unique to C, it has bitten C programmers with many years of experience.
Consider the following program fragment:

    if (x == 0)
        if (y == 0) error();
    else {
        z = x + y;
        f (&z);
    }

The programmer's intention for this fragment is that there should be two main cases: x == 0 and x != 0. In the first case, the fragment should do nothing at all unless y == 0, in which case it should call error. In the second case, the program should set z = x + y and then call f with the address of z as its argument.

However, the program fragment actually does something quite different. The reason is the rule that an else is always associated with the closest unmatched if. If we were to write this fragment the way it is actually executed, it would look like this:

    if (x == 0) {
        if (y == 0)
            error();
        else {
            z = x + y;
            f (&z);
        }
    }

In other words, nothing at all happens if x != 0.
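The binding just described can be observed directly. The following is a minimal sketch, not taken from the original paper; the function name dangling and the numeric branch codes are hypothetical, introduced only so that the branch actually taken can be inspected. Many compilers warn about this construct, which is the behavior one wants in real code.

```c
/* Reports which branch ran for the unbraced form of the fragment:
   0 = nothing happened, 1 = the "error" branch, 2 = the else branch. */
int dangling(int x, int y)
{
    int which = 0;

    if (x == 0)
        if (y == 0) which = 1;
    else which = 2;    /* binds to "if (y == 0)", despite the indentation */

    return which;
}
```

With x equal to 0 and y nonzero, the else branch runs; with x nonzero, neither branch runs, whatever the layout suggests.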

To get the effect intended in the first example, write:

    if (x == 0) {
        if (y == 0)
            error();
    } else {
        z = x + y;
        f (&z);
    }

3 Linkage

A C program may consist of several parts that are compiled separately and then bound together by a program usually called a linker, linkage editor, or loader. Because the compiler normally sees only one file at a time, it cannot detect errors whose recognition would require knowledge of several source files at once. In this section, we look at some errors of that type. Some C implementations, but not all, have a program called lint that catches many of these errors. It is impossible to overemphasize the importance of using such a program if it is available.

3.1 You must check external types yourself

Suppose you have a C program divided into two files. One file contains the declaration:

    int n;

and the other contains the declaration:

    long n;

This is not a valid C program, because the same external name is declared with two different types in the two files. However, many implementations will not detect this error, because the compiler does not know about the contents of either file while it is compiling the other. Thus, the job of checking type consistency can only be done by the linker (or by some utility program such as lint); if the operating system has a linker that does not know about data types, there is little the C compiler can do to force it.

What actually happens when this program is run? There are many possibilities. The implementation may be clever enough to detect the type clash, in which case one would expect a diagnostic message explaining that the type of n was given differently in two files. Or you may be using an implementation in which int and long are really the same type. This is typically true of machines on which 32-bit arithmetic comes most naturally. In this case, your program will probably work as if n had been declared long (or int) in both files, but it works only by coincidence. Or the two instances of n may require different amounts of storage, but happen to share storage in such a way that the values assigned to one are valid for the other.
This might happen, for example, if the int were arranged to share storage with the low-order part of the long. Whether this happens is obviously machine- and system-dependent, and the program again works only by coincidence. Finally, the two instances of n may share storage in such a way that assigning a value to one has the effect of apparently assigning a different value to the other. In this case, the program will probably fail.

Another example of this sort of thing happens surprisingly often. One file of a program contains a declaration like:

    char filename[] = "/etc/passwd";

and another file contains this declaration:

    char *filename;

Although arrays and pointers behave very similarly in some contexts, they are not the same. In the first declaration, filename is the name of an array of characters. Although using the name generates a pointer to the first element of that array, that pointer is generated as needed and is not actually kept around. In the second declaration, filename is the name of a pointer. That pointer points wherever the programmer makes it point. If the programmer doesn't give it a value, it will have a zero (null) value by default [Translator's note: this default applies to external and static pointers; an uninitialized automatic (local) pointer holds an indeterminate value, which is very dangerous!]. The two declarations of filename use storage in different ways; they cannot coexist.

One way to avoid type clashes of this sort is to use a tool like lint if it is available. To check for type clashes between separately compiled parts of a program, some program must be able to see all the parts at once. The typical compiler does not do this, but lint does.

Another way to avoid these problems is to put external declarations into include files.

That way, the type of an external object appears only once (footnote 7).

4 Semantic pitfalls

A sentence can be perfectly spelled and written with impeccable grammar and still be meaningless. In this section, we look at ways of writing programs that look as if they mean one thing but actually mean something quite different. We also discuss contexts in which things that look reasonable on the surface actually give undefined results. We limit ourselves here to things that are not guaranteed to work on any C implementation. Things that might work on some implementations but not on others are left until section 7, which looks at portability problems.

4.1 Expression evaluation sequence

Some C operators always evaluate their operands in a known, specified order. Others do not. Consider, for instance, the following expression: a

Consider the following program fragment, which looks for a particular element in a table: i = 0; while (i

In practice, C implementations typically supply a file, brought in with an include statement, that contains declarations for library functions such as sqrt. Writing declarations is still necessary for programmers who write their own functions, which is to say for everyone who writes non-trivial C programs.

Here is a more spectacular example:

    main()
    {
        int i;
        char c;
        for (i = 0; i < 5; i++) {
            scanf ("%d", &c);
            printf ("%d ", i);
        }
        printf ("\n");
    }

Ostensibly, this program reads five numbers from its standard input and writes 0 1 2 3 4 on its standard output. In fact, it doesn't always do that. On one compiler, for example, its output is 0 0 0 0 1 2 3 4.

Why? The key is the declaration of c as a char rather than as an int. When you ask scanf to read an integer, it expects a pointer to an integer. What it gets in this case is a pointer to a character. scanf has no way to tell that it didn't get what it expected: it treats its argument as an integer pointer and stores an integer there. Since an integer takes up more memory than a character, this overwrites some of the memory near c.

Exactly what is near c is the compiler's business; in this case it turned out to be the low-order part of i. Therefore, each time a value was read into c, i was reset to zero. When the program finally reached end of file, scanf stopped trying to put new values into c, so i could be incremented normally until the loop ended.

4.5 A pointer is not an array

C programs often adopt the convention that a character string is stored as an array of characters followed by a null character. Suppose we have two such strings, s and t, and we want to concatenate them into a single string r. To do this, we have the usual library functions strcpy and strcat. The following obvious method doesn't work:

    char *r;
    strcpy (r, s);
    strcat (r, t);

The reason it doesn't work is that r is not initialized to point anywhere. Although r is potentially capable of identifying an area of memory, that area doesn't exist until you allocate it.
Let's try again, allocating some memory for r:

    char r[100];
    strcpy (r, s);
    strcat (r, t);

This works as long as the strings pointed to by s and t aren't too big. Unfortunately, C requires us to state the size of an array as a constant, so there is no way to be certain that r will be big enough. However, most C implementations have a library function called malloc that takes a number and allocates enough memory for that many characters. There is also usually a function called strlen that tells how many characters are in a string. It might seem, therefore, that we could write:

    char *r, *malloc();
    r = malloc (strlen(s) + strlen(t));
    strcpy (r, s);
    strcat (r, t);

This example, however, fails for two reasons. First, malloc might run out of memory, an event that it generally signals by quietly returning a null pointer.

Second, and much more important, the call to malloc doesn't allocate quite enough memory. Recall the convention that a string is terminated by a null character. The strlen function returns the number of characters in its argument string, excluding the null character at the end. Therefore, if strlen(s) is n, s really requires n+1 characters to contain it. We must therefore allocate one extra character for r.
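The off-by-one described above is easy to check in isolation. The following minimal sketch is not part of the original paper, and the helper names bytes_needed and concat_bytes_needed are hypothetical; it only computes sizes, using the standard strlen from <string.h>.

```c
#include <string.h>

/* Number of bytes needed to store s, including the terminating '\0'.
   strlen counts only the characters before the terminator. */
size_t bytes_needed(const char *s)
{
    return strlen(s) + 1;
}

/* Bytes needed to hold the concatenation of s and t: both lengths
   plus ONE terminator, because the result is a single string. */
size_t concat_bytes_needed(const char *s, const char *t)
{
    return strlen(s) + strlen(t) + 1;
}
```

For the string "xyz", strlen reports 3 while four bytes are occupied, which is exactly the extra character the text says must be allocated.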

Adding this, and checking that malloc succeeded, we get:

    char *r, *malloc();
    r = malloc (strlen(s) + strlen(t) + 1);
    if (!r) {
        complain();
        exit (1);
    }
    strcpy (r, s);
    strcat (r, t);

4.6 Synecdoche

A synecdoche (sin-ECK-duh-key) is a literary device, somewhat like a simile or a metaphor, in which, according to the Oxford English Dictionary, "a more comprehensive term is used for a less comprehensive or vice versa; as whole for part or part for whole, genus for species or species for genus, etc."

This exactly describes the common C pitfall of confusing a pointer with the data to which it points. It is most common with character strings. For instance:

    char *p, *q;
    p = "xyz";

It is important to understand that, while it is sometimes useful to think of the value of p as the string xyz after the assignment, this is not really true. Instead, the value of p is a pointer to the 0th element of an array of four characters, whose values are 'x', 'y', 'z', and '\0'. Thus, if we now execute:

    q = p;

p and q are two pointers to the same part of memory. The characters in that memory were not copied by the assignment. After the assignment, p and q both point to the first element of the same four-character array.
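A minimal sketch of this sharing, not taken from the original paper, is given here. The function name aliasing_demo and the array buf are hypothetical. The sketch uses a writable array rather than a string literal, on the assumption of a modern compiler, where modifying a string literal is undefined behavior.

```c
/* Copying a pointer copies the address, not the characters:
   after q = p, a write through q is visible through p. */
char aliasing_demo(void)
{
    char buf[] = "xyz";   /* writable array holding 'x','y','z','\0' */
    char *p = buf;
    char *q = p;          /* q and p now address the same memory */

    q[1] = 'Y';           /* change the middle character through q */
    return p[1];          /* read it back through p */
}
```

The value read through p is the one stored through q, because only one array exists.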

The thing to remember is that copying a pointer does not copy the thing it points to. Thus, if we were now to execute:

    q[1] = 'Y';

q would point to memory containing the string xYz. So would p, because p and q point to the same memory.

4.7 The null pointer is not the null string

The result of converting an integer to a pointer is implementation-dependent, with one important exception. That exception is the constant 0, which is guaranteed to be converted to a pointer that is unequal to any valid pointer. For documentation, this value is often given symbolically:

    #define NULL 0

but the effect is the same. The important thing to remember about 0 when it is used as a pointer is that it must never be dereferenced. In other words, once you have assigned 0 to a pointer variable, you must not ask what is in the memory it points to. It is valid to write:

    if (p == (char *) 0) ...

but it is not valid to write:

    if (strcmp (p, (char *) 0) == 0) ...

because strcmp always looks at the memory addressed by its arguments. If p is a null pointer, it is not even valid to write:

    printf (p);

or

    printf ("%s", p);

4.8 Integer overflow

The C language definition is very specific about what happens when an integer operation overflows or underflows. If either operand is unsigned, the result is unsigned and is defined to be modulo 2^n, where n is the word size. If both operands are signed, the result is undefined.

Suppose, for example, that a and b are two integer variables, known to be non-negative, and you want to test whether a+b might overflow.

An obvious way is this: if (A B <0) complain (); usually, this is not working. Once A B has spill, it is meaningless to any beta of the result. For example, on some machines, an additional operation sets an internal register to four states: positive, negative, zero or overflow. On such a machine, the compiler has the right to implement the above example to first add A and B, and then check if the internal register status is negative. If the operation overflows, the internal register will be in an overflow state, which will fail. A correct way to succeed this special test is to depends on a good definition of unsigned arithmetic, both to be converted between symbols and unsigned: if ((int) (unsigned) B <0) Complain (); 4.9 Two reasons for shift operators will make troubles using shift operators: In the right shift operation, the empty bit is filled with 0 fill or use symbolic pads? What are the number of shifts? The answer to the first question is simple, but sometimes it is related. If the operand to be shifted is unsigned, it will be moved into 0. If the operand is a symbol, the implementation has the right to decide whether to move into 0 or move into the symbol bit. If you care about the space in a right shift, use unsigned to declare the variable. This way you have the right to assume that the vacancy is set to 0. The answer to the second question is equally simple: if the length of the length to be shifted is N, the number of shifts must be greater than or equal to 0 and is strictly less than n. Therefore, it is impossible to remove all bits from the variable in a separate operation. For example, if an int is 32 bits, and N is an int, write n << 31 and n << 0 is legal, but n << 32 and n << -1 are illegal. Note that even if the symbol is moved into the vacancy, the right shift operation of a symbolic integer and the power of 2, is not equivalent. To prove this, it is impossible to consider (-1) >> 1, this is not 0. [Translation: (- 1) / 2 results are 0. 
]

5 Library functions

Every useful C program must use library functions, because there is no way of doing input or output built into the language. In this section we look at some cases where widely used library functions behave in ways the programmer might not expect.

5.1 getc() returns an integer

Consider the following program:

    #include <stdio.h>

    main()
    {
        char c;

        while ((c = getchar()) != EOF)
            putchar(c);
    }

This program looks as if it copies its standard input to its standard output. In fact, it does not quite do so. The reason is that c is declared as a character rather than as an integer. This means that c cannot hold every possible character as well as EOF. Thus there are two possibilities. Either some legitimate input character will cause c to take on the same value as EOF, or it will be impossible for c to hold the value EOF at all. In the former case, the program will stop copying in the middle of certain files. In the latter case, the program will go into an infinite loop. Actually, there is a third case: the program may work by coincidence.

The C Language Reference Manual defines the result of the expression ((c = getchar()) != EOF) quite rigorously. Section 6.1 states:

    When a longer integer is converted to a shorter or to a char, it is truncated on the left; excess bits are simply discarded.

Section 7.14 states:

    There are a number of assignment operators, all of which group right-to-left. All require an lvalue as their left operand, and the type of an assignment expression is that of its left operand.

    The value is the value stored in the left operand after the assignment has taken place.

The combined effect of these two passages is that the value of getchar() must be truncated by discarding its high-order bits, and the truncated value then compared with EOF. As part of this comparison, c must be widened to an integer, either by padding on the left with zero bits or by sign extension, as appropriate. However, some compilers do not implement this expression correctly. They do assign the low-order bits of the value of getchar() to c, but instead of then comparing c with EOF, they compare the entire value of getchar()! A compiler that does this makes the sample program appear to work "correctly."

5.2 Buffered output and memory allocation

When a program produces output, how important is it that a person be able to see that output immediately? It depends on the program. For example, if the output is going to a terminal and asks the person sitting at that terminal to answer a question, it is crucial that the person see the output in order to know what to type. On the other hand, if the output is going to a file from which it will eventually be sent to a line printer, it matters only that all the output eventually gets there. It is often more expensive to arrange for output to appear immediately than to save it up for a while and write it later in a large chunk. For this reason, C implementations typically let the programmer control how much output is produced before it is actually written. That control is often vested in a library function called setbuf(). If buf is a character array of appropriate size, then:

    setbuf(stdout, buf);

tells the I/O library that all output written to stdout should henceforth use buf as an output buffer, and that output should not actually be written until buf becomes full or the programmer calls fflush(). The appropriate size for such a buffer is defined as BUFSIZ in <stdio.h>. Thus, the following program illustrates the use of setbuf() by copying its standard input to its standard output:

    #include <stdio.h>

    main()
    {
        int c;
        char buf[BUFSIZ];

        setbuf(stdout, buf);

        while ((c = getchar()) != EOF)
            putchar(c);
    }

Unfortunately, this program is wrong, for a subtle reason.
To see where the trouble lies, ask when the buffer is flushed for the last time. Answer: after the main program has finished, as part of the cleanup that the library does before handing control back to the operating system. But by that time, the buffer has already been freed! There are two ways to prevent this sort of trouble. First, make the buffer static, either by declaring it explicitly as static:

    static char buf[BUFSIZ];

or by moving the declaration outside the main program entirely. Another possibility is to allocate the buffer dynamically and never free it:

    char *malloc();
    setbuf(stdout, malloc(BUFSIZ));

Note that in the latter case it is unnecessary to check whether malloc() succeeded, because if it fails it returns a null pointer. A null pointer is an acceptable second argument to setbuf(); it requests that stdout be unbuffered. This will work slowly, but it will work.

6 The preprocessor

The programs we run are not the programs we write: they are first transformed by the C preprocessor. The preprocessor gives us a way of abbreviating things that is important for two major reasons (and several minor ones). First, we may want to be able to change all instances of a particular quantity, such as the size of a table, by changing one number and recompiling the program [9]. Second, we may want to define things that appear to be functions but do not have the execution overhead normally associated with a function call. For example, putchar() and getchar() are usually implemented as macros to avoid having to call a function for each character of input or output.

6.1 Macros are not functions

Because macros can be made to look so much like functions, programmers are sometimes tempted to regard them as equivalent. Thus, one sees definitions like this:

    #define max(a, b) ((a) > (b) ? (a) : (b))

Notice all the parentheses in the macro body. They guard against the possibility that a or b might be expressions containing operators of lower precedence than >. The main problem with defining things like max() as macros is that an operand that is used twice may be evaluated twice. Thus, in this example, if a is greater than b, a will be evaluated twice: once during the comparison, and again to calculate the value that max() yields. Not only can this be inefficient, it can also be wrong:

    biggest = x[0];
    i = 1;
    while (i < n)
        biggest = max(biggest, x[i++]);

Suppose, for example, that x[0] is 2, x[1] is 3, and x[2] is 1. During the first iteration, the assignment expands to:

    biggest = ((biggest) > (x[i++]) ? (biggest) : (x[i++]));

First, biggest is compared with x[i++]. Since i is 1 and x[1] is 3, the relation is false. As a side effect, i is incremented to 2. Because the relation is false, the value of x[i++] is now assigned to biggest. However, i is now 2, so the value assigned to biggest is that of x[2], which is 1. One way around these worries is to ensure that the arguments to the max() macro have no side effects:

    biggest = x[0];
    for (i = 1; i < n; i++)
        biggest = max(biggest, x[i]);

Here is another example of the hazards of mixing side effects and macros. Consider this definition of the putc() macro:

    #define putc(x, p) (--(p)->_cnt >= 0 ? (*(p)->_ptr++ = (x)) : _flsbuf(x, p))

The first argument to putc() is a character to be written to a file; the second is a pointer to an internal data structure that describes the file. Notice that the first argument, which could easily be something like *z++, is evaluated only once even though it appears in two places in the macro body, whereas the second argument is evaluated twice. (The first argument x appears twice in the macro body, but since its two occurrences are on opposite sides of a :, exactly one of them is evaluated in any single instance of putc().) Since it is unusual for the file argument to putc() to have side effects, this rarely causes trouble. Nevertheless, it is documented in the user's manual: "Because it is implemented as a macro, putc treats a stream argument with side effects improperly. In particular, putc(c, *f++) does not work sensibly."
Notice that putc(*c++, f) works fine in this implementation. Some C implementations are less careful. For instance, not every implementation handles putc(*c++, f) correctly. As another example, consider the toupper() function that appears in many C libraries. It translates a lowercase letter to the corresponding uppercase letter while leaving other characters unchanged.

If we assume that all the lowercase letters and all the uppercase letters are contiguous (possibly with a gap between the cases), we get the following function:

    toupper(c)
    {
        if (c >= 'a' && c <= 'z')
            c += 'A' - 'a';
        return c;
    }

In most C implementations, the subroutine call overhead is much longer than the actual calculation, so the implementer is tempted to make it a macro:

    #define toupper(c) ((c) >= 'a' && (c) <= 'z' ? (c) + ('A' - 'a') : (c))

This is indeed faster than the function in many cases. However, it will cause a surprise for anyone who tries to use toupper(*p++). Another danger of using macros is that they may generate very large expressions indeed. For example, look again at the definition of max():

    #define max(a, b) ((a) > (b) ? (a) : (b))

Suppose we want to use this definition to find the largest of a, b, c, and d. If we write the obvious:

    max(a, max(b, max(c, d)))

this expands to:

    ((a)>(((b)>(((c)>(d)?(c):(d)))?(b):(((c)>(d)?(c):(d)))))?(a):(((b)>(((c)>(d)?(c):(d)))?(b):(((c)>(d)?(c):(d))))))

which is surprisingly large. We can make it a little less large by balancing the operands:

    max(max(a, b), max(c, d))

which gives:

    ((((a)>(b)?(a):(b)))>(((c)>(d)?(c):(d)))?(((a)>(b)?(a):(b))):(((c)>(d)?(c):(d))))

Somehow, though, it seems easier to write:

    biggest = a;
    if (biggest

In contrast, c and d are both declared as pointers to structures, because T2 behaves like a true type.

7 Portability pitfalls

C has been implemented by many people to run on many machines. Indeed, one of the reasons to write programs in C in the first place is that it is easy to move them from one programming environment to another. However, because there are so many implementers, they do not all talk to each other. Moreover, different systems have different requirements, so it is reasonable to expect C implementations to differ slightly from one machine to another. Because so many of the early C implementations were associated with the UNIX operating system, the nature of many library functions was shaped by that system. When people started implementing C under other systems, they tried to make the library behave in ways that would be familiar to programmers used to the UNIX system. They did not always succeed. What is more, as more people started working from different versions of the UNIX system, the exact nature of some of the library functions inevitably diverged. Today, a C programmer who wishes to write programs useful in someone else's environment must know about many of these subtle differences.

7.1 What's in a name?

Some C compilers treat all the characters of an identifier as being significant. Others ignore characters past some limit when storing identifiers. C compilers usually produce object programs that must then be processed by loaders in order to be able to access library subroutines. Loaders, in turn, often impose their own restrictions on the kinds of names they can handle. One common loader restriction is that letters in external names must be in upper case only. When faced with such a restriction, C implementers must force all letters in external names to upper case. Restrictions of this sort are sanctioned by section 2.1 of the C Language Reference Manual:

    An identifier is a sequence of letters and digits; the first character must be a letter. The underscore _ counts as a letter. Upper and lower case letters are different. No more than the first eight characters are significant, although more may be used.
    External identifiers, which are used by various assemblers and loaders, are more restricted:

Here the reference manual goes on to give examples of implementations that restrict external identifiers to a single case, or to fewer than eight characters, or both. Because of all this, it is important to be careful when choosing identifiers in programs intended to be portable. Having two subroutines named, say, print_fields and print_float would not be a very good idea. As a striking example, consider the following function:

    char *
    Malloc(unsigned n)
    {
        char *p, *malloc();

        p = malloc(n);
        if (p == NULL)
            panic("out of memory");
        return p;
    }

This function is a simple way of ensuring that running out of memory will not go undetected. The idea is for the program to allocate memory by calling Malloc() instead of malloc(). If malloc() ever fails, the result will be to call panic(), which will presumably display an appropriate error message and terminate the program. Consider, however, what happens when this function is used on a system that ignores case distinctions in external identifiers. In effect, the names malloc and Malloc become equivalent. In other words, the library function malloc() is effectively replaced by the Malloc() function above, which, when it calls malloc(), is really calling itself. The result, of course, is that the first attempt to allocate memory ends in a recursion loop and consequent mayhem, even though the function works on an implementation that preserves case distinctions.

7.2 How big is an integer?

C provides the programmer with three sizes of integers: ordinary, short, and long, and with characters, which behave as if they were small integers. The language definition does not guarantee much about the relative sizes of the various kinds of integer:

1. The four sizes of integers are non-decreasing.
2. An ordinary integer is large enough to contain any array subscript.
3. The size of a character is natural for the particular hardware.

Most modern machines have 8-bit characters, though a few have 7-bit or 9-bit characters, so characters are usually 7, 8, or 9 bits.
Long integers are usually at least 32 bits long, so that a long integer can be used to represent the size of a file. Ordinary integers are usually at least 16 bits long, because shorter integers would impose too much of a restriction on the maximum size of an array.

Short integers are almost always exactly 16 bits long. What does this all mean in practice? The most important thing is that one cannot count on having any particular precision available. Informally, one can probably expect 16 bits for a short or an ordinary integer and 32 bits for a long integer, but not even those sizes are guaranteed. One can certainly use ordinary integers to express table sizes and subscripts, but what about a variable that must be able to hold values up to ten million? The most portable way to do that is probably to define a "new" type:

    typedef long tenmil;

Now one can use this type to declare a variable of that width and know that, at worst, one will have to change a single type definition to get all those variables to be the right type.

7.3 Are characters signed or unsigned?

Most modern computers support 8-bit characters, so most modern C compilers implement characters as 8-bit integers. However, not all compilers interpret those 8-bit quantities the same way. The issue becomes important only when converting a char quantity to a larger integer. Going the other way, the results are well defined: excess bits are simply discarded. But a compiler converting a char to an int has a choice: should it treat the char as a signed or an unsigned quantity? If the former, it should expand the char to an int by replicating the sign bit; if the latter, it should fill the extra bit positions with zeros. The results of this decision are important to virtually anyone who deals with characters with their high-order bits turned on. It determines whether 8-bit characters are considered to range from -128 through 127 or from 0 through 255. This, in turn, affects the way the programmer will design things like hash tables and translate tables. If you care whether a character value with the high-order bit on is treated as a negative number, you should declare it explicitly as unsigned char. Such values are guaranteed to be zero-extended when converted to integer, whereas ordinary char variables may be signed in one implementation and unsigned in another.
Incidentally, it is a common misconception that if c is a character variable, one can obtain the unsigned integer equivalent of c by writing (unsigned) c. This fails because a char quantity is converted to int before any operator is applied to it, even a cast. Thus c is converted first to a signed integer and then to an unsigned integer, with possibly unexpected results. The right way to do it is (unsigned char) c.

7.4 Are right shifts signed or unsigned?

This bears repeating: a program that cares how shifts are done had better declare the quantities being shifted as unsigned.

7.5 How does division truncate?

Suppose we divide a by b to give a quotient q and a remainder r:

    q = a / b;
    r = a % b;

For the moment, suppose also that b > 0. What relationships might we want to hold between a, b, q, and r? First, and most important, we want q * b + r == a, because this is the relation that defines the remainder. Second, if we change the sign of a, we want that to change the sign of q but not its absolute value. Third, we want to ensure that r >= 0 and r < b.

These three properties cannot all hold at once when a is negative, so most languages give up the third and let the remainder take the sign of the dividend, which preserves the first and second properties. Most C implementations do this in practice as well. However, the C language definition guarantees only the first property, along with the property that |r| < |b| and that r >= 0 whenever a >= 0 and b > 0. This is less restrictive than either the second or the third property, and it actually permits some rather strange implementations that would be unlikely to occur in practice (such as one that always rounds the quotient away from zero). Despite its sometimes unwanted flexibility, the C definition is enough that we can usually make integer division do what we want, provided that we know what we want. Suppose, for example, that we have a number n that represents some function of the characters in an identifier, and we want to use it to obtain a hash table entry h such that 0 <= h < HASHSIZE. If we know that n is never negative, we can simply write:

    h = n % HASHSIZE;

However, if n might be negative, this is not good enough, because h might also be negative. We do know that h > -HASHSIZE, so we can write:

    h = n % HASHSIZE;
    if (h < 0)
        h += HASHSIZE;

Better yet, declare n as unsigned.

7.6 How big is a random number?

This size ambiguity has affected library design as well. When the only C implementation ran on the PDP-11 [10], there was a function called rand() that returned a (pseudo-)random non-negative integer. PDP-11 integers were 16 bits long, including the sign, so rand() would return an integer between 0 and 2^15 - 1. When C was implemented on the VAX-11, integers were 32 bits long. What was the range of the rand() function on the VAX-11? For their system, the people at the University of California took the view that rand() should return a value that ranges over all possible non-negative integers, so their version of rand() returns an integer between 0 and 2^31 - 1. The people at AT&T, on the other hand, decided that a PDP-11 program that expected the result of rand() to be less than 2^15 would be easier to transport to a VAX-11 if the rand() function returned a value between 0 and 2^15 - 1 there too.
As a result, it is difficult to write a program that uses rand() in an implementation-independent way.

7.7 Case conversion

The toupper() and tolower() functions have a similar history. They were originally written as macros:

    #define toupper(c) ((c) + 'A' - 'a')
    #define tolower(c) ((c) + 'a' - 'A')

When given a lowercase letter as input, toupper() yields the corresponding uppercase letter; tolower() does the opposite. Both macros depend on the implementation's character set to the extent that they demand that the difference between an uppercase letter and the corresponding lowercase letter be the same constant for all letters. This assumption is valid for both the ASCII and EBCDIC character sets, and probably is not too dangerous, because the non-portability of these macro definitions can be encapsulated in the single file that contains them. These macros do have one disadvantage, though: when given something that is not a letter of the appropriate case, they return garbage.

Thus, the following innocent program fragment to convert a file to lower case does not work with these macros:

    int c;

    while ((c = getchar()) != EOF)
        putchar(tolower(c));

Instead, one must write:

    int c;

    while ((c = getchar()) != EOF)
        putchar(isupper(c) ? tolower(c) : c);

At one point, someone in the UNIX development organization at AT&T noticed that most uses of toupper() and tolower() were preceded by tests to ensure that their arguments were appropriate. He considered rewriting the macros this way:

    #define toupper(c) ((c) >= 'a' && (c) <= 'z' ? (c) + 'A' - 'a' : (c))
    #define tolower(c) ((c) >= 'A' && (c) <= 'Z' ? (c) + 'a' - 'A' : (c))

but realized that this would cause c to be evaluated anywhere between one and three times for each call, which would play havoc with expressions like toupper(*p++). Instead, he decided to rewrite toupper() and tolower() as functions. The toupper() function now looks something like this:

    int
    toupper(int c)
    {
        if (c >= 'a' && c <= 'z')
            return c + 'A' - 'a';
        return c;
    }

and tolower() looks similar. This change had the advantage of robustness, at the cost of introducing function call overhead into each use of these functions. Our hero realized that some people might not be willing to pay the cost of that overhead, so he re-introduced the macros under new names:

    #define _toupper(c) ((c) + 'A' - 'a')
    #define _tolower(c) ((c) + 'a' - 'A')

This gave users a choice of convenience or speed. There was just one problem in all this: the people at Berkeley never followed suit, nor did some other C implementers. This means that a program written on an AT&T system that uses toupper() or tolower(), and relies on being able to pass an argument that is not a letter of the appropriate case, may stop working on some other C implementation. This sort of failure is very hard to trace for someone who does not know this bit of history.

7.8 Free first, then reallocate

Most C implementations provide users with three memory allocation functions: malloc(), realloc(), and free(). Calling malloc(n) returns a pointer to n characters of newly allocated memory that the programmer can use. Giving free() a pointer to memory previously returned by malloc() makes that memory available for re-use.
Calling realloc() with a pointer to an allocated area and a new size stretches or shrinks the memory to the new size, possibly copying it in the process. Or so one might think. The truth is slightly more subtle. Here is an excerpt from the description of realloc() that appears in the System V Interface Definition:

    Realloc changes the size of the block pointed to by ptr to size bytes and returns a pointer to the (possibly moved) block. The contents will be unchanged up to the lesser of the new and old sizes.

The Seventh Edition of the reference manual for the UNIX system contains a copy of the same paragraph.

In addition, that manual contains a second paragraph describing realloc():

    Realloc also works if ptr points to a block freed since the last call of malloc, realloc or calloc; thus sequences of free, malloc and realloc can exploit the search strategy of malloc to do storage compaction.

Thus, the following is legal under the Seventh Edition:

    free(p);
    p = realloc(p, newsize);

This idiosyncrasy remains in systems derived from the Seventh Edition: it is possible to free a storage area and then reallocate it. By implication, freeing memory on these systems is guaranteed not to change its contents until the next time memory is allocated. Thus, on these systems, one can free all the elements of a list by the following curious means:

    for (p = head; p != NULL; p = p->next)
        free((char *) p);

without worrying that the call to free() might invalidate p->next. Needless to say, this technique is not recommended, if only because not all C implementations preserve memory long enough after it has been freed. However, the Seventh Edition manual leaves one thing unstated: the original implementation of realloc() actually required that the area given to it for reallocation be freed first. For this reason, there are many C programs floating around that free memory first and then reallocate it, and this is something to watch out for when moving a C program to another implementation.

7.9 An example of portability problems

Let us take a look at a problem that has been solved many times by many people. The following program takes two arguments: a long integer and a (pointer to a) function. It converts the integer to decimal and calls the given function with each character of the decimal representation.

    void
    printnum(long n, void (*p)())
    {
        if (n < 0) {
            (*p)('-');
            n = -n;
        }
        if (n >= 10)
            printnum(n / 10, p);
        (*p)(n % 10 + '0');
    }

This program is fairly straightforward. First we check if n is negative; if so, we print a sign and make n positive. Next, we test whether n >= 10.
If so, its decimal representation contains two or more digits, so we call printnum() recursively to print all but the last digit. Finally, we print the last digit. This program, for all its simplicity, has several portability problems. The first is the method it uses to convert the low-order decimal digit of n to character form. Using n % 10 to get the value of the low-order digit is fine, but adding '0' to it to get the corresponding character representation is not. This addition assumes that the machine collating sequence has all the digits in sequence with no gaps, so that '0' + 5 has the same value as '5', and so on. This assumption, while true of the ASCII and EBCDIC character sets, might not be true for some machines. The way to avoid that problem is to use a table:

    void
    printnum(long n, void (*p)())
    {
        if (n < 0) {
            (*p)('-');
            n = -n;
        }
        if (n >= 10)
            printnum(n / 10, p);
        (*p)("0123456789"[n % 10]);
    }

The next problem involves what happens if n < 0. The program prints a negative sign and sets n to -n.

This assignment might overflow, because 2's complement machines generally allow more negative values than positive values to be represented. In particular, if a (long) integer is k bits plus one extra bit for the sign, -2^k can be represented but 2^k cannot. There are several ways around this problem. The most obvious one is to assign n to an unsigned long value and be done with it. However, some C compilers do not implement unsigned long, so let us see how we can get along without it. On both 1's complement and 2's complement machines, changing the sign of a positive integer is guaranteed not to overflow. The only trouble comes when changing the sign of a negative value. Therefore, we can avoid trouble by making sure we never attempt to make n positive. Of course, once we have printed the sign of a negative value, we would like to be able to treat negative and positive numbers the same way. The way to do that is to force n to be negative after printing the sign, and to do all our arithmetic with negative values. If we do this, we have to ensure that the part of the program that prints the sign is executed only once; the easiest way to do that is to split the program into two functions:

    void
    printnum(long n, void (*p)())
    {
        void printneg();

        if (n < 0) {
            (*p)('-');
            printneg(n, p);
        } else
            printneg(-n, p);
    }

    void
    printneg(long n, void (*p)())
    {
        if (n <= -10)
            printneg(n / 10, p);
        (*p)("0123456789"[-(n % 10)]);
    }

printnum() now just checks if the number being printed is negative; if so, it prints a negative sign. In either case, it calls printneg() with the negative absolute value of n. We have also modified the body of printneg() to cater to the fact that n will always be a negative number or zero. Or have we? We have used n / 10 and n % 10 to represent the leading digits and the trailing digit of n (with suitable sign changes). Recall that integer division behaves in a somewhat implementation-dependent way when one of the operands is negative. For that reason, n % 10 might actually be positive!
In that case, -(n % 10) would be negative, and we would run off the end of our digit array. We cater to this problem by creating two temporary variables to hold the quotient and remainder. After we do the division, we check that the remainder is in range and adjust both variables if not. printnum() has not changed, so we show only printneg():

    void
    printneg(long n, void (*p)())
    {
        long q;
        int r;

        q = n / 10;
        r = n % 10;
        if (r > 0) {
            r -= 10;
            q++;
        }
        if (n <= -10) {
            printneg(q, p);
        }
        (*p)("0123456789"[-r]);
    }

8 This space available

There are many ways for C programmers to go astray that have not been mentioned in this paper. If you find one, please contact the author. It may well be included, with an acknowledging footnote, in a future revision.

References

The C Programming Language is the definitive work on C. It contains both an excellent tutorial, aimed at people who are already familiar with other high-level languages, and a reference manual that describes the entire language concisely. While the language has changed somewhat since 1978, this book is still the last word on most subjects. This book also contains the "C Language Reference Manual" mentioned several times in this paper.

The C Puzzle Book (Feuer, Prentice-Hall, 1982) is an unusual way of honing one's skills. The book is a collection of puzzles (and answers) whose solutions test the reader's knowledge of C's more obscure aspects.

C: A Reference Manual (Harbison and Steele, Prentice-Hall, 1984) is mostly intended as a reference source for implementers. Other users may also find it useful, particularly because of its cross references to the details.

Footnotes

1. The book C Traps and Pitfalls (Addison-Wesley, 1989, ISBN 0-201-17928-8) is a greatly expanded version of this paper; interested readers may wish to consult it.
2. Because the result of != is either 1 or 0.
3. Thanks to Guy Harris for pointing out this problem.
4. Dennis Ritchie and Steve Johnson both pointed out this problem.
5. Thanks to an anonymous volunteer for raising this problem.
6. Thanks to Richard Stevens for pointing out this problem.
7. Some C compilers insist that there be exactly one definition of each external object, although there may be many declarations. When using such a compiler, it may be easiest to put a declaration in an include file and a definition in some other place. This means that the type of each external object appears twice, but that is better than having it appear more than two times.
8. The commas that separate function arguments are not comma operators. For example, x and y are fetched in undefined order in f(x, y), but not in g((x, y)). In the latter example, g has one argument. The value of that argument is determined by evaluating x, discarding its value, and then evaluating y.
9. The preprocessor also makes it easy to group such manifest constants together to make them easier to find.
10. PDP-11 and VAX-11 are trademarks of Digital Equipment Corporation (DEC).

Author's blog:

http://blog.9cbs.net/neptunex/

When reposting, please cite the original address: https://www.9cbs.com/read-82536.html
