[Translation] C language traps and defects (revised)

xiaoxiao, 2021-03-06

C language traps and defects [1]

Original author: Andrew Koenig, AT&T Bell Laboratories, Murray Hill, New Jersey 07094. Original: Favorites. Translation: Lover_P

From: http://blog.9cbs.net/loverp/archive/2004/16/75725.aspx

[Revision notes]

First revision. Corrected most of the typos and formatting errors in the article, and rewrote some sentences so that they read more naturally in Chinese.

[Translator's note]

If you think you have "learned" C, please read this article carefully. The road is still long and there is much left to learn. That goes for me too.

[Overview]

The C language is like a carving knife: simple, sharp, and extremely useful in skilled hands. Like any sharp tool, C can injure people who do not know how to handle it. This article shows some of the ways C can injure the unwary, and how to avoid injury.

[Contents]

0 Introduction
1 Lexical defects
1.1 = is not ==
1.2 & and | are not && and ||
1.3 Multi-character tokens
1.4 Exceptions
1.5 Strings and characters
2 Syntactic defects
2.1 Understanding declarations
2.2 Operators don't always have the precedence you think
2.3 Watch those semicolons!
2.4 The switch statement
2.5 Function calls
2.6 The dangling else problem
3 Linkage
3.1 You must check external types yourself
4 Semantic defects
4.1 Expression evaluation order
4.2 The &&, || and ! operators
4.3 Subscripts start from zero
4.4 C does not always convert arguments
4.5 Pointers are not arrays
4.6 Avoid synecdoche
4.7 The null pointer is not the null string
4.8 Integer overflow
4.9 Shift operators
5 Library functions
5.1 getc() returns an integer
5.2 Buffered output and memory allocation
6 The preprocessor
6.1 Macros are not functions
6.2 Macros are not type definitions
7 Portability defects
7.1 What's in a name?
7.2 How big is an integer?
7.3 Are characters signed or unsigned?
7.4 Are right shifts signed or unsigned?
7.5 How does division truncate?
7.6 How big is a random number?
7.7 Case conversion
7.8 Free first, then reallocate
7.9 An example of portability problems
8 This space available
References
Footnotes

0 Introduction

The C language and its typical implementations are designed to be used easily by experts. The language is terse and expressive, and it places few restrictions in the way of a user who blunders. A user who has blundered is often rewarded with an effect that is not obviously related to its cause.

In this article, we will look at some of these unexpected rewards. Because they are unexpected, it may well be impossible to classify them completely. Nevertheless, we have made a rough attempt to do so by looking at what has to happen in order to run a C program. We assume that the reader has at least a passing acquaintance with the C language.

Section 1 looks at problems that occur while the program is being broken into tokens. Section 2 follows the program as the compiler groups its tokens into declarations, expressions, and statements. Section 3 recognizes that a C program is often made of several parts that are compiled separately and then bound together. Section 4 deals with misconceptions of meaning: things that happen while the program is actually running. Section 5 examines the relationship between our programs and the library routines they use. In Section 6 we note that the program we write is not really the program we run; the preprocessor gets at it first. Finally, Section 7 discusses portability problems: reasons why a program that runs on one implementation might not run on another.

1 Lexical defects

The first part of a compiler is usually called a lexical analyzer. It looks at the sequence of characters that make up the program and breaks them into tokens. A token is a sequence of one or more characters that has a (relatively) uniform meaning in the language being compiled. In C, for instance, the token -> has a meaning that is quite distinct from that of either of the characters that make it up, and that is independent of the context in which the -> appears. For another example, consider the statement:

if (x > big) big = x;

Each non-blank character in this statement is a separate token, except for the keyword if and the two instances of the identifier big.

In fact, C programs are broken into tokens twice. First the preprocessor reads the program. It must tokenize the program so that it can find the identifiers, some of which may name macros. It must then replace each macro invocation with the result of evaluating that macro. Finally, the result of the macro replacement is reassembled into a character stream that is handed to the compiler proper. The compiler then breaks that stream into tokens a second time.

In this section, we will explore some common misunderstandings about the meanings of tokens and about the relationship between tokens and the characters that make them up. We will talk about the preprocessor later.

1.1 = is not ==

Languages derived from Algol, such as Pascal and Ada, use := for assignment and = for comparison. C instead uses = for assignment and == for comparison. Assignment is more frequent than comparison, so it is given the shorter symbol.

Moreover, C treats assignment as an operator, so multiple assignments (such as a = b = c) can be written easily and assignments can be embedded in larger expressions.

This convenience causes a potential problem: one can inadvertently write an assignment where a comparison was intended. Thus, the following statement looks as if it checks whether x is equal to y:

if (x = y) foo();

In fact, it sets x to the value of y and then checks whether that value is nonzero. Or consider the following loop, which is intended to skip blanks, tabs, and newlines in a file:

while (c == ' ' || c = '\t' || c == '\n') c = getc(f);

The programmer mistakenly used = instead of == in the comparison with '\t'. This "comparison" actually assigns '\t' to c and then tests whether the (new) value of c is zero. Because '\t' is not zero, the "comparison" is always true, so the loop consumes the entire file. What happens after that depends on whether the implementation allows a program to keep reading after it has reached end of file. If it does, the loop runs forever.

Some C compilers try to help the user by giving a warning message for conditions of the form e1 = e2. To avoid warnings from such compilers when you really do mean to assign a value to a variable and then check whether the variable is nonzero, consider making the comparison explicit. In other words, instead of

if (x = y) foo();

write:

if ((x = y) != 0) foo();

This also makes your intention plain.
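The difference between the two operators can be sketched in a few lines of C. The function names below are hypothetical and exist only for illustration; they do not come from the original article.

```c
/* Comparison: reports whether *x equals y and leaves *x unchanged. */
int equal(const int *x, int y)
{
    return *x == y;
}

/* Deliberate assignment: stores y in *x, then tests the stored value
 * explicitly so that the intent is clear to readers and compilers. */
int assign_and_test(int *x, int y)
{
    if ((*x = y) != 0)
        return 1;
    return 0;
}
```

Calling equal() never changes the variable, while assign_and_test() always does; the explicit != 0 is what tells a reader that the assignment is intentional.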

1.2 & and | are not && and ||

It is easy to write = where == was meant, because so many other languages use = for comparison. It is just as easy to interchange & and &&, or | and ||, especially because the & and | operators in C differ from their counterparts in some other languages. We will look at these operators more closely in Section 4.2.

1.3 Multi-character tokens

Some C tokens, such as /, *, and =, are only one character long. Other C tokens, such as /* and ==, and identifiers, are several characters long. When the C compiler encounters a / followed by a *, it must be able to decide whether to treat the two characters as two separate tokens or as one single token. The C reference manual tells how to decide: "If the input stream has been parsed into tokens up to a given character, the next token is taken to include the longest string of characters which could possibly constitute a token." (Translator's note: this is usually called the longest-match principle.) Thus, if a / is the first character of a token, and the / is immediately followed by a *, the two characters begin a comment, regardless of any other context. The following statement looks as if it sets y to the value of x divided by the value pointed to by p:

y = x/*p /* p points at the divisor */;

In fact, /* begins a comment, so the compiler simply swallows the program text until the */ appears. In other words, the statement just sets y to the value of x and never looks at p. Rewriting this statement as:

y = x / *p /* p points at the divisor */;

or even

y = x/(*p) /* p points at the divisor */;

would cause it to do the division the comment suggests.
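A minimal sketch of this behavior follows. The function names are hypothetical; the second function is written so that the adjacent / and * really do open a comment.

```c
/* Intended division: the space keeps / and * as separate tokens. */
int divide_intended(int x, const int *p)
{
    return x / *p;   /* p points at the divisor */
}

/* Here the two characters are adjacent, so they open a comment that runs
 * to the next comment terminator; the statement is just "return x;". */
int divide_swallowed(int x, const int *p)
{
    (void)p;         /* p is never read by the return statement below */
    return x/*p         p points at the divisor */;
}
```

With x equal to 10 and *p equal to 2, the first function yields 5 and the second yields 10.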

This sort of near-ambiguity can cause trouble in other contexts. For example, older versions of C use =+ to mean what present versions mean by +=. Such a compiler will treat

a=-1;

as meaning

a =- 1;

or

a = a - 1;

This will surprise a programmer who intended

a = -1;

On the other hand, compilers for these older versions of C would interpret

a=/*b;

as

a =/ *b;

even though the /* looks like the start of a comment.

1.4 Exceptions

Compound assignment operators such as += are really two tokens. Thus,

a + /* strange */ = 1

means the same as

a += 1

These operators are the only cases in which something that looks like a single token is really several tokens. In particular,

p - > a

is illegal. It is not a synonym for

p -> a

On the other hand, some older compilers that still accept =+ as a synonym for += treat =+ as a single token.

1.5 Strings and characters

Single and double quotes mean very different things in C, and there are some contexts in which confusing them results in surprises rather than error messages.

A character enclosed in single quotes is just another way of writing an integer. The integer is the one that corresponds to the given character in the implementation's collating sequence. Thus, in an ASCII implementation, 'a' means exactly the same thing as 0141 or 97. A string enclosed in double quotes, on the other hand, is a shorthand way of writing a pointer to a nameless array that has been initialized with the characters between the quotes and an extra character whose binary value is zero.

The following two program fragments are equivalent:

printf("Hello world\n");

char hello[] = {'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\n', 0}; printf(hello);

Using a pointer instead of an integer (or vice versa) often causes a warning message, so using double quotes instead of single quotes (or vice versa) is usually caught. The major exception is in function calls, where most compilers do not check argument types. Thus, writing

printf('\n');

instead of

printf("\n");

usually results in a surprise at run time. (Translator's note: as explained above, '\n' is an integer; it is treated as a pointer, and that pointer is meaningless.)

Because an integer is usually large enough to hold several characters, some C compilers permit multiple characters in a character constant. This means that writing 'yes' instead of "yes" may well go undetected. The latter means "the address of the first of four consecutive memory locations containing y, e, s, and a null character, respectively." The former means "an integer composed of the values of the characters y, e, and s in some implementation-defined manner." Any similarity between these two quantities is purely coincidental.
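The difference in kind between the two forms can be checked with sizeof. This is only a sketch with hypothetical function names, and it assumes the code is compiled as C (in C++ a character constant has type char, so the first result would differ).

```c
#include <stddef.h>

/* A character constant has type int in C, so its size matches int. */
size_t size_of_char_constant(void)
{
    return sizeof('a');
}

/* A string literal is an array: the characters plus a terminating 0. */
size_t size_of_yes_string(void)
{
    return sizeof("yes");   /* 'y', 'e', 's', '\0' */
}

/* The last element of the literal's array is the extra zero character. */
int last_element_of_yes(void)
{
    return "yes"[3];
}
```

The first function returns sizeof(int), the second returns 4, and the third returns 0.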

2 Syntactic defects

To understand a C program, it is not enough to understand the tokens that make it up. One must also understand how the tokens combine into declarations, expressions, statements, and programs. While these combinations are usually well defined, the definitions are sometimes counter-intuitive or confusing.

In this section, we look at some syntactic constructions that are less than obvious.

2.1 Understanding declarations

I once talked to someone who was writing a C program that was going to run stand-alone on a small microprocessor. When this machine was switched on, the hardware would call the subroutine whose address was stored in location 0.

In order to simulate turning the power on, we had to devise a C statement that would call this subroutine explicitly. After some thought, we came up with the following:

(*(void(*)())0)();

Expressions like this strike terror into the hearts of C programmers. They need not, though, because they can usually be constructed quite easily with the help of a single, simple rule: declare it the way you use it.

Every C variable declaration has two parts: a type and a list of stylized expressions that are expected to evaluate to that type. The simplest such expression is a variable:

float f, g;

This indicates that the expressions f and g, when evaluated, will be of type float. Because the thing declared is an expression, parentheses may be used freely:

float ((f));

This means that ((f)) evaluates to a float and therefore, by inference, that f is also a float.

Similar logic applies to function and pointer types. For example:

float ff();

means that the expression ff() is a float, and therefore that ff is a function that returns a float. Analogously,

float *pf;

means that *pf is a float and therefore that pf is a pointer to a float.

These forms combine in declarations the same way they do in expressions. Thus,

float *g(), (*h)();

says that *g() and (*h)() are float expressions. Since () binds more tightly than *, *g() means the same thing as *(g()): g is a function that returns a pointer to a float, and h is a pointer to a function that returns a float.

Once we know how to declare a variable of a given type, it is easy to write a cast for that type: just remove the variable name and the semicolon from the declaration and enclose the whole thing in parentheses. Thus, since

float *g();

declares g to be a function returning a pointer to a float, (float *()) is a cast to that type.

Armed with this knowledge, we are now prepared to tackle (*(void(*)())0)(). We can analyze this statement in two parts. First, suppose that we have a variable fp that contains a function pointer and we want to call the function to which fp points. That is done this way:

(*fp)();

If fp is a pointer to a function, *fp is the function itself, so (*fp)() is the way to call it. The parentheses in (*fp) are essential, because the expression would otherwise be interpreted as *(fp()). We have now reduced the problem to finding an appropriate expression to replace fp. This is the second part of our analysis. If C could read our mind about types, we could write:

(*0)();

This does not work, because the * operator insists on having a pointer as its operand. Furthermore, the operand must be a pointer to a function so that the result of * can be called. Thus, we need to cast 0 into a type loosely described as "pointer to function returning void."

If fp is a pointer to a function returning void, then (*fp)() is a void value, and its declaration looks like this:

void (*fp)();

Thus, we could write:

void (*fp)(); (*fp)();

if we were willing to declare a dummy variable. But once we know how to declare the variable, we also know how to cast a constant to that type: just drop the name from the variable declaration. Thus, we cast 0 to a "pointer to function returning void" by writing:

(void(*)())0

and we can now replace fp by (void(*)())0:

(*(void(*)())0)();

The semicolon at the end turns the expression into a statement.

We did not use a typedef declaration when we originally tackled this problem. Using one, we could have solved the problem more clearly:

typedef void (*funcptr)(); (*(funcptr)0)();
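Calling location 0 is not something an ordinary hosted program can do, so the sketch below applies the same declaration and cast rules to a function that actually exists. The names startup, other_fn, and call_three_ways are hypothetical, and the code uses (void) prototypes where the article's older dialect writes empty parentheses.

```c
/* Counts how many times the hypothetical routine has run. */
static int calls = 0;

/* Hypothetical stand-in for the routine stored at location 0. */
static void startup(void)
{
    calls++;
}

typedef void (*funcptr)(void);    /* pointer to function returning void */
typedef long (*other_fn)(long);   /* some unrelated function-pointer type */

/* Calls startup three ways and returns the number of calls made. */
int call_three_ways(void)
{
    int before = calls;
    void (*fp)(void) = startup;            /* declared the way it is used */
    other_fn stored = (other_fn)startup;   /* address kept under another type */

    (*fp)();                               /* plain call through a pointer */
    (*(void (*)(void))stored)();           /* cast written out in full */
    (*(funcptr)stored)();                  /* the same cast via the typedef */
    return calls - before;
}
```

Each of the three statements calls the same function; the cast in the second one is the declaration of fp with the name removed and parentheses added around it.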

2.2 Operators don't always have the precedence you think

Suppose that the manifest constant FLAG is an integer with exactly one bit turned on in its binary representation (in other words, a power of two), and you want to test whether the integer variable flags has that bit turned on. The usual way to write this is:

if (flags & FLAG) ...

The meaning of this is plain to many C programmers: an if statement tests whether the expression in the parentheses evaluates to 0 or not. It might be nice to make the test more explicit for documentation purposes:

if (flags & FLAG != 0) ...

The statement is now easier to understand. It is also wrong, because != binds more tightly than &, so it is interpreted as:

if (flags & (FLAG != 0)) ...

This works (by coincidence) if FLAG is 1 or 0 (!), but not for any other power of two [2].

Suppose you have two integer variables, h and l, whose values are between 0 and 15 inclusive, and you want to set r to an 8-bit value whose low-order bits are those of l and whose high-order bits are those of h. The natural way to do this is to write:

r = h << 4 + l;

Unfortunately, this is wrong. Addition binds more tightly than shifting, so this example is equivalent to:

r = h << (4 + l);

Here are two ways to get it right:

r = (h << 4) + l;

r = h << 4 | l;

One way to avoid these problems is to parenthesize everything, but expressions with too many parentheses are hard to read, so it is worth trying to remember the precedence levels in C.
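The two precedence mistakes described in this section can be tried out directly. This is a sketch with hypothetical function names; a compiler with warnings enabled may flag the two "wrong" functions, which is exactly the point.

```c
/* Parsed as h << (4 + l), because + binds more tightly than <<. */
unsigned pack_wrong(unsigned h, unsigned l)
{
    return h << 4 + l;
}

/* Explicit parentheses give the intended grouping. */
unsigned pack_right(unsigned h, unsigned l)
{
    return (h << 4) | l;
}

/* Parsed as flags & (flag != 0): only the lowest bit of flags is tested. */
int test_bit_wrong(unsigned flags, unsigned flag)
{
    return (flags & flag != 0) != 0;
}

/* Tests the bit selected by flag. */
int test_bit_right(unsigned flags, unsigned flag)
{
    return (flags & flag) != 0;
}
```

With h = 1 and l = 2, pack_wrong() shifts by 6 and returns 64, while pack_right() returns 18. With flags = 4 and flag = 4, test_bit_wrong() returns 0 even though the bit is set.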

Unfortunately, there are fifteen of them, so this is not always easy to do. It can be made easier, though, by classifying them into groups.

The operators that bind most tightly are the ones that are not really operators: subscripting, function calls, and structure selection. These all associate to the left. Next come the unary operators. These have the highest precedence of any of the true operators. Because function calls bind more tightly than unary operators, you must write (*p)() to call a function pointed to by p; *p() implies that p is a function that returns a pointer. Casts are unary operators and have the same precedence as any other unary operator. Unary operators are right-associative, so *p++ is interpreted as *(p++) and not as (*p)++.

Next come the true binary operators. The arithmetic operators have the highest precedence, then the shift operators, the relational operators, and the logical operators; the conditional operator and the assignment operators come after these. The two most important things to keep in mind are:

1. Every logical operator has lower precedence than every relational operator.

2. The shift operators bind more tightly than the relational operators but less tightly than the arithmetic operators.

Within the various operator classes, there are few surprises. Multiplication, division, and remainder have the same precedence, addition and subtraction have the same precedence, and the two shift operators have the same precedence.

One small surprise is that the six relational operators do not all have the same precedence: == and != bind less tightly than the other relational operators. This allows us, for instance, to see whether a and b are in the same relative order as c and d with the expression:

a < b == c < d

Within the logical operators, no two have the same precedence. The bitwise operators all bind more tightly than the sequential operators, each and operator binds more tightly than the corresponding or operator, and the bitwise exclusive or operator (^) falls between bitwise and and bitwise or.

The ternary conditional operator has lower precedence than any operator mentioned so far. This permits the selection expression to contain logical combinations of relational operators, as in:

z = a < b && b < c ? d : e

This example also shows that it makes sense for assignment to have lower precedence than the conditional operator. Moreover, all the compound assignment operators have the same precedence and they all group from right to left, so that

a = b = c

means the same as

b = c; a = b;

Lowest of all is the comma operator. This is easy to remember, because the comma is often used as a substitute for the semicolon when an expression is required instead of a statement.

Assignment is another operator that is often involved in precedence mix-ups. Consider, for example, the following loop, which is intended to copy a file:

while (c = getc(in) != EOF) putc(c, out);

The expression in the while statement looks as if c is assigned the value of getc(in) and then compared with EOF to terminate the loop. Unhappily, assignment has lower precedence than any comparison operator, so the value of c is the result of comparing getc(in) with EOF, and the value returned by getc(in) is discarded. Thus, the "copy" of the file consists of a stream of bytes whose value is 1.

It is not too hard to see that the example above should be written:

while ((c = getc(in)) != EOF) putc(c, out);
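The effect is easy to reproduce without real files. In the sketch below, next_byte() is a hypothetical stand-in for getc() that reads from a string, and the two copy functions differ only in the inner parentheses.

```c
#include <stdio.h>    /* for EOF */
#include <stddef.h>

/* Hypothetical stand-in for getc(): next byte of *s, or EOF at the end. */
static int next_byte(const char **s)
{
    return **s ? (unsigned char)*(*s)++ : EOF;
}

/* Buggy: c receives the result of the comparison, which is 0 or 1. */
size_t copy_buggy(const char *in, char *out)
{
    int c;
    size_t n = 0;
    while (c = next_byte(&in) != EOF)
        out[n++] = (char)c;
    return n;
}

/* Fixed: the inner parentheses make the assignment happen first. */
size_t copy_fixed(const char *in, char *out)
{
    int c;
    size_t n = 0;
    while ((c = next_byte(&in)) != EOF)
        out[n++] = (char)c;
    return n;
}
```

Given the input "AB", both functions write two bytes, but copy_buggy() writes two bytes of value 1 while copy_fixed() writes 'A' and 'B'.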

However, errors of this sort can be hard to spot in more complicated expressions. For example, several versions of the lint program distributed with the UNIX system contain the following erroneous line:

if ((t = BTYPE(pt1->aty) == STRTY) || t == UNIONTY) {

This statement was intended to assign a value to t and then see whether t is equal to STRTY or UNIONTY. The actual effect is quite different [3].

The precedence of the C logical operators came about for historical reasons. B, the predecessor of C, had logical operators that corresponded roughly to C's & and | operators. Although they were defined to act on bits, the compiler would treat them as && and || if they appeared in a conditional context. When the two usages were split apart in C, it was deemed too dangerous to change the precedence much [4].

2.3 Watch those semicolons!

An extra semicolon in a C program usually makes little difference: either it is a null statement, which has no effect, or it elicits a diagnostic message from the compiler, which makes it easy to remove. One important exception is after an if or while clause, which must be followed by exactly one statement. Consider this example:

if (x[i] > big); big = x[i];

The compiler will not complain about the extra semicolon, but this program fragment means something quite different from:

if (x[i] > big) big = x[i];

The first fragment is equivalent to:

if (x[i] > big) { } big = x[i];

which is, of course, equivalent to:

big = x[i];

(unless x, i, or big is a macro with side effects).
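A minimal sketch of the stray-semicolon problem, using hypothetical function names and a plain int in place of x[i]:

```c
/* Buggy: the stray semicolon is an empty statement that ends the if,
 * so the assignment on the next line always runs. */
int update_max_buggy(int value, int big)
{
    if (value > big);
        big = value;
    return big;
}

/* Fixed: the assignment is the body of the if. */
int update_max_fixed(int value, int big)
{
    if (value > big)
        big = value;
    return big;
}
```

For value = 1 and big = 5, the buggy version returns 1 and the fixed version returns 5.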

Another place where a semicolon can make a big difference is at the end of a structure declaration that comes just before a function definition. Consider the following program fragment:

struct foo { int x; } f() { ... }

A semicolon is missing between the first } and the f that immediately follows it. The effect is to declare that the function f returns a struct foo, which is defined as part of this declaration. If the semicolon were present, f would be defined by default as returning an integer [5].

2.4 The switch statement

C is unusual in that the cases in its switch statement can flow into each other. Consider, for example, the following program fragments in C and Pascal:

switch (color) {
case 1: printf("red"); break;
case 2: printf("yellow"); break;
case 3: printf("blue"); break;
}

case color of
1: write ('red');
2: write ('yellow');
3: write ('blue')
end

Both program fragments do the same thing: they print red, yellow, or blue (without starting a new line), depending on whether the variable color is 1, 2, or 3. The fragments are exactly analogous, with one exception: the Pascal program has nothing that corresponds to the C break statement. The reason is that case labels in C behave as true labels: control can flow unimpeded right through a case label.

Looking at it another way, suppose the C fragment looked more like the Pascal fragment:

switch (color) {
case 1: printf("red");
case 2: printf("yellow");
case 3: printf("blue");
}

and suppose further that color is equal to 2. The program would then print yellowblue, because control passes naturally from the second printf() call to the statement after it.

This is both a strength and a weakness of C switch statements. It is a weakness because leaving out a break statement is easy to do and often gives rise to obscure program misbehavior. It is a strength because, by leaving out a break statement deliberately, one can readily express a control structure that is inconvenient to implement otherwise. Specifically, in large switch statements, one often finds that the processing for one case reduces to some other case after a relatively small amount of special handling.

For example, consider a program that is an interpreter for some kind of imaginary machine. Such a program might contain a switch statement to handle each of the various operation codes. On such a machine, it is often true that a subtract operation is identical to an add operation after the sign of the second operand has been inverted. Thus, it is convenient to be able to write something like this:

case SUBTRACT: opnd2 = -opnd2; /* no break */
case ADD: ...

As another example, consider the part of a compiler that skips white space while looking for a token. Here, one would want to treat spaces, tabs, and newlines identically, except that a newline should cause a line counter to be incremented:

case '\n': linecount++; /* no break */
case '\t':
case ' ': ...
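Both uses of fall-through can be sketched as small functions. The enum, the function names, and the idea of counting executed cases are assumptions made for this illustration only.

```c
enum opcode { ADD, SUBTRACT };   /* hypothetical operation codes */

/* Subtraction is handled as addition after negating the second operand. */
int execute(enum opcode op, int opnd1, int opnd2)
{
    switch (op) {
    case SUBTRACT:
        opnd2 = -opnd2;
        /* fall through */
    case ADD:
        return opnd1 + opnd2;
    }
    return 0;   /* not reached for the two opcodes above */
}

/* Without break statements, control runs on into the later cases.
 * Returns how many of the three colour cases were executed. */
int cases_executed(int color)
{
    int count = 0;
    switch (color) {
    case 1: count++;   /* fall through */
    case 2: count++;   /* fall through */
    case 3: count++;
    }
    return count;
}
```

Here cases_executed(2) returns 2, which corresponds to the yellowblue output described earlier.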

2.5 Function calls

Unlike some other programming languages, C requires a function call to have an argument list, even if there are no arguments. Thus, if f is a function,

f();

is a statement that calls the function, but

f;

does nothing at all. More precisely, it evaluates the address of the function but does not call it [6].

2.6 The dangling else problem

We would be remiss in leaving any discussion of syntactic defects without mentioning this one. Although it is not unique to C, it has bitten C programmers with many years of experience.

Consider the following program fragment:

if (x == 0)
    if (y == 0) error();
else {
    z = x + y;
    f(&z);
}

The programmer's intention for this fragment is that there should be two main cases: x = 0 and x != 0. In the first case, the fragment should do nothing at all unless y = 0, in which case it should call error(). In the second case, the program should set z = x + y and then call f() with the address of z as its argument.

However, the program fragment actually does something quite different. The reason is the rule that an else is always associated with the closest unmatched if. If we were to write this fragment the way it is actually executed, it would look like this:

if (x == 0) {
    if (y == 0)
        error();
    else {
        z = x + y;
        f(&z);
    }
}

In other words, nothing at all happens if x != 0. To get the effect implied by the indentation of the original example, write:

if (x == 0) {
    if (y == 0)
        error();
} else {
    z = x + y;
    f(&z);
}
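The two groupings can be compared side by side. In this sketch the function names and the return codes (1 for the error branch, 2 for the else branch, 0 for neither) are hypothetical.

```c
/* Indented as the programmer intended, but the else pairs with the
 * inner if. Returns 1 for "error", 2 for "else branch", 0 for neither. */
int dangling(int x, int y)
{
    int taken = 0;
    if (x == 0)
        if (y == 0) taken = 1;
    else
        taken = 2;
    return taken;
}

/* Braces force the else to pair with the outer if. */
int braced(int x, int y)
{
    int taken = 0;
    if (x == 0) {
        if (y == 0) taken = 1;
    } else {
        taken = 2;
    }
    return taken;
}
```

For x = 1 the first function does nothing and returns 0, while the second takes the else branch and returns 2.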

3 Linkage

A C program may consist of several parts that are compiled separately and then bound together by a program usually called a linker, linkage editor, or loader. Because the compiler normally sees only one file at a time, it cannot detect errors whose recognition would require knowledge of several source files at once.

In this section, we look at some errors of that type. Some C implementations, but not all, have a program called lint that catches many of these errors. If such a program is available, it is impossible to overemphasize the importance of using it.

3.1 You must check external types yourself

Suppose you have a C program divided into two files. One file contains the declaration:

int n;

and the other contains the declaration:

long n;

This is not a valid C program, because the same external name is declared with two different types in the two files. However, many implementations will not detect this error, because the compiler does not know the contents of either file while it is compiling the other. Thus, the job of checking type consistency can only be done by the linker (or by some utility program such as lint); if the system's linker knows nothing about data types, there is little the C compiler can do to force it. What actually happens when this program is run? There are many possibilities:

1. The implementation is clever enough to detect the type clash. One would then expect a diagnostic message explaining that the type of n was given differently in the two files.

2. You are using an implementation in which int and long are really the same type. This is typically true of machines on which 32-bit arithmetic comes most naturally. In this case, your program will probably work as if n had been declared long (or int) both times. This is a good example of a program that works only by coincidence.

3. The two instances of n require different amounts of storage, but they happen to share storage in such a way that the values assigned to one are valid for the other. This might happen, for example, if the linker arranged for the int to share storage with the low-order part of the long. Whether this happens depends on the machine and the system. This is an even better example of a program that works only by coincidence.

4. The two instances of n share storage in such a way that assigning a value to one has the effect of apparently assigning a different value to the other. In this case, the program will probably fail.

Another example of this sort of thing happens surprisingly often. One file of a program contains a declaration like:

char filename[] = "/etc/passwd";

and another file contains this declaration:

char *filename;

Although arrays and pointers behave very similarly in some contexts, they are not the same. In the first declaration, filename is the name of an array of characters. Although using the name generates a pointer to the first element of that array, that pointer is generated as needed and is not actually kept around. In the second declaration, filename is the name of a pointer. That pointer points wherever the programmer makes it point. If the programmer does not assign a value to it, it has a zero (null) value by default. (Translator's note: this default applies to external and static pointers; an uninitialized local pointer has an indeterminate value, which is very dangerous!)

The two declarations of filename use storage in different ways; they cannot coexist.

One way to avoid type clashes of this sort is to use a tool like lint if it is available. In order to check for type clashes between separately compiled parts of a program, some program must be able to see all the parts at once. A typical compiler does not do this, but lint does.

Another way to avoid these problems is to put external declarations into include files. That way, the type of an external object appears only once [7].
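The include-file approach might look like the sketch below. For brevity the three pieces are shown in one block, with comments marking where each would live; the header name filename.h and the function filename_length() are hypothetical.

```c
#include <string.h>

/* ---- would live in a shared header, e.g. a hypothetical "filename.h" ---- */
extern char filename[];            /* the one declaration every file sees */

/* ---- would live in exactly one source file that includes the header ---- */
char filename[] = "/etc/passwd";   /* definition, checked against the header */

/* ---- any other source file that includes the header ---- */
size_t filename_length(void)
{
    return strlen(filename);
}
```

Because the defining file also includes the header, the compiler can compare the definition with the declaration and report a mismatch such as char *filename.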

4 Semantic defects

A sentence can be perfectly spelled and grammatically impeccable and still be meaningless. In this section, we look at ways of writing programs that appear to mean one thing but actually mean something quite different.

We will also discuss contexts in which things that look reasonable on the surface actually give undefined results. We limit ourselves here to things that are not guaranteed to work on any C implementation. Things that might work on some implementations but not on others are left until Section 7, which looks at portability problems.

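4.1 Expression evaluation order

Some C operators always evaluate their operands in a known, specified order. Others do not. Consider, for instance, the following expression:

a < b && c < d

The C language definition specifies that a < b is evaluated first. If a is indeed less than b, c < d must then be evaluated to determine the value of the whole expression. If a is greater than or equal to b, c < d is not evaluated at all.

To evaluate a < b, on the other hand, the compiler may evaluate either a or b first. Only the four operators &&, ||, ?: and , specify an order of evaluation.

All other C operators evaluate their operands in an undefined order. In particular, the assignment operators make no guarantee about evaluation order.

For this reason, the following way of copying the first n elements of array x to array y does not work:

i = 0; while (i < n) y[i] = x[i++];

The trouble is that there is no guarantee that the address of y[i] is evaluated before i is incremented. On some implementations it is; on others it is not. This similar version fails for the same reason:

i = 0; while (i < n) y[i++] = x[i];

On the other hand, this one works fine:

i = 0; while (i < n) { y[i] = x[i]; i++; }

This can, of course, be abbreviated:

for (i = 0; i < n; i++) y[i] = x[i];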
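The working form of the loop can be wrapped in a function. The name copy_first_n() and the choice of int elements are assumptions for this sketch.

```c
/* Copies the first n elements of x into y. The increment is a separate
 * step, so the result does not depend on operand evaluation order. */
void copy_first_n(int *y, const int *x, int n)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] = x[i];
}
```

Elements of y beyond the first n are left untouched.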

4.2 The &&, || and ! operators

C has two classes of logical operators that are occasionally interchangeable: the bitwise operators &, |, and ~, and the logical operators &&, ||, and !. A programmer who substitutes an operator of one class for the corresponding operator of the other may be in for a surprise: the program may appear to work correctly after such a substitution but may actually be working only by coincidence.

The &&, ||, and ! operators return 1 for "true" and 0 for "false", and the && and || operators do not evaluate their right-hand operand at all if the result can be determined from the left-hand operand.

Thus !10 is zero, because 10 is nonzero; 10 && 12 is 1, because both 10 and 12 are nonzero; and 10 || 12 is also 1, because 10 is nonzero. Moreover, 12 is not even evaluated in the last expression, nor is f() in 10 || f().
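The contrast between the two operator classes can be sketched as follows. The counter and the function f() here are hypothetical; they exist only to show that the right-hand operand of || is skipped.

```c
static int f_calls = 0;          /* counts calls to the hypothetical f() */

static int f(void)
{
    f_calls++;
    return 1;
}

int bitwise_and(void) { return 10 & 12; }    /* 1010 & 1100 is 1000, i.e. 8 */
int logical_and(void) { return 10 && 12; }   /* both nonzero, so 1 */
int logical_not(void) { return !10; }        /* 10 is nonzero, so 0 */

/* 10 || f() is 1, and f() is never called because 10 is already true.
 * Returns the number of calls to f() made while evaluating it. */
int short_circuit_calls(void)
{
    int before = f_calls;
    int result = 10 || f();
    (void)result;
    return f_calls - before;
}
```

The bitwise & works on each bit separately and yields 8, while && only asks whether both operands are nonzero and yields 1.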

Consider the following program fragment, which looks for a particular element in a table:

i = 0; while (i < tabsize && tab[i] != x) i++;

The idea behind this loop is that if i is equal to tabsize when the loop terminates, the element sought was not found. Otherwise, i contains the element's index.

Suppose that the && were inadvertently replaced by & in this example. The loop would probably still appear to work, but only because of two lucky breaks.

The first is that both comparisons in this example yield 0 if the condition is false and 1 if the condition is true. As long as x and y are both 1 or 0, x & y and x && y always have the same value. However, if one of the comparisons were replaced by one that uses some nonzero value other than 1 to represent "true", the loop would stop working.

The second lucky break is that looking just one element past the end of an array is usually harmless, provided that the program does not change that element. The modified program looks past the end of the array because &, unlike &&, always evaluates both of its operands. Thus, in the last iteration of the loop, the value of tab[i] is fetched even though i is equal to tabsize. If tabsize is the number of elements in tab, this fetches a non-existent element of tab.

4.3 Subscripts start from zero

In many languages, an array with n elements has subscripts that run from 1 to n inclusive. Not so in C.

A C array with n elements has no element with subscript n; the subscripts run from 0 to n - 1. Programmers coming to C from other languages must therefore be especially careful when using arrays:

int i, a[10]; for (i = 1; i <= 10; i++) a[i] = 0;

This example is meant to set every element of a to zero, but it does not have the desired effect. Because the comparison in the for statement is i <= 10 rather than i < 10, the nonexistent element number 10 of a is set to zero, clobbering the word that follows a in memory. If the compiler allocates memory for user variables at decreasing addresses, the word after a is i. Setting i to zero then turns the loop into an infinite loop.
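
A corrected loop runs the subscript from 0 to n - 1. This is a small sketch; the macro N and the function name clear_all are assumptions for the illustration.

```c
#define N 10

/* Sets every element a[0..N-1] to zero and returns the final index. */
int clear_all(int a[N])
{
    int i;
    for (i = 0; i < N; i++)   /* i < N, not i <= N */
        a[i] = 0;
    return i;                 /* N: one past the last valid subscript */
}
```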

4.4 C does not always convert the argument

The following program fragment fails for two reasons:

double s; s = sqrt (2); printf ("%g\n", s);

The first reason is that sqrt() expects a double value as its argument and is not getting one. The second is that it returns a double result but is not declared as doing so. One way to correct it is:

double s, sqrt (); s = sqrt (2.0); printf ("%g\n", s);

C has two simple rules that control the conversion of function arguments: (1) integer values shorter than an int are converted to int; (2) floating-point values shorter than a double are converted to double. All other values are left unconverted. Making sure that function arguments have the right type is the programmer's responsibility.

Therefore, a programmer who uses a function such as sqrt(), which takes a double parameter, must be careful to pass only float or double arguments. The constant 2 is an int, so its type is wrong.

When the value of a function is used in an expression, that value is automatically converted to an appropriate type. To perform this conversion, however, the compiler must know the type the function actually returns. A function used without any declaration is assumed to return int, so functions that return int need no declaration. But sqrt() returns double, so it must be declared before it can be used successfully.

In practice, C implementations usually provide a file, brought in with an include statement, that contains declarations for library functions such as sqrt(). Declarations are still necessary for programmers who write their own functions, that is, for everyone who writes nontrivial C programs.
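
In ANSI C and later, a function prototype addresses both problems: it tells the compiler the parameter types as well as the return type, so an int argument is converted automatically. The sketch below uses a hypothetical function half() rather than sqrt(), so that it does not depend on the math library.

```c
/* With a prototype in scope, the compiler knows both the
   parameter type and the return type. */
double half(double x);

double half_of_two(void)
{
    /* The int constant 2 is converted to 2.0 because of the prototype. */
    return half(2);
}

double half(double x)
{
    return x / 2.0;
}
```

Without the prototype, the call half(2) would pass an int where a double is expected, which is exactly the failure described above.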

Here is a more spectacular example:

main () { int i; char c; for (i = 0; i < 5; i++) { scanf ("%d", &c); printf ("%d ", i); } printf ("\n"); }

On the surface, this program reads five integers from its standard input and writes 0 1 2 3 4 to its standard output. In fact, it does not always do that. With one compiler, for example, its output is 0 0 0 0 0 1 2 3 4.

Why? Because c is declared as a char rather than an int. When scanf() is asked to read an integer, it expects a pointer to an integer. Here it gets a pointer to a character instead. But scanf() has no way of knowing that it did not get what it expected: it treats its argument as a pointer to an integer and stores an integer there. Since an integer takes up more memory than a character, this overwrites some of the memory near c. Exactly what is near c is the compiler's business; in this case it turned out to be the low-order part of i. Therefore, every time a value was read into c, i was reset to zero. When the program finally reached end of file, scanf() stopped trying to store new values into c, so i could be incremented normally until the loop ended.
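
One way to avoid the problem is to read into a real int and narrow the value afterwards. The sketch below uses sscanf() on a string so that it can be exercised without terminal input; the function name read_small is an assumption for the illustration.

```c
#include <stdio.h>

/* Parses a decimal number from text into a char-sized variable.
   Returns 1 on success, 0 if no number could be read. */
int read_small(const char *text, signed char *out)
{
    int tmp;                          /* %d needs the address of an int */
    if (sscanf(text, "%d", &tmp) != 1)
        return 0;
    *out = (signed char)tmp;          /* narrow only after the read */
    return 1;
}
```

The narrowing step is only meaningful when the value fits in a signed char; a real program would check the range first.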

4.5 Pointer is not an array

C programs usually store a character string as an array of characters terminated by a null character. Suppose we have two such strings s and t, and we want to concatenate them into a single string r. The usual tools are the library functions strcpy() and strcat(). The following obvious method does not work:

char *r; strcpy (r, s); strcat (r, t);

This is because r has not been initialized to point anywhere. Although r is potentially capable of identifying a block of memory, that memory does not exist until it is allocated.

Let's try again, allocating some memory for r:

char r[100]; strcpy (r, s); strcat (r, t);

This works only as long as the strings pointed to by s and t are not too big. Unfortunately, C requires the size of an array to be a constant, so there is no way to be certain that r will be big enough. However, most C implementations have a library function called malloc(), which takes a number and allocates that much memory. There is also usually a function called strlen(), which tells how many characters are in a string. So we can write:

char *r, *malloc (); r = malloc (strlen (s) + strlen (t)); strcpy (r, s); strcat (r, t);

However, this example fails for two reasons. First, malloc() may run out of memory, an event it signals only by quietly returning a null pointer.

Second, and much more important, the call to malloc() does not allocate quite enough memory. A string is terminated by a null character, and strlen() returns the number of characters in its argument string, not counting that null character. Therefore, if strlen (s) is n, s really needs n + 1 characters to hold it. So we must allocate one extra character for r. After adding a check that malloc() succeeded, we get:

char *r, *malloc (); r = malloc (strlen (s) + strlen (t) + 1); if (!r) { complain (); exit (1); } strcpy (r, s); strcat (r, t);
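
The same steps can be packaged as a reusable function. This is a sketch under the assumption that the caller prefers to receive a null pointer on failure instead of having the program exit; the name concat is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding s followed by t,
   or NULL if memory is exhausted. The caller frees the result. */
char *concat(const char *s, const char *t)
{
    char *r = malloc(strlen(s) + strlen(t) + 1);  /* +1 for the '\0' */
    if (r == NULL)
        return NULL;
    strcpy(r, s);
    strcat(r, t);
    return r;
}
```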

4.6 Avoid synecdoche

Synecdoche (sin-ECK-duh-key) is a literary device, somewhat like a simile or a metaphor, in which, according to the Oxford English Dictionary, "a more comprehensive term is used for a less comprehensive or vice versa; as whole for part or part for whole, genus for species or species for genus, etc."

This exactly describes the common C mistake of confusing a pointer with the data it points to. It happens most often with character strings. For example:

char *p, *q; p = "xyz";

Although it is sometimes useful to think of the value of p as the string xyz, this is not really true, and it is important to understand why. The value of p is a pointer to the 0th element of an array of four characters, whose values are 'x', 'y', 'z' and '\0'. So if we now execute:

q = p;

p and q point to the same block of memory. The assignment does not copy the characters in memory: both pointers now hold the address of the same four-character array.

The thing to remember is that copying a pointer does not copy what it points to.

So if we then execute:

q[1] = 'Y';

the memory q points to contains the string xYz. So does the memory p points to, because p and q point to the same memory.
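
The distinction between copying a pointer and copying characters can be checked with a short sketch. It uses a writable array instead of a string literal, because modern C does not allow string literals to be modified; the function names are assumptions for the illustration.

```c
#include <string.h>

/* Returns 1 if a write through q is visible through p (same storage). */
int shares_storage(void)
{
    char buf[] = "xyz";   /* writable array initialized from a literal */
    char *p = buf;
    char *q = p;          /* copies the pointer, not the characters */
    q[1] = 'Y';
    return p[1] == 'Y';
}

/* Returns 1 if a strcpy'd copy is unaffected by a write to the original. */
int copy_is_independent(void)
{
    char src[] = "xyz";
    char dst[sizeof src];
    strcpy(dst, src);     /* copies the characters themselves */
    src[1] = 'Y';
    return dst[1] == 'y';
}
```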

4.7 A null pointer is not an empty string

The result of converting an integer to a pointer is implementation-dependent, with one important exception. That exception is the constant 0, which is guaranteed to convert to a pointer that is unequal to any valid pointer. This value is often written symbolically:

#define NULL 0

but the effect is the same. The important thing to remember is that when 0 is used as a pointer, it must never be dereferenced. In other words, after you assign 0 to a pointer variable, you must not ask what is in the memory it points to. It is valid to write:

if (p == (char *) 0) ...

but it is not valid to write:

if (strcmp (p, (char *) 0) == 0) ...

because strcmp() always looks at the memory addressed by its arguments.

If p is a null pointer, it is not even valid to write:

printf (p);

or

printf ("%s", p);
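
A safe test for "no string at all, or an empty one" checks for the null pointer before looking at any character. This is a minimal sketch; the name is_blank is hypothetical.

```c
#include <stddef.h>

/* Returns 1 if p is a null pointer or points to an empty string.
   The null test comes first, so p is never dereferenced when null. */
int is_blank(const char *p)
{
    return p == NULL || p[0] == '\0';
}
```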

4.8 Integer overflow

The C language definition is very specific about what happens when an integer operation overflows or underflows.

If either operand is unsigned, the result is unsigned and is computed modulo 2^n, where n is the word size. If both operands are signed, the result is undefined.

For example, suppose a and b are two integer variables known to be non-negative, and you want to test whether a + b might overflow. One obvious way looks like this:

if (a + b < 0) complain ();

In general, this does not work.

Once a + b has overflowed, all bets are off as to what the result will be. For example, on some machines an addition operation sets an internal register to one of four states: positive, negative, zero or overflow. On such a machine, the compiler would be within its rights to implement the example above by adding a and b and then checking whether the internal register is in the negative state. If the operation overflowed, the register would be in the overflow state, and the test would fail.

One correct way to do this particular test relies on the fact that unsigned arithmetic is well defined for all values, as are the conversions between signed and unsigned values:

if ((int) ((unsigned) a + (unsigned) b) < 0) complain ();
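
Another approach, not taken from the original text, avoids performing the overflowing addition at all by comparing against INT_MAX from <limits.h>. The sketch keeps the assumption above that both values are non-negative; the function name is hypothetical.

```c
#include <limits.h>

/* Returns 1 if a + b would overflow int.
   Assumes a >= 0 and b >= 0; the test itself never overflows. */
int add_would_overflow(int a, int b)
{
    return a > INT_MAX - b;
}
```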

4.9 Shift operators

Two questions cause trouble for people who use shift operators:

In a right shift, are vacated bits filled with zeroes or with copies of the sign bit? What values are permitted for the shift count?

The answer to the first question is simple but sometimes implementation-dependent. If the item being shifted is unsigned, zeroes are shifted in. If the item is signed, the implementation may fill vacated bits either with zeroes or with copies of the sign bit. If you care about the vacated bits in a right shift, declare the variable as unsigned; you are then entitled to assume that vacated bits are set to zero. The answer to the second question is equally simple: if the item being shifted is n bits long, the shift count must be greater than or equal to zero and strictly less than n. Therefore, it is not possible to shift all the bits out of a value in a single operation.

For example, if an int is 32 bits and n is an int, it is legal to write n << 31 and n << 0, but not n << 32 or n << -1.

Note that a right shift of a signed integer is generally not equivalent to division by a power of two, even if the implementation copies the sign bit into vacated bits. To see this, consider that the value of (-1) >> 1 cannot possibly be zero. [Translator's note: (-1) / 2 evaluates to 0.]
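
The two portable cases can be written down as a sketch. It assumes C99 or later, where integer division truncates toward zero; the function names are hypothetical.

```c
/* Halves a signed value by division, which truncates toward zero
   in C99 and later. */
int half_by_division(int n)
{
    return n / 2;
}

/* Shifts an unsigned value right; vacated bits are always zero.
   count must be less than the width of unsigned. */
unsigned shift_right_unsigned(unsigned n, unsigned count)
{
    return n >> count;
}
```

The result of (-1) >> 1 is deliberately not shown, because it depends on the implementation.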

5 Library functions

Every useful C program uses library functions, because no input or output facilities are built into the language. In this section, we look at some cases where widely used library functions behave in ways the programmer might not expect.

5.1 getc() returns an integer

Consider the following program:

#include <stdio.h>

main () {
    char c;
    while ((c = getchar ()) != EOF)
        putchar (c);
}

This program looks as if it copies its standard input to its standard output. In fact, it does not quite do so.

The reason is that c is declared as a character rather than as an integer. This means that c cannot hold every possible character as well as EOF.

So there are two possibilities. Either some legitimate input character will give c the same value as EOF, or c will never be able to hold the value EOF at all. In the former case, the program will stop copying in the middle of certain files. In the latter case, the program will go into an infinite loop.

Actually, there is a third possibility: the program may work by accident. The C reference manual defines rigorously the result of the expression

((c = getchar ()) != EOF)

Section 6.1 states:

When a longer integer is converted to a shorter integer or to a char, it is truncated on the left; excess bits are simply discarded.

Section 7.14 states:

There are a number of assignment operators, all of which group right to left. All require an lvalue as their left operand, and the type of an assignment expression is that of its left operand. The value is the value stored in the left operand after the assignment has taken place.

The combined effect of these two sections is that the high-order bits of the result of getchar() must be discarded, and the truncated value is then compared with EOF. As part of this comparison, c must be extended to an integer, either by padding on the left with zero bits or by sign extension, as appropriate.

However, some compilers do not implement this expression correctly. They do assign the low-order bits of the value of getchar() to c. But instead of then comparing c with EOF, they compare the entire value of getchar()! A compiler that does this makes the sample program above appear to work "correctly".
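
Declaring the variable as int removes the problem. The sketch below copies between two arbitrary streams instead of stdin and stdout so that it can be exercised on files; the function name copy_stream is an assumption for the illustration.

```c
#include <stdio.h>

/* Copies in to out one character at a time and returns the
   number of characters copied. c is an int, so EOF stays
   distinct from every valid character value. */
long copy_stream(FILE *in, FILE *out)
{
    int c;
    long n = 0;
    while ((c = getc(in)) != EOF) {
        putc(c, out);
        n++;
    }
    return n;
}
```

Calling copy_stream(stdin, stdout) gives the behavior the original program intended.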

5.2 Buffer Output and Memory Allocation

When a program produces output, how important is it that a person can see that output immediately? It depends on the program.

For example, if the output goes to a terminal and asks the person sitting at that terminal a question, it is crucial that the person see the output in order to know what to type. On the other hand, if the output goes to a file and is eventually sent to a line printer, all that matters is that the output gets there eventually.

Arranging for output to appear immediately is usually much more expensive than saving it up for a while and writing it later in one large chunk. For this reason, C implementations usually give the programmer some control over how much output is produced before it is actually written. That control is usually vested in a library function called setbuf(). If buf is a character array of appropriate size, then

setbuf (stdout, buf);

tells the I/O library that all output written to stdout should use buf as an output buffer, and that the output should not actually be written until buf becomes full or the programmer calls fflush(). The appropriate size for such a buffer is defined as BUFSIZ in <stdio.h>.

Thus, the following program illustrates the obvious way to use setbuf() in a program that copies its standard input to its standard output:

#include <stdio.h>

main () {
    int c;
    char buf[BUFSIZ];
    setbuf (stdout, buf);
    while ((c = getchar ()) != EOF)
        putchar (c);
}

Unfortunately, this program is wrong, for a subtle reason.

To see where the trouble lies, ask when the buffer is flushed for the last time. Answer: after the main program has finished, as part of the cleanup the library does before handing control back to the operating system. But by that time, the buffer has already been released!

There are two ways to avoid this problem.

First, make the buffer static, either by declaring it explicitly as static:

static char buf[BUFSIZ];

or by moving the entire declaration outside the main function.

Another possibility is to allocate the buffer dynamically and never free it:

char *malloc (); setbuf (stdout, malloc (BUFSIZ));

Note that in the latter case there is no need to check the return value of malloc(), because if it fails it returns a null pointer. setbuf() accepts a null pointer as its second argument, which makes stdout unbuffered. This will run slowly, but it will run.
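
The static-buffer variant can be isolated in a small helper. This is a sketch only: the function name use_static_buffer is hypothetical, and it must be called before any other operation on the stream.

```c
#include <stdio.h>

/* Gives stream a buffer that outlives every function call.
   Static storage lasts until the program ends, so the library
   can still flush the buffer during exit processing. */
void use_static_buffer(FILE *stream)
{
    static char buf[BUFSIZ];
    setbuf(stream, buf);
}
```

Because the helper owns a single static array, it is meant to be used for one stream only.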

6 Preprocessor

The programs we run are not the programs we write: they are first transformed by the C preprocessor. The preprocessor gives us a way of abbreviating things that matters for two major reasons (and several minor ones).

First, we may want to be able to change all instances of a particular quantity, such as the size of a table, by changing one number and recompiling the program. [9]

Second, we may want to define things that look like functions but do not carry the execution overhead normally associated with a function call. For example, putchar() and getchar() are usually implemented as macros to avoid a function call for every character of input or output.

6.1 Macro is not a function

Because macros can be made to look almost like functions, programmers are sometimes tempted to treat the two as equivalent. Thus one sees definitions like this:

#define max(a, b) ((a) > (b) ? (a) : (b))

Notice all the parentheses in the macro body. They guard against the possibility that a or b is an expression containing operators of lower precedence than >.

The main problem with defining something like max() as a macro is that an operand that is used twice may be evaluated twice. In this example, if a is greater than b, a is evaluated twice: once during the comparison, and again to compute the value of max().

Not only can this be inefficient, it can also be wrong:

biggest = x[0]; i = 1; while (i < n) biggest = max (biggest, x[i++]);

This would work fine if max() were a true function, but it fails when max() is a macro. Suppose, for example, that x[0] is 2, x[1] is 3, and x[2] is 1. Look at what happens during the first iteration of the loop. The assignment statement expands to:

biggest = ((biggest) > (x[i++]) ? (biggest) : (x[i++]));

First, biggest is compared with x[i++]. Since i is 1 and x[1] is 3, the relation is false. As a side effect, i is incremented to 2.

Because the relation is false, the value of x[i++] is assigned to biggest. However, i is now 2, so the value assigned to biggest is the value of x[2], which is 1.

One way to avoid these problems is to make sure that the arguments to the max() macro have no side effects:

biggest = x[0]; for (i = 1; i < n; i++) biggest = max (biggest, x[i]);
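
Another way is to make max a true function, which evaluates each argument exactly once. The sketch below uses the hypothetical names max_int and largest, and assumes the array holds at least one element.

```c
/* A true function evaluates each argument exactly once. */
static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Returns the largest of x[0..n-1]; n must be at least 1. */
int largest(const int *x, int n)
{
    int biggest = x[0];
    int i = 1;
    while (i < n)
        biggest = max_int(biggest, x[i++]);   /* safe: i++ happens once */
    return biggest;
}
```

With the values 2, 3, 1 from the example above, this version yields 3.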

Here is another example of the hazards of mixing macros and side effects. This is the definition of the putc() macro from the Eighth Edition of UNIX:

#define putc(x, p) (--(p)->_cnt >= 0 ? (*(p)->_ptr++ = (x)) : _flsbuf (x, p))

The first argument to putc() is a character to be written to a file; the second is a pointer to an internal data structure that describes the file. Notice that the first argument, which could easily be something like *z++, is carefully evaluated only once even though it appears twice in the macro body, while the second argument is evaluated twice. (In the macro body, x appears twice, but because the two occurrences are on opposite sides of the : operator, exactly one of them is evaluated in any single use of putc().) Since it is unusual for the file argument to putc() to have side effects, this rarely causes trouble. Nevertheless, it is documented in the user manual: "Because it is implemented as a macro, putc treats a stream argument with side effects improperly. In particular, putc(c, *f++) doesn't work sensibly." Note that putc(*c++, f) works fine in this implementation.

Some C implementations are less careful. For instance, not all of them handle putc(*c++, f) correctly. As another example, consider the toupper() function that appears in many C libraries. It converts a lower-case letter to the corresponding upper-case letter and leaves other characters unchanged. If we assume that all lower-case letters and all upper-case letters are contiguous (with a possible gap between the two cases), we get a function like this:

toupper (c) { if (c >= 'a' && c <= 'z') c += 'A' - 'a'; return c; }

In most C implementations the overhead of the call is much greater than the cost of the actual computation, so the implementor is tempted to make it a macro:

#define toupper(c) ((c) >= 'a' && (c) <= 'z' ? (c) + ('A' - 'a') : (c))

In many cases this is indeed faster than the function. However, it will surprise anyone who tries to write toupper (*p++).
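
Written as a function, the conversion tolerates arguments with side effects. The sketch below keeps the contiguous-letters assumption stated above (true for ASCII); the names to_upper and upcase_copy are hypothetical and chosen to avoid clashing with the library's toupper().

```c
/* Function version: c is evaluated once, so side effects in the
   argument are safe. Assumes the letters are contiguous, as in ASCII. */
static int to_upper(int c)
{
    if (c >= 'a' && c <= 'z')
        return c + 'A' - 'a';
    return c;
}

/* Copies src to dst, converting lower-case letters to upper case.
   dst must have room for strlen(src) + 1 characters. */
void upcase_copy(char *dst, const char *src)
{
    while (*src != '\0')
        *dst++ = (char)to_upper(*src++);   /* each ++ happens exactly once */
    *dst = '\0';
}
```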

Another danger of macros is that they can generate very large expressions. For example, look again at the definition of max():

#define max(a, b) ((a) > (b) ? (a) : (b))

Suppose we want to use this definition to find the largest of a, b, c and d. If we write the obvious:

max (a, max (b, max (c, d)))

it expands to:

((a) > (((b) > (((c) > (d) ? (c) : (d))) ? (b) : (((c) > (d) ? (c) : (d))))) ? (a) : (((b) > (((c) > (d) ? (c) : (d))) ? (b) : (((c) > (d) ? (c) : (d))))))

which is surprisingly large. We can make it a little shorter by balancing the operands:

max (max (a, b), max (c, d))

This gives:

((((a) > (b) ? (a) : (b))) > (((c) > (d) ? (c) : (d))) ? (((a) > (b) ? (a) : (b))) : (((c) > (d) ? (c) : (d))))

Even so, it seems easier and clearer to write:

biggest = a; if (biggest < b) biggest = b; if (biggest < c) biggest = c; if (biggest < d) biggest = d;

6.2 Macro is not type definition

A common use of macros is to make several things in different places have the same type:

#define FOOTYPE struct foo
FOOTYPE a; FOOTYPE b, c;

This lets the programmer change the types of a, b and c by changing only one line of the program, even if a, b and c are declared in widely separated places.

Using a macro definition for this has the advantage of portability: every C compiler supports it. Most C compilers also support another way of doing the same thing:

typedef struct foo FOOTYPE;

This defines FOOTYPE as a new type that is equivalent to struct foo.

These two ways of naming a type may appear equivalent, but typedef is more flexible. Consider, for example, the following:

#define T1 struct foo *
typedef struct foo *T2;

These definitions make T1 and T2 conceptually equivalent to a pointer to a struct foo. But look what happens when we try to declare more than one variable at a time:

T1 a, b; T2 c, d;

The first declaration expands to:

struct foo * a, b;

This defines a as a pointer to a structure, but defines b as a structure (not a pointer). The second declaration, in contrast, defines both c and d as pointers to structures, because T2 behaves as a true type.
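
The difference can be verified with sizeof. In this sketch, struct foo is given a hypothetical 64-byte member so that the structure is clearly larger than a pointer; the function name is also an assumption.

```c
/* Hypothetical structure, made large enough to differ from a pointer. */
struct foo { char payload[64]; };

#define T1 struct foo *
typedef struct foo *T2;

static T1 a, b;   /* expands to: static struct foo *a, b;  b is a struct */
static T2 c, d;   /* both c and d are pointers to struct foo */

/* Returns 1 if b is a whole structure while a, c and d are pointers. */
int macro_and_typedef_differ(void)
{
    return sizeof b == sizeof(struct foo)
        && sizeof a == sizeof(struct foo *)
        && sizeof c == sizeof(struct foo *)
        && sizeof d == sizeof(struct foo *);
}
```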

7 portability defect

C has been implemented by many people to run on many machines. Indeed, one of the reasons for writing programs in C in the first place is that they can be moved easily from one programming environment to another.

However, because there are so many implementors, they do not all talk to one another. Moreover, different systems have different requirements, so it is reasonable to expect the C implementation on one machine to differ slightly from that on another.

Because many early C implementations were associated with the UNIX operating system, the nature of many library functions was shaped by that system. When people began to implement C on other systems, they tried to make the library behave in ways similar to its behavior on UNIX.

They did not always succeed. What is more, as many people started from different versions of the UNIX system, the behavior of some library functions inevitably diverged. Today, a C programmer who wants to write programs that are useful in someone else's environment must know about many of these subtle differences.

7.1 What are the names in a name?

Some C compilers treat all the characters of an identifier as significant. Others ignore characters beyond some limit when storing identifiers. The object program produced by a C compiler must usually be processed by a loader in order to access library subroutines. Loaders, in turn, often impose their own restrictions on the names they can handle. One common loader restriction is that letters in external names must be upper case only. Faced with such a restriction, a C implementation will force all external names to upper case. Restrictions of this sort are described in Section 2.1 of the C reference manual:

An identifier is a sequence of letters and digits; the first character must be a letter. The underscore _ counts as a letter. Upper-case and lower-case letters are different. No more than the first eight characters are significant, although more may be used. External identifiers, which are used by various assemblers and loaders, are more restricted:

Here, the reference manual goes on to give examples of implementations that restrict external identifiers to a single case, or to fewer than eight characters, or both.

Because of all this, it is important to choose identifiers carefully in a program intended to be portable. Naming two subroutines print_fields and print_float, for example, would not be a good idea.

As a striking example, consider the following function:

char *Malloc (unsigned n) { char *p, *malloc (); p = malloc (n); if (p == NULL) panic ("out of memory"); return p; }

This function is a simple way of making sure that running out of memory does not go undetected. The programmer allocates memory by calling Malloc() instead of malloc(). If malloc() ever fails, panic() is called to print an appropriate error message and terminate the program.

Consider, however, what happens when this function is used on a system that ignores case distinctions in external names. There, the names malloc and Malloc are equivalent. In other words, the library function malloc() is effectively replaced by the Malloc() function above, which then calls itself when it calls malloc(). The result, of course, is that the first attempt to allocate memory ends in a recursion loop and consequent mayhem, even though the function works on an implementation that preserves case distinctions.

7.2 How big is an integer?

C provides the programmer with three sizes of integer: ordinary, short and long, as well as characters, which behave like small integers. The language definition does not guarantee much about the relative sizes of the various kinds of integer:

The four sizes of integer are non-decreasing. An ordinary integer is large enough to contain any array subscript. The size of a character is natural for the particular hardware.

Most modern machines have 8-bit characters, though a few have 7-bit or 9-bit characters. So characters are usually 7, 8 or 9 bits.

Long integers are usually at least 32 bits, so that a long integer can be used to represent the size of a file.

Ordinary integers are usually at least 16 bits, because smaller integers would restrict the maximum size of an array too much.

Short integers are almost always exactly 16 bits.

What does this mean in practice? The most important point is that no particular precision can be counted on. Informally, you can probably assume 16 bits for a short or an ordinary integer and 32 bits for a long integer, but not even those sizes are guaranteed. You can certainly use ordinary integers for table sizes and subscripts, but what about a variable that must be able to hold values up to ten million?

A more portable approach is to define a "new" type:

typedef long tenmil;

Now you can use this type to declare a variable of the required width and know that, at worst, you will have to change a single type definition to give all such variables the right type.
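
The assumption behind such a typedef can be checked at compile time with <limits.h>. This is a sketch: the preprocessor guard and the function ten_million() are additions made for the illustration, not part of the original text.

```c
#include <limits.h>

typedef long tenmil;   /* must hold values up to ten million */

/* Compile-time guard: the build fails if long is too narrow. */
#if LONG_MAX < 10000000L
#error "tenmil needs a wider underlying type"
#endif

/* Returns ten million as a tenmil. */
tenmil ten_million(void)
{
    return 10000000L;
}
```

If the guard ever fires, only the typedef line needs to change.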

7.3 The character is a symbol or no symbol?

Most modern computers support 8-bit characters, so most modern C compilers implement characters as 8-bit integers. However, not all compilers interpret those 8-bit quantities in the same way.

The issue becomes important only when a char quantity is converted to a larger integer. In the other direction the result is well defined: excess bits are simply discarded. But a compiler converting a char to an int has a choice: should it treat the char as a signed or an unsigned quantity? If the former, it extends the char to an int by replicating the sign bit; if the latter, it fills the extra bit positions with zeroes. The outcome of this decision matters to anyone who handles characters with the high-order bit set. It determines whether 8-bit characters range from -128 to 127 or from 0 to 255. This, in turn, affects the way the programmer designs things like hash tables and translation tables.

If you care whether a character value with the high-order bit set is treated as a negative number, you should declare it explicitly as unsigned char. Such a value is guaranteed to be zero-extended when converted to an integer, whereas an ordinary char variable may be signed in one implementation and unsigned in another.

Incidentally, it is a common misconception that if c is a character variable, its unsigned integer equivalent can be obtained by writing (unsigned) c. This fails because a char quantity is converted to int before any operator is applied to it, even a cast. Thus c is first converted to a signed integer and then to an unsigned integer, with possibly unexpected results.

The right way to do it is to write (unsigned char) c.
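
The two casts can be compared side by side. The sketch uses signed char explicitly, so that the result does not depend on whether plain char is signed in a given implementation; the function names are hypothetical.

```c
#include <limits.h>

/* Converts by way of int: a negative character value becomes
   a very large unsigned value. */
unsigned via_unsigned(signed char c)
{
    return (unsigned)c;
}

/* Converts to unsigned char first: the result is always
   in the range 0..UCHAR_MAX. */
unsigned via_unsigned_char(signed char c)
{
    return (unsigned char)c;
}
```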

7.4 Right shift is a symbol or no symbol?

This bears repeating: a program that cares how right shifts are done had better declare the quantities being shifted as unsigned.

7.5 How does division truncate?

Suppose we divide a by b to get a quotient q and a remainder r:

q = a / b; r = a % b;

For the moment, suppose also that b > 0.

What relationships should hold among a, b, q and r?

First, and most important, we want q * b + r == a, because this is the relation that defines the remainder. Second, if we change the sign of a, we want the sign of q to change but not its absolute value. Third, we want to guarantee that r >= 0 and r < b.

These three properties are clearly desirable for integer division and remainder. Unfortunately, they cannot all hold at once.

Consider 3 / 2, which gives a quotient of 1 and a remainder of 1. This satisfies the first property. What about -3 / 2? The second property suggests that the quotient should be -1, but then the remainder must also be -1, which violates the third property. Alternatively, we can satisfy the third property by making the remainder 1, in which case the first property demands that the quotient be -2. This violates the second property.

Therefore C, like any other language that implements truncating integer division, must give up at least one of these three principles.

Most programming languages give up the third property, requiring instead that the remainder have the same sign as the dividend. This preserves the first and second properties. Most C implementations do the same in practice.

However, the C language definition guarantees only the first property, along with |r| < |b| and r >= 0 whenever a >= 0 and b > 0. This is less restrictive than either the second or the third property, and it permits some rather strange implementations that are unlikely to occur in practice (such as one that always truncates in the direction away from zero).

Despite this sometimes unwanted flexibility, the C definition is enough to let us make integer division do what we want, provided we know what we want. For example, suppose we have a number n that represents some function of the characters in an identifier, and we want a hash table index h such that 0 <= h < HASHSIZE. If we know that n is never negative, we can simply write:

h = n % HASHSIZE;

However, if n might be negative, this is not good enough, because h might also be negative. But we know that h > -HASHSIZE, so we can write:

h = n % HASHSIZE; if (h < 0) h += HASHSIZE;

Better yet, declare n as unsigned.
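
As a function, the adjustment looks like the sketch below. The name bucket and the hashsize parameter are assumptions for the illustration; the code works whichever sign the implementation gives to the remainder of a negative dividend.

```c
/* Maps any int n to a bucket index in 0..hashsize-1.
   hashsize must be greater than zero. */
int bucket(int n, int hashsize)
{
    int h = n % hashsize;   /* may be negative when n is negative */
    if (h < 0)
        h += hashsize;
    return h;
}
```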

7.6 How big is a random number?

This ambiguity about size has also affected library design. When the only C implementation ran on the PDP-11 [10], there was a function called rand() that returned a (pseudo-)random non-negative integer. PDP-11 integers were 16 bits long including the sign bit, so rand() returned an integer between 0 and 2^15 - 1.

When C was implemented on the VAX-11, integers became 32 bits long. So what was the range of the rand() function on the VAX-11?

For their system, the people at the University of California took the view that rand() should range over all possible non-negative integers, so their version of rand() returns an integer between 0 and 2^31 - 1.

The people at AT&T, on the other hand, decided that a PDP-11 program that expected the result of rand() to be less than 2^15 would be easier to port to the VAX-11 if rand() returned a value between 0 and 2^15 there, too.

As a result, it is difficult to write a program that uses rand() without depending on the implementation.
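
ANSI C later added the macro RAND_MAX to <stdlib.h>, which states the largest value rand() can return. The sketch below relies only on that macro; the function name random_below is hypothetical, and the rejection loop is one possible design, not something taken from the original text.

```c
#include <stdlib.h>

/* Returns a pseudo-random integer in 0..n-1, for 0 < n <= RAND_MAX.
   Relies only on RAND_MAX, not on an assumed width of rand(). */
int random_below(int n)
{
    /* limit is a multiple of n; values at or above it are rejected
       so that every result is equally likely. */
    int limit = RAND_MAX - (RAND_MAX % n);
    int r;
    do {
        r = rand();
    } while (r >= limit);
    return r % n;
}
```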

7.7 Case conversion

The toupper() and tolower() functions have a similar history. They were originally written as macros:

#define toupper(c) ((c) + 'A' - 'a')
#define tolower(c) ((c) + 'a' - 'A')

Given a lower-case letter as input, toupper() yields the corresponding upper-case letter; tolower() does the opposite. Both macros depend on the implementation's character set, in that they require the difference between an upper-case letter and the corresponding lower-case letter to be the same constant for all letters. This assumption holds for both the ASCII and EBCDIC character sets, and is probably not too dangerous, because the non-portable macro definitions can be confined to the single file that contains them.

These macros do have one drawback: when given something that is not a letter of the appropriate case, they return garbage. Thus the following innocent fragment, which converts a file to lower case, does not work with these macros:

int c; while ((c = getchar ()) != EOF) putchar (tolower (c));

Instead, one must write:

int c; while ((c = getchar ()) != EOF) putchar (isupper (c) ? tolower (c) : c);

At one point, someone in the UNIX development organization at AT&T noticed that most uses of toupper() and tolower() were preceded by tests to make sure their arguments were appropriate. He considered rewriting the macros this way:

#define toupper(c) ((c) >= 'a' && (c) <= 'z' ? (c) + 'A' - 'a' : (c))
#define tolower(c) ((c) >= 'A' && (c) <= 'Z' ? (c) + 'a' - 'A' : (c))

but realized that this would cause c to be evaluated anywhere from one to three times per call, which would wreck expressions such as toupper (*p++). Instead, he decided to rewrite toupper() and tolower() as functions. toupper() then looked something like this:

int toupper (int c) { if (c >= 'a' && c <= 'z') return c + 'A' - 'a'; return c; }

tolower() looked similar.

This change gained robustness at the cost of a function call for each use of these functions. Our hero realized that some people might not be willing to pay that cost, so he reintroduced the macros under new names:

#define _toupper(c) ((c) + 'A' - 'a')
#define _tolower(c) ((c) + 'a' - 'A')

This gave users a choice between convenience and speed.

There was just one problem: the people at Berkeley never followed suit, nor did some other C implementors. This means that a program written on an AT&T system that uses toupper() or tolower() and passes an argument that is not a letter of the appropriate case may not work on another C implementation.

For someone who does not know this history, such a failure can be very hard to trace.

7.8 Free first, then reallocate

Most C implementations provide users with three memory allocation functions: malloc(), realloc(), and free(). Calling malloc(n) returns a pointer to n characters of newly allocated memory for the programmer to use. Passing free() a pointer to memory previously returned by malloc() makes that memory available for reuse. Calling realloc() with a pointer to an allocated area and a new size stretches or shrinks that area to the new size, possibly copying it in the process.

Or so one might think; the truth is slightly more subtle. Here is an excerpt from the description of realloc() that appears in the System V Interface Definition:

realloc changes the size of the block pointed to by ptr to size bytes and returns a pointer to the (possibly moved) block. The contents will be unchanged up to the lesser of the new and old sizes.

The reference manual for the Seventh Edition of the UNIX system contains a copy of the paragraph quoted above. It also contains a second paragraph describing realloc():

realloc also works if ptr points to a block freed since the last call of malloc, realloc, or calloc; thus sequences of free, malloc, and realloc can exploit the search strategy of malloc to do storage compaction.

Thus, the following fragment is legal under the Seventh Edition of UNIX:

    free(p);
    p = realloc(p, newsize);

This idiosyncrasy survives in systems derived from the Seventh Edition: it is possible to free a storage area and then reallocate it. By implication, freeing memory on these systems is guaranteed not to change its contents until the next time memory is allocated. Thus, on these systems, one can free all the elements of a linked list by the following curious means:

    for (p = head; p != NULL; p = p->next)
        free((char *) p);

without worrying that the call to free() might invalidate p->next.

Needless to say, this technique is not recommended, if only because not all C implementations preserve memory long enough after it has been freed. However, the Seventh Edition manual leaves one thing unstated: the original implementation of realloc() actually required that the area given to it for reallocation be freed first. For this reason, there are C programs in circulation that free memory first and then reallocate it, and this is something to watch for when moving a C program to another implementation.

7.9 An example of portability problems

Let us look at a problem that has been solved many times by many people. The following program takes two arguments: a long integer and a (pointer to a) function. It converts the integer to decimal and calls the given function with each character of the decimal representation.

    void printnum(long n, void (*p)())
    {
        if (n < 0) {
            (*p)('-');
            n = -n;
        }
        if (n >= 10)
            printnum(n / 10, p);
        (*p)(n % 10 + '0');
    }

This program is fairly straightforward. First we check whether n is negative; if so, we print a sign and make n positive. Next, we test whether n >= 10. If so, its decimal representation has two or more digits, so we call printnum() recursively to print all but the last digit. Finally, we print the last digit.

This program, for all its simplicity, has several portability problems. The first is the method it uses to convert the low-order decimal digit of n to character form. Using n % 10 to get the value of the low-order digit is fine, but adding '0' to it to get the corresponding character is not. This addition assumes that the machine's character set holds the digits in sequence with no gaps, so that '0' + 5 has the same value as '5', and so on. Although this assumption is true of the ASCII and EBCDIC character sets, it might not be true on some machines. The way to avoid the problem is to use a table:

    void printnum(long n, void (*p)())
    {
        if (n < 0) {
            (*p)('-');
            n = -n;
        }
        if (n >= 10)
            printnum(n / 10, p);
        (*p)("0123456789"[n % 10]);
    }

The next problem arises when n < 0. The program prints a minus sign and sets n to -n. This assignment might overflow, because two's complement machines generally allow more negative values than positive values to be represented. In particular, if a (long) integer has k bits plus one extra bit for the sign, then -2^k can be represented but 2^k cannot.

There are several ways around this problem. The most obvious one is to assign n to an unsigned long value and be done with it. However, some C compilers do not implement unsigned long, so let us see how to get along without it.

On both one's complement and two's complement machines, changing the sign of a positive integer is guaranteed not to overflow. The only trouble comes when changing the sign of a negative value. Therefore, we can avoid trouble by making sure that we never try to make n positive.

Of course, once we have printed the sign of a negative value, we would like to treat negative and positive numbers the same way. The way to do that is to force n to be negative after printing the sign, and to do all our arithmetic with negative values. If we do this, we must ensure that the part of the program that prints the sign is executed only once; the easiest way is to split the program into two functions:

    void printnum(long n, void (*p)())
    {
        void printneg();    /* declared here because it is defined below */

        if (n < 0) {
            (*p)('-');
            printneg(n, p);
        } else
            printneg(-n, p);
    }

    void printneg(long n, void (*p)())
    {
        if (n <= -10)
            printneg(n / 10, p);
        (*p)("0123456789"[-(n % 10)]);
    }

printnum() now only checks whether the number to be printed is negative; if it is, it prints a minus sign. In either case, it calls printneg() with the negative of the absolute value of n. We have also changed the body of printneg() to cater for the fact that n will always be negative or zero.

Or have we? We have used n / 10 and n % 10 to obtain the leading digits and the trailing digit of n (with suitable sign changes). Recall that integer division behaves in an implementation-dependent way when one of the operands is negative. For that reason, n % 10 might actually be positive! In that case, -(n % 10) would be negative, and we would index outside our array of digit characters.

To cater for this problem, we create two temporary variables to hold the quotient and the remainder. After doing the division, we check whether the remainder is in the correct range and, if it is not, adjust both variables. printnum() has not changed, so we show only printneg():

    void printneg(long n, void (*p)())
    {
        long q;
        int r;

        q = n / 10;
        r = n % 10;
        if (r > 0) {        /* remainder has the wrong sign: adjust both */
            r -= 10;
            q++;
        }
        if (n <= -10)
            printneg(q, p);
        (*p)("0123456789"[-r]);
    }

8 This space available

There are many ways for C programmers to go astray that this article has not mentioned. If you find one, please contact the author. It may well be included in a later revision, with a footnote of acknowledgment.

References

"The C Programming Language" (Kernighan and Ritchie, Prentice-Hall 1978) is the most authoritative work on C. It contains an excellent tutorial aimed at people who are already familiar with programming in other high-level languages, and a reference manual that describes the entire language concisely. Although the language has undergone a number of changes since 1978, this book is still the final word on many topics. It also contains the "C Reference Manual" mentioned many times in this article.

"The C Puzzle Book" (Feuer, Prentice-Hall 1982) is an unusual way to sharpen one's skills. It is a collection of puzzles (and answers) whose solutions test the reader's knowledge of some corner of the C language.

"C: A Reference Manual" (Harbison and Steele, Prentice-Hall 1984) is intended mainly as a reference for implementers. Other users may also find it useful, particularly because of its detailed cross-references.

footnote

When reprinting, please cite the original address: https://www.9cbs.com/read-120973.html
