C++ from Scratch (5)
What is a pointer?
This article covers the key to C++, the pointer type, and explains two very meaningful concepts: static and dynamic.
Array
As mentioned earlier, C++ can access memory, but according to the previous description memory can only be operated on through variables, which means the first address of a block of memory must first be bound to a variable name. This is very inconvenient. For example, suppose 100 blocks of memory record the salaries of 100 workers, and every salary is now to be raised by 5%. To know a worker's salary after the raise, we define a variable float a1; to record the salary of the first worker and then execute the statement a1 += a1 * 0.05f; after which a1 holds the raised salary. Since there are 100 workers, 100 variables are needed to record the 100 salaries separately, so the assignment statement has to be written 100 times, each time with a different variable name. The definition float a1; likewise has to be repeated 100 times (with a different name each time), which is needless work.
A better idea is to request 100 * 4 = 400 bytes of contiguous memory from the operating system. To modify the salary of worker i, start from the first address and skip 4 * i bytes (because a float occupies 4 bytes). To provide this capability, C++ offers a type: the array. An array is a group of numbers, each of which is called an element of the array. Every element must have the same size (because elements are located by fixed offsets), so an array represents a group of numbers of the same type that must be stored contiguously in memory. To state that a variable is of array type when defining it, put square brackets after the variable name, write the number of array elements to allocate inside the brackets, and end with a semicolon.
The 100 salaries above can therefore be recorded with the following definition: float a[100]; This defines a variable a, allocates 100 * 4 = 400 bytes of contiguous memory (because one float element occupies 4 bytes), and binds the first address of that memory to the variable name a. The type of a is called an array of 100 float elements. This type interprets the memory bound to a as follows (a type is a rule for interpreting the contents of memory): the address that a maps to is the first address of a contiguous block of memory that is just large enough to hold 100 numbers of type float. Accordingly, the earlier definition float b; can be regarded as defining a float array b with a single element.
To access one element of the array, follow the variable name with square brackets containing a number. The number must not be a floating-point number; it must be an integer, i.e. a number in original-code or complement-code format. For example, a[5 + 3] += 32; increases the value of element 5 + 3 of the array a by 32. Also: long c = 23; float b = a[(c - 3) / 5] + 10, d = a[c - 23]; Here the value of b is the value of the element numbered 4 of a plus 10, and the value of d is the value of the element numbered 0 of a. That is, array elements in C++ are numbered from 0: a[0] is the value of the first element of a. The numbering starts at 0 because the address bound to a plus 0 * 4 is the address of the first element.
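The salary example can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names raiseSalaries and byteOffset are hypothetical and do not come from the article, and the array is passed to the helpers through a pointer, a notion the article introduces later.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: raise every salary in a contiguous array by 5%,
// using one loop instead of 100 separately named variables.
void raiseSalaries(float* salaries, std::size_t count) {
    assert(salaries || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        salaries[i] += salaries[i] * 0.05f;
    }
}

// Hypothetical helper: distance in bytes from the first element to element i.
// It shows that element i sits at "first address + i * sizeof(float)".
std::size_t byteOffset(const float* base, std::size_t i) {
    const char* first = reinterpret_cast<const char*>(base);
    const char* element = reinterpret_cast<const char*>(&base[i]);
    return static_cast<std::size_t>(element - first);
}
```

With float a[100]; filled in, raiseSalaries(a, 100); updates all 100 elements, and byteOffset(a, 3) equals 3 * sizeof(float), which is 12 on platforms where a float occupies 4 bytes.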
Note that you cannot write long a[0]; Defining an array of 0 elements is meaningless and the compiler reports an error. It can, however, be written inside a structure, class, or union. This is a technique from the C era for implementing variable-length structure types, and it is explained in "C++ from Scratch (9)".
Note also that a variable cannot be written inside the square brackets when defining an array: long b = 10; float a[b]; is an error, because when this code is compiled the value of b is not known, so the memory cannot be allocated. But b = 10 is written right there, so why is the value of b unknown? Because the address that b maps to cannot be known at compile time. When compiling, the compiler only binds b to an offset, not to a real address. For example b may map to Base - 54, where Base is the tail address of a large block of memory requested from the operating system when the program starts executing. Because Base may change, the actual address cannot be known (on Windows, because virtual address spaces are used, the actual virtual address can be obtained, but it is still not the real address), so the value of a variable cannot be known at compile time.
Still, couldn't the compiler infer from the preceding long b = 10; that the value at Base - 54 is 10? The point is that when the compiler sees long b = 10; it only knows that it must generate an instruction that puts 10 into the memory at Base - 54. It does not look any further (and has no need to), so even though long b = 10; is written, the compiler cannot know the value of b.
Saying above that the array is a type is actually not accurate. It should be called a type modifier, which defines a rule for modifying a type. Type modifiers are described in detail later.
String
In "C++ from Scratch (2)" it was explained that the ASCII code of a character is obtained by putting single quotes on both sides of the character, so 'A' is equivalent to 65. To represent several characters, use double quotes, as in "ABC". Recording a character means recording its ASCII code, and ASCII codes fit within the range -128 to 127 of a char, so a char variable can record one ASCII code, and to record "ABC" it is natural to use a char array, as follows: char a = 'A'; char b[10]; b[0] = 'A'; b[1] = 'B'; b[2] = 'C'; The value of a is 65, b[0] is 65, b[1] is 66, and b[2] is 67.
b is an array of 10 elements but records a string only 3 characters long, so given the address of b, how do we know up to which element the characters are valid? If b[4] above is never assigned, how do we know that b[4] should not be interpreted as a character? The rule is to check the value of each char element starting from element 0 until an element with the value 0 is found (0 has no corresponding character in the ASCII table); all elements before it are treated as characters to be interpreted with the ASCII table. So b[3] = 0; should also be written to mark the end of the string. This rule is widely used: all string operations provided by the C runtime library interpret strings according to it (for the C runtime library, see "C++ from Scratch (19)"). Recording a string this way is cumbersome, however. A string of n characters needs n assignment statements, the element after the last character must also be assigned 0, and forgetting to do so causes serious problems.
For this reason C++ provides a shorthand: char b[10] = "ABC"; This is equivalent to all the work done above. "ABC" is a number of an address type (more precisely, an initialization expression, described in "C++ from Scratch (9)"). Its type is char[4], an array of 4 char elements, where the extra last element holds 0 to mark the end of the string.
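A minimal sketch comparing the element-by-element assignments with the shorthand initialization. The function name manualMatchesShorthand is hypothetical, and the sketch assumes an ASCII execution character set, where 'A' is 65.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical check: the manual assignments and the shorthand
// initialization produce the same zero-terminated string.
bool manualMatchesShorthand() {
    char manual[10];
    manual[0] = 'A';
    manual[1] = 'B';
    manual[2] = 'C';
    manual[3] = 0;               // terminating 0 marks the end of the string

    char shorthand[10] = "ABC";  // initialization, allowed only at definition

    assert(sizeof("ABC") == 4);  // 3 characters plus the terminating 0
    return std::strcmp(manual, shorthand) == 0
        && std::strlen(shorthand) == 3;
}
```

std::strcmp and std::strlen are C runtime library functions that rely on the terminating 0, so if manual[3] = 0; were left out they would read past the intended end of the string.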
Note that b is char[10] while "ABC" returns char[4]. The types do not match, so an implicit type conversion would seem to be required, but no conversion takes place. Instead a series of assignments is performed (just like the work done earlier). This is hard-wired in C++ and is called initialization, and it applies only when the array is defined. The following is therefore an error: char b[10]; b = "ABC"; Even char b[4]; b = "ABC"; is still an error, because the type of b is an array, which denotes multiple elements, and assignment to multiple elements is not defined. Likewise float d[4]; float dd[4] = d; is an error, because it is not defined whether the elements of d are placed into the corresponding elements of dd in order or in reverse order. A variable of array type cannot be assigned to.
As the number of characters in use grew (originally only English letters, now also Chinese, Japanese, and many other scripts), the char type used so far could represent only 255 characters (0 marks the end of the string). So the multi-byte string appeared. Text files recorded in this representation are said to be in MBCS format, while the original strings represented with one char per character are called single-byte strings (SingleByte), and text files recorded that way are said to be in ANSI format. Because a char can hold a negative number, when a character is extracted from a multi-byte string and the value of the element is negative, that element and the next char element are combined into a number of type short, which is then interpreted according to the encoding rules of the character set in use (a coding table that plays the same role as the ASCII table) to get the corresponding character.
The above "ABC" returns the string represented by the multi-character format, because there is no Chinese characters or special symbols, it seems to be represented by the single-by-word section, but if: char b [10] = "ab Han C ";, then B [2] is -70, b [5] is 0, not because 4 characters of the 4 characters are 0, because" Han "characters take up two bytes. The harm of the above multi-byte format is the length of each character is not fixed. If you want to take the value of the third character in the string, you must check the value of each element from the beginning and cannot be 3 multiplying one. The fixed length reduces the processing speed of the string, and when the string is displayed, it is lowered whether the value of the current character is less than zero, so it has introduced the third character representation format: Wide Birace string (WideChar " ), Text files recorded in this representation are called unicode formats. Its difference from the multiberi is whether this character can be expressed in ASCII, using a number of short types, that is, the length of each character is fixed to 2 bytes, and C provides support. Short B [10] = L "Ab Ham C"; add "L" in front of the double quotation marks (must be capitalized, can't write) tells the compiler that the characters in this double quotes should be encoded using the Unicode format, so The above B array is to use Unicode to log string.
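A small sketch of the wide-character form. It uses the standard type wchar_t instead of short, since a current compiler is not expected to accept a short array initialized from an L"..." literal; that substitution, and the helper name wideLength, are assumptions of this sketch rather than something the article states. The literal is kept to ASCII characters so that the result does not depend on the encoding of the source file.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: number of characters before the terminating 0
// in a wide-character string.
std::size_t wideLength(const wchar_t* text) {
    assert(text != 0);
    return std::wcslen(text);
}
```

For example, wideLength(L"ABC") is 3, and sizeof(L"ABC") is 4 * sizeof(wchar_t): every character, including the terminating 0, occupies the same fixed number of bytes, so the n-th character can be located by multiplication instead of scanning from the start.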
Similarly there is: short c = L'A'; where c is 65. If the above is not fully clear, it does not matter; the use of strings will become clearer through the examples given later.
Static and dynamic
The above still does not solve the fundamental problem: C++ can only access memory through variables, and before a block of memory can be accessed the corresponding mapping must first be established, i.e. a variable must be defined. What is wrong with that? Let us first understand what static and dynamic mean.
A cashier writes invoices by hand, using pre-printed invoice forms for every customer. Each form has only four rows for recording product names, so when a customer buys more than four kinds of goods at once, two or more invoices must be written. The number of rows on the form is static: no matter which customer buys what, the form always has four rows for product names. A supermarket cashier instead enters the product names and quantities into a computer and prints an invoice for the customer, so invoices for different customers may differ in length (some customers buy more, some less). The length of this invoice is dynamic: different customers buying at different times may get invoices of different lengths.
If a program always requests a fixed amount of memory no matter how many times it is run, that memory is said to be statically allocated. The memory that the compiler allocates on the stack when a variable is defined, as described earlier, is statically allocated. If each run of the program may request a different amount of memory depending on user input, that memory is said to be dynamically allocated; the allocation from the heap described later is dynamic allocation. Clearly dynamic is more efficient than static (the length of the invoice is fully used), but it demands more: it needs a computer and a printer, and a cashier with more skill (able to operate the computer). Static demands less: only invoice forms and a cashier who can write.
Similarly, static allocation uses memory less efficiently and less flexibly, but the code is easy to write and runs faster. Dynamic allocation uses memory efficiently, but the code is more complex because it has to manage the memory (allocation and release), and this management makes the program run slower and the code longer. The meaning of static and dynamic goes further than this; hard coding and soft coding, tight coupling and loose coupling are all deeper forms of static and dynamic.
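The invoice analogy can be sketched in code roughly as follows. The names kSlots, totalStatic, and totalDynamic are hypothetical, and the sketch already uses new[] and delete[], which the article only introduces in the section on heap allocation.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kSlots = 4;  // hypothetical: rows on the pre-printed invoice

// Static: the storage size is fixed at compile time, so items beyond
// the fourth do not fit on this "invoice" and are left out of the total.
float totalStatic(const float* prices, std::size_t count) {
    assert(prices || count == 0);
    float slots[kSlots];
    float total = 0.0f;
    for (std::size_t i = 0; i < count && i < kSlots; ++i) {
        slots[i] = prices[i];
        total += slots[i];
    }
    return total;
}

// Dynamic: the storage size is decided at run time from count, at the
// cost of having to release the memory explicitly.
float totalDynamic(const float* prices, std::size_t count) {
    assert(prices || count == 0);
    float* slots = new float[count];
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        slots[i] = prices[i];
        total += slots[i];
    }
    delete[] slots;
    return total;
}
```

With six prices 1 to 6, totalStatic covers only the first four items, while totalDynamic covers all six; the dynamic version is the one that needs the extra allocation and release code.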
Address
It was said earlier that "an address is a number that uniquely identifies a particular memory unit", and later that "an address, like a long integer or a single-precision floating-point number, is a type of number". So is an address both a number and a type of number? Isn't that contradictory? Consider this: a floating-point number is a kind of number (a decimal fraction) and is also a numeric type. The first statement describes how addresses are used in practice. The second arises because the computer only knows states, and how a given state is to be handled must be specified through a type, so the address type tells the compiler to handle the corresponding state as the identifier of a memory unit.
Pointer
We now know how dynamically allocated memory differs from statically allocated memory. Suppose we want to record order data entered by the user. The number of orders the user enters is not fixed, so the memory is allocated on the heap. Suppose that, according to the user's input, 1 MB of memory is needed to temporarily record the data. To operate on this contiguous 1 MB, its first address must be recorded. But because this memory is dynamically allocated, that is, not allocated by the compiler (the program's own code allocates it at run time), no variable can be established to map this first address, so we must record the first address ourselves. Because every address is a 4-byte binary number (on a 32-bit operating system), 4 bytes of memory are statically allocated to record this first address.
As shown earlier, the first address can be stored as data in a variable a of type unsigned long. To read the 4 bytes of memory located 4 bytes into the 1 MB block, add 4 to a to obtain the corresponding address, then fetch the contents of the 4 bytes that follow. But how do we write code that fetches the contents of the memory at an address? Whenever a number of address type is returned, the contents of the corresponding memory are fetched automatically because of the address type. But if we simply write a + 4, then since a is unsigned long, a + 4 returns an unsigned long, not an address type. What can be done? C++ provides an operator for this, "*", called the content-of operator (the name is actually not accurate). It looks the same as the multiplication operator but takes a number only on its right side, i.e. *(a + 4). This expression would convert the unsigned long number obtained by adding 4 to the value of a into a number of address type. But there is a problem: how should the contents of the memory at a + 4 be fetched? Take 1 byte or 2 bytes?
In what format should the fetched contents be interpreted? In hand-written assembly code this is not a problem, but here the compiler writes the assembly code for us, so we must tell the compiler how to interpret the memory at a given address. C++ provides the pointer for this; like the array above, it is a type modifier. When defining a variable, putting "*" in front of the variable name indicates that the variable is of pointer type (just as "[]" after the variable name indicates an array type), and its size is fixed at 4 bytes. For example: unsigned long *pa; Here pa is a pointer variable, 4 bytes in size because a 32-bit operating system is assumed.
For *pa; the value of pa is computed first, i.e. the 4 bytes of content starting at the memory that pa maps to are returned. Then "*" is computed, which converts the content just fetched into a number of the address type of unsigned long. Then this address-type number is computed, returning the content of the memory it identifies, interpreted in original-code format, which gives a number of type unsigned long. Finally this unsigned long number is computed, returning its binary value interpreted in original-code format. In other words, when the type of an address is a pointer, the contents of the memory at this address are to be interpreted by the compiler as an address.
Because variables are mappings to addresses, every variable has a corresponding address, and C++ provides an operator for taking the address of a variable: "&", called the address-of operator. It looks the same as the bitwise AND operator but takes a number only on its right side (not on both sides).
Only a number of address type can follow "&" on its right. It converts the address-type number on its right into a number of pointer type and returns that pointer-type number, exactly the opposite of the content-of operator "*". This is probably dizzying at this point, so let us clear up the doubts with an example. unsigned long a = 10, b, *pa; pa = &a; b = *pa; ++(*pa); The first statement defines a pointer variable pa, i.e. the compiler allocates 4 bytes of memory on the stack for us and binds their first address to pa (forming a mapping). Then for "&a": since a is a variable, it is equivalent to an address, so "&a" is computed and returns a number of type unsigned long * (pointer to unsigned long).
Note that although the number returned is of pointer type, its value is the same as the address that a maps to. Why not say directly that it is a number of the address type of unsigned long, instead of introducing a separate pointer type? Because a pointer-type number directly returns its binary value, whereas an address-type number returns the contents of the memory identified by its binary value. So if the address that a maps to is 2000, then a; returns 10 and &a; returns 2000.
Now see what a variable of pointer type returns. When pa; is written, the address that pa maps to is returned (2008 under the assumption above) and the value at this address is computed, returning the number 2000 (because pa = &a; has been executed), whose type is unsigned long *. Then this unsigned long * number is computed, directly returning the binary number corresponding to 2000 (recall the rule stated above for pointer-type numbers).
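The example statements can be checked with a minimal sketch. The function name pointerDemo and the way the two values are packed into the return value are choices made for this illustration only.

```cpp
#include <cassert>

// Hypothetical demo of the address-of and content-of operators.
unsigned long pointerDemo() {
    unsigned long a = 10, b, *pa;
    pa = &a;           // pa now holds the address of a
    b = *pa;           // read a through the pointer: b becomes 10
    ++(*pa);           // modify a through the pointer: a becomes 11
    assert(pa == &a);  // the pointer itself did not change
    return a * 100 + b;  // packs a (11) and b (10) into 1110
}
```

The increment changes a, not b: b received a copy of the value when b = *pa; ran, whereas *pa keeps referring to the memory of a.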
The number on the right of the content-of operator "*" must be of pointer type or array type. The operator converts this pointer-type number directly into an address-type number (pointer-type and address-type numbers are the same in value; only the rules for computing them differ). So for b = *pa; the address that pa maps to is returned, the value at this address is computed, returning the number 2000 of type unsigned long *; then "*pa" returns the number 2000 of the address type of unsigned long; then the value of this address-type number is computed, returning 10; and then a simple assignment takes place. Similarly, for ++(*pa) (the parentheses make explicit that "*pa" is computed first), "*pa" is computed first and returns a number of the address type of unsigned long, then the prefix ++ is computed, and finally a number of the address type of unsigned long is returned.
If the difference between the address type and the pointer type is still unclear, the following sentence may help: address-type numbers are used by the compiler at compile time, while pointer-type numbers are used by the code at run time. If it is still unclear, it may help to come back after reading the section on type modifiers below.
Allocating memory on the heap
As mentioned earlier, allocating on the heap means requesting memory from the operating system at run time. Different operating systems provide different interfaces for requesting memory, so the way to request it differs, mainly in the prototypes of the functions that must be called (for function prototypes, see "C++ from Scratch (7)"). Since C++ is a language, it should not depend on a particular operating system, so C++ provides a uniform way of requesting memory: the new operator. For example: unsigned long *pa = new unsigned long; *pa = 10; unsigned long *pb = new unsigned long[*pa]; Here the memory that pa points to (i.e. the memory identified by the value of pa) is 4 bytes in size, while the memory that pb points to is 4 * 10 = 40 bytes. Note that since new is an operator, its structure is a new
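A minimal sketch of the allocation statements above together with the matching release. The function name heapDemo and the loop that fills the array are additions for illustration; delete and delete[] are the standard C++ operators that release memory obtained with new and new[].

```cpp
#include <cassert>

// Hypothetical demo: allocate one element and an array on the heap,
// use them, and release each with the matching form of delete.
unsigned long heapDemo() {
    unsigned long* pa = new unsigned long;       // one unsigned long
    *pa = 10;
    unsigned long* pb = new unsigned long[*pa];  // array of 10 elements
    assert(pa != pb);

    unsigned long sum = 0;
    for (unsigned long i = 0; i < *pa; ++i) {
        pb[i] = i;      // elements are numbered from 0, as with any array
        sum += pb[i];
    }

    delete[] pb;        // array form: matches new[]
    delete pa;          // single-object form: matches new
    return sum;         // 0 + 1 + ... + 9 = 45
}
```

The element count given to new[] is an ordinary run-time value, which is exactly what the fixed-size array definition float a[b]; could not accept.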
So "[]" must be added when releasing pb; the "[]" indicates that an array is being released. In VC, however, both the former and the latter form release the memory correctly, without "[]" stepping in to help the compiler, because VC targets the Windows operating system, and when memory is released on Windows the length of the block does not have to be given, as it has already been recorded internally (this statement is not quite accurate: it is the C runtime library that does this work, relying in turn on the operating system, so there are two layers of memory-management wrapping, which will not be discussed here). Standard C++ nevertheless treats releasing array memory without "[]" as undefined behavior, so portable code should always pair the two forms correctly.
Type modifier (type-specifier)
Type modifiers are symbols that act on types; when a variable is defined they further specify how the memory bound to the variable is to be operated on. Some ways of operating are general and apply to every type, so they are separated out to make code easier to write, much like the parts of a fruit. Eat the flesh of an apple, eat the flesh of a pear; do not eat the skin of an apple, do not eat the skin of a pear. Apple and pear are kinds of fruit, corresponding to types, while "flesh of XXX" and "skin of XXX" modify the type apple or pear to produce a new type (the flesh of an apple, the skin of a pear), corresponding to type modifiers.
The arrays and pointers described in this article are type modifiers. The "&" of the reference variables mentioned earlier is also a type modifier, and several more type modifiers will be introduced in "C++ from Scratch (7)", which will also explain the two important concepts of declaration and definition and introduce declaration modifiers.
Type modifiers take effect only when variables are defined, as in unsigned long a, b[10], *pa = &a, &ra = a; Three type modifiers are used here: "[]", "*", and "&". The unsigned long above is called the original type, meaning the type before the type modifiers are applied. The roles of the three type modifiers are described below.
The array modifier "[]": it always follows the variable name, with an integer c inside the square brackets giving the number of array elements. It indicates that the current type is c elements of the original type stored contiguously, with a length of c times the length of the original type. So long a[10]; denotes 10 elements of type long stored contiguously, with a length of 10 * 4 = 40 bytes. long a[10][4]; denotes 10 elements of type long[4] stored contiguously, with a length of 10 * (4 * 4) = 160 bytes.
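The lengths quoted for the array modifier can be checked with sizeof. The sketch below writes them in terms of sizeof(long) instead of the literal 4, because the size of long differs between platforms (it is 8 bytes on many 64-bit systems); the function name arraySizesMatch is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical check of the array lengths described in the text.
bool arraySizesMatch() {
    long a[10];
    long b[10][4];
    assert(sizeof(b) / sizeof(b[0]) == 10);       // b has 10 elements
    return sizeof(a) == 10 * sizeof(long)          // 10 elements of long
        && sizeof(b[0]) == 4 * sizeof(long)        // each element is a long[4]
        && sizeof(b) == 10 * (4 * sizeof(long));   // 10 elements of long[4]
}
```

Where long occupies 4 bytes, as the text assumes, these expressions evaluate to the 40 and 160 bytes given above.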
You may have noticed that with more than one "[]" the order of evaluation matters: why is it not 4 elements of type long[10] stored contiguously, but the other way round? Type modifiers are applied from left to right, but when the same type modifier is repeated, the repeated modifiers are applied from right to left, to match people's habits. So short *a[10]; denotes 10 elements of type short * stored contiguously, with a length of 10 * 4 = 40 bytes, while short *b[4][10]; denotes 4 elements of type short *[10] stored contiguously, with a length of 4 * 40 = 160 bytes.
The pointer modifier "*": it always precedes the variable name and denotes a pointer to the original type. For example: short a = 10, *pa = &a, **ppa = &pa; The ppa here is called a multi-level pointer: its type is pointer to pointer to short, i.e. short **. short **ppa = &pa; means that pa is computed, giving a number of the address type of short *; then the "&" operator converts this number into a number of the pointer type of short * (that is, short **), and finally the value is assigned to the variable ppa. If this is dizzying, do not dwell on it; it is enough that the types match. A brief explanation follows. Assume the address of a is 2000; then the address of pa is 2002 and the address of ppa is 2006. For pa = &a; first compute the value of "&a". Since a is equivalent to an address, "&" takes effect and directly converts the address of a into a number of type short * and returns it. The result is then assigned to pa, so the value of pa is 2000.
For ppa = &pa; first compute the value of "&pa". Since pa is equivalent to an address, "&" takes effect and directly converts the address of pa into a number of type short ** (because pa is already of type short *) and returns it. The result is then assigned to ppa, so the value of ppa is 2002.
The reference modifier "&": it always precedes the variable name and indicates that no memory is to be allocated for this variable to bind to. It is not part of the type when a type is stated, as explained below. Since no memory needs to be allocated to form the mapping, it is unlike the two modifiers above in that it cannot be repeated, because repeating it is meaningless. It must also be to the right of the "*" modifiers: short **&b = ppa; is allowed, but short *&*b; and short &**b; are not. Because modifiers are applied from left to right, short *&* denotes a pointer to a reference to a pointer to short. A reference only tells the compiler not to allocate memory on the stack for the variable and has nothing to do with the type, so a pointer to a reference is meaningless. short &** denotes a pointer to a pointer to a reference to short, which is likewise meaningless. Similarly long &a[40]; is an error, because it would mean allocating memory for 40 contiguously stored elements of type reference to long, and a reference is only a way of informing the compiler; it is not a type and cannot be instantiated (for instantiation, see "C++ from Scratch (10)").
Note that a reference is not a type (though for convenience it is often spoken of as one). long **&rppa = &pa; is an error, because the statement says that no memory is allocated for the variable rppa and that the number to the right of "=" is used directly as its address, but &pa returns a number of pointer type, not of address type, so the compiler reports a type mismatch.
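The permitted combination of the pointer and reference modifiers can be sketched as follows. The function name referenceDemo and the value 20 are choices made for this illustration.

```cpp
#include <cassert>

// Hypothetical demo of a multi-level pointer and a reference to it.
short referenceDemo() {
    short a = 10, *pa = &a, **ppa = &pa;
    short **&b = ppa;      // b is another name for ppa; no new storage is bound

    assert(&b == &ppa);    // same address: the reference has no memory of its own
    assert(*ppa == pa);    // one level of "*" yields pa
    assert(**ppa == 10);   // two levels of "*" yield the value of a

    **b = 20;              // writes through b, ppa and pa into a
    return a;              // a is now 20
}
```

The declarations short *&*b; and short &**b; from the text are rejected by the compiler, since C++ does not allow pointers to references.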
Likewise, even long **&rppa = pa; fails, because long * and long ** are different types. With matching types, however, the following is allowed (rpa2 may look puzzling; it is explained in "C++ from Scratch (7)"): long a = 10, *pa = &a, **ppa = &pa, *&rpa1 = *ppa, *&rpa2 = *(ppa + 1); A type modifier combined with an original type forms a new type; long *&, short *[34], and so on are all new types. You should pay attention to the