Data structure and algorithm foundation

xiaoxiao2021-03-06 55

1. Introduction

Basic Concepts and Terminology

Data

Data is the carrier of information. It can be recognized, stored, and processed by a computer, and it is the "raw material" that computer programs work on. As computer applications have expanded, the scope of data has grown to include integers, real numbers, strings, images, sound, and more.

Data Element

A data element is the basic unit of data. Data elements are also called elements, nodes, vertices, or records. A data element may consist of several data items (also called fields or attributes). A data item is the smallest identifiable unit that has independent meaning.

Data Structure

A data structure refers to the relationships among data, that is, the way the data is organized.

1. A data structure generally covers the following three aspects:

(1) The logical relationships among the data elements, also called the logical structure of the data. The logical structure describes the data purely in terms of logical relationships; it is independent of how the data is stored and independent of the computer. It can be regarded as a mathematical model abstracted from a specific problem.

(2) The representation of the data elements and their relationships in computer memory, called the storage structure of the data. The storage structure is the implementation (also called the image) of the logical structure in a computer language, so it depends on that language. For machine language the storage structure is concrete, but storage structures are usually discussed only at the level of a high-level language.

(3) The operations on the data, that is, the operations applied to it. Operations are defined on the logical structure of the data, and each logical structure has its own set of operations. The most common ones (retrieval, insertion, deletion, update, and sorting) are really just abstract operations applied to abstract data. "Abstract" here means that we know only what each operation does, without considering how it is done. How an operation is carried out can be considered only after the storage structure has been chosen.

To make the concept of a data structure more concrete, consider the following example.

[Example 1.1] A student transcript; see the table below.

Note: the table illustrates the concepts of data element, data item, start node, and terminal node.

(1) Logical structure. Each row of the table is a data element (also called a record or node), made up of data items such as student number, name, course scores, and average score. The logical relationship among the data elements is this: for any node in the table, there is at most one node adjacent to and immediately before it (its direct predecessor), and at most one node adjacent to and immediately after it (its direct successor). Only the first node has no direct predecessor, so it is called the start node; only the last node has no direct successor, so it is called the terminal node. For example, the direct predecessor and direct successor of "Mark" in the table are "Ding Yi" and "Zhang San", respectively. These relationships among the nodes make up the logical structure of the student transcript.

(2) Storage structure. The storage structure of the table is the representation of these relationships in a computer language: are the nodes stored one after another in consecutive storage units, or are they linked together with pointers?

(3) Operations. With the student transcript, one may often need to look up a student's score, delete the corresponding node when a student leaves the school, or add a node when a new student enrolls. How to search, delete, and insert are the questions of operating on the data.

Once these three questions are settled, the data structure of the student transcript is understood.

2. Classification of logical structures

Where no confusion can arise, the logical structure of the data is often simply called the data structure. Logical structures fall into two major categories:

(1) Linear structures. The logical feature of a linear structure is this: if the structure is a non-empty set, it has exactly one start node and exactly one terminal node, and every node has at most one direct predecessor and at most one direct successor. The linear list is a typical linear structure. Stacks, queues, and strings are also linear structures.

(2) Nonlinear structures. The logical feature of a nonlinear structure is that a node may have several direct predecessors and several direct successors. Arrays, generalized lists, trees, and graphs are nonlinear structures.
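The three aspects discussed for the student transcript can be sketched in C. This is only an illustration under stated assumptions: the `Student` fields, the function names, and the choice of an array (sequential storage) are not taken from the original table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One data element (record) of the transcript; field names are illustrative. */
typedef struct {
    int    id;        /* data item: student number */
    char   name[16];  /* data item: name */
    double average;   /* data item: average score */
} Student;

/* Sequential storage: logically adjacent nodes sit in adjacent array slots. */

/* Direct predecessor of table[i], or NULL if table[i] is the start node. */
const Student *predecessor(const Student *table, size_t n, size_t i)
{
    assert(i < n);
    return i == 0 ? NULL : &table[i - 1];
}

/* Direct successor of table[i], or NULL if table[i] is the terminal node. */
const Student *successor(const Student *table, size_t n, size_t i)
{
    assert(i < n);
    return i + 1 == n ? NULL : &table[i + 1];
}

/* Retrieval operation: position of the record with this name, or -1. */
long find_by_name(const Student *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return (long)i;
    return -1;
}
```

With a table holding "Ding Yi", "Mark", and "Zhang San" in that order, `find_by_name` locates "Mark" at position 1, and `predecessor` and `successor` return the neighboring records. Had the nodes been linked with pointers instead, the logical structure would be unchanged and only these function bodies would differ.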

3. The four basic storage methods

The storage structure of data can be obtained with the following four basic storage methods:

(1) Sequential storage. This method stores logically adjacent nodes in physically adjacent storage units, so the logical relationships among nodes are reflected by the adjacency of the storage units. The resulting representation is called a sequential storage structure and is usually described with the arrays of a programming language. The method is applied mainly to linear data structures. Nonlinear data structures can also be stored sequentially after some form of linearization.

(2) Linked storage. This method does not require logically adjacent nodes to be physically adjacent; the logical relationships among nodes are represented by additional pointer fields. The resulting representation is called a linked storage structure and is usually described with the pointer types of a programming language.

(3) Index storage. This method usually builds an additional index table alongside the stored node information. The index table consists of index items. If every node has its own index item, the index table is called a dense index. If a group of nodes shares a single index item, the index table is called a sparse index. The general form of an index item is:

(keyword, address)

The keyword is a data item that uniquely identifies a node. In a dense index the address gives the storage location of the node; in a sparse index the address gives the starting storage location of a group of nodes.

(4) Hash storage. The basic idea of this method is to compute the storage address of a node directly from the node's keyword.

The four basic storage methods may be used alone or in combination. The same logical structure can be given different storage structures by using different storage methods. Which storage structure to choose for a given logical structure depends on the specific requirements, mainly the convenience of the operations and the time and space requirements of the algorithm.
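The idea behind method (4), computing the storage address from the keyword, can be sketched as follows. The table size, the remainder-based address function, and linear probing on collisions are all assumptions made for this illustration; the text above does not prescribe any of them.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 11    /* table size; an arbitrary illustrative choice */
#define EMPTY (-1)  /* marks an unused slot; keys are assumed non-negative */

typedef struct { int key[SLOTS]; } HashTable;

void ht_init(HashTable *t)
{
    for (size_t i = 0; i < SLOTS; i++)
        t->key[i] = EMPTY;
}

/* The storage address is computed directly from the keyword. */
size_t ht_address(int key)
{
    assert(key >= 0);
    return (size_t)key % SLOTS;
}

/* Stores key and returns its slot, or -1 if the table is full.
   Collisions are resolved by probing the following slots in turn. */
int ht_insert(HashTable *t, int key)
{
    size_t start = ht_address(key);
    for (size_t step = 0; step < SLOTS; step++) {
        size_t slot = (start + step) % SLOTS;
        if (t->key[slot] == EMPTY || t->key[slot] == key) {
            t->key[slot] = key;
            return (int)slot;
        }
    }
    return -1;
}

/* Returns the slot holding key, or -1 if key is not stored. */
int ht_find(const HashTable *t, int key)
{
    size_t start = ht_address(key);
    for (size_t step = 0; step < SLOTS; step++) {
        size_t slot = (start + step) % SLOTS;
        if (t->key[slot] == key)
            return (int)slot;
        if (t->key[slot] == EMPTY)
            return -1;
    }
    return -1;
}
```

Keys 15 and 26 both map to address 4 here, so the second one inserted is placed in the next free slot. This sketch has no deletion, which is what allows `ht_find` to stop at the first empty slot.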

4. The three aspects of a data structure

The logical structure of the data, the storage structure of the data, and the operations on the data form a whole. No single aspect should be understood in isolation, without regard to its connections with the others.

The storage structure is an indispensable aspect of a data structure: different storage structures for the same logical structure may be given different data structure names.

[Example] The linear list is a logical structure. With sequential storage it is called a sequential list; with linked storage it is called a linked list; with hash storage it is called a hash table.

The operations on the data are likewise an inseparable aspect of a data structure. For a given logical structure and storage structure, a different set of defined operations, or operations with different properties, can also yield a completely different data structure.

[Example] If insertion and deletion on a linear list are restricted to one end of the list, the list is called a stack. If insertion is restricted to one end of the list and deletion to the other end, the list is called a queue. Further, if the linear list uses a sequential list or a linked list as its storage structure, then restricting insertion and deletion as above gives the sequential stack or linked stack, and the sequential queue or linked queue, respectively.

Data Type

A data type is a set of values together with a set of operations defined on those values. A data type can usually be regarded as a data structure that has already been implemented in the programming language.

[Example 1.2] The integer type in C defines a range of integer values (whose maximum, INT_MAX, depends on the specific machine) and the addition, subtraction, multiplication, division, and remainder operations on integers.

According to whether their values can be decomposed, data types fall into two categories:

(1) Atomic types: the values cannot be decomposed. These types are usually provided directly by the language.
[Example] Standard types such as the integer and character types, and simple derived types such as pointers.

(2) Structured types: the values can be decomposed into several components. These types are defined by the user with the type-construction facilities of the language and are usually derived from the standard types, so they are also derived types.
[Example] Arrays, structures, and so on in C.

Abstract Data Type (ADT)

An ADT is an organization of abstract data together with its related operations. It can be seen as the logical structure of the data plus the operations defined on that logical structure.

An ADT can be described as follows:

    ADT ADT-Name {
      Data:        // description of the logical relationships among data elements
      Operations:  // description of the operations
        Operation1:  // operation 1, usually described with a C or C++ function prototype
          Input: description of the input data
          Preconditions: state of the system before the operation  // can be seen as the initial conditions
          Process: what the operation does to the data
          Output: description of the returned data
          Postconditions: state of the system after the operation  // the "system" can be seen as the data structure
        Operation2:  // operation 2
        ...
    } // ADT

An abstract data type can be seen as a model that describes a problem, independent of any specific implementation. Its advantage is that data and operations are encapsulated together, so that user programs can access the data only through the operations defined in the ADT, which achieves information hiding. In C++, an ADT can be represented by a class declaration (including template classes) and implemented by the class implementation [see [10]]. A class implemented in C++ therefore corresponds to the storage structure of the data together with the operations implemented on that storage structure.
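A linked stack shows how the pieces above fit together: the logical structure is a linear list, the storage structure is linked, and the operations are restricted to one end. The following is a minimal sketch in C; the type and function names are illustrative, and the comments borrow the precondition wording of the ADT template.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct Node {
    int          data;
    struct Node *next;  /* pointer field that carries the logical relationship */
} Node;

typedef struct { Node *top; } Stack;

/* Postcondition: the stack is empty. */
void stack_init(Stack *s) { s->top = NULL; }

bool stack_empty(const Stack *s) { return s->top == NULL; }

/* Insertion, allowed only at the top of the list.
   Output: false if memory could not be allocated, true otherwise. */
bool stack_push(Stack *s, int value)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    n->data = value;
    n->next = s->top;
    s->top = n;
    return true;
}

/* Deletion, allowed only at the top of the list.
   Precondition: the stack is not empty. Output: the removed value. */
int stack_pop(Stack *s)
{
    assert(!stack_empty(s));
    Node *n = s->top;
    int value = n->data;
    s->top = n->next;
    free(n);
    return value;
}
```

Pushing 1, 2, 3 and then popping three times returns 3, 2, 1. Callers touch the list only through these functions, which is the information hiding the ADT discussion describes, although C does not enforce it the way a C++ class would.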

The concepts of ADT and class reflect two levels of abstraction in program or software design: an ADT describes the problem at the conceptual (abstract) level, while a class describes it at the implementation level. Moreover, a class in C++ is just an ordinary user-defined type that can be used to define variables (called objects or instances). In C++, practical problems are ultimately solved by operating on objects, so this level can be regarded as the application level. For example, the main program can be seen as the user's application.

Because the C language has no "class" data type with which to implement an ADT, we do not describe data structures in ADT form here, in order to save space. Just remember that an ADT is in effect equivalent to the logical structure of the data as we define it, together with the abstract operations defined on that logical structure.

The Significance of Learning Data Structures

Data structures is one of the core courses in computer software and computer applications, and all kinds of data structures are used throughout system software and application software. Knowing only a few programming languages is therefore not enough to deal with many complex problems. To use a computer effectively, one must also learn about data structures.

Choosing a Suitable Data Structure to Solve an Application Problem

1. Classification of the problems computers process

(1) Numerical computation problems
In the early stages of computer development, computers were used mainly for numerical computation.

[Example 2.1] Solving a system of linear equations.
The objects involved in this kind of problem are simple integer, real, or Boolean data. The programmer's effort goes mainly into programming technique, and data structures need little attention.

(2) Non-numerical problems
As the range of computer applications has widened and software and hardware have developed, non-numerical problems have become increasingly important. According to statistics, processing non-numerical problems now takes up more than 90% of machine time. These problems involve more complex data structures, and the relationships among data elements generally cannot be described by mathematical equations. The key to solving such problems is therefore no longer mathematical analysis and numerical methods, but designing a suitable data structure.

2. Solving non-numerical problems

The famous Swiss computer scientist N. Wirth proposed:

Algorithms + Data Structures = Programs

Data structure: the logical structure and storage structure of the data.
Algorithm: the description of the operations on the data.

The essence of program design is to choose a good data structure for the practical problem and to design a good algorithm, and a good algorithm depends to a large extent on the data structure that describes the problem.

[Example 2.2] The phone number lookup problem.
Write a program that looks up private phone numbers in a city or organization. Given any name, the program should quickly find that person's phone number if there is one, and otherwise report that the person has no phone number.

To solve this problem, first build a phone number registration table. Each node in the table stores two data items: a name and a phone number.

How good a lookup algorithm can be written depends on how the table is structured and stored. The simplest approach is to store the nodes of the table sequentially in the computer. A lookup then checks the names one by one from the beginning, until the name is found or the whole table has been searched. This algorithm may be workable for a small organization, but it is impractical for a city with many thousands of private phones.

If the table is ordered by last name, a last-name index table can be built in addition, using the storage structure shown in the following figure. A lookup then first finds the last name in the index table, and then, following the address in the index table, checks the names in the registration table, so the entries under other last names never need to be examined. The lookup algorithm built on this new structure is therefore more efficient.

[Example 2.3] Scheduling a track and field competition.
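A minimal sketch of the indexed lookup from Example 2.2, assuming the registration table is sorted by last name and each index item covers one group of entries with the same last name. The struct layouts and the function name `lookup` are hypothetical and are not taken from the original figure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One node of the registration table. */
typedef struct {
    const char *last_name;
    const char *first_name;
    const char *phone;
} Entry;

/* One index item in the form (keyword, address): the address is the
   position of the first table entry with this last name. */
typedef struct {
    const char *last_name;
    size_t      first;  /* start of the group in the table */
    size_t      count;  /* number of entries in the group */
} IndexItem;

/* Returns the phone number, or NULL if the person has none. Only the
   entries under the matching last name are examined. */
const char *lookup(const Entry *table, const IndexItem *index, size_t index_len,
                   const char *last_name, const char *first_name)
{
    assert(table != NULL && index != NULL);
    for (size_t i = 0; i < index_len; i++) {
        if (strcmp(index[i].last_name, last_name) != 0)
            continue;
        size_t end = index[i].first + index[i].count;
        for (size_t j = index[i].first; j < end; j++)
            if (strcmp(table[j].first_name, first_name) == 0)
                return table[j].phone;
        return NULL;  /* last name found, but not this person */
    }
    return NULL;      /* last name not in the index */
}
```

The index itself is scanned linearly to keep the sketch short. It is much smaller than the table, and since it is sorted it could also be searched by bisection.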

Suppose a school's track and field trials comprise six events: high jump, long jump, javelin, shot put, 100 m, and 200 m, and the rules restrict how many events each athlete may enter. Five athletes have registered, and the events each has chosen are shown in the athletes' event table. The task is to draw up a competition schedule that completes all events in as short a time as possible.

(1) To solve this problem well, first choose a suitable data structure to represent it. The data structure model for this problem is a graph, shown below: each vertex represents an event, and an edge joins two events that cannot be held at the same time. Events chosen by the same athlete clearly cannot be held in the same time slot, so the events chosen by each athlete are connected pairwise.

(2) The scheduling problem can then be abstracted as coloring an undirected graph: color every vertex using as few colors as possible, so that any two adjacent vertices get different colors. Each color stands for one time slot, and events whose vertices have the same color can be scheduled in the same slot. The result is that four time slots suffice: slot 1 can hold the high jump (A) and the javelin (C), slot 2 the long jump (B) and the shot put (D), and slots 3 and 4 the 100 m and the 200 m, respectively.

A key step in solving a problem is to choose a suitable data structure to represent it; only then can an effective algorithm be written.

Description and Analysis of Algorithms

Operations on data are described by algorithms, so the study of algorithms is an important part of a data structures course.

1. Informal definition of an algorithm

An algorithm is any well-defined computational procedure. It takes one or more values as input and produces one or more values as output.

(1) An algorithm can be regarded as a tool for solving a computational problem.
(2) An algorithm is a sequence of computational steps that transforms the input into the output.

[Example 3.1] Consider the following sorting problem: sort a sequence of numbers into non-decreasing order.
The problem is formally defined by input and output sequences that satisfy the following relationship:
Input: a sequence of n numbers <a1, a2, ..., an>.
Output: a permutation <a1', a2', ..., an'> of the input sequence such that a1' ≤ a2' ≤ ... ≤ an'.
For the input instance <31, 41, 59, 26, 41, 58>, a sorting algorithm should return the output sequence <26, 31, 41, 41, 58, 59>.

(1) Input instance
An input instance of a problem consists of all the inputs needed to compute a solution to the problem, satisfying the constraints given in the problem statement.

(2) Correct and incorrect algorithms
If an algorithm terminates and gives the correct result for every input instance, the algorithm is called correct. A correct algorithm solves the given computational problem.
An incorrect algorithm may fail to terminate on some input instances, or may terminate with a result that is not the desired answer. In general, only correct algorithms are considered.

2. Description of algorithms

An algorithm can be described in a natural language, in a programming language, or in some other language; the only requirement is that the description specify the computational procedure precisely.
In general, the most suitable language for describing algorithms is a pseudo-language that lies between natural language and a programming language.
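For the sorting problem of Example 3.1, one correct algorithm is insertion sort. The text does not name a particular sorting algorithm, so this choice and the function name `insertion_sort` are only an illustration of an algorithm that turns any input instance into the required non-decreasing output.

```c
#include <assert.h>
#include <stddef.h>

/* Sorts a[0..n-1] into non-decreasing order, in place. */
void insertion_sort(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        /* Shift the larger elements of the sorted prefix one slot right. */
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}
```

Applied to the input instance <31, 41, 59, 26, 41, 58>, the array ends up holding <26, 31, 41, 41, 58, 59>.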

The control structures of such a pseudo-language are usually similar to those of Pascal or C, but any expressive notation may be used within it to make the algorithm clearer and more concise, without getting bogged down in the details of a particular programming language.
To make the algorithms easy to verify on a machine and to build practical programming skills, the algorithms here are described in C.

[Example 3.2] Define an error handling function that prints an error message and then exits. It simplifies the error handling code in many programs.

    #include <stdlib.h>  // declares exit
    #include <stdio.h>   // declares fprintf and the standard error stream stderr
    void error(char *message)
    {
        fprintf(stderr, "error: %s\n", message);  // output the error message
        exit(1);  // terminate the program, returning 1 to the operating system
    }

Algorithm Analysis

1. Evaluating the quality of an algorithm

The same computational problem may be solved by many different algorithms. How should their quality be evaluated so that the better one can be chosen?
The chosen algorithm must first of all be correct. Beyond that, three points are mainly considered:
(1) the time taken to execute the algorithm;
(2) the storage space consumed in executing the algorithm, mainly the auxiliary storage space;
(3) the algorithm should be easy to understand, easy to code, easy to debug, and so on.

2. Choosing an algorithm

An algorithm that uses little storage, runs quickly, and also performs well in every other respect is hard to achieve, because these requirements sometimes conflict: saving execution time often costs more storage space, while saving space may cost more computing time. The emphasis therefore has to depend on the situation:
(1) if the program will be used only a few times, aim for an algorithm that is simple and easy to understand;
(2) for a program that will be used repeatedly, choose the fastest algorithm possible;
(3) if the amount of data is very large and the machine's storage is small, the algorithm should mainly aim to save space.

3. Analyzing the time performance of an algorithm

(1) Time cost and statement frequency

Time spent by an algorithm = sum of the execution times of all statements in the algorithm

Execution time of a statement = number of times the statement is executed (its frequency count) × time needed to execute it once

Once an algorithm has been converted into a program, the time each statement takes to execute depends on factors that are hard to pin down, such as the machine's instruction performance and speed and the quality of the code the compiler generates.
To analyze the time cost of an algorithm independently of the software and hardware system, assume that each statement takes one unit of time per execution. The time cost of the algorithm is then the sum of the frequencies of all its statements.
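The unit-time model can be tried out on a triple-loop matrix product, the same kind of algorithm as the MatrixMultiply example. The sketch below is not the original listing: the small order `N`, the name `matrix_multiply`, and the returned counter are assumptions added to make the frequency of the innermost statement visible.

```c
#include <assert.h>

#define N 4  /* matrix order; kept small here for illustration */

/* Computes C = A x B for square matrices of order N and returns how
   many times the innermost statement was executed. */
long matrix_multiply(int A[N][N], int B[N][N], int C[N][N])
{
    long inner = 0;
    assert(A != C && B != C);  /* the output must not overwrite an input */
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            C[i][j] = 0;                        /* runs N*N times */
            for (int k = 0; k < N; k++) {
                C[i][j] += A[i][k] * B[k][j];   /* runs N*N*N times */
                inner++;
            }
        }
    }
    return inner;
}
```

For N = 4 the function returns 64, that is, N^3. Counting the other statements in the same way gives lower-order terms, so the innermost statement dominates the total.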

[Example 3.3] Compute the product C = A × B of two square matrices of order n. The algorithm is as follows:

    #define n 100  // n can be set as needed; here it is assumed to be 100
    void MatrixMultiply(int A[n][n], int B[n][n], int C[n][n])
    { // the right-hand column lists the frequency of each statement
      int i, j, k;
      (1) for (i = 0; i ...

... T2(n), and the latter takes less time.

(2) As the problem size n grows, the ratio of the time costs of the two algorithms, 5n^3 / 100n^2 = n / 20, grows with it. That is, when the problem size is large, algorithm A1 is much more efficient than algorithm A2. Their asymptotic time complexities, O(n^2) and O(n^3), evaluate the time quality of the two algorithms at the macro level.
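The comparison of A1 and A2 can be checked numerically. The cost functions below, T1(n) = 100n^2 for A1 and T2(n) = 5n^3 for A2, are an assumption inferred from the ratio 5n^3 / 100n^2 quoted in point (2); the function names are illustrative.

```c
#include <assert.h>

/* Assumed cost functions, inferred from the ratio 5n^3 / 100n^2 = n / 20. */
long t1(long n) { assert(n >= 0); return 100 * n * n; }    /* algorithm A1 */
long t2(long n) { assert(n >= 0); return 5 * n * n * n; }  /* algorithm A2 */

/* Smallest n in 1..limit at which A1 is strictly cheaper than A2,
   or 0 if there is no such n in that range. */
long crossover(long limit)
{
    for (long n = 1; n <= limit; n++)
        if (t1(n) < t2(n))
            return n;
    return 0;
}
```

Under these assumptions the two costs are equal at n = 20, A2 is cheaper below that, and A1 is cheaper from n = 21 onward, with its advantage growing in proportion to n. This is why the asymptotic orders O(n^2) and O(n^3) decide the comparison for large inputs.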

In algorithm analysis, the time complexity of an algorithm and its asymptotic time complexity are often not distinguished: the asymptotic time complexity T(n) = O(f(n)) is simply called the time complexity, where f(n) is usually the frequency of the most frequently executed statement in the algorithm.

[Example 3.8] The time complexity of the algorithm MatrixMultiply is generally taken to be T(n) = O(n^3), where f(n) = n^3 is the frequency of statement (5) in the algorithm. Further examples of how to find the time complexity of an algorithm follow.

[Example 3.9] Exchange the contents of i and j.

    temp = i;
    i = j;
    j = temp;

Each of these three simple statements has frequency 1, so the execution time of the program segment is a constant that does not depend on the problem size n. The time complexity of the algorithm is of constant order, written T(n) = O(1).
If the execution time of an algorithm does not grow as the problem size n grows, then even if the algorithm contains thousands of statements, its execution time is just a larger constant. The time complexity of such an algorithm is O(1).

[Example 3.10] Counting with variables (1).

    (1) x = 0; y = 0;
    (2) for (k = 1; k <= n; k++)
    (3)     x++;
    (4) for (i = 1; i <= n; i++)
    (5)     for (j = 1; j <= n; j++)
    (6)         y++;

In general, for a counting loop only the number of times the statements in the loop body are executed is considered; the increment of the loop variable, the test against the final value, the transfer of control, and so on are ignored. The most frequent statement in the segment above is therefore (6), with frequency f(n) = n^2, so the time complexity of the segment is T(n) = O(n^2).
When there are several loop statements, the time complexity of the algorithm is determined by the frequency f(n) of the innermost statement of the loop with the deepest nesting.

[Example 3.11] Counting with variables (2).

    (1) x = 1;
    (2) for (i = 1; i <= n; i++)
    (3)     for (j = 1; j <= i; j++)
    (4)         for (k = 1; k <= j; k++)
    (5)             x++;

The most frequent statement in this segment is (5). The number of iterations of the inner loops is not directly related to the problem size n, but it depends on the values of the outer loop variables, and the number of iterations of the outermost loop is directly related to n. The number of executions of statement (5) can therefore be worked out from the inner loop outward:

    sum over i = 1..n of (sum over j = 1..i of j) = sum over i = 1..n of i(i+1)/2 = n(n+1)(n+2)/6

The time complexity of the segment is T(n) = O(n^3/6 + lower-order terms) = O(n^3).

(4) The time complexity of an algorithm depends not only on the problem size but also on the initial state of the input instance.

[Example 3.12] An algorithm that searches the array A[0..n-1] for a given value k:

    (1) i = n - 1;
    (2) while (i >= 0 && (A[i] != k))
    (3)     i--;
    (4) return i;

The frequency of statement (3) in this algorithm depends not only on the problem size n but also on the values of the elements of A and of k in the input instance:
(1) if no element of A equals k, the frequency of statement (3) is f(n) = n;
(2) if the last element of A equals k, the frequency f(n) of statement (3) is the constant 0.

(5) Worst-case time complexity and average time complexity

The time complexity in the worst case is called the worst-case time complexity. Unless otherwise stated, the time complexity under discussion is the worst-case time complexity.
The reason is that the worst-case time complexity is an upper bound on the running time of the algorithm over all input instances, which guarantees that the algorithm never runs longer than that bound.

When reprinting, please cite the original address: https://www.9cbs.com/read-67242.html
