3. Case Analysis
To illustrate the role of the characteristic functions described above in speeding up query processing, let us analyze an example.
Consider a table Students (Name, Status, ParentIncome, SelfIncome), where Name is the primary key and Status takes only the values 0 and 1. When Status is 1, the student's income comes entirely from the parents and is given by ParentIncome; when Status is 0, the student's income is entirely the student's own and is given by SelfIncome. For this table, suppose we want to list every student's name together with an Income value, which is ParentIncome or SelfIncome according to the corresponding Status.
Judging from the structure of the table Students and the semantics of the required result, the conventional way to write this query is
SELECT Name, Income = ParentIncome
FROM Students
WHERE Status = 1
UNION
SELECT Name, Income = SelfIncome
FROM Students
WHERE Status = 0; (9)
This is a very natural and straightforward query expression. It is generally processed as follows: first, the two subqueries are evaluated and their intermediate results are stored in a temporary table; second, because of the UNION operator, the temporary table is sorted to eliminate possible duplicate values; third, the final query result is obtained. Students is thus traversed twice, and further work is done on the intermediate results. The only advantage of query (9) is that it is straightforward to express, but it is also a very inefficient and expensive expression: besides the two traversals of the entire table, it incurs a sort and the storage for a temporary table, costs that nobody wants to pay.
For this example, there is a more compact and more efficient query expression. It is not difficult to verify that the following query
SELECT Name, Income = ParentIncome * Status + SelfIncome * (1 - Status)
FROM Students; (10)
is semantically fully equivalent to query (9). However, because query (10) traverses the table Students only once and avoids the expensive sorting operation, it uses less storage and is more efficient. This example shows that different expressions of the same query can differ greatly in processing efficiency and resource consumption. Seeking efficient query expressions is therefore necessary.
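The equivalence of (9) and (10) can be checked on a small data set. The following is a minimal sketch, assuming Python's standard sqlite3 module as a stand-in for the paper's database system; the sample rows are invented for illustration, and the Transact-SQL alias form `Income = ...` is written as `... AS Income` because SQLite does not accept the former.

```python
import sqlite3

# In-memory database with a few hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students ("
    "Name TEXT PRIMARY KEY, Status INTEGER, ParentIncome REAL, SelfIncome REAL)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [("Ann", 1, 3000.0, 500.0), ("Bob", 0, 2000.0, 1200.0), ("Cho", 1, 2500.0, 0.0)],
)

# Query (9): two selections combined with UNION.
q9 = conn.execute("""
    SELECT Name, ParentIncome AS Income FROM Students WHERE Status = 1
    UNION
    SELECT Name, SelfIncome AS Income FROM Students WHERE Status = 0
""").fetchall()

# Query (10): a single pass with an arithmetic expression on Status.
q10 = conn.execute("""
    SELECT Name, ParentIncome * Status + SelfIncome * (1 - Status) AS Income
    FROM Students
""").fetchall()

# Compare the two results as sets of rows (row order is not significant).
queries_agree = sorted(q9) == sorted(q10)
print(queries_agree)
```

The check only confirms that both queries return the same rows on this sample; it says nothing about the relative cost, which depends on the execution plan of the actual system.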
Compare query expression (10) with (9). The latter states its query conditions explicitly in WHERE clauses and the UNION operator, which is how we routinely express "conditional retrieval". The former has no explicit condition in a WHERE clause at all; its conditions are hidden implicitly in the arithmetic expression of the SELECT clause, which does look like a bit of a "trick". It is also not difficult to see that expression (10) is this simple only because Status takes the values 0 and 1: if the problem is changed even slightly (for example, if Status takes other values, or more than two values), it no longer works. Given a query requirement with conditions, is there a general way to write an arithmetic expression that is semantically equivalent to them? The answer is the "characteristic function method" described below.
4. Characteristic Function Solutions for Several Typical Queries
As mentioned above, the essence of the characteristic function is to convert explicit Boolean conditions into scalar expressions. Its most direct and easiest application is conditional retrieval, but its role does not stop there. This section briefly introduces and analyzes, through examples, its application to several typical kinds of query. To keep the expressions compact, every characteristic function appearing in a scalar expression is left unexpanded, that is, it is not written out in terms of the primitive functions. Therefore, to verify the examples here on a real system, you must first replace each characteristic function using (5) and (6).
4.1 Condition Retrieval
With the characteristic function, the query given by (10) can be written as
SELECT Name,
Income = ParentIncome * D[Status = 1] + SelfIncome * D[Status = 0]
FROM Students; (11)
If the retrieval condition were only this simple, replacing (10) with (11) would gain nothing. Retrieval conditions in actual problems, however, are often much more complicated. For example, suppose the requirement of the example above is changed so that age is also taken into account and the students fall into three groups. Status keeps its original semantics. Students who depend on their parents (Status = 1) and are no older than 19 form the first group; students who support themselves (Status = 0) or are older than 23 form the second group; all other students form the third group. In the query result, the Income column for the first two groups is the parents' income and the student's own income, respectively, and for the third group it is the arithmetic mean of the two. Handled by the conventional method, such conditions appear to complicate the query expression considerably, yet this is in fact a natural requirement. Extending (11), it is not difficult to verify that
SELECT Name,
Income = ParentIncome * D[Status = 1] * D[Age <= 19]
+ SelfIncome * SIGN(D[Status = 0] + D[Age > 23])
+ (ParentIncome + SelfIncome) / 2.0 * (1 - D[Status = 1] * D[Age <= 19] - SIGN(D[Status = 0] + D[Age > 23]))
FROM Students; (12)
is an effective expression of the above query. Judging from the form of the Income expression, it has a typical cascaded IF-THEN-ELSE structure that matches the query requirement exactly. In general, with the characteristic function, all conditional-retrieval queries have the same logical structure no matter how complicated the conditions are (for example, when the values of one attribute are divided into more segments); (11) and (12) are typical. The only difference is that more levels mean more cascaded terms, while the arithmetic expression still traverses the table Students only once. By contrast, if the conventional method is used, each classification condition in principle requires its own subquery, and the final result is obtained by combining all subquery results with UNION operations. Which of the two expressions performs better is self-evident.
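The cascaded structure can be tried out on sample data. The sketch below is only an illustration under stated assumptions: it uses Python's sqlite3 module, it reads the three groups as (Status = 1 and Age <= 19), (Status = 0 or Age > 23), and everyone else, it relies on SQLite evaluating a comparison to the integer 0 or 1 as a stand-in for D[...] (not the expansion by (5) and (6)), and it registers a hypothetical Python helper as SIGN so that it does not depend on the SQLite version. The sample rows are invented.

```python
import sqlite3

def sign(x):
    # Hypothetical helper registered as SQL function SIGN: returns -1, 0 or 1.
    return (x > 0) - (x < 0)

conn = sqlite3.connect(":memory:")
conn.create_function("SIGN", 1, sign)
conn.execute(
    "CREATE TABLE Students (Name TEXT PRIMARY KEY, Status INTEGER, "
    "Age INTEGER, ParentIncome REAL, SelfIncome REAL)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("Ann", 1, 18, 3000.0, 500.0),   # group 1: parents' income
        ("Bob", 0, 21, 2000.0, 1200.0),  # group 2: own income (Status = 0)
        ("Cho", 1, 25, 2500.0, 900.0),   # group 2: own income (Age > 23)
        ("Dan", 1, 21, 2000.0, 1000.0),  # group 3: mean of both incomes
        ("Eve", 0, 30, 1000.0, 4000.0),  # group 2: both conditions hold
    ],
)

# One pass over the table; each comparison in parentheses plays the role of D[...].
income = dict(conn.execute("""
    SELECT Name,
           ParentIncome * (Status = 1) * (Age <= 19)
         + SelfIncome * SIGN((Status = 0) + (Age > 23))
         + (ParentIncome + SelfIncome) / 2.0
           * (1 - (Status = 1) * (Age <= 19) - SIGN((Status = 0) + (Age > 23)))
           AS Income
    FROM Students
""").fetchall())
print(income)
```

SIGN is needed only because the second group is a disjunction: for a row such as Eve's both conditions hold, their sum is 2, and SIGN brings the factor back to 1.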
4.2 Histogram Problems
Computing a histogram is a problem that often has to be solved in statistical applications. Solving it by conventional methods is neither very efficient nor very direct. With the help of the characteristic function, however, such problems can be solved smoothly and become a very easy task. To illustrate this, let us look at a specific example. Assume a table Employee (Name, Age, Dept, Kids), where Kids is the number of children of each employee. It is required to give a statistic showing, over all employees, the total number of employees with no children, with one child, with two or three children, and with more than three children; these results are named NoKids, OneKids, FewKids and ManyKids, respectively.
If a conventional method is used, the table Employee has to be traversed four times, each pass calculating one of the values NoKids, OneKids, FewKids and ManyKids, and the four results are then combined by 3 UNION operations into the final result. If the original problem divided the number of children not into 4 segments but into 8 or more, the conventional solution would clearly become even more cumbersome. Using the characteristic function, the above problem is solved by
SELECT NoKids = SUM(D[Kids = 0]),
OneKids = SUM(D[Kids = 1]),
FewKids = SUM(D[2 <= Kids <= 3]),
ManyKids = SUM(D[Kids >= 4])
FROM Employee; (13)
The correctness of this query is easy to verify. For any row in the table, if Kids = 0, then D[Kids = 0] = 1 and D[Kids = 1] = D[2 <= Kids <= 3] = D[Kids >= 4] = 0, so the row is counted in the segment NoKids and in none of the other three segments. The same holds for the other values of Kids. This shows that (13) yields exactly the result required by the original problem. More importantly, the result is not only correct but also obtained very efficiently, because the table is traversed only once. If the employees are divided into more value segments, query processing with the characteristic function still traverses the table only once; the only difference is the number of items computed in the select list, and the logical complexity of the query expression does not increase.
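The single-pass histogram can be reproduced as follows. This is a minimal sketch, assuming Python's sqlite3 module and invented sample rows; SQLite's 0/1 value of a comparison stands in for D[...], and `BETWEEN` is used for the segment 2 <= Kids <= 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Age INTEGER, Dept TEXT, Kids INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        ("Ann", 24, "Sales", 0),
        ("Bob", 30, "Sales", 1),
        ("Cho", 41, "Sales", 3),
        ("Dan", 50, "IT", 5),
        ("Eve", 23, "IT", 0),
        ("Fay", 46, "IT", 2),
    ],
)

# Each SUM adds a 0/1 indicator per row, so every row is counted in exactly one segment.
histogram = conn.execute("""
    SELECT SUM(Kids = 0)             AS NoKids,
           SUM(Kids = 1)             AS OneKids,
           SUM(Kids BETWEEN 2 AND 3) AS FewKids,
           SUM(Kids >= 4)            AS ManyKids
    FROM Employee
""").fetchone()
print(histogram)
```

Because the four segments partition the possible values of Kids, the four counts always add up to the number of rows in Employee.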
The same basic problem gives rise to several variations. Without the basic solution as a foundation, these variations would be difficult to solve directly.
Variation one: for the table Employee, compute the same statistics separately for each department, that is, produce a result of the form (Dept, NoKids, OneKids, FewKids, ManyKids) giving the distribution of the number of children among the employees of each department.
The solution to this problem is clearly the following query expression.
SELECT Dept,
NoKids = SUM(D[Kids = 0]),
OneKids = SUM(D[Kids = 1]),
FewKids = SUM(D[2 <= Kids <= 3]),
ManyKids = SUM(D[Kids >= 4])
FROM Employee
GROUP BY Dept; (14)
Variation two: divide the employees into three age segments, namely younger than 25, between 25 and 45, and older than 45, and compute the histogram of the number of children for each segment. The problem actually asks for a result on the table Employee of the form (AgeCategory, NoKids, OneKids, FewKids, ManyKids), where AgeCategory is the value D(a) determined by the employee's age a:
D(a) = 1 if a < 25
       2 if 25 <= a <= 45
       3 if a > 45
(15)
Although this problem is quite difficult, for systems that allow expressions in the GROUP BY clause (for example, Sybase's Transact-SQL) the answer is also direct. It is not difficult to verify that the following query expression is the answer we need.
SELECT AgeCategory = 1 * D[Age < 25] + 2 * D[25 <= Age <= 45] + 3 * D[Age > 45],
NoKids = SUM(D[Kids = 0]),
OneKids = SUM(D[Kids = 1]),
FewKids = SUM(D[Kids <= 3 AND Kids >= 2]),
ManyKids = SUM(D[Kids >= 4])
FROM Employee
GROUP BY 1 * D[Age < 25] + 2 * D[25 <= Age <= 45] + 3 * D[Age > 45]; (16)
This query differs from the previous one only in AgeCategory. According to definition (15), it is easy to verify from the select list and the GROUP BY clause that (16) is indeed the query we need.
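Grouping by an arithmetic expression can be checked in the same way. The sketch below assumes Python's sqlite3 module (SQLite, like the systems mentioned in the text, accepts expressions in GROUP BY), invented sample rows, and SQLite's 0/1 comparisons as a stand-in for D[...]. The Python variable `age_category` is a hypothetical name used only to avoid writing the expression twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Age INTEGER, Dept TEXT, Kids INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        ("Ann", 24, "Sales", 0),
        ("Bob", 30, "Sales", 1),
        ("Cho", 41, "Sales", 3),
        ("Dan", 50, "IT", 5),
        ("Eve", 23, "IT", 0),
        ("Fay", 46, "IT", 2),
    ],
)

# Exactly one of the three indicators is 1 for any age, so the sum is 1, 2 or 3.
age_category = "1 * (Age < 25) + 2 * (Age BETWEEN 25 AND 45) + 3 * (Age > 45)"

by_age = conn.execute(f"""
    SELECT {age_category}            AS AgeCategory,
           SUM(Kids = 0)             AS NoKids,
           SUM(Kids = 1)             AS OneKids,
           SUM(Kids BETWEEN 2 AND 3) AS FewKids,
           SUM(Kids >= 4)            AS ManyKids
    FROM Employee
    GROUP BY {age_category}
    ORDER BY 1
""").fetchall()
print(by_age)
```

The same expression appears in the select list and in GROUP BY, mirroring the way (16) repeats it.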
Continuing along these lines, more complicated problems can also be handled. The "wider" the histogram becomes, the more effective the characteristic function is.
4.3 Table Transposition
Table transposition is a problem often encountered in database application design. It transforms a narrow and long table into a wide and short one. C. J. Date noted long ago the advantages of the former form, called the "column" representation, over the latter form, called the "row" representation: the column representation makes it convenient to apply SQL's set functions. This gives a general principle for handling the problem in database applications: base tables should preferably adopt the column representation.
For example, a table Bonus (Name, Month, Amount) recording employees' bonuses is a column representation; the corresponding row representation is (Name, JanAmount, ..., DecAmount). Suppose we want a bonus list in the row representation, one row per employee, from the base table in the column representation. The characteristic function is the easiest way to implement this table transposition: the following query not only answers the requirement but is also efficient.
SELECT Name,
JanAmount = SUM(Amount * D[Month = 1]),
FebAmount = SUM(Amount * D[Month = 2]),
...
DecAmount = SUM(Amount * D[Month = 12])
FROM Bonus
GROUP BY Name; (17)
Readers may wish to consider how the same query on the Bonus table would have to be written without the characteristic function.
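The transposition in (17) can be sketched as follows, assuming Python's sqlite3 module and invented sample rows, with SQLite's 0/1 comparisons standing in for D[...]. The twelve select-list items are generated in Python rather than typed out; the abbreviations for March to November are an assumption, since the text names only JanAmount, FebAmount and DecAmount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Bonus (Name TEXT, Month INTEGER, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Bonus VALUES (?, ?, ?)",
    [
        ("Ann", 1, 100), ("Ann", 2, 150), ("Ann", 12, 300),
        ("Bob", 2, 80), ("Bob", 2, 20),
    ],
)

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# One SUM per month: Amount contributes only when the row's Month matches.
columns = ",\n".join(
    f"SUM(Amount * (Month = {number})) AS {name}Amount"
    for number, name in enumerate(months, start=1)
)

pivot = conn.execute(
    f"SELECT Name,\n{columns}\nFROM Bonus\nGROUP BY Name\nORDER BY Name"
).fetchall()

for row in pivot:
    print(row)
```

Months in which an employee received no bonus come out as 0 rather than NULL, because every row still contributes a term to every SUM.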
4.4 Finding the Median
In experimental data processing, it is often required to find the "median" of a set of numbers. As is well known, the median has two definitions, the statistical definition and the "financial" definition. If the number of values in the set is odd (say n), then under either definition the median is the ((n + 1) / 2)-th number in sorted order. When the number of values is even, the statistical definition requires the median to be one of the numbers in the set, so a choice must be made between the two middle numbers, whereas the financial definition takes the arithmetic average of the two middle numbers. Which definition to use depends on the specific application background. Whichever is used, even a trained SQL programmer tends to think of a complicated procedure for this problem. With the characteristic function, however, a very simple solution is easy to write. Consider a set of experimental data Data (Value). For definiteness, we assume that the data contain no duplicate values and that all values are non-null. The median of the data set Data is then the result of the query:
SELECT X.Value
FROM Data X, Data Y
GROUP BY X.Value
HAVING SUM(D[Y.Value <= X.Value]) = (COUNT(*) + 1) / 2; (18)
For each X.Value, the expression SUM(D[Y.Value <= X.Value]) is the number of values in the data set that are less than or equal to X.Value, so the desired median is selected by the HAVING clause. (Careful readers will notice that we rely on Sybase integer division: the quotient of two integers is truncated to an integer. In addition, when the data set contains an even number of elements, the median obtained is the smaller of the two middle numbers, which is in line with the statistical definition of the median.)
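The behaviour for odd and even set sizes can be tried out with a small helper. This is a sketch under assumptions: Python's sqlite3 module, a hypothetical function name `median`, distinct non-null integer inputs as the text requires, and SQLite's 0/1 comparisons in place of D[...]. SQLite, like the Sybase behaviour described, truncates the quotient of two integers.

```python
import sqlite3

def median(values):
    """Statistical median of distinct, non-null integers via the self-join of (18)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Data (Value INTEGER NOT NULL)")
    conn.executemany("INSERT INTO Data VALUES (?)", [(v,) for v in values])
    row = conn.execute("""
        SELECT X.Value
        FROM Data X, Data Y
        GROUP BY X.Value
        -- count of values <= X.Value must equal the middle position
        HAVING SUM(Y.Value <= X.Value) = (COUNT(*) + 1) / 2
    """).fetchone()
    conn.close()
    return row[0]

odd_case = median([7, 1, 5, 3, 9])   # five values
even_case = median([7, 1, 5, 3])     # four values: smaller of the two middle ones
print(odd_case, even_case)
```

Note that the self-join pairs every value with every other value, so this formulation trades the sort for a quadratic number of joined rows; it is meant to show the logic, not to be fast on large sets.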
This method of finding the median is easily extended to the case where the data set is divided by some attribute into several subsets and the median of each subset is required. For example, consider the data set Data2 (Partition, Value), where Value is still a non-null numeric value and the attribute Partition, which may be of any data type, divides the data set into several subsets. The medians of the subsets are the result of the following query statement:
SELECT X.Partition, X.Value
FROM Data2 X, Data2 Y
WHERE X.Partition = Y.Partition
GROUP BY X.Partition, X.Value
HAVING SUM(D[Y.Value <= X.Value]) = (COUNT(*) + 1) / 2; (19)
The condition on Partition in the self-join above is required by the partitioning of the table; apart from this, the two queries do not differ.
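The partitioned variant can be checked in the same spirit. The sketch assumes Python's sqlite3 module and invented sample rows, with values distinct within each subset; the column name Partition is double-quoted because PARTITION is an SQL keyword, and SQLite's 0/1 comparisons stand in for D[...].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Data2 ("Partition" TEXT, Value INTEGER NOT NULL)')
conn.executemany(
    "INSERT INTO Data2 VALUES (?, ?)",
    [
        ("a", 1), ("a", 5), ("a", 3),                # odd-sized subset
        ("b", 10), ("b", 20), ("b", 30), ("b", 40),  # even-sized subset
    ],
)

# The join condition restricts each count to values from the same subset.
medians = conn.execute('''
    SELECT X."Partition", X.Value
    FROM Data2 X, Data2 Y
    WHERE X."Partition" = Y."Partition"
    GROUP BY X."Partition", X.Value
    HAVING SUM(Y.Value <= X.Value) = (COUNT(*) + 1) / 2
    ORDER BY X."Partition"
''').fetchall()
print(medians)
```

Within each group COUNT(*) is now the size of the subset rather than of the whole table, which is why the same HAVING condition still selects the middle value.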
4.5 Extreme Values
In some practical problems, each row of data contains several data items that can be compared with one another, and the largest or the smallest of these items is required; this is called the "extreme value problem". For example, consider a table Score (Name, SAT1, SAT2), in which SAT1 and SAT2 are the results of a student's two exams. It is required to give a list (Name, BestSAT), where BestSAT is each student's best score in the two exams.
Some database systems (such as Oracle) have a built-in function GREATEST(value1, value2, ...) that solves the problem directly. In systems without such a function (such as Sybase), the general solution is as follows: a first traversal of the full table selects the rows satisfying SAT1 >= SAT2 and takes SAT1, a second traversal of the full table selects the rows satisfying SAT2 > SAT1 and takes SAT2, and the intermediate results are then combined by a UNION operation to obtain the final result.
With the characteristic function, the table needs to be scanned only once and the query expression is very simple, namely:
SELECT Name,
BestSAT = SAT1 * D[SAT1 >= SAT2] + SAT2 * D[SAT2 > SAT1]
FROM Score; (20)
Suppose we want to know not only the best result of the two exams but also in which exam it was obtained. It is only necessary to add one more item to (20), namely:
SELECT Name,
BestSAT = SAT1 * D[SAT1 >= SAT2] + SAT2 * D[SAT2 > SAT1],
WhichSAT = 1 * D[SAT1 >= SAT2] + 2 * D[SAT2 > SAT1]
FROM Score; (21)
This result is slightly ambiguous, though harmlessly so, only when SAT1 = SAT2: in that case (21) takes the maximum to have been obtained in the first exam. The above considers only one variant of the problem and its solution; interested readers may consider others. The same approach can be applied to finding the minimum value.
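The tie behaviour described above can be observed directly. The following sketch assumes Python's sqlite3 module and invented sample rows, with SQLite's 0/1 comparisons standing in for D[...]; SQLite's two-argument MAX is included only as an independent cross-check and is not part of the method in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Score (Name TEXT PRIMARY KEY, SAT1 INTEGER, SAT2 INTEGER)")
conn.executemany(
    "INSERT INTO Score VALUES (?, ?, ?)",
    [("Ann", 1200, 1350), ("Bob", 1400, 1300), ("Cho", 1250, 1250)],
)

# Exactly one of the two indicators is 1 in every row, including ties.
best = conn.execute("""
    SELECT Name,
           SAT1 * (SAT1 >= SAT2) + SAT2 * (SAT2 > SAT1) AS BestSAT,
           1 * (SAT1 >= SAT2) + 2 * (SAT2 > SAT1)       AS WhichSAT,
           MAX(SAT1, SAT2)                              AS CrossCheck
    FROM Score
    ORDER BY Name
""").fetchall()

for name, best_sat, which_sat, cross_check in best:
    print(name, best_sat, which_sat, cross_check)
```

For the minimum, swapping the roles of the two comparisons (SAT1 <= SAT2 and SAT2 < SAT1) gives the analogous expression.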
5. Issues Worthy of Further Thought
The characteristic function given by (5) is not its only possible form. First, different primitive functions have different computational complexity, so choosing other expressions may further reduce the computational cost. Second, the primitive functions ABS() and SIGN() used in the characteristic function are defined only under the assumption, made at the beginning of this article, that the attributes involved are non-null. Considering that almost all mainstream database systems support three-valued logic (3VL), the strength of this condition deserves examination. In addition, as in the work of E. Birger et al., the characteristic function has considerable room for extension: an attribute appearing in it may be an ordinary table attribute, and other forms (for example, other data types) should also be permissible. Finally, although this article considers only the role of the characteristic function in queries, we believe the same idea should also play a role in database updates. All of this remains for our further work to explore and assess.