SQL Story excerpt (4) ---- Information Mining Preliminary

zhaozj2021-02-11  193

?

The initial idea of ​​writing this book comes from the communication and learning of colleagues when working in the first company. However, the fuse of this book is to read a post on the latest information on the latest information on the latest information on 9CBS. The problem in the post can be done in two ways of subquery and join. Due to the limited conditions, I can't answer them in detail, and the friends who posted can't understand my original intention, let my heart regret. So decided to put the idea of ​​the book into action, and put this book on 9CBS, communicate with everyone, and make progress together. Today, I just see the problem similar to the day, my heart is touching, and I decided to discuss it in detail here.

In actual work, we sometimes need to establish a data sheet to store the data of the changes, and statistics our information we need by these data. It is characterized by a class of problems in the final result of filtering conditions from packet statistics. Such applications are common in website databases, accounting systems, real-time systems, data warehouses and data mining. In fact, this proposition itself has included data mining. Now let's see the example below

Example 4-4-1: Latest offer

Netizen Kikilyq asked:

I have a table: computer_price, the format is as follows:

Goods price dates

--------------------------------

HP Computer 20000 5.21

HP Computer 20050 5.23

NEC 31200 5.3

NEC 32000 5.5

Query results Requirements: To find out the latest price of each computer;

The result of the above table is:

Goods price dates

---------------------------------

HP Computer 20050 5.23

NEC 32000 5.5

Help, get this problem?

According to the problem, we first establish a data sheet. After analysis, the data should be marked by the goods name and date, so these two fields are the main keyword:

SQL Server version is as follows

Create Table [DBO]. [Goods] (

[Goods] [25) Collate Chinese_PRC_CI_AS Not Null,

[Price] [Money] Not null,

[DATE_TIME] [datetime] Not null,

PRIMARY Key (Goods, Date_Time)

) On [primary]

The Interbase version is as follows

CREATE TABLE GOODS

(

Goods char (10) Not null,

Price Numeric (15, 4) Not null,

Date_time TimeStamp Not Null,

PRIMARY Key (Goods, Date_Time)

)

After the establishment, the reader will insert the data itself.

In this problem, the price of the computer in the final report depends on its subsequent quotation, which is the latest line of quotation. Typically belongs to the type mentioned earlier. Initial, I will try to select the latest offer date for each computer, this is relatively simple:

SELECT Goods, Max (Date_Time)

From goods

GROUP BY Goods

Return the result

Good

---------- -------------------------------------------------------------------------------------------------------------------------------- ----------------

HP 2002-05-23 00: 00: 00.000

NEC 2002-05-05 00: 00: 00.000

Obviously, as long as each brand of computers are displayed in the above date, it is what we have to have. So how do you write directly?

Select Goods, Price, Max (Date_Time)

From goods

GROUP BY Goods

I think this statement doesn't have to tried. The slight experience will find that the Price column is not in the statistical function, nor the Group By, the database system cannot perform such a statement. Such columns must be removed from another data set, so I first thought of self-join. But I believe that most friends will first think of subquery. Now let's take a look at how the child is doing, after all, this is more intuitive. The most lazy way is to express the price to an inserted standard query:

SELECT L.GOODS,

(Select R.Price from Goods R where r.goods = l.goods and r.date_time = max (l.date_time)) AS Price,

Max (l.date_time) AS current_date_time

From Goods L

Group by l.goods

But unfortunately, this statement can only be executed in SQL Server, and Interbase's prompt is the Invalid Column Reference (invalid column reference). However, you can write a universal version for a thinking. Now, we say the system, I have to take some lines from the table, each brand of computer, the date is the latest offer date of this brand:

Select L.Goods, L.PRice, L.Date_time

From Goods L

WHERE L.DATE_TIME =

(SELECT)

Max (r.date_time)

From Goods R

Where r.goods = L.GOODS

GROUP BY R.GOODS)

This one

I think I can still write a few different subquery variants, but the same size is different, I don't try it. This version seems to be a little unreliable, because only DATE_TIME in the WHERE condition of the primary query does not seem to have an accurate area to separate each row of data. But rest assured, here has a "strange" mutual reference, the record of the primary query is to meet the date equal to the return value of the child query, while the subquery of the goods name (goods column) depends on the goods name of the primary query (goods column). In this way, the subquery will return its correct date for the current brand, which is the stunts of the associated subqueries, which also caused the reason why it is more efficient in many occasions. My interest in the child is here, I said in the previous article, the connection query is a very good technology, then is this inquiry that may be implemented? The analysis of the subquery in front will help us think here. Now that we have two datasets, a maximum date, a price, join them, can n't you? The two result sets are in the subquery above, the current problem is how we join them, clearly, there is a joint condition is R.Goods = L.Goods, which also determines the unique identity of the final result set. One - the goods column, and the filter criteria of the date is l.date_time = max (r.date_time), plus the column in the Group BY result, so there is:

Select L.Goods, L.PRice, L.Date_time

From Goods L

Join Goods R

On L.Goods = R.GOODS

Group by l.price, l.goods, l.date_time

Having l.date_time = max (r.date_time)

Some friends may not understand why this group By is coming, simply. We did the self-heaters of a Goods table (two datasets were L and R), they generated a Dikar. Some people call it data explosion. However, after the ore blows, it is convenient for us to find gold. Using the Join Conditions, the L Data Concentration is in the R data set, all of its own quotation date. Now we have to group the R data collection group, select the latest quotation date of each brand, with its filter L, the data set should use group by r.date_time

Having l.date_time = max (r.date_time)

Since we have to take out the three columns in the D data concentration, you have to list them in Group By, because there is already L.Goods = R.Goods, R.Goods is not written in Group By. Since I mentioned the reasons I mentioned in the previous article, I personally prefer to use join. Of course, in terms of efficiency, when the data of each group in the table is much more, the efficiency of the connection query is offset by huge Dikar, and the app (if this is a table that records the production line instant work). Statistics Its latest production situation, usually this); there is very little data in the table, but there are many data groups (such as large online bookstores, users may be an astronomical figure, relatively speaking, the book bought Not much), the child query still needs to traverse the entire table, the efficiency is too low, and the Dikar, which is the Dikar, which is successful, and exchanges in space. Advantage At this time, you should use a join query.

The return result set of the above programs is:

Goods Price Date_time

---------- --------------------------------------- -----------------------------------

HP 20050.0000 2002-05-23 00: 00: 00.000

NEC 31000.0000 2002-05-5 00: 00: 00.000

In addition to specifically indicated, the above scripts can be universal in MS SQL Server7.0 or higher and interbase6.0.1 or higher. Mastering the way to design the above scripts can be widely used in the various SQL programming of the query condition. From this, we can also discuss two interesting aspects: join queries and data packets.

Attachment: This connection query script should also be posted behind Kikilyq, but because I stupidly posted two not the post (a wrong, a stupid), 9CBS webpage does not allow me to go again Kikilyq's post is followed. Please kikilyq to read it here. Sorry.

About "SQL Story": This book is in proposing some common SQL programming problems, summarizes some relational database design and SQL language programming mode, through practical problems, helps readers to improve database programming capabilities, this The book will also involve the theory of relationship algebra. I plan to collect some typical SQL programming issues on 9CBS, as an important part of this book. The author guarantees that no one's labor results, all answers and analyzes are never faked. If you have a master pointing, I will definitely point out and give you a thank you. Welcome friends from all users to provide materials, first express gratitude. In addition, I hope that there is an Oracle master to work with me to complete the Oracle section of this book. Share successful joy and labor hardship.

The following is the "SQL Story" outline I have developed, that is, now the example of each article will have a certain position of this book at that time, and the current number will now be changed. Foreword

Chapter 1 Basic Principles

First, before the encoding

Second, the coding principle

Third, SQL script design principle

Chapter II Truth of Relational Database

First, the collection of vector space

Second, the information in the collection

Third, the collection operation

Fourth, the real world

Chapter 3 is not simple and simple query

First, choose information

Second, the conditions filter

Third, order and disorder

Fourth, information statistics and data mining

Chapter 4 Space "Multiplication" - Database & Sql Blog Articles

First, Dikal Consultation and Information Explosion

Second, a higher dimensional space

Third, join inquiry

Four, Join and WHERE

Chapter 5 SQL "Sentence" - subquery

First, from sentence syntax

Second, the cost assessment

Third, live subscription

Chapter VI Collection Writes - Ninert, Delete and Update

I. Design method

Second, delete

Third, insert

Fourth, change

Chapter VII Ultimate Weapon - Cursor

First, why

Second, how to do the cursor

Third, the tag special

Fourth, procedural coding

Chapter 8 Impossible Tasks

I. Funny problem highlights

Second, no limit

Third, unlimited information set

Fourth, explore collection

Chapter 9 is not just SQL - database management

First, reasonable index

Second, the code for lasting - stored procedures, storage functions, and triggers

Third, view usage

Fourth, the reasonable way of data storage

postscript

Attached: Alternative sound

转载请注明原文地址:https://www.9cbs.com/read-4602.html

New Post(0)