In "database principle", the interpretation of the clustered index is that the order of clustered indexes is the physical storage order of data, and the interpretation of the non-clustered index is that the index order is independent of the order of the data. It is formally because of this, so a table can only have a clustered index.
However, this definition is too abstract. In SQL Server, the index is described by the data structure of the binary tree, and we can understand the clustered index: the indexed leaf node is the data node. The leaf node of the non-clustered index is still an index node, but there is only one pointer to the corresponding data block. As shown below:
Non-clustered index
Clustering index
What is the essence of clustered indexes and non-clustered indexes? When is the clustered index, when is the non-clustered index?
This is a very complicated issue, it is difficult to say clearly in three words. I am here to simply talk about the perspective of SQL Server index optimization query (if you are interested in this, you can read the data structure of the "Microsoft SQL Server 2000 Database Programming" third unit of Microsoft publishing, and 6th, 13, 14 units).
First, the difference between index block and data block
Everyone knows that the index can improve the search efficiency because its binary tree structure is small, so the speed block is accessed. Let us count a mathematical question: If a record in the table occupies 1000 bytes on the disk, we establish an index for a field of 10 bytes, then the record corresponding to the index block is only 10 bytes. We know that SQL Server's Minimal Spatial Assignment Unit is "page", a page occupies 8K space on disk, then this page can store 8 records above, but can store 800 indexes. Now we have to retrieve a record in a table with 8,000 records. If there is no index, we may need to traverse 8000 × 1000 bytes / 8k bytes = 1000 pages to find results. If there is the above index on the search field, then we can retrieve the index block that meets the conditions in 8000 × 10 bytes / 8k bytes = 10 pages, and then find the result data block according to the pointer on the index block. This is much more likely to have a much less access.
Second, index optimization technology
Is there an index that must be retrieved? Whether the answer is. Sometimes it is better to use an index. For example, we want to retrieve all records in the above table. If you do not need to index, you need to access 8000 × 1000 bytes / 8K bytes = 1000 pages. If you use an index, first retrieve an index, access 8000 × 10 bytes / 8k byte = 10 pages Get the index retrieval result, then go to the corresponding data page according to the index search results, because all data is retrieved, so you need to access 8000 × 1000 bytes / 8K bytes = 1000 pages to read all data Take it out, a total of 1010 pages have been visited, which is obviously better than indexing.
SQL Server has a complete set of data retrieval optimization techniques, in which case SQL Server's query plan automatically retrieves data from table scans without any indexes. So how do SQL Server know when using an index, when do you not use indexes? In addition to everyday maintenance data information, SQL Server also maintains data statistics, the following figure is a screenshot of the database property page:
From the figure, we can see that SQL Server automatically maintains statistics, including data density information and data distribution information, which help SQL Server determines how to use the query plan and whether the search index is used and what kind of index is used ( I will no longer explain how they help SQL Server build a query plan). We still come to do an experiment. Establish a table: TabTest (ID, unqValue, intValue), where ID is the shaped automatic number main index, unqValue is a UNIQUEIDENTIFIER type, established a normal index above, intValue is plastic, and does not establish an index. The reason why there is no indexed intValue field is to prevent SQL Server from using index override query optimization techniques, so that the experiment does not work. Enter 10,000 random records in the table, the code is as follows: CREATE
TABLE
[
DBO
]
.
[
TabTest
]
(
[
Id
]
[
int
]
Identity
(
1
,
1
)
NOT
NULL
,
[
UnqValue
]
[
UniqueIdentifier
]
NOT
NULL
,
[
INTVALUE
]
[
int
]
NOT
NULL
)
On
[
Primary
]
Go
ALTER
TABLE
[
DBO
]
.
[
TabTest
]
WITH
Nocheck
Add
Constraint
[
PK_TABTEST
]
Primary
Key
Clustered
(
[
Id
]
)
On
[
Primary
]
Go
ALTER
TABLE
[
DBO
]
.
[
TabTest
]
Add
Constraint
[
DF_TABTEST_UNQVALUE
]
DEFAULT
(
NewID
())
For
[
UnqValue
]
Go
Create
Index
[
IX_TABTEST_UNQVALUE
]
On
[
DBO
]
.
[
TabTest
]
(
[
UnqValue
]
)
On
[
Primary
]
Go
Declare
@i
int
Declare
@V
int
set
@i
=
0
While
@i
<
10,000
Begin
set
@V
=
Rand
()
*
1000
insert
INTO
TabTest
[
INTVALUE
]
)
Values
(@V)
set
@i
=
@i
1
end
Then we execute two queries and view the execution plan, as shown: (You can open the query plan in the query analyzer's query menu, the first query on the map is what I found from the database, everyone is experimenting You can set according to the value in your own database:
As can be seen from the figure, in the first query, SQL Server uses the IX_TABTEST_UNQVALUE index. According to the arrow direction, the computer is found within the index range. After found, use BookMark Lookup to map the index node to the data node, finally Select results. In the second query, the system is directly traversed, but it uses a clustered index, why? Don't forget, the page node of clustered index is data node! This uses the cluster index faster (not affected by data deletion, updated storage, direct traversal data is to skip these caves). Below, we change the clustered index of the ID field in SQL Server to non-clustered indexes, then execute Select * from tabtest, this time we see the execution plan becomes:
SQL Server did not use any index, but directly executed Table Scan, because only this is the highest.
Third, the essence of clustered indexes and non-clustered indexes
It is now possible to discuss the essence of clustered indexes and non-clustered indexes. As shown in the two figures mentioned above, the leaf node of the clustered index is the data node, and the page node of the non-clustered index is still an index detection and retains a link to point to the corresponding data block.
Did you look at their differences by a mathematical question: Suppose there is a table of 8,000 records, each record in the table occupies 1000 bytes on the disk, and if the non-clustered index primary key is established on a 10-byte length field Need 16,000 binary tree nodes (8000 leaf nodes in this 16000 nodes, each page node points to a data record), so data will occupy 8000 × 1000 bytes / 8K bytes = 1000 pages; index will Take up 16000 nodes × 10 bytes / 8k bytes = 20 pages, a total of 1020 pages.
The same table, if we establish a clustered index primary key on the corresponding field, because the page node of the clustered index is the data node, the index node has only 8,000, occupying 10 pages, and the data still has 1000 pages.
Let's take a look at why the primary key of the non-clustered index is fast than the clustered index when performing insertion operation. The primary key constraint requires that the primary key cannot be repeated, then how does SQL Server know that there is no repetition? The only way is to retrieve. For non-clustered indexes, you only need to retrieve the 16,000 nodes in 20 pages, you know if there is any duplication because all primary key values are included in this 16,000 index nodes. However, for clustered indexes, index nodes only contain 8,000 intermediate nodes, as for repetitions that must be retrieved must retrieve another 8,000 page data nodes, which is equivalent to retrieving 10 1000 = 1010 pages to know if there is repetition. Therefore, the insertion speed of the clustered index primary key is much slower than the insertion speed of the non-clustered index primary key.
Let's take a look at the efficiency of data retrieval. If you retrieve the above two tables, in the case of using the index (sometimes the SQL Server executive plan will choose not to use index, but we will assume a certain index, for clusters Index retrieval, we may access 10 index page and add 1000 data pages to get results (actual situation is better than this), and for non-clustered indexes, the system will find qualified nodes from 20 pages, and map it to On 1000 data pages (this is also the worst case), compares a 1010 pages and another visit 1020 pages, which can be seen that the detection efficiency is not very big. Therefore, regardless of the non-clustered index is also a clustered index, it is suitable for sorting, and the clustered index is only a bit higher than the non-clustered index. Conclusion
Ok, I wrote for a long time, my hand is tired. The experiment regarding clustered indexes and non-clustered index efficiency issues is not done. If you are interested, you can use the query analyzer to analyze the query plan. SQL Server is a very complex system, especially indexing, and query optimization technology, Oracle is more complicated. Understanding the index and what is not bad behind, it can help us understand our system more profound. Reprinted · http://www.cnblogs.com/zhenyulu/articles/25794.html