As per the promise I have made in “Some
basics about Index in SQL Server” article, I am coming up with each Indexes
of SQL Server and first and most important is “Clustered Index”.
Understanding of
Clustered Index:
As I told in previous article too that Index is the key of
performance, good and managed Index could boost up your speed of retrieving of
data from table.
Clustered Index contains actual data of the table in leaf
level pages in logically sorted order (to understand root and leaf level page, click
here). Since it is logically sorted, it doesn’t need to go for all data
present there in Index.
For example if you are looking for the phone number of “Ritesh
Shah” in telephone directory, you can move to a page which has phone number of the
person whose first name starts with “RI” and once all instance of “RI” over in
directory, you don’t need to move on as there is NO CHANCE, you will get “Ritesh”
anywhere in directory except those particular pages.
BTW, if you don’t have clustered index on your table, than
your table would be called “HEAP”,
which wouldn’t not have logically sorted data so if you are searching for “Ritesh
Shah”, you can to check complete table as you never know, where you will find “Ritesh
Shah”, just one method, go each and every row of table and check for matching
criteria.
Like any other columns, you can define clustered index in
more than one field too and all the columns covered up under the index, called
key column.
While choosing a prime candidate for Clustered Index column
in your table, you have to select the columns which meet few of the general
criteria defined below. (you can say following criteria as a best practice
while choosing index candidate)
--> You key
column or combination of key columns should be unique and not null. If your You
key column or combination of key columns are not unique than SQL Server has to
add one more hidden column of 4-byte INT to make it unique. However, you can’t
see that hidden column neither can query it directly; it would be purely for
SQL Server’s internal use.
--> It should
be short as wide key value would increase the depth of Clustered Index and will
reduce the performance a bit and also increase the size of non-clustered index
as it is being there as a reference in all non-clustered index.
--> Select
less changing or no changing fields for you clustered index as Key value
indicates the location of page where actual data resides, if you change this
key value, row has to be deleted from that page and has to move to another
appropriate page which reduces the performance and increase unnecessary
overhead to IO.
Generally whenever you make Primary Key in any of your
table, SQL Server itself create clustered index on it but if you want to keep
clustered index on any other column(s) due to high selectivity on those
column(s), you can do it.
Have you got bored of so long theory? Let us do some practical
and check it out?
--create one
database which you can delete after running this example
create database SQLHub
GO
USE SQLHub
GO
--if orders
table is already there. you can delete it than create new one with name
"Orders"
IF OBJECT_ID('orders', 'U') IS NOT NULL BEGIN
DROP TABLE
orders
END
GO
--creating table
CREATE TABLE orders (OrderID INT IDENTITY, OrderDate DATETIME, Amount MONEY, Refno INT)
GO
--inserting
100000 fack rows into table (BTW, thank to Jacob Sebastian, copy in INSERT
script from one of his demo to generate big table)
INSERT INTO orders (OrderDate, Amount, Refno)
SELECT TOP 100000
DATEADD(minute, ABS(a.object_id % 50000 ), CAST('2010-02-01' AS DATETIME)),
ABS(a.object_id % 10),
CAST(ABS(a.object_id) AS VARCHAR)
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
--creating
clustered index, however it is not meeting the criteria I have
--given in
article, but this is ok for example
CREATE CLUSTERED INDEX
idx_refno ON orders(refno)
GO
--run both of
the following query with execution plan and see the results in execution plan
--you can see
execution plan with the following steps
--first select
both of the below given query
--Press Ctrl+M
--press F5
SELECT TOP 10 * from orders where
refno=4
SELECT TOP 10 * from orders where
OrderDate = '2010-02-01
00:04:00.000'
GO
--if you wish,
you can uncomment below code and delete SQLHub database
--use master
--go
--drop database
sqlhub
If you execute both the query with “Actual Execution Plan”,
you will see first query is having Index Seek and second query is having Index
Scan.
Seek and Scan is really interesting topic which I will cover
later but just keep in mind that, Seek is good, Scan is bad as it will check
all records of the index.
Reference: Ritesh Shah
http://www.sqlhub.com
Note: Microsoft Books online is a default reference of all articles but examples and explanations prepared by Ritesh Shah, founder of http://www.SQLHub.com
http://www.sqlhub.com
Note: Microsoft Books online is a default reference of all articles but examples and explanations prepared by Ritesh Shah, founder of http://www.SQLHub.com
Ask me any SQL Server related question at my “ASK Profile”
0 comments:
Post a Comment