Clustered Index in SQL Server

Wednesday, June 15, 2011 |


As per the promise I have made in “Some basics about Index in SQL Server” article, I am coming up with each Indexes of SQL Server and first and most important is “Clustered Index”. 

Understanding of Clustered Index:

As I told in previous article too that Index is the key of performance, good and managed Index could boost up your speed of retrieving of data from table.

Clustered Index contains actual data of the table in leaf level pages in logically sorted order (to understand root and leaf level page, click here). Since it is logically sorted, it doesn’t need to go for all data present there in Index. 

For example if you are looking for the phone number of “Ritesh Shah” in telephone directory, you can move to a page which has phone number of the person whose first name starts with “RI” and once all instance of “RI” over in directory, you don’t need to move on as there is NO CHANCE, you will get “Ritesh” anywhere in directory except those particular pages.

BTW, if you don’t have clustered index on your table, than your table would be called “HEAP”, which wouldn’t not have logically sorted data so if you are searching for “Ritesh Shah”, you can to check complete table as you never know, where you will find “Ritesh Shah”, just one method, go each and every row of table and check for matching criteria.

Like any other columns, you can define clustered index in more than one field too and all the columns covered up under the index, called key column.

While choosing a prime candidate for Clustered Index column in your table, you have to select the columns which meet few of the general criteria defined below. (you can say following criteria as a best practice while choosing index candidate)

--> You key column or combination of key columns should be unique and not null. If your You key column or combination of key columns are not unique than SQL Server has to add one more hidden column of 4-byte INT to make it unique. However, you can’t see that hidden column neither can query it directly; it would be purely for SQL Server’s internal use.

--> It should be short as wide key value would increase the depth of Clustered Index and will reduce the performance a bit and also increase the size of non-clustered index as it is being there as a reference in all non-clustered index.

--> Select less changing or no changing fields for you clustered index as Key value indicates the location of page where actual data resides, if you change this key value, row has to be deleted from that page and has to move to another appropriate page which reduces the performance and increase unnecessary overhead to IO.
Generally whenever you make Primary Key in any of your table, SQL Server itself create clustered index on it but if you want to keep clustered index on any other column(s) due to high selectivity on those column(s), you can do it.

Have you got bored of so long theory? Let us do some practical and check it out?

--create one database which you can delete after running this example
create database SQLHub
GO

USE SQLHub
GO

--if orders table is already there. you can delete it than create new one with name "Orders"
IF OBJECT_ID('orders', 'U') IS NOT NULL BEGIN
      DROP TABLE orders
END
GO

--creating table
CREATE TABLE orders (OrderID INT IDENTITY, OrderDate DATETIME, Amount MONEY, Refno INT)
GO

--inserting 100000 fack rows into table (BTW, thank to Jacob Sebastian, copy in INSERT script from one of his demo to generate big table)
INSERT INTO orders (OrderDate, Amount, Refno)
SELECT TOP 100000
      DATEADD(minute, ABS(a.object_id % 50000 ), CAST('2010-02-01' AS DATETIME)),
      ABS(a.object_id % 10),
      CAST(ABS(a.object_id) AS VARCHAR)
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO

--creating clustered index, however it is not meeting the criteria I have
--given in article, but this is ok for example
CREATE CLUSTERED INDEX idx_refno ON orders(refno)
GO

--run both of the following query with execution plan and see the results in execution plan
--you can see execution plan with the following steps
--first select both of the below given query
--Press Ctrl+M
--press F5
SELECT TOP 10 * from orders where refno=4

SELECT TOP 10 * from orders where OrderDate = '2010-02-01 00:04:00.000'
GO

--if you wish, you can uncomment below code and delete SQLHub database
--use master
--go
--drop database sqlhub

If you execute both the query with “Actual Execution Plan”, you will see first query is having Index Seek and second query is having Index Scan. 

Seek and Scan is really interesting topic which I will cover later but just keep in mind that, Seek is good, Scan is bad as it will check all records of the index.


if you want to refer all other articles related to index, click here.

Reference: Ritesh Shah
http://www.sqlhub.com
Note: Microsoft Books online is a default reference of all articles but examples and explanations prepared by Ritesh Shah, founder of http://www.SQLHub.com

Ask me any SQL Server related question at my “ASK Profile

0 comments: