Introduction

There was a time where many have not realized the importance of indexes in database world. Indexes were predominantly used to decrease the execution time of a query without understanding the side effects it would cause on DML statements. However, in this contemporary world where lots and lots of information are available in the net, many have realized that indexes are to be used carefully and used only when a need arises. This article even though provides information on the basics, we will also delve into the internals on how these indexes react under the cover. People who travel to the world of SQL Server from other databases will obviously have to confine themselves with only two types of indexes viz Clustered Indexes and Non-Clustered indexes. The design and architecture of SQL Server allows the user to play with these indexes alone (even though SQL Server 2008 has provided a new feature called FILTERED INDEXES, which we will see in next article). We will work on the different scenarios, observe the results and provide a logical conclusion towards the end. Please note that this is not a universal conclusion and results may vary depending on the data size, underlying OS, hardware configuration, number of concurrent users etc. I will be using SQL Server 2008 on Windows XP. As and when needed, I will also provide the information on the data size and other variable factors.

Definitions

Before we dive deep into this article, it would be better if we understand the terminologies and their definitions as it would be easy to understand the contents in this article.

Fragmentation A process by which the spaces occur due to frequent updates and deletes in the page. More this fragmentation more the performance issues or maintenance will be.
Index Level This value provides the number of levels from the bottom up of an index. Please note that Leaf Node will always have 0.
Clustered Index Seek Index seek uses a clustered index to look up the rows to be returned directly
Index Depth Number of levels we need to pass to get into the index.
Record Count Number of records available in the page. In general this value at index level 0 will be the number of records in the table. But in case of any forwarding count, then there will be some differences
Extent An extent is the basic unit of storage used to allocate space for tables and indexes within a given data file. It is made up of eight contiguous 64KB data pages
Page A page is the unit of allocation within a specific extent. There are eight pages in every extent. Please note that a page is the last level you reach before hitting at the actual data row.
Page Splits Whenever a page becomes full, half of the data is shifted to the newly created page and pointers are kept in the old page pointing to the new location of the data. This is called Page Split. In case of a clustered index the inserted row which does not have enough space alone is shifted to the newly created page.
Row Offset For every row that we insert into the database, SQL Server will place the “Row Offset” located at the end of the page, to actually indicate where in that page the row data starts. Indirectly it tells us how many spaces we need to leave to read the data from the top.
Index Seek Index seek uses a non-clustered index to look up the rows to be returned directly
Nested Loops Takes the inputs from two data sets and performs the scanning of the outer set for each inner set. Very efficient operation if the data sets are small.
RID Lookup The query optimizer performs a RID LookUp, a type of bookmark lookup that occurs on a heap table and uses a row identifier to find the rows to return.
Key Lookup A Key Lookup is a bookmark lookup on a clustered table and which essentially means that the optimizer cannot retrieve the rows in a single operation, and has to use a clustered key to return the corresponding rows from a clustered index (or from the table itself).
AVERAGE_FRAGMENTATION_ IN_PERCENT The degree of fragmentation in the index tree which is based on pages or extents that are not in sync or out of order (i.e.) the pointer to the logical next page is not the same as the physical next page. Please note that we need to be looking for a low number here, though how low depends on the the scenario and requirements

Clustered Index

Clustered index, as the name suggests, the indexes are clustered in one place with the data. The major advantage (in some cases disadvantage) with clustered index is the arrangement of data in one particular order. Since the data is arranged in one order, we can have only one clustered index in a table. By default, as soon as a primary key for a table is created, internally SQL Server creates a clustered index. However, when we create a unique key, we can specify either clustered or non-clustered index. Creating a clustered index is a time consuming task and in case if this is not defined properly then it may lead to lot of performance issues. In this section, we will see where this can be a gain and where this can lead to performance issues.

Scenario 1: Well defined Cluster Key

Let us create a table called “Accounts” having information related to a company accounts as follows:

CREATE TABLE ACCOUNTS (ACCOUNTNUMBER NUMBER(6), ACCOUNTNAME VARCHAR(50), ACCOUNTDESCRIPTION VARCHAR(50), ACCOUNT_TYPE VARCHAR(1)) 

 

Let us now create a clustered index on this table on the column ACCOUNTNUMBER. CREATE CLUSTERED INDEX ACCNTCLUSTIDX ON ACCOUNTS(ACCOUNTNUMBER);

In order to test various scenarios, let me load 50,000 records into this table using the following script:

BEGIN  DECLARE @COUNTER AS INT  SET @COUNTER=1  
WHILE (@COUNTER <= 50000)  INSERT INTO ACCOUNTS VALUES (@COUNTER, ('ACCOUNT ' + CAST(@COUNTER AS VARCHAR)), ('ACCOUNT DESC ' + CAST(@COUNTER AS VARCHAR)), (CASE (CAST(@COUNTER AS VARCHAR) % 4) WHEN 1 THEN 'A' WHEN 2 THEN 'L' WHEN 3 THEN 'E' WHEN 4 THEN 'I' ELSE NULL END))  END 

We have four different types through which we classify the account as Asset (A), Liability (L), Expense (E) and Income (I). Now to insert these many records using SQL Server Management Studio (SSMS), it took around 23 seconds. The following query is a simple select statement filtering the data on accountnumber. SELECT * FROM ACCOUNTS WHERE ACCOUNTNUMBER < 1000 The above query got executed in no time and returned the 999 rows. The actual execution plan showed that the query took the CLUSTERED INDEX SEEK plan, which actually used the index and other parameters (or some authors term as attributes) are also well within the bounds. Please also note that since the key what we have chosen to index is well organized, whatever be the filter on the clustered key it would use the index. Since this clustered key is well organized, even the page splits and fragmentation should be well within the limits. We can look at this in three different angles. First let us truncate the existing records in this table and load the table with three sets of data. First data will be from 1 to 150,000. Second will be from 300,000 and 500,000. And final set will be between 150,000 and 300,000. At the initial stage before starting this experiment the page splits per second is 18,562 as calculated through the following query: SELECT * FROM SYS.SYSPERFINFO WHERE COUNTER_NAME = ‘PAGE SPLITS/SEC’ AND OBJECT_NAME LIKE ‘%ACCESS METHODS%’ Now the fragmentation for the index on this table is provided below: Now let us start loading the data using the following script.

BEGIN DECLARE @COUNTER AS INT SET @COUNTER=300000 WHILE (@COUNTER <= 500000) begin INSERT INTO ACCOUNTS VALUES (@COUNTER, (‘ACCOUNT ‘ + CAST(@COUNTER AS VARCHAR)), (‘ACCOUNT DESC ‘ + CAST(@COUNTER AS VARCHAR)), (CASE (CAST(@COUNTER AS VARCHAR) % 4) WHEN 1 THEN ‘A’ WHEN 2 THEN ‘L’ WHEN 3 THEN ‘E’ WHEN 4 THEN ‘I’ ELSE NULL END)) set @COUNTER=@COUNTER+1 end END

It took 1 minute to execute the above script and Page Splits per second increased from 18,562 to 19,604. It took just 1042 page splits per second to load 150,000 records. Fragment details are provided below: Now let us start working on the second set of data and observe the page splits and fragmentation.

BEGIN DECLARE @COUNTER AS INT SET @COUNTER=300000 WHILE (@COUNTER <= 500000) begin INSERT INTO ACCOUNTS VALUES (@COUNTER, (‘ACCOUNT ‘ + CAST(@COUNTER AS VARCHAR)), (‘ACCOUNT DESC ‘ + CAST(@COUNTER AS VARCHAR)), (CASE (CAST(@COUNTER AS VARCHAR) % 4) WHEN 1 THEN ‘A’ WHEN 2 THEN ‘L’ WHEN 3 THEN ‘E’ WHEN 4 THEN ‘I’ ELSE NULL END)) set @COUNTER=@COUNTER+1 end END

It took 1 minute and 39 seconds to execute the above script and Page Splits per second increased from 19,604 to 21,030. It took just 1426 page splits per second to load 200,000 records. Fragment details are provided below: The most interesting part is the third set, which actually inserts the data in between the existing data. This will increase the page splits further and cause the cascading effects on the remaining data as well. This should not cause any problem if the clustering key is defined properly. Let us execute the following script to insert the missing data.

BEGIN DECLARE @COUNTER AS INT SET @COUNTER=150001 WHILE (@COUNTER <= 299999) begin INSERT INTO ACCOUNTS VALUES (@COUNTER, (‘ACCOUNT ‘ + CAST(@COUNTER AS VARCHAR)), (‘ACCOUNT DESC ‘ + CAST(@COUNTER AS VARCHAR)), (CASE (CAST(@COUNTER AS VARCHAR) % 4) WHEN 1 THEN ‘A’ WHEN 2 THEN ‘L’ WHEN 3 THEN ‘E’ WHEN 4 THEN ‘I’ ELSE NULL END)) set @COUNTER=@COUNTER+1 end END

It took 1 minute to execute the above script and Page Splits per second increased from 21,030 to 22,099. It took just 1069 page splits per second to load 150,000 records. Fragment details are provided below:

Scenario 2: Not So Well Defined Clustered Key

Now let us change the scenario and use a calculated column and index the same. By calculated column, I mean a column that is derived from another column or a combination of values that can be put in one column. For this, I will create another table called PurchaseMaster as follows:

CREATE TABLE [dbo].[purchasemaster]( [purchasenumber] [varchar](20) NULL, [purchasedescription] [varchar](30) NULL, [vendorcode] [varchar](15) NULL, [dateofpurchase] [datetime] NULL )

A clustered index was also created on the column purchasenumber as follows:

CREATE CLUSTERED INDEX PURCHASEMASTERIDX ON PURCHASEMASTER(PURCHASENUMBER)

The column PURCHASENUMBER is derived on the basis of date of purchase. For example, the following is the sample data available in the PURCHASENUMBER column.

purchasenumber
1\2011-2012
10\2011-2012
100\2011-2012
1000\2011-2012
101\2011-2012
102\2011-2012
103\2011-2012
104\2011-2012
105\2011-2012 106\2011-2012

Please note that each purchase number is appended with the financial year. The purchase number 100 has the financial year 2011-2012 and the same for the other purchase numbers. As soon as the person who is raising the purchase order looks at this, the person will know the financial year details. This is indeed very useful and infact very handy as well. Also, when we issue the following statement, we did not see any anomaly.

SELECT * FROM PURCHASEMASTER WHERE PURCHASENUMBER=’1000\2011-2012′

The execution plan for the above select statement is provided below: Now, if we see a range filter based query as follows, the results were not proper.

SELECT * FROM PURCHASEMASTER WHERE PURCHASENUMBER>
‘1\2011-2012’ AND PURCHASENUMBER <‘101\2011-2012’
purchasenumber            purchasedescription        vendorcode       dateofpurchase
10\2011-2012    Purchase 10                   Vendor 10         2011-05-30 19:08:48.653
100\2011-2012  Purchase 100                 Vendor 100        2011-05-30 19:08:48.687
1000\2011-2012 Purchase 1000               Vendor 1000      2011-05-30 19:08:49.217
Now there are no records in the table and entire data was truncated. We will now study the page splits and the fragmentation details for this clustered index. Initially the page splits is calculated using the following query and was observed to be 9325.
SELECT * FROM SYS.SYSPERFINFO WHERE COUNTER_NAME = ‘PAGE SPLITS/SEC’ AND OBJECT_NAME LIKE ‘%ACCESS METHODS%’
Also, the fragmentation details for this table and index is illustrated below: Now we will insert two sets of data, purchase number from 1 to 150,000 and from 300,000 to 500,000 with a gap of 150,000 between the sets of data. After inserting every set of data, we will observe the corresponding page splits and fragment. The following script will be used to generate these sets of data.
BEGIN
DECLARE @COUNTER AS INT
SET @COUNTER=1
WHILE (@COUNTER <= 150000)
BEGIN
INSERT INTO PURCHASEMASTER VALUES (CAST(@COUNTER AS VARCHAR) + ‘\’ + cast(YEAR(GETDATE()) as varchar) + ‘-‘ + cast(YEAR(GETDATE())+1 as varchar), ‘PURCHASE ‘ + CAST(@COUNTER AS VARCHAR), ‘VENDOR ‘ + CAST(@COUNTER AS VARCHAR), GETDATE())
SET @COUNTER=@COUNTER+1
END
END

Inserting 150,000 records took just 1 minute and 3 seconds. The page splits/second and fragmentation after this are provided below: From 9325, page splits/second has increased to 11523 which caused 2198 page splits per second during this insert operation. Also the fragment details are provided below: Please note that the fragmentaion is observed to be high in this case. Now, let us consider the next set, which is inserting next set of 200,000 records from 300,000 to 500,000.

BEGIN
DECLARE @COUNTER AS INT
SET @COUNTER=300000
WHILE (@COUNTER <= 500000)
BEGIN
INSERT INTO PURCHASEMASTER VALUES (CAST(@COUNTER AS VARCHAR) + ‘\’ + cast(YEAR(GETDATE()) as varchar) + ‘-‘ + cast(YEAR(GETDATE())+1 as varchar), ‘PURCHASE ‘ + CAST(@COUNTER AS VARCHAR), ‘VENDOR ‘ + CAST(@COUNTER AS VARCHAR), GETDATE())
SET @COUNTER=@COUNTER+1
END
END

Inserting 200,000 records took just 1 minute and 18 seconds. The page splits/second and fragmentation after this are provided below: From 11,523, page splits/second has increased to 14,947 which caused 3,424 page splits per second during this insert operation. Also the fragment details are provided below: Wow, that is a great increase in fragmentation. Now the final is what more important to observe both Page Splits and Fragmentation. We are now going to insert the missing records between 150,000 and 300,000 and observe the page splits. I will also explain diagramatically how this would happen. Now let us execute the same script by changing the values.

BEGIN
DECLARE @COUNTER AS INT
SET @COUNTER=150001
WHILE (@COUNTER <= 299999)
BEGIN
INSERT INTO PURCHASEMASTER VALUES (CAST(@COUNTER AS VARCHAR) + ‘\’ + cast(YEAR(GETDATE()) as varchar) + ‘-‘ + cast(YEAR(GETDATE())+1 as varchar), ‘PURCHASE ‘ + CAST(@COUNTER AS VARCHAR), ‘VENDOR ‘ + CAST(@COUNTER AS VARCHAR), GETDATE())
SET @COUNTER=@COUNTER+1
END
END

The third batch took just 57 seconds to complete. Page Splits per second was observed to be 17,520 which caused 2,573 page splits per second during this insert operation. Also the fragment details are provided below: Now having said this on page splits and fragmentation, how do we know if this is good or bad? Let us compare the scenario 1 and 2 in the following table and observe the results.

Description Execution Time (mm:ss) Page Splits Fragmentation (AVG_FRAGMENTATION_ IN_PERCENT FOR INDEX_LEVEL=0)
Scenario 1 1-150,000 01:00 1,042 0.68
300,000-500,000 01:39 1,426 0.57
150,001-299,999 01:00 1,069 0.62
Scenario 2 1-150,000 01:03 2,198 35.21
300,000-500,000 01:18 3,424 41.87
150,001-299,999 00:57 2,573 43.36
This shows in Scenario 2, as the cluster key is not well defined, the inserts to fill the missing gap (150,001 to 299,999) is observed to be high. Also, the fragmentation and Page Splits are quite high since the cluster key is not well defined. Whenever we need to define a clustered index on a key, please make sure the following:
a)      Key should be well defined, satisfying the business requirements and as well as the providing a greater selectivity.
b)      Key should not be defined on derived table and calculated column
c)       Other columns in the table should be completely dependent on the cluster key

Non-Clustered Index

What we saw till now is a very ideal situation where we have a well defined key and can go ahead to create a clustered index. But what if the data is not so organized but we still need to have an index on the column. Enter the world of Non-Clustered Index. Where actually the data is not organized or ordered in any pre-existing condition, but still satisfy the purposes. The major difference is leaf level does not contain data. It includes pointers to the relevant data and the pointer is a simple RID column. RID columns include the information on extent, page and row offset.  Using this RID column it will go and fetch the data from different pages. Another issue with with non-clustered index is it behaves differently for heap and clustered table. Here again we will see two scenarios. One Non-clustered index created on a heap and then created on a clustered table.

Scenario 3: Non-Clustered Index on Heap Table

Let us create a table called PRODUCTS as follows:

CREATE TABLE PRODUCTS (PRODUCTNUMBER NUMERIC(6), PRODUCTNAME VARCHAR(30), PRODUCTDESCRIPTION VARCHAR(50), PRODUCTTYPE VARCHAR(1))

Now let us create a non-clustered index as follows:

CREATE NONCLUSTERED INDEX PRODIDX ON PRODUCTS(PRODUCTNUMBER, PRODUCTNAME)

The following script will insert the data into this table.

BEGIN
DECLARE @COUNTER AS INT
SET @COUNTER=1
WHILE (@COUNTER <= 50000)
BEGIN
INSERT INTO PRODUCTS VALUES (CAST(@COUNTER AS VARCHAR), ‘PRODUCTS ‘ + CAST(@COUNTER AS VARCHAR), ‘DESCRIPTION ‘ + CAST(@COUNTER AS VARCHAR), (case (@COUNTER % 2) when 1 then ‘I’ when 0 then ‘R’ else NULL end))
SET @COUNTER=@COUNTER+1
END
END

Now let us use a simple select clause and observe the execution plan.

SELECT * FROM PRODUCTS WHERE PRODUCTNUMBER=2089
If you see this execution plan carefully, the following can be observed:
a)      Since we do not have clustered index in this table, it has used RID lookup to retrieve other two columns PRODUCTDESCRIPTION and PRODUCTTYPE
b)      It has used Index Seek to retrieve indexed columns PRODUCTNUMBER and PRODUCTNAME
c)       Internally what happens is the optimizer uses NESTED LOOPS to join the data from the results of the two operations viz Index Seek and RID Lookup.
Is this kind of execution plan is bad? Depends on the business needs and execution time. The above query took couple of seconds to execute. Let us now select only the indexed columns and see what is the execution plan for the same query.
SELECT PRODUCTNUMBER, PRODUCTNAME FROM PRODUCTS WHERE PRODUCTNUMBER=2089

Now the execution plan is changed, no RID lookup because we have selected only the indexed columns. What if we wanted to avoid RID lookup and at the same time most of the columns in the table to be selected? Use Include option along with the definition of the index creation as follows:

DROP INDEX PRODUCTS.PRODIDX
CREATE NONCLUSTERED INDEX PRODIDX ON PRODUCTS(PRODUCTNUMBER, PRODUCTNAME) INCLUDE PRODUCTDESCRIPTION, PRODUCTTYPE
The following select statement provides an execution plan
SELECT PRODUCTNUMBER, PRODUCTNAME FROM PRODUCTS WHERE PRODUCTNUMBER=2089

Please note that we did not get RID lookup, which is because of the columns included in the index creation. However, please be careful with INCLUDE column option. Please do benchmark before using it as it occupies more space in LEAF level and there on reducing the number of rows and increasing the number of IOs.

Scenario 4: Non-Clustered Index on Clustered Table

Now, in the same PRODUCTS table I have dropped the indexes and created two indexes as follows:

DROP INDEX PRODUCTS.PRODIDX
CREATE CLUSTERED INDEX PRODCLUSTIDX ON PRODUCTS(PRODUCTNUMBER)
CREATE NONCLUSTERED INDEX PRODIDX ON PRODUCTS(PRODUCTNAME)
Again we have the same number of records viz 50,000 records in the table. The following select statement produces the below execution plan: Select * from products where productname=’PRODUCTS 9201’
If you read the execution plan, the following can be observed:
a)      Since we have clustered index in this table, it has used Key Lookup to retrieve other two columns PRODUCTDESCRIPTION and PRODUCTTYPE
b)      It has used Index Seek to retrieve indexed columns PRODUCTNUMBER and PRODUCTNAME
c)       Internally what happens is the optimizer uses NESTED LOOPS to join the data from the results of the two operations viz Index Seek and Key Lookup.
Further please note that in case if the KEY LOOKUP is to be avoided, use INCLUDE option while creating the index. Non-Clustered index can be used as part of heap table or as part of clustered table. There is no hard and fast rule that it has to be used only with heap or clustered table. Before using the same we need to benchmark and test it completely.
Further Reading Please don’t forget to visit this site to read my next articles on 

  • Joins and Sets
  • Hierarchical Queries

Wow, that is a great increase in fragmentation.

Now the final is what more important to observe both Page Splits and Fragmentation. We are now going to insert the missing records between 150,000 and 300,000 and observe the page splits. I will also explain diagramatically how this would happen. Now let us execute the same script by changing the values.

BEGIN

DECLARE @COUNTER AS INT

SET @COUNTER=150001

WHILE (@COUNTER <= 299999)

BEGIN

INSERT INTO PURCHASEMASTER VALUES

(CAST(@COUNTER AS VARCHAR) + ‘\’ + cast(YEAR(GETDATE()) as varchar) + ‘-‘ + cast(YEAR(GETDATE())+1 as varchar),

‘PURCHASE ‘ + CAST(@COUNTER AS VARCHAR),

‘VENDOR ‘ + CAST(@COUNTER AS VARCHAR),

GETDATE())

SET @COUNTER=@COUNTER+1

END

END

The third batch took just 57 seconds to complete. Page Splits per second was observed to be 17,520 which caused 2,573 page splits per second during this insert operation. Also the fragment details are provided below:

Now having said this on page splits and fragmentation, how do we know if this is good or bad? Let us compare the scenario 1 and 2 in the following table and observe the results.

Description

Execution Time (mm:ss)

Page Splits

Fragmentation (AVG_FRAGMENTATION_ IN_PERCENT

FOR INDEX_LEVEL=0)

Scenario 1

1-150,000

01:00

1,042

0.68

300,000-500,000

01:39

1,426

0.57

150,001-299,999

01:00

1,069

0.62

Scenario 2

1-150,000

01:03

2,198

35.21

300,000-500,000

01:18

3,424

41.87

150,001-299,999

00:57

2,573

43.36

This shows in Scenario 2, as the cluster key is not well defined, the inserts to fill the missing gap (150,001 to 299,999) is observed to be high. Also, the fragmentation and Page Splits are quite high since the cluster key is not well defined. Whenever we need to define a clustered index on a key, please make sure the following:

a) Key should be well defined, satisfying the business requirements and as well as the providing a greater selectivity.

b) Key should not be defined on derived table and calculated column

c) Other columns in the table should be completely dependent on the cluster key

Non-Clustered Index

What we saw till now is a very ideal situtation where we have a well defined key and can go ahead to create a clustered index. But what if the data is not so organized but we still need to have an index on the column. Enter the world of Non-Clustered Index. Where actually the data is not organized or ordered in any pre-existing condition, but still satisfy the purposes. The major difference is leaf level does not contain data. It includes pointers to the relevant data and the pointer is a simple RID column. RID columns include the information on extent, page and row offset. Using this RID column it will go and fetch the data from different pages.

Another issue with with non-clustered index is it behaves differently for heap and clustered table. Here again we will see two scenarios. One Non-clustered index created on a heap and then created on a clustered table.

Scenario 3: Non-Clustered Index on Heap Table

Let us create a table called PRODUCTS as follows:

CREATE TABLE PRODUCTS (PRODUCTNUMBER NUMERIC(6), PRODUCTNAME VARCHAR(30), PRODUCTDESCRIPTION VARCHAR(50), PRODUCTTYPE VARCHAR(1))

Now let us create a non-clustered index as follows:

CREATE NONCLUSTERED INDEX PRODIDX ON PRODUCTS(PRODUCTNUMBER, PRODUCTNAME)

The following script will insert the data into this table.

BEGIN

DECLARE @COUNTER AS INT

SET @COUNTER=1

WHILE (@COUNTER <= 50000)

BEGIN

INSERT INTO PRODUCTS VALUES

(CAST(@COUNTER AS VARCHAR),

‘PRODUCTS ‘ + CAST(@COUNTER AS VARCHAR),

‘DESCRIPTION ‘ + CAST(@COUNTER AS VARCHAR),

(case (@COUNTER % 2) when 1 then ‘I’ when 0 then ‘R’ else NULL end))

SET @COUNTER=@COUNTER+1

END

END

Now let us use a simple select clause and observe the execution plan.

SELECT * FROM PRODUCTS WHERE PRODUCTNUMBER=2089

If you see this execution plan carefully, the following can be observed:

a) Since we do not have clustered index in this table, it has used RID lookup to retrieve other two columns PRODUCTDESCRIPTION and PRODUCTTYPE

b) It has used Index Seek to retrieve indexed columns PRODUCTNUMBER and PRODUCTNAME

c) Internally what happens is the optimizer uses NESTED LOOPS to join the data from the results of the two operations viz Index Seek and RID Lookup.

Is this kind of execution plan is bad? Depends on the business needs and execution time. The above query took couple of seconds to execute. Let us now select only the indexed columns and see what is the execution plan for the same query.

SELECT PRODUCTNUMBER, PRODUCTNAME FROM PRODUCTS WHERE PRODUCTNUMBER=2089

Now the execution plan is changed, no RID lookup because we have selected only the indexed columns. What if we wanted to avoid RID lookup and at the same time most of the columns in the table to be selected? Use Include option along with the definition of the index creation as follows:

DROP INDEX PRODUCTS.PRODIDX

CREATE NONCLUSTERED INDEX PRODIDX ON PRODUCTS(PRODUCTNUMBER, PRODUCTNAME) INCLUDE PRODUCTDESCRIPTION, PRODUCTTYPE

The following select statement provides an execution plan

SELECT PRODUCTNUMBER, PRODUCTNAME FROM PRODUCTS WHERE PRODUCTNUMBER=2089

Please note that we did not get RID lookup, which is because of the columns included in the index creation. However, please be careful with INCLUDE column option. Please do benchmark before using it as it occupies more space in LEAF level and there on reducing the number of rows and increasing the number of IOs.

Scenario 4: Non-Clustered Index on Clustered Table

Now, in the same PRODUCTS table I have dropped the indexes and created two indexes as follows:

DROP INDEX PRODUCTS.PRODIDX

CREATE CLUSTERED INDEX PRODCLUSTIDX ON PRODUCTS(PRODUCTNUMBER)

CREATE NONCLUSTERED INDEX PRODIDX ON PRODUCTS(PRODUCTNAME)

Again we have the same number of records viz 50,000 records in the table. The following select statement produces the below execution plan:

Select * from products where productname=’PRODUCTS 9201’

If you read the execution plan, the following can be observed:

a) Since we have clustered index in this table, it has used Key Lookup to retrieve other two columns PRODUCTDESCRIPTION and PRODUCTTYPE

b) It has used Index Seek to retrieve indexed columns PRODUCTNUMBER and PRODUCTNAME

c) Internally what happens is the optimizer uses NESTED LOOPS to join the data from the results of the two operations viz Index Seek and Key Lookup.

Further please note that in case if the KEY LOOKUP is to be avoided, use INCLUDE option while creating the index.

Non-Clustered index can be used as part of heap table or as part of clustered table. There is no hard and fast rule that it has to be used only with heap or clustered table. Before using the same we need to benchmark and test it completely.

Further Reading

Please don’t forget to visit this site to read my next articles on

Ø Joins and Sets

Ø Hierarchical Queries

Tagged with:
 
Set your Twitter account name in your settings to use the TwitterBar Section.
%d bloggers like this:
Bitnami