I am fairly new to indexes. I have the following table [FORUM1]:
[msg_id] [int] IDENTITY(1,1) NOT NULL,
[cat_id] [int] NULL,
[msg_title] [nvarchar](255) NULL
And I have created a nonclustered index:
CREATE NONCLUSTERED INDEX catindex ON forum1(cat_id)
Now when I run this simple query, I can see that the index is not being used:
SELECT msg_title FROM forum1 where cat_id=4
The index only gets used if I create a CI and include the MSG_TITLE field. But the issue is that I have to run many more similar queries on the actual table, such as date=something, userid=20, status=1. So including columns in every index doesn't look good to me.
The msg_title column is not contained in the index, so every row found through the nonclustered index needs an extra lookup into the actual data pages (a key lookup, or a RID lookup on a heap) to fetch it. That is an expensive operation, so a table scan is most likely cheaper. Plus: the "table scan" indicates you have a heap, a table without a clustered index, which is a bad thing (most of the time) to begin with. Why don't you have a clustered index?
You can fix this by e.g. including the msg_title in your index:
CREATE NONCLUSTERED INDEX catindex
ON forum1(cat_id) INCLUDE(msg_title)
and now, I'm pretty sure, SQL Server will use that index, since it can find all the data needed for the query in the index structure (the index is said to be a covering index). The benefit here is that the extra column is stored only at the leaf level of the index, so the upper levels of the index stay as small as before. Yet, it can lead to the index being used all the more often. Well worth it!
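A minimal sketch of both suggestions from this answer, assuming forum1 is still a heap and msg_id holds unique values. The constraint name PK_forum1 is made up for illustration. Because catindex already exists from the question, the sketch rebuilds it in place with DROP_EXISTING rather than creating it a second time.

```sql
-- Assumption: forum1 has no clustered index yet and msg_id is unique.
-- PK_forum1 is a hypothetical constraint name.
ALTER TABLE forum1
    ADD CONSTRAINT PK_forum1 PRIMARY KEY CLUSTERED (msg_id);

-- catindex already exists, so replace its definition in place
-- and add msg_title as an included (leaf-level only) column.
CREATE NONCLUSTERED INDEX catindex
    ON forum1 (cat_id) INCLUDE (msg_title)
    WITH (DROP_EXISTING = ON);

-- Report page reads for the query so the before/after cost can be compared.
SET STATISTICS IO ON;
SELECT msg_title FROM forum1 WHERE cat_id = 4;
SET STATISTICS IO OFF;
```

Running the SELECT with the actual execution plan enabled, before and after the index change, should show whether the optimizer switches away from the scan on your data.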
Please explain the differences below between a nonclustered and a clustered index.
First I am running the below two select statements.
select *
from [dbo].[index_test2]
where id = 1 -- Nonclustered index on id column
select *
from [dbo].[index_test1]
where id = 1 -- Clustered index on id column
Execution plan shows "Table scan" for the first query and "Clustered index seek (clustered)" for the second query.
Then I am running below two statements.
select id
from [dbo].[index_test2]
where id = 1 -- Nonclustered index on id column
select id
from [dbo].[index_test1]
where id = 1 -- Clustered index on id column
Execution plan shows "Index seek (NonClustered)" for the first query and "Clustered index seek (Clustered)" for the second query.
As the two cases above show, the clustered index always produces a seek, but the nonclustered index produces a "Table scan" when the query selects * and an "Index seek (NonClustered)" when the query selects only the indexed column id.
Can anyone clarify why the nonclustered index behaves differently in the two cases?
A clustered index defines the order in which data is stored in a table, whereas a non-clustered index does not sort the data inside the table. In fact, a non-clustered index is stored in one place and the table data is stored in another.
With a non-clustered index you get an Index Seek (NonClustered) when the query references only columns that the index contains. If the WHERE clause uses the indexed column but the SELECT list asks for columns the index does not cover, the plan can switch to a Table Scan, because fetching the missing columns row by row may cost more than reading the whole table.
Indexes with included columns provide the greatest benefit when they cover the query, meaning the index contains every column the query references. Included columns may also use data types, counts, or sizes that are not allowed for index key columns.
With a clustered index, the leaf level of the index is the table data itself, so every column is available there and both queries are answered by a Clustered Index Seek (Clustered).
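The comparison can be reproduced with a sketch like the following. The question does not show the table definitions, so the payload column and the index names are assumptions. Which operator the optimizer picks for the first query depends on row counts and statistics, so populate the tables before drawing conclusions.

```sql
-- Hypothetical definitions; the question does not show the real ones.
CREATE TABLE dbo.index_test1 (id INT NOT NULL, payload VARCHAR(100) NULL);
CREATE TABLE dbo.index_test2 (id INT NOT NULL, payload VARCHAR(100) NULL);

CREATE CLUSTERED INDEX CIX_index_test1_id ON dbo.index_test1 (id);
CREATE NONCLUSTERED INDEX IX_index_test2_id ON dbo.index_test2 (id);
GO

-- SET SHOWPLAN_TEXT must be the only statement in its batch.
-- While it is ON, statements are not executed; their estimated plans are returned.
SET SHOWPLAN_TEXT ON;
GO
-- Needs payload, which the nonclustered index does not contain.
SELECT * FROM dbo.index_test2 WHERE id = 1;
-- Needs only id, so the nonclustered index covers the query.
SELECT id FROM dbo.index_test2 WHERE id = 1;
-- The clustered index leaf level holds every column.
SELECT * FROM dbo.index_test1 WHERE id = 1;
GO
SET SHOWPLAN_TEXT OFF;
GO
```

GO is the batch separator used by SSMS and sqlcmd, not a T-SQL statement.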
Let's say a query is filtering on two fields and returning primary key values.
SELECT RowIdentifier
FROM Table
WHERE QualifierA = 'exampleA' AND QualifierB = 'exampleB'
Assuming the clustered index is not the primary key, would a non-unique index on QualifierA and QualifierB be best served by adding RowIdentifier as a key column (Scenario A and Scenario B)? Or would it be more appropriate to simply include it (Scenario C)?
Scenario A: Non-Unique, Non-Clustered
CREATE NONCLUSTERED INDEX IX_Table_QualifierA
ON [dbo].[Table] ([QualifierA],[QualifierB],[RowIdentifier])
Scenario B: Unique, Non-Clustered
CREATE UNIQUE NONCLUSTERED INDEX IX_Table_QualifierA
ON [dbo].[Table] ([QualifierA],[QualifierB],[RowIdentifier])
Scenario C:
CREATE NONCLUSTERED INDEX IX_Table_QualifierA
ON [dbo].[Table] ([QualifierA],[QualifierB])
INCLUDE ([RowIdentifier])
Finally, I'm assuming that if the primary key were the clustered index, none of these would be necessary. Is this accurate?
If there is a CLUSTERED index, its key is automatically carried in every nonclustered index on the table. You can add it explicitly, but it is not required.
The UNIQUE index simply enforces uniqueness. The PK should already have this constraint. You do not need to re-enforce it in every index.
If you are including the PK in your WHERE clause, it will almost certainly use the PK index to find that row because it is guaranteed to return at most one row, so adding it as a key column in your index gains you nothing for lookups. It could also potentially skew cardinality estimation and make SQL Server think the index is more selective than it really is.
For the above reasons, I would select Scenario C:
CREATE NONCLUSTERED INDEX IX_Table_QualifierA
ON [dbo].[Table] ([QualifierA],[QualifierB])
INCLUDE ([RowIdentifier])
I would use this regardless of which column is clustered. This will give you the performance, ensure the index continues to perform regardless of the CLUSTERED INDEX, and make it explicit what the index is used for.
I'm wondering what's more appropriate? A non-clustered unique index incorporating all three fields, or a non-clustered non-unique index incorporating just the two fields(QualifierA & QualifierB) but including the PrimaryKey.
There's a third option. A non-clustered, non-unique index incorporating all three fields.
When you make an index, the indexed columns are copied into a separate structure so the server can search them with ease. If you only have QualifierA and QualifierB in the index, it will find the rows in that index that meet your criteria and then go back to the main table to pick up the RowIdentifier (unless RowIdentifier happens to be the clustering key, which every nonclustered index carries anyway). Instead, include all three in there to improve performance.
Remember, make sure you put QualifierA and QualifierB before RowIdentifier in your index. The order of the columns determines how the data is ordered.
Try it out with some test data if you like, and look at the query plan to see what it's doing.
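One way to run such a test is sketched below. dbo.QualifierDemo is a hypothetical stand-in for the table in the question, clustered on an assumed CreatedAt column so that the primary key is not the clustering key. The test data is arbitrary and only meant to give the optimizer something to work with.

```sql
-- Hypothetical table; the primary key is deliberately NOT the clustered index.
CREATE TABLE dbo.QualifierDemo
(
    RowIdentifier INT IDENTITY(1,1) NOT NULL,
    QualifierA    VARCHAR(20) NOT NULL,
    QualifierB    VARCHAR(20) NOT NULL,
    CreatedAt     DATETIME NOT NULL DEFAULT (GETDATE()),
    CONSTRAINT PK_QualifierDemo PRIMARY KEY NONCLUSTERED (RowIdentifier)
);
CREATE CLUSTERED INDEX CIX_QualifierDemo_CreatedAt
    ON dbo.QualifierDemo (CreatedAt);

-- Scenario C: two key columns, RowIdentifier carried at the leaf level only.
CREATE NONCLUSTERED INDEX IX_QualifierDemo_QualifierA
    ON dbo.QualifierDemo (QualifierA, QualifierB)
    INCLUDE (RowIdentifier);

-- Arbitrary test data: QualifierA cycles through exampleA..exampleE,
-- QualifierB through exampleA..exampleC.
INSERT dbo.QualifierDemo (QualifierA, QualifierB)
SELECT 'example' + CHAR(65 + ABS(object_id) % 5),
       'example' + CHAR(65 + ABS(object_id) % 3)
FROM sys.all_objects;

-- Compare logical reads (and the actual plan) across the scenarios.
SET STATISTICS IO ON;
SELECT RowIdentifier
FROM dbo.QualifierDemo
WHERE QualifierA = 'exampleA' AND QualifierB = 'exampleB';
SET STATISTICS IO OFF;
```

To compare scenarios, drop IX_QualifierDemo_QualifierA, recreate it with the Scenario A or B definition, and rerun the SELECT.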
I just wanted to know what happens internally in SQL Server when we create:
A nonclustered index over a clustered index.
A clustered index over a nonclustered index.
A nonclustered index over a nonclustered index.
Please comment.
For all of the execution plans I have used the same base table filled with meaningless data:
CREATE TABLE dbo.T (ID INT NOT NULL, Filler CHAR(200) NULL);
INSERT dbo.T (ID, Filler)
SELECT ROW_NUMBER() OVER(ORDER BY Object_ID),
CAST(NULLIF(ABS(Object_ID) % 10, 0) AS CHAR(200))
FROM sys.all_objects
1. What happens internally in SQL Server when we create a Non Clustered over a Clustered Index.
This is just a normal index creation process: it sorts the data in the manner specified and builds the upper levels of the index up from the leaf nodes. Each leaf row stores a "pointer" back to the clustered index (the clustering key) to allow for key lookups.
So once the clustered index is in place on dbo.T (ID), the execution plan for creating the non-clustered index shows the sort:
And hovering over the sort shows that it is ordering by Filler, then by the clustering key ID, which ensures the sort is deterministic:
2. What happens internally in SQL Server when we create a Clustered Index over Non Clustered Index.
I think to explain this properly I need to explain how a non-clustered index works on a table with no clustered index. A table with no clustered index is called a "heap". This just means that the data is stored in no specific order, usually simply in the order it was inserted. Essentially, SQL Server identifies each row by an internal row identifier (RID) and treats that much like a clustering key. However, without the constraints of an explicit clustered index it is free to move the data around if and when it sees fit (more reading on forwarding records). The non-clustered index then stores the RID at the leaf level so it has a way of performing lookups.
When you then create a clustered index on a heap, the table has to be rebuilt, ordered by the columns you have specified, and this means that all non-clustered indexes are rebuilt as well. To show this I first added the non-clustered index to dbo.T:
CREATE NONCLUSTERED INDEX IX_T_Filler ON dbo.T (Filler);
Unlike above you can see that a table scan is done as there is no clustered index to use, and the sort done when creating the index does not include the ID column as it did above:
Then afterwards add the clustered index:
CREATE CLUSTERED INDEX IX_T_ID ON dbo.T (ID);
You can see in the execution plan that the nonclustered index is also rebuilt so the leaf will point to the new clustered index rather than the row ID as it did previously. (Note the second query is the same as in the first part when the nonclustered index was built on the clustered index)
3. What happens internally in SQL Server when we create a Non Clustered Index Over Non Clustered Index.
Nonclustered indexes are completely independent of each other, so this is the same as 1 (or the first part of 2 if there is no clustered key). In other words, it does not matter how many nonclustered indexes already exist; the method of creating a new one remains the same.
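To follow the three cases without reading execution plans, a catalog query like this sketch can be run after each CREATE INDEX step on dbo.T. It only lists which structures exist; it does not show the rebuild itself.

```sql
-- List the storage structures that currently exist for dbo.T.
-- index_id 0 = heap, 1 = clustered index, greater than 1 = nonclustered index.
SELECT i.index_id, i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.T')
ORDER BY i.index_id;
```

On a heap the first row has type_desc HEAP and a NULL name. Once the clustered index is created, that row is replaced by one with index_id 1 and type_desc CLUSTERED.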
I have following table schema -
CREATE TABLE [dbo].[TEST_TABLE]
(
[TEST_TABLE_ID] [int] IDENTITY(1,1) NOT NULL,
[NAME] [varchar](40) NULL,
CONSTRAINT [PK_TEST_TABLE] PRIMARY KEY CLUSTERED
(
[TEST_TABLE_ID] ASC
)
)
I have inserted a large amount of data into TEST_TABLE.
As I have marked the TEST_TABLE_ID column as the primary key, a clustered index is created on TEST_TABLE_ID.
When I run the following query, the execution plan shows a Clustered Index Scan, which is expected.
SELECT * FROM TEST_TABLE WHERE TEST_TABLE_ID = 34
But when I run the following query, I was expecting a Table Scan, as the NAME column does not have any index:
SELECT * FROM TEST_TABLE WHERE NAME LIKE 'a%'
But the execution plan shows a Clustered Index Scan.
As the NAME column does not have any index, why is it accessing the clustered index?
I believe this is happening because the clustered index resides on the data pages.
Can anyone tell me if my assumption is correct? Or is there any other reason?
A clustered index is the index that stores all the table data. So a table scan is the same as a clustered index scan.
In a table without a clustered index (a "heap"), a table scan requires crawling through all data pages. That is what the query optimizer calls a "table scan".
As others explained already, for a table that has a clustered index, a Clustered Index Scan means a Table Scan.
In other words, the table is the clustered index.
What you have wrong is your first query execution plan:
SELECT *
FROM TEST_TABLE
WHERE TEST_TABLE_ID = 34 ;
It does a Clustered Index Seek and not a Scan. It doesn't have to search (scan) the whole table (clustered index), it goes directly to the point (seeks) and checks if a row with id=34 exists.
You can see a simple test in SQL-Fiddle, and how the two execution plans differ.
The table is stored as a clustered index. The only way to scan the table is to scan the clustered index. Only tables with no clustered index can have a "table scan" per se.
It is because this table has a clustered index, and it will scan the entire clustered index to return all the rows based on the WHERE clause. However, you should be seeing a missing index message.
When you build a Clustered Index on a table, then SQL Server logically orders the rows of that table based on the Clustered Index Key, which in your case is Test_Table_ID.
However, when you see the Clustered Index Scan operator, this COULD be a little misleading. If certain conditions are met (which equate to SQL Server not caring about the order of the data), then SQL Server is still able to perform an unordered allocation scan, which is more similar to a table scan than a clustered index scan, as it actually reads the leaf level of the CI (the table's data pages) in allocation order, based on the IAM chain, as opposed to following the pointers in the index. This can potentially give you a performance improvement, as fragmentation (pages being out of physical order) does not decrease performance.
To see if this is happening, look at the Ordered property in the execution plan. If this is set to False, then you have an unordered allocation scan.
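Outside the graphical plan viewer, the same property can be read from the plan XML. This is a small sketch using the table from the question. Note that the Ordered attribute only records whether the optimizer requested ordered output from the scan.

```sql
-- SET SHOWPLAN_XML must be the only statement in its batch.
-- While it is ON, the query is not executed; its estimated plan is returned as XML.
SET SHOWPLAN_XML ON;
GO
SELECT * FROM TEST_TABLE WHERE NAME LIKE 'a%';
GO
SET SHOWPLAN_XML OFF;
GO
```

In the returned XML, look for the IndexScan element of the Clustered Index Scan operator and check its Ordered attribute.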
I have a table (it stores forum data, so normally there are no edits or updates, just inserts) with a primary key column, which as we know is a clustered index.
Please tell me, will I get any advantage if I create a non-clustered index on that column (the primary key column)?
EDIT: My table currently has around 60000 records. Which is better: placing the non-clustered index on it directly, or creating an identical new table, creating the index there, and then copying the records from the old table to the new one?
Thanks
Every table should have a clustered index
Non-clustered indexes allow INCLUDEs which is very useful
Non-clustered indexes allow filtering in SQL Server 2008+
Notes:
A primary key is a constraint, which is enforced by a clustered index by default
One clustered index only, many non-clustered indexes
One advantage: you can INCLUDE other columns in the index.
A clustered index specifies the physical storage order of the table data (this is why there can only be one clustered index per table).
If there is no clustered index, inserts will typically be faster since the data doesn't have to be stored in a specific order but can just be appended at the end of the table.
On the other hand, index searches on the key column will typically be slower, since the searches cannot use the advantages of the clustered index.
The only possible advantage that I can see comes from the fact that the entries on the leaf pages of a nonclustered index are not as wide. They contain only the index columns, while the clustered index's leaf pages are the actual rows of data. Therefore, if you need something like select count(your_column_name) from your_table, then scanning the nonclustered index will involve considerably fewer pages. Or if the index has more than one column and you run a query that does not need data from non-indexed columns, then again the nonclustered index scan will be faster.
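The page-count argument can be checked with a sketch like the one below. dbo.ForumMessage and its columns are hypothetical, and the filler data is arbitrary. The index hints are there only to force each access path so the logical reads can be compared; they are not a recommendation for production queries.

```sql
-- Hypothetical forum table with a wide row and a clustered primary key.
CREATE TABLE dbo.ForumMessage
(
    msg_id INT IDENTITY(1,1) NOT NULL,
    body   NVARCHAR(2000) NULL,
    CONSTRAINT PK_ForumMessage PRIMARY KEY CLUSTERED (msg_id)
);

-- Narrow nonclustered index on the same column as the primary key.
CREATE NONCLUSTERED INDEX IX_ForumMessage_msg_id
    ON dbo.ForumMessage (msg_id);

-- Arbitrary filler rows so the two structures differ noticeably in size.
INSERT dbo.ForumMessage (body)
SELECT REPLICATE(N'x', 500)
FROM sys.all_objects;

SET STATISTICS IO ON;
-- INDEX(1) forces the clustered index: the scan reads the full data rows.
SELECT COUNT(msg_id) FROM dbo.ForumMessage WITH (INDEX(1));
-- Forces the narrow nonclustered index: the scan reads only msg_id entries.
SELECT COUNT(msg_id) FROM dbo.ForumMessage WITH (INDEX(IX_ForumMessage_msg_id));
SET STATISTICS IO OFF;
```

The "logical reads" figures in the Messages output give the number of pages each scan touched.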