I have created a sample table with the clustered index as below and inserted 1500 records.
CREATE CLUSTERED INDEX IX_mytable_myid ON dbo.MyTable(myid)
When I execute the below query, I could see the execution plan having Clustered Index Scan instead of seek. I am not sure why the index table is scanned.
SELECT myid FROM dbo.MyTable WHERE myid=1666
Apologies. From the warning symbol in the execution plan I found that the myid column is actually a varchar, so an implicit conversion happens, which forces a scan instead of a seek.
Upon querying like this
SELECT myid FROM dbo.MyTable WHERE myid='1666'
it does the seek.
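A minimal repro of the behaviour described above might look like the following. The table name dbo.MyTableDemo, the varchar(10) length, and the row generator are assumptions for illustration only; the original table definition is not shown in the question.

```sql
-- Hypothetical demo table: myid is stored as varchar, as in the question.
CREATE TABLE dbo.MyTableDemo (myid varchar(10) NOT NULL);
CREATE CLUSTERED INDEX IX_MyTableDemo_myid ON dbo.MyTableDemo (myid);

-- Load 1500 numeric strings ('1' .. '1500'); the row source is an assumption.
INSERT INTO dbo.MyTableDemo (myid)
SELECT TOP (1500)
       CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS varchar(10))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- int literal: int has higher data type precedence than varchar, so the
-- column side is converted (CONVERT_IMPLICIT) and the plan is expected
-- to show a Clustered Index Scan with a warning.
SELECT myid FROM dbo.MyTableDemo WHERE myid = 1666;

-- varchar literal: the types match, so a Clustered Index Seek is expected.
SELECT myid FROM dbo.MyTableDemo WHERE myid = '1666';
```

Comparing the actual execution plans of the two SELECT statements should show the difference. Note that the first form would also fail with a conversion error if any myid value were not numeric.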
If a table has 'id' (the primary key) column as its clustered index, is there any benefit in adding 'id' column as an included column in any other non-clustered index in Microsoft SQLServer?
e.g. table 'xyz':
id | name | status | date
1  | abc  | active | 2021-06-23
CREATE NONCLUSTERED INDEX [NonClusteredIndex_status_Date]
ON [xyz]
(
[status] ASC,
[date] ASC
)
INCLUDE
( [id],
[name]
)
This non-clustered index is targeted at a query similar to the one below on a large data set. In the actual case there could be some other queries as well.
select * from xyz where status='active' and date > '2021-06-20'
The answer to your question is essentially No, there is no benefit.
When you create a non-clustered index on a table, each row in the index needs to be able to point to the row in the base table.
If the base table is a heap, each row in the index will contain a pointer to the rid (row identifier) which is what SQL Server uses to uniquely identify each row.
When the table is defined with a clustered index, every non-clustered index will automatically contain the clustered index column(s) as keys to the row in the base table.
You can see this in an execution plan, where a non-clustered index is used and SQL Server has to retrieve additional columns from the base table; if the table is a heap, it will be an RID lookup, if the table has a clustered index it will be a Key lookup.
Additionally, if the clustered index is not unique, SQL Server adds its own uniquifier value to ensure uniqueness, and this is also included in non-clustered indexes.
So when it comes to non-clustered indexes, it does not matter if you specify the clustered index column(s) - you can, and there is no harm in doing so, but it/they are always included.
This answer assumes you are planning to run the following query:
SELECT * FROM xyz WHERE status = 'active' AND date > '2021-06-20';
If you only created a non-clustered index on (status, date), then it would cover the WHERE clause, but not the SELECT clause. What this means is that SQL Server might choose to use the index to find the matching records in the query. But when it gets to evaluating the SELECT clause, it would be forced to seek back to the clustered index to find the values for the columns not contained in the index (the name column in this case; the id clustered index column is already there). There is a performance penalty in doing this, and SQL Server might, depending on your data, even choose not to use the index because it does not completely cover the query.
To mitigate this, you can define the index in your question, where you include the name value in the leaf nodes. Note that id will be included by default in the index, so we do not need to explicitly INCLUDE it. By following this approach, the index itself is said to completely cover the query, meaning that SQL Server may use the index for the entire query plan. This can lead to fast performance in many cases.
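As a rough sketch of that approach, the index can be written without id in the INCLUDE list. The table name dbo.xyz_demo, the column types, and the index name are assumptions, since the question does not give the real definitions.

```sql
-- Hypothetical table mirroring the question; types are assumptions.
CREATE TABLE dbo.xyz_demo (
    id     int         NOT NULL PRIMARY KEY CLUSTERED,
    name   varchar(50) NOT NULL,
    status varchar(20) NOT NULL,
    [date] date        NOT NULL
);

-- Only name is listed under INCLUDE; id is the clustered key, so it is
-- carried in the non-clustered index without being named.
CREATE NONCLUSTERED INDEX IX_xyz_demo_status_date
ON dbo.xyz_demo (status ASC, [date] ASC)
INCLUDE (name);

-- Every column the query references is available in the index, so on a
-- populated table the plan can be an Index Seek with no Key Lookup.
SELECT id, name, status, [date]
FROM dbo.xyz_demo
WHERE status = 'active' AND [date] > '2021-06-20';
```

Whether the optimizer actually picks this index still depends on the data volume and the selectivity of the predicates.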
Please explain the differences below between a non-clustered and a clustered index.
First I am running the below two select statements.
select *
from [dbo].[index_test2]
where id = 1 -- Nonclustered index on id column
select *
from [dbo].[index_test1]
where id = 1 -- Clustered index on id column
Execution plan shows "Table scan" for the first query and "Clustered index seek (clustered)" for the second query.
Then I am running below two statements.
select id
from [dbo].[index_test2]
where id = 1 -- Nonclustered index on id column
select id
from [dbo].[index_test1]
where id = 1 -- Clustered index on id column
Execution plan shows "Index seek (NonClustered)" for the first query and "Clustered index seek (Clustered)" for the second query.
As the two cases above show, with the clustered index the plan is a "Clustered index seek" both times, but with the non-clustered index it shows "Table scan" when executed with * and "Index seek (NonClustered)" when only the indexed column (id) is selected.
Can anyone clarify why the non-clustered index behaves differently in the two cases?
A clustered index defines the order in which data is physically stored in a table, but a non-clustered index doesn't sort the physical data inside the table. In fact, a non-clustered index is stored in one place and the table data is stored in another.
When the query selects only columns that the non-clustered index contains, SQL Server can use an Index Seek (NonClustered). But when the WHERE clause filters on the indexed column and the SELECT list asks for more columns than the index covers, the plan can change to a Table Scan.
Indexes with included columns provide the greatest benefit when they cover the query, meaning the index contains every column the query references. Included columns can also have data types, counts or sizes that are not allowed for index key columns.
With a clustered index, the index holds the table data itself in key order, so both queries use a Clustered Index Seek (Clustered).
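The difference can be sketched on a small heap. The table dbo.index_test_heap, its payload column, and the index name are hypothetical; the definition of index_test2 is not shown in the question.

```sql
-- Hypothetical heap (no clustered index) with a non-clustered index on id.
CREATE TABLE dbo.index_test_heap (
    id      int          NOT NULL,
    payload varchar(100) NOT NULL
);
CREATE NONCLUSTERED INDEX IX_index_test_heap_id
ON dbo.index_test_heap (id);

-- Covered: only the index key is requested, so an Index Seek
-- (NonClustered) can answer the query on its own.
SELECT id FROM dbo.index_test_heap WHERE id = 1;

-- Not covered: payload lives only in the heap, so the optimizer must
-- choose between seek + RID Lookup and a Table Scan.
SELECT * FROM dbo.index_test_heap WHERE id = 1;

-- A table hint forces the index, which makes the Index Seek plus
-- RID Lookup pair visible in the plan for comparison.
SELECT *
FROM dbo.index_test_heap WITH (INDEX (IX_index_test_heap_id))
WHERE id = 1;
```

On a small table, or when many rows match, the scan is often estimated to be cheaper than one lookup per matching row, which would explain the Table Scan in the first case.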
I have following table schema -
CREATE TABLE [dbo].[TEST_TABLE]
(
[TEST_TABLE_ID] [int] IDENTITY(1,1) NOT NULL,
[NAME] [varchar](40) NULL,
CONSTRAINT [PK_TEST_TABLE] PRIMARY KEY CLUSTERED
(
[TEST_TABLE_ID] ASC
)
)
I have inserted a large amount of data into TEST_TABLE.
As I have marked TEST_TABLE_ID column as primary key, clustered index will be created on TEST_TABLE_ID.
When I am running following query, execution plan is showing Clustered Index Scan which is expected.
SELECT * FROM TEST_TABLE WHERE TEST_TABLE_ID = 34
But, when I am running following query I was expecting Table Scan as NAME column does not have any index:
SELECT * FROM TEST_TABLE WHERE NAME LIKE 'a%'
But in execution plan it is showing Clustered Index Scan.
As NAME column does not have any index why it is accessing the clustered index?
I believe, this is happening as clustered index resides on data pages.
Can anyone tell me if my assumption is correct? Or is there any other reason?
A clustered index is the index that stores all the table data. So a table scan is the same as a clustered index scan.
In a table without a clustered index (a "heap"), a table scan requires crawling through all data pages. That is what the query optimizer calls a "table scan".
As others explained already, for a table that has a clustered index, a Clustered Index Scan means a Table Scan.
In other words, the table is the clustered index.
What you have wrong is your first query execution plan:
SELECT *
FROM TEST_TABLE
WHERE TEST_TABLE_ID = 34 ;
It does a Clustered Index Seek and not a Scan. It doesn't have to search (scan) the whole table (clustered index), it goes directly to the point (seeks) and checks if a row with id=34 exists.
You can see a simple test in SQL-Fiddle, and how the two execution plans differ.
The table is stored as a clustered index. The only way to scan the table is to scan the clustered index. Only tables with no clustered index can have a "table scan" per se.
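One way to see the seek versus scan difference in numbers, not only in the plan operator, is to compare logical reads. This is a sketch against the TEST_TABLE from the question, assuming it lives in the dbo schema and is already populated.

```sql
-- Report page reads per statement in the Messages output.
SET STATISTICS IO ON;

-- Seek on the clustered key: expected to touch only a few pages.
SELECT * FROM dbo.TEST_TABLE WHERE TEST_TABLE_ID = 34;

-- No index on NAME: the clustered index (the table itself) is scanned,
-- so logical reads should grow with the size of the table.
SELECT * FROM dbo.TEST_TABLE WHERE NAME LIKE 'a%';

SET STATISTICS IO OFF;
```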
It is because this table has a clustered index, so it scans the entire clustered index to return all the rows that match the WHERE clause. However, you should be seeing a missing index message.
When you build a Clustered Index on a table, then SQL Server logically orders the rows of that table based on the Clustered Index Key, which in your case is Test_Table_ID.
However, when you see the Clustered Index Scan operator, this COULD be a little misleading. If certain conditions are met (which equate to SQL Server not caring about the order of the data), then SQL Server is still able to perform an unordered allocation scan, which is more similar to a table scan than a clustered index scan, as it actually reads the leaf level of the CI (the table's data pages) in allocation order, based on the IAM chain, as opposed to following the pointers in the index. This can potentially give you a performance improvement, as fragmentation (pages being out of physical order) does not decrease performance.
To see if this is happening, look at the Ordered property in the execution plan. If this is set to False, then you have an unordered allocation scan.
I am fairly new to indexes. I have the following table [FORUM1]
[msg_id] [int] IDENTITY(1,1) NOT NULL,
[cat_id] [int] NULL,
[msg_title] [nvarchar](255) NULL
And have created a non clustered index
CREATE NONCLUSTERED INDEX catindex ON forum1(cat_id)
Now when I run this simple query, I can see the index is not being used
SELECT msg_title FROM forum1 where cat_id=4
The index only gets used if I create a CI and include the msg_title field. But the issue is that I have to run many more similar queries on the actual table, like date=something, userid=20, status=1. So including columns in every index doesn't look good to me.
The msg_title is not contained in the index -> any value found in the non-clustered index will need a key lookup into the actual data pages, which is an expensive operation - so therefore, most likely, a table scan is quicker. Plus: the "table scan" indicates you have a heap - a table without a clustered index - which is a bad thing (most of the time) to begin with. Why don't you have a clustered index?
You can fix this by e.g. including the msg_title in your index:
CREATE NONCLUSTERED INDEX catindex
ON forum1(cat_id) INCLUDE(msg_title)
and now, I'm pretty sure, SQL Server will use that index (since it can find all the data needed for the query in the index structure - the index is said to be a covering index). The benefit here is: the extra column is only included in the leaf level of the index, so it makes the index only minimally bigger. Yet, it can lead to the index being used just all that more often. Well worth it!
I'm executing the following statement:
UPDATE TOP(1) dbo.userAccountInfo
SET Flags = Flags | @AddValue
WHERE ID = @ID;
The column 'ID' is an INT PRIMARY KEY with IDENTITY constraints.
Flags is a BIGINT NOT NULL.
The execution path indicates that a Clustered Index Update is occurring. A very expensive operation.
There are no indexes covering Flags or ID, except for the primary key.
I feel like the actual execution path should be:
Clustered Index Seek => Update
Tables come in two flavors: clustered indexes and heaps. You have a PRIMARY KEY constraint so you have created implicitly a clustered index. You'd have to go to extra length during the table create for this not to happen. Any update of the 'table' is an update of the clustered index, since the clustered index is the table.
As for the clustered index update being a 'very expensive operation', that is an urban legend rooted in basic misinformation about how a database works. The correct statement is 'a clustered index update that affects the clustered key has to update all the non-clustered indexes'.
The clustered index is the physical table, so whenever you update any row, you're updating the clustered index.
See this MSDN article