I have a following table in SQL Server 2008 database:
CREATE TABLE [dbo].[Actions](
[ActionId] [int] IDENTITY(1,1) NOT NULL,
[ActionTypeId] [int] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[Description] [nvarchar](1000) NOT NULL,
[Comment] [nvarchar](500) NOT NULL,
[Created] [datetime] NOT NULL,
[Executed] [datetime] NULL,
[DisplayText] [nvarchar](1000) NULL,
[ExecutedBy] [int] NULL,
[Result] [int] NULL
)
CONSTRAINT [PK_Actions] PRIMARY KEY CLUSTERED
(
[CaseActionId] ASC
)
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Actions_Executed] ON [dbo].[Actions]
(
[Executed] ASC,
[ExecutedBy] ASC
)
There are 20 000 rows which has Executed date equal to '2500-01-01' and 420 000 rows which has Executed date < '2500-01-01'.
When I execute a query
select CaseActionId, Executed, ExecutedBy, DisplayText from CaseActions
where Executed='2500-01-01'
the query plans shows that the clustered index scan on PK_Actions is performed and the index IX_Actions_Executed is not used at all.
What funny I got missing index hint which says
/* The Query Processor estimates that implementing the following index could improve the query cost by 99.9901%.
*/
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Actions] ([Executed])
But the index is already there.
Why the index is not used if it would select 5% of the data ?
Most likely, the query optimizer just sees that you're selecting DisplayText as well - so for each of the 20'000 rows found in the NC index, there would need to be a key lookup into the clustered index to get that data - and key lookups are expensive operations! So in the end, it might just be easier and more efficient to scan the clustere index right away.
I bet if you run this query here:
select CaseActionId, Executed, ExecutedBy
from CaseActions
where Executed='2500-01-01'
then the NC index will be used
If you really need the DisplayText and that's a query you'll run frequently, maybe you should include that column in the index as an extra column in the leaf level:
DROP INDEX [IX_Actions_Executed]
CREATE NONCLUSTERED INDEX [IX_Actions_Executed]
ON [dbo].[Actions]([Executed] ASC, [ExecutedBy] ASC)
INCLUDE([DisplayText])
This would make your NC index a covering index, i.e. it could return all columns needed for your query. If you run your original query again with this covering index in place, I'm pretty sure SQL Server's query optimizer will indeed use it. The probability that any NC index will be used is significantly increased if that NC index is a covering index, e.g. some queries can get all their columns they need from just the NC index, without key lookups.
The missing index hints are a bit misleading at times - there are also known bugs leading to SQL Server Mgmt Studio to continously recommendation indices that are already in place..... don't bet too much of your money on those index hints!
Related
Could you please have a look at http://sqlfiddle.com/#!18/7ad28/8 and help me in understanding why adding a where condition will bring index on seek from scan? As per my (wrong) understanding, It should not have made any difference since its a greater then condition which should have caused scan.
I am also pasting table script and queries in question below
CREATE TABLE [dbo].[Mappings]
(
[MappingID] [smallint] NOT NULL IDENTITY(1, 1),
[ProductID] [smallint] NOT NULL,
[CategoryID] [smallint] NOT NULL
)
GO
ALTER TABLE [dbo].[Mappings] ADD CONSTRAINT [pk_Mappings_MappingID] PRIMARY KEY CLUSTERED ([MappingID]) WITH (DATA_COMPRESSION = PAGE) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [nc_Mappings_ProductIDCategoryID] ON [dbo].[Mappings] ([ProductID], [CategoryID]) WITH (DATA_COMPRESSION = PAGE) ON [PRIMARY]
GO
CREATE TABLE [dbo].[CustomerProducts]
(
[CustomerID] [bigint] NOT NULL,
[ProductID] [smallint] NOT NULL,
[SomeDate] [datetimeoffset] (0) NULL,
[SomeAttribute] [bigint] NULL
)
GO
ALTER TABLE [dbo].[CustomerProducts] ADD CONSTRAINT [pk_CustomerProducts_ProductIDCustomerID] PRIMARY KEY CLUSTERED ([ProductID], [CustomerID]) ON [PRIMARY]
GO
--SCAN [tempdb].[dbo].[Mappings].[nc_Mappings_ProductIDCategoryID].(NonClustered)
SELECT b.[SomeDate],
b.[SomeAttribute]
FROM dbo.[Mappings] a
INNER JOIN dbo.CustomerProducts b
ON a.[ProductID] = b.[ProductID]
Where b.CustomerID = 88;
--SEEK [tempdb].[dbo].[Mappings].[nc_Mappings_ProductIDCategoryID].(NonClustered)
SELECT b.[SomeDate],
b.[SomeAttribute]
FROM dbo.[Mappings] a
INNER JOIN dbo.CustomerProducts b
ON a.[ProductID] = b.[ProductID]
AND b.CustomerID = 88
Where a.[ProductID] > 0;
"It should not have made any difference since its a greater then condition which should have caused scan."
You added an explicit predicate (ProductID > 0) so SQL Server chooses to seek on that value (0) then range scan. To see this, select the Index Seek on Mappings, open the Properties Tab, and look for Seek Predicates, and expand the entire tree of results. You'll see Start and applicable range scan attributes underneath.
So if you had real data (pretend you have ProductIDs from 1-100), and have a WHERE ProductID > 77. You'll seek in the B-Tree to ProductID 77, then RANGE SCAN the remainder of the non-clustered index.
Watch this: this'll help you visualize and understand what happens internally in different index operations (disclaimer: this is me presenting)
https://youtu.be/fDd4lw6DfqU?t=748
Here's what the plans look like:
Hovered in yellow is the information on the clustered index seek from table CustomerProducts. The seek predicate is set to the value of the condition [ProductID] > 0 which is perfectly reasonable as part of the join condition is a.[ProductID] = b.[ProductID] and also a.[ProductID] > 0 in the where clause. This means that b.[ProductID] > 0. As ProductID is the first column on the clustered index, any information that reduces the lookup can be used. The seek operation should be faster than the scan, so the optimizer will try to do that.
I am creating an index on a table and I want to include a covering column: messageText nvarchar(1024)
After insertion, the messageText is never updated, so it's an ideal candidate to include in a covering index to speed up lookups.
But what happens if I update other columns in same index?
Will the entire row in the index need reallocating or will just that data from the updated column be updated in the index?
Simple Example
Imaging the following table:
CREATE TABLE [Messages](
[messageID] [int] IDENTITY(1,1) NOT NULL,
[mbrIDTo] [int] NOT NULL,
[isRead] [bit] NOT NULL,
[messageText] [nvarchar](1024) NOT NULL
)
And the following Index:
CREATE NONCLUSTERED INDEX [IX_messages] ON [Messages] ( [mbrIDTo] ASC, [messageID] ASC )
INCLUDE ( [isRead], [messageText])
When we update the table:
UPDATE Messages
SET isRead = 1
WHERE (mbrIDTo = 6546)
The query plan shows that the index IX_messages is utilized and will also be updated becuase the column isRead is part of the index.
Therefore does including large text fields (such as messageText in the above) as part of a covering column in an index, impact performance when other values, in that same index, are updated?
When a row is updated in SQL Server, the entire row is deleted and a new row with the updated records is inserted. Therefore, even if the messageText field is not changing, it will still have to be re-written to the disk.
Here is a blog post from Paul Randall with a good example: http://www.sqlskills.com/blogs/paul/do-changes-to-index-keys-really-do-in-place-updates/
With the following table and index:
CREATE TABLE [Ticket]
(
[Id] BIGINT IDENTITY NOT NULL,
[Title] CHARACTER VARYING(255) NOT NULL,
[Description] CHARACTER VARYING(MAX) NOT NULL,
[Severity] INTEGER NOT NULL,
[Priority] INTEGER NOT NULL,
[CreatedOn] DATETIMEOFFSET NOT NULL,
PRIMARY KEY([Id])
);
CREATE INDEX [Ticket_Priority_Severity_CreatedOn_IX] ON [Ticket]([Priority], [Severity], [CreatedOn]);
Will the following query:
SELECT [Id]
FROM [Ticket]
WHERE [Priority] = 1
ORDER BY [Severity] DESC, [CreatedOn] ASC
make use of the entire composite index or only utilize the [Priority] part of the index?
I know that for a query that had all of the columns in the WHERE clause, the whole index would be used. I am unsure about the above case though!
Given the actual execution plan below, on a table with no statistics, I am not sure how to interpret it.
It does look like it used the index, but which parts? There is clearly a sort cost, but is that sorting by [Severity] and then [CreatedOn] after doing a seek on [Priority]?
It may use the index, but it will only use the Priority part efficiently since you have the index sorted in a way that is not optimal for the query;
ORDER BY [Severity] DESC, [CreatedOn] ASC
vs
CREATE INDEX [Ticket_Priority_Severity_CreatedOn_IX] ON
[Ticket]([Priority], [Severity], [CreatedOn]);
As you can see in this fiddle if you click the execution plan, the query is split into an index seek and a sort.
Since Severity is sorted ascended, the index won't be (optimally) used for the sort. If you really want an optimal sort, index Severity descending as your query uses it;
CREATE INDEX [Ticket_Priority_Severity_CreatedOn_IX] ON
[Ticket]([Priority], [Severity] DESC, [CreatedOn]);
An SQLfiddle with the fixed index. Note that the whole query is now an index seek.
Note that the plan may look different for you depending on your data, but in general this is true, an index sorted the same way as the query accesses it will use the index better.
I ran a test of a sargable and non-sargable query in Enterprise Manager and was surprised by the Execution Plan. They both used a Clustered Index Scan. I was expecting the sargable query to use a Seek.
I used this table:
CREATE TABLE [dbo].[TestSargable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Created] [datetime] NOT NULL)
GO
ALTER TABLE [dbo].[TestSargable] ADD CONSTRAINT [PK_TestSargable] PRIMARY KEY CLUSTERED ([ID] ASC)
GO
Sargable Query:
SELECT [ID]
,[Created]
FROM [dbo].[TestSargable]
WHERE [Created] > '2014-02-28 23:59:59'
AND [Created] < '2014-04-01 00:00:00'
Non Sargable Query:
SELECT [ID]
,[Created]
FROM [dbo].[TestSargable]
WHERE datediff(MM, [Created], '2014-03-01') = 0
When I viewed the actual execution plan they both used a Clustered Index Scan.
Am I missing something here or is the first query non-sargable also?
This is running on my dev box using SQLExpress 11.0.2100.
You don't have any suitable index to seek into.
CREATE INDEX IX ON [dbo].[TestSargable]([Created])
Then you see an index range seek on the sargable version and scan for the other.
The clustered index key gets added to the non clustered index as a row locator so the index is able to cover both columns in the SELECT without any need for lookups.
I'm still a learning user of SQL-SERVER2005.
Here is my table structure
CREATE TABLE [dbo].[Trn_PostingGroups](
[ControlGroup] [char](5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[PracticeCode] [char](5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[ScanDate] [smalldatetime] NULL,
[DepositDate] [smalldatetime] NULL,
[NameOfFile] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DepositValue] [decimal](11, 2) NULL,
[RecordStatus] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT [PK_Trn_PostingGroups_1] PRIMARY KEY CLUSTERED
(
[ControlGroup] ASC,
[PracticeCode] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
Scenario 1 : Suppose I have a query like this...
Select * from Trn_PostingGroups where PracticeCode = 'ABC'
Will indexing on Practice Code seperately help me in making my query faster??
Scenario 2 :
Select * from Trn_PostingGroups
where
ControlGroup = 12701
and PracticeCode = 'ABC'
and NameOfFile = 'FileName1'
Will indexing on NameOfFile seperately help me in making my query faster ??
If you were only selecting on the first field (ControlGroup), it is the primary sort of the clustered index and you wouldn't need to index the other field.
If you select on the other primary key fields, then adding a separate index on the other fields should help with such selects.
In general, you should index fields that are commonly used in SORT and WHERE clauses. This of course is over simplified.
See this article for more information about optimizing (statistics and query analyser).
You can only utilize one index per table per query (unless you consider self joins or CTEs). if you have multiple that can be used on the same table in the same query, then SQL Server will use statistics to determine which would be better to use.
In Scenario 1, if you create an index on PracticeCode alone, it will usually be used, as long as you have enough rows that a table scan costs more and that there is a diverse range of values in that column. An index will not be used if there are only a few rows in the table (it is faster to just look at them all). Also, an index will not be used if most of the values in that column are the same. It will not use the PK in this query, it would be like looking for a first name in the phone book, you can't use the index because it is last+first name. You might consider reversing your PK to PracticeCode+ControlGroup if you never search on ControlGroup by itself.
In Scenario 2, if you have an index on NameOfFile it will probably use the PK and ignore the NameOfFile index. Unless you make the NameOfFile index unique, and then it is a tossup. You might try to create an index (in addition to your PK) on ControlGroup+PracticeCode+NameOfFile. if you have many files per ControlGroup+PracticeCode, then it may select that index over the PK index.