SQL Server Sargable Query

I ran a test of a sargable and non-sargable query in Enterprise Manager and was surprised by the Execution Plan. They both used a Clustered Index Scan. I was expecting the sargable query to use a Seek.
I used this table:
CREATE TABLE [dbo].[TestSargable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Created] [datetime] NOT NULL)
GO
ALTER TABLE [dbo].[TestSargable] ADD CONSTRAINT [PK_TestSargable] PRIMARY KEY CLUSTERED ([ID] ASC)
GO
Sargable Query:
SELECT [ID]
,[Created]
FROM [dbo].[TestSargable]
WHERE [Created] > '2014-02-28 23:59:59'
AND [Created] < '2014-04-01 00:00:00'
Non Sargable Query:
SELECT [ID]
,[Created]
FROM [dbo].[TestSargable]
WHERE datediff(MM, [Created], '2014-03-01') = 0
When I viewed the actual execution plan they both used a Clustered Index Scan.
Am I missing something here or is the first query non-sargable also?
This is running on my dev box using SQLExpress 11.0.2100.

You don't have any suitable index to seek into.
CREATE INDEX IX ON [dbo].[TestSargable]([Created])
Then you see an index range seek on the sargable version and scan for the other.
The clustered index key gets added to the non clustered index as a row locator so the index is able to cover both columns in the SELECT without any need for lookups.
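As a side note on the sargable form: a half-open range expresses "everything in March 2014" without depending on datetime precision (the `> '2014-02-28 23:59:59'` bound in the question would also match rows with fractional seconds later in that same second). The sketch below assumes the index IX from above exists. The plan you actually get still depends on row counts and statistics.

```sql
-- Sargable: the column is left bare, so a range seek on IX([Created]) is possible.
-- Half-open range: >= first day of the month, < first day of the next month.
SELECT [ID], [Created]
FROM [dbo].[TestSargable]
WHERE [Created] >= '20140301'
  AND [Created] <  '20140401';

-- Non-sargable: the column is wrapped in a function, so the predicate
-- has to be evaluated row by row and cannot be used as a seek range.
SELECT [ID], [Created]
FROM [dbo].[TestSargable]
WHERE DATEDIFF(MONTH, [Created], '20140301') = 0;
```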

Related

Adding dummy where condition brings execution plan to seek

Could you please have a look at http://sqlfiddle.com/#!18/7ad28/8 and help me understand why adding a WHERE condition changes the index operation from a scan to a seek? As per my (wrong) understanding, it should not have made any difference, since it is a greater-than condition, which should have caused a scan.
I am also pasting the table script and the queries in question below.
CREATE TABLE [dbo].[Mappings]
(
[MappingID] [smallint] NOT NULL IDENTITY(1, 1),
[ProductID] [smallint] NOT NULL,
[CategoryID] [smallint] NOT NULL
)
GO
ALTER TABLE [dbo].[Mappings] ADD CONSTRAINT [pk_Mappings_MappingID] PRIMARY KEY CLUSTERED ([MappingID]) WITH (DATA_COMPRESSION = PAGE) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [nc_Mappings_ProductIDCategoryID] ON [dbo].[Mappings] ([ProductID], [CategoryID]) WITH (DATA_COMPRESSION = PAGE) ON [PRIMARY]
GO
CREATE TABLE [dbo].[CustomerProducts]
(
[CustomerID] [bigint] NOT NULL,
[ProductID] [smallint] NOT NULL,
[SomeDate] [datetimeoffset] (0) NULL,
[SomeAttribute] [bigint] NULL
)
GO
ALTER TABLE [dbo].[CustomerProducts] ADD CONSTRAINT [pk_CustomerProducts_ProductIDCustomerID] PRIMARY KEY CLUSTERED ([ProductID], [CustomerID]) ON [PRIMARY]
GO
--SCAN [tempdb].[dbo].[Mappings].[nc_Mappings_ProductIDCategoryID].(NonClustered)
SELECT b.[SomeDate],
b.[SomeAttribute]
FROM dbo.[Mappings] a
INNER JOIN dbo.CustomerProducts b
ON a.[ProductID] = b.[ProductID]
Where b.CustomerID = 88;
--SEEK [tempdb].[dbo].[Mappings].[nc_Mappings_ProductIDCategoryID].(NonClustered)
SELECT b.[SomeDate],
b.[SomeAttribute]
FROM dbo.[Mappings] a
INNER JOIN dbo.CustomerProducts b
ON a.[ProductID] = b.[ProductID]
AND b.CustomerID = 88
Where a.[ProductID] > 0;
"It should not have made any difference since its a greater then condition which should have caused scan."
You added an explicit predicate (ProductID > 0), so SQL Server chooses to seek to that value (0) and then range scan. To see this, select the Index Seek on Mappings, open the Properties tab, look for Seek Predicates, and expand the entire tree of results. You'll see the Start and applicable range scan attributes underneath.
So if you had real data (pretend you have ProductIDs from 1 to 100) and a predicate of WHERE ProductID > 77, you'd seek in the B-tree to ProductID 77, then range scan the remainder of the nonclustered index.
Watch this: this'll help you visualize and understand what happens internally in different index operations (disclaimer: this is me presenting)
https://youtu.be/fDd4lw6DfqU?t=748
Here's what the plans look like:
Hovered in yellow is the information on the clustered index seek from table CustomerProducts. The seek predicate is set to the value of the condition [ProductID] > 0, which is perfectly reasonable: part of the join condition is a.[ProductID] = b.[ProductID], and the WHERE clause contains a.[ProductID] > 0, so together they imply b.[ProductID] > 0. As ProductID is the first column of the clustered index, any information that narrows the lookup can be used. The seek is expected to be cheaper than the scan, so the optimizer will try to do that.
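To make the implied predicate visible, the range condition can be written against both tables by hand. This is only an illustrative sketch: the optimizer is described above as deriving the second predicate itself, so the rewritten form should return the same rows and is not required.

```sql
SELECT b.[SomeDate],
       b.[SomeAttribute]
FROM dbo.[Mappings] AS a
INNER JOIN dbo.[CustomerProducts] AS b
        ON a.[ProductID] = b.[ProductID]
WHERE b.[CustomerID] = 88
  AND a.[ProductID] > 0   -- explicit predicate from the question
  AND b.[ProductID] > 0;  -- implied by the join equality; written out for illustration
```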

If I place a composite index on three columns and use them in the same query but in different places, will it still be effective?

With the following table and index:
CREATE TABLE [Ticket]
(
[Id] BIGINT IDENTITY NOT NULL,
[Title] CHARACTER VARYING(255) NOT NULL,
[Description] CHARACTER VARYING(MAX) NOT NULL,
[Severity] INTEGER NOT NULL,
[Priority] INTEGER NOT NULL,
[CreatedOn] DATETIMEOFFSET NOT NULL,
PRIMARY KEY([Id])
);
CREATE INDEX [Ticket_Priority_Severity_CreatedOn_IX] ON [Ticket]([Priority], [Severity], [CreatedOn]);
Will the following query:
SELECT [Id]
FROM [Ticket]
WHERE [Priority] = 1
ORDER BY [Severity] DESC, [CreatedOn] ASC
make use of the entire composite index or only utilize the [Priority] part of the index?
I know that for a query that had all of the columns in the WHERE clause, the whole index would be used. I am unsure about the above case though!
Given the actual execution plan below, on a table with no statistics, I am not sure how to interpret it.
It does look like it used the index, but which parts? There is clearly a sort cost, but is that sorting by [Severity] and then [CreatedOn] after doing a seek on [Priority]?
It may use the index, but it will only use the Priority part efficiently, since the index is sorted in a way that is not optimal for the query:
ORDER BY [Severity] DESC, [CreatedOn] ASC
vs
CREATE INDEX [Ticket_Priority_Severity_CreatedOn_IX] ON
[Ticket]([Priority], [Severity], [CreatedOn]);
As you can see in this fiddle if you click the execution plan, the query is split into an index seek and a sort.
Since Severity is sorted ascending in the index, the index won't be (optimally) used for the sort. If you really want an optimal sort, index Severity descending, as your query uses it:
CREATE INDEX [Ticket_Priority_Severity_CreatedOn_IX] ON
[Ticket]([Priority], [Severity] DESC, [CreatedOn]);
An SQLfiddle with the fixed index. Note that the whole query is now an index seek.
Note that the plan may look different for you depending on your data, but in general this holds: an index sorted the same way as the query accesses it will be used more effectively.
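Because the replacement index reuses the existing name, running the CREATE INDEX statement as-is against a table that already has the old index would fail. One way to swap the definition in place is the DROP_EXISTING option. A minimal sketch, assuming the original all-ascending index already exists on the table:

```sql
-- Replace the existing index definition in one step.
-- Assumes an index with this name already exists on [Ticket].
CREATE INDEX [Ticket_Priority_Severity_CreatedOn_IX]
ON [Ticket] ([Priority], [Severity] DESC, [CreatedOn] ASC)
WITH (DROP_EXISTING = ON);

-- The key order now matches the ORDER BY, so the rows for one Priority
-- can be read from the index already in the requested order.
SELECT [Id]
FROM [Ticket]
WHERE [Priority] = 1
ORDER BY [Severity] DESC, [CreatedOn] ASC;
```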

SQL Server why index is not used

I have a following table in SQL Server 2008 database:
CREATE TABLE [dbo].[Actions](
[ActionId] [int] IDENTITY(1,1) NOT NULL,
[ActionTypeId] [int] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[Description] [nvarchar](1000) NOT NULL,
[Comment] [nvarchar](500) NOT NULL,
[Created] [datetime] NOT NULL,
[Executed] [datetime] NULL,
[DisplayText] [nvarchar](1000) NULL,
[ExecutedBy] [int] NULL,
[Result] [int] NULL,
CONSTRAINT [PK_Actions] PRIMARY KEY CLUSTERED
(
[ActionId] ASC
)
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Actions_Executed] ON [dbo].[Actions]
(
[Executed] ASC,
[ExecutedBy] ASC
)
There are 20 000 rows with an Executed date equal to '2500-01-01' and 420 000 rows with an Executed date < '2500-01-01'.
When I execute a query
select ActionId, Executed, ExecutedBy, DisplayText from Actions
where Executed='2500-01-01'
the query plan shows that a clustered index scan on PK_Actions is performed and the index IX_Actions_Executed is not used at all.
Funnily enough, I also got a missing index hint which says
/* The Query Processor estimates that implementing the following index could improve the query cost by 99.9901%.
*/
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Actions] ([Executed])
But the index is already there.
Why is the index not used if it would select only about 5% of the data?
Most likely, the query optimizer sees that you're selecting DisplayText as well - so for each of the 20 000 rows found in the NC index, there would need to be a key lookup into the clustered index to get that data - and key lookups are expensive operations! So in the end, it might just be easier and more efficient to scan the clustered index right away.
I bet if you run this query here:
select ActionId, Executed, ExecutedBy
from Actions
where Executed='2500-01-01'
then the NC index will be used.
If you really need the DisplayText and that's a query you'll run frequently, maybe you should include that column in the index as an extra column in the leaf level:
DROP INDEX [IX_Actions_Executed] ON [dbo].[Actions]
CREATE NONCLUSTERED INDEX [IX_Actions_Executed]
ON [dbo].[Actions]([Executed] ASC, [ExecutedBy] ASC)
INCLUDE([DisplayText])
This would make your NC index a covering index, i.e. it could return all columns needed for your query. If you run your original query again with this covering index in place, I'm pretty sure SQL Server's query optimizer will indeed use it. The probability that any NC index will be used increases significantly if that NC index is a covering index, i.e. the query can get all the columns it needs from just the NC index, without key lookups.
The missing index hints are a bit misleading at times. There are also known bugs that lead SQL Server Management Studio to continuously recommend indices that are already in place. Don't bet too much of your money on those index hints!
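One way to check the key-lookup explanation is to force the existing index with a table hint and compare logical reads against the unhinted query. This is a diagnostic sketch only, assuming the table, column, and index names from the question's DDL (dbo.Actions, ActionId, IX_Actions_Executed). Index hints are generally not something to leave in production code.

```sql
SET STATISTICS IO ON;

-- Optimizer's own choice (reported in the question as a clustered index scan).
SELECT [ActionId], [Executed], [ExecutedBy], [DisplayText]
FROM [dbo].[Actions]
WHERE [Executed] = '25000101';

-- Forced nonclustered index: a seek on Executed, plus a key lookup
-- for each matching row to fetch DisplayText.
SELECT [ActionId], [Executed], [ExecutedBy], [DisplayText]
FROM [dbo].[Actions] WITH (INDEX(IX_Actions_Executed))
WHERE [Executed] = '25000101';

SET STATISTICS IO OFF;
```

Comparing the "logical reads" figures in the Messages tab for the two statements shows which access path touches fewer pages on your data.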

Will creating index help in this case

I'm still learning SQL Server 2005.
Here is my table structure
CREATE TABLE [dbo].[Trn_PostingGroups](
[ControlGroup] [char](5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[PracticeCode] [char](5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[ScanDate] [smalldatetime] NULL,
[DepositDate] [smalldatetime] NULL,
[NameOfFile] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DepositValue] [decimal](11, 2) NULL,
[RecordStatus] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT [PK_Trn_PostingGroups_1] PRIMARY KEY CLUSTERED
(
[ControlGroup] ASC,
[PracticeCode] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
Scenario 1 : Suppose I have a query like this...
Select * from Trn_PostingGroups where PracticeCode = 'ABC'
Will indexing on PracticeCode separately help make my query faster?
Scenario 2 :
Select * from Trn_PostingGroups
where
ControlGroup = 12701
and PracticeCode = 'ABC'
and NameOfFile = 'FileName1'
Will indexing on NameOfFile separately help make my query faster?
If you were only selecting on the first field (ControlGroup), it is the leading key of the clustered index, and you wouldn't need an additional index.
If you select on the other primary key fields on their own, then adding a separate index on those fields should help with such selects.
In general, you should index fields that are commonly used in ORDER BY and WHERE clauses. This is of course oversimplified.
See this article for more information about optimizing (statistics and query analyser).
SQL Server will normally use only one index per table reference in a query (index intersection, where two indexes on the same table are combined, exists but is comparatively rare). If several indexes could be used on the same table in the same query, SQL Server uses statistics to determine which one is better.
In Scenario 1, if you create an index on PracticeCode alone, it will usually be used, as long as you have enough rows that a table scan costs more and there is a diverse range of values in that column. An index will not be used if there are only a few rows in the table (it is faster to just look at them all). Also, an index will not be used if most of the values in that column are the same. The query cannot seek on the PK; it would be like looking for a first name in the phone book: the ordering doesn't help because it is last name + first name. You might consider reversing your PK to PracticeCode+ControlGroup if you never search on ControlGroup by itself.
In Scenario 2, if you have an index on NameOfFile, it will probably use the PK and ignore the NameOfFile index, unless you make the NameOfFile index unique, and then it is a toss-up. You might try creating an index (in addition to your PK) on ControlGroup+PracticeCode+NameOfFile. If you have many files per ControlGroup+PracticeCode, then it may select that index over the PK index.
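A minimal sketch of the Scenario 1 suggestion, plus the Scenario 2 query with one adjustment. The index name is hypothetical. The quoted literal is an assumption about intent: ControlGroup is declared as char(5), and comparing it to an unquoted integer makes SQL Server convert the column side of the comparison, which can get in the way of a seek.

```sql
-- Scenario 1: a separate index so a search on PracticeCode alone can seek.
CREATE NONCLUSTERED INDEX [IX_Trn_PostingGroups_PracticeCode]  -- hypothetical name
ON [dbo].[Trn_PostingGroups] ([PracticeCode]);

SELECT *
FROM [dbo].[Trn_PostingGroups]
WHERE [PracticeCode] = 'ABC';

-- Scenario 2: ControlGroup is char(5), so the literal is quoted
-- to compare character data with character data.
SELECT *
FROM [dbo].[Trn_PostingGroups]
WHERE [ControlGroup] = '12701'
  AND [PracticeCode] = 'ABC'
  AND [NameOfFile]   = 'FileName1';
```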

Increasing performance on a logging table in SQL Server 2005

I have a "history" table where I log each request into a Web Handler on our web site. Here is the table definition:
/****** Object: Table [dbo].[HistoryRequest] Script Date: 10/09/2009 17:18:02 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[HistoryRequest](
[HistoryRequestID] [uniqueidentifier] NOT NULL,
[CampaignID] [int] NOT NULL,
[UrlReferrer] [nvarchar](512) NOT NULL,
[UserAgent] [nvarchar](512) NOT NULL,
[UserHostAddress] [nvarchar](15) NOT NULL,
[UserHostName] [nvarchar](512) NOT NULL,
[HttpBrowserCapabilities] [xml] NOT NULL,
[Created] [datetime] NOT NULL,
[CreatedBy] [nvarchar](100) NOT NULL,
[Updated] [datetime] NULL,
[UpdatedBy] [nvarchar](100) NULL,
CONSTRAINT [PK_HistoryRequest] PRIMARY KEY CLUSTERED
(
[HistoryRequestID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[HistoryRequest] WITH CHECK ADD CONSTRAINT [FK_HistoryRequest_Campaign] FOREIGN KEY([CampaignID])
REFERENCES [dbo].[Campaign] ([CampaignId])
GO
ALTER TABLE [dbo].[HistoryRequest] CHECK CONSTRAINT [FK_HistoryRequest_Campaign]
GO
37 seconds for 1050 rows on this statement:
SELECT *
FROM HistoryRequest AS hr
WHERE Created > '10/9/2009'
ORDER BY Created DESC
Does anyone have any suggestions for speeding this up? I have a clustered index on the PK and a regular index on the Created column. I tried a unique index and it barfed, complaining there is a duplicate entry somewhere - which can be expected.
Any insights are welcome!
You are requesting all columns (*) over a non-covering index (Created). On a large data set you are guaranteed to hit the Index Tipping Point, where the clustered index scan is more efficient than a nonclustered index range seek and bookmark lookup.
Do you need * always? If yes, and if the typical access pattern is like this, then you must organize the table accordingly and make Created the leftmost clustered key.
If not, then consider changing your query to a coverable query, e.g. select only HistoryRequestID and Created, which are covered by the nonclustered index. If more fields are needed, add them as included columns to the nonclustered index, but take into account that this will add extra storage space and IO log write time.
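The two options above can be sketched as follows. Index names and the choice of included columns are hypothetical, and the date literal assumes the question's '10/9/2009' means 9 October 2009. Option B rebuilds the whole table and assumes no other table has a foreign key referencing PK_HistoryRequest.

```sql
-- Option A: covering nonclustered index (hypothetical name and column choice).
CREATE NONCLUSTERED INDEX [IX_HistoryRequest_Created_Covering]
ON [dbo].[HistoryRequest] ([Created])
INCLUDE ([CampaignID], [UserHostAddress]);

-- Every selected column is in the index (HistoryRequestID is the clustered
-- key and is carried along as the row locator), so no lookups are needed.
SELECT [HistoryRequestID], [CampaignID], [UserHostAddress], [Created]
FROM [dbo].[HistoryRequest]
WHERE [Created] > '20091009'
ORDER BY [Created] DESC;

-- Option B: make Created the clustered key and keep the PK as nonclustered.
ALTER TABLE [dbo].[HistoryRequest] DROP CONSTRAINT [PK_HistoryRequest];

CREATE CLUSTERED INDEX [CIX_HistoryRequest_Created]  -- hypothetical name
ON [dbo].[HistoryRequest] ([Created]);

ALTER TABLE [dbo].[HistoryRequest]
ADD CONSTRAINT [PK_HistoryRequest] PRIMARY KEY NONCLUSTERED ([HistoryRequestID]);
```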
Hey, I've seen some odd behavior when pulling XML columns in large sets. Try putting your index on Created back, then specify the columns in your select statement; but omit the XML. See how that affects the return time for results.
For a log table, you probably don't need a uniqueidentifier column. You're not likely to query on it either, so it's not a good candidate for a clustered index. Your sample query is on "Created", yet there's no index on it. If you query frequently on ranges of "Created" values then it would be a good candidate for clustering even though it's not necessarily unique.
OTOH, the foreign key suggests frequent querying by Campaign, in which case having the clustering done by that column could make sense, and would also probably do a better job of scattering the inserted keys in the indexes - both the surrogate key and the timestamp would add records in sequential order, which is net more work over time for insertions because the node sectors are filled less randomly.
If it's just a log table, why does it have update audit columns? It would normally be write-only.
Rebuild indexes. Use the WITH (NOLOCK) clause after the table names where appropriate; this probably applies if you want to run long(ish) queries against tables that are heavily used in a live environment (such as a log table). It basically means your query may read uncommitted rows or miss some of the very latest records, but it also doesn't take shared locks on the table, which would create additional overhead.
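A minimal sketch of both suggestions, assuming the table from the question and that the date literal means 9 October 2009. The column list is an arbitrary illustration. NOLOCK reads are not transactionally consistent, so this only fits reporting where approximate results are acceptable.

```sql
-- Rebuild all indexes on the table (can be heavy on a large, busy table).
ALTER INDEX ALL ON [dbo].[HistoryRequest] REBUILD;

-- Read without taking shared locks; results may include uncommitted rows.
SELECT [HistoryRequestID], [CampaignID], [Created]
FROM [dbo].[HistoryRequest] AS hr WITH (NOLOCK)
WHERE [Created] > '20091009'
ORDER BY [Created] DESC;
```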