Adding a dummy WHERE condition changes the execution plan from scan to seek

Could you please have a look at http://sqlfiddle.com/#!18/7ad28/8 and help me understand why adding a WHERE condition changes the index scan into a seek? As per my (wrong) understanding, it should not have made any difference, since it's a greater-than condition, which should have caused a scan.
I am also pasting the table script and the queries in question below.
CREATE TABLE [dbo].[Mappings]
(
[MappingID] [smallint] NOT NULL IDENTITY(1, 1),
[ProductID] [smallint] NOT NULL,
[CategoryID] [smallint] NOT NULL
)
GO
ALTER TABLE [dbo].[Mappings] ADD CONSTRAINT [pk_Mappings_MappingID] PRIMARY KEY CLUSTERED ([MappingID]) WITH (DATA_COMPRESSION = PAGE) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [nc_Mappings_ProductIDCategoryID] ON [dbo].[Mappings] ([ProductID], [CategoryID]) WITH (DATA_COMPRESSION = PAGE) ON [PRIMARY]
GO
CREATE TABLE [dbo].[CustomerProducts]
(
[CustomerID] [bigint] NOT NULL,
[ProductID] [smallint] NOT NULL,
[SomeDate] [datetimeoffset] (0) NULL,
[SomeAttribute] [bigint] NULL
)
GO
ALTER TABLE [dbo].[CustomerProducts] ADD CONSTRAINT [pk_CustomerProducts_ProductIDCustomerID] PRIMARY KEY CLUSTERED ([ProductID], [CustomerID]) ON [PRIMARY]
GO
--SCAN [tempdb].[dbo].[Mappings].[nc_Mappings_ProductIDCategoryID].(NonClustered)
SELECT b.[SomeDate],
b.[SomeAttribute]
FROM dbo.[Mappings] a
INNER JOIN dbo.CustomerProducts b
ON a.[ProductID] = b.[ProductID]
Where b.CustomerID = 88;
--SEEK [tempdb].[dbo].[Mappings].[nc_Mappings_ProductIDCategoryID].(NonClustered)
SELECT b.[SomeDate],
b.[SomeAttribute]
FROM dbo.[Mappings] a
INNER JOIN dbo.CustomerProducts b
ON a.[ProductID] = b.[ProductID]
AND b.CustomerID = 88
Where a.[ProductID] > 0;

"It should not have made any difference since its a greater then condition which should have caused scan."
You added an explicit predicate (ProductID > 0) so SQL Server chooses to seek on that value (0) then range scan. To see this, select the Index Seek on Mappings, open the Properties Tab, and look for Seek Predicates, and expand the entire tree of results. You'll see Start and applicable range scan attributes underneath.
So if you had real data (say, ProductIDs from 1 to 100) and a WHERE ProductID > 77, SQL Server would seek in the B-tree to ProductID 77, then range scan the remainder of the nonclustered index from there.
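As a minimal sketch of that seek-then-range-scan behaviour, assuming the Mappings table from the question has been loaded with ProductIDs from 1 to 100, the two statements below can be run with the actual execution plan enabled. The FORCESCAN table hint is not part of the original question; it is only there to produce a scan for comparison.

```sql
-- Report logical reads per statement so the two access paths can be compared.
SET STATISTICS IO ON;

-- Range predicate on the leading key of nc_Mappings_ProductIDCategoryID:
-- the optimizer can seek to the first entry above 77 and read forward from there.
SELECT ProductID, CategoryID
FROM dbo.Mappings
WHERE ProductID > 77;

-- Same predicate, but the hint asks for a scan access path instead of a seek,
-- so ProductID > 77 is applied as a filter while scanning.
SELECT ProductID, CategoryID
FROM dbo.Mappings WITH (FORCESCAN)
WHERE ProductID > 77;

SET STATISTICS IO OFF;
```

On a table this small the difference in reads will be negligible; the point is only to see the Seek Predicates entry appear in the first plan and not in the second.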
Watch this: this'll help you visualize and understand what happens internally in different index operations (disclaimer: this is me presenting)
https://youtu.be/fDd4lw6DfqU?t=748

Here's what the plans look like:
Hovered in yellow is the information on the clustered index seek on table CustomerProducts. The seek predicate uses the condition [ProductID] > 0, which is perfectly reasonable: the join condition includes a.[ProductID] = b.[ProductID] and the WHERE clause has a.[ProductID] > 0, so it follows that b.[ProductID] > 0. As ProductID is the first column of the clustered index, any predicate that narrows the range to read can be used. A seek generally reads fewer pages than a scan, so the optimizer will prefer it when it can.
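To make that inference visible, here is a sketch of the second query from the question with the implied predicate written out by hand. The extra condition on b.[ProductID] is an illustration of what the optimizer derives on its own; it is not something you need to add.

```sql
SELECT b.[SomeDate],
       b.[SomeAttribute]
FROM dbo.[Mappings] a
INNER JOIN dbo.CustomerProducts b
    ON a.[ProductID] = b.[ProductID]
   AND b.CustomerID = 88
WHERE a.[ProductID] > 0
  AND b.[ProductID] > 0;  -- implied by a.[ProductID] = b.[ProductID] and a.[ProductID] > 0
```

Both forms should return the same rows, since the added predicate is logically redundant.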

Related

Optimizing slow query with multiple withs

I have the following query:
with matched_urls as
(
select l.Url, min(f.urlfiltertype) as Filter
from landingpages l
join landingpageurlfilters lpf on lpf.landingpageid = l.Url
join urlfilters f on lpf.urlfilterid = f.id
where f.groupid = 3062
group by l.Url
),
all_urls as
(
select l.Url, 5 as Filter
from landingpages l
where 'iylsuqnzukwv0milinztea' in (select domainid
from domainlandingpages dlp
where l.Url = dlp.landingpageid)
and l.Url not in (select Url from matched_urls)
union
select * from matched_urls
)
select l.*
from landingpages l
join all_urls u on l.Url = u.Url
order by u.Filter asc
offset 0 rows fetch next 30 rows only
These are the tables used in the query:
And this is the DDL for the tables:
CREATE TABLE [dbo].[LandingPages]
(
[Url] [nvarchar](448) NOT NULL,
[LastUpdated] [datetime2](7) NOT NULL,
CONSTRAINT [PK_LandingPages]
PRIMARY KEY CLUSTERED ([Url] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[DomainLandingPages]
(
[LandingPageId] [nvarchar](448) NOT NULL,
[DomainId] [nvarchar](128) NOT NULL,
CONSTRAINT [PK_DomainLandingPages]
PRIMARY KEY CLUSTERED ([DomainId] ASC, [LandingPageId] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[DomainLandingPages] WITH CHECK
ADD CONSTRAINT [FK_DomainLandingPages_Domains_DomainId]
FOREIGN KEY([DomainId]) REFERENCES [dbo].[Domains] ([Id])
GO
ALTER TABLE [dbo].[DomainLandingPages] CHECK CONSTRAINT [FK_DomainLandingPages_Domains_DomainId]
GO
ALTER TABLE [dbo].[DomainLandingPages] WITH CHECK
ADD CONSTRAINT [FK_DomainLandingPages_LandingPages_LandingPageId]
FOREIGN KEY([LandingPageId]) REFERENCES [dbo].[LandingPages] ([Url])
GO
ALTER TABLE [dbo].[DomainLandingPages] CHECK CONSTRAINT [FK_DomainLandingPages_LandingPages_LandingPageId]
GO
CREATE TABLE [dbo].[UrlFilters]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[GroupId] [int] NOT NULL,
[UrlFilterType] [int] NOT NULL,
[Filter] [nvarchar](max) NOT NULL,
CONSTRAINT [PK_UrlFilters]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[UrlFilters] WITH CHECK
ADD CONSTRAINT [FK_UrlFilters_Groups_GroupId]
FOREIGN KEY([GroupId]) REFERENCES [dbo].[Groups] ([Id])
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[UrlFilters] CHECK CONSTRAINT [FK_UrlFilters_Groups_GroupId]
GO
CREATE TABLE [dbo].[LandingPageUrlFilters]
(
[LandingPageId] [nvarchar](448) NOT NULL,
[UrlFilterId] [int] NOT NULL,
CONSTRAINT [PK_LandingPageUrlFilters]
PRIMARY KEY CLUSTERED ([LandingPageId] ASC, [UrlFilterId] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[LandingPageUrlFilters] WITH CHECK
ADD CONSTRAINT [FK_LandingPageUrlFilters_LandingPages_LandingPageId]
FOREIGN KEY([LandingPageId]) REFERENCES [dbo].[LandingPages] ([Url])
GO
ALTER TABLE [dbo].[LandingPageUrlFilters] CHECK CONSTRAINT [FK_LandingPageUrlFilters_LandingPages_LandingPageId]
GO
ALTER TABLE [dbo].[LandingPageUrlFilters] WITH CHECK
ADD CONSTRAINT [FK_LandingPageUrlFilters_UrlFilters_UrlFilterId]
FOREIGN KEY([UrlFilterId]) REFERENCES [dbo].[UrlFilters] ([Id])
GO
ALTER TABLE [dbo].[LandingPageUrlFilters] CHECK CONSTRAINT [FK_LandingPageUrlFilters_UrlFilters_UrlFilterId]
GO
Here is the execution plan:
https://www.brentozar.com/pastetheplan/?id=H1tHt5pvP
The query pulls all URLs for a given domain, which are then supposed to be ordered by UrlFilterType. However, not all landing pages have a match, hence the two WITH clauses.
As far as I can see from the execution plan it's mainly doing index seeks, so I think I have the right indexes. However, the query takes very long to execute, so I hope there might be a smarter way of doing this.
Any input will be greatly appreciated!
First up, your statistics look wildly out of line: estimated 3,700 rows, actual 219,000. That suggests that, at the very least, a statistics update may change the choices the optimizer is making. Because of those row estimates, the optimizer is choosing a seek and a nested loops operation where, based on the data distribution, you're reading 1/3 of the table (200k rows of a 600k table). A scan here, probably with a hash join, would be more efficient.
The query itself isn't giving much to filter the [aarhus-cluster-onesearch-staging].[dbo].[LandingPages].[PK_LandingPages] table on, so it's pulling 200k rows in order to filter them down to 30. If you can find a way to add additional filtering there, you should see a performance improvement.
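The statistics refresh suggested above could be run along these lines. This is a sketch that assumes the three tables below are the ones with stale statistics; FULLSCAN reads every row, so on large tables a sampled update may be a better fit.

```sql
-- Rebuild statistics from all rows so the optimizer's row estimates
-- reflect the current data distribution.
UPDATE STATISTICS dbo.LandingPageUrlFilters WITH FULLSCAN;
UPDATE STATISTICS dbo.UrlFilters WITH FULLSCAN;
UPDATE STATISTICS dbo.LandingPages WITH FULLSCAN;
```

After that, capture the actual plan again and check whether the estimated and actual row counts are closer.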
OK, the big hit is at the bottom of that query plan - where it's reading from LandingPageUrlFilters where the URLfilterID comes from URLfilters.
It's getting completely the wrong estimates (out by 70x) and then sorts your URLs taking 30 seconds or so.
If you run the first CTE on its own, I think it will take a long time. That's what you need to optimise.
select l.Url, min(f.urlfiltertype) as Filter
from landingpages l
join landingpageurlfilters lpf on lpf.landingpageid = l.Url
join urlfilters f on lpf.urlfilterid = f.id
where f.groupid = 3062
group by l.Url
Suggestions
First thing to try is to also add a nonclustered index on LandingPageUrlFilters in the opposite order to your clustered index (e.g., CREATE NONCLUSTERED INDEX myindex ON LandingPageUrlFilters ([UrlFilterId] ASC, [LandingPageId] ASC)). Note this will make a full copy of that table, which may be rather large. That said, it appears you already have a nonclustered index like this (the plan refers to IX_LandingPageUrlFilters_UrlFilterId).
The sort on the nvarchar(448) string column, which can be close to 900 bytes per row, will need a much bigger memory grant. Consider adding int ID values as primary keys: they require less memory and are therefore less likely to spill to disk.
Consider, instead of a CTE, creating a temporary table (with an appropriate PK) holding the result of the join of LandingPageUrlFilters and UrlFilters. However, you will still need to do a sort when inserting into it, which is likely to take just as long.
Part of the problem (another 10s or so) is a nested loop join to LandingPages. It was expecting fewer than 4,000 rows (so a nested loop is OK) but had to do 220,000 loops. If necessary, consider a join hint (e.g., INNER HASH JOIN rather than INNER JOIN). However, it appears that landingpages isn't actually required in that query: just remove the table landingpages from the CTE, and use landingpageurlfilters.landingpageid instead.
e.g.,
select lpf.landingpageid AS [Url], min(f.urlfiltertype) as Filter
from landingpageurlfilters lpf
join urlfilters f on lpf.urlfilterid = f.id
where f.groupid = 3062
group by lpf.landingpageid
I think that gives the same results as the CTE I copied above.
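The temporary-table suggestion, combined with the simplified aggregate, could look roughly like the sketch below. The name #matched_urls is hypothetical, EXISTS / NOT EXISTS stand in for the IN / NOT IN subqueries of the original, and UNION ALL replaces UNION on the assumption that the two branches cannot overlap (the first explicitly excludes everything in the second).

```sql
-- Hypothetical temp table; Url is the PK so later lookups can seek on it.
-- COLLATE DATABASE_DEFAULT avoids a collation conflict if tempdb uses a
-- different collation than the user database.
CREATE TABLE #matched_urls
(
    Url    nvarchar(448) COLLATE DATABASE_DEFAULT NOT NULL PRIMARY KEY,
    Filter int NOT NULL
);

-- Materialize the first CTE once, without touching LandingPages.
INSERT INTO #matched_urls (Url, Filter)
SELECT lpf.LandingPageId, MIN(f.UrlFilterType)
FROM dbo.LandingPageUrlFilters lpf
JOIN dbo.UrlFilters f ON lpf.UrlFilterId = f.Id
WHERE f.GroupId = 3062
GROUP BY lpf.LandingPageId;

WITH all_urls AS
(
    -- Landing pages of the domain that have no matching filter get Filter = 5.
    SELECT l.Url, 5 AS Filter
    FROM dbo.LandingPages l
    WHERE EXISTS (SELECT 1
                  FROM dbo.DomainLandingPages dlp
                  WHERE dlp.DomainId = N'iylsuqnzukwv0milinztea'
                    AND dlp.LandingPageId = l.Url)
      AND NOT EXISTS (SELECT 1 FROM #matched_urls m WHERE m.Url = l.Url)
    UNION ALL
    SELECT Url, Filter FROM #matched_urls
)
SELECT l.*
FROM dbo.LandingPages l
JOIN all_urls u ON l.Url = u.Url
ORDER BY u.Filter ASC
OFFSET 0 ROWS FETCH NEXT 30 ROWS ONLY;

DROP TABLE #matched_urls;
```

Whether this is faster than the CTE version depends on the data; as noted above, the insert still has to aggregate and sort the URLs.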

SQL server query plan

I have 3 tables as listed below
CREATE TABLE dbo.RootTransaction
(
TransactionID int CONSTRAINT [PK_RootTransaction] PRIMARY KEY NONCLUSTERED (TransactionID ASC)
)
GO
----------------------------------------------------------------------------------------------------
CREATE TABLE [dbo].[OrderDetails](
[OrderID] int identity(1,1) not null,
TransactionID int,
OrderDate datetime,
[Status] varchar(50)
CONSTRAINT [PK_OrderDetails] PRIMARY KEY CLUSTERED ([OrderID] ASC),
CONSTRAINT [FK_TransactionID] FOREIGN KEY ([TransactionID]) REFERENCES [dbo].[RootTransaction] ([TransactionID]),
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [ix_OrderDetails_TransactionID]
ON [dbo].[OrderDetails](TransactionID ASC, [OrderID] ASC);
GO
----------------------------------------------------------------------------------------------------
CREATE TABLE dbo.OrderItems
(
ItemID int identity(1,1) not null,
[OrderID] int,
[Name] VARCHAR (50) NOT NULL,
[Code] VARCHAR (9) NULL,
CONSTRAINT [PK_OrderItems] PRIMARY KEY NONCLUSTERED ([ItemID] ASC),
CONSTRAINT [FK_OrderID] FOREIGN KEY ([OrderID]) REFERENCES [dbo].[OrderDetails] ([OrderID])
)
Go
CREATE CLUSTERED INDEX OrderItems
ON [dbo].OrderItems([OrderID] ASC, ItemID ASC) WITH (FILLFACTOR = 90);
GO
CREATE NONCLUSTERED INDEX [IX_Code]
ON [dbo].[OrderItems]([Code] ASC) WITH (FILLFACTOR = 90)
----------------------------------------------------------------------------------------------------
Populated sample data in each table
select COUNT(*) from RootTransaction -- 45851
select COUNT(*) from [OrderDetails] -- 50201
select COUNT(*) from OrderItems --63850
-- Query 1
SELECT o.TransactionID
FROM [OrderDetails] o
JOIN dbo.OrderItems i ON o.OrderID = i.OrderID
WHERE i.Code like '1067461841%'
declare @SearchKeyword varchar(200) = '1067461841'
-- Query 2
SELECT o.TransactionID
FROM [OrderDetails] o
JOIN dbo.OrderItems i ON o.OrderID = i.OrderID
WHERE i.Code like @SearchKeyword + '%'
When running the above two queries, I can see that Query 1 uses an index seek on both OrderDetails and OrderItems, which is expected.
However, in Query 2 the query plan uses an index seek on OrderItems but an index scan on OrderDetails.
The only difference between the two queries is the use of a literal value vs. a variable in the LIKE, and both return the same result.
Why does the query execution plan change between using a literal value and a variable?
I believe the difference is most likely related to parameter sniffing. SQL Server compiles and caches query plans, and as part of compilation it "sniffs" the parameter values supplied in order to optimize the plan for them.
Query 1 uses a literal string, so SQL Server creates a plan specific to that value. Query 2 uses an intermediate local variable, which is one of the techniques that actually prevents sniffing (often used to get more predictable performance from stored procedures or queries whose parameters vary significantly). Because the variable's value is not known when the plan is compiled, the optimizer has to fall back on a generic estimate for the LIKE predicate. SQL Server treats these as two completely different queries despite the obvious similarities, and the observed differences in the plans come down to those different estimates.
Furthermore, if your tables had different row counts and data distributions, you'd likely see different plans for those two scenarios, depending on the existing indexes and the available optimizations. On my server, with no sample data loaded, Query 1 and Query 2 had the same execution plan, since the optimizer couldn't find any better path for either.
For more info: http://blogs.technet.com/b/mdegre/archive/2012/03/19/what-is-parameter-sniffing.aspx
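One way to check whether the unknown variable value is what changes the plan is to add OPTION (RECOMPILE) to Query 2, written here with the @ prefix that T-SQL requires for variables. This is a sketch rather than a recommendation: with this hint the statement is compiled on every execution using the variable's current value, which costs CPU each time.

```sql
DECLARE @SearchKeyword varchar(200) = '1067461841';

SELECT o.TransactionID
FROM dbo.OrderDetails o
JOIN dbo.OrderItems i ON o.OrderID = i.OrderID
WHERE i.Code LIKE @SearchKeyword + '%'
OPTION (RECOMPILE);  -- compile this statement using the variable's current value
```

If the plan now matches the one for Query 1, the difference was down to the estimate for the variable rather than to the indexes.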
The queries below show a similar plan even though the WHERE clause is different.
select Code from OrderItems WHERE Code like '6662225%'
declare @SearchKeyword varchar(200) = '6662225'
select Code from OrderItems WHERE Code like @SearchKeyword + '%'
The following post/answers offer a good explanation as to why performance is better with hard coded constants than variables, along with a few suggestions you could possibly try out:
Alternative to using local variables in a where clause

SQL Server Sargable Query

I ran a test of a sargable and non-sargable query in Enterprise Manager and was surprised by the Execution Plan. They both used a Clustered Index Scan. I was expecting the sargable query to use a Seek.
I used this table:
CREATE TABLE [dbo].[TestSargable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Created] [datetime] NOT NULL)
GO
ALTER TABLE [dbo].[TestSargable] ADD CONSTRAINT [PK_TestSargable] PRIMARY KEY CLUSTERED ([ID] ASC)
GO
Sargable Query:
SELECT [ID]
,[Created]
FROM [dbo].[TestSargable]
WHERE [Created] > '2014-02-28 23:59:59'
AND [Created] < '2014-04-01 00:00:00'
Non Sargable Query:
SELECT [ID]
,[Created]
FROM [dbo].[TestSargable]
WHERE datediff(MM, [Created], '2014-03-01') = 0
When I viewed the actual execution plan they both used a Clustered Index Scan.
Am I missing something here or is the first query non-sargable also?
This is running on my dev box using SQLExpress 11.0.2100.
You don't have any suitable index to seek into.
CREATE INDEX IX ON [dbo].[TestSargable]([Created])
Then you see an index range seek on the sargable version and scan for the other.
The clustered index key gets added to the non clustered index as a row locator so the index is able to cover both columns in the SELECT without any need for lookups.
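With that index in place, the month filter can also be written as a half-open range, which stays sargable and returns every row from March 2014 regardless of the time-of-day precision. This is a sketch of a common pattern, not part of the original answer.

```sql
SELECT [ID], [Created]
FROM [dbo].[TestSargable]
WHERE [Created] >= '20140301'   -- first instant of March 2014
  AND [Created] <  '20140401';  -- up to, but not including, April 1st
```

Unlike the > '2014-02-28 23:59:59' form, this does not pick up rows stamped in the last fraction of a second of February.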

SQL Server why index is not used

I have a following table in SQL Server 2008 database:
CREATE TABLE [dbo].[Actions](
[ActionId] [int] IDENTITY(1,1) NOT NULL,
[ActionTypeId] [int] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[Description] [nvarchar](1000) NOT NULL,
[Comment] [nvarchar](500) NOT NULL,
[Created] [datetime] NOT NULL,
[Executed] [datetime] NULL,
[DisplayText] [nvarchar](1000) NULL,
[ExecutedBy] [int] NULL,
[Result] [int] NULL,
CONSTRAINT [PK_Actions] PRIMARY KEY CLUSTERED
(
[ActionId] ASC
)
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Actions_Executed] ON [dbo].[Actions]
(
[Executed] ASC,
[ExecutedBy] ASC
)
There are 20 000 rows that have an Executed date equal to '2500-01-01' and 420 000 rows that have an Executed date < '2500-01-01'.
When I execute the query
select ActionId, Executed, ExecutedBy, DisplayText from Actions
where Executed='2500-01-01'
the query plan shows that a clustered index scan on PK_Actions is performed and the index IX_Actions_Executed is not used at all.
Funnily enough, I also get a missing index hint which says
/* The Query Processor estimates that implementing the following index could improve the query cost by 99.9901%.
*/
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Actions] ([Executed])
But the index is already there.
Why is the index not used if it would select only 5% of the data?
Most likely, the query optimizer sees that you're selecting DisplayText as well, so for each of the 20'000 rows found in the NC index, there would need to be a key lookup into the clustered index to get that data, and key lookups are expensive operations! So in the end, it might just be easier and more efficient to scan the clustered index right away.
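To see the cost of those key lookups for yourself, one option is to force the nonclustered index with a table hint and compare the logical reads against the unhinted query. This is a diagnostic sketch only, using the table and column names from the DDL above; index hints are generally not something to leave in production code.

```sql
SET STATISTICS IO ON;

-- Forces IX_Actions_Executed, so DisplayText has to be fetched
-- through a key lookup for every matching row.
SELECT ActionId, Executed, ExecutedBy, DisplayText
FROM dbo.Actions WITH (INDEX (IX_Actions_Executed))
WHERE Executed = '2500-01-01';

SET STATISTICS IO OFF;
```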
I bet if you run this query here:
select ActionId, Executed, ExecutedBy
from Actions
where Executed='2500-01-01'
then the NC index will be used
If you really need the DisplayText and that's a query you'll run frequently, maybe you should include that column in the index as an extra column in the leaf level:
DROP INDEX [IX_Actions_Executed] ON [dbo].[Actions]
CREATE NONCLUSTERED INDEX [IX_Actions_Executed]
ON [dbo].[Actions]([Executed] ASC, [ExecutedBy] ASC)
INCLUDE([DisplayText])
This would make your NC index a covering index, i.e. it could return all columns needed for your query. If you run your original query again with this covering index in place, I'm pretty sure SQL Server's query optimizer will indeed use it. The probability that any NC index will be used is significantly increased if that NC index is a covering index, i.e. if the query can get all the columns it needs from just the NC index, without key lookups.
The missing index hints are a bit misleading at times. There are also known bugs that lead SQL Server Management Studio to continuously recommend indexes that are already in place, so don't bet too much of your money on those index hints!

Will creating index help in this case

I'm still a learning user of SQL-SERVER2005.
Here is my table structure
CREATE TABLE [dbo].[Trn_PostingGroups](
[ControlGroup] [char](5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[PracticeCode] [char](5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[ScanDate] [smalldatetime] NULL,
[DepositDate] [smalldatetime] NULL,
[NameOfFile] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DepositValue] [decimal](11, 2) NULL,
[RecordStatus] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT [PK_Trn_PostingGroups_1] PRIMARY KEY CLUSTERED
(
[ControlGroup] ASC,
[PracticeCode] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
Scenario 1 : Suppose I have a query like this...
Select * from Trn_PostingGroups where PracticeCode = 'ABC'
Will indexing on PracticeCode separately help make my query faster?
Scenario 2 :
Select * from Trn_PostingGroups
where
ControlGroup = 12701
and PracticeCode = 'ABC'
and NameOfFile = 'FileName1'
Will indexing on NameOfFile separately help make my query faster?
If you were only selecting on the first field (ControlGroup), it is the primary sort of the clustered index and you wouldn't need to index the other field.
If you select on the other primary key fields, then adding a separate index on the other fields should help with such selects.
In general, you should index fields that are commonly used in ORDER BY and WHERE clauses. This is, of course, oversimplified.
See this article for more information about optimizing (statistics and query analyser).
SQL Server will usually use just one index per table reference in a query (combining several indexes on the same table is possible, but uncommon). If multiple indexes could be used on the same table in the same query, SQL Server will use statistics to determine which would be better to use.
In Scenario 1, if you create an index on PracticeCode alone, it will usually be used, as long as you have enough rows that a table scan costs more and there is a diverse range of values in that column. An index will not be used if there are only a few rows in the table (it is faster to just look at them all). Also, an index will not be used if most of the values in that column are the same. The query will not seek on the PK: it would be like looking for a first name in the phone book, where you can't use the ordering because it is by last name, then first name. You might consider reversing your PK to PracticeCode+ControlGroup if you never search on ControlGroup by itself.
In Scenario 2, if you have an index on NameOfFile, it will probably use the PK and ignore the NameOfFile index, unless you make the NameOfFile index unique, and then it is a toss-up. You might try to create an index (in addition to your PK) on ControlGroup+PracticeCode+NameOfFile. If you have many files per ControlGroup+PracticeCode, then it may select that index over the PK index.
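The two indexes discussed above could be created as in the sketch below. The index names are hypothetical, and whether the optimizer actually picks them depends on row counts and value distribution, so check the execution plan before and after.

```sql
-- Scenario 1: lets a filter on PracticeCode alone seek instead of scanning the PK.
CREATE NONCLUSTERED INDEX IX_Trn_PostingGroups_PracticeCode
    ON dbo.Trn_PostingGroups (PracticeCode);

-- Scenario 2: composite index matching all three equality predicates.
CREATE NONCLUSTERED INDEX IX_Trn_PostingGroups_Group_Practice_File
    ON dbo.Trn_PostingGroups (ControlGroup, PracticeCode, NameOfFile);

-- ControlGroup is char(5), so the value is compared as a string literal here;
-- comparing against the integer 12701 would likely force a conversion of the
-- column values and prevent a seek.
SELECT *
FROM dbo.Trn_PostingGroups
WHERE ControlGroup = '12701'
  AND PracticeCode = 'ABC'
  AND NameOfFile = 'FileName1';
```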