SQL Server Index Usage with an Order By - sql

I have a table named Workflow. There are 38M rows in the table. There is a PK on the following columns:
ID: Identity Int
ReadTime: dateTime
If I perform the following query, the PK is not used. The query plan shows an index scan being performed on one of the nonclustered indexes plus a sort. It takes a very long time with 38M rows.
Select TOP 100 ID From Workflow
Where ID > 1000
Order By ID
However, if I perform this query, a nonclustered index (on LastModifiedTime) is used. The query plan shows an index seek being performed. The query is very fast.
Select TOP 100 * From Workflow
Where LastModifiedTime > '6/12/2010'
Order By LastModifiedTime
So, my question is this. Why isn't the PK used in the first query, but the nonclustered index in the second query is used?

Without being able to fish around in your database, there are a few things that come to my mind.
Are you certain that the PK is (id, ReadTime) as opposed to (ReadTime, id)?
What execution plan does SELECT MAX(id) FROM WorkFlow yield?
What about if you create an index on (id, ReadTime) and then retry the test, or your query?
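The first of these checks can be done from the catalog views rather than by eye. A minimal sketch, assuming the table lives in the dbo schema (the question does not say):

```sql
-- List the primary key's columns in key order for dbo.Workflow.
-- The dbo schema is an assumption; adjust it to match your database.
SELECT i.name               AS index_name,
       i.type_desc          AS index_type,      -- CLUSTERED or NONCLUSTERED
       ic.key_ordinal       AS key_position,    -- 1 = leading column
       c.name               AS column_name,
       ic.is_descending_key AS is_descending
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.Workflow')
  AND i.is_primary_key = 1
ORDER BY ic.key_ordinal;
```

If ReadTime comes back with key_position 1, the PK cannot be used to seek on ID alone, which would explain the plan in the question. The index_type column also shows whether the PK is clustered at all.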

Since Id is an identity column, having ReadTime participate in the index is superfluous. The clustered key already points to the leaf data. I recommend you modify your indexes as follows:
CREATE TABLE Workflow
(
Id int IDENTITY,
ReadTime datetime,
-- ... other columns,
CONSTRAINT PK_WorkFlow
PRIMARY KEY CLUSTERED
(
Id
)
)
CREATE INDEX idx_LastModifiedTime
ON WorkFlow
(
LastModifiedTime
)
Also, check that statistics are up to date.
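A rough sketch of how that check might look, assuming the table is dbo.Workflow (the schema is an assumption):

```sql
-- When was each statistics object on the table last updated?
SELECT s.name                              AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.Workflow');

-- Refresh all statistics on the table by reading every row.
-- On 38M rows this can take a while; a sampled update is the cheaper alternative.
UPDATE STATISTICS dbo.Workflow WITH FULLSCAN;
```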
Finally, if there are 38 million rows in this table, then the optimizer may conclude that the criterion ID > 1000 on a unique column is not selective, because > 99.997% of the Ids are > 1000 (if your identity seed started at 1). For an index to be considered helpful, the optimizer must conclude that only a small fraction of the records (< 5% as a rough rule of thumb) would be selected. You can use an index hint to force the issue (as already stated by Dan Andrews). What is the structure of the nonclustered index that was scanned?
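For reference, an index hint on the first query could look like the sketch below. It assumes the primary key constraint is named PK_WorkFlow as in the DDL above; substitute the real name from your table.

```sql
-- Force the optimizer to use the primary key index for the query from the question.
-- PK_WorkFlow is the name used in the suggested DDL, not necessarily your actual PK name.
SELECT TOP 100 ID
FROM Workflow WITH (INDEX(PK_WorkFlow))
WHERE ID > 1000
ORDER BY ID;
```

A hint is best treated as a diagnostic: if the hinted plan is much faster, that points to a costing or statistics problem worth fixing rather than leaving the hint in place.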

Related

MS SQL: Performance for querying ID descending

This question relates to a table in Microsoft SQL Server which is usually queried with ORDER BY Id DESC.
Would there be a performance benefit from setting the primary key to PRIMARY KEY CLUSTERED (Id DESC)? Or would there be a need for an index? Or is it as fast as it gets without any of it?
Table:
CREATE TABLE [dbo].[Items] (
[Id] INT IDENTITY (1, 1) NOT NULL,
[Category] INT NOT NULL,
[Name] NVARCHAR(255) NULL,
CONSTRAINT [PK_Items] PRIMARY KEY CLUSTERED ([Id] ASC)
)
Query:
SELECT TOP 1 * FROM [dbo].[Items]
WHERE Category = 123
ORDER BY [Id] DESC
Would there be a performance benefit from setting the primary key to PRIMARY KEY CLUSTERED (Id DESC)?
Given what you show: IT DEPENDS.
The filter is on Category = 123. To find all entries of Category 123, because there is NO INDEX defined on that column, the server has to do a table scan. Unless you have a VERY large result set, and/or a comically badly configured tempdb and very low memory (disk is only used for the sort when it runs out of memory), sorting the result will be irrelevant compared to the table scan.
You are literally chasing the wrong tail. You are WAY more likely to speed up the query by adding a non-unique index on Category so that the query can prefilter the data quickly based on your query condition.
If you analyzed the query plan for this query (which you should; technically we should not even ANSWER this question without you showing SOME effort, and a look at the query plan is the FIRST thing you do), you would very likely see that the time is spent on the query itself, NOT on sorting the result.
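As a sketch of that suggestion (the index name IX_Items_Category is made up for illustration):

```sql
-- Non-unique nonclustered index on the filter column.
-- The index name is hypothetical; the table and columns come from the question.
CREATE NONCLUSTERED INDEX IX_Items_Category
ON dbo.Items (Category);

-- The original query, unchanged.
SELECT TOP 1 *
FROM dbo.Items
WHERE Category = 123
ORDER BY Id DESC;
```

Because Id is the clustered key, every row of this nonclustered index also carries Id, so the entries for one Category are stored in Id order. That lets the engine seek to Category = 123, read from the high end, and fetch just one row from the base table.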
Creating an index in ASC or DESC order does not make a big difference for ORDER BY when there is only one column. But when data must be sorted in two different directions, one column ascending and the other descending, the way the index is created does make a big difference.
See this article, which works through many examples:
https://www.mssqltips.com/sqlservertip/1337/building-sql-server-indexes-in-ascending-vs-descending-order/
In your scenario I advise you to create an index on the Category column without including Id, because the clustered index key is always included in a nonclustered index.
There is no difference according to the following
I'd suggest defining an index on (category, id desc).
It will give you best performance for your query.
As others have indicated, an index on Category (assuming you don't have one) is the biggest performance boost possible here.
But as for your actual question: for a query with a single ORDER BY column like yours, it does not matter for performance whether the query or the index is ordered DESC or ASC. SQL Server can swap those easily (starting at the beginning or the end of the data structure).
Where sort direction becomes a performance issue is when:
You have more than one ORDER BY column
Your index has more than one column
Your ORDER BY opposes the order on the index.
So, say your primary key had ID ASC and Category ASC, and then you query ordered by ID ASC and Category DESC. Then SQL Server can't use the order of the index to satisfy the sort.
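The three conditions can be illustrated on the Items table from the question. This is a sketch with a hypothetical two-column index on (Category, Name); the index name is made up, and the plan comments describe what one would expect to see, not measured results.

```sql
-- Hypothetical index: both key columns ascending.
CREATE NONCLUSTERED INDEX IX_Items_Category_Name
ON dbo.Items (Category ASC, Name ASC);

-- Same direction on both columns: the index can be read forward, no Sort needed.
SELECT Category, Name, Id FROM dbo.Items ORDER BY Category ASC, Name ASC;

-- Both directions reversed: the index can be read backward, still no Sort.
SELECT Category, Name, Id FROM dbo.Items ORDER BY Category DESC, Name DESC;

-- Mixed directions: neither scan direction matches, so expect a Sort operator.
SELECT Category, Name, Id FROM dbo.Items ORDER BY Category ASC, Name DESC;
```

Declaring the index as (Category ASC, Name DESC) instead would match the third query, at the cost of no longer matching an all-ascending sort.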
There are a few caveats and gotchas. After searching a bit, this answer seems to have them listed:
SQL Server indexes - ascending or descending, what difference does it make?

Execution plan showing missing non-clustered index on already partitioned clustered indexes

We have a query where the table is partitioned on column Adate.
Row count: 56595943, partition scheme - yearly, no of partitions - 300
Clustered index columns : empid, Adate
Query :
select top 1 Adate
from emp
where empid = 134556 and Adate <= {ts '7485-09-01 00:00:00.0'}
order by Adate desc
The actual execution plan returns a clustered index seek operation with 93% of the total query cost on clustered index key.
But why is the optimizer recommending a missing index with 92% of cost?
missing index details: Improve query cost:92%
create nonclustered index IDX_NC on dbo.emp([empid], [Adate])
The missing index has an improvement measure of 14755268, as per Microsoft the improvement measure baseline is 1,000,000
Why is this happening? Do you recommend to have a nonclustered index on already clustered index columns?
Well - consider this:
you do have the clustered index on (empid, adate)
the clustered index contains the whole data, i.e. the leaf level pages of the clustered index contain the whole data records (all the columns in your table)
If you are searching and the query uses the clustered index, it might still need to load much more data than is actually needed: the whole record, once for every row that matches your criteria.
If you have a non-clustered index on just (empid, Adate), and your query really only requires Adate (in its SELECT list of columns), then this index will be much smaller - it contains only those two columns (none of the overhead of all the other columns, which are not needed for your current query). So scanning this index, or loading these index pages, will load much less data compared to the clustered index.
From that point of view, yes, even having a nonclustered index on the same columns that make up your clustered index can be beneficial for certain query scenarios - that's probably what the SQL Server query optimizer picks up here.
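One way to see the size difference is to compare page counts per index once the suggested IDX_NC exists. A sketch, assuming the table is dbo.emp as in the missing-index suggestion:

```sql
-- Pages used by each index on dbo.emp, summed across all partitions.
SELECT i.name                  AS index_name,
       i.type_desc             AS index_type,
       SUM(ps.used_page_count) AS used_pages,
       SUM(ps.row_count)       AS row_count
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
  ON ps.object_id = i.object_id AND ps.index_id = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.emp')
GROUP BY i.name, i.type_desc
ORDER BY used_pages DESC;
```

Both indexes hold the same number of rows, so the fewer pages the nonclustered index uses relative to the clustered one, the more rows fit on each page it reads.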

How SQL Server indexing can help using BETWEEN for given query

Does an index help with a BETWEEN clause in SQL Server?
If I have a table with 20000 rows and the query is:
select *
from employee
where empid between 10001 and 20000
Yes. This is sargable.
It can do a range seek on an index with leading column empid. Navigating the B-tree to find the first row >= 10001 and then reading all rows in key order until the end of the range is reached.
You might not get this plan unless the index is covering though. Your query has select * so an index that only contains empid may potentially need to do 10,000 lookups to get the missing columns.
If empid is the primary key of employee then by default it will be the clustered index key (unless you specified otherwise) so this will automatically be covering and you should expect to see a clustered index seek.
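To make "sargable" concrete, here is a short sketch using the table from the question. The third query is a deliberately bad variant added for contrast; it is not from the original post.

```sql
-- BETWEEN is inclusive on both ends, so these two predicates are equivalent
-- and both allow a range seek on an index whose leading column is empid.
SELECT * FROM employee WHERE empid BETWEEN 10001 AND 20000;
SELECT * FROM employee WHERE empid >= 10001 AND empid <= 20000;

-- Wrapping the column in an expression hides it from the index:
-- the range can no longer be located in the B-tree, so expect a scan.
SELECT * FROM employee WHERE empid + 0 BETWEEN 10001 AND 20000;
```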

If I have a single nonclustered index on a table, will the number of columns I include change the slow down when writing to it?

On the exact same table, if I was to put one index on it, either:
CREATE INDEX ix_single ON MyTable (uid asc) include (columnone)
or:
CREATE INDEX ix_multi ON MyTable (uid asc) include (
columnone,
columntwo,
columnthree,
....
columnX
)
Would the second index cause an even greater lag on how long it takes to write to the table than the first one? And why?
Included columns will need more disk space as well as more time on data manipulation.
If there is a clustered index on this table too (ideally on an implicitly sorted column like an IDENTITY column, to avoid fragmentation), it will serve as a fast lookup for all columns. Ideally create the clustered index before the other one, since adding it later forces the nonclustered index to be rebuilt.
Including columns in an index is a useful approach only for extremely performance-sensitive cases.
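To make the write cost concrete, consider an update against the table with one or the other index from the question in place. This is a sketch: the int type of columntwo, the value 42, and the row uid = 1 are all assumptions for illustration.

```sql
-- Assumes columntwo is an int column and a row with uid = 1 exists (both hypothetical).
UPDATE MyTable
SET columntwo = 42
WHERE uid = 1;

-- With only ix_single in place: columntwo is not stored in the index,
--   so only the base table row has to change.
-- With only ix_multi in place: columntwo is an included column,
--   so the matching index row has to be modified as well.
```

Inserts and deletes touch the index in both cases, but the wider ix_multi rows mean fewer rows per page and therefore more pages to write.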

Correct SQL index for Partition + Order to remove SORT

I have a SQL statement which I am trying to optimise to remove the Sort operator:
SELECT *,ROW_NUMBER() OVER (
PARTITION BY RuleInstanceId
ORDER BY [Timestamp] DESC
) AS rn
FROM RuleInstanceHistoricalMembership
Everything I have read (eg. Optimizing SQL queries by removing Sort operator in Execution plan) suggests this is the correct index to add however it appears to have no effect at all.
CREATE NONCLUSTERED INDEX IX_MyIndex ON dbo.[RuleInstanceHistoricalMembership](RuleInstanceId, [Timestamp] DESC)
I must be missing something, as I have read heaps of articles which all seem to suggest that an index spanning both columns should solve this issue.
Technically the index you have added does allow you to avoid a sort.
However the index you have created is non covering so SQL Server would then also need to perform 60 million key lookups back to the base table.
Simply scanning the clustered index and sorting it on the fly is costed as being considerably cheaper than that option.
In order to get the index to be used automatically you would need to do one of the following:
Remove columns from the query SELECT list so the index covers it.
Add INCLUDE-d columns to the index.
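Both options can be sketched as follows. The real table's columns are not shown in the question, so this assumes the layout of the demo table used further down in this answer (ID as clustered primary key, plus Col2 and Col3); the name IX_MyIndex_Covering is made up.

```sql
-- Option 1: select only columns the existing IX_MyIndex already carries.
-- ID is assumed to be the clustered key, so it is stored in every nonclustered index row.
SELECT ID, RuleInstanceId, [Timestamp],
       ROW_NUMBER() OVER (PARTITION BY RuleInstanceId
                          ORDER BY [Timestamp] DESC) AS rn
FROM dbo.RuleInstanceHistoricalMembership;

-- Option 2: widen the index so that SELECT * is covered.
-- Col2 and Col3 stand in for whatever other columns the real table has.
CREATE NONCLUSTERED INDEX IX_MyIndex_Covering
ON dbo.RuleInstanceHistoricalMembership (RuleInstanceId, [Timestamp] DESC)
INCLUDE (Col2, Col3);
```

Option 2 effectively stores a second copy of the table in index order, so it trades disk space and write cost for the missing sort.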
BTW: For a table with 60 million rows you may well find that even if you were to try and force the issue with an index hint on the non covering index you still don't get the desired results of avoiding a sort.
CREATE TABLE RuleInstanceHistoricalMembership
(
ID INT PRIMARY KEY,
Col2 INT,
Col3 INT,
RuleInstanceId INT,
[Timestamp] INT
)
CREATE NONCLUSTERED INDEX IX_MyIndex
ON dbo.[RuleInstanceHistoricalMembership](RuleInstanceId, [Timestamp] DESC)
/*Fake small table*/
UPDATE STATISTICS RuleInstanceHistoricalMembership
WITH ROWCOUNT = 600,
PAGECOUNT = 10
SELECT *,
ROW_NUMBER() OVER ( PARTITION BY RuleInstanceId
ORDER BY [Timestamp] DESC ) AS rn
FROM RuleInstanceHistoricalMembership WITH (INDEX = IX_MyIndex)
This gives a plan with no sort. But now increase the row and page count:
/*Fake large table*/
UPDATE STATISTICS RuleInstanceHistoricalMembership
WITH ROWCOUNT = 60000000,
PAGECOUNT = 10000000
Try again and you get a plan that now has two sorts!
The scan on the NCI is in RuleInstanceId, Timestamp DESC order but then SQL Server reorders it into clustered index key order (Id ASC) per Optimizing I/O Performance by Sorting.
This step is to try and reduce the expected massive cost of 60 million random lookups into the clustered index. Then it gets sorted back into the original RuleInstanceId, Timestamp DESC order that the index delivered it in.