SQL Server query plan

I have 3 tables as listed below
CREATE TABLE dbo.RootTransaction
(
TransactionID int CONSTRAINT [PK_RootTransaction] PRIMARY KEY NONCLUSTERED (TransactionID ASC)
)
GO
----------------------------------------------------------------------------------------------------
CREATE TABLE [dbo].[OrderDetails](
[OrderID] int identity(1,1) not null,
TransactionID int,
OrderDate datetime,
[Status] varchar(50),
CONSTRAINT [PK_OrderDetails] PRIMARY KEY CLUSTERED ([OrderID] ASC),
CONSTRAINT [FK_TransactionID] FOREIGN KEY ([TransactionID]) REFERENCES [dbo].[RootTransaction] ([TransactionID])
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [ix_OrderDetails_TransactionID]
ON [dbo].[OrderDetails](TransactionID ASC, [OrderID] ASC);
GO
----------------------------------------------------------------------------------------------------
CREATE TABLE dbo.OrderItems
(
ItemID int identity(1,1) not null,
[OrderID] int,
[Name] VARCHAR (50) NOT NULL,
[Code] VARCHAR (9) NULL,
CONSTRAINT [PK_OrderItems] PRIMARY KEY NONCLUSTERED ([ItemID] ASC),
CONSTRAINT [FK_OrderID] FOREIGN KEY ([OrderID]) REFERENCES [dbo].[OrderDetails] ([OrderID])
)
Go
CREATE CLUSTERED INDEX OrderItems
ON [dbo].OrderItems([OrderID] ASC, ItemID ASC) WITH (FILLFACTOR = 90);
GO
CREATE NONCLUSTERED INDEX [IX_Code]
ON [dbo].[OrderItems]([Code] ASC) WITH (FILLFACTOR = 90)
----------------------------------------------------------------------------------------------------
Populated sample data in each table
select COUNT(*) from RootTransaction -- 45851
select COUNT(*) from [OrderDetails] -- 50201
select COUNT(*) from OrderItems --63850
-- Query 1
SELECT o.TransactionID
FROM [OrderDetails] o
JOIN dbo.OrderItems i ON o.OrderID = i.OrderID
WHERE i.Code like '1067461841%'
declare @SearchKeyword varchar(200) = '1067461841'
-- Query 2
SELECT o.TransactionID
FROM [OrderDetails] o
JOIN dbo.OrderItems i ON o.OrderID = i.OrderID
WHERE i.Code like @SearchKeyword + '%'
When running the above two queries, I can see that Query 1 uses an index seek on OrderDetails and OrderItems, which is expected.
However, in Query 2 the plan uses an index seek on OrderItems but an index scan on OrderDetails.
The only difference between the two queries is using a literal value vs. a variable in the LIKE, and both return the same result.
Why does the execution plan change between using a literal value and a variable?

I believe the issue is most likely explained by parameter sniffing. SQL Server identifies and caches query plans for commonly used queries. As part of this caching, it "sniffs" the parameter values you use so it can optimize the plan it creates.
Query 1 uses a literal string, so SQL Server creates a plan specific to that value. Query 2 uses an intermediate variable, which is one of the techniques that actually prevents parameter sniffing (often used to get more predictable performance from stored procedures or queries whose parameter values vary significantly). To SQL Server these are two completely different queries despite the obvious similarities, and the observed differences are essentially just optimization.
Furthermore, if your tables had different row-count distributions, you would likely see differences between those two scenarios based on the existing indexes and possible optimizations. On my server, with no sample data loaded, Query 1 and Query 2 produced the same execution plans, since the optimizer couldn't find any better access path for the parameters.
For more info: http://blogs.technet.com/b/mdegre/archive/2012/03/19/what-is-parameter-sniffing.aspx

The queries below show a similar plan even though the WHERE clause is different.
select Code from OrderItems WHERE Code like '6662225%'
declare @SearchKeyword varchar(200) = '6662225'
select Code from OrderItems WHERE Code like @SearchKeyword + '%'

The following post/answers offer a good explanation as to why performance is better with hard coded constants than variables, along with a few suggestions you could possibly try out:
Alternative to using local variables in a where clause
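One of the suggestions that comes up most often is OPTION (RECOMPILE), which lets the optimizer see the variable's actual value when the statement is compiled. A minimal sketch against the tables above (not benchmarked here):
declare @SearchKeyword varchar(200) = '1067461841'
SELECT o.TransactionID
FROM [OrderDetails] o
JOIN dbo.OrderItems i ON o.OrderID = i.OrderID
WHERE i.Code like @SearchKeyword + '%'
OPTION (RECOMPILE) -- the variable's current value is used for cardinality estimation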

Related

SQL Server Indexing and Composite Keys

Given the following:
-- This table will have roughly 14 million records
CREATE TABLE IdMappings
(
Id int IDENTITY(1,1) NOT NULL,
OldId int NOT NULL,
NewId int NOT NULL,
RecordType varchar(80) NOT NULL, -- 15 distinct values, will never increase
Processed bit NOT NULL DEFAULT 0,
CONSTRAINT pk_IdMappings
PRIMARY KEY CLUSTERED (Id ASC)
)
CREATE UNIQUE INDEX ux_IdMappings_OldId ON IdMappings (OldId);
CREATE UNIQUE INDEX ux_IdMappings_NewId ON IdMappings (NewId);
and this is the most common query run against the table:
WHILE @firstBatchId <= @maxBatchId
BEGIN
-- the result of this is used to insert into another table:
SELECT
NewId, -- and lots of non-indexed columns from SOME_TABLE
FROM
IdMappings map
INNER JOIN
SOME_TABLE foo ON foo.Id = map.OldId
WHERE
map.Id BETWEEN @firstBatchId AND @lastBatchId
AND map.RecordType = @someRecordType
AND map.Processed = 0
-- We only really need this in case the user kills the binary or SQL Server service:
UPDATE IdMappings
SET Processed = 1
WHERE Id BETWEEN @firstBatchId AND @lastBatchId
AND RecordType = @someRecordType
SET @firstBatchId += 4999
SET @lastBatchId += 4999
END
What are the best indexes to add? I figure Processed isn't worth indexing since it only has 2 values. Is it worth indexing RecordType since there are only about 15 distinct values? How many distinct values does a column need before it is worth considering for an index?
Is there any advantage in a composite key if some of the fields are in the WHERE and some are in a JOIN's ON condition? For example:
CREATE INDEX ix_IdMappings_RecordType_OldId
ON IdMappings (RecordType, OldId)
... if I wanted both these fields indexed (I'm not saying I do), does this composite key gain any advantage since both columns don't appear together in the same WHERE or same ON?
Insert time into IdMappings isn't really an issue. After we insert all records into the table, we don't need to do so again for months if ever.
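For reference, the kind of composite/filtered index being weighed here might look like the following; this is purely an illustrative sketch against the table above (the key order, the INCLUDE list, and the filtered-index idea are assumptions, not a tested recommendation):
CREATE INDEX ix_IdMappings_RecordType_Id
ON IdMappings (RecordType, Id)
INCLUDE (OldId, NewId)
WHERE Processed = 0 -- filtered index: the low-cardinality flag becomes the filter instead of a key column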

Adding dummy where condition brings execution plan to seek

Could you please have a look at http://sqlfiddle.com/#!18/7ad28/8 and help me understand why adding a WHERE condition changes the index operation from a scan to a seek? As per my (wrong) understanding, it should not have made any difference, since it's a greater-than condition, which should have caused a scan.
I am also pasting the table scripts and the queries in question below.
CREATE TABLE [dbo].[Mappings]
(
[MappingID] [smallint] NOT NULL IDENTITY(1, 1),
[ProductID] [smallint] NOT NULL,
[CategoryID] [smallint] NOT NULL
)
GO
ALTER TABLE [dbo].[Mappings] ADD CONSTRAINT [pk_Mappings_MappingID] PRIMARY KEY CLUSTERED ([MappingID]) WITH (DATA_COMPRESSION = PAGE) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [nc_Mappings_ProductIDCategoryID] ON [dbo].[Mappings] ([ProductID], [CategoryID]) WITH (DATA_COMPRESSION = PAGE) ON [PRIMARY]
GO
CREATE TABLE [dbo].[CustomerProducts]
(
[CustomerID] [bigint] NOT NULL,
[ProductID] [smallint] NOT NULL,
[SomeDate] [datetimeoffset] (0) NULL,
[SomeAttribute] [bigint] NULL
)
GO
ALTER TABLE [dbo].[CustomerProducts] ADD CONSTRAINT [pk_CustomerProducts_ProductIDCustomerID] PRIMARY KEY CLUSTERED ([ProductID], [CustomerID]) ON [PRIMARY]
GO
--SCAN [tempdb].[dbo].[Mappings].[nc_Mappings_ProductIDCategoryID].(NonClustered)
SELECT b.[SomeDate],
b.[SomeAttribute]
FROM dbo.[Mappings] a
INNER JOIN dbo.CustomerProducts b
ON a.[ProductID] = b.[ProductID]
Where b.CustomerID = 88;
--SEEK [tempdb].[dbo].[Mappings].[nc_Mappings_ProductIDCategoryID].(NonClustered)
SELECT b.[SomeDate],
b.[SomeAttribute]
FROM dbo.[Mappings] a
INNER JOIN dbo.CustomerProducts b
ON a.[ProductID] = b.[ProductID]
AND b.CustomerID = 88
Where a.[ProductID] > 0;
"It should not have made any difference since its a greater then condition which should have caused scan."
You added an explicit predicate (ProductID > 0), so SQL Server chooses to seek to that value (0) and then range scan. To see this, select the Index Seek on Mappings, open the Properties tab, look for Seek Predicates, and expand the entire tree of results. You'll see Start and the applicable range scan attributes underneath.
So if you had real data (pretend you have ProductIDs from 1 to 100) and a WHERE ProductID > 77, you would seek in the B-tree to ProductID 77 and then range scan the remainder of the non-clustered index.
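As a rough illustration of that seek-then-range-scan behaviour (assuming ProductIDs 1 to 100 as above):
-- Seek Predicate: Start: [ProductID] > 77, then a range scan of the remaining index keys
SELECT [ProductID], [CategoryID]
FROM dbo.[Mappings]
WHERE [ProductID] > 77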
Watch this: this'll help you visualize and understand what happens internally in different index operations (disclaimer: this is me presenting)
https://youtu.be/fDd4lw6DfqU?t=748
Here's what the plans look like:
Highlighted in yellow is the information on the clustered index seek on the CustomerProducts table. The seek predicate is set to the value of the condition [ProductID] > 0, which is perfectly reasonable: part of the join condition is a.[ProductID] = b.[ProductID], and the WHERE clause adds a.[ProductID] > 0, which implies b.[ProductID] > 0. As ProductID is the first column of the clustered index, any information that narrows the lookup can be used. The seek operation should be faster than the scan, so the optimizer will try to do that.

SQL Server index and poor execution plan

I have an existing SQL Server database where I cannot modify the structure or the queries that are run, and I am facing an issue with a poor execution plan impacting performance and ultimately cloud database cost.
Kindly note my experience with SQL is quite limited, and after much googling and trial and error, I still did not achieve an acceptable result. Any tips or help are much appreciated; thank you all in advance. If you would like me to provide more information, feel free to comment and I will update the post accordingly.
The issue
I have two tables: Table1 and Table2. Table2 references Table1 via the TABLE1_ID field, and we run a SQL query that extracts info from Table2 while filtering on Table1 (an INNER JOIN, I believe).
Using the following query:
DECLARE @P1 datetime
DECLARE @P2 datetime
SELECT
dbo.Table2.VALUE
FROM
dbo.Table2,
dbo.Table1
WHERE
-- joins Table1/Table2
dbo.Table1.ID = dbo.Table2.TABLE1_ID
-- filters on Table1
AND dbo.Table1.TIMESTAMP between @P1 and @P2
My understanding would be that the database engine would first filter on Table1 and then do the join with Table2; however, the execution plan I am seeing uses a Merge Join, implying Table2 is fully scanned and then joined with the filtered results from Table1.
What I have tried
I have tried the following, attempting to identify the problem or optimize performance:
Optimization attempt: creating an FK constraint
Optimization attempt: creating other indexes with/without INCLUDE columns
Issue identification: changing the query to select VALUE from both Table1 and Table2 and comparing the difference
Re-creating the issue
The following script could allow you to re-create the database structure (please note it will insert 1M records into both tables):
CREATE TABLE [dbo].[Table1] (
[ID] [decimal](10, 0) IDENTITY(1,1) NOT NULL,
[VALUE] [nchar](10) NULL,
[TIMESTAMP] [datetime] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Table1] ADD CONSTRAINT [DF_Table1_TIMESTAMP] DEFAULT (sysdatetime()) FOR [TIMESTAMP]
GO
CREATE UNIQUE CLUSTERED INDEX [IX_Table1_ID] ON [dbo].[Table1]
(
[ID] ASC
)
GO
CREATE NONCLUSTERED INDEX [IX_Table1_TIMESTAMP] ON [dbo].[Table1]
(
[TIMESTAMP] ASC
)
INCLUDE ([ID])
GO
CREATE TABLE [dbo].[Table2] (
[ID] [int] IDENTITY(1,1) NOT NULL,
[TABLE1_ID] [decimal](10, 0) NOT NULL,
[VALUE] [nchar](10) NULL
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Table2_TABLE1_ID] ON [dbo].[Table2]
(
[TABLE1_ID] ASC
) INCLUDE ([VALUE])
GO
Declare @Id decimal(10,0) = 1
DECLARE @Now datetime = SYSDATETIME()
While @Id <= 1000000
Begin
Insert Into dbo.Table1 values ('T1_' + CAST(@Id as nvarchar(10)), DATEADD (ss, @Id, @Now))
Insert Into dbo.Table2 values (@Id, 'T2_' + CAST(@Id as nvarchar(10)))
Print @Id
Set @Id = @Id + 1
End
GO
Then you can try to run the following query:
DECLARE @P1 datetime
DECLARE @P2 datetime
SELECT
dbo.Table2.VALUE
FROM
dbo.Table2,
dbo.Table1
WHERE
dbo.Table1.ID = dbo.Table2.TABLE1_ID
AND dbo.Table1.TIMESTAMP between @P1 and @P2
My understanding would be that the database engine would first filter on Table1 then do the join with Table2,
Wrong. SQL is a descriptive language, not a procedural language. A SQL query describes the result set, not the methods used for creating it.
The SQL parser and optimizer are responsible for generating the execution plan. The only requirement is that the results from the execution plan match the results described by the query.
If you want to control the execution plan, then SQL Server offers hints, so you can require a nested loop join. In general, such hints are used to avoid nested loop joins.
Actually, your query is reading the index. This is a more efficient way of "filtering" the data than actually reading the data and filtering. This looks like an optimal execution plan.
Further, don't use commas in the FROM clause. Use proper, explicit, standard, readable JOIN syntax.
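For example, the query from the question rewritten with explicit JOIN syntax (the rewrite is purely cosmetic; the plan produced is still up to the optimizer):
DECLARE @P1 datetime
DECLARE @P2 datetime
SELECT t2.VALUE
FROM dbo.Table2 t2
INNER JOIN dbo.Table1 t1
ON t1.ID = t2.TABLE1_ID
WHERE t1.[TIMESTAMP] BETWEEN @P1 AND @P2
-- OPTION (LOOP JOIN) -- only if you really wanted to force a nested loops join, per the hint note above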

Optimize a stored procedure that accepts table parameters as filters against a view

I'm looking for an efficient way to filter a view with optional table parameters.
Examples are best so here is a sample situation:
-- database would contain a view that I want to be able to filter
CREATE VIEW [dbo].[MyView]
AS
-- maybe 20-40 columns
SELECT Column1, Column2, Column3, ...
I have user-defined table types like so:
-- single id table for joining purposes (passed from code)
CREATE TYPE [dbo].[SingleIdTable] AS TABLE (
[Id] INT NOT NULL,
PRIMARY KEY CLUSTERED ([Id] ASC) WITH (IGNORE_DUP_KEY = OFF));
-- double id table for joining purposes (passed from code)
CREATE TYPE [dbo].[DoubleIdTable] AS TABLE (
[Id1] INT NOT NULL,
[Id2] INT NOT NULL,
PRIMARY KEY CLUSTERED ([Id1] ASC, [Id2] ASC) WITH (IGNORE_DUP_KEY = OFF));
And I want to create a stored procedure that basically looks like this:
CREATE PROCEDURE [dbo].[FilterMyView]
#Parameter1 dbo.SingleIdTable READONLY,
#Parameter2 dbo.DoubleIdTable READONLY,
#Parameter3 dbo.SingleIdTable READONLY
AS
BEGIN
SELECT *
FROM MyView
INNER-JOIN-IF-NOT-EMPTY #Parameter1 p1 ON p1.Id = MyView.Column1 AND
INNER-JOIN-IF-NOT-EMPTY #Parameter2 p2 ON p2.Id1 = MyView.Column5 AND
p2.Id2 = MyView.Column6 AND
INNER-JOIN-IF-NOT-EMPTY #Parameter3 p3 ON p3.Id = MyView.Column8
END
Now I believe I can do this with WHERE EXISTS but I want to make sure that I am doing this in the most efficient way for the SQL engine. I've always personally felt that the INNER JOIN semantic creates the most optimized execution plans, but I don't actually know.
I also know that I can do this using dynamic SQL, but I always leave this as a last option.
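For reference, a minimal sketch of the WHERE EXISTS approach mentioned above (the column names follow the pseudocode, and the empty-parameter checks plus OPTION (RECOMPILE) are assumptions, not a tested implementation):
CREATE PROCEDURE [dbo].[FilterMyView]
@Parameter1 dbo.SingleIdTable READONLY,
@Parameter2 dbo.DoubleIdTable READONLY,
@Parameter3 dbo.SingleIdTable READONLY
AS
BEGIN
SELECT v.*
FROM MyView v
WHERE (NOT EXISTS (SELECT 1 FROM @Parameter1)
OR EXISTS (SELECT 1 FROM @Parameter1 p1 WHERE p1.Id = v.Column1))
AND (NOT EXISTS (SELECT 1 FROM @Parameter2)
OR EXISTS (SELECT 1 FROM @Parameter2 p2 WHERE p2.Id1 = v.Column5 AND p2.Id2 = v.Column6))
AND (NOT EXISTS (SELECT 1 FROM @Parameter3)
OR EXISTS (SELECT 1 FROM @Parameter3 p3 WHERE p3.Id = v.Column8))
OPTION (RECOMPILE) -- so each call is optimized knowing which table parameters are actually empty
END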

Query against 250k rows taking 53 seconds

The box this query is running on is a dedicated server running in a datacenter.
AMD Opteron 1354 Quad-Core 2.20GHz
2GB of RAM
Windows Server 2008 x64 (Yes I know I only have 2GB of RAM, I'm upgrading to 8GB when the project goes live).
So I went through and created 250,000 dummy rows in a table to really stress test some of the queries that LINQ to SQL generates and make sure they're not too terrible, and I noticed one of them was taking an absurd amount of time.
I had this query down to 17 seconds with indexes, but I removed them for the sake of this post to go from start to finish. The only indexes are the primary keys.
Stories table --
[ID] [int] IDENTITY(1,1) NOT NULL,
[UserID] [int] NOT NULL,
[CategoryID] [int] NOT NULL,
[VoteCount] [int] NOT NULL,
[CommentCount] [int] NOT NULL,
[Title] [nvarchar](96) NOT NULL,
[Description] [nvarchar](1024) NOT NULL,
[CreatedAt] [datetime] NOT NULL,
[UniqueName] [nvarchar](96) NOT NULL,
[Url] [nvarchar](512) NOT NULL,
[LastActivityAt] [datetime] NOT NULL,
Categories table --
[ID] [int] IDENTITY(1,1) NOT NULL,
[ShortName] [nvarchar](8) NOT NULL,
[Name] [nvarchar](64) NOT NULL,
Users table --
[ID] [int] IDENTITY(1,1) NOT NULL,
[Username] [nvarchar](32) NOT NULL,
[Password] [nvarchar](64) NOT NULL,
[Email] [nvarchar](320) NOT NULL,
[CreatedAt] [datetime] NOT NULL,
[LastActivityAt] [datetime] NOT NULL,
Currently in the database there is 1 user, 1 category and 250,000 stories and I tried to run this query.
SELECT TOP(10) *
FROM Stories
INNER JOIN Categories ON Categories.ID = Stories.CategoryID
INNER JOIN Users ON Users.ID = Stories.UserID
ORDER BY Stories.LastActivityAt
The query takes 52 seconds to run, CPU usage hovers at 2-3%, memory is at 1.1GB with 900MB free, but the disk usage seems out of control. It's at 100MB/sec, with 2/3 of that being writes to tempdb.mdf and the rest reads from tempdb.mdf.
Now for the interesting part...
SELECT TOP(10) *
FROM Stories
INNER JOIN Categories ON Categories.ID = Stories.CategoryID
INNER JOIN Users ON Users.ID = Stories.UserID
SELECT TOP(10) *
FROM Stories
INNER JOIN Users ON Users.ID = Stories.UserID
ORDER BY Stories.LastActivityAt
SELECT TOP(10) *
FROM Stories
INNER JOIN Categories ON Categories.ID = Stories.CategoryID
ORDER BY Stories.LastActivityAt
All 3 of these queries are pretty much instant.
Exec plan for first query.
http://i43.tinypic.com/xp6gi1.png
Exec plans for other 3 queries (in order).
http://i43.tinypic.com/30124bp.png
http://i44.tinypic.com/13yjml1.png
http://i43.tinypic.com/33ue7fb.png
Any help would be much appreciated.
Exec plan after adding indexes (down to 17 seconds again).
http://i39.tinypic.com/2008ytx.png
I've gotten a lot of helpful feedback from everyone, thank you. I tried a new angle: I query the stories I need, then get the Categories and Users in separate queries, and with 3 queries it only took 250ms... I don't understand the issue, but if it works, and at 250ms no less, I'll stick with that for the time being. Here's the code I used to test this.
DBDataContext db = new DBDataContext();
Console.ReadLine();
Stopwatch sw = Stopwatch.StartNew();
var stories = db.Stories.OrderBy(s => s.LastActivityAt).Take(10).ToList();
var storyIDs = stories.Select(c => c.ID);
var categories = db.Categories.Where(c => storyIDs.Contains(c.ID)).ToList();
var users = db.Users.Where(u => storyIDs.Contains(u.ID)).ToList();
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
Try adding an index on Stories.LastActivityAt. I think the clustered index scan in the execution plan may be due to the sorting.
Edit:
Since my query returned in an instant with rows just a few bytes long, but after I added a 2K varchar it has been running for 5 minutes and is still going, I think Mitch has a point. It is the volume of data that is shuffled around for nothing, but this can be fixed in the query.
Try putting the join, sort and top(10) in a view or in a nested query, and then join back against the story table to get the rest of the data just for the 10 rows that you need.
Like this:
select * from
(
SELECT TOP(10) id, categoryID, userID
FROM Stories
ORDER BY Stories.LastActivityAt
) s
INNER JOIN Stories ON Stories.ID = s.id
INNER JOIN Categories ON Categories.ID = s.CategoryID
INNER JOIN Users ON Users.ID = s.UserID
If you have an index on LastActivityAt, this should run very fast.
So if I read the first part correctly, it responds in 17 seconds with an index, which is still a while to chug out 10 records. I'm thinking the time is in the ORDER BY clause. I would want an index on LastActivityAt, UserID, CategoryID. Just for fun, remove the ORDER BY and see if it returns the 10 records quickly. If it does, then you know the time is not in the joins to the other tables. Also, it would be helpful to replace the * with only the columns you need, since all three tables' columns end up in tempdb as you are sorting - as Neil mentioned.
Looking at the execution plans, you'll notice the extra sort - I believe that is the ORDER BY, which is going to take some time. I'm assuming you had a single index with all three columns and it was 17 seconds... so you may want one index for the join criteria (UserID, CategoryID) and another for LastActivityAt - see if that performs better. Also it would be good to run the query through the Index Tuning Wizard.
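As DDL, those two indexes would look roughly like this (the index names are made up):
CREATE INDEX IX_Stories_UserID_CategoryID ON Stories (UserID, CategoryID)
CREATE INDEX IX_Stories_LastActivityAt ON Stories (LastActivityAt)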
My first suggestion is to remove the *, and replace it with the minimum columns you need.
Second, is there a trigger involved? Something that would update the LastActivityAt field?
Based on your problem query, try adding a combination index on the Stories table (CategoryID, UserID, LastActivityAt).
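Something like this (the index name is illustrative):
CREATE INDEX IX_Stories_CategoryID_UserID_LastActivityAt
ON Stories (CategoryID, UserID, LastActivityAt)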
You are maxing out the Disks in your hardware setup.
Given your comments about your Data/Log/tempDB File placement, I think any amount of tuning is going to be a bandaid.
250,000 Rows is small. Imagine how bad your problems are going to be with 10 million rows.
I suggest you move tempDB onto its own physical drive (preferably a RAID 0).
OK, so my test machine isn't fast. Actually, it's really slow: 1.6 GHz, 1 GB of RAM, no multiple disks, just a single (read: slow) disk for SQL Server, the OS, and extras.
I created your tables with primary and foreign keys defined.
Inserted 2 categories, 500 random users, and 250000 random stories.
Running the first query above takes 16 seconds (no plan cache either).
If I index the LastActivityAt column I get results in under a second (no plan cache here either).
Here's the script I used to do all of this.
--Categories table --
Create table Categories (
[ID] [int] IDENTITY(1,1) primary key NOT NULL,
[ShortName] [nvarchar](8) NOT NULL,
[Name] [nvarchar](64) NOT NULL)
--Users table --
Create table Users(
[ID] [int] IDENTITY(1,1) primary key NOT NULL,
[Username] [nvarchar](32) NOT NULL,
[Password] [nvarchar](64) NOT NULL,
[Email] [nvarchar](320) NOT NULL,
[CreatedAt] [datetime] NOT NULL,
[LastActivityAt] [datetime] NOT NULL
)
go
-- Stories table --
Create table Stories(
[ID] [int] IDENTITY(1,1) primary key NOT NULL,
[UserID] [int] NOT NULL references Users ,
[CategoryID] [int] NOT NULL references Categories,
[VoteCount] [int] NOT NULL,
[CommentCount] [int] NOT NULL,
[Title] [nvarchar](96) NOT NULL,
[Description] [nvarchar](1024) NOT NULL,
[CreatedAt] [datetime] NOT NULL,
[UniqueName] [nvarchar](96) NOT NULL,
[Url] [nvarchar](512) NOT NULL,
[LastActivityAt] [datetime] NOT NULL)
Insert into Categories (ShortName, Name)
Values ('cat1', 'Test Category One')
Insert into Categories (ShortName, Name)
Values ('cat2', 'Test Category Two')
--Dummy Users
Insert into Users
Select top 500
UserName=left(SO.name+SC.name, 32)
, Password=left(reverse(SC.name+SO.name), 64)
, Email=Left(SO.name, 128)+'@'+left(SC.name, 123)+'.com'
, CreatedAt='1899-12-31'
, LastActivityAt=GETDATE()
from sysobjects SO
Inner Join syscolumns SC on SO.id=SC.id
go
--dummy stories!
-- A Count is given every 10000 record inserts (could be faster)
-- RBAR method!
set nocount on
Declare @count as bigint
Set @count = 0
begin transaction
while @count<=250000
begin
Insert into Stories
Select
USERID=floor(((500 + 1) - 1) * RAND() + 1)
, CategoryID=floor(((2 + 1) - 1) * RAND() + 1)
, votecount=floor(((10 + 1) - 1) * RAND() + 1)
, commentcount=floor(((8 + 1) - 1) * RAND() + 1)
, Title=Cast(NEWID() as VARCHAR(36))+Cast(NEWID() as VARCHAR(36))
, Description=Cast(NEWID() as VARCHAR(36))+Cast(NEWID() as VARCHAR(36))+Cast(NEWID() as VARCHAR(36))
, CreatedAt='1899-12-31'
, UniqueName=Cast(NEWID() as VARCHAR(36))+Cast(NEWID() as VARCHAR(36))
, Url=Cast(NEWID() as VARCHAR(36))+Cast(NEWID() as VARCHAR(36))
, LastActivityAt=Dateadd(day, -floor(((600 + 1) - 1) * RAND() + 1), GETDATE())
If @count % 10000=0
Begin
Print @count
Commit
begin transaction
End
Set @count=@count+1
end
set nocount off
go
--returns in 16 seconds
DBCC DROPCLEANBUFFERS
SELECT TOP(10) *
FROM Stories
INNER JOIN Categories ON Categories.ID = Stories.CategoryID
INNER JOIN Users ON Users.ID = Stories.UserID
ORDER BY Stories.LastActivityAt
go
--Now create an index
Create index IX_LastADate on Stories (LastActivityAt asc)
go
--With an index returns in less than a second
DBCC DROPCLEANBUFFERS
SELECT TOP(10) *
FROM Stories
INNER JOIN Categories ON Categories.ID = Stories.CategoryID
INNER JOIN Users ON Users.ID = Stories.UserID
ORDER BY Stories.LastActivityAt
go
The sort is definitely where your slowdown is occurring.
Sorting mainly gets done in tempdb, and a large table will cause a LOT to be added there.
Having an index on this column will definitely improve performance on an ORDER BY.
Also, defining your primary and foreign keys helps SQL Server immensely.
The method listed in your code is elegant, and basically the same response that cdonner wrote, except in C# and not SQL. Tuning the DB will probably give even better results!
--Kris
Have you cleared the SQL Server cache before running each of the queries?
In SQL 2000, it's something like DBCC DROPCLEANBUFFERS. Google the command for more info.
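For example (on a test box only; this flushes the buffer pool for the whole instance):
CHECKPOINT -- write dirty pages to disk first
DBCC DROPCLEANBUFFERS -- then drop the clean pages from the buffer pool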
Looking at the query, I would have an index for
Categories.ID
Stories.CategoryID
Users.ID
Stories.UserID
and possibly
Stories.LastActivityAt
But yeah, sounds like the result could be bogus 'cos of caching.
When you have worked with SQL Server for some time, you will discover that even the smallest changes to a query can cause wildly different response times. From what I have read in the initial question, and looking at the query plan, I suspect that the optimizer has decided that the best approach is to form a partial result and then sort that as a separate step. The partial result is a composite of the Users and Stories tables. This is formed in tempdb. So the excessive disk access is due to the forming and then sorting of this temporary table.
I concur that the solution should be to create a compound index on Stories.LastActivityAt, Stories.UserId, Stories.CategoryId. The order is VERY important; the LastActivityAt field must be first.
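A sketch of that compound index (the name is illustrative); with LastActivityAt leading, the TOP(10) ... ORDER BY can be satisfied without a separate sort step:
CREATE INDEX IX_Stories_LastActivityAt_UserID_CategoryID
ON Stories (LastActivityAt, UserID, CategoryID)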