Optimizing slow query with multiple WITHs - SQL

I have the following query:
with matched_urls as
(
select l.Url, min(f.urlfiltertype) as Filter
from landingpages l
join landingpageurlfilters lpf on lpf.landingpageid = l.Url
join urlfilters f on lpf.urlfilterid = f.id
where f.groupid = 3062
group by l.Url
),
all_urls as
(
select l.Url, 5 as Filter
from landingpages l
where 'iylsuqnzukwv0milinztea' in (select domainid
from domainlandingpages dlp
where l.Url = dlp.landingpageid)
and l.Url not in (select Url from matched_urls)
union
select * from matched_urls
)
select l.*
from landingpages l
join all_urls u on l.Url = u.Url
order by u.Filter asc
offset 0 rows fetch next 30 rows only
This is the DDL for the tables used in the query:
CREATE TABLE [dbo].[LandingPages]
(
[Url] [nvarchar](448) NOT NULL,
[LastUpdated] [datetime2](7) NOT NULL,
CONSTRAINT [PK_LandingPages]
PRIMARY KEY CLUSTERED ([Url] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[DomainLandingPages]
(
[LandingPageId] [nvarchar](448) NOT NULL,
[DomainId] [nvarchar](128) NOT NULL,
CONSTRAINT [PK_DomainLandingPages]
PRIMARY KEY CLUSTERED ([DomainId] ASC, [LandingPageId] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[DomainLandingPages] WITH CHECK
ADD CONSTRAINT [FK_DomainLandingPages_Domains_DomainId]
FOREIGN KEY([DomainId]) REFERENCES [dbo].[Domains] ([Id])
GO
ALTER TABLE [dbo].[DomainLandingPages] CHECK CONSTRAINT [FK_DomainLandingPages_Domains_DomainId]
GO
ALTER TABLE [dbo].[DomainLandingPages] WITH CHECK
ADD CONSTRAINT [FK_DomainLandingPages_LandingPages_LandingPageId]
FOREIGN KEY([LandingPageId]) REFERENCES [dbo].[LandingPages] ([Url])
GO
ALTER TABLE [dbo].[DomainLandingPages] CHECK CONSTRAINT [FK_DomainLandingPages_LandingPages_LandingPageId]
GO
CREATE TABLE [dbo].[UrlFilters]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[GroupId] [int] NOT NULL,
[UrlFilterType] [int] NOT NULL,
[Filter] [nvarchar](max) NOT NULL,
CONSTRAINT [PK_UrlFilters]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[UrlFilters] WITH CHECK
ADD CONSTRAINT [FK_UrlFilters_Groups_GroupId]
FOREIGN KEY([GroupId]) REFERENCES [dbo].[Groups] ([Id])
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[UrlFilters] CHECK CONSTRAINT [FK_UrlFilters_Groups_GroupId]
GO
CREATE TABLE [dbo].[LandingPageUrlFilters]
(
[LandingPageId] [nvarchar](448) NOT NULL,
[UrlFilterId] [int] NOT NULL,
CONSTRAINT [PK_LandingPageUrlFilters]
PRIMARY KEY CLUSTERED ([LandingPageId] ASC, [UrlFilterId] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[LandingPageUrlFilters] WITH CHECK
ADD CONSTRAINT [FK_LandingPageUrlFilters_LandingPages_LandingPageId]
FOREIGN KEY([LandingPageId]) REFERENCES [dbo].[LandingPages] ([Url])
GO
ALTER TABLE [dbo].[LandingPageUrlFilters] CHECK CONSTRAINT [FK_LandingPageUrlFilters_LandingPages_LandingPageId]
GO
ALTER TABLE [dbo].[LandingPageUrlFilters] WITH CHECK
ADD CONSTRAINT [FK_LandingPageUrlFilters_UrlFilters_UrlFilterId]
FOREIGN KEY([UrlFilterId]) REFERENCES [dbo].[UrlFilters] ([Id])
GO
ALTER TABLE [dbo].[LandingPageUrlFilters] CHECK CONSTRAINT [FK_LandingPageUrlFilters_UrlFilters_UrlFilterId]
GO
Here is the execution plan:
https://www.brentozar.com/pastetheplan/?id=H1tHt5pvP
The query is pulling all URLs for a given domain, which are then supposed to be ordered by UrlFilterType - however, not all landing pages have a match, hence the two WITH clauses.
As far as I can see from the execution plan it's mainly doing index seeks, so I think I have the right indexes. However, the query takes a very long time to execute, so I hope there might be a smarter way of doing this.
Any input will be greatly appreciated!

First up, your statistics look wildly out of line: estimated 3,700 rows, actual 219,000. That suggests that, at the very least, a statistics update may change the choices the optimizer is making. Because of those row estimates, the optimizer is choosing a seek and a nested loops operation where, based on the data distribution, you're reading a third of the table - 200k rows of a 600k-row table. A scan here, probably with a hash join, would be more efficient.
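A minimal sketch of such a statistics refresh, assuming the default statistics on the tables the plan touches (FULLSCAN is optional, but gives the optimizer the most accurate histograms):
-- Refresh statistics on the tables the plan misestimates; FULLSCAN samples every row.
UPDATE STATISTICS dbo.LandingPages WITH FULLSCAN;
UPDATE STATISTICS dbo.LandingPageUrlFilters WITH FULLSCAN;
UPDATE STATISTICS dbo.UrlFilters WITH FULLSCAN;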
The query itself isn't giving much to filter the [aarhus-cluster-onesearch-staging].[dbo].[LandingPages].[PK_LandingPages] table on, so it's pulling 200k rows in order to filter them down to 30. If you can find a way to add additional filtering there, you should see a performance improvement.

OK, the big hit is at the bottom of that query plan - where it's reading from LandingPageUrlFilters with the UrlFilterId coming from UrlFilters.
It's getting completely the wrong estimates (out by about 70x) and then sorts your URLs, taking 30 seconds or so.
If you run the first CTE on its own, I think it will take a long time. That's what you need to optimise.
select l.Url, min(f.urlfiltertype) as Filter
from landingpages l
join landingpageurlfilters lpf on lpf.landingpageid = l.Url
join urlfilters f on lpf.urlfilterid = f.id
where f.groupid = 3062
group by l.Url
Suggestions
First thing to try is to also add a nonclustered index on LandingPageUrlFilters in the opposite order to your clustered index (e.g., CREATE NONCLUSTERED INDEX myindex ON LandingPageUrlFilters ([UrlFilterId] ASC, [LandingPageId] ASC)). Note this will make a full copy of that table, which may be rather large. It appears you already have a nonclustered index like this (based on the fact the plan refers to IX_LandingPageUrlFilters_UrlFilterId).
The sort on the string field that is nvarchar(448) - which is actually close to 900 bytes per row - will take a much bigger memory grant. Consider adding int ID values as primary keys - they will require less memory and therefore be less likely to spill to disk.
Consider, instead of a CTE, creating a temporary table (with an appropriate PK) from the join of LandingPageUrlFilters and UrlFilters (see the sketch after the rewritten query below). However, you will still need to do a sort when inserting into it - which is likely to take just as long.
Part of the problem (another 10s or so) is a nested loop join to LandingPages. It was expecting fewer than 4,000 rows (so a nested loop is OK) but had to do 220,000 loops. If necessary, consider a join hint (e.g., INNER HASH JOIN rather than INNER JOIN). However, it appears that LandingPages isn't actually required in that query - just remove the LandingPages table from the CTE and use landingpageurlfilters.landingpageid instead
e.g.,
select lpf.landingpageid AS [Url], min(f.urlfiltertype) as Filter
from landingpageurlfilters lpf
join urlfilters f on lpf.urlfilterid = f.id
where f.groupid = 3062
group by lpf.landingpageid
I think that gives the same results as the CTE I copied above.
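For the temporary-table variant mentioned above, a rough sketch (the table name and exact shape are assumptions; the PK mirrors the grouping key, and the rest of the query would then reference #matched_urls instead of the CTE):
-- Rough sketch only: materialise the first CTE into a temp table with a narrow PK.
CREATE TABLE #matched_urls
(
Url nvarchar(448) NOT NULL PRIMARY KEY,
Filter int NOT NULL
);
INSERT INTO #matched_urls (Url, Filter)
SELECT lpf.landingpageid, MIN(f.urlfiltertype)
FROM landingpageurlfilters lpf
JOIN urlfilters f ON lpf.urlfilterid = f.id
WHERE f.groupid = 3062
GROUP BY lpf.landingpageid;
-- The remainder of the original query can then join to #matched_urls in place of the matched_urls CTE.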

Related

Adding dummy where condition brings execution plan to seek

Could you please have a look at http://sqlfiddle.com/#!18/7ad28/8 and help me understand why adding a WHERE condition changes the index operation from a scan to a seek? As per my (wrong) understanding, it should not have made any difference, since it's a greater-than condition, which should have caused a scan.
I am also pasting the table scripts and the queries in question below.
CREATE TABLE [dbo].[Mappings]
(
[MappingID] [smallint] NOT NULL IDENTITY(1, 1),
[ProductID] [smallint] NOT NULL,
[CategoryID] [smallint] NOT NULL
)
GO
ALTER TABLE [dbo].[Mappings] ADD CONSTRAINT [pk_Mappings_MappingID] PRIMARY KEY CLUSTERED ([MappingID]) WITH (DATA_COMPRESSION = PAGE) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [nc_Mappings_ProductIDCategoryID] ON [dbo].[Mappings] ([ProductID], [CategoryID]) WITH (DATA_COMPRESSION = PAGE) ON [PRIMARY]
GO
CREATE TABLE [dbo].[CustomerProducts]
(
[CustomerID] [bigint] NOT NULL,
[ProductID] [smallint] NOT NULL,
[SomeDate] [datetimeoffset] (0) NULL,
[SomeAttribute] [bigint] NULL
)
GO
ALTER TABLE [dbo].[CustomerProducts] ADD CONSTRAINT [pk_CustomerProducts_ProductIDCustomerID] PRIMARY KEY CLUSTERED ([ProductID], [CustomerID]) ON [PRIMARY]
GO
--SCAN [tempdb].[dbo].[Mappings].[nc_Mappings_ProductIDCategoryID].(NonClustered)
SELECT b.[SomeDate],
b.[SomeAttribute]
FROM dbo.[Mappings] a
INNER JOIN dbo.CustomerProducts b
ON a.[ProductID] = b.[ProductID]
Where b.CustomerID = 88;
--SEEK [tempdb].[dbo].[Mappings].[nc_Mappings_ProductIDCategoryID].(NonClustered)
SELECT b.[SomeDate],
b.[SomeAttribute]
FROM dbo.[Mappings] a
INNER JOIN dbo.CustomerProducts b
ON a.[ProductID] = b.[ProductID]
AND b.CustomerID = 88
Where a.[ProductID] > 0;
"It should not have made any difference since its a greater then condition which should have caused scan."
You added an explicit predicate (ProductID > 0), so SQL Server chooses to seek to that value (0) and then range scan. To see this, select the Index Seek on Mappings, open the Properties tab, look for Seek Predicates, and expand the entire tree of results. You'll see Start and the applicable range scan attributes underneath.
So if you had real data (pretend you have ProductIDs from 1-100) and a WHERE ProductID > 77, you'll seek in the B-tree to ProductID 77, then range scan the remainder of the nonclustered index.
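To see this for yourself, here is a small sketch along those lines against the Mappings table defined above (the data values are made up purely for illustration):
-- Populate Mappings with ProductIDs 1-100; the values are invented for illustration.
INSERT INTO dbo.Mappings (ProductID, CategoryID)
SELECT n, 1
FROM (SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects) AS x;
-- The plan should show a seek predicate of ProductID > 77 followed by a range scan
-- of nc_Mappings_ProductIDCategoryID; check Seek Predicates in the actual plan.
SELECT a.ProductID, a.CategoryID
FROM dbo.Mappings AS a
WHERE a.ProductID > 77;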
Watch this: this'll help you visualize and understand what happens internally in different index operations (disclaimer: this is me presenting)
https://youtu.be/fDd4lw6DfqU?t=748
Here's what the plans look like:
Highlighted in yellow is the information on the clustered index seek on the CustomerProducts table. The seek predicate is set to the value of the condition [ProductID] > 0, which is perfectly reasonable: part of the join condition is a.[ProductID] = b.[ProductID], and a.[ProductID] > 0 appears in the WHERE clause, so b.[ProductID] > 0 as well. As ProductID is the first column of the clustered index, any information that narrows the lookup can be used. The seek operation should be faster than the scan, so the optimizer will try to do that.

How to accommodate Azure SQL query that runs infrequently but is very resource intensive

NOTE: I give details of my Azure setup here, but I'm not sure the solution will be an Azure-based one. This may be a problem that can be resolved at the C#, Entity Framework, or SQL level.
I have a .NET web application running on Azure App Service using Entity Framework to access an Azure SQL DB at pricing level Standard S1 (20 DTU). 99% of the time, the app is utilizing less than 1% of DTU on the SQL DB. However, when someone logs into the Admin Portal of the app and runs a particular report, it executes a query that is very resource intensive and takes a very long time - over a minute - which we can't live with. This report is run only a few times a week. I've tried scaling the SQL DB up and have found - unsurprisingly - that at higher plans, the execution time gets to a somewhat reasonable level. At Standard S4 (200 DTU), the execution time drops to 20 seconds, which is not ideal but I can live with for now. However, it doesn't make sense to pay for S4-level when 99% of the time it will be using only a fraction of a percent of DTU. Any ideas on how I can either reduce the query execution time or only scale when needed?
The Entity Framework code used for this report is:
class MyAppModelContainer : DbContext
{
public virtual ObjectResult<GetOrganizationList_Result> GetOrganizationList()
{
return ((IObjectContextAdapter)this).ObjectContext.ExecuteFunction<GetOrganizationList_Result>("GetOrganizationList");
}
}
The model used to retrieve the results is:
public partial class GetOrganizationList_Result
{
public int id { get; set; }
public string Name { get; set; }
public Nullable<int> DeviceCounts { get; set; }
public Nullable<int> EmailCounts { get; set; }
}
The stored procedure is:
CREATE PROCEDURE [dbo].[GetOrganizationList]
AS
BEGIN
SELECT o.Id,o.Name,COUNT(distinct s.DeviceId) as DeviceCounts, COUNT(distinct d.userid) as EmailCounts
FROM Sessions s
INNER JOIN Devices d on d.Id = s.DeviceId
RIGHT OUTER JOIN Organizations o on o.id=s.OrganizationId
GROUP BY o.Id,Name
END
The approximate number of rows in each of the joined tables:
Sessions table: 2 million rows
Devices table: 166,000 rows
Users table: 88,000 rows
Here are the table definitions and indexes:
CREATE TABLE [dbo].[Sessions] (
[Id] INT IDENTITY (1, 1) NOT NULL,
[DeviceId] INT NULL,
[StartTime] DATETIME NOT NULL,
[OrganizationId] INT NOT NULL,
CONSTRAINT [PK_Sessions] PRIMARY KEY CLUSTERED ([Id] ASC),
CONSTRAINT [FK_DeviceSession] FOREIGN KEY ([DeviceId]) REFERENCES [dbo].[Devices] ([Id]),
CONSTRAINT [FK_OrganizationSession] FOREIGN KEY ([OrganizationId]) REFERENCES [dbo].[Organizations] ([Id])
);
CREATE NONCLUSTERED INDEX [IX_FK_DeviceSession]
ON [dbo].[Sessions]([DeviceId] ASC);
CREATE NONCLUSTERED INDEX [IX_FK_OrganizationSession]
ON [dbo].[Sessions]([OrganizationId] ASC);
CREATE NONCLUSTERED INDEX [IX_Sessions_OrganizationId_Include_DeviceId]
ON [dbo].[Sessions]([OrganizationId] ASC)
INCLUDE([DeviceId]);
CREATE NONCLUSTERED INDEX [IX_Sessions_OrganizationId_DeviceId] ON [dbo].[Sessions]
(
[DeviceId] ASC,
[OrganizationId] ASC,
[StartTime] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE TABLE [dbo].[Devices] (
[Id] INT IDENTITY (1, 1) NOT NULL,
[UserId] INT NULL,
[MACAddress] NCHAR (12) NOT NULL,
CONSTRAINT [PK_Devices] PRIMARY KEY CLUSTERED ([Id] ASC),
CONSTRAINT [FK_UserDevice] FOREIGN KEY ([UserId]) REFERENCES [dbo].[Users] ([Id]),
CONSTRAINT [IX_Unique_MACAddress] UNIQUE NONCLUSTERED ([MACAddress] ASC)
);
CREATE NONCLUSTERED INDEX [IX_FK_UserDevice]
ON [dbo].[Devices]([UserId] ASC);
CREATE TABLE [dbo].[Users] (
[Id] INT IDENTITY (1, 1) NOT NULL,
[Email] NVARCHAR (250) NOT NULL,
[Sex] TINYINT NOT NULL,
[Age] SMALLINT NOT NULL,
[PhoneNumber] NCHAR (10) NOT NULL DEFAULT '' ,
[Name] NVARCHAR(100) NOT NULL DEFAULT '',
CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED ([Id] ASC),
CONSTRAINT [IX_Unique_Email_PhoneNumber] UNIQUE NONCLUSTERED ([Email] ASC, [PhoneNumber] ASC)
);
I rebuild indexes and update statistics on a weekly basis. Azure SQL DB has no performance recommendations.
Any ideas on how to solve this problem without simply throwing more Azure hardware at it? I'm open to anything including Azure-level changes, SQL changes, code changes. It doesn't appear that there's a consumption model of pricing for Azure SQL DB, which may help me if it existed.
I would suggest creating the following indexes, or adding the missing columns to your existing indexes.
CREATE NONCLUSTERED INDEX [NIX_Session_Device_OrganizationId]
ON [dbo].[Sessions] ([DeviceId] , [OrganizationId]);
CREATE NONCLUSTERED INDEX [NIX_Device_ID_UserID]
ON [dbo].[Devices] ([Id], [userid]);
CREATE NONCLUSTERED INDEX [NIX_Organizations]
ON [dbo].[Organizations] ([Id] , [Name]);
200 DTUs isn't a big number; 200 DTUs means you are already on the S4 service level, and anything above will put you in S6.
First try to tune your query with appropriate indexes; once that is done, then start looking at DTUs. Really, for a mission-critical system I would prefer to go with the vCore pricing model rather than juggling with the black box of DTUs.
I would create a nonclustered columnstore index. You're doing aggregate queries, and that is a perfect fit for your situation. It will affect inserts and updates somewhat, so you'll want to test it over time, but it's the right way to go to make that query run much faster:
CREATE NONCLUSTERED COLUMNSTORE INDEX ixtest
ON dbo.Organizations
(
id,
Name --plus whatever other columns are in the table
);
I set up a small test using your scripts and the query went from 17ms to 6ms. The reads dropped from several thousand to about twelve.
You didn't include a definition of Organizations, so I just dummied it out. You'll want to be sure to include all the columns in the columnstore index (that's the best practice).
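If you want to confirm the reduction in reads against your own data, one simple sketch (assuming you run it from SSMS against a copy of the database) is to compare logical reads and elapsed time before and after adding the columnstore index:
-- Compare logical reads and elapsed time before and after creating the columnstore index.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
EXEC dbo.GetOrganizationList;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;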

How to best create an index for a large table with no primary key?

First off, I am not a database programmer.
I have built the following table for stock market tick data:
CREATE TABLE [dbo].[Tick]
(
[trade_date] [int] NOT NULL,
[delimiter] [tinyint] NOT NULL,
[time_stamp] [int] NOT NULL,
[exchange] [tinyint] NOT NULL,
[symbol] [varchar](10) NOT NULL,
[price_field] [tinyint] NOT NULL,
[price] [int] NOT NULL,
[size_field] [tinyint] NOT NULL,
[size] [int] NOT NULL,
[exchange2] [tinyint] NOT NULL,
[trade_condition] [tinyint] NOT NULL
) ON [PRIMARY]
GO
The table will store 6 years of data to begin with. At an average of 300 million ticks per day that would be about 450 billion rows.
Common query on this table is to get all the ticks for some symbol(s) over a date range:
SELECT
trade_date, time_stamp, symbol, price, size
WHERE
trade_date > 20160101 and trade_date < 20170101
AND symbol = 'AAPL'
AND price_field = 0
ORDER BY
trade_date, time_stamp
This is my first attempt at an index:
CREATE UNIQUE CLUSTERED INDEX [ClusteredIndex-20180324-183113]
ON [dbo].[Tick]
(
[trade_date] ASC,
[symbol] ASC,
[time_stamp] ASC,
[price_field] ASC,
[delimiter] ASC,
[exchange] ASC,
[price] ASC,
[size_field] ASC,
[size] ASC,
[exchange2] ASC,
[trade_condition] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
First, I put date before symbol because there are fewer distinct dates than symbols, so the shorter path is to get to the date first.
I have included all the columns I would potentially need to retrieve. When I tested building it for one day's worth of data, the index was relatively large: about 4 GB for a 20 GB table.
Two questions:
Is my not including a primary key to save space a wise choice assuming my query requirements don't change?
Would I save space if I only included trade_date and symbol in the index? How would that affect performance? I've been told I need to include all the columns I need in the index, otherwise retrieval would be very slow because it would have to go back to the primary key to find the values of columns not included in the index. If this is true, how would that even work when my table doesn't have a primary key?
Your unique clustered index should contain the minimum number of columns necessary to uniquely identify a row in your table. If that means almost every column in your table, I would think you should add an artificial primary key (see the sketch at the end of this answer). Cutting an artificial primary key to save space is a poor decision IMO; only cut it if you can create a natural primary key out of your data.
The clustered index is essentially where all your data is stored. The leaf nodes of the index contain all the data for that row, the columns that make up the index determine how to reach those leaf nodes.
Including extra columns in your index to speed up queries only applies to NONCLUSTERED indexes, as there the leaf node generally only contains a lookup value. For these indexes, the way to include extra columns is to use the INCLUDE clause, not just list them all as part of the index. For example:
CREATE NONCLUSTERED INDEX [IX_TickSummary] ON [dbo].[Tick]
(
[trade_date] ASC,
[symbol] ASC
)
INCLUDE (
[time_stamp],
[price],
[size],
[price_field]
)
This is a concept known as creating a covering index, where the index itself contains all the columns needed to process your query, so no additional lookup into the data table is needed. The upside is increased speed. The downside is that those INCLUDEd columns are essentially duplicated, resulting in a larger index and eating more space.
Include columns that are used very frequently, such as those used to generate summary listings. Columns that are queried infrequently, such as those only needed in detailed views, should be left out of the index to save space.
Potentially helpful reading: Using Covering Indexes to Improve Query Performance
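If you do go the artificial-key route mentioned above, a minimal sketch might look like this (the column and constraint names are just examples, and it assumes the wide clustered index from the question has been dropped first):
-- Minimal sketch of an artificial key; names are examples, not specific recommendations.
ALTER TABLE dbo.Tick ADD tick_id bigint IDENTITY(1,1) NOT NULL;
ALTER TABLE dbo.Tick ADD CONSTRAINT PK_Tick PRIMARY KEY CLUSTERED (tick_id);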
Looking at your most common query, you should create a composite index based first on the columns involved in the WHERE clause:
trade_date, symbol, price_field
then on the columns in the SELECT list:
time_stamp, symbol, price, size
This way the index can serve both the WHERE clause and the SELECT column retrieval, avoiding access to the data table:
trade_date, symbol, price_field, time_stamp, price, size
In your sequence you have time_stamp before price_field - a SELECT column before a WHERE column - which doesn't let the database engine use the full power of the index.
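Expressed as DDL, that suggestion might look roughly like this (the index name is arbitrary, and whether the trailing columns belong in the key or in an INCLUDE clause is a judgement call):
-- Rough sketch of the suggested covering index; the name is arbitrary.
CREATE NONCLUSTERED INDEX IX_Tick_TradeDate_Symbol_PriceField
ON dbo.Tick (trade_date ASC, symbol ASC, price_field ASC)
INCLUDE (time_stamp, price, size);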

Adding WHERE adds 20+ seconds to SQL Azure query

I'm looking for some advice on speeding up queries in SQL Azure. This is an example of the two queries we're running; when we add a WHERE clause, the queries grind to a halt.
Both columns, theTime and orderType, are indexed. Can anyone suggest how to make these run faster, or things to do to the query to make it more efficient?
5.2 seconds:
select top 1000000 sum(cast(theTime AS INT)) as totalTime from Orders
20.2 seconds:
select top 1000000 sum(cast(theTime AS INT)) as totalTime from Orders WHERE orderType='something_in_here'
Here's the relevant information:
CREATE TABLE [dbo].[Orders] (
[ID] int IDENTITY(1,1) NOT NULL,
[orderType] nvarchar(90) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[orderTime] nvarchar(90) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT [PrimaryKey_fe2bdbea-c65a-0b85-1de9-87324cc29bff] PRIMARY KEY CLUSTERED ([ID])
WITH (IGNORE_DUP_KEY = OFF)
)
GO
CREATE NONCLUSTERED INDEX [orderTime]
ON [dbo].[Orders] ([orderTime] ASC)
WITH (IGNORE_DUP_KEY = OFF,
STATISTICS_NORECOMPUTE = OFF,
ONLINE = OFF)
GO
CREATE NONCLUSTERED INDEX [actiontime_int]
CREATE NONCLUSTERED INDEX [orderType]
ON [dbo].[Orders] ([orderType] ASC)
WITH (IGNORE_DUP_KEY = OFF,
STATISTICS_NORECOMPUTE = OFF,
ONLINE = OFF)
GO
I suspect your query is not doing what you think. It is taking the first million counts, rather than the count of the first million rows. I think you want:
select sum(cast(theTime AS INT))
from (select top (1000000) theTime
from Orders
) t
versus:
select sum(cast(theTime AS INT))
from (select top (1000000) theTime
from Orders
WHERE orderType='something_in_here'
) t
My suspicion is that using the index actually slows things down, depending on the selectivity of the where clause.
In the original query, you are reading all the data, sequentially. This is fast, because the pages just cycle through the processor.
Going through the index slows things down, because the pages are not read in order. You may still be reading all the pages (if every page has a matching row), but they are no longer being read in "physical" or "logical" order; they are being read in the order of the index, which is likely to be random.
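If the filtered query matters, one way to avoid those out-of-order lookups is to make the nonclustered index covering, so the filter column and the summed column live in the same index (a sketch only; note the posted DDL has orderTime rather than theTime, so adjust the included column to whichever actually holds the value):
-- Sketch of a covering index so the filtered aggregate never has to go back to the clustered index.
-- The posted table has orderTime rather than theTime; adjust the INCLUDE column accordingly.
CREATE NONCLUSTERED INDEX IX_Orders_orderType_covering
ON dbo.Orders (orderType)
INCLUDE (orderTime);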

Increasing performance on a logging table in SQL Server 2005

I have a "history" table where I log each request into a Web Handler on our web site. Here is the table definition:
/****** Object: Table [dbo].[HistoryRequest] Script Date: 10/09/2009 17:18:02 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[HistoryRequest](
[HistoryRequestID] [uniqueidentifier] NOT NULL,
[CampaignID] [int] NOT NULL,
[UrlReferrer] [nvarchar](512) NOT NULL,
[UserAgent] [nvarchar](512) NOT NULL,
[UserHostAddress] [nvarchar](15) NOT NULL,
[UserHostName] [nvarchar](512) NOT NULL,
[HttpBrowserCapabilities] [xml] NOT NULL,
[Created] [datetime] NOT NULL,
[CreatedBy] [nvarchar](100) NOT NULL,
[Updated] [datetime] NULL,
[UpdatedBy] [nvarchar](100) NULL,
CONSTRAINT [PK_HistoryRequest] PRIMARY KEY CLUSTERED
(
[HistoryRequestID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[HistoryRequest] WITH CHECK ADD CONSTRAINT [FK_HistoryRequest_Campaign] FOREIGN KEY([CampaignID])
REFERENCES [dbo].[Campaign] ([CampaignId])
GO
ALTER TABLE [dbo].[HistoryRequest] CHECK CONSTRAINT [FK_HistoryRequest_Campaign]
GO
37 seconds for 1050 rows on this statement:
SELECT *
FROM HistoryRequest AS hr
WHERE Created > '10/9/2009'
ORDER BY Created DESC
Does anyone have any suggestions for speeding this up? I have a clustered index on the PK and a regular index on the Created column. I tried a unique index and it barfed, complaining there is a duplicate entry somewhere - which can be expected.
Any insights are welcome!
You are requesting all columns (*) over a non-covering index (Created). On a large data set you are guaranteed to hit the index tipping point, where a clustered index scan is more efficient than a nonclustered index range seek plus bookmark lookup.
Do you need * always? If yes, and if the typical access pattern is like this, then you must organize the table accordingly and make Created the leftmost clustered key.
If not, then consider changing your query to a covering query, e.g. select only HistoryRequestID and Created, which are covered by the nonclustered index. If more fields are needed, add them as included columns to the nonclustered index, but take into account that this will add extra storage space and IO/log write time.
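For instance, a covering index for the sample query might look roughly like this (the INCLUDE list is an assumption - add only the columns the query really needs; HistoryRequestID comes along automatically as the clustering key):
-- Illustrative covering index for the sample query; the INCLUDE list is an assumption.
CREATE NONCLUSTERED INDEX IX_HistoryRequest_Created
ON dbo.HistoryRequest (Created DESC)
INCLUDE (CampaignID, CreatedBy);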
Hey, I've seen some odd behavior when pulling XML columns in large sets. Try putting your index on Created back, then specify the columns in your SELECT statement, but omit the XML. See how that affects the return time for results.
For a log table, you probably don't need a uniqueidentifier column. You're not likely to query on it either, so it's not a good candidate for a clustered index. Your sample query is on "Created", yet there's no index on it. If you query frequently on ranges of "Created" values then it would be a good candidate for clustering even though it's not necessarily unique.
OTOH, the foreign key suggests frequent querying by Campaign, in which case having the clustering done by that column could make sense, and would also probably do a better job of scattering the inserted keys in the indexes - both the surrogate key and the timestamp would add records in sequential order, which is net more work over time for insertions because the node sectors are filled less randomly.
If it's just a log table, why does it have update audit columns? It would normally be write-only.
Rebuild indexes. Use the WITH (NOLOCK) hint after the table names where appropriate; this probably applies if you want to run long(ish) running queries against tables that are heavily used in a live environment (such as a log table). It basically means your query might miss some of the very latest records, but you also aren't holding a lock open on the table - which creates additional overhead.
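As a sketch, the sample query from the question with that hint applied (keep in mind this is a dirty read):
-- Sample query with the NOLOCK hint; this is a dirty read and may miss the newest rows.
SELECT *
FROM HistoryRequest AS hr WITH (NOLOCK)
WHERE Created > '10/9/2009'
ORDER BY Created DESC;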