Clustered and covering index ignored on delete statement. Table scan occurs - sql

Why would SQL Server 2005 find it more efficient to perform a table scan instead of using the available clustered index on the primary key (and only the primary key)?
DISCLAIMER:
There is also a non-clustered, non-unique index on the primary key with no included columns. This is baffling to me and we've had a good office chuckle already. If this index ends up being the problem, then we know who to shoot. Unfortunately, it's a production site and I can't just rip it out but will make plans to do so if necessary.
Maybe the problem is not the mentally deficient contrary index, however...
According to FogLight PASS the following statement has been causing a scan on a table with ~10 million rows about 600 times an hour when we delete a row by the primary key:
DELETE FROM SomeBigTable WHERE ID = @ID
The table DDL:
CREATE TABLE [SomeBigTable]
(
[ID] [int] NOT NULL,
[FullTextIndexTime] [timestamp] NOT NULL,
[FullTextSearchText] [varchar] (max) NOT NULL,
CONSTRAINT [PK_ID] PRIMARY KEY CLUSTERED
(
[ID] ASC
)
) -- ...
ON [PRIMARY]
The clustered index constraint in detail:
ALTER TABLE [SomeBigTable]
ADD CONSTRAINT [PK_ID] PRIMARY KEY CLUSTERED
(
[ID] ASC
) WITH (PAD_INDEX = OFF
,STATISTICS_NORECOMPUTE = OFF
,SORT_IN_TEMPDB = OFF
,IGNORE_DUP_KEY = OFF
,ONLINE = OFF
,ALLOW_ROW_LOCKS = ON
,ALLOW_PAGE_LOCKS = ON
,FILLFACTOR = 75)
ON [PRIMARY]
The non-unique, non-clustered index on the same table:
CREATE NONCLUSTERED INDEX [IX_SomeBigTable_ID] ON [SomeBigTable]
(
[ID] ASC
) WITH (PAD_INDEX = OFF
,STATISTICS_NORECOMPUTE = OFF
,SORT_IN_TEMPDB = OFF
,IGNORE_DUP_KEY = OFF
,ONLINE = OFF
,ALLOW_ROW_LOCKS = ON
,ALLOW_PAGE_LOCKS = ON
,FILLFACTOR = 98)
ON [PRIMARY]
There is also a foreign key constraint on the [ID] column pointing to an equally large table.
The 600 table scans represent roughly 4% of the total delete operations per hour on this table using the same statement, so not every execution of this statement causes a table scan.
It goes without saying, but saying it anyway...this is a lot of nasty I/O that I'd like to send packing.
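One way to confirm where those reads actually come from is to run a single representative delete with I/O statistics and the actual plan captured, then roll it back; a rough sketch (the literal ID value is just a placeholder):
SET STATISTICS IO ON;
SET STATISTICS XML ON;   -- returns the actual execution plan as XML
BEGIN TRANSACTION;
DELETE FROM SomeBigTable WHERE ID = 12345;   -- placeholder key value
ROLLBACK TRANSACTION;    -- leave the production row in place
SET STATISTICS XML OFF;
SET STATISTICS IO OFF;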

Have you tried recomputing statistics on the table and clearing your proc cache?
e.g. something like this:
USE myDatabase;
GO
UPDATE STATISTICS SomeBigTable;
GO
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
It could be that SQL Server is just using the wrong index because it has a bad plan cached from when there was different data in the table.
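If flushing the entire cache feels too heavy-handed for a production box, a narrower sketch (same assumptions about the database and table names) is to refresh the statistics with a full scan and invalidate only the plans that touch this table:
USE myDatabase;
GO
UPDATE STATISTICS SomeBigTable WITH FULLSCAN;
GO
EXEC sp_recompile N'SomeBigTable';   -- marks cached plans referencing this table for recompile
GO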

Some things to try, some things to check:
Are you running a DELETE SomeBigTable WHERE ID = @Parameter statement? If so, is @Parameter of type int, or is it a different data type than the column it's compared against? (Probably not it, but I hit a situation once where a string was getting cast as Unicode, and that caused an index to be ignored.)
Make a copy of the database and mess around with it:
Try to identify which deletes cause a scan, and which do not
Is it related to the presence or absence of data in the FK-related table?
Is the foreign key trusted? (Check via sys.foreign_keys; see the sketch after this list.)
Drop the FK. Does it change anything?
Drop the second index. Does that change anything?
Might be none of these, but while messing around with them you might stumble across the real issue.
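For the trusted-foreign-key check mentioned above, a rough sketch (the revalidation statement at the end uses placeholder table and constraint names):
SELECT name, is_not_trusted, is_disabled
FROM sys.foreign_keys
WHERE parent_object_id = OBJECT_ID(N'SomeBigTable')
OR referenced_object_id = OBJECT_ID(N'SomeBigTable');
-- If a key shows is_not_trusted = 1, revalidating it would look something like:
-- ALTER TABLE dbo.SomeBigTable WITH CHECK CHECK CONSTRAINT FK_SomeBigTable_OtherTable;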

Related

Using a clustered rather than non-clustered index with columns that include date and nvarchar(50)

I have a table called 'GameTransactions'. It is critical for the table to work well in terms of performance (the table will have millions of records when the site is operational). I thought to index it. The columns that I used for the index are:
UserID [int],
TransactionID [nvarchar(50)],
ProviderID [int],
TransactionTimeStamp [datetime]
Some context on how I use the table.
At the beginning of the SQL operation I check if the transaction ID exists for the same user.
SELECT COUNT(1)
FROM GameTransactions WITH(NOLOCK)
WHERE
UserID = @UserID
AND TransactionID = @TransactionID
AND ProviderID = @ProviderID
AND TransactionTimeStamp>DATEADD(MONTH,-1,GETUTCDATE())
If the request doesn't already exist in the database, I insert it.
I chose to use the following index:
CREATE CLUSTERED INDEX IX_GameTransactions_UserID_TransactionID_ProviderID_TransactionTimeStamp
ON dbo.GameTransactions (UserID,TransactionID,ProviderID,TransactionTimeStamp);
I read in this article:
https://sqlstudies.com/2014/12/01/using-a-date-or-int-column-as-the-clustered-index/
that it is possible to achieve good performance with datetime as a column in a clustered index. I don't care about the disk space that the clustered index is going to take; I am more concerned about query speed.
I also thought about an alternative solution:
CREATE NONCLUSTERED INDEX IX_GameTransactions_UserID_TransactionID_ProviderID_TransactionTimeStamp
ON dbo.GameTransactions (UserID, Month, Year,ProviderID)
INCLUDE (TransactionID);
I could add two additional columns, Month and Year, and work with ints instead of a date. Keep in mind that the 'TransactionID' field has to be an nvarchar(50); there is no way to work around it.
I have an additional Id column which is auto-incrementing. Would such a solution work?
CONSTRAINT PK_GameTransactions PRIMARY KEY CLUSTERED (
UserID
, TransactionID
, ProviderID
, TransactionTimeStamp
, Id
)
Use EXISTS instead of COUNT to conditionally insert the row. This will be more efficient since a count is not needed. Make sure the index is unique to ensure duplicates are not possible.
Use >= instead of > for the timestamp criteria so that 2 sessions with the same timestamp don't both insert the same row, although one of them would get an error if a unique index or constraint exists.
Furthermore, consider removing NOLOCK to ensure concurrent sessions don't insert rows for the same UserID/TransactionID/ProviderID within the TransactionTimeStamp date range. I suggest SERIALIZABLE for this purpose. Example DDL is below, with the query encapsulated in a stored procedure, leveraging the primary key index for both performance and data integrity.
CREATE TABLE dbo.GameTransactions(
UserID int
, TransactionID nvarchar(50)
, ProviderID int
, TransactionTimeStamp datetime
, CONSTRAINT PK_GameTransactions PRIMARY KEY CLUSTERED (
UserID
, TransactionID
, ProviderID
, TransactionTimeStamp
)
);
GO
CREATE PROCEDURE dbo.InsertGameTransactions
@UserID int
, @TransactionID nvarchar(50)
, @ProviderID int
AS
DECLARE @TransactionTimeStamp datetime = GETUTCDATE();
INSERT INTO dbo.GameTransactions (
UserID
, TransactionID
, ProviderID
, TransactionTimeStamp
)
SELECT
@UserID
, @TransactionID
, @ProviderID
, @TransactionTimeStamp
WHERE NOT EXISTS(
SELECT 1
FROM dbo.GameTransactions WITH(SERIALIZABLE)
WHERE
UserID = @UserID
AND TransactionID = @TransactionID
AND ProviderID = @ProviderID
AND TransactionTimeStamp >= DATEADD(MONTH, -1, @TransactionTimeStamp)
);
GO
First, a clustered index has no benefit for your comparison.
Second, I strongly agree with Dan that you should be using EXISTS rather than SELECT COUNT(*) if you care about performance.
Third, you are taking the wrong message from the blog. The issue with clustered indexes is that the data is stored in-order on the data pages. When you have a clustered index, you can have a big performance bottleneck when you have to insert rows "between" other rows.
For this reason, the normal advice is to use an identity column as the clustered index key (which is the default, by the way). This is good advice, but there are other circumstances as well. For instance, newsequentialid() is a function that generates GUIDs that are suitable for a clustered index, because they are (almost always) increasing.
In your case, the first column in the index is not a date/time. So you are probably going to have fragmentation problems galore in using such a clustered index. For what you want to do, there is no reason to order the data on the data pages. Just use a regular index with all the columns you need as keys.
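A minimal sketch of that "regular index", assuming the table keeps something like an identity column as its clustered key; whether to also declare it UNIQUE depends on whether duplicates must be impossible at the storage level:
CREATE NONCLUSTERED INDEX IX_GameTransactions_UserID_TransactionID_ProviderID_TransactionTimeStamp
ON dbo.GameTransactions (UserID, TransactionID, ProviderID, TransactionTimeStamp);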

stored data sorting: nonclustered primary key overrides clustered index

I need to create a table with a nonclustered primary key (to set foreign keys on other tables to it) and a clustered index to store the data in the intended order.
However, the resulting stored data is sorted in the primary key's order as opposed to the index's.
Is there a way to prevent this from occurring? Here is an example (SQL Server 14.0 RTM):
create table dbo.a (
x nvarchar(50) not null
,y nvarchar(100) not null
,index ix_a clustered (y)
,constraint pk_a primary key nonclustered (x)
)
insert dbo.a
values
('d','p')
,('c','q');
select * from dbo.a
The result should be sorted with p first, then q. However, q is in the first row and p is in the second row.
In a similar case, this approach worked when the primary key was in 2 columns as opposed to only 1 column.
You are confused. This query:
select *
from dbo.a
does not tell you anything about the "ordering" of a table. A SQL query with no ORDER BY returns rows in an indeterminate order. I also freely admit that with a handful of rows in the table, this would be highly correlated with the actual ordering of the data, but I strongly discourage you from thinking along those lines.
If you want to know the actual ordering, you need to peek at the data pages. Or you can perhaps use an execution plan to see if an index is being used instead of a sort.
I think that what you are seeing is that SQL Server is choosing to return rows from the query using the primary key index. With two rows in the table, the actual execution plan doesn't really matter.
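If the goal is simply to get the rows back in the clustered index's order, the only guarantee is an explicit ORDER BY; an index hint can also be used purely as a diagnostic to see which index the plan reads. A sketch:
SELECT x, y
FROM dbo.a
ORDER BY y;   -- the only reliable way to get p before q
-- diagnostic only: force the clustered index and compare the plan
SELECT x, y
FROM dbo.a WITH (INDEX (ix_a))
ORDER BY y;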

Which SQL index to use on my table for best query performance

I have the following table. My stored procedure always filters on ItemId and a Created date range:
WHERE ItemId = ... AND Created > ... AND Created < ...
What would be the best design for performance?
I have a non-clustered index on ItemId:
CREATE TABLE [dbo].[LV] (
[Id] UNIQUEIDENTIFIER NOT NULL,
[ItemId] UNIQUEIDENTIFIER NOT NULL,
[C1] NVARCHAR (7) NOT NULL,
[C2] NVARCHAR (7) NOT NULL,
[C3] NVARCHAR (2) NOT NULL,
[Created] DATETIME2 (7) NOT NULL,
CONSTRAINT [PK_LV] PRIMARY KEY CLUSTERED ([Id] ASC),
CONSTRAINT [FK_LV_Items_ItemId] FOREIGN KEY ([ItemId]) REFERENCES [dbo].[Items] ([Id]) ON DELETE CASCADE
);
GO
CREATE NONCLUSTERED INDEX [IX_LV_ItemId]
ON [dbo].[LV]([ItemId] ASC);
Should I add indexes to ItemId and Created?
Non-clustered or clustered?
For the query you have in mind, you want a composite index on (itemId, created).
This works because the condition on ItemId is an equality, so the range condition on Created can use the second key column of the index.
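A sketch of that composite index, using the column names from the question; an INCLUDE list could be added for whatever columns the procedure actually selects, to avoid key lookups:
CREATE NONCLUSTERED INDEX IX_LV_ItemId_Created
ON dbo.LV (ItemId, Created);
-- optionally: INCLUDE (C1, C2, C3) if the procedure returns those columns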
A clustered index could help, depending on the nature of the data. If an item is only stored one or two or three times in the table, then a clustered index will probably not be of much use. If an item is stored many times, then the rows for the item will be spread through the table and a clustered index would help.
There are even caveats to that. If the table is used frequently and is fully cached in memory, then a clustered index would help but not be as beneficial as when the table is too big to fit into available memory.
If your only performance concern is that stored procedure, then yes, you should make the clustered index on ItemId and Created.
I agree with Tab's answer (yes, cluster it), but would add that to tighten your design, you might look a little deeper and consider making this combination a primary key or, if you can't, understand why not. There's a nice write-up of the logic behind this in this Stack Overflow post.
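One possible shape for the "cluster it" suggestion, assuming nothing else depends on PK_LV being the clustered index (this rebuilds the table, so schedule it accordingly):
ALTER TABLE dbo.LV DROP CONSTRAINT PK_LV;   -- currently the clustered PK on Id
-- cluster on the search columns; Id is appended only to make the key unique
CREATE UNIQUE CLUSTERED INDEX CIX_LV_ItemId_Created
ON dbo.LV (ItemId, Created, Id);
-- keep Id as a (now nonclustered) primary key for row identity and foreign keys
ALTER TABLE dbo.LV ADD CONSTRAINT PK_LV PRIMARY KEY NONCLUSTERED (Id);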

Adding a Primary Key and Altering a Column to DateTime2

I have a table that tracks statuses that a particular file goes through as it is checked over by our system. It looks like this:
FileID int
Status tinyint
TouchedBy varchar(50)
TouchedWhen datetime
There is currently NO primary key on this table; however, there is a clustered index on Status and TouchedWhen.
As the table has continued to grow and query performance has decreased, one thought I've had is to add a primary key so that I get off the heap lookups -- a primary key on FileID, Status and TouchedWhen.
The problem I'm running into is that TouchedWhen, due to its rounding issues, has, on occasion, 2 entries with the exact same datetime.
So then I started researching what it takes to convert that to a datetime2(7) and alter those that are duplicate at that time. My table would then look like:
FileID int
Status tinyint
TouchedBy varchar(50)
TouchedWhen datetime2(7)
And a primary key on FileID, Status and TouchedWhen.
My question is this -- what is the best way to go through and add a millisecond to the existing rows if there are duplicates? How can I do this to a table that needs to remain online?
In advance, thanks,
Brent
You shouldn't need to add a primary key to make queries faster - just adding an index on FileID, Status, TouchedWhen will have just as much of a performance impact as adding a primary key. The main benefit of defining a primary key is record identity and referential integrity, which could be accomplished with an auto-increment primary key.
(I'm NOT saying you shouldn't have a primary key, I'm saying the performance impact of a primary key is in the index itself, not the fact that it's a primary key)
On the other hand, changing your clustered index to include FileID would likely have a bigger impact as lookups using those columns would not need to search the index then look up the data - the data pages would be right there with the index values.
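A sketch of that clustered index change; the table and existing index names below are placeholders since the question doesn't give them:
DROP INDEX IX_Status_TouchedWhen ON dbo.FileStatusHistory;   -- placeholder names
CREATE CLUSTERED INDEX CIX_FileStatusHistory_FileID_Status_TouchedWhen
ON dbo.FileStatusHistory (FileID, Status, TouchedWhen);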

Why am I causing a Clustered Index Update?

I'm executing the following statement:
UPDATE TOP(1) dbo.userAccountInfo
SET Flags = Flags | @AddValue
WHERE ID = @ID;
The column 'ID' is an INT PRIMARY KEY with IDENTITY constraints.
Flags is a BIGINT NOT NULL.
The execution plan indicates that a Clustered Index Update is occurring - a very expensive operation.
There are no indexes covering Flags or ID, except for the primary key.
I feel like the actual execution plan should be:
Clustered Index Seek => Update
Tables come in two flavors: clustered indexes and heaps. You have a PRIMARY KEY constraint, so you have implicitly created a clustered index. You'd have to go to extra lengths during the table create for this not to happen. Any update of the 'table' is an update of the clustered index, since the clustered index is the table.
As for the clustered index update being a 'very expensive operation', that is an urban legend built on basic misinformation about how a database works. The correct statement is 'a clustered index update that affects the clustered key has to update all the non-clustered indexes'.
The clustered index is the physical table, so whenever you update any row, you're updating the clustered index.
See this MSDN article
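A minimal repro sketch of that point, assuming a simplified userAccountInfo with only the clustered primary key: the plan is a Clustered Index Seek feeding a narrow Clustered Index Update, and because Flags is not part of any index key, no non-clustered index maintenance is needed:
CREATE TABLE dbo.userAccountInfo
(
ID int IDENTITY(1,1) NOT NULL CONSTRAINT PK_userAccountInfo PRIMARY KEY CLUSTERED,
Flags bigint NOT NULL CONSTRAINT DF_userAccountInfo_Flags DEFAULT (0)
);
DECLARE @ID int = 1, @AddValue bigint = 4;
UPDATE TOP (1) dbo.userAccountInfo
SET Flags = Flags | @AddValue
WHERE ID = @ID;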