Why am I causing a Clustered Index Update? - sql

I'm executing the following statement:
UPDATE TOP(1) dbo.userAccountInfo
SET Flags = Flags | @AddValue
WHERE ID = @ID;
The column 'ID' is an INT PRIMARY KEY with the IDENTITY property.
Flags is a BIGINT NOT NULL.
The execution plan indicates that a Clustered Index Update is occurring, which is a very expensive operation.
There are no indexes covering Flags or ID, except for the primary key.
I feel like the actual execution plan should be:
Clustered Index Seek => Update

Tables come in two flavors: clustered indexes and heaps. You have a PRIMARY KEY constraint, so you have implicitly created a clustered index. You would have to go to extra lengths when creating the table for this not to happen. Any update of the 'table' is an update of the clustered index, since the clustered index is the table.
As for the clustered index update being a 'very expensive operation', that is an urban legend rooted in basic misinformation about how a database works. The correct statement is 'a clustered index update that affects the clustered key has to update all the non-clustered indexes'.
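One way to check this is to look at the table's index metadata. The sketch below assumes a table definition reconstructed from the question (the exact DDL was not posted, so the CREATE TABLE is hypothetical) and lists its indexes from sys.indexes:

```sql
-- Hypothetical reconstruction of the table described in the question.
CREATE TABLE dbo.userAccountInfo
(
    ID    INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,  -- PRIMARY KEY is clustered by default
    Flags BIGINT NOT NULL
);

-- List the indexes on the table. The primary key should appear with
-- type_desc = 'CLUSTERED', i.e. the index is the table itself.
SELECT i.name, i.type_desc, i.is_primary_key
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.userAccountInfo');
```

If the only row returned is the clustered primary key, then any UPDATE against the table will necessarily show up in the plan as a Clustered Index Update.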

The clustered index is the physical table, so whenever you update any row, you're updating the clustered index.
See this MSDN article

Related

Understanding how primary key columns are included in a non-clustered index

Assume I have a table called 'demo' with 4 columns; 'a', 'b', 'c' and 'd'. The primary key clustered index for the 'demo' table contains columns 'a' and 'b' in that order.
The 'Actual Execution Plan' from a query referencing table 'demo' has suggested that a new non-unique non-clustered index is required for column 'b' and should include column 'a'.
If I create a non-unique non-clustered index on column 'b' do I need to include column 'a' or will it already be part of the non-clustered index because it is in the primary key?
If primary key column 'a' is already part of the non-clustered index, is column 'a' stored as an include column or is it part of the non-clustered key?
The 'Actual Execution Plan' from a query referencing table 'demo' has
suggested that a new non-unique non-clustered index is required for
column 'b' and should include column 'a'.
...
If primary key column 'a' is already part of the non-clustered index,
is column 'a' stored as an include column or is it part of the
non-clustered key?
In your case, column a will be present at all levels of the non-clustered index as part of the clustered index key. The suggested index is non-unique, so it needs a uniquifier, and the clustered index key is used for this purpose.
If the suggested index were unique, column a would be stored only at the leaf level of the index, as part of the row locator, which for a clustered table is the clustered index key.
Column a will not be stored twice if you add it explicitly as an included column, so I advise you to include it. It will make a difference if one day someone decides to turn your clustered table into a heap (by dropping the clustered index). In that case, if you did not include column a explicitly, it will no longer be contained in your non-clustered index.
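As a minimal sketch of the explicit variant described above, assuming the table lives in the dbo schema and using a made-up index name (IX_demo_b):

```sql
-- Non-unique non-clustered index on b, with a listed explicitly as an included column.
-- The index name and the dbo schema are assumptions for illustration.
CREATE NONCLUSTERED INDEX IX_demo_b
    ON dbo.demo (b)
    INCLUDE (a);

-- Show the columns that were explicitly declared for the index.
-- Columns carried implicitly through the clustered key are not listed here.
SELECT c.name AS column_name, ic.key_ordinal, ic.is_included_column
FROM sys.index_columns AS ic
JOIN sys.indexes AS i
    ON i.object_id = ic.object_id AND i.index_id = ic.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.demo')
  AND i.name = N'IX_demo_b';
```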
Including column a in the non-clustered index is unnecessary, since it is part of the clustered index key and is therefore already part of the data in the leaf pages of the non-clustered index. With a query like this
SELECT a FROM tab WHERE b = <value>
the value of a will naturally be part of the leaf data in the non-clustered index.
The PK fields are always part of the key of the index, not part of the included columns.
What I'm thinking here is perhaps it wants to seek by column B; that's something that it can only do if column B is the first key in the index. If you define an index with column B first, followed by column A, perhaps it'll be able to do just that. It seems that it'll be happy as long as both keys are in the index, as you have a compound PK, though they may currently be in the wrong order (first A, then B) thereby preventing a seek.
Reference on PK fields automatically showing up in indexes: https://www.brentozar.com/archive/2013/07/how-to-find-secret-columns-in-nonclustered-indexes/
Try this and look at the execution plan. You can see that the database uses only the index. So, as far as I know, you don't need to include column A in your index (since, as you said, the clustered index key is already included).
CREATE TABLE DEMO (COLA VARCHAR(10) NOT NULL, COLB VARCHAR(10) NOT NULL, COLC VARCHAR(10), COLD VARCHAR(10));
ALTER TABLE DEMO ADD CONSTRAINT DEMO_PK PRIMARY KEY (COLA, COLB);
CREATE INDEX DEMO_IX1 ON DEMO (COLB);
INSERT INTO DEMO VALUES ('A','B','C','D');
INSERT INTO DEMO VALUES ('A1','B1','C1','D1');
INSERT INTO DEMO VALUES ('A2','B2','C2','D2');
SELECT COLA,COLB FROM DEMO WHERE COLB='B1'
Non-clustered indexes implicitly include the clustered index keys automatically.
In the documentation you could get a lot of information about this, but especially this part explains exactly this:
Nonclustered Index Architecture
The leaf layer of a nonclustered index is made up of index pages
instead of data pages. The row locators in nonclustered index rows are
either a pointer to a row or are a clustered index key for a row.
If your table is a heap, then the row locator would point directly to the data row that contains the key value but if your table is not a heap (which is the case, because you have already a clustered key on that table) then the row locator points to the clustered index key.
Take a look at clustered and nonclustered indexes described as well.
This thread discusses the same: Necessary to include clustered index columns in non-clustered indexes?

Should I add a non-clustered index to a table with a clustered index, based on this WHERE clause and join?

I have a parent table and a child table.
The parent table has a clustered index as primary key with increment value (ParentID). The child table also has a clustered index as primary key with increment value (ChildID)
Primary key Parent.parentID is in relation with child.parentID as a foreign key.
I join those two tables based on following query.
Select ....
Join on parent.parentID = child.parentID
where parent.personalNumber = 197608134356 <-- varchar
Now, should I
Add a non-clustered index on parent.personalNumber, as it is in the WHERE clause?
Add a non-clustered index on the foreign key child.parentID to speed up the join?
It would mean I put non clustered index over a clustered index table.
I expect a lot of rows on both parent and child over time. There will be inserts and selects. No updates or deletes
Thanks
/s
You can have only one clustered index on a table, and you already have one on ParentID for the Parent table and one on ChildID for the Child table. Both are incrementing values, which is good, and both are primary keys, which is good too (though not mandatory: you could choose to put your clustered index on other columns and a non-clustered index on your PK).
Your design looks fine. You should add non-clustered indexes on your search columns (parent.personalNumber and others, if any) and on the foreign keys; it usually helps.
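A minimal sketch of those two indexes, assuming the tables are named dbo.Parent and dbo.Child (the index names are made up for illustration):

```sql
-- Supports the WHERE clause on parent.personalNumber.
CREATE NONCLUSTERED INDEX IX_Parent_personalNumber
    ON dbo.Parent (personalNumber);

-- Supports the join from parent to child on the foreign key column.
CREATE NONCLUSTERED INDEX IX_Child_parentID
    ON dbo.Child (parentID);
```

Both tables keep their existing clustered primary keys; the non-clustered indexes are additional structures that use the clustered key as their row locator.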
Use clustered index for PK unless you have a compelling reason not to.
That join can still use a clustered composite PK on child
Check the execution plan - I would be very surprised if that clustered index was not used by the join
An index on the personalNumber should aid the where
If you are designing for high performance, with many concurrent clients, all inserting into your table at the same time - you should not have a clustered index based on a PK that is also an identity column. See this article for an explanation. To do so creates a hotspot in your table which can adversely impact performance.
There is almost always a more appropriate column to base a clustered index on in a table with an identity column as a PK.

Clustered and covering index ignored on delete statement. Table scan occurs

Why would SQL Server 2005 find it more efficient to perform a table scan instead of using the available clustered index on the primary key (and only the primary key)?
DISCLAIMER:
There is also a non-clustered, non-unique index on the primary key with no included columns. This is baffling to me and we've had a good office chuckle already. If this index ends up being the problem, then we know who to shoot. Unfortunately, it's a production site and I can't just rip it out but will make plans to do so if necessary.
Maybe the problem is not the mentally deficient contrary index, however...
According to FogLight PASS the following statement has been causing a scan on a table with ~10 million rows about 600 times an hour when we delete a row by the primary key:
DELETE FROM SomeBigTable WHERE ID = @ID
The table DDL:
CREATE TABLE [SomeBigTable]
(
[ID] [int] NOT NULL,
[FullTextIndexTime] [timestamp] NOT NULL,
[FullTextSearchText] [varchar] (max) NOT NULL,
CONSTRAINT [PK_ID] PRIMARY KEY CLUSTERED
(
[ID] ASC
)
) -- ...
ON [PRIMARY]
The clustered index constraint in detail:
ADD CONSTRAINT [PK_ID] PRIMARY KEY CLUSTERED
(
[ID] ASC
) WITH (PAD_INDEX = OFF
,STATISTICS_NORECOMPUTE = OFF
,SORT_IN_TEMPDB = OFF
,IGNORE_DUP_KEY = OFF
,ONLINE = OFF
,ALLOW_ROW_LOCKS = ON
,ALLOW_PAGE_LOCKS = ON
,FILLFACTOR = 75)
ON [PRIMARY]
The non-unique, non-clustered index on the same table:
CREATE NONCLUSTERED INDEX [IX_SomeBigTable_ID] ON [SomeBigTable]
(
[ID] ASC
) WITH (PAD_INDEX = OFF
,STATISTICS_NORECOMPUTE = OFF
,SORT_IN_TEMPDB = OFF
,IGNORE_DUP_KEY = OFF
,ONLINE = OFF
,ALLOW_ROW_LOCKS = ON
,ALLOW_PAGE_LOCKS = ON
,FILLFACTOR = 98)
ON [PRIMARY]
There is also a foreign key constraint on the [ID] column pointing to an equally large table.
The 600 table scans are about ~4% of the total delete operations per hour on this table using the same statement. So, not all executions of this statement cause a table scan.
It goes without saying, but saying it anyway...this is a lot of nasty I/O that I'd like to send packing.
Have you tried recomputing statistics on the table and clearing your proc cache?
e.g. something like this:
USE myDatabase;
GO
UPDATE STATISTICS SomeBigTable;
GO
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
It could be that SQL Server is just using the wrong index because it has a bad plan cached from when there was different data in the table.
Some things to try, some things to check:
Are you running a DELETE SomeBigTable WHERE ID = @Parameter statement? If so, is @Parameter of type int, or is it a different datatype than the column being deleted? (Probably not it, but I hit a situation once where a string was getting cast as Unicode, and that caused an index to be ignored.)
Make a copy of the database and mess around with it:
Try to identify which deletes cause a scan, and which do not
Is it related to the presence or absence of data in the FK-related table?
Is the foreign key trusted? (Check via sys.foreign_keys.)
Drop the FK. Does it change anything?
Drop the second index. Does that change anything?
Might be none of these, but while messing around with them you might stumble across the real issue.
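The foreign key trust check mentioned above could look roughly like this. It is only a sketch: the dbo schema is an assumption, and it lists the foreign keys declared on SomeBigTable itself:

```sql
-- is_not_trusted = 1 means the constraint has not been verified against
-- existing rows, so the optimizer cannot rely on it.
SELECT fk.name,
       fk.is_not_trusted,
       fk.is_disabled,
       OBJECT_NAME(fk.referenced_object_id) AS referenced_table
FROM sys.foreign_keys AS fk
WHERE fk.parent_object_id = OBJECT_ID(N'dbo.SomeBigTable');
```

To see foreign keys in other tables that point at SomeBigTable instead, filter on fk.referenced_object_id rather than fk.parent_object_id.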

Drop Default Clustered Index Without Knowing Its Name

I wrote a script that creates some tables (it previously drops them if they exist) and then tries to create two indexes on each table.
The first index uses the primary key column to create a non-clustered index, and the second uses another column to create the clustered index. This is because the primary key column is a GUID instead of an int.
How can I drop the default index if I don't know its name? Or how can I specify a name for the primary key column index so I can drop it? Or better yet, how can I specify the two indexes I need right in the CREATE TABLE statement?
SELECT * FROM sys.indexes
However, I'm not understanding where in your process you actually have to drop an index.
You said you are creating some tables and then creating two indexes on each table.
If you are DROPping existing tables at the beginning, any indexes are automatically dropped.
There is no such thing as a default index.
Tables can either be heaps or clustered indexes. If you drop the clustered index, the table will be converted to a heap and any non-clustered indexes will have to be updated to point to the data in the unordered heap.
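If the name of an auto-generated primary key constraint really does need to be found, it can be looked up in the catalog views. The following is a sketch under assumptions: dbo.YourTable is a placeholder table name, and the constraint is dropped through dynamic SQL because its name is only known at run time:

```sql
DECLARE @pkName sysname;
DECLARE @sql nvarchar(max);

-- Look up the primary key constraint on the (placeholder) table.
SELECT @pkName = kc.name
FROM sys.key_constraints AS kc
WHERE kc.parent_object_id = OBJECT_ID(N'dbo.YourTable')
  AND kc.type = 'PK';

-- Drop it only if one was found; QUOTENAME guards against odd characters in the name.
IF @pkName IS NOT NULL
BEGIN
    SET @sql = N'ALTER TABLE dbo.YourTable DROP CONSTRAINT ' + QUOTENAME(@pkName) + N';';
    EXEC sys.sp_executesql @sql;
END
```

Naming the constraint explicitly when the table is created avoids the lookup altogether.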
You can create like this all at once:
CREATE TABLE dbo.tbl
(
Id int NOT NULL IDENTITY (1, 1) CONSTRAINT UK_ID UNIQUE CLUSTERED,
SomeUUID UNIQUEIDENTIFIER NOT NULL CONSTRAINT PK_SomeUUID PRIMARY KEY NONCLUSTERED
)
Here's a SQLFiddle: http://sqlfiddle.com/#!6/d759e/12
You can define the two indices right after you create the table:
CREATE TABLE dbo.YourTable ( ...... )
GO
ALTER TABLE dbo.YourTable
ADD CONSTRAINT PK_YourTable PRIMARY KEY NONCLUSTERED (YourGuidColumn)
-- NONCLUSTERED above is crucial! Otherwise, your PK will be clustered.
CREATE CLUSTERED INDEX IX01_YourTable ON dbo.YourTable(YourOtherColumn)
or even better:
CREATE UNIQUE CLUSTERED INDEX IX01_YourTable ON dbo.YourTable(YourOtherColumn)
That should create a non-clustered primary key and a separate (preferably unique) clustered index on a separate column.

What are the advantages of a non-clustered index over a primary key (clustered index)?

I have a table (it stores forum data, so normally there are only inserts, no edits or updates) with a primary key column, which, as we know, is a clustered index.
Please tell me, will I get any advantage if I create a non-clustered index on that column (the primary key column)?
EDIT: My table currently has around 60,000 records. Which is better: adding a non-clustered index to it, or creating a new identical table, creating the index there, and then copying the records from the old table to the new one?
Thanks
Every table should have a clustered index
Non-clustered indexes allow INCLUDEs which is very useful
Non-clustered indexes allow filtering in SQL Server 2008+
Notes:
Primary key is a constraint which happens to be a clustered index by default
One clustered index only, many non-clustered indexes
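To illustrate the INCLUDE and filtering points, here is a small sketch. The table dbo.ForumPost and all of its columns are hypothetical, since the question does not show the schema; the filtered index requires SQL Server 2008 or later:

```sql
-- Hypothetical forum table; the primary key is the clustered index by default.
CREATE TABLE dbo.ForumPost
(
    PostId     INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    ThreadId   INT NOT NULL,
    Title      NVARCHAR(200) NOT NULL,
    PostedAt   DATETIME NOT NULL,
    IsAnswered BIT NOT NULL
);

-- Non-clustered index with included columns: a query that filters on ThreadId
-- and reads only Title and PostedAt can be answered from this index alone.
CREATE NONCLUSTERED INDEX IX_ForumPost_ThreadId
    ON dbo.ForumPost (ThreadId)
    INCLUDE (Title, PostedAt);

-- Filtered non-clustered index: only rows with IsAnswered = 0 are stored.
CREATE NONCLUSTERED INDEX IX_ForumPost_Unanswered
    ON dbo.ForumPost (PostedAt)
    WHERE IsAnswered = 0;
```

Neither option is available on the clustered index itself, which always contains every column of every row.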
One advantage: you can INCLUDE other columns in the index.
A clustered index specifies the physical storage order of the table data (this is why there can only be one clustered index per table).
If there is no clustered index, inserts will typically be faster since the data doesn't have to be stored in a specific order but can just be appended at the end of the table.
On the other hand, index searches on the key column will typically be slower, since the searches cannot use the advantages of the clustered index.
The only possible advantage that I can see comes from the fact that the entries on the leaf pages of a non-clustered index are not as wide. They contain only the index columns, while the clustered index's leaf pages are the actual rows of data. Therefore, if you need something like select count(your_column_name) from your_table, scanning the non-clustered index will involve a considerably smaller number of data pages. Likewise, if the index has more than one column and you run a query that does not need data from non-indexed columns, a non-clustered index scan will again be faster.