Clustered Index or Non clustered index on non primary key column? - sql

In my situation I have a table (well many tables) that are using an identity column as their primary key. They will also contain a unique int column RecordID. 90% of the time RecordID will be used to search for a record. The only reason the ID identity column even exists is to keep things consistent across our system. In this case should I drop the clustered index from the ID column and add it to the RecordID column? Then add a non-clustered index to the primary key ID in the rare case it will be used to get a record. Thanks!!

If your queries are singleton seek on RecordID (ie. WHERE RecordID=...) then I wouldn't change it. Is true that a clustered index on it will be faster, but would only be noticeable on really really hot cases.
What I would consider for a change it would be if you have range queries (BETWEEN, < or >). Range scans could benefit more significantly from clustered index, as a non-clustered index may be subject to the index tipping point.
Another thing to consider is if you have sort requirements that could be satisfied by this clustered index (ORDER BY in queries, or GROUP BY, or ranking functions like ROW_NUMBER with ORDER by clause). A clustered index would help these better.

Related

Why PrimaryKey NonClustered Index and Separate Clustered Index on Same PrimaryKey Columns are Designed.?

Tables are configured where PrimaryKey has been created as Non Clustered Index and with an Additional Clustered Index of the same Columns as of PrimaryKey Columns.
Index Info Sample of this Pattern:
Is there any specific reason why such design is implemented?
And what is the best way to optimize the indexes using Ola Hallengren Scripts?
Non-clustered index is nothing but a smaller table with smaller IO footprint. As you don't have included columns in the non-clustered index, it is just a smaller table, just having OrderID values. Ideally, it is not required.
But, say, Somebody is querying just OrderId, then the non-clustered index will be used, as there is less IO footprint here.
SELECT OrderiD from Orders
But, if somebody is querying other columns in the table, then clustered index will be used.
SELECT * FROM Orders
But, It all depends on the use cases. But, generally, I would suggest to keep smaller number of indexes for easier maintenance.
For example, in the above scenario, whenever a new OrderID is coming, it has to be inserted twice:
In the actual data pages, organized by clustered index key
In the non-clustered index page

Are there performance differences in queries with UNIQUE NON NULL indexes and Primary keys?

I want to search a DB with either the PK or a unique non null field that is indexed. Are there any performance differences between those? I am using Postgres as my DB. But a general DB-independent answer would be good too.
In postgreSQL, all indexes are secondary or unclustered indexes. That means the the index points to the heap, the data structure holding the actual column data. So, a primary key's index doesn't have any structural advantage over a UNIQUE index: SELECTs using the index for filtering must then bounce over to the heap for the data.
In fact, it might be the other way around, because postgreSQL indexes can have INCLUDES clauses.
For example consider a table with uniqueid, a, b, and c columns. If your workload is heavy with SELECT b FROM tbl WHERE uniqueid = something queries, you can declare this covering index.
CREATE UNIQUE INDEX uniq ON tbl(uniqueid) INCLUDE (b);
Your whole query can then be satisfied from the index. That saves the extra trip to the heap, and so saves IO and CPU time.
MySQL and SQL Server, on the other hand, use clustered indexes for their primary keys. That is, the table's data is stored in the primary key's index. So, the PK is, automatically, basically an index created like this.
CREATE UNIQUE INDEX pk ON tbl(uniqueid) INCLUDE (a, b, c);
In those databases the PK's index does have an advantage over a separate UNIQUE index, which necessarily is a secondary or unclustered index. (Note: MySQL's indexes don't have INCLUDE() clauses.)

If a table has 'id' column as it's clustered index, is there any benefit in adding 'id' column as an included column in any other non-clustered index

If a table has 'id' (the primary key) column as its clustered index, is there any benefit in adding 'id' column as an included column in any other non-clustered index in Microsoft SQLServer?
eg:- Table 'xyz'
id
name
status
date
1
abc
active
2021-06-23
CREATE NONCLUSTERED INDEX [NonClusteredIndex_status_Date]
ON [xyz]
(
[status] ASC,
[date] ASC
)
INCLUDE
( [id],
[name]
)
And this non-clustered index is targeted for a query similar to bellow on a large data set. In the actual case there could be some other queries as well.
select * from xyz where status='active' and date > '2021-06-20'
The answer to your question is essentially No, there is no benefit.
When you create a non-clustered index on a table, each row in the index needs to be able to point to the row in the base table.
If the base table is a heap, each row in the index will contain a pointer to the rid (row identifier) which is what SQL Server uses to uniquely identify each row.
When the table is defined with a clustered index, every non-clustered index will automatically contain the clustered index column(s) as keys to the row in the base table.
You can see this in an execution plan, where a non-clustered index is used and SQL Server has to retrieve additional columns from the base table; if the table is a heap, it will be an RID lookup, if the table has a clustered index it will be a Key lookup.
Additionally, if the clustered index is not unique, SQL Server adds its own uniquifier value to ensure uniqueness, and this is also included in non-clustered indexes.
So when it comes to non-clustered indexes, it does not matter if you specify the clustered index column(s) - you can, and there is no harm in doing so, but it/they are always included.
This answer assumes you are planning to run the following query:
SELECT * FROM xyz WHERE status = 'active' AND date > '2021-06-20';
If you only created a non clustered index on (status, date), then it would cover the WHERE clause, but not the SELECT clause. What this means is that SQL Server might choose to use the index to find the matching records in the query. But when it gets to evaluating the SELECT clause, it would be forced to seek back to the clustered index to find the values for the columns not included in the index, other than the id clustered index column (which includes the name column in this case). There is performance penalty in doing this, and SQL Server might, depending on your data, even choose to not use the index because it does not completely cover the query.
To mitigate this, you can define the index in your question, where you include the name value in the leaf nodes. Note that id will be included by default in the index, so we do not need to explicitly INCLUDE it. By following this approach, the index itself is said to completely cover the query, meaning that SQL Server may use the index for the entire query plan. This can lead to fast performance in many cases.

SQL: Add Primary Key to Non-Unique Index

Let's say a query is filtering on two fields and returning primary key values.
SELECT RowIdentifier
FROM Table
WHERE QualifierA = 'exampleA' AND QualifierB = 'exampleB'
Assuming the clustered index is not the PrimaryKey would a non-unique index that contains QualifierA and QualiferB be best served via the addition of the RowIdentifier(Scenario A & Scenario B). Or would it be more appropriate to simply include it(Scenario C)?
Scenario A: Non-Unique, Non-Clustered
CREATE NONCLUSTERED INDEX IX_Table_QualifierA
ON [dbo].[Table] ([QualifierA],[QualifierB],[RowIdentifier])
Scenario B: Unique, Non-Clustered
CREATE UNIQUE NONCLUSTERED INDEX IX_Table_QualifierA
ON [dbo].[Table] ([QualifierA],[QualifierB],[RowIdentifier])
Scenario C:
CREATE NONCLUSTERED INDEX IX_Table_QualifierA
ON [dbo].[Table] ([QualifierA],[QualifierB])
INCLUDE ([RowIdentifier])
Finally I'm assuming that if the PrimaryKey were the clustered index that neither is necessary, is this accurate?
If there is a CLUSTERED index, it is automatically included in all indexes on the table. You can explicitly include it but it is not required.
The UNIQUE index simply enforces uniqueness. The PK should already have this constraint. You do not need to re-enforce it in every index.
If you are including the PK in your where clause, it will almost certainly use the PK index to find that row because it is guaranteed to return the fewest results, so including in your index gains you nothing for lookups. It could also potentially skew the cardinality engine and make SQL think the index is more distinct than it really is.
For the above reasons, I would select Option C
CREATE NONCLUSTERED INDEX IX_Table_QualifierA
ON [dbo].[Table] ([QualifierA],[QualifierB])
INCLUDE ([RowIdentifier])
I would use this regardless of what column is clustered. This will give you the performance, insure the index will continue to perform regardless of the CLUSTERED INDEX, and make it explicit what the index is used for.
I'm wondering what's more appropriate? A non-clustered unique index incorporating all three fields, or a non-clustered non-unique index incorporating just the two fields(QualifierA & QualifierB) but including the PrimaryKey.
There's a third option. A non-clustered, non-unique index incorporating all three fields.
When you make an index, the fields in the index are duplicated to another place in memory so the server can go after those fields with ease. If you only have QualiferA and Qualifier B in the index it will find the rows in that index that meet your criteria and then go back to the main table to pick up the RowIdentifier. Instead, include all three in there to improve performance.
Remember, make sure you put QualifierA and QualifierB before RowIdentifier in your index. The order of the columns determine how the data is ordered.
Try it out with some test data if you like, and look at the query plan to see what it's doing.

What is advantages of non clustered index over primary key (clustered index)

i have got a table (stores data of forum, means normally no edit and update just insert) on which i have a primary key column which is as we know a clustered index.
please tell me, will i get any advantage if i creates a non-clustered index on that column (primary key column)?
EDIT: my table has got currently around 60000 records, what will be better to place non-clustered index on it or create a same new table and create index and then copy records from old to new table.
Thanks
Every table should have a clustered index
Non-clustered indexes allow INCLUDEs which is very useful
Non-clustered indexes allow filtering in SQL Server 2008+
Notes:
Primary key is a constraint which happens to be a clustered index by default
One clustered index only, many non-clustered indexes
One advantage: you can INCLUDE other columns in the index.
A clustered index specifies the physical storage order of the table data (this is why there can only be one clustered index per table).
If there is no clustered index, inserts will typically be faster since the data doesn't have to be stored in a specific order but can just be appended at the end of the table.
On the other hand, index searches on the key column will typically be slower, since the searches cannot use the advantages of the clustered index.
The only possible advantage that I can see could be from the fact that the entries on leaf pages of nonclustered index are not as wide. They only contain index columns while the clustered index' leaf pages are the actual rows of data. Therefore, if you need something like select count(your_column_name) from your_table then scanning the nonclustered index will involve considerably smaller number of data pages. Or if the number of index columns is greater than one and you run any query which does not need data from non-indexed columns then again, nonclustered index scan will be faster.