Non-clustered index use in SQL Server

Can anyone tell me what the use of non-clustered indexes in SQL Server is?
As far as I know, both clustered and non-clustered indexes make searching easier.

One use is that you can only have one clustered index on a table. If you want more than one, the rest have to be non-clustered.

The others seem to have all touched on the same points, so I'll keep it short and provide a resource where you can get more information on this.
A clustered index is the table, and it (obviously) includes all columns. That may not always be what is needed and can be a hindrance when there are many rows of data in your result set. You can utilize a non-clustered index (effectively a copy of part of the table) to "cover" your query so that you can get a quicker response time.
Please check out this free video from world-class DBA, Brent Ozar: https://www.brentozar.com/training/think-like-sql-server-engine/
Good luck!

The classic example explaining the difference is one of a phone book. The phone book, how it's physically structured from start to finish by Last Name (I think, it's been a while since I looked at a physical phone book) is analogous to the clustered index on a table. You can only have one clustered index on a table. In fact, the clustered index IS the table; it is how it's physically stored on disk. The structure of the clustered index contains the keys you define, plus ALL the data as well. Side note, in SQL, you don't HAVE to have a clustered index at all; such a table is called a "Heap", but that's rarely a good idea.
A nonclustered index by example would be if, say, you wanted to look up someone's entry in the phone book by address. You'd have an index at the back of the book with addresses sorted alphabetically, and then where in the phone book you can find that phone number. Doing this is called a "lookup". So a nonclustered index has:
The keys you want to index (e.g. Address)
A pointer back to the row in the clustered index (the last name of the person at that address)
Optionally a list of included columns you might frequently need, but not want to have to go back to the clustered index to look up.
Whereas a clustered index contains ALL the data for each row, a nonclustered index is generally smaller because you only have your keys, your pointer and optionally included columns. You can also have as many of them as you want.
As far as how they return data, they're pretty similar, especially if you never have to do a lookup to the clustered index. A query which can get everything it needs from a nonclustered index is said to be "covered" (in that all the stuff you need is covered by the nonclustered index). Also, because a clustered index keeps the rows themselves in key order, range-based queries are faster: the engine seeks to the start of the range and then reads the following rows in order until it reaches the end of the range.
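The phone-book analogy above can be sketched in T-SQL roughly as follows. This is a minimal illustration, not a recommended schema: the table, column, and index names are all made up for the example, and the INCLUDE clause assumes SQL Server 2005 or later.

```sql
-- Hypothetical phone-book table; every name here is illustrative.
CREATE TABLE dbo.PhoneBook
(
    LastName    VARCHAR(50)  NOT NULL,
    FirstName   VARCHAR(50)  NOT NULL,
    Address     VARCHAR(100) NOT NULL,
    PhoneNumber VARCHAR(20)  NOT NULL,
    Notes       VARCHAR(200) NULL
);

-- The clustered index: the table itself, stored in (LastName, FirstName) order.
CREATE CLUSTERED INDEX CIX_PhoneBook_Name
    ON dbo.PhoneBook (LastName, FirstName);

-- The "index at the back of the book": keyed on Address, with PhoneNumber
-- carried along as an included column.
CREATE NONCLUSTERED INDEX IX_PhoneBook_Address
    ON dbo.PhoneBook (Address)
    INCLUDE (PhoneNumber);

-- Covered: Address is the index key, PhoneNumber is included, and the
-- clustering key (LastName, FirstName) is stored in the nonclustered index.
SELECT LastName, FirstName, PhoneNumber
FROM dbo.PhoneBook
WHERE Address = '12 Example Street';

-- Not covered: Notes exists only in the clustered index, so each matching
-- row needs a lookup back into it.
SELECT LastName, FirstName, Notes
FROM dbo.PhoneBook
WHERE Address = '12 Example Street';
```

Comparing the execution plans of the two SELECT statements should show whether the second one needs the extra lookup step; the optimizer may also choose a scan on a very small table.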

Clustered index is how the data for each row of the table is physically stored on disk (you can only have one of these index types per table), so all write operations' performance is based off of this index. And if you have to rebuild this index or move stuff around on this index, that can be very expensive.
Nonclustered indexes are just a listing of specific parts of the rows in a different order than how they are physically stored (you can have multiple of these index types per table), and a pointer to where it is actually stored. Nonclustered indexes are used to make it easy to find a specific row when you only know certain info about that row.
If you think about a typical text book as a database table, the clustered index is the set of actual pages of content for that book. Because logically it makes sense to write those pages in that order.
And a nonclustered index is the index in the back of the book that lists the important terms in alphabetical order. It just lists the word you are looking for and the page number where you can find it. This makes it extremely easy to find what you need to read when you are looking for a specific term.
Typically it is a good idea to make your clustered index an id that follows the NUSE principle (Narrow, Unique, Static, Ever increasing). Typically, you would accomplish this with a SMALLINT, INT, or BIGINT depending on the amount of data you want to store in the table. This gives you a narrow key because they are only 2, 4, or 8 bytes wide (respectively), you would also probably want to set the IDENTITY property for that column so that it auto increments. And if you never change this value for a row (making it static) -- and there is usually no reason to do so -- then it will be unique and ever increasing. This way, when you insert a new row, it just throws it at the next available spot on disk. Which can help with write speeds.
Nonclustered indexes are usually used when you search for the data by certain columns. So if you have a table full of people, and you commonly look for people by last name, you would probably want a nonclustered index on the people table over the last name column. Or you could have one over last name, first name. If you also commonly search for people based on their age, then you may want another nonclustered index over the birthdate column. That way you can easily search for people born before or after a certain date.
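As a rough sketch of the advice above, here is what a people table with a narrow identity clustering key and two nonclustered search indexes might look like. The table and column names are assumptions made for the example, and the DATE type assumes SQL Server 2008 or later (DATETIME would work on older versions).

```sql
-- Hypothetical table: a narrow, unique, static, ever-increasing clustering key.
CREATE TABLE dbo.People
(
    PersonId  INT IDENTITY(1,1) NOT NULL,
    LastName  VARCHAR(50) NOT NULL,
    FirstName VARCHAR(50) NOT NULL,
    BirthDate DATE        NOT NULL,
    CONSTRAINT PK_People PRIMARY KEY CLUSTERED (PersonId)
);

-- Supports searches by last name, or by last name plus first name.
CREATE NONCLUSTERED INDEX IX_People_LastName_FirstName
    ON dbo.People (LastName, FirstName);

-- Supports searches for people born before or after a given date.
CREATE NONCLUSTERED INDEX IX_People_BirthDate
    ON dbo.People (BirthDate);

-- Can seek on IX_People_LastName_FirstName.
SELECT PersonId, LastName, FirstName
FROM dbo.People
WHERE LastName = 'Smith';

-- Can seek on IX_People_BirthDate for the range.
SELECT PersonId, BirthDate
FROM dbo.People
WHERE BirthDate < '19900101';
```

Both queries select only columns that are present in the index they use (the clustering key PersonId is stored in every nonclustered index), so neither should need a lookup into the clustered index.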

Related

Clustered index dilemma - ID or sort?

I have a table with two very important fields:
id INT identity(1,1) PRIMARY KEY
identifiersortcode VARCHAR(900)
My app always sorts and pages search results in the UI based on identifiersortcode, but all table joins (and they are legion) are on the id field. (Aside: yes, the sort code really is that long. There's a strong BL reason.)
Also, due to O/RM use, most SELECT statements are going to pull almost every column.
Currently, the clustered index is on id, but I'm wondering if the TOP / ORDER BY portion of most queries would make identifiersortcode a more attractive option as the clustered key, even considering all of the table joins going on.
Inserts on the table and changes to the identifiersortcode are limited enough that changing my clustered index would not be a problem for insert/update operations.
Trying to make the sort code's non-clustered index a covering index (using INCLUDE) is not a good option. There are a number of large columns, and some of them have a lot of update activity.
Kimberly L. Tripp's criteria for a clustered index are that it be:
Unique
Narrow
Static
Ever Increasing
Based on that, I'd stick with your integer identity id column, which satisfies all of the above. Your identifiersortcode would fail most, if not all, of those requirements.
To correctly determine which field will benefit most from the clustered index, you need to do some homework. The first thing that you should consider is the selectivity of your joins. If your execution plans filter rows from this table FIRST, then join on the other tables, then you are not really benefiting from having the clustered index on the primary key, and it makes more sense to have it on the sort key.
If however, your joins are selective on other tables (they are filtered, then an index seek is performed to select rows from this table), then you need to compare the performance of the change manually versus the status quo.
Currently, the clustered index is on id, but I'm wondering if the TOP / ORDER BY portion of most queries would make identifiersortcode a more attractive option as the clustered key, even considering all of the table joins going on.
Making identifiersortcode a CLUSTERED KEY will only help if it is used both in filtering and ordering conditions.
This means that the table is chosen as the leading table in all your joins and is read using a Clustered Index Scan or a clustered index range scan access path.
Otherwise, it will only make the things worse: first, all secondary indexes will be larger in size; second, inserts in non-increasing order will result in page splits which will make them run longer and result in a larger table.
Why, for God's sake, does your identifier sort code need to be 900 characters long? If you really need 900 characters to be distinct for sorting, it should probably be broken up into multiple fields.
Apart from repeating what Chris B. said, I think you should really stick to your current PK, since (as you said) all joins are on the id.
I guess you have already indexed the identifiersortcode....
Nevertheless, IF you have performance issues, I would really think twice about this ##"%$£ identifiersortcode !-)
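Several answers above suggest measuring rather than guessing. One hedged way to do that is to run the two typical query shapes with I/O and timing statistics switched on, once with the current clustered index and once (on a test copy) with the alternative. The table name dbo.SearchItems is hypothetical, since the question only names the two columns, and TOP (50) with parentheses assumes SQL Server 2005 or later.

```sql
-- Report logical reads and timings for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- A typical paged search: the first page of 50 rows ordered by the sort code.
SELECT TOP (50) *
FROM dbo.SearchItems          -- hypothetical table name
ORDER BY identifiersortcode;

-- A typical join-style lookup on the id column.
SELECT *
FROM dbo.SearchItems
WHERE id = 42;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The logical read counts in the Messages output give a comparison that is less sensitive to caching than elapsed time alone.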

What is an index in SQL?

Also, when is it appropriate to use one?
An index is used to speed up searching in the database. MySQL has some good documentation on the subject (which is relevant for other SQL servers as well):
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
An index can be used to efficiently find all rows matching some column in your query and then walk through only that subset of the table to find exact matches. If you don't have indexes on any column in the WHERE clause, the SQL server has to walk through the whole table and check every row to see if it matches, which may be a slow operation on big tables.
The index can also be a UNIQUE index, which means that you cannot have duplicate values in that column, or a PRIMARY KEY which in some storage engines defines where in the database file the value is stored.
In MySQL you can use EXPLAIN in front of your SELECT statement to see if your query will make use of any index. This is a good start for troubleshooting performance problems. Read more here:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
A clustered index is like the contents of a phone book. You can open the book at 'Hilditch, David' and find all the information for all of the 'Hilditch's right next to each other. Here the keys for the clustered index are (lastname, firstname).
This makes clustered indexes great for retrieving lots of data based on range based queries since all the data is located next to each other.
Since the clustered index is actually related to how the data is stored, there is only one of them possible per table (although you can cheat to simulate multiple clustered indexes).
A non-clustered index is different in that you can have many of them and they then point at the data in the clustered index. You could have e.g. a non-clustered index at the back of a phone book which is keyed on (town, address)
Imagine if you had to search through the phone book for all the people who live in 'London' - with only the clustered index you would have to search every single item in the phone book since the key on the clustered index is on (lastname, firstname) and as a result the people living in London are scattered randomly throughout the index.
If you have a non-clustered index on (town) then these queries can be performed much more quickly.
An index is used to speed up the performance of queries. It does this by reducing the number of database data pages that have to be visited/scanned.
In SQL Server, a clustered index determines the physical order of data in a table. There can be only one clustered index per table (the clustered index IS the table). All other indexes on a table are termed non-clustered.
SQL Server Index Basics
SQL Server Indexes: The Basics
SQL Server Indexes
Index Basics
Index (wiki)
Indexes are all about finding data quickly.
Indexes in a database are analogous to indexes that you find in a book. If a book has an index, and I ask you to find a chapter in that book, you can quickly find that with the help of the index. On the other hand, if the book does not have an index, you will have to spend more time looking for the chapter by looking at every page from the start to the end of the book.
In a similar fashion, indexes in a database can help queries find data quickly. If you are new to indexes, the following videos, can be very useful. In fact, I have learned a lot from them.
Index Basics
Clustered and Non-Clustered Indexes
Unique and Non-Unique Indexes
Advantages and disadvantages of indexes
Well, in general an index is a B-tree. There are two types of indexes: clustered and nonclustered.
A clustered index defines the physical order of the rows (there can be only one, and in most cases it is also the primary key: if you create a primary key on a table, by default you also create a clustered index on that table).
A nonclustered index is also a B-tree, but it doesn't define the physical order of the rows. Instead, its leaf nodes contain the clustered index key (if the table has one) or a row identifier.
Indexes are used to increase the speed of searches, because a lookup has O(log N) complexity. Indexes are a very large and interesting topic. I can say that creating indexes on a large database is sometimes something of an art.
INDEXES - to find data easily
UNIQUE INDEX - duplicate values are not allowed
Syntax for INDEX
CREATE INDEX INDEX_NAME ON TABLE_NAME(COLUMN);
Syntax for UNIQUE INDEX
CREATE UNIQUE INDEX INDEX_NAME ON TABLE_NAME(COLUMN);
First we need to understand how a normal query (without indexing) runs. It basically traverses the rows one by one and returns the data when it finds it.
So suppose the query is looking for the value 50: as a linear search, it will have to read 49 records first.
When we apply indexing, the query finds the data quickly without reading every record, by eliminating half of the data at each step, like a binary search. MySQL indexes are stored as B-trees in which all the data is in the leaf nodes.
INDEX is a performance optimization technique that speeds up the data retrieval process. It is a persistent data structure that is associated with a Table (or View) in order to increase performance during retrieving the data from that table (or View).
Index-based search applies mainly when your queries include a WHERE filter. A query without a WHERE filter selects and processes the whole table. Searching a whole table without an index is called a table scan.
You can find detailed information about SQL indexes, presented in a clear and reliable way,
by following these links:
For concept-wise understanding:
http://dotnetauthorities.blogspot.in/2013/12/Microsoft-SQL-Server-Training-Online-Learning-Classes-INDEX-Overview-and-Optimizations.html
For implementation-wise understanding:
http://dotnetauthorities.blogspot.in/2013/12/Microsoft-SQL-Server-Training-Online-Learning-Classes-INDEX-Creation-Deletetion-Optimizations.html
If you're using SQL Server, one of the best resources is its own Books Online that comes with the install! It's the 1st place I would refer to for ANY SQL Server related topics.
If it's practical "how should I do this?" kind of questions, then StackOverflow would be a better place to ask.
Also, I haven't been back for a while but sqlservercentral.com used to be one of the top SQL Server related sites out there.
An index is used for several different reasons. The main reason is to speed up querying so that you can fetch or sort rows faster. Another reason is to define a primary key or unique index, which guarantees that no two rows have the same values in the indexed columns.
So, how does indexing actually work?
Well, first off, the database table does not reorder itself when we put an index on a column to optimize query performance.
An index is a data structure (most commonly a B-tree, which is a balanced tree, not a binary tree) that stores the values of a specific column in a table.
The major advantage of a B-tree is that the data in it is kept sorted. In addition, the B-tree is time-efficient: operations such as searching, insertion, and deletion can be done in logarithmic time.
In the index, each column value is mapped to a database-internal identifier (a pointer) that points to the exact location of the row. If we now run the same query, it can follow the index instead of reading every row.
So, indexing cuts the time complexity of the search from O(n) to O(log n).
A detailed info - https://pankajtanwar.in/blog/what-is-the-sorting-algorithm-behind-order-by-query-in-mysql
INDEX is not part of SQL. An INDEX creates a balanced tree at the physical level to accelerate CRUD operations.
SQL is a language that describes the conceptual-level schema and the external-level schema. SQL doesn't describe the physical-level schema.
The statement which creates an INDEX is defined by DBMS, not by SQL standard.
An index is an on-disk structure associated with a table or view that speeds retrieval of rows from the table or view. An index contains keys built from one or more columns in the table or view. These keys are stored in a structure (B-tree) that enables SQL Server to find the row or rows associated with the key values quickly and efficiently.
Indexes are automatically created when PRIMARY KEY and UNIQUE constraints are defined on table columns. For example, when you create a table with a UNIQUE constraint, Database Engine automatically creates a nonclustered index.
If you configure a PRIMARY KEY, Database Engine automatically creates a clustered index, unless a clustered index already exists. When you try to enforce a PRIMARY KEY constraint on an existing table and a clustered index already exists on that table, SQL Server enforces the primary key using a nonclustered index.
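The default behaviour described above can be checked with a small experiment. The sketch below uses made-up table and constraint names and assumes SQL Server 2005 or later for the sys.indexes catalog view.

```sql
-- Case 1: no clustered index exists yet, so the PRIMARY KEY gets one by default,
-- while the UNIQUE constraint is backed by a nonclustered index.
CREATE TABLE dbo.Orders
(
    OrderId     INT IDENTITY(1,1) NOT NULL,
    OrderNumber VARCHAR(20) NOT NULL,
    OrderDate   DATETIME    NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY (OrderId),
    CONSTRAINT UQ_Orders_OrderNumber UNIQUE (OrderNumber)
);

-- Case 2: a clustered index already exists, so a PRIMARY KEY added afterwards
-- is enforced with a nonclustered index.
CREATE TABLE dbo.OrderEvents
(
    EventId   INT IDENTITY(1,1) NOT NULL,
    EventDate DATETIME NOT NULL
);

CREATE CLUSTERED INDEX CIX_OrderEvents_EventDate
    ON dbo.OrderEvents (EventDate);

ALTER TABLE dbo.OrderEvents
    ADD CONSTRAINT PK_OrderEvents PRIMARY KEY (EventId);

-- type_desc shows CLUSTERED or NONCLUSTERED for each index.
SELECT OBJECT_NAME(object_id) AS table_name, name, type_desc, is_primary_key
FROM sys.indexes
WHERE object_id IN (OBJECT_ID('dbo.Orders'), OBJECT_ID('dbo.OrderEvents'));
```

If you want a primary key that is not the clustering key from the start, you can also state it explicitly with PRIMARY KEY NONCLUSTERED in the constraint definition.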
Please refer to this for more information about indexes (clustered and non clustered):
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-ver15
Hope this helps!

Does clustered index on foreign key column increase join performance vs non-clustered?

In many places it's recommended that clustered indexes are best used to select a range of rows with the BETWEEN operator. When I join on a foreign key field in such a way that this clustered index is used, I guess that clustering should help too, because a range of rows is being selected even though they all have the same clustered key value and BETWEEN is not used.
Considering that I care only about that one select with join and nothing else, am I wrong in my guess?
Discussing this type of issue in the absolute isn't very useful.
It is always a case-by-case situation !
Essentially, access by way of a clustered index saves one indirection, period.
Assuming the key used in the JOIN, is that of the clustered index, in a single read [whether from an index seek or from a scan or partial scan, doesn't matter], you get the whole row (record).
One problem with clustered indexes, is that you only get one per table. Therefore you need to use it wisely. Indeed in some cases, it is even wiser not to use any clustered index at all because of INSERT overhead and fragmentation (depending on the key and the order of new keys etc.)
Sometimes one gets the equivalent benefits of a clustered index with a covering index, i.e. an index with the desired key(s) sequence, followed by the column values we are interested in. Just like a clustered index, a covering index doesn't require the indirection to the underlying table. Indeed the covering index may be slightly more efficient than the clustered index, because it is smaller.
However, and also, just like clustered indexes, and aside from the storage overhead, there is a performance cost associated with any extra index, during INSERT (and DELETE or UPDATE) queries.
And, yes, as indicated in other answers, the "foreign-key-ness" of the key used for the clustered index has absolutely no bearing on the performance of the index. FKs are constraints aimed at easing the maintenance of the integrity of the database, but the underlying fields (columns) are otherwise just like any other field in the table.
To make wise decisions about index structure, one needs
to understand the way the various index types (and the heap) work
(and, BTW, this varies somewhat between SQL implementations)
to have a good image of the statistical profile of the database(s) at hand:
which are the big tables, which are the relations, what's the average/maximum cardinality of relation, what's the typical growth rate of the database etc.
to have good insight regarding the way the database(s) is (are) going to be used/queried
Then and only then can one make educated guesses about the interest [or lack thereof] of having a given clustered index.
I would ask something else: would it be wise to put my clustered index on a foreign key column just to speed a single JOIN up? It probably helps, but..... at a price!
A clustered index makes a table faster, for every operation. YES! It does. See Kim Tripp's excellent The Clustered Index Debate continues for background info. She also mentions her main criteria for a clustered index:
narrow
static (never changes)
unique
if ever possible: ever increasing
INT IDENTITY fulfills this perfectly - GUID's do not. See GUID's as Primary Key for extensive background info.
Why narrow? Because the clustering key is added to each and every index page of each and every non-clustered index on the same table (in order to be able to actually look up the data row, if needed). You don't want to have VARCHAR(200) in your clustering key....
Why unique?? See above - the clustering key is the item and mechanism that SQL Server uses to uniquely find a data row. It has to be unique. If you pick a non-unique clustering key, SQL Server itself will add a 4-byte uniquifier to your keys. Be careful of that!
So those are my criteria - put your clustering key on a narrow, stable, unique, hopefully ever-increasing column. If your foreign key column matches those - perfect!
However, I would not under any circumstances put my clustering key on a wide or even compound foreign key. Remember: the value(s) of the clustering key are being added to each and every non-clustered index entry on that table! If you have 10 non-clustered indices, 100'000 rows in your table - that's one million entries. It makes a huge difference whether that's a 4-byte integer, or a 200-byte VARCHAR - HUGE. And not just on disk - in server memory as well. Think very very carefully about what to make your clustered index!
SQL Server might need to add a uniquifier - making things even worse. If the values will ever change, SQL Server would have to do a lot of bookkeeping and updating all over the place.
So in short:
putting an index on your foreign keys is definitely a great idea - do it all the time!
I would be very very careful about making that a clustered index. First of all, you only get one clustered index, so which FK relationship are you going to pick? And don't put the clustering key on a wide and constantly changing column
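To make the summary above concrete, here is a minimal sketch that keeps the clustered index on a narrow identity column and puts a nonclustered index on the foreign key. All table, column, and constraint names are invented for the example, and INCLUDE assumes SQL Server 2005 or later.

```sql
CREATE TABLE dbo.Customers
(
    CustomerId INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Customers PRIMARY KEY CLUSTERED,
    Name       VARCHAR(100) NOT NULL
);

CREATE TABLE dbo.CustomerOrders
(
    OrderId    INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_CustomerOrders PRIMARY KEY CLUSTERED,
    CustomerId INT NOT NULL
        CONSTRAINT FK_CustomerOrders_Customers
        FOREIGN KEY REFERENCES dbo.Customers (CustomerId),
    OrderDate  DATETIME      NOT NULL,
    Total      DECIMAL(10,2) NOT NULL
);

-- SQL Server does not create an index for a foreign key on its own,
-- so the index on the FK column has to be added explicitly.
-- The included columns make it covering for the query below.
CREATE NONCLUSTERED INDEX IX_CustomerOrders_CustomerId
    ON dbo.CustomerOrders (CustomerId)
    INCLUDE (OrderDate, Total);

-- The join can seek on IX_CustomerOrders_CustomerId without touching
-- the clustered index of dbo.CustomerOrders.
SELECT c.Name, o.OrderDate, o.Total
FROM dbo.Customers AS c
JOIN dbo.CustomerOrders AS o
    ON o.CustomerId = c.CustomerId
WHERE c.CustomerId = 42;
```

The clustering key stays a 4-byte integer, so the nonclustered index carries only that small value per entry.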
An index on the FK column will help the JOIN because the index itself is ordered: clustered just means that the data rows themselves (the leaf level) are kept in key order, rather than only the B-tree that points to them.
If you change it to a covering index, then clustered vs non-clustered is irrelevant. What's important is to have a useful index.
It depends on the database implementation.
For SQL Server, a clustered index is a B-tree whose leaf level is the data pages themselves, with those pages chained together in a linked list. The reason you get fast performance is that you can get to the start of the chain quickly, and a range is then an easy linked list to follow.
A non-clustered index is a separate data structure that contains pointers to the actual records, and as such it has different concerns.
Refer to the documentation regarding Clustered Index Structures.
An index will not help in relation to a Foreign Key relationship, but it will help due to the concept of a "covered" index. If your WHERE clause contains a constraint based upon the index, it will be able to generate the returned data set faster. That is where the performance comes from.
The performance gains usually come if you are selecting data sequentially within the cluster. Also, it depends entirely on the size of the table (data) and the conditions in your between statement.

What column should the clustered index be put on?

Lately, I have been doing some reading on indexes of all types, and the main advice is to put the clustered index on the primary key of the table. But what if the primary key is not actually used in queries (via a select or join) and exists purely for relational purposes, so that it is never queried against? For example, say I have a car_parts table with 3 columns: car_part_id, car_part_no, and car_part_title. car_part_id is the unique primary key identity column. In this case car_part_no is unique as well, and most likely so is car_part_title. car_part_no is what is most queried against, so doesn't it make sense to put the clustered index on that column instead of car_part_id? The basic question is: which column should actually have the clustered index, since you are only allowed one of them?
An index, clustered or non-clustered, can be used by the query optimizer if and only if the leftmost key in the index is filtered on. So if you define an index on columns (A, B, C), a WHERE condition on B=@b, on C=@c or on B=@b AND C=@c will not fully leverage the index (see note). This applies also to join conditions. Any WHERE filter that includes A will consider the index: A=@a or A=@a AND B=@b or A=@a AND C=@c or A=@a AND B=@b AND C=@c.
So in your example, if you make the clustered index on part_no as the leftmost key, then a query looking for a specific part_id will not use the index, and a separate non-clustered index must exist on part_id.
Now about the question which of the many indexes should be the clustered one. If you have several query patterns that are about the same importance and frequency and contradict each other on terms of the keys needed (eg. frequent queries by either part_no or part_id) then you take other factors into consideration:
width: the clustered index key is used as the lookup key by all other non-clustered indexes. So if you choose a wide key (say two uniqueidentifier columns) then you are making all the other indexes wider, thus consuming more space, generating more IO and slowing down everything. So between equally good keys from a read point of view, choose the narrowest one as clustered and make the wider ones non-clustered.
contention: if you have specific patterns of insert and delete, try to separate them physically so they occur on different portions of the clustered index. E.g. if the table acts as a queue with all inserts at one logical end and all deletes at the other logical end, try to lay out the clustered index so that the physical order matches this logical order (e.g. enqueue order).
partitioning: if the table is very large and you plan to deploy partitioning, then the partitioning key must be part of the clustered index key. A typical example is historical data that is archived using a sliding window partitioning scheme. Even though the entities have a logical primary key like 'entity_id', the clustered index is built on a datetime column that is also used for the partitioning function.
stability: a key that changes often is a poor candidate for a clustered key, as each update of the clustered key value forces all non-clustered indexes to update the lookup key they store. As an update of a clustered key will also likely relocate the record into a different page, it can cause fragmentation on the clustered index.
Note: not fully leverage, because sometimes the engine will choose a non-clustered index to scan instead of the clustered index simply because it is narrower and thus has fewer pages to scan. In my example, if you have an index on (A, B, C) and a WHERE filter on B=@b and the query projects C, the index will likely be used, not as a seek but as a scan, because that is still faster than a full clustered scan (fewer pages).
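The leftmost-key behaviour described in this answer can be tried out with a small sketch like the one below. The table dbo.T, its columns, and the index name are hypothetical, and @a and @b are ordinary T-SQL variables standing in for the filter values.

```sql
-- Hypothetical table with a wide Payload column, so that the clustered index
-- is much wider than the nonclustered index on (A, B, C).
CREATE TABLE dbo.T
(
    Id      INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_T PRIMARY KEY CLUSTERED,
    A       INT NOT NULL,
    B       INT NOT NULL,
    C       INT NOT NULL,
    Payload CHAR(200) NOT NULL
);

CREATE NONCLUSTERED INDEX IX_T_A_B_C
    ON dbo.T (A, B, C);

DECLARE @a INT, @b INT;
SET @a = 1;
SET @b = 2;

-- Filters on the leading key A: eligible for an index seek on IX_T_A_B_C.
SELECT A, B, C
FROM dbo.T
WHERE A = @a AND B = @b;

-- No filter on A: the index cannot be used for a seek. It may still be
-- scanned, because it is narrower than the clustered index.
SELECT C
FROM dbo.T
WHERE B = @b;
```

Looking at the actual execution plan for each SELECT is the way to confirm which access path the optimizer picked for your data.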
Kimberly Tripp is always one of the best sources on insights on indexing.
See her blog post "Ever-increasing clustering key - the Clustered Index Debate - again!" in which she quite clearly lists and explains the main requirements for a good clustering key - it needs to be:
Unique
Narrow
Static
and best of all, if you can manage:
ever-increasing
Taking all this into account, an INT IDENTITY (or BIGINT IDENTITY if you really need more than 2 billion rows) works out to be the best choice in the vast majority of cases.
One thing a lot of people don't realize (and thus don't take into account when making their choice) is the fact that the clustering key (all the columns that make up the clustered index) will be added to each and every index entry for each and every non-clustered index on your table - thus the "narrow" requirement becomes extra important!
Also, since the clustering key is used for bookmark lookups (looking up the actual data row when a row is found in a non-clustered index), the "unique" requirement also becomes very important. So important in fact, that if you choose a (set of) column(s) that is/are not guaranteed to be unique, SQL Server will add a 4-byte uniquifier to each row --> thus making each and every one of your clustered index keys extra wide; definitely NOT a good thing.
Marc
Clustered indexes are good when you query ranges of data. For example
SELECT * FROM theTable WHERE age BETWEEN 10 AND 20
The clustered index arranges rows in the particular order on your computer disk. That's why rows with age = 10 will be next to each other, and after them there will be rows with age = 11, etc.
If you have exact select, like this:
SELECT * FROM theTable WHERE age = 20
the non-clustered index is also good. It doesn't rearrange data on your computer disk, but it builds a special tree with pointers to the rows you need.
So it strongly depends on the type of queries you perform.
Keep in mind the usage patterns; If you are almost always querying the DB on the car_part_no, then it would probably be beneficial for it to be clustered on that column.
However, don't forget about joins; If you are most often joining to the table and the join uses the car_part_id field, then you have a good reason to keep the cluster on car_part_id.
Something else to keep in mind (less so in this case, but generally when considering clustered indexes) is that the clustered index will appear implicitly in every other index on the table; So for example, if you were to index car_part_title, that index will also include the car_part_id implicitly. This can affect whether or not an index covers a query and also affects how much disk space the index will take (which affects memory usage, etc).
The clustered index should go on the column that will be the most queried. This includes joins, as a join must access the table just like a direct query, and find the rows indicated.
You can always rebuild your indexes later on if your application changes and you find you need to optimize a table with a different index structure.
Some additional guidelines for deciding on what to cluster your table on can be found on MSDN here: Clustered Index Design Guidelines.
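For the car_parts example from the question, one possible layout that follows the "narrow, unique, static, ever-increasing" advice is sketched below. The data types and constraint names are assumptions, since the question only gives the column names.

```sql
CREATE TABLE dbo.car_parts
(
    car_part_id    INT IDENTITY(1,1) NOT NULL,
    car_part_no    VARCHAR(30)  NOT NULL,   -- assumed type
    car_part_title VARCHAR(100) NOT NULL,   -- assumed type
    -- Narrow identity column as the clustering key; joins use it.
    CONSTRAINT PK_car_parts PRIMARY KEY CLUSTERED (car_part_id),
    -- The most-queried column gets its own unique nonclustered index.
    CONSTRAINT UQ_car_parts_no UNIQUE NONCLUSTERED (car_part_no)
);

-- Seeks on UQ_car_parts_no, then looks up car_part_title in the
-- clustered index via the stored car_part_id.
SELECT car_part_id, car_part_no, car_part_title
FROM dbo.car_parts
WHERE car_part_no = 'ABC-123';
```

The alternative discussed in the answers would swap the two keywords (PRIMARY KEY NONCLUSTERED on car_part_id and UNIQUE CLUSTERED on car_part_no). Which one performs better depends on the query mix, so it is worth measuring both on realistic data.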

SQL Server 2000 Index - Clustered vs Non Clustered

I have inherited a database where there are clustered indexes and an additional duplicate index for each of the clustered indexes.
i.e
IX_PrimaryKey is a clustered index on the column ID.
IX_ID is a non clustered index on the column ID.
I want to clean up these duplicate non clustered indexes and I wanted to check to see if anyone could think of a reason to do this.
Can anyone think of a performance benefit for doing this?
For exact same indexes, there's no performance gain. Actually, it incurs performance loss in insertion and updates. However, if there are multicolumn indexes with different column order, there might be a valid reason for them.
Maybe I'm not thinking hard enough, but I can't see any reason to do this; the nature of the clustered index is that the data is organized in the order of the index. It seems that the extra index is a complete waste.
Digging through BOL and watching this question, though ...
There seems no sensible reason for doing this, and there is a performance hit.
The only thing I could think of to do this is to create an index with an incredibly narrow row width so that the rows per page was very high, making it very quick to scan / seek. But since it contains no other fields (except the clustered key, which is the same value) I still cannot see a reason for it.
It's quite possible the original creator was not aware that the PK was defaulting to a clustered index and created an NC index without realising it was a duplicate.
I presume what happened is that SQL Server automatically created the clustered index when the primary key constraint was specified (this happens if a clustered index is not already present), and then someone created a non-clustered index on the primary key column.
Such a scenario would:
Have some adverse effect on performance as indexes are updated when inserts/deletes/updates happen.
Use additional disk space.
Might lead to deadlocks.
Would contribute to more time in backup/restore of database.
cheers
It would be a waste to create a clustered primary key unless you have queries that search for records using WHERE ID = 10.
You may want to create a clustered index on the column which will be frequently queried on WHERE City = 'Sydney'. Clustered means that SQL will group the data in the table based on the clustered index. By grouping the City values in the table means SQL can search for data quicker.
Storing two indexes over the same data is a waste of disk space and the processing needed to maintain the data.
However, I can imagine a product which depends on the existence of an index named IX_PrimaryKey. E.G.
string queryPattern = "select * from {0} as t with (index(IX_PrimaryKey))";
You can make the argument that the clustered index itself occupies much less space than the others, since the leaf is the actual data. On the other hand, the clustered index can be more susceptible to page splitting, and some indexes are better non-clustered.
Putting this together, I can definitely think of scenarios where removing the duplicate indexes would be a Bad Thing:
Code like above which depends on a known index name.
Code which can alter the clustered index to any of the non-clustered indexes.
Code which uses the presence/absence of IX_PrimaryKey to treat the table in a certain way.
I don't consider any of these good design, but I can definitely imagine someone doing it. (Have you posted this to DailyWTF?)
There are cases where it makes sense to have overlapping indexes which are not identical:
create index IX_1 on table1 (ID)
create index IX_2 on table1 (ID, TYPE, ORDER_DATE, TOTAL_CHARGES)
If you are looking up strictly by ID, SQL can optimize and use IX_1. If you are running a query based on ID, TYPE, ORDER_DATE and summing up TOTAL_CHARGES, SQL can use IX_2 as a "covering index", satisfying all the query details from the index without ever touching the table. Generally this is something you add in the course of performance tuning, after extensive testing.
Looking at your given example of two indexes on exactly the same field, I don't see a great fit. Perhaps SQL can use IX_ID as a "covering index" when checking for the existence of a value and bypass some blocking on IX_PrimaryKey?
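If the duplicate does get removed, a cautious sequence might look like the sketch below. The table name MyTable is hypothetical (the question only names the indexes), and the table.index form of DROP INDEX is the SQL Server 2000 syntax the question's version requires.

```sql
-- List the indexes on the table and confirm that IX_ID and IX_PrimaryKey
-- really have the same key column (sp_helpindex exists in SQL Server 2000).
EXEC sp_helpindex 'MyTable';

-- Drop only the redundant nonclustered index; the clustered index stays.
-- Later versions also accept: DROP INDEX IX_ID ON dbo.MyTable;
DROP INDEX MyTable.IX_ID;
```

Before dropping anything, it is worth searching the application code and stored procedures for index hints that mention IX_ID by name, since a query hinting a missing index will fail.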