Should primary key clustered index columns be added to nonclustered indexes? - sql

OK, here are the features of a nonclustered index.
As you can see, Id is the identity column, which is the primary key and is clustered. I can either add it to the index key columns and mark the index as unique, or leave it out of the key and add it as an included column.
Which one should be selected, and why? Thank you.

The clustered key is automatically included in the nonclustered index, whether you include it explicitly or not. In other words - don't include it, unless you need to use a predicate that filters on the clustered key and then a couple of other columns (in that order) - in that case it may make sense to force it as the first column, as it'll otherwise be stored physically as the last column.
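A minimal sketch of what this looks like in T-SQL. The table dbo.Orders and its columns CustomerId and OrderDate are hypothetical and not taken from the question; only the clustered identity primary key Id mirrors it.

```sql
-- Hypothetical table: Id is the identity column and the clustered primary key.
CREATE TABLE dbo.Orders (
    Id         INT IDENTITY(1,1) NOT NULL,
    CustomerId INT               NOT NULL,
    OrderDate  DATETIME          NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (Id)
);

-- Id is not listed here: SQL Server stores the clustered key in every
-- nonclustered index anyway, as the pointer back to the row.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_OrderDate
    ON dbo.Orders (CustomerId, OrderDate);

-- This query can be answered from the nonclustered index alone,
-- even though Id was never named in the index definition.
SELECT Id, CustomerId, OrderDate
FROM dbo.Orders
WHERE CustomerId = 42;
```

Comparing the execution plan with and without Id listed explicitly is a quick way to confirm the behaviour on your own data.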

Related

Primary index vs Clustered index

Is it correct to say that a clustered index is an index on a non-key attribute of a table, with the records sorted on that attribute, whereas a primary index is on an attribute that is a key for the table, with the table sorted on that attribute?
A clustered index is a special type of index that determines the order in which the table's records are physically stored. Therefore a table can have only one clustered index. The leaf nodes of a clustered index contain the data pages.
A primary index is an index on a set of fields that includes the unique primary key for the table and is guaranteed not to contain duplicates.
A primary key is not necessarily the clustered index (although in probably 95% of scenarios it is), and a clustered index is not necessarily the primary key.
Several websites say these terms have different meanings, but that is not what Database System Concepts, 7th Edition, by Silberschatz, Korth and Sudarshan says. On page 625 (Chapter 14, under Ordered Indices), the two terms are given exactly the same meaning.
So it is common to have a primary index on a primary key, but that is not the definition, because a primary index can be on any other field.
In MySQL (InnoDB), the primary key is also the clustered index. This depends purely on the database implementation. In some databases, the whole record is stored in the clustered index.

Removing Uniqueness from an Index

On one of my tables in SQL Server, the Indexes folder shows a clustered index made from three columns of that table, and the Unique checkbox is ticked in that index's Properties window.
My question is: which T-SQL commands drop the uniqueness but keep the index? Is it even possible?
You cannot alter an index from unique to non-unique. You can set the index to ignore duplicates.
Docs: https://msdn.microsoft.com/en-us/library/ms188388.aspx
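You can only recreate the index with DROP and CREATE commands:
-- Drop the unique clustered index, then recreate it without UNIQUE.
-- Key is a reserved word in T-SQL, so the column name must be bracketed.
DROP INDEX ci_Test ON IndexTest;
CREATE CLUSTERED INDEX ci_Test ON IndexTest([Key]);
But you should have the clustered index on a single column (for example, a new auto-increment primary key). You can then enforce uniqueness with a unique nonclustered index.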
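That last suggestion could be sketched roughly as follows. The table name dbo.IndexTestNarrow and the columns ColA, ColB and ColC are hypothetical stand-ins for the three columns mentioned in the question.

```sql
-- Hypothetical layout: a narrow identity column carries the clustered
-- primary key, and the three business columns get their own unique index.
CREATE TABLE dbo.IndexTestNarrow (
    Id   INT IDENTITY(1,1) NOT NULL,
    ColA INT NOT NULL,
    ColB INT NOT NULL,
    ColC INT NOT NULL,
    CONSTRAINT PK_IndexTestNarrow PRIMARY KEY CLUSTERED (Id)
);

-- Uniqueness of the three-column combination is enforced separately.
-- Dropping this index later removes the uniqueness rule without
-- touching the clustered index.
CREATE UNIQUE NONCLUSTERED INDEX UX_IndexTestNarrow_ColA_ColB_ColC
    ON dbo.IndexTestNarrow (ColA, ColB, ColC);
```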

SQL Server indexes still needed when primary key defined on same columns?

When I have a primary key on a column, do I also need a non-clustered index on that same column for querying purposes? Primary keys ARE indexes, aren't they?
Also, if I have an aggregate primary key on two columns, do I need to create indexes on both of those columns for querying purposes?
And, finally, if I will be commonly querying for rows specifying two columns to match, is it best to have one index that includes both columns? Or two separate indexes, one on each?
When a primary key is created, a clustered index is created automatically by default (unless the table already has one or the key is declared NONCLUSTERED). JOINs and WHERE conditions on that column are then faster, because the engine can seek directly to the row instead of scanning the table.
In your case, if the primary key is a combination of several columns and you SEARCH or JOIN on the individual columns, you would need a nonclustered index. Otherwise, the primary key will do the trick.
Refer to this for more information: https://www.simple-talk.com/sql/performance/tune-your-indexing-strategy-with-sql-server-dmvs/
When I have a primary key on a column, do I also need a non-clustered index on that same column for querying purposes? Primary keys ARE indexes, aren't they?
A regular index is a sorted copy of one (or multiple) columns. Being sorted it allows for fast searching. If its underlying values change, it will be re-sorted accordingly, but physical table order stays the same.
A clustered index, on the other hand, defines the physical table order. That's why you can only have one: if its values change, the affected rows are moved so that the table stays in index order.
Often the primary key also is the clustered index of the table. But not necessarily - the defining property of a primary key is its uniqueness.
Having a clustered and a non-clustered index over the same column is redundant and you should not do it. It increases workload during insert/update/delete, but it does nothing for query performance.
if I have an aggregate primary key on two columns, do I need to create indexes on both of those columns for querying purposes?
That depends whether you ever want to query the second column on its own. An index over (A, B) will do nothing for queries that search for B only, so having a second index over B will be necessary in this case.
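As an illustration, here is a sketch with a hypothetical table dbo.Enrollment whose composite primary key plays the role of (A, B). All names are assumptions made for the example.

```sql
-- Hypothetical table with a composite primary key (StudentId, CourseId).
CREATE TABLE dbo.Enrollment (
    StudentId  INT  NOT NULL,
    CourseId   INT  NOT NULL,
    EnrolledOn DATE NOT NULL,
    CONSTRAINT PK_Enrollment PRIMARY KEY CLUSTERED (StudentId, CourseId)
);

-- Served by the primary key: the leading column StudentId is in the predicate.
SELECT CourseId FROM dbo.Enrollment WHERE StudentId = 7;

-- Not served well by the primary key: CourseId is only the second column,
-- so a search on CourseId alone needs its own index.
CREATE NONCLUSTERED INDEX IX_Enrollment_CourseId
    ON dbo.Enrollment (CourseId);

SELECT StudentId FROM dbo.Enrollment WHERE CourseId = 101;
```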
Include in the index any extra columns you want to return from the query. If set up smartly, a query can be satisfied by the index alone, saving the DB engine from having to look at the table at all.
Note that this applies to non-clustered indexes. Including extra columns for queries against the clustered index is not necessary, as the clustered index is the table. It naturally contains all columns.
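A covering index of this kind might look like the sketch below. The table dbo.Customer and its columns are hypothetical.

```sql
-- Hypothetical table; the clustered primary key is on Id.
CREATE TABLE dbo.Customer (
    Id        INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED,
    LastName  NVARCHAR(100) NOT NULL,
    FirstName NVARCHAR(100) NOT NULL,
    Email     NVARCHAR(256) NOT NULL
);

-- LastName is the search key; FirstName and Email are carried along
-- at the leaf level only, so they do not widen the index key.
CREATE NONCLUSTERED INDEX IX_Customer_LastName
    ON dbo.Customer (LastName)
    INCLUDE (FirstName, Email);

-- Every column this query needs is in the index, so the engine
-- does not have to visit the table itself.
SELECT LastName, FirstName, Email
FROM dbo.Customer
WHERE LastName = N'Smith';
```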
if I will be commonly querying for rows specifying two columns to match, is it best to have one index that includes both columns? Or two separate indexes, one on each?
Have a single index that contains both columns. Put first either the most selective column (the one with the most distinct values) or the one you are most likely to query on its own, and put the assisting column second. Sometimes it's necessary to have it both ways, (A, B) and (B, A); it depends entirely on how the table is used.

Should primary keys be always assigned as clustered index

I have a SQL Server table that stores employee details. The column ID is of GUID type, while the column EmployeeNumber is of INT type. Most of the time I will be dealing with EmployeeNumber in joins and selection criteria.
My question is whether it is sensible to assign the primary key to the ID column and the clustered index to EmployeeNumber.
Yes, it is possible to have a non-clustered primary key, and it is possible to have a clustered key that is completely unrelated to the primary key. By default a primary key becomes the clustered index key too, but this is not a requirement.
The primary key is a logical concept: it is the key used in your data model to reference entities.
The clustered index key is a physical concept: it is the order in which you want the rows to be stored on disk.
Choosing a different clustered key is driven by a variety of factors, such as key width, when you want a narrower clustered key than the primary key (because the clustered key is replicated in every non-clustered index), or support for frequent range scans (common in time series), when the data is frequently accessed with queries like date between '20100101' and '20100201' (a clustered index key on date would be appropriate).
This subject has been discussed here ad nauseam before; see also "What column should the clustered index be put on?".
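The separation between the two concepts can be sketched in T-SQL as follows. The column names ID and EmployeeNumber come from the question; FullName, the constraint names, and the assumption that EmployeeNumber is unique are added for illustration.

```sql
-- The GUID stays the primary key (logical identity) but is declared
-- NONCLUSTERED so it does not dictate the physical row order.
CREATE TABLE dbo.Employee (
    ID             UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Employee_ID DEFAULT NEWID(),
    EmployeeNumber INT              NOT NULL,
    FullName       NVARCHAR(200)    NOT NULL,  -- hypothetical column
    CONSTRAINT PK_Employee PRIMARY KEY NONCLUSTERED (ID)
);

-- The narrow INT column used in joins and filters becomes the clustered key.
-- UNIQUE assumes no two employees share a number.
CREATE UNIQUE CLUSTERED INDEX CIX_Employee_EmployeeNumber
    ON dbo.Employee (EmployeeNumber);
```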
The ideal clustered index key is:
Sequential
Selective (no dupes, unique for each record)
Narrow
Used in Queries
In general it is a very bad idea to use a GUID as a clustered index key, since it leads to mucho fragmentation as rows are added.
EDIT FOR CLARITY:
PK and Clustered key are indeed separate concepts. Your PK does not need to be your clustered index key.
In practical applications in my own experience, the same field that is your PK should/would be your clustered key since it meets the same criteria listed above.
First, I have to say that I have misgivings about the choice of a GUID as the primary key for this table. I am of the opinion that EmployeeNumber would probably be a better choice, and something naturally unique about the employee would be better than that, such as an SSN (or ATIN), which employers must legally obtain anyway (at least in the US).
Putting that aside, you should never base a clustered index on a GUID column. The clustered index specifies the physical order of rows in the table. Since GUID values are (in theory) completely random, every new row will fall at a random location. This is very bad for performance. There is something called 'sequential' GUIDs, but I would consider this a bit of a hack.
Using a clustered index on something other than the primary key will improve the performance of SELECT queries that can take advantage of this index.
But you will lose performance on UPDATE queries because, in most scenarios, they rely on the primary key to find the specific row to update.
INSERT queries could also lose performance, because adding a new row in the middle of the index means that many rows have to be moved physically. This won't happen with an auto-incrementing primary key, as new records are always added at the end and no other rows have to move.
If you don't know which kind of operation needs the most performance, I recommend leaving the clustered index on the primary key and using nonclustered indexes on common search criteria.
Clustered indexes cause the data to be physically stored in that order. For this reason when testing for ranges of consecutive rows, clustered indexes help a lot.
GUIDs are really bad clustered index keys, since their order follows no sensible pattern. Int identity columns aren't much better unless order of entry helps (e.g. most recent hires).
Since you're probably not looking for ranges of employees, it probably doesn't matter much which is the clustered index, unless you can segment blocks of employees that you often aren't interested in (e.g. by termination date).
Since EmployeeNumber is unique, I would make it the PK. In SQL Server, a PK is often a clustered index.
Joins on GUIDs are just horrible. @JNK answers this well.

Create a mysql primary key without a clustered index?

I'm a SQL Server guy experimenting with MySQL for a large upcoming project (due to licensing), and I'm not finding much information on creating a primary key without a clustered index. All the documentation I've read for 5.1 says that a primary key is automatically given a clustered index. Since I'm using a binary(16) for the primary key column (GUID), I'd rather not have a clustered index on it. So...
Is it possible to create a primary key without a clustered index? I could always put the clustered index on the date_created column instead, but how do I prevent MySQL from creating the clustered index on the primary key automatically?
If that's not possible, will I be OK performance-wise with a unique index on the GUID column and no primary key on the table? I'm planning to use NHibernate here, so I'm not sure if having no primary key is allowed (haven't got that far yet).
It depends on which storage engine you are using. MyISAM tables do not support clustered indices, so primary keys on MyISAM tables are not clustered. The primary key on an InnoDB table, however, is clustered.
You should consult the MySQL Manual for further details about the pros and cons of each storage engine.
You need to have a primary key; if you don't create one yourself, MySQL will create a hidden one for you. You could always just create an AUTO_INCREMENT field for the primary key (this is preferable to having MySQL have hidden fields in your table, I think).
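A rough sketch of that AUTO_INCREMENT approach for an InnoDB table. The table name and index names are hypothetical; guid and date_created follow the question.

```sql
-- InnoDB clusters on the primary key, so a small sequential id keeps
-- inserts appending at the end of the clustered index.
CREATE TABLE employee_event (
    id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    guid         BINARY(16)      NOT NULL,
    date_created DATETIME        NOT NULL,
    PRIMARY KEY (id),
    -- The GUID remains unique, but only through a secondary index.
    UNIQUE KEY uq_employee_event_guid (guid),
    KEY ix_employee_event_date_created (date_created)
) ENGINE=InnoDB;
```

Whether an ORM such as NHibernate should then map id or guid as the identifier is a separate decision that this sketch does not settle.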
Considering what is said in 13.6.10.1. Clustered and Secondary Indexes, it seems you cannot really choose which column the clustered index is set on:
it's either on the PK column
or on the first UNIQUE index whose columns are all NOT NULL
or on some internal column -- not that useful in your case ^^
About the second question in your post (no PK on the table, and a UNIQUE index on the GUID): it might be possible, but it will not change anything about the clustered index: it will still probably be on the GUID column.
Some kind of a hack might be to:
not define a primary key
place a UNIQUE index on your date_created field (if you don't create too many rows in short periods of time, it could be viable...)
Not sure you can place a second UNIQUE index on the GUID... Maybe ^^
And not sure that would change much about the clustered index, but it might be worth a try...