I'm a SQL Server guy experimenting with MySQL for a large upcoming project (due to licensing), and I'm not finding much information on creating a primary key without a clustered index. All the documentation I've read for 5.1 says that a primary key is automatically given a clustered index. Since I'm using a binary(16) for the primary key column (GUID), I'd rather not have a clustered index on it. So...
Is it possible to create a primary key without a clustered index? I could always put the clustered index on the date_created column instead, but how do I prevent MySQL from creating the clustered index on the primary key automatically?
If it's not possible, will I be OK performance-wise with a unique index on the GUID column and no primary key on the table? I'm planning to use NHibernate here, so I'm not sure if having no primary key is allowed (haven't got that far yet).
It depends on which storage engine you are using. MyISAM tables do not support clustered indices, so primary keys on MyISAM tables are not clustered. The primary key on an InnoDB table, however, is clustered.
You should consult the MySQL Manual for further details about the pros and cons of each storage engine.
You need to have a primary key; if you don't create one yourself, InnoDB will create a hidden one for you. You could always just create an AUTO_INCREMENT field for the primary key (I think this is preferable to having MySQL add hidden fields to your table).
Considering what is said in 13.6.10.1, "Clustered and Secondary Indexes", it seems you cannot really choose which column the clustered index is on:
it's either on the PK column
or on the first UNIQUE index whose columns are all NOT NULL
or on some internal column (not that useful in your case ^^)
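The three-step rule above can be sketched as a small Python function. This is a simplified model, not InnoDB's code: `Column`, `Index` and `pick_clustered_index` are hypothetical names, and "first" is taken to mean first in the list passed in, whereas the server applies its own ordering to a table's indexes. `GEN_CLUST_INDEX` is the name the MySQL manual uses for the hidden index built on the synthetic row ID.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Column:
    name: str
    nullable: bool = True

@dataclass
class Index:
    name: str
    columns: List[str]
    unique: bool = False
    primary: bool = False

def pick_clustered_index(columns: List[Column], indexes: List[Index]) -> str:
    """Return the index InnoDB would cluster on (simplified model)."""
    nullable = {c.name: c.nullable for c in columns}
    # 1. An explicit PRIMARY KEY always becomes the clustered index.
    for idx in indexes:
        if idx.primary:
            return idx.name
    # 2. Otherwise: the first UNIQUE index whose key columns are all NOT NULL.
    for idx in indexes:
        if idx.unique and all(not nullable[col] for col in idx.columns):
            return idx.name
    # 3. Otherwise InnoDB builds a hidden index on a synthetic row ID.
    return "GEN_CLUST_INDEX"

cols = [Column("guid", nullable=False), Column("date_created", nullable=False)]
print(pick_clustered_index(cols, [Index("ux_guid", ["guid"], unique=True)]))  # prints ux_guid
```

With no primary key and a NOT NULL GUID column, the UNIQUE index on the GUID is picked, which is why dropping the PK alone does not move the clustered index.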
About the second question in your post (no PK on the table, and a UNIQUE index on the GUID): it might be possible, but it will not change anything about the clustered index; if the GUID column is NOT NULL, the clustered index will still be on it.
Some kind of hack might be to:
not define a primary key
place a UNIQUE index on your date_created field, declared NOT NULL (if you don't create too many rows in short periods of time, it could be viable...)
You can also place a second UNIQUE index on the GUID, since a table can have several UNIQUE indexes.
But I'm not sure that would change much about the clustered index; it might be worth a try...
Is it correct to say that a clustered index is an index on a non-key value of a table of records, with the records sorted based on that attribute? Whereas a primary index is on an attribute that is a key for that table of records, and the table is sorted based on that attribute?
A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore a table can have only one clustered index. The leaf nodes of a clustered index contain the data pages.
A primary index is an index on a set of fields that includes the unique primary key of the table and is guaranteed not to contain duplicates.
A primary key is not necessarily the clustered index (although in probably 95% of scenarios it is), and a clustered index is not necessarily the primary key.
Several websites say these terms have different meanings, but that is not what Database System Concepts, 7th Edition, by Silberschatz, Korth and Sudarshan says: on page 625 (Chapter 14, in the section on ordered indices), the two terms are given exactly the same meaning.
So a primary index is commonly built on the primary key, but that is not its definition, because it can be built on any other field.
In MySQL (InnoDB), the primary key is also the clustered index. It purely depends on the implementation of the database: in some databases, the whole record is stored as part of the clustered index.
We have a transaction table of over 111m rows that has a clustered composite primary key of...
RevenueCentreID int
DateOfSale smalldatetime
SaleItemID int
SaleTypeID int
...in a SQL 2008 R2 database.
We are going to be truncating and refilling the table soon for an archiving project, so once the table has been truncated we will have the opportunity to get the indexes right.
Would it be better to keep the composite primary key or should we move to a unique auto increment primary key?
Most searches on the table are done using the DateOfSale and RevenueCentreID columns. We also often join on the SaleItemID column. We hardly ever use the SaleTypeID column; in fact, it is only included in the primary key for uniqueness. We don't care how long it takes to insert and delete sales figures (done overnight), but rather about the speed of returning reports.
A surrogate key serves no purpose here. I suggest a clustered primary key on the columns as listed, and an index on SaleItemID.
I have learned that you want and need both a natural key and a surrogate key.
The natural key keeps the business keys unique and is perfect for indexing, whereas the surrogate key will help with queries and development.
So in your case a surrogate auto-incrementing key is good in that it will help keep all the rows of data intact. And a natural key of DateOfSale, RevenueCentreID and maybe ClientID would be a great way of ensuring no duplicate records are stored, and would speed up querying because you can index the natural key.
If you don't care about the speed of inserts and deletions, then you probably want multiple indexes which target the queries precisely.
You could create an auto increment primary key as you suggest, but also create indexes as required to cover the reporting queries. Create a unique constraint on the columns you currently have in the key to enforce uniqueness.
The Index Tuning Wizard will help with defining the optimum set of indexes, but it's better to create your own.
Rule of thumb: you can define columns to index, and also "include" columns.
If your report has an ORDER BY or a WHERE clause on a column, then you need the index to be defined on that column. Any other fields returned in the SELECT should be included columns.
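The rule of thumb above can be turned into a small helper that writes the T-SQL statement. It's a sketch under assumptions: `suggest_covering_index`, the `Sales` table name and the `Amount` column are hypothetical, key columns are kept in the order given (WHERE columns first, then ORDER BY columns), and no attempt is made to choose the best key column order.

```python
from typing import Iterable, List

def suggest_covering_index(table: str, where_cols: Iterable[str],
                           order_by_cols: Iterable[str],
                           select_cols: Iterable[str]) -> str:
    """Build a CREATE INDEX statement from the rule of thumb:
    WHERE / ORDER BY columns are index keys, other selected columns are INCLUDEd."""
    # Key columns: WHERE columns first, then ORDER BY columns, without duplicates.
    key: List[str] = list(dict.fromkeys([*where_cols, *order_by_cols]))
    if not key:
        raise ValueError("need at least one WHERE or ORDER BY column")
    # Included columns: everything else the query returns.
    included = [c for c in dict.fromkeys(select_cols) if c not in key]
    name = f"IX_{table}_{'_'.join(key)}"
    sql = f"CREATE NONCLUSTERED INDEX {name} ON {table} ({', '.join(key)})"
    if included:
        sql += f" INCLUDE ({', '.join(included)})"
    return sql

print(suggest_covering_index(
    "Sales",
    where_cols=["RevenueCentreID", "DateOfSale"],
    order_by_cols=["DateOfSale"],
    select_cols=["SaleItemID", "DateOfSale", "Amount"],
))
```

Included columns live only in the leaf level of the index, so they let the query be answered from the index without widening the index key itself.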
I have a SQL Server table that stores employee details; the ID column is of GUID type, while the EmployeeNumber column is of INT type. Most of the time I will be dealing with EmployeeNumber in joins and selection criteria.
My question is: is it sensible to put the primary key on the ID column and the clustered index on EmployeeNumber?
Yes, it is possible to have a non-clustered primary key, and it is possible to have a clustered key that is completely unrelated to the primary key. By default a primary key becomes the clustered index key too, but this is not a requirement.
The primary key is a logical concept: it is the key used in your data model to reference entities.
The clustered index key is a physical concept: it is the order in which you want the rows to be stored on disk.
Choosing a different clustered key is driven by a variety of factors, such as key width, when you want a narrower clustered key than the primary key (because the clustered key gets replicated in every non-clustered index), or support for frequent range scans (common in time series), when the data is frequently accessed with queries like date between '20100101' and '20100201' (a clustered index key on date would be appropriate).
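To put a rough number on the key-width argument: every non-clustered index entry carries a copy of the clustered key, so that part of the cost grows with rows × non-clustered indexes × key width. The row and index counts below are hypothetical, and row and page overhead are ignored, so this is only a back-of-the-envelope estimate.

```python
def locator_bytes(rows: int, nonclustered_indexes: int, clustered_key_width: int) -> int:
    """Bytes spent repeating the clustered key in non-clustered index entries
    (ignores per-row and per-page overhead)."""
    return rows * nonclustered_indexes * clustered_key_width

ROWS = 10_000_000   # hypothetical table size
INDEXES = 3         # hypothetical number of non-clustered indexes
guid_bytes = locator_bytes(ROWS, INDEXES, 16)  # 16-byte GUID clustered key
int_bytes = locator_bytes(ROWS, INDEXES, 4)    # 4-byte INT clustered key
print(f"GUID: {guid_bytes / 2**20:.0f} MiB, INT: {int_bytes / 2**20:.0f} MiB")
# prints GUID: 458 MiB, INT: 114 MiB
```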
This subject has been discussed here ad nauseam before; see also "What column should the clustered index be put on?".
The ideal clustered index key is:
Sequential
Selective (no dupes, unique for each record)
Narrow
Used in Queries
In general it is a very bad idea to use a GUID as a clustered index key, since it leads to mucho fragmentation as rows are added.
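The fragmentation effect can be illustrated with a toy simulation of an index's leaf pages. It is a simplified model, not how SQL Server or InnoDB actually manage pages: each page holds a hypothetical 100 rows, a full page that receives a key at its very end starts a new page (the append case that sequential keys hit), and any other full page is split 50/50. `random.random()` stands in for random GUIDs and consecutive numbers for an IDENTITY or AUTO_INCREMENT column.

```python
import random
from bisect import bisect_right, insort
from typing import List

def pages_after_inserts(keys: List[float], page_capacity: int = 100) -> List[List[float]]:
    """Insert keys one by one into a toy leaf level and return its pages."""
    pages: List[List[float]] = [[]]
    for key in keys:
        # Locate the page whose key range should hold this key.
        firsts = [p[0] if p else float("-inf") for p in pages]
        i = max(bisect_right(firsts, key) - 1, 0)
        page = pages[i]
        if len(page) < page_capacity:
            insort(page, key)                      # room left: no split
        elif i == len(pages) - 1 and key > page[-1]:
            pages.append([key])                    # append at the end: old page stays full
        else:
            mid = page_capacity // 2               # insert in the middle: 50/50 split
            left, right = page[:mid], page[mid:]
            pages[i:i + 1] = [left, right]
            insort(left if key < right[0] else right, key)
    return pages

random.seed(42)
N = 10_000
seq_pages = pages_after_inserts([float(i) for i in range(N)])         # IDENTITY-like
rnd_pages = pages_after_inserts([random.random() for _ in range(N)])  # random-GUID-like
for label, pages in (("sequential", seq_pages), ("random", rnd_pages)):
    fill = N / (len(pages) * 100)
    print(f"{label}: {len(pages)} pages, {fill:.0%} average fill")
```

Sequential keys leave every page full, while random keys leave pages only partly filled after repeated splits, so the same rows need noticeably more pages (and every split is extra I/O at insert time).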
EDIT FOR CLARITY:
PK and Clustered key are indeed separate concepts. Your PK does not need to be your clustered index key.
In practical applications in my own experience, the same field that is your PK should/would be your clustered key since it meets the same criteria listed above.
First, I have to say that I have misgivings about the choice of a GUID as the primary key for this table. I am of the opinion that EmployeeNumber would probably be a better choice, and something naturally unique about the employee would be better than that, such as an SSN (or ATIN), which employers must legally obtain anyway (at least in the US).
Putting that aside, you should never base a clustered index on a GUID column. The clustered index specifies the physical order of rows in the table. Since GUID values are (in theory) completely random, every new row will fall at a random location. This is very bad for performance. There are so-called 'sequential' GUIDs, but I would consider them a bit of a hack.
Using a clustered index on something other than the primary key will improve performance of SELECT queries that take advantage of this index.
But you will lose performance on UPDATE queries, because in most scenarios they rely on the primary key to find the specific row you want to update.
INSERT queries could also lose performance, because when you add a new row in the middle of the index a lot of rows have to be moved (physically). This won't happen with an incrementing primary key, as new records will always be added at the end and no other rows have to move.
If you don't know which kind of operation needs the most performance, I recommend leaving the clustered index on the primary key and using nonclustered indexes on common search criteria.
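The extra lookup behind the UPDATE point (and behind the ID/EmployeeNumber question above) can be modeled in a few lines: rows are kept sorted by the clustered key, and the primary key index only maps each ID to that clustered key, so a seek by primary key costs a second lookup. This is a hypothetical in-memory model (`ToyTable` and its methods are made up), and a dict stands in for the B-tree of the non-clustered index.

```python
import bisect
import uuid
from typing import Dict, List

class ToyTable:
    """Rows stored in clustered-key order (EmployeeNumber); the GUID primary
    key lives in a separate non-clustered index that points at the clustered key."""

    def __init__(self) -> None:
        self.clustered_keys: List[int] = []         # sorted EmployeeNumbers
        self.rows: List[dict] = []                  # kept parallel to clustered_keys
        self.pk_index: Dict[uuid.UUID, int] = {}    # ID -> EmployeeNumber

    def insert(self, emp_id: uuid.UUID, employee_number: int, name: str) -> None:
        if emp_id in self.pk_index:
            raise ValueError("duplicate primary key")
        pos = bisect.bisect_left(self.clustered_keys, employee_number)
        if pos < len(self.clustered_keys) and self.clustered_keys[pos] == employee_number:
            raise ValueError("duplicate clustered key")
        # The row goes to its place in clustered-key order, not at the end.
        self.clustered_keys.insert(pos, employee_number)
        self.rows.insert(pos, {"ID": emp_id, "EmployeeNumber": employee_number, "Name": name})
        self.pk_index[emp_id] = employee_number

    def seek_clustered(self, employee_number: int) -> dict:
        pos = bisect.bisect_left(self.clustered_keys, employee_number)
        if pos == len(self.clustered_keys) or self.clustered_keys[pos] != employee_number:
            raise KeyError(employee_number)
        return self.rows[pos]

    def seek_pk(self, emp_id: uuid.UUID) -> dict:
        # Step 1: the PK index yields the clustered key; step 2: seek the clustered index.
        return self.seek_clustered(self.pk_index[emp_id])
```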
Clustered indexes cause the data to be physically stored in that order. For this reason when testing for ranges of consecutive rows, clustered indexes help a lot.
GUIDs are really bad clustered indexes, since their order is not a sensible pattern to order on. INT IDENTITY columns aren't much better unless order of entry helps (e.g. most recent hires).
Since you're probably not looking for ranges of employees it probably doesn't matter much which is the Clustered index, unless you can segment blocks of employees that you often aren't interested in (e.g. Termination Dates)
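A small experiment shows why clustering helps range queries such as "employees hired between two dates". It is a sketch under assumptions: the rows are hypothetical (employee_id, hire_day) pairs, each page holds 50 rows, and the function counts how many pages contain a matching row when the table is stored in hire-day order versus shuffled employee-ID order.

```python
import random
from typing import List, Tuple

PAGE_ROWS = 50  # hypothetical rows per data page

def pages_touched(rows: List[Tuple[int, int]], cluster_on: int, lo: int, hi: int) -> int:
    """Store rows ordered by column `cluster_on`, cut them into fixed-size pages,
    and count the pages holding a row whose hire_day is in [lo, hi]."""
    stored = sorted(rows, key=lambda r: r[cluster_on])
    touched = {pos // PAGE_ROWS for pos, (_, day) in enumerate(stored) if lo <= day <= hi}
    return len(touched)

random.seed(1)
# (employee_id, hire_day): IDs are shuffled, so they carry no hiring order.
ids = random.sample(range(100_000), 5_000)
rows = [(emp_id, day) for day, emp_id in enumerate(ids)]
by_date = pages_touched(rows, cluster_on=1, lo=1_000, hi=1_499)
by_id = pages_touched(rows, cluster_on=0, lo=1_000, hi=1_499)
print(by_date, by_id)
```

Clustered on hire day, the 500 matching rows sit on 10 consecutive pages; clustered on the shuffled ID, they are scattered across nearly every one of the 100 pages.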
Since EmployeeNumber is unique, I would make it the PK. In SQL Server, a PK is often a clustered index.
Joins on GUIDs are just horrible. @JNK answers this well.
This question already has answers here: sql primary key and index (11 answers). Closed 8 years ago.
Does having a primary key column mean there is an index on that column? If so, what kind of index is it?
For SQL Server, which I believe from previous questions is what you're using, when you define a PRIMARY KEY, it will automatically have an index on that column, which will default to being a CLUSTERED index. You can define whether it should be a NONCLUSTERED or a CLUSTERED index when you create the constraint.
Yes, a primary key implies an index.
If the primary key is clustered, the index will be part of the main table file. If it's not clustered, it will be part of a separate index file.
It depends on the database.
Some databases either require or automatically create primary key indexes as a way to enforce the uniqueness of a primary key. Others are perfectly happy to perform a full scan of the table.
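The "it depends" point is easy to check in SQLite, used here only because it ships with Python (it is neither SQL Server nor MySQL). Its documentation states that PRIMARY KEY constraints are normally enforced with an automatic unique index, except for INTEGER PRIMARY KEY (an alias for the rowid) and primary keys on WITHOUT ROWID tables, where the table itself is stored in key order. The table names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Non-integer PRIMARY KEY: SQLite enforces it with an automatic unique index.
conn.execute("CREATE TABLE emp_guid (id TEXT PRIMARY KEY, name TEXT)")
# INTEGER PRIMARY KEY: becomes an alias for the rowid, so no separate index.
conn.execute("CREATE TABLE emp_num (id INTEGER PRIMARY KEY, name TEXT)")

def indexes(table: str) -> list:
    """Return (index name, is_unique) for every index on `table`."""
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA index_list('{table}')")]

print(indexes("emp_guid"))  # one automatically created unique index
print(indexes("emp_num"))   # []
```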
Which database are you using?
EDIT:
SQL Server (versions 7 to 2008) creates indexes for primary keys; you can control whether or not they are clustered.
Older versions of Oracle (8i, 9i) also create indexes when you add a unique key constraint. Newer versions (10g) don't seem to, based on the test case I just looked at.
In any "real" database, yes having a primary key means having a unique index. In some databases, the primary key index can/will cluster on key values too.
In all the DBs I've used, PRIMARY KEY is basically just a UNIQUE index.
Please clear up my doubt about this: in SQL Server (2000 and above), is the primary key automatically a clustered index, or do we have the choice of a non-clustered index on the primary key?
Nope, it can be nonclustered. However, if you don't explicitly define it as nonclustered and there is no clustered index on the table, it'll be created as clustered.
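The default described above can be written as a small decision function. This is a simplified model of SQL Server's documented behavior for a new PRIMARY KEY constraint, not a real API: `primary_key_index_kind` is hypothetical, and it ignores every other option of the constraint.

```python
from typing import Optional

def primary_key_index_kind(explicit: Optional[str], table_has_clustered_index: bool) -> str:
    """Return "CLUSTERED" or "NONCLUSTERED" for a new PRIMARY KEY constraint."""
    if explicit is not None:
        kind = explicit.upper()
        if kind not in ("CLUSTERED", "NONCLUSTERED"):
            raise ValueError(f"unknown index kind: {explicit}")
        if kind == "CLUSTERED" and table_has_clustered_index:
            # A table can have only one clustered index, so this would fail.
            raise ValueError("table already has a clustered index")
        return kind
    # No keyword given: clustered, unless a clustered index already exists.
    return "NONCLUSTERED" if table_has_clustered_index else "CLUSTERED"
```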
One might also add that frequently it's BAD to allow the primary key to be clustered. In particular, when the primary key is assigned by an IDENTITY, it has no intrinsic meaning, so any effort to keep the table arranged accordingly would be wasted.
Consider a table Product, with ProductID INT IDENTITY PRIMARY KEY. If this is clustered, then products that are related in some way are likely to be spread all over the disk. It might be better to cluster by something that we're likely to query based on, like the ManufacturerID or the CategoryID. In either of these cases, a clustered index would (other things being equal) make the corresponding query much more efficient.
On the other hand, the foreign key in a child table that points to this might be a good candidate for clustering (my objection is to the column that actually has the IDENTITY attribute, not its relatives). So in my example above, it's likely that ManufacturerID is a foreign key to a Manufacturer table, where it is set as an IDENTITY. That column shouldn't be clustered, but the column in Product that references it might do so to good advantage.