Having an identity column in a table with a varchar primary key - sql

I was told to create an autID identity column in a table whose primary key is a GUID stored as varchar(40), and to use the autID column as the reference key to help the join process. But is that a good approach?
This causes a lot of problems, like this:
CREATE TABLE OauthClientInfo
(
autAppID INT IDENTITY(1,1) UNIQUE, -- made unique so it can be referenced by a foreign key
strClientID VARCHAR(40) PRIMARY KEY, -- GUID
strClientSecret VARCHAR(40)
)
CREATE TABLE OAuth_AuthToken
(
autID INT IDENTITY(1,1),
strAuthToken VARCHAR(40),
autAppID_fk INT
FOREIGN KEY REFERENCES OauthClientInfo(autAppID)
)
I was told that having autAppID_fk helps in joins vs. having strClientID_fk of varchar(40), but the point I want to defend is that we are unnecessarily adding a new ID as a reference, which sometimes forces extra joins.
For example, to find which strClientID a strAuthToken belongs to: if we had strClientID_fk as the reference key, then the OAuth_AuthToken table data would make a lot more sense to me. Please comment with your views on this.
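To make the trade-off concrete (a sketch; strClientID_fk is the hypothetical alternative reference column from the question):

```sql
-- With the surrogate autAppID_fk, finding a token's client requires a join:
SELECT c.strClientID
FROM OAuth_AuthToken t
JOIN OauthClientInfo c ON c.autAppID = t.autAppID_fk
WHERE t.strAuthToken = @token;

-- With strClientID_fk as the reference key, the token row answers directly:
SELECT strClientID_fk
FROM OAuth_AuthToken
WHERE strAuthToken = @token;
```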

I was told to create an autID identity column in the table with GUID varchar(40) as the primary key and use the autID column as a reference key to help in the join process. But is that a good approach?
You were told this by someone who confuses clustering and primary keys. They are not one and the same, despite the confusing implementation of the database engine that "helps" the lazy developer.
You might get arguments about adding an identity column to every table and designating it as the primary key. I'll disagree with all of this. One does not BLINDLY do anything of this type in a schema. You do the proper analysis, identify (and enforce) any natural keys, and then you decide whether a synthetic key is both useful and needed. And then you determine which columns to use for the clustered index (because you only have one). And then you verify the appropriateness of your decisions, based on how efficient and effective your schema is under load, by testing. There are no absolute rules about how to implement your schema.
Many of your indexing (and again note - indexing and primary key are completely separate things) choices will be affected by how your tables are updated over time. Do you have hotspots that need to be minimized? Does your table experience lots of random inserts, updates, and deletes over time? Or maybe just lots of inserts but relatively few updates or deletes? These are just some of the factors that guide your decision.

You need to use the UNIQUEIDENTIFIER data type for GUID columns, not VARCHAR.
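A minimal sketch of what that change would look like for the table in the question (assuming SQL Server):

```sql
CREATE TABLE OauthClientInfo
(
    autAppID INT IDENTITY(1,1),
    -- 16 bytes, vs. 40+ bytes for the VARCHAR(40) representation
    strClientID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
    strClientSecret VARCHAR(40)
);
```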

As far as I have read, an auto-increment INT is the most suitable column for a clustered index.
And strClientID is the worst candidate for a PK or clustered index.
Most importantly, you haven't mentioned the purpose of strClientID. What kind of data does it hold, and how does it get populated?

Related

Setting up foreign key with different datatype

If I create two tables and I want to set one column as a foreign key to another table's column, why the hell am I allowed to set the foreign key column's datatype?
It just doesn't make any sense, or am I missing something? Is there any scenario where a foreign key column has a different datatype on purpose?
To go a little deeper into my concerns: I tried to use pgAdmin to build a simple Postgres DB. I made the first table with a primary key of type serial. Then I tried to make a foreign key, but with what datatype? I have seen somewhere that serial is bigint unsigned. But this option doesn't even exist in pgAdmin. Of course I could use SQL, but then why am I using a GUI? So I tried Navicat instead; same problem. I feel like with every choice I make another mistake in my DB design...
EDIT:
Perhaps I asked the question wrong way.
I was allowed to build this structure:
CREATE TABLE "user" -- "user" is a reserved word in Postgres, so it must be quoted
(
id bigint NOT NULL,
CONSTRAINT user_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE TABLE book
(
"user" integer,
CONSTRAINT dependent_user_fkey FOREIGN KEY ("user")
REFERENCES "user" (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=FALSE
);
I insert some data into table "user":
INSERT INTO "user"(id)
VALUES (5000000000);
But the following insert fails:
INSERT INTO book("user")
VALUES (5000000000);
with ERROR: integer out of range, which is understandable, but an obvious design error.
And my question is: why, when we set the CONSTRAINT, are the data types not validated? If I'm wrong, an answer should contain a scenario where it is useful to have different data types.
Actually it does make sense, and here is why:
In a table, you can in fact set any column as its primary key. So it could be integer, double, string, etc. Even though nowadays, we mostly use either integers or, more recently, strings as primary key in a table.
Since the foreign key is pointing to another table's primary key, this is why you need to specify the foreign key's datatype. And it obviously needs to be the same datatype.
EDIT:
SQL implementations are lax in this case, as we can see: they do allow compatible types (INT and BIGINT, FLOAT or DECIMAL and DOUBLE), but at your own risk. Just as we can see in your example below.
However, SQL norms do specify that both datatypes must be the same.
If the datatype is character, they must have the same length; otherwise, if it is integer, they must have the same size and must both be signed or both unsigned.
You can see for yourself over here, in a chapter from a MySQL book published in 2003.
Hope this answers your question.
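For completeness, a sketch of the fix to the example in the question: declare the referencing column with the same type as the key it points to.

```sql
-- book."user" is now bigint, matching "user".id
CREATE TABLE book
(
    "user" bigint,
    CONSTRAINT dependent_user_fkey FOREIGN KEY ("user")
        REFERENCES "user" (id)
);
```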
To answer your question of why you'd ever want different type for a foreign vs. primary key...here is one scenario:
I'm in a situation where an extremely large postgres table is running out of integer values for its id sequence. Lots of other, equally large tables have a foreign key to that parent table.
We are upsizing the ID from integer to bigint, both in the parent table and all the child tables. This requires a full table rewrite. Due to the size of the tables and our uptime commitments and maintenance window size, we cannot rewrite all these tables in one window. We have about three months before it blows up.
So between maintenance windows, we will have primary keys and foreign keys with the same numeric value, but different size columns. This works just fine in our experience.
Even outside an active migration strategy like this, I could see creating a new child table with a bigint foreign key, with the anticipation that "someday" the parent table will get its primary key upsized from integer to bigint.
I don't know if there is any performance penalty with mismatched column sizes. That question is actually what brought me to this page, as I've been unable to find guidance on it online.
(Tangent: Never create any table with an integer id. Go with bigint, no matter what you think your data will look like in ten years. You're welcome.)
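The staged upsizing described above might be sketched like this (Postgres; table and column names are hypothetical):

```sql
-- Each ALTER rewrites one table, so they can be spread across maintenance windows.
ALTER TABLE parent  ALTER COLUMN id        TYPE bigint;  -- window 1
ALTER TABLE child_a ALTER COLUMN parent_id TYPE bigint;  -- window 2
ALTER TABLE child_b ALTER COLUMN parent_id TYPE bigint;  -- window 3
-- Between windows, integer FK columns keep referencing the bigint PK,
-- which works as long as the values still fit in an integer.
```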

SQL Guid Primary Key Join Performance

I'm currently using GUIDs as a NONCLUSTERED PRIMARY KEY alongside an INT IDENTITY column.
The GUIDs are required to allow offline creation of data and synchronisation - which is how the entire database is populated.
I'm aware of the implications of using a GUID as a clustered primary key, hence the integer clustered index but does using a GUID as a primary key and therefore foreign keys on other tables have significant performance implications?
Would a better option be to use an integer primary/foreign key, and use the GUID as a client ID with a UNIQUE INDEX on each table? My concern there is that Entity Framework would require loading the navigation properties in order to get the GUID of the related entity, without significant alteration to the existing code.
The database/hardware in question is SQL Azure.
You can also create foreign keys against unique key constraints, which then gives you the option to foreign key to the ID identity as an alternative to the Guid.
i.e.
Create Table SomeTable
(
UUID UNIQUEIDENTIFIER NOT NULL,
ID INT IDENTITY(1,1) NOT NULL,
CONSTRAINT PK PRIMARY KEY NONCLUSTERED (UUID),
CONSTRAINT UQ UNIQUE (ID)
)
GO
Create Table AnotherTable
(
SomeTableID INT,
FOREIGN KEY (SomeTableID) REFERENCES SomeTable(ID)
)
GO
Edit
Assuming that your centralized database is a mart, and that only batch ETL is done from the source databases: if you do your ETL directly to the central database (i.e. not via Entity Framework), then given that all your tables have UUID FKs after re-population from the distributed databases, you'll need to either map the INT unique key constraints during ETL or fix them up after the import (which would require a temporary NOCHECK constraint step on the INT FKs).
Once the ETL is loaded and the INT keys are mapped, I would suggest you ignore / remove the UUIDs from your ORM model; you would need to regenerate your EF navigation on the INT keys.
A different solution would be required if you update the central database directly, or do continual ETL and use EF for the ETL itself. In this case, it might be less total I/O just to leave the GUID PKs as FKs for referential integrity, drop the INT FKs altogether, and choose other suitable columns for clustering (minimizing page reads).
GUIDs have important implications, yes. Your index is nonclustered, but the index itself will quickly become fragmented, and indexes on the foreign keys will too. The size is also a concern: 16 bytes instead of a 4-byte integer.
You can use the NEWSEQUENTIALID() function as the default value for your column to make it less random and diminish fragmentation.
But yes, I'd say that using an integer as your primary key and for references will be the best solution.
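A sketch of the NEWSEQUENTIALID() suggestion (note it only works as a column DEFAULT, so it won't help rows whose GUIDs are generated offline by the client, as in this question):

```sql
CREATE TABLE SyncedData -- hypothetical table name
(
    RowGuid UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID(),
    CONSTRAINT PK_SyncedData PRIMARY KEY NONCLUSTERED (RowGuid)
);
```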
Generally speaking, it is preferable to use INT for Primary Key / Foreign Key fields, whether or not these fields are the leading field in clustered indexes. The issue has to do with JOIN performance: even if your UNIQUEIDENTIFIER is NonClustered, and even if you use NEWSEQUENTIALID() to reduce fragmentation, as the tables get larger it will be more scalable to JOIN between INT fields. (Please note that I am not saying that PK / FK fields should always be INT, as sometimes there are perfectly valid natural keys to use.)
In your case, given the concern about Entity Framework and generating the GUIDs in the app and not in the DB, go with your alternate suggestion of using INT as the PK / FK fields, but rather than have the UNIQUEIDENTIFIER in all tables, only put it in the main user / customer info table. I would think that you should be able to do a one-time lookup of the customer INT identifier based on the GUID, cache that value, and then use the INT value for all remaining operations. And yes, be sure there is a UNIQUE, NONCLUSTERED index on the GUID field.
That all being said, if your tables will never (and I mean NEVER as opposed to just not in the first 2 years) grow beyond maybe 100,000 rows each, then using UNIQUEIDENTIFIER is less of a concern as small volumes of rows generally perform ok (given moderately decent hardware that is not overburdened with other processes or low on memory). Obviously, the point at which JOIN performance degrades due to using UNIQUEIDENTIFIER will greatly depend on the specifics of the system: hardware as well as what types of queries, how the queries are written, and how much load on the system.
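The one-time lookup suggested above might look like this (table and column names are hypothetical):

```sql
-- Unique, nonclustered index on the GUID in the main customer table
CREATE UNIQUE NONCLUSTERED INDEX UX_Customer_Guid
    ON dbo.Customer (CustomerGuid);

-- Resolve the GUID to the INT once, cache it, and use the INT thereafter
SELECT CustomerID FROM dbo.Customer WHERE CustomerGuid = @guid;
```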

Identity field and primary key in SQL Server when values are unique

When a set of values that will be stored in a table has a name or a code that should be unique across the system, should the table be created with an auto-increment int ID as its primary key?
Take the situation of state abbreviations. Other than consistency, what would be the purpose of an ID primary key on the table, rather than the state name or abbreviation?
If, for example, the foreign key from a shipping address referenced the state abbreviation, which is not mutable, then... is there a purpose for having an auto-increment int ID?
You highlighted one positive aspect of a separate table: consistency. It is much easier to have this:
CREATE TABLE dbo.States
(
StateID TINYINT PRIMARY KEY,
Name VARCHAR(32),
Abbreviation CHAR(2)
);
CREATE TABLE dbo.CustomerAddresses
(
AddressID INT PRIMARY KEY,
...,
StateID TINYINT NOT NULL FOREIGN KEY REFERENCES dbo.States(StateID)
);
Than to have a trigger or check constraint like:
CHECK (StateAbbreviation IN ('AL', 'AK', /* 50+ more states/territories... */))
Now, with something static and small like a 2-character state abbreviation, this design might make more sense, eliminating some unnecessary mapping between the abbreviations and some surrogate ID:
CREATE TABLE dbo.States
(
Abbreviation CHAR(2) PRIMARY KEY,
Name VARCHAR(32)
);
CREATE TABLE dbo.CustomerAddresses
(
AddressID INT PRIMARY KEY,
...,
StateAbbreviation CHAR(2) FOREIGN KEY REFERENCES dbo.States(Abbreviation)
);
This constrains the data to the known set of states, allows you to store the actual data in the table (which can eliminate a lot of joins in queries), actually saves you some space, and avoids having any messy hard-coded check constraints (or constraints using UDFs, or triggers validating the data).
That all said, there is no magic blanket answer that satisfies all designs. As your string gets larger, it can make more sense to use an integer instead of just storing the string. A counter-example would be storing all of the User Agent strings from your web logs - it makes a lot of sense to store the same string once and assign an integer to it, than to store the same 255-character string over and over and over again.
Other things that can make this design troublesome:
What if you expand beyond the US later?
Forget about state abbreviations for a moment (which are pretty static); what if your lookups are things that do change frequently?
State Abbreviation is a rare example of a good non-increment primary key for the following reasons:
They are small (2-character)
They don't change
The set of values is relatively static - new records are unlikely
Just because the natural key is unique doesn't make it a good candidate for the primary key.
Even real-world values that are unique (like SSN) may not be good candidates if they are entered by humans. For example, suppose someone enters a bunch of related data for a person, then gets a letter that the SSN is wrong: now you can't just update the primary key, you need to update all of the foreign keys as well!
As a general rule (which may not apply in every single case), it's better to use integers as primary keys for performance reasons. So if your unique key is a string, create an autoincrement primary key.
Also, states don't necessarily have to be unique. It's true in one country, but when you look at all the countries in the world, the same abbreviations may occur.
EDIT
I can't find very good evidence on string vs. integer performance, but take a look e.g. here: Strings as Primary Keys in SQL Database
Having said that, there are never a lot of states, so the performance gain will be small in this case.

SQL primary key - complex primary or string with concatenation?

I have a table with 16 columns. It will be the most frequently used table in a web application, and it will contain about a few hundred thousand rows. The database is created on SQL Server 2008.
My question is the choice of primary key. Which is quicker? I can use a composite primary key with two BIGINTs, or I can use one varchar value, but then I will need to concatenate it afterwards.
There are many more factors you must consider:
data access prevalent pattern, how are you going to access the table?
how many non-clustered indexes?
frequency of updates
pattern of updates (sequential inserts, random)
pattern of deletes
All these factors, and specially the first two, should drive your choice of the clustered key. Note that the primary key and clustered key are different concepts, often confused. Read up my answer on Should I design a table with a primary key of varchar or int? for a lengthier discussion on the criteria that drive a clustered key choice.
Without any information on your access patterns I can answer only very briefly and concisely, and still be technically correct: the narrower key is always quicker (for reasons of IO). However, this response bears absolutely no value. The only thing that will make your application faster is to choose a key that is going to be used by the query execution plans.
A primary key which does not rely on any underlying values (called a surrogate key) is a good choice. That way, if the row changes, the ID doesn't have to, and any tables referring to it (foreign keys) will not need to change. I would choose an autonumber (i.e. IDENTITY) column for the primary key column.
In terms of performance, a shorter, integer based primary key is best.
You can still create your clustered index on multiple columns.
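For example (a sketch with hypothetical names), the primary key and the clustered index can be declared separately:

```sql
CREATE TABLE dbo.Orders
(
    OrderID    INT IDENTITY(1,1) NOT NULL,
    CustomerID INT NOT NULL,
    OrderDate  DATETIME NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderID)
);

-- Cluster on the multi-column access path instead of the surrogate key
CREATE CLUSTERED INDEX CX_Orders ON dbo.Orders (CustomerID, OrderDate);
```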
Why not just a single INT auto-generated primary key? INT is 32-bit, so it can handle over 4 billion records.
CREATE TABLE Records (
recordId INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- IDENTITY makes it auto-generated
...
);
A surrogate key might be a fine idea if there are foreign key relationships on this table. Using a surrogate will save tables that refer to it from having to duplicate all those columns in their tables.
Another important consideration is indexes on columns that you'll be using in WHERE clauses. Your performance will suffer if you don't. Make sure that you add appropriate indexes, over and above the primary key, to avoid table scans.
What do you mean by quicker? If you need to search quicker, you can create an index on any column or create a full-text search. The primary key just makes sure you do not have duplicated records.
The decision relies upon its use. If you are using the table to save data mostly and not retrieve it, then a simple key. If you are mostly querying the data and it is mostly static data where the key values will not change, your index strategy needs to optimize the data to the most frequent query that will be used. Personally, I like the idea of using GUIDs for the primary key and an int for the clustered index. That allows for easy data imports. But, it really depends upon your needs.
Lots of variables you haven't mentioned: whether the data in the two columns is "natural" and there is a benefit in identifying records by a logical ID; whether disclosure of the key via a UI poses a risk; how important performance is (a few hundred thousand rows is pretty minimal).
If you’re not too fussy, go the auto number path for speed and simplicity. Also take a look at all the posts on the site about SQL primary key types. Heaps of info here.
Is it an ER model or a dimensional model? In an ER model, they should be separate and should not be surrogated. The entire record could have a single surrogate for easy reference in URLs etc. This could be a hash of all parts of the composite key, or an identity.
In a dimensional model, they must also be separate, and they should all be surrogated.

Database Design and the use of non-numeric Primary Keys

I'm currently in the process of designing the database tables for a customer & website management application. My question is in regards to the use of primary keys as functional parts of a table (and not assigning "ID" numbers to every table just because).
For example, here are four related tables from the database so far, one of which uses the traditional primary key number, the others which use unique names as the primary key:
--
-- website
--
CREATE TABLE IF NOT EXISTS `website` (
`name` varchar(126) NOT NULL,
`client_id` int(11) NOT NULL,
`date_created` timestamp NOT NULL default CURRENT_TIMESTAMP,
`notes` text NOT NULL,
`website_status` varchar(26) NOT NULL,
PRIMARY KEY (`name`),
KEY `client_id` (`client_id`),
KEY `website_status` (`website_status`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
--
-- website_status
--
CREATE TABLE IF NOT EXISTS `website_status` (
`name` varchar(26) NOT NULL,
PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `website_status` (`name`) VALUES
('demo'),
('disabled'),
('live'),
('purchased'),
('transfered');
--
-- client
--
CREATE TABLE IF NOT EXISTS `client` (
`id` int(11) NOT NULL auto_increment,
`date_created` timestamp NOT NULL default CURRENT_TIMESTAMP,
`client_status` varchar(26) NOT NULL,
`firstname` varchar(26) NOT NULL,
`lastname` varchar(46) NOT NULL,
`address` varchar(78) NOT NULL,
`city` varchar(56) NOT NULL,
`state` varchar(2) NOT NULL,
`zip` int(11) NOT NULL,
`country` varchar(3) NOT NULL,
`phone` text NOT NULL,
`email` varchar(78) NOT NULL,
`notes` text NOT NULL,
PRIMARY KEY (`id`),
KEY `client_status` (`client_status`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;
--
-- client_status
---
CREATE TABLE IF NOT EXISTS `client_status` (
`name` varchar(26) NOT NULL,
PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `client_status` (`name`) VALUES
('affiliate'),
('customer'),
('demo'),
('disabled'),
('reseller');
As you can see, 3 of the 4 tables use their 'name' as the primary key. I know that these will always be unique. In 2 of the cases (the *_status tables) I am basically using a dynamic replacement for ENUM, since status options could change in the future, and for the 'website' table, I know that the 'name' of the website will always be unique.
I'm wondering if this is sound logic, getting rid of table ID's when I know the name is always going to be a unique identifier, or a recipe for disaster? I'm not a seasoned DBA so any feedback, critique, etc. would be extremely helpful.
Thanks for taking the time to read this!
There are 2 reasons I would always add an ID number to a lookup / ENUM table:
If you are referencing a single column table with the name then you may be better served by using a constraint
What happens if you want to rename one of the client_status entries? E.g. if you wanted to change the name from 'affiliate' to 'affiliate user', you would need to update the client table, which should not be necessary. The ID number serves as the reference and the name is the description.
In the website table, if you are confident that the name will be unique then it is fine to use as a primary key. Personally I would still assign a numeric ID as it reduces the space used in foreign key tables and I find it easier to manage.
EDIT:
As stated above, you will run into problems if the website is renamed. By making the name the primary key, you will make it very difficult, if not impossible, to change at a later date.
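The rename scenario is the clearest illustration. Assuming the lookup table gains a surrogate id column as suggested above, a rename touches exactly one row:

```sql
-- Only the lookup row changes; client rows store the ID and are untouched
UPDATE client_status SET name = 'affiliate user' WHERE id = 1;
```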
When making natural PRIMARY KEY's, make sure their uniqueness is under your control.
If you're absolutely sure you will never ever have uniqueness violation, then it's OK to use these values as PRIMARY KEY's.
Since website_status and client_status seem to be generated and used by you and only by you, it's acceptable to use them as a PRIMARY KEY, though having a long key may impact performance.
The website name seems to be under the control of the outside world, which is why I'd make it a plain field. What if they want to rename their website?
The counterexamples would be SSN and ZIP codes: it's not you who generates them and there is no guarantee that they won't be ever duplicated.
Kimberly Tripp has an Excellent series of blog articles (GUIDs as PRIMARY KEYs and/or the clustering key and The Clustered Index Debate Continues) on the issue of creating clustered indexes, and choosing the primary key (related issues, but not always exactly the same). Her recommendation is that a clustered index/primary key should be:
Unique (otherwise useless as a key)
Narrow (the key is used in all non-clustered indexes, and in foreign-key relationships)
Static (you don't want to have to change all related records)
Always Increasing (so new records always get added to the end of the table, and don't have to be inserted in the middle)
Using "Name" as your key, while it seems to satisfy #1, doesn't satisfy ANY of the other three.
Even for your "lookup" table, what if your boss decides to change all affiliates to partners instead? You'll have to modify all rows in the database that use this value.
From a performance perspective, I'm probably most concerned that a key be narrow. If your website name is actually a long URL, then that could really bloat the size of any non-clustered indexes, and all tables that use it as a foreign key.
Besides all the other excellent points that have already been made, I would add one more word of caution against using large fields as clustering keys in SQL Server (if you're not using SQL Server, then this probably doesn't apply to you).
I add this because in SQL Server, the primary key on a table by default also is the clustering key (you can change that, if you want to and know about it, but most of the cases, it's not done).
The clustering key that determines the physical ordering of the SQL Server table is also being added to every single non-clustered index on that table. If you have only a few hundred to a few thousand rows and one or two indices, that's not a big deal. But if you have really large tables with millions of rows, and potentially lots of indices to speed up the queries, this will indeed cause a lot of disk space and server memory to be wasted unnecessarily.
E.g. if your table has 10 million rows and 10 non-clustered indexes, and your clustering key is 26 bytes instead of 4 (for an INT), then you're wasting 10 million x 10 x 22 bytes, for a total of 2.2 billion bytes (roughly 2.2 GB) - that's not peanuts anymore!
Again - this only applies to SQL Server, and only if you have really large tables with lots of non-clustered indices on them.
Marc
"If you're absolutely sure you will never ever have uniqueness violation, then it's OK to use these values as PRIMARY KEY's."
If you're absolutely sure you will never ever have uniqueness violation, then don't bother to define the key.
Personally, I think you will run into trouble using this idea. As you end up with more parent-child relationships, you end up with a huge amount of work when the names change (as they always will, sooner or later). There can be a big performance hit when a child table with thousands of rows has to be updated because the name of the website changed. And you have to plan for how to make sure those changes happen. Otherwise, when the website name changes (oops, we let the name expire and someone else bought it), things either break because of the foreign key constraint, or you need to put in an automated way (cascading updates) to propagate the change through the system. If you use cascading updates, then you can suddenly bring your system to a dead halt while a large change is processed. This is not considered to be a good thing. It really is more effective and efficient to use IDs for relationships and then put unique indexes on the name field to ensure names stay unique. Database design needs to consider maintenance of data integrity and how that will affect performance.
Another thing to consider is that website names tend to be longer than a few characters. This means the performance difference between using an ID field for joins and using the name for joins could be quite significant. You have to think of these things at the design phase, as it is too late to change to an ID when you have a production system with millions of records that is timing out and the fix is to completely restructure the database and rewrite all of the SQL code. Not something you can fix in fifteen minutes to get the site working again.
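For illustration, a cascading foreign key like the one discussed above might be declared as follows (website_page is a hypothetical child table). Every rename of website.name would then rewrite all matching child rows in one transaction, which is exactly the blocking update being warned about:

```sql
ALTER TABLE website_page
    ADD CONSTRAINT fk_website FOREIGN KEY (website_name)
        REFERENCES website (name)
        ON UPDATE CASCADE;
```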
This just seems like a really bad idea. What if you need to change the value of the enum? The idea is to make it a relational database and not a set of flat files. At this point, why have the client_status table at all? Moreover, if you are using the data in an application, then by using a type like a GUID or INT you can validate the type and avoid bad data (insofar as validating the type goes). Thus, it is another of many lines of defense against hacking.
I would argue that a database that is resistant to corruption, even if it runs a little slower, is better than one that isn’t.
In general, surrogate keys (such as arbitrary numeric identifiers) undermine the integrity of the database. Primary keys are the main way of identifying rows in the database; if the primary key values are not meaningful, the constraint is not meaningful. Any foreign keys that refer to surrogate primary keys are therefore also suspect. Whenever you have to retrieve, update or delete individual rows (and be guaranteed of affecting only one), the primary key (or another candidate key) is what you must use; having to work out what a surrogate key value is when there is a meaningful alternative key is a redundant and potentially dangerous step for users and applications.
Even if it means using a composite key to ensure uniqueness, I would advocate using a meaningful, natural set of attributes as the primary key, whenever possible. If you need to record the attributes anyway, why add another one? That said, surrogate keys are fine when there is no natural, stable, concise, guaranteed-to-be-unique key (e.g. for people).
You could also consider using index key compression, if your DBMS supports it. This can be very effective, especially for indexes on composite keys (think trie data structures), and especially if the least selective attributes can appear first in the index.
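In SQL Server, for instance, index compression is requested per index (a sketch; the index and column names are hypothetical):

```sql
CREATE INDEX IX_Addresses_StateCity
    ON dbo.CustomerAddresses (StateAbbreviation, City)
    WITH (DATA_COMPRESSION = PAGE);
```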
I think I am in agreement with cheduardo. It has been 25 years since I took a course in database design but I recall being told that database engines can more efficiently manage and load indexes that use character keys. The comments about the database having to update thousands of records when a key is changed and on all of the added space being taken up by the longer keys and then having to be transferred across systems, assumes that the key is actually stored in the records and that it does not have to be transferred across systems anyway. If you create an index on a column(s) of a table, I do not think the value is stored in the records of the table (unless you set some option to do so).
If you have a natural key for a table, even if it is changed occasionally, creating another key creates a redundancy that could result in data integrity issues and actually creates even more information that needs to be stored and transferred across systems. I work for a team that decided to store the local application settings in the database. They have an identity column for each setting, a section name, a key name, and a key value. They have a stored procedure (another holy war) to save a setting that ensures it does not appear twice. I have yet to find a case where I would use a setting's ID. I have, however, ended up with multiple records with the same section and key name that caused my application to fail. And yes, I know that could have been avoided by defining a constraint on the columns.
Here are a few points that should be considered before deciding on keys in a table:
A numeric key is more suitable when you use references (foreign keys); since you are not using foreign keys, it is OK in your case to use a non-numeric key.
A non-numeric key uses more space than a numeric key, which can decrease performance.
Numeric keys make the DB look simpler to understand (you can easily know the number of rows just by looking at the last row).
You NEVER know when the company you work for will suddenly explode in growth and you have to hire 5 developers overnight. Your best bet is to use numeric (integer) primary keys, because they will be much easier for the entire team to work with AND will help your performance if and when the database grows. If you have to break records out and partition them, you might want to use the primary key. If you are adding records with a datetime stamp (as every table should), and there is an error somewhere in the code that updates that field incorrectly, the only way to confirm whether the records were entered in the proper sequence is to check the primary keys. There are probably 10 more TSQL or debugging reasons to use INT primary keys, not the least of which is writing a simple query to select the last 5 records entered into the table.