Can one field in a composite key be dependent on the other? - sql

I am thinking about making a composite key for a table of mine (which would be composed of two fields, fields A and B). However, field B is dependent on field A. Would this composite key violate any database design principles?

Well, yes, it does violate database design principles. If B is functionally dependent on A, then A alone identifies the row, so the composite key is not minimal - why not just use A? Any referring table can look up the value of B using a JOIN, so a composite foreign key reference is unnecessary, and storing the value of B in referring tables is redundant and inefficient (it takes up space in both data pages and index pages).
There are some cases where such a composite foreign key is useful, but you have not provided enough information to tell whether yours is one of them. As a general design principle this doesn't sound right, though exceptions exist, so it is not automatically a bad idea.
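To make this concrete, here is a minimal sketch with made-up table and column names (parent/child, roughly standard SQL): A alone is the key, B is just an attribute of A, and referring tables carry only A.

    -- A alone identifies the row; B is functionally dependent on A.
    CREATE TABLE parent (
        a INT PRIMARY KEY,
        b INT NOT NULL
    );

    -- Referring tables store only A; no composite foreign key is needed.
    CREATE TABLE child (
        child_id INT PRIMARY KEY,
        a        INT NOT NULL REFERENCES parent (a)
    );

    -- B is recovered with a join rather than duplicated into child.
    SELECT c.child_id, p.b
    FROM child c
    JOIN parent p ON p.a = c.a;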

Related

<select> for an entity with composite keys - strategy needed

So say I have database table tours (PK tour_id) holding region independent information and tours_regional_details (PK tour_id, region_id) holding region specific information.
Let's say I want to populate a select control with entities from the tours_regional_details table (my real scenarios are a bit different, just imagine this for the sake of simplicity).
So, how would you tackle this? My gut says concatenate the PKs into delimited strings, like "pk1|pk2" or "pk1,pk2", and use that as the value of the select control. While it works, it feels dirty and possibly needs additional validation steps before splitting the string again, which again feels dirty.
I don't want to start a composite vs single PK holy war, but might this be a bad database design decision on my part? I always believed identifying relationships and composite keys are there for a reason, but I feel tempted to alter my tables and just stuff them with auto-incremental IDs and unique constraints. I'm just not sure what kind of fresh hell that will introduce.
I am a little bit flabbergasted that I encounter this for the first time now after so many years.
EDIT: Yes, there is a table regions (PK region_id), but it is mostly irrelevant to the topic. While in some scenarios two select boxes would make sense, let's say here they don't: I want only one select box and want to select from:
Dummy tour (Region 1)
Dummy tour (Region 2)
Another dummy tour (region 3)
...
Composite primary keys aren't bad database design. In an ideal world, our programming languages and UI libraries would support tuples and relations as first-class values, so you'd be able to assign a pair of values as the value of an option in your dropdown control. However, since they generally only support scalar variables, we're stuck trying to encode or reduce our identifiers.
You can certainly add surrogate keys / autoincrement columns (and unique constraints on the natural keys where available) to every table. It's a very common pattern; most databases I've seen have at least some tables set up like this. You may be able to keep existing composite foreign keys as is, or you may want or need to change them to reference the surrogate primary keys instead.
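For the tables in the question, the resulting shape might look roughly like this (a sketch only; GENERATED ... AS IDENTITY is standard-SQL/PostgreSQL flavoured, other engines use AUTO_INCREMENT or IDENTITY, and the constraint name is made up):

    CREATE TABLE tours_regional_details (
        tours_regional_detail_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        tour_id   INT NOT NULL REFERENCES tours (tour_id),
        region_id INT NOT NULL REFERENCES regions (region_id),
        -- ... region-specific columns ...
        -- the old composite primary key survives as a unique constraint
        CONSTRAINT uq_tour_region UNIQUE (tour_id, region_id)
    );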
The risk with using surrogate keys for foreign keys is that your access paths in the database become fixed. For example, let's assume tours_regional_details had a primary key tours_regional_detail_id that's referenced by a foreign key in another table. Queries against this other table would always need to join with tours_regional_details to obtain the tour_id or region_id. Natural keys allow more flexible access paths since identifiers are reused throughout the database. This becomes significant in deep hierarchies of dependent concepts. These are exactly the scenarios where opponents of composite keys complain about the "explosion" of keys, and I can at least agree that it becomes cumbersome to remember and type out joins on numerous columns when writing queries.
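To illustrate the access-path point, here is a sketch with a hypothetical bookings table (not from the question) whose foreign key is the surrogate:

    CREATE TABLE bookings (
        booking_id               INT PRIMARY KEY,
        tours_regional_detail_id INT NOT NULL
            REFERENCES tours_regional_details (tours_regional_detail_id)
    );

    -- Even "which region does this booking belong to?" now needs a join:
    SELECT b.booking_id, trd.region_id
    FROM bookings b
    JOIN tours_regional_details trd
      ON trd.tours_regional_detail_id = b.tours_regional_detail_id;

    -- With a natural (composite) foreign key, region_id would already be a
    -- column of bookings and could be filtered on directly.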
You could duplicate the natural key columns into the referencing tables, but storing redundant information requires additional effort to maintain consistency. I often see this done for performance or convenience reasons where surrogate keys were used as foreign keys, since it allows querying a table without having to do all the joins to dereference the surrogate identifiers. In these cases, it might've been better to reference the natural key instead.
If I'm allowed to return to my ideal world, perhaps DBMSs could allow naming and storing joins.
In practice, surrogate keys help balance the complexity we have to deal with. Use them, but don't worship them.

SQL: Primary key column. Artificial "Id" column vs "Natural" columns [duplicate]

Possible Duplicate:
Relational database design question - Surrogate-key or Natural-key?
When I create a relational table there is a temptation to choose as the primary key the column whose values are unique. But for optimization and uniformity purposes I create an artificial Id column every time. If there is a column (or combination of columns) that should be unique, I create a unique index for it instead of marking it as the (composite) primary key column(s).
Is it really a good practice always to prefer artificial "Id" column + indexes instead of natural columns for a primary key?
This is a bit of a religious debate. My personal preference is to have synthetic primary keys rather than natural primary keys but there are good arguments on both sides. Realistically, so long as you are consistent and reasonable, either approach can work well.
If you use natural keys, the two major downsides are the presence of composite keys and mutating primary key values. If you have composite primary keys, you'd obviously have to have multiple columns in each child table. That can get unwieldy from a data model perspective when there are many relationships among entities. But it can also cause grief for people developing queries-- it's awfully easy to create queries that use N-1 of N join conditions and get almost the right result. If you have natural keys, you'll also inevitably encounter a situation where the natural key value changes and you then have to ripple that change through many different entities-- that's vastly more complicated than changing a unique value in the table.
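Here is a sketch of that trap, assuming hypothetical order_lines and shipment_lines tables that share the composite key (order_no, line_no):

    SELECT o.order_no, o.line_no, s.qty_shipped
    FROM order_lines o
    JOIN shipment_lines s
      ON s.order_no = o.order_no      -- forgot: AND s.line_no = o.line_no
    WHERE o.order_no = 1001;
    -- The query runs without error and returns plausible-looking rows,
    -- but every order line is matched to every shipment line of the order.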
On the other hand, if you use synthetic keys, you're wasting space by adding additional columns, adding additional overhead to maintain an additional index, and you're increasing the risk that you'll get functionally duplicated results. It's awfully easy to either forget to create a unique constraint on the business key or to see that there is a non-unique index on the combination and just assume that it was a unique index. I actually just got bitten by this particular failing a couple days ago-- I had indexed the composite natural key (with a non-unique index) rather than creating a unique constraint. Dumb mistake but one that's relatively easy to make.
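In miniature, the difference looks like this (illustrative table and constraint names; only the second statement actually protects the natural key):

    -- A non-unique index: speeds up lookups, but happily allows duplicates.
    CREATE INDEX ix_employees_nk ON employees (company_id, badge_no);

    -- A unique constraint on the business key: what was actually needed.
    ALTER TABLE employees
        ADD CONSTRAINT uq_employees_nk UNIQUE (company_id, badge_no);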
From a query writing and naming convention standpoint, I would also tend to prefer synthetic keys because it's nice to know when you're joining tables that the primary key of A is going to be A_ID and the primary key of B is going to be B_ID. That's far more self-documenting than trying to remember that the primary key of A is the combination of A_NAME and A_REVISION_NUMBER and that the primary key of B is B_CODE.
There is little or no difference between a key enforced through a PRIMARY KEY constraint and a key enforced through a UNIQUE constraint. What's important is that you enforce ALL the keys necessary from a data integrity perspective. Usually that means at least one "natural" key (a key exposed to the users/consumers of the data and used to identify the facts about the universe of discourse) per table.
Optionally you might also want to create "technical" keys to support the application and database features rather than the end user (usually called surrogate keys). That should be very much a secondary consideration however. In the interests of simplicity (and very often performance as well) it usually makes sense only to create surrogate keys where you have identified a particular need for them and not before.
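As a sketch of that interchangeability (illustrative names): standard SQL lets a foreign key reference any declared unique key, not just the one labelled PRIMARY.

    CREATE TABLE currencies (
        currency_code CHAR(3)     NOT NULL,
        currency_name VARCHAR(50) NOT NULL,
        CONSTRAINT uq_currency_code UNIQUE (currency_code)   -- the enforced natural key
    );

    CREATE TABLE prices (
        product_id    INT           NOT NULL,
        currency_code CHAR(3)       NOT NULL REFERENCES currencies (currency_code),
        amount        DECIMAL(12,2) NOT NULL
    );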
It depends on your natural columns. If they are small and steadily increasing, then they are good candidates for the primary key.
Small - the smaller the key, the more values you can fit into a single index page, and the faster your index scans will be
Steadily increasing - produces fewer index reshuffles as the table grows, improving performance.
My preference is to always use an artificial key.
First it is consistent. Anyone working on your application knows that there is a key and they can make assumptions on it. This makes it easier to understand and maintain.
I've also seen scenarios where the natural key (e.g. a string from an HR system that identifies an employee) has to change during the life of the application. If you have an artificial key that links the natural id to your employee record, then you only have to change that natural id in the one table. However, if that natural id is a primary key and you have it duplicated across a number of other tables as a foreign key, then you have a mess on your hands.
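A sketch of that arrangement, with illustrative names: the HR identifier lives in exactly one place, so a rename touches one row and no foreign keys.

    CREATE TABLE employees (
        employee_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        hr_code     VARCHAR(20) NOT NULL UNIQUE   -- the natural id from the HR system
    );

    CREATE TABLE timesheets (
        timesheet_id INT PRIMARY KEY,
        employee_id  INT NOT NULL REFERENCES employees (employee_id)
    );

    -- When HR changes someone's identifier, only one row is updated:
    UPDATE employees SET hr_code = 'EMP-00423' WHERE employee_id = 17;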
In my humble opinion, it is always better to have an artificial Id, if I understand your meaning properly.
Some people use, for instance, business-significant unique values as their table Id, but I have read on MSDN, and even in the official NHibernate documentation, that a unique, business-meaningless value (an artificial Id) is preferred, though you will want to create an index on the business value for future reference. That way, the day the company changes its nomenclature, the system will still run flawlessly.
Yes, it is. If nothing else, one of the most important properties of an artificial primary key is opacity: the artificial key doesn't reflect any information beyond itself. If you use natural row contents for keys, you end up exposing that information to things like Web interfaces, which is a terrible idea as a matter of principle.

Does every table really need an auto-incrementing artificial primary key? [closed]

Almost every table in every database I've seen in my 7 years of development experience has an auto-incrementing primary key. Why is this? If I have a table of U.S. states where each state must have a unique name, what's the use of an auto-incrementing primary key? Why not just use the state name as the primary key? Seems to me like an excuse to allow duplicates disguised as unique rows.
This seems plainly obvious to me, but then again, no one else seems to be arriving at and acting on the same logical conclusion as me, so I must assume there's a good chance I'm wrong.
Is there any real, practical reason we need to use auto-incrementing keys?
This question has been asked numerous times on SO and has been the subject of much debate over the years amongst (and between) developers and DBAs.
Let me start by saying that the premise of your question implies that one approach is universally superior to the other ... this is rarely the case in real life. Surrogate keys and natural keys both have their uses and challenges - and it's important to understand what they are. Whichever choice you make in your system, keep in mind there is benefit to consistency - it makes the data model easier to understand and easier to develop queries and applications for. I also want to say that I tend to prefer surrogate keys over natural keys for PKs ... but that doesn't mean that natural keys can't sometimes be useful in that role.
It is important to realize that surrogate and natural keys are NOT mutually exclusive - and in many cases they can complement each other. Keep in mind that a "key" for a database table is simply something that uniquely identifies a record (row). It's entirely possible for a single row to have multiple keys representing the different categories of constraints that make a record unique.
A primary key, on the other hand, is a particular unique key that the database will use to enforce referential integrity and to represent a foreign key in other tables. There can only be a single primary key for any table. The essential quality of a primary key is that it be 100% unique and non-NULL. A desirable quality of a primary key is that it be stable (unchanging). While mutable primary keys are possible, they cause many problems for a database that are better avoided (cascading updates, RI failures, etc). If you do choose to use a surrogate primary key for your table(s), you should also consider creating unique constraints to reflect the existence of any natural keys.
Surrogate keys are beneficial in cases where:
Natural keys are not stable (values may change over time)
Natural keys are large or unwieldy (multiple columns or long values)
Natural key definitions can change over time (columns added to or removed from the key)
By providing a short, stable, unique value for every row, we can reduce the size of the database, improve its performance, and reduce the volatility of dependent tables which store foreign keys. There's also the benefit of key polymorphism, which I'll get to later.
In some instances, using natural keys to express relationships between tables can be problematic. For instance, imagine you had a PERSON table whose natural key was {LAST_NAME, FIRST_NAME, SSN}. What happens if you have some other table GRANT_PROPOSAL in which you need to store a reference to a Proposer, Reviewer, Approver, and Authorizer? You now need 12 columns to express this information. You also need to come up with a naming convention of some kind to identify which columns belong to which kind of individual. But what if your PERSON table required 6, or 8, or 24 columns for a natural key? This rapidly becomes unmanageable. Surrogate keys resolve such problems by divorcing the semantics (meaning) of a key from its use as an identifier.
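Sketching that example with a surrogate PERSON key (illustrative column and constraint names), the proposal table needs four single-column foreign keys instead of four copies of a three-column natural key:

    CREATE TABLE person (
        person_id  INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        last_name  VARCHAR(50) NOT NULL,
        first_name VARCHAR(50) NOT NULL,
        ssn        CHAR(11)    NOT NULL,
        CONSTRAINT uq_person_nk UNIQUE (last_name, first_name, ssn)
    );

    CREATE TABLE grant_proposal (
        proposal_id   INT PRIMARY KEY,
        proposer_id   INT NOT NULL REFERENCES person (person_id),
        reviewer_id   INT NOT NULL REFERENCES person (person_id),
        approver_id   INT NOT NULL REFERENCES person (person_id),
        authorizer_id INT NOT NULL REFERENCES person (person_id)
    );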
Let's also take a look at the example you described in your question.
Should the 2-character abbreviation of a state be used as the primary key of that table?
On the surface, it looks like the abbreviation field meets the requirements of a good primary key. It's relatively short, it is easy to propagate as a foreign key, it looks stable. Unfortunately, you don't control the set of abbreviations ... the postal service does. And here's an interesting fact: in 1973 the USPS changed the abbreviation of Nebraska from NB to NE to minimize confusion with New Brunswick, Canada. The moral of the story is that natural keys are often outside of the control of the database ... and they can change over time. Even when you think they cannot. This problem is even more pronounced for more complicated data like people, or products, etc. As businesses evolve, the definitions for what makes such entities unique can change. And this can create significant problems for data modelers and application developers.
Earlier I mentioned that primary keys can support key polymorphism. What does that mean? Well, polymorphism is the ability of one type, A, to appear as and be used like another type, B. In databases, this concept refers to the ability to combine keys from different classes of entities into a single table. Let's look at an example. Imagine for a moment that you want to have an audit trail in your system that identifies which entities were modified by which user on what date. It would be nice to create a table with the fields: {ENTITY_ID, USER_ID, EDIT_DATE}. Unfortunately, using natural keys, different entities have different keys. So now we need to create a separate linking table for each kind of entity ... and build our application in a manner where it understands the different kinds of entities and how their keys are shaped.
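A sketch of such an audit table under surrogate keys (illustrative; the entity_type column is an assumption added here to say which table entity_id points into):

    CREATE TABLE audit_trail (
        entity_type VARCHAR(30) NOT NULL,   -- e.g. 'PERSON', 'GRANT_PROPOSAL'
        entity_id   INT         NOT NULL,   -- the surrogate key in that table
        user_id     INT         NOT NULL,
        edit_date   TIMESTAMP   NOT NULL
    );

The trade-off is that entity_id cannot carry a declarative foreign key, since it may refer to rows in different tables.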
Don't get me wrong. I'm not advocating that surrogate keys should ALWAYS be used. In the real world, never, ever, and always are dangerous positions to adopt. One of the biggest drawbacks of surrogate keys is that they can result in tables that have foreign keys consisting of lots of "meaningless" numbers. This can make it cumbersome to interpret the meaning of a record, since you have to join or look up records from other tables to get a complete picture. It also can make a distributed database deployment more complicated, as assigning unique incrementing numbers across servers isn't always possible (although most modern databases like Oracle and SQL Server mitigate this via sequence replication).
No.
In most cases, having a surrogate INT IDENTITY key is an easy option: it can be guaranteed to be NOT NULL and 100% unique, something a lot of "natural" keys don't offer - names can change, and so can SSNs and other items of information.
In the case of state abbreviations and names - if anything, I'd use the two-letter state abbreviation as a key.
A primary key must be:
unique (100% guaranteed! Not just "almost" unique)
NON NULL
A primary key should be:
stable if ever possible (not change - or at least not too frequently)
State two-letter codes definitely would offer this - that might be a candidate for a natural key. A key should also be small - an INT of 4 bytes is perfect, a two-letter CHAR(2) column just the same. I would not ever use a VARCHAR(100) field or something like that as a key - it's just too clunky, most likely will change all the time - not a good key candidate.
So while you don't have to have an auto-incrementing "artificial" (surrogate) primary key, it's often quite a good choice, since no naturally occurring data is really up to the task of being a primary key, and you want to avoid having huge primary keys with several columns - those are just too clunky and inefficient.
I think the use of the word "Primary" in the phrase "Primary Key" is, in a real sense, misleading.
First, use the definition that a "key" is an attribute or set of attributes that must be unique within the table.
Then, note that any key serves several, often mutually inconsistent, purposes.
Purpose 1. To use as join conditions to one or many records in child tables which have a relationship to this parent table (explicitly or implicitly defining a Foreign Key in those child tables).
Purpose 2. (related) Ensuring that child records must have a parent record in the parent table (The child table FK must exist as Key in the parent table)
Purpose 3. To increase performance of queries that need to rapidly locate a specific record/row in the table.
Purpose 4. (Most important from a data consistency perspective!) To ensure data consistency by preventing duplicate rows which represent the same logical entity from being inserted into the table. (This is often called a "natural" key, and should consist of table (entity) attributes which are relatively invariant.)
Clearly, any non-meaningful, non-natural key (like a GUID or an auto-generated integer) is totally incapable of satisfying Purpose 4.
But often, with many (most) tables, a totally natural key which can provide #4 will consist of multiple attributes and be excessively wide - so wide that using it for purposes #1, #2, or #3 will cause unacceptable performance consequences.
The answer is simple. Use both. Use a simple auto-generated integer key for all joins and FKs in other child tables, but ensure that every table that requires data consistency (very few tables don't) has an alternate natural unique key that will prevent inserts of inconsistent data rows... Plus, if you always have both, then all the objections against using a natural key (what if it changes? I have to change every place it is referenced as an FK) become moot, as you are not using it for that... You are only using it in the one table where it is a key, to avoid inconsistent duplicate data...
The only time you can get away without both is for a completely stand alone table that participates in no relationships with other tables and has an obvious and reliable natural key.
In general, a numeric primary key will perform better than a string. You can additionally create unique keys to prevent duplicates from creeping in. That way you get the assurance of no duplicates, but you also get the performance of numbers (vs. strings in your scenario).
In all likelihood, the major databases have some performance optimizations for integer-based primary keys that are not present for string-based primary keys. But that is only a reasonable guess.
Yes, in my opinion every table needs an auto incrementing integer key because it makes both JOINs and (especially) front-end programming much, much, much easier. Others feel differently, but this is over 20 years of experience speaking.
The single exception is small "code" or "lookup" tables, in which I'm willing to substitute a short (4 or 5 character) TEXT code value. I do this because I often use a lot of these in my databases and it allows me to present a meaningful display to the user without having to look up the description in the lookup table or JOIN it into a result set. Your example of a States table would fit in this category.
No, absolutely not.
Having a primary key which can't change is a good idea (UPDATE is legal for primary key columns, but in general potentially confusing and can create problems for child rows). But if your application has some other candidate which is more suitable than an auto-incrementing value, then you should probably use that instead.
Performance-wise, in general fewer columns are better, and particularly fewer indexes. If you have another column which has a unique index on it AND can never be changed by any business process, then it may be a suitable primary key.
Speaking from a MySQL (InnoDB) perspective, it's also a good idea to use a "real" column as a primary key rather than an "artificial" one, as InnoDB always clusters the primary key and includes it in secondary indexes (that is how it finds the rows in them). This gives it the potential to do useful optimisation with a primary key which it can't with any other unique index. MSSQL users often choose to cluster the primary key, but it can also cluster a different unique index.
EDIT:
But if it's a small database and you don't really care about performance or size too much, adding an unnecessary auto-increment column isn't that bad.
A non auto-incrementing value (e.g. UUID, or some other string generated according to your own algorithm) may be useful for distributed, sharded, or diverse systems where maintaining a consistent auto-incrementing ID is difficult (or impossible - think of a distributed system which continues to insert rows on both sides of a network partition).
I think there are two things that may explain the reason why auto-incrementing keys are sometimes used:
Space considerations: OK, your state name doesn't amount to much, but the space a key takes may add up. If you really want to store the state with its name as a primary key, then go ahead, but it will take more space. That may not be a problem in certain cases, and it sounds like a problem of olden days, but the habit is perhaps ingrained. And we programmers and DBAs do love habits :D
Defensive considerations: I recently had the following problem: we have users in the database where the email is the key to all identification. Why not make the email the primary key? Except suddenly border cases creep in where one guy must be there twice to have two different addresses, and nobody talked about it in the specs so the address is not normalized, and there's this situation where two different emails must point to the same person and... After a while, you stop pulling your hair out and add the damn integer id column.
I'm not saying it's a bad habit, nor a good one; I'm sure good systems can be designed around reasonable primary keys, but these two points lead me to believe fear and habit are two among the culprits.
It's a key component of relational databases. Having an integer relate to a state instead of having the whole state name saves a bunch of space in your database! Imagine you have a million records referencing your state table. Do you want to use 4 bytes for a number on each of those records or do you want to use a whole crapload of bytes for each state name?
Here are some practical considerations.
Most modern ORMs (rails, django, hibernate, etc.) work best when there is a single integer column as the primary key.
Additionally, having a standard naming convention (e.g. id as primary key and table_name_id for foreign keys) makes identifying keys easier.

Pros and Cons of autoincrement keys on "every table"

We are having a rather long discussion in our company about whether or not to put an autoincrement key on EVERY table in our database.
I can understand putting one on tables that other tables reference with a FK, but I kind of dislike putting such keys on each and every one of our tables, even though the keys would never be used.
Please help with pros and cons for putting autoincrement keys on every table apart from taking extra space and slowing everything a little bit (we have some tables with hundreds of millions of records).
Thanks
I'm assuming that almost all tables will have a primary key - and it's just a question of whether that key consists of one or more natural columns or a single auto-incrementing surrogate key. If you aren't using primary keys at all, then you will generally gain a lot of advantages by adding them to almost all tables.
So, here are some pros & cons of surrogate keys. First off, the pros:
Most importantly: they allow the natural keys to change. Trivial example, a table of persons should have a primary key of person_id rather than last_name, first_name.
Read performance - very small indexes are faster to scan. However, this is only helpful if you're actually constraining your query by the surrogate key. So, good for lookup tables, not so good for primary tables.
Simplicity - if named appropriately, it makes the database easy to learn & use.
Capacity - if you're designing something like a data warehouse fact table - surrogate keys on your dimensions allow you to keep a very narrow fact table - which results in huge capacity improvements.
And cons:
They don't prevent duplicates of the natural values. So, you'll still usually want a unique constraint (index) on the logical key.
Write performance. With an extra index you're going to slow down inserts, updates and deletes that much more.
Simplicity - for small tables of data that almost never changes they are unnecessary. For example, if you need a list of countries you can use the ISO list of countries. It includes meaningful abbreviations. This is better than a surrogate key because it's both small and useful.
In general, surrogate keys are useful, just keep in mind the cons and don't hesitate to use natural keys when appropriate.
You need primary keys on these tables. You just don't know it yet.
If you use small keys like this for Clustered Indexes, then there's quite significant advantages.
Like:
Inserts will always go at the end of the index (the last page), avoiding page splits.
Non-Clustered Indexes (which need a reference to the CIX key(s)) won't have long row addresses to consider.
And more... Kimberly Tripp's stuff is the best resource for this. Google her...
Also - if you have nothing else ensuring uniqueness, you have a hook into each row that you wouldn't otherwise have. You should still put unique indexes on fields that should be unique, and use FKs onto appropriate fields.
But... please consider the overhead of creating such things on existing tables. It could be quite scary. You can put unique indexes on tables without needing to create extra fields. Those unique indexes can then be used for FKs.
I'm not a fan of auto-increment primary keys on every table. The ideas that these give you fast joins and fast row inserts are really not true. My company calls this meatloaf thinking after the story about the woman who always cut the ends off her meatloaf just because her mother always did it. Her mother only did it because the pan was too short--the tradition keeps going even though the reason no longer exists.
When the driving table in a join has an auto-increment key, the joined table frequently shouldn't because it must have the FK to the driving table. It's the same column type, but not auto-increment. You can use the FK as the PK or part of a composite PK.
Adding an auto-increment key to a table with a naturally unique key will not always speed things up--how can it? You are adding more work by maintaining an extra index. If you never use the auto-increment key, this is completely wasted effort.
It's very difficult to predict optimizer performance--and impossible to predict future performance. On some databases, compressed or clustered indexes will decrease the costs of naturally unique PKs. On some parallel databases, auto-increment keys are negotiated between nodes and that increases the cost of auto-increment. You can only find out by profiling, and it really sucks to have to change Company Policy just to change how you create a table.
Having autoincrementing primary keys may make it easier for you to switch ORM layers in the future, and doesn't cost much (assuming you retain your logical unique keys).
You add surrogate auto increment primary keys as part of the implementation after logical design to respect the physical, on-disk architecture of the db engine.
That is, they have physical properties (narrow, numeric, strictly monotonically increasing) that suit use as clustered keys, in joins, etc.
Example: If you're modelling your data, then "product SKU" is your key. "product ID" is added afterwards, (with a unique constraint on "product SKU") when writing your "CREATE TABLE" statements because you know SQL Server.
This is the main reason.
The other reason is a brain-dead ORM that can't work without one...
Many tables are better off with a compound PK, composed of two or more FKs. These tables correspond to relationships in the Entity-Relationship (ER) model. The ER model is useful for conceptualizing a schema and understanding the requirements, but it should not be confused with a database design.
The tables that represent entities from an ER model should have a simple PK. You use a surrogate PK when none of the natural keys can be trusted. The decision about whether a key can be trusted or not is not a technical decision. It depends on the data you are going to be given, and what you are expected to do with it.
If you use a surrogate key that's autoincremented, you now have to make sure that duplicate references to the same entity don't creep into your databases. These duplicates would show up as two or more rows with a distinct PK (because it's been autoincremented), but otherwise duplicates of each other.
If you let duplicates into your database, eventually your use of the data is going to be a mess.
The simplest approach is to always use surrogate keys that are auto-incremented either by the db or via an ORM, and on every table. This is because they are generally the fastest method for joins, and they also make learning the database extremely simple, i.e. none of this "what's my key for this table" nonsense, as every table uses the same kind of key. Yes, they can be slower, but in truth the most important part of design is something that won't break over time. This is proven for surrogate keys. Remember, maintenance of the system happens a lot longer than development. Plan for a system that can be maintained. Also, with current hardware the potential performance loss is really negligible.
Consider this:
A record is deleted in one table that has a relationship with another table. The corresponding record in the second table cannot be deleted for auditing reasons. This record becomes orphaned from the first table. If a new record is inserted into the first table, and a sequential primary key is used, this record is now linked to the orphan. Obviously, this is bad. By using an auto incremented PK, an id that has never been used before is always guaranteed. This means that orphans remain orphans, which is correct.
I would never use natural keys as a PK. A numeric PK, like an auto increment is the ideal choice the majority of the time, because it can be indexed efficiently. Auto increments are guaranteed to be unique, even when records are deleted, creating trusted data relationships.

What are the down sides of using a composite/compound primary key?

What are the down sides of using a composite/compound primary key?
Could cause more problems for normalisation (2NF, "Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF")
More unnecessary data duplication. If your composite key consists of 3 columns, you will need to create the same 3 columns in every table, where it is used as a foreign key.
Generally avoidable with the help of surrogate keys (read about their advantages and disadvantages)
I can imagine a good scenario for composite key -- in a table representing a N:N relation, like Students - Classes, and the key in the intermediate table will be (StudentID, ClassID). But if you need to store more information about each pair (like a history of all marks of a student in a class) then you'll probably introduce a surrogate key.
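A sketch of that N:N case (illustrative names, assuming students and classes tables exist; the second table shows the surrogate variant for when the pair accumulates children of its own):

    CREATE TABLE student_classes (
        student_id INT NOT NULL REFERENCES students (student_id),
        class_id   INT NOT NULL REFERENCES classes (class_id),
        PRIMARY KEY (student_id, class_id)   -- composite key of two FKs
    );

    -- With extra per-pair data (e.g. a history of marks), a surrogate key can
    -- be introduced and the pair demoted to a unique constraint:
    CREATE TABLE enrollments (
        enrollment_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        student_id    INT NOT NULL REFERENCES students (student_id),
        class_id      INT NOT NULL REFERENCES classes (class_id),
        CONSTRAINT uq_enrollment UNIQUE (student_id, class_id)
    );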
There's nothing wrong with having a compound key per se, but a primary key should ideally be as small as possible (in terms of number of bytes required). If the primary key is long then this will cause non-clustered indexes to be bloated.
Bear in mind that the order of the columns in the primary key is important. The first column should be as selective as possible i.e. as 'unique' as possible. Searches on the first column will be able to seek, but searches just on the second column will have to scan, unless there is also a non-clustered index on the second column.
I think this is a specialisation of the synthetic key debate (whether to use meaningful keys or an arbitrary synthetic primary key). I come down almost completely on the synthetic key side of this debate for a number of reasons. These are a few of the more pertinent ones:
You have to keep dependent child tables on the end of a foreign key up to date. If you change the value of one of the primary key fields (which can happen - see below) you have to somehow change all of the dependent tables where their PK value includes these fields. This is a bit tricky because changing key values will invalidate FK relationships with child tables, so you may (depending on the constraint validation options available on your platform) have to resort to tricks like copying the record to a new one and deleting the old records.
On a deep schema the keys can get quite wide - I've seen 8 columns once.
Changes in primary key values can be troublesome to identify in ETL processes loading off the system. The example I once had occasion to see was an MIS application extracting from an insurance underwriting system. On some occasions a policy entry would be re-used by the customer, changing the policy identifier. This was a part of the primary key of the table. When this happens the warehouse load is not aware of what the old value was, so it cannot match the new data to it. The developer had to go searching through audit logs to identify the changed value.
Most of the issues with non-synthetic primary keys revolve around what happens when PK values of records change. The most useful applications of non-synthetic values are where a database schema is intended to be queried directly by users, such as an M.I.S. application where report writers are using the tables directly. In this case short values with fixed domains such as currency codes or dates might reasonably be placed directly on the table for convenience.
I would recommend a generated primary key in those cases with a unique not null constraint on the natural composite key.
If you use the natural key as primary then you will most likely have to reference both values in foreign key references to make sure you are identifying the correct record.
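Roughly, the two options look like this (illustrative names; the two child tables show alternative designs against an order_lines table keyed either naturally on (order_no, line_no) or by a generated order_line_id with a unique constraint on the natural pair):

    -- Composite natural key as PK: every referencing table repeats both columns.
    CREATE TABLE shipment_lines (
        shipment_id INT NOT NULL,
        order_no    INT NOT NULL,
        line_no     INT NOT NULL,
        FOREIGN KEY (order_no, line_no) REFERENCES order_lines (order_no, line_no)
    );

    -- Generated PK on order_lines: the same reference collapses to one column.
    CREATE TABLE shipment_lines_alt (
        shipment_id   INT NOT NULL,
        order_line_id INT NOT NULL REFERENCES order_lines (order_line_id)
    );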
Take the example of a table with two candidate keys: one simple (single-column) and one compound (multi-column). Your question in that context seems to be, "What disadvantage may I suffer if I choose to promote one key to be 'primary' and I choose the compound key?"
First, consider whether you actually need to promote a key at all: "the very existence of the PRIMARY KEY in SQL seems to be an historical accident of some kind. According to author Chris Date the earliest incarnations of SQL didn't have any key constraints and PRIMARY KEY was only later added to the SQL standards. The designers of the standard obviously took the term from E.F. Codd who invented it, even though Codd's original notion had been abandoned by that time! (Codd originally proposed that foreign keys must only reference one key - the primary key - but that idea was forgotten and ignored because it was widely recognised as a pointless limitation)." [source: David Portas' Blog: Down with Primary Keys?]
Second, what criteria would you apply to choose which key in a table should be 'primary'?
In SQL, the choice of which key is the PRIMARY KEY is arbitrary and product specific. In ACE/Jet (a.k.a. MS Access) the two main, and often competing, factors are whether you want to use PRIMARY KEY to favour clustering on disk or whether you want the columns comprising the key to appear in bold in the 'Relationships' picture in the MS Access user interface; I'm in the minority in thinking that index strategy trumps pretty pictures :) In SQL Server, you can specify the clustered index independently of the PRIMARY KEY, and there seems to be no product-specific advantage afforded. The only remaining advantage seems to be the fact that you can omit the columns of the PRIMARY KEY when creating a foreign key in SQL DDL; that is SQL-92 Standard behaviour and anyhow doesn't seem such a big deal to me (perhaps another one of the things they added to the Standard because it was a feature already widespread in SQL products?). So, it's not a case of looking for drawbacks; rather, you should be looking to see what advantage, if any, your SQL product gives the PRIMARY KEY. Put another way, the only drawback to choosing the wrong key is that you may be missing out on a given advantage.
Third, are you rather alluding to using an artificial/synthetic/surrogate key to implement in your physical model a candidate key from your logical model because you are concerned there will be performance penalties if you use the natural key in foreign keys and table joins? That's an entirely different question and largely depends on your 'religious' stance on the issue of natural keys in SQL.
Need more specificity.
Taken too far, it can overcomplicate inserts (every key column MUST be supplied) and documentation, and your joined reads could be suspect if incomplete.
Sometimes it can indicate a flawed data model (is a composite key REALLY what's described by the data?)
I don't believe there is a performance cost...it just can go really wrong really easily.
When you see it on a diagram, it is less readable.
When you use it in a query join, it is less readable.
When you use it as a foreign key, you have to add a check constraint ensuring that the referencing attributes are either all null or all not null, because if only one is null the key is not checked (see the sketch after this list).
It usually needs more storage when used as a foreign key.
Some tools don't handle composite keys.
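As referenced in the list above, here is a sketch of that check constraint for a two-column foreign key with nullable columns (illustrative names):

    CREATE TABLE order_notes (
        note_id  INT PRIMARY KEY,
        order_no INT,
        line_no  INT,
        FOREIGN KEY (order_no, line_no) REFERENCES order_lines (order_no, line_no),
        -- if only one column were NULL, the FK would not be checked at all,
        -- so force "all null or all not null":
        CONSTRAINT chk_fk_all_or_none CHECK (
            (order_no IS NULL AND line_no IS NULL) OR
            (order_no IS NOT NULL AND line_no IS NOT NULL)
        )
    );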
The main downside of using a compound primary key, is that you will confuse the hell out of typical ORM code generators.