Naming convention of foreign keys - sql

When making relations between tables (in mysql), I have encountered a naming dilemma.
For example, if I was creating a site where a project could be created by multiple users and also read by multiple users, to link a questions and user tables, I would potentially need two tables.
**project_authors**
questionId
userId
and
**project_bidders**
questionId
userId
The problem here is that the two tables look identical excluding the table name. Probably a more useful representation would be
project_authors
questionId
authorId
and
project_bidders
questionId
bidderID
The problem here now is that authorId and readerId are actually just userIds, and the name does not reflect that, and could possibly misleadingly indicate that authorId and bidderId's are unique and different in their own right.
I am sure my example will have many holes in it, but I have been encountering this problem alot recently, so my question is what method do you use?

What's wrong with author_userID and bidder_userID?
You have people playing roles, which is always a difficult design. You need to reflect the role as well as the underlying object playing that role.

I would say:
project_users
-------------
questionId
userId
roleId
where roleId links to a table that differentiates between author, bidder, etc. Positive effect - you can control with the choice of the composite primary key whether a user can be only one (author or bidder) or both. The former would mean a key over questionId, userId, the latter a key over all three fields.
Side note: Personally, I prefer staying in one naming scheme. Either I use everyhing_with_underscores, or I use camelCase/PascalCase, but not project_users and userId within the same database.

When I can I use the exact name of the PK field I am linking to. However, occassionally I might need two references to the same id in the same table, then I would do:
Users
UserID
Orders
Customer_UserID
SalesRep_UserID
That way you know the specific use of the ID as well as the actual ID name.

If you are asking just about the naming, I would use whatever naming scheme provides the most documentation inherently. I personally think that would be the first option. However, as long as you make sure whatever you decide is consistent and documented somehow, I think either will work fine.
However, have you thought about making more tables? Perhaps have a users table, which stores IDs and other user information, a projects table which stores projects, a bidders table which maps users that bid to projects, and an authors table which maps users to authored projects?

I prefer to keep the foreign key name identical to the primary key name when possible. This helps you quickly determine whether or not a column is a foreign key to another table and there is no ambiguity as to which table it references.
tblUser
UserID (pk)
tblProjectAuthor
ProjectAuthorID (pk)
UserID (fk to tblUser)
tblProjectBidder
ProjectBidderID (pk)
UserID (fk to tblUser)
In your queries, you can use prefixes to distinguish between which table's UserID you are referencing.
select author.UserID
from tblProjectAuthor author
left join tblUser user on user.UserID = author.UserID
The only problem we've experienced with this naming scheme is when you reference a foreign key multiple times in the same table. A good example would be a self-joining table. In cases like these, my only suggestion is to prefix with a meaningful word or phrase that helps distinguish while allowing you to recognize that the column is a foreign key.
For example:
tblEmployee
EmployeeID (pk)
ManagerEmployeeID (fk to tblEmployee)

I like to be really descriptive in my names, because more often then not they find their way into code as property names on some object. So in your case I would have
project_authors
questionId
authoredByUserId
and
project_bidders
questionId
bidByUserId
Then in code it makes a lot more sense when accessing the properties, like
myProjectAuthorEntity.authoredByUserId = someUserId;
myProjectBidderEntity.bidByUserId = someOtherUserId;

Related

Modelling many-to-many relation between more than two tables

I'm modelling a tier-list database using PostgreSQL. This is how it works:
A user can create a new Tier List;
A user can add as many tiers he wants to the list;
A user can add as many items as he can. Initially, the items are added to an "unranked" section (not assigned to any tier), then the user can rank them as he wants.
Modeling details:
A tier necessarily belongs to a tier_list;
An item can be in multiple tier_lists and in multiple tiers as well;
An item added to a tier_list has not necessarily been added to one of the tiers.
For modelling the relations between item-tier and item-tier_list, I thought about two scenarios:
Creating a junction with a composite PFK key of item and tier_list with a nullable tier FK. The records with no tier value would be the unranked ones, while the ones with an assigned tier would be the ranked;
Creating two M-N relations: one between item and tier, storing ranked items, and another between item and tier_list, storing unranked items.
I feel like the first option would be easier to deal with when having to persist things like moving a product between tiers (or even unranking it), while the second looks more compliant to SQL standards. Am I missing something?
First proposed solution model:
Second proposed solution model:
You can create a joint key using 3 different fields.
First of all, why using smallint and not int? Not fluent in Posgres, but it's usually better to have the biggest integer possible as primary key (things can grow faster than you expect).
Second, I strongly suggest to put ID_ before and not after the name of the filed used for lookup. It makes it easier to read.
As how to build your tables:
Item
ID PK
Title
Descriptions
I see no problems here. I'd just change the name in tblProducts, for easier reading.
Tier_List
ID PK
Description
Works fine too. Again I'll look for a better name. I'd call this one tblTiers or tblLegues instead. Usign similar names can bring troubles in 2-3 years when you have to add things and you're not sure what's what. Better use distinctive names for the tables.
Tier (suggesting tblTiers or tblRankings)
ID PK
Tier_List_ID PK FK
Title
Description
Here I see a HUGE problem. For experience, I don't really understand why you create a combination key here with ID and Tier_List_ID. Do you need to reuse the same ID for different tiers? If that ID has a meaning bring it out from the PK absolutely! PK must be simple counters, that will NEVER be changed. I saw people using the ID with a meaning for the end-user. It was a total disaster! I can't even start describing the quantity of garbage data that that DB was containing.
I suppose, because you were talking about ranking, that the ID there is a Rank, a level or something like that.
The table should become
ID PK uuid
Tier_List_ID FK
Rank smallint
Title
Description
There's another reason why I had you do this: when you have a combined PK, certain DBRMs require you to use the same combined key in the lookup tables, and that can become messy fast!
Now, the lookup table:
tier_list_item (tblRankingLookup?)
ID_Product FK PK
ID_Tier_List FK PK
ID_Tier FK PK
You don't need anything else to make it work smoothly! At least, that's how I'd envision it.
Instead I'd add an ID_User (because I'm not sure if all users can see all tiers and all rankings, or they can see only theirs).
Addendum: if you need to have unique combinations of different elements, I'm pretty sure you can create a combined index and mark it as "unique" (don't remember the correct syntax, not sure it is the same in Postgres).
In exmple, if you don't want the Tier table to have the rank repeated only once per tier_list_ID, you can create an index using tier_list_ID and Ranking and mark it unique. This way a two tiers in the same tier_list will not have the same value for the field Rank (rank can still be null).

Primary key/foreign Key naming convention [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
In our dev group we have a raging debate regarding the naming convention for Primary and Foreign Keys. There's basically two schools of thought in our group:
1:
Primary Table (Employee)
Primary Key is called ID
Foreign table (Event)
Foreign key is called EmployeeID
or
2:
Primary Table (Employee)
Primary Key is called EmployeeID
Foreign table (Event)
Foreign key is called EmployeeID
I prefer not to duplicate the name of the table in any of the columns (So I prefer option 1 above). Conceptually, it is consistent with a lot of the recommended practices in other languages, where you don't use the name of the object in its property names. I think that naming the foreign key EmployeeID (or Employee_ID might be better) tells the reader that it is the ID column of the Employee Table.
Some others prefer option 2 where you name the primary key prefixed with the table name so that the column name is the same throughout the database. I see that point, but you now can not visually distinguish a primary key from a foreign key.
Also, I think it's redundant to have the table name in the column name, because if you think of the table as an entity and a column as a property or attribute of that entity, you think of it as the ID attribute of the Employee, not the EmployeeID attribute of an employee. I don't go an ask my coworker what his PersonAge or PersonGender is. I ask him what his Age is.
So like I said, it's a raging debate and we go on and on and on about it. I'm interested to get some new perspectives.
If the two columns have the same name in both tables (convention #2), you can use the USING syntax in SQL to save some typing and some boilerplate noise:
SELECT name, address, amount
FROM employees JOIN payroll USING (employee_id)
Another argument in favor of convention #2 is that it's the way the relational model was designed.
The significance of each column is
partially conveyed by labeling it with
the name of the corresponding domain.
It doesn't really matter. I've never run into a system where there is a real difference between choice 1 and choice 2.
Jeff Atwood had a great article a while back on this topic. Basically people debate and argue the most furiously those topics which they cannot be proven wrong on. Or from a different angle, those topics which can only be won through filibuster style endurance based last-man-standing arguments.
Pick one and tell them to focus on issues that actually impact your code.
EDIT: If you want to have fun, have them specify at length why their method is superior for recursive table references.
I think it depends on your how you application is put together. If you use ORM or design your tables to represent objects then option 1 may be for you.
I like to code the database as its own layer. I control everything and the app just calls stored procedures. It is nice to have result sets with complete column names, especially when there are many tables joined and many columns returned. With this stype of application, I like option 2. I really like to see column names match on joins. I've worked on old systems where they didn't match and it was a nightmare,
Have you considered the following?
Primary Table (Employee)
Primary Key is PK_Employee
Foreign table (Event)
Foreign key is called FK_Employee
Neither convention works in all cases, so why have one at all? Use Common sense...
e.g., for self-referencing table, when there are more than one FK column that self-references the same table's PK, you HAVE to violate both "standards", since the two FK columns can't be named the same... e.g., EmployeeTable with EmployeeId PK, SupervisorId FK, MentorId Fk, PartnerId FK, ...
I agree that there is little to choose between them. To me a much more significant thing about either standard is the "standard" part.
If people start 'doing their own thing' they should be strung up by their nethers. IMHO :)
If you are looking at application code, not just database queries, some things seem clear to me:
Table definitions usually directly map to a class that describes one object, so they should be singular. To describe a collection of an object, I usually append "Array" or "List" or "Collection" to the singular name, as it more clearly than use of plurals indicates not only that it is a collection, but what kind of a collection it is. In that view, I see a table name as not the name of the collection, but the name of the type of object of which it is a collection. A DBA who doesn't write application code might miss this point.
The data I deal with often uses "ID" for non-key identification purposes. To eliminate confusion between key "ID"s and non-key "ID"s, for the primary key name, we use "Key" (that's what it is, isn't it?) prefixed with the table name or an abbreviation of the table name. This prefixing (and I reserve this only for the primary key) makes the key name unique, which is especially important because we use variable names that are the same as the database column names, and most classes have a parent, identified by the name of the parent key. This also is needed to make sure that it is not a reserved keyword, which "Key" alone is. To facilitate keeping key variable names consistent, and to provide for programs that do natural joins, foreign keys have the same name as is used in the table in which they are the primary key. I have more than once encountered programs which work much better this way using natural joins. On this last point, I admit a problem with self-referencing tables, which I have used. In this case, I would make an exception to the foreign key naming rule. For example, I would use ManagerKey as a foreign key in the Employee table to point to another record in that table.
The convention we use where I work is pretty close to A, with the exception that we name tables in the plural form (ie, "employees") and use underscores between the table and column name. The benefit of it is that to refer to a column, it's either "employees _ id" or "employees.id", depending on how you want to access it. If you need to specify what table the column is coming from, "employees.employees _ id" is definitely redundant.
I like convention #2 - in researching this topic, and finding this question before posting my own, I ran into the issue where:
I am selecting * from a table with a large number of columns and joining it to a second table that similarly has a large number of columns. Both tables have an "id" column as the primary key, and that means I have to specifically pick out every column (as far as I know) in order to make those two values unique in the result, i.e.:
SELECT table1.id AS parent_id, table2.id AS child_id
Though using convention #2 means I will still have some columns in the result with the same name, I can now specify which id I need (parent or child) and, as Steven Huwig suggested, the USING statement simplifies things further.
I've always used userId as a PK on one table and userId on another table as a FK. 'm seriously thinking about using userIdPK and userIdFK as names to identify one from the other. It will help me to identify PK and FK quickly when looking at the tables and it seems like it will clear up code when using PHP/SQL to access data making it easier to understand. Especially when someone else looks at my code.
I use convention #2. I'm working with a legacy data model now where I don't know what stands for in a given table. Where's the harm in being verbose?
How about naming the foreign key
role_id
where role is the role the referenced entity has relativ to the table at hand. This solves the issue of recursive reference and multiple fks to the same table.
In many cases will be identical to the referenced table name. In this cases it becomes identically to one of your proposals.
In any case havin long arguments is a bad idea
"Where in "employee INNER JOIN order ON order.employee_id = employee.id" is there a need for additional qualification?".
There is no need for additional qualification because the qualification I talked of is already there.
"the reason that a business user refers to Order ID or Employee ID is to provide context, but at a dabase level you already have context because you are refereing to the table".
Pray, tell me, if the column is named 'ID', then how is that "refereing [sic] to the table" done exactly, unless by qualifying this reference to the ID column exactly in the way I talked of ?

SQL: Do you need an auto-incremental primary key for Many-Many tables?

Say you have a Many-Many table between Artists and Fans. When it comes to designing the table, do you design the table like such:
ArtistFans
ArtistFanID (PK)
ArtistID (FK)
UserID (FK)
(ArtistID and UserID will then be contrained with a Unique Constraint
to prevent duplicate data)
Or do you build use a compound PK for the two relevant fields:
ArtistFans
ArtistID (PK)
UserID (PK)
(The need for the separate unique constraint is removed because of the
compound PK)
Are there are any advantages (maybe indexing?) for using the former schema?
ArtistFans
ArtistID (PK)
UserID (PK)
The use of an auto incremental PK has no advantages here, even if the parent tables have them.
I'd also create a "reverse PK" index automatically on (UserID, ArtistID) too: you will need it because you'll query the table by both columns.
Autonumber/ID columns have their place. You'd choose them to improve certain things after the normalisation process based on the physical platform. But not for link tables: if your braindead ORM insists, then change ORMs...
Edit, Oct 2012
It's important to note that you'd still need unique (UserID, ArtistID) and (ArtistID, UserID) indexes. Adding an auto increments just uses more space (in memory, not just on disk) that shouldn't be used
Assuming that you're already a devotee of the surrogate key (you're in good company), there's a case to be made for going all the way.
A key point that is sometimes forgotten is that relationships themselves can have properties. Often it's not enough to state that two things are related; you might have to describe the nature of that relationship. In other words, there's nothing special about a relationship table that says it can only have two columns.
If there's nothing special about these tables, why not treat it like every other table and use a surrogate key? If you do end up having to add properties to the table, you'll thank your lucky presentation layers that you don't have to pass around a compound key just to modify those properties.
I wouldn't even call this a rule of thumb, more of a something-to-consider. In my experience, some slim majority of relationships end up carrying around additional data, essentially becoming entities in themselves, worthy of a surrogate key.
The rub is that adding these keys after the fact can be a pain. Whether the cost of the additional column and index is worth the value of preempting this headache, that really depends on the project.
As for me, once bitten, twice shy – I go for the surrogate key out of the gate.
Even if you create an identity column, it doesn't have to be the primary key.
ArtistFans
ArtistFanId
ArtistId (PK)
UserId (PK)
Identity columns can be useful to relate this relation to other relations. For example, if there was a creator table which specified the person who created the artist-user relation, it could have a foreign key on ArtistFanId, instead of the composite ArtistId+UserId primary key.
Also, identity columns are required (or greatly improve the operation of) certain ORM packages.
I cannot think of any reason to use the first form you list. The compound primary key is fine, and having a separate, artificial primary key (along with the unique contraint you need on the foreign keys) will just take more time to compute and space to store.
The standard way is to use the composite primary key. Adding in a separate autoincrement key is just creating a substitute that is already there using what you have. Proper database normalization patterns would look down on using the autoincrement.
Funny how all answers favor variant 2, so I have to dissent and argue for variant 1 ;)
To answer the question in the title: no, you don't need it. But...
Having an auto-incremental or identity column in every table simplifies your data model so that you know that each of your tables always has a single PK column.
As a consequence, every relation (foreign key) from one table to another always consists of a single column for each table.
Further, if you happen to write some application framework for forms, lists, reports, logging etc you only have to deal with tables with a single PK column, which simplifies the complexity of your framework.
Also, an additional id PK column does not cost you very much in terms of disk space (except for billion-record-plus tables).
Of course, I need to mention one downside: in a grandparent-parent-child relation, child will lose its grandparent information and require a JOIN to retrieve it.
In my opinion, in pure SQL id column is not necessary and should not be used. But for ORM frameworks such as Hibernate, managing many-to-many relations is not simple with compound keys etc., especially if join table have extra columns.
So if I am going to use a ORM framework on the db, I prefer putting an auto-increment id column to that table and a unique constraint to the referencing columns together. And of course, not-null constraint if it is required.
Then I treat the table just like any other table in my project.

Owner ID type database fields

Suppose you have these tables: RestaurantChains, Restaurants, MenuItems - with the obvious relations between them. Now, you have tables Comments and Ratings, which store the customer comments/ratings about chains, restaurants and menu items. What would be the best way to link these tables? The obvious solutions could be:
Use columns OwnerType and OwnerID in the tables Comments and Ratings, but now I can't add foreign keys to link comments/ratings with the objects they are ment for
Create separate tables of Comments and Ratings for each table, e.g. MenuItemRatings, MenuItemComments etc. This solution has the advantage that all the correct foreign keys are present and has the obvious disadavantage of having lots and lots of tables with basically the same structure.
So, which solution works better? Or is there even a better solution that I don't know about?
Since comments about a menu item are different from comments about a restaurant (even if they happen to share the same structure) I would put them in separate tables and have the appropriate FKs to enforce some data integrity in your database.
I don't know why there is an aversion to having more tables in your database. Unless you're going from 50 tables to 50,000 tables you're not going to see a performance problem due to large catalog tables (and having more, smaller tables in this case should actually give you better performance). I would also tend to think that it would be a lot clearer to understand when dealing with tables called "Menu_Item_Comments" and "Restaurant_Comments" than it would to deal with a table called "Comments" and not knowing what exactly is really in it just by the name of it.
How about this alt text http://www.freeimagehosting.net/uploads/8241ff5c76.png
Have a single Comments/Rating table for all the objects and dont use automatically generated foreign keys. The key in the ratings table eg RatingID can be placed in a field in Restaurant, Chain, Menuitems table and they can all point to the same table, they are still foreign keys.
If you need to know in reverse what object the review relates to you would need to have a field specifying the type of review it was, but that should be all.
Use a single table for comments and use GUID's as primary keys for your entites.
Then you can select comments without even knowing beforehand where they belong to:
SELECT CommentText
FROM Comments c, Restaurants r
WHERE c.Source = r.Id
SELECT CommentText
FROM Comments c, Chains ch
WHERE c.Source = ch.Id
etc.
You can't use foreign keys for comments, of course, but it's not that comments cannot live without foreign keys.
You may clean orphaned comments in triggers but there's nothing bad if some of them are left.
You amy also create a global Entity table (with a single GUID column), make your Chains, Restaurants, MenuItems and Comments refer to that table with a FOREING KEY ON DELETE CASCADE, and when DELETE'ing, say, a restaurant, delete it from that table instead. It will delete both a restaurant and all comments on it, and you still have your integrity.
If you want to take advantage of foreign key constraint and normalize the attributes of comments (and ratings) across base tables, you may need to create relationship tables between base tables and comments (and ratings).
e.g. for Restaurants and Comments:
Restaurants
id (PK)
(attributes of restaurants...)
RestaurantComments
id (PK)
restaurantid (FK to Restaurants)
commentid (FK to Comments)
Comments
id (PK)
(attributes of comments...)

Why specify primary/foreign key attributes in column names

A couple of recent questions discuss strategies for naming columns, and I was rather surprised to discover the concept of embedding the notion of foreign and primary keys in column names. That is
select t1.col_a, t1.col_b, t2.col_z
from t1 inner join t2 on t1.id_foo_pk = t2.id_foo_fk
I have to confess I have never worked on any database system that uses this sort of scheme, and I'm wondering what the benefits are. The way I see it, once you've learnt the N principal tables of a system, you'll write several orders of magnitude more requests with those tables.
To become productive in development, you'll need to learn which tables are the important tables, and which are simple tributaries. You'll want to commit an good number of column names to memory. And one of the basic tasks is to join two tables together. To reduce the learning effort, the easiest thing to do is to ensure that the column name is the same in both tables:
select t1.col_a, t1.col_b, t2.col_z
from t1 inner join t2 on t1.id_foo = t2.id_foo
I posit that, as a developer, you don't need to be reminded that much about which columns are primary keys, which are foreign and which are nothing. It's easy enough to look at the schema if you're curious. When looking at a random
tx inner join ty on tx.id_bar = ty.id_bar
... is it all that important to know which one is the foreign key? Foreign keys are important only to the database engine itself, to allow it to ensure referential integrity and do the right thing during updates and deletes.
What problem is being solved here? (I know this is an invitation to discuss, and feel free to do so. But at the same time, I am looking for an answer, in that I may be genuinely missing something).
I agree with you. Putting this information in the column name smacks of the crappy Hungarian Notation idiocy of the early Windows days.
I agree with you that the foreign key column in a child table should have the same name as the primary key column in the parent table. Note that this permits syntax like the following:
SELECT * FROM foo JOIN bar USING (foo_id);
The USING keyword assumes that a column exists by the same name in both tables, and that you want an equi-join. It's nice to have this available as shorthand for the more verbose:
SELECT * FROM foo JOIN bar ON (foo.foo_id = bar.foo_id);
Note, however, there are cases when you can't name the foreign key the same as the primary key it references. For example, in a table that has a self-reference:
CREATE TABLE Employees (
emp_id INT PRIMARY KEY,
manager_id INT REFERENCES Employees(emp_id)
);
Also a table may have multiple foreign keys to the same parent table. It's useful to use the name of the column to describe the nature of the relationship:
CREATE TABLE Bugs (
...
reported_by INT REFERENCES Accounts(account_id),
assigned_to INT REFERENCES Accounts(account_id),
...
);
I don't like to include the name of the table in the column name. I also eschew the obligatory "id" as the name of the primary key column in every table.
I've espoused most of the ideas proposed here over the 20-ish years I've been developing with SQL databases, I'm embarrassed to say. Most of them delivered few or none of the expected benefits and were with hindsight, a pain in the neck.
Any time I've spent more than a few hours with a schema I've fairly rapidly become familiar with the most important tables and their columns. Once it got to a few months, I'd pretty much have the whole thing in my head.
Who is all this explanation for? Someone who only spends a few minutes with the design isn't going to be doing anything serious anyway. Someone who plans to work with it for a long time will learn it if you named your columns in Sanskrit.
Ignoring compound primary keys, I don't see why something as simple as "id" won't suffice for a primary key, and "_id" for foreign keys.
So a typical join condition becomes customer.id = order.customer_id.
Where more than one association between two tables exists, I'd be inclined to use the association rather than the table name, so perhaps "parent_id" and "child_id" rather than "parent_person_id" etc
I only use the tablename with an Id suffix for the primary key, e.g. CustomerId, and foreign keys referencing that from other tables would also be called CustomerId. When you reference in the application it becomes obvious the table from the object properties, e.g. Customer.TelephoneNumber, Customer.CustomerId, etc.
I used "fk_" on the front end of any foreign keys for a table mostly because it helped me keep it straight when developing the DB for a project at my shop. Having not done any DB work in the past, this did help me. In hindsight, perhaps I didn't need to do that but it was three characters tacked onto some column names so I didn't sweat it.
Being a newcomer to writing DB apps, I may have made a few decisions which would make a seasoned DB developer shudder, but I'm not sure the foreign key thing really is that big a deal. Again, I guess it is a difference in viewpoint on this issue and I'll certainly take what you've written and cogitate on it.
Have a good one!
I agree with you--I take a different approach that I have seen recommended in many corporate environments:
Name columns in the format TableNameFieldName, so if I had a Customer table and UserName was one of my fields, the field would be called CustomerUserName. That means that if I had another table called Invoice, and the customer's user name was a foreign key, I would call it InvoiceCustomerUserName, and when I referenced it, I would call it Invoice.CustomerUserName, which immediately tells me which table it's in.
Also, this naming helps you to keep track of the tables your columns are coming from when you're joiining.
I only use FK_ and PK_ in the ACTUAL names of the foreign and primary keys in the DBMS.