Factoring out nulls in bill-of-materials style relations - sql

Given the schema
PERSON { name, spouse }
where PERSON.spouse is a foreign key to PERSON.name, NULLs will be necessary when a person is unmarried or we don't have any info.
Going with the argument against nulls, how do you avoid them in this case?
I have an alternate schema
PERSON { name }
SPOUSE { name1, name2 }
where SPOUSE.name* are FKs to PERSON. The problem I see here is that there is no way to ensure someone has only one spouse (even with all possible UNIQUE constraints, it would be possible to have two spouses).
What's the best way to factor out nulls in bill-of-materials style relations?

I think that enforcing no NULLs and no duplicates for this type of relationship makes the schema definition way more complicated than it really needs to be. Even if you allow nulls, it would still be possible for a person to have more than one spouse, or to have conflicting records, e.g.:
PERSON { A, B }
PERSON { B, C }
PERSON { C, NULL }
You'd need to introduce more data, like gender (or "spouse-numbers" for same-sex marriages?) to ensure that, for example, only Persons of one type are allowed to have a Spouse. The other Person's spouse would be determined by the first person's record. E.g.:
PERSON { A, FEMALE, B }
PERSON { B, MALE, NULL }
PERSON { C, FEMALE, NULL }
... So that only PERSONs who are FEMALE can have a non-null SPOUSE.
But IMHO, that's overcomplicated and non-intuitive even with NULLs. Without NULLs, it's even worse. I would avoid making schema restrictions like this unless you literally have no choice.

Well, first I would use auto-incrementing IDs as, of course, someone could have the same name. But, I assume you intend to do that and won't harp on it. However, how does the argument against NULLs go exactly? I don't have any problem with NULLs and think that is the appropriate solution to this problem.

I'm not sure why no one has pointed this out yet, but it's actually quite easy to ensure that a person has only one spouse, using pretty much the same model that you have in your question.
I'm going to ignore for the moment the use of a name as a primary key (it can change and duplicates are fairly common, so it's a poor choice) and I'm also going to leave out the possible need for historical tracking (you might want to add an effective date of some sort so that you know WHEN they were a spouse - Joe Celko has written some good stuff on temporal modeling, but I don't recall which book it was in at the moment). Otherwise if I got divorced and remarried you would lose that I had another spouse at another time - maybe that isn't important to you though.
Also, you might want to break up name into first_name, middle_name, last_name, prefix, suffix, etc.
Given those caveats...
CREATE TABLE People
(
    person_name VARCHAR(100),
    CONSTRAINT PK_People PRIMARY KEY (person_name)
)
GO
CREATE TABLE Spouses
(
    person_name VARCHAR(100),
    spouse_name VARCHAR(100),
    CONSTRAINT PK_Spouses PRIMARY KEY (person_name),
    CONSTRAINT FK_Spouses_People FOREIGN KEY (person_name) REFERENCES People (person_name)
)
GO
If you wanted to have spouses appear in the People table as well then you could add an FK for that as well. However, at that point you're dealing with a bidirectional link, which becomes a bit more complex.
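A minimal sketch of the one-spouse rule described above, run against SQLite from Python rather than SQL Server (an assumption made only so the example is self-contained). The sample names and the helper try_insert are hypothetical. It suggests that the primary key on Spouses.person_name is what rejects a second spouse row for the same person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE People (
    person_name VARCHAR(100) NOT NULL,
    CONSTRAINT PK_People PRIMARY KEY (person_name)
);
CREATE TABLE Spouses (
    person_name VARCHAR(100) NOT NULL,
    spouse_name VARCHAR(100) NOT NULL,
    CONSTRAINT PK_Spouses PRIMARY KEY (person_name),
    CONSTRAINT FK_Spouses_People FOREIGN KEY (person_name)
        REFERENCES People (person_name)
);
INSERT INTO People VALUES ('Fred'), ('Wilma'), ('Betty');
""")


def try_insert(person, spouse):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Spouses VALUES (?, ?)", (person, spouse))
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert("Fred", "Wilma")      # Fred has no spouse row yet
second = try_insert("Fred", "Betty")     # second row for Fred hits PK_Spouses
unknown = try_insert("Barney", "Betty")  # Barney is not in People, so the FK fails
```

Note that this only limits each person to one row on the person_name side; nothing here stops Wilma from appearing as spouse_name in several rows, which is the bidirectional problem mentioned above.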

All right, use Auto-IDs and then use a Check Constraint. The "Name1" column (which would only be an int ID) will be forced to have only ODD-numbered IDs and Name2 will have only EVEN ones.
Then create a Unique Constraint for Name1 and Name2.

Well, begin by using a key other than name, perhaps an int seed. But to prevent someone from having more than one spouse, simply add a unique index on the parent (name1) in the spouse table. That will prevent you from ever inserting the same name1 twice.

You need a person TABLE and a separate "Partner_Off" table to define the relationship.
Person (id, name, etc );
Partner_Off (id, partner_id, relationship);
To deal with more complex social situations you would probably need some dates in there. Plus, to simplify the SQL you need one entry for (fred,wilma,husband) and a matching entry for (wilma,fred,wife).

You can use a trigger to enforce the constraint. PostgreSQL has constraint triggers, a particularly nice way to defer the constraint evaluation until the appropriate time in the transaction.
From Fabian Pascal's Practical Issues in Database Management, pp. 66-67:
Stored procedures—whether triggered or
not—are preferable to application
level integrity code, but they are
practically inferior to and riskier
than declarative support because they
are more burdensome to write, error
prone, and cannot benefit from full
DBMS optimization.
...
Choose DBMSs with better declarative
integrity support. Given the
considerable gaps in such support by
products, knowledgeable users would be
at least in a position to emulate
correctly—albeit with procedural and/or application code—constraints
not supported by the DBMS.


Design conditional relationship

I need help with designing my database tables.
Employee
Id
EmployeeTypeId
EmployeeType
Id
Name
Car
Id
EmployeeId
How do I enforce that only one type of employee (driver) can be a foreign key in the Car table or should I redesign the tables?
I consider it a good idea to forearm the database such that implausible data cannot be entered. To enforce this here, however, is a bit tricky...
Solution 1:
Add EmployeeTypeId to the Car table. Then make (EmployeeId, EmployeeTypeId) a foreign key to the Employee table (where you might have to create a unique constraint on the two fields, in order to be able to use them for a foreign key reference). Then add a constraint on Car.EmployeeTypeId to ensure it's a driver. I know this looks redundant, but it really is no problem, because you cannot assign an Employee another EmployeeType here, so consistency is still guaranteed. I admit this approach is a bit clumsy, though.
Solution 2:
Use a before-insert trigger on the Car table, look up the employee and make sure it is a driver, else throw an exception. This is a better solution in my opinion, if only for its simplicity. You could then add a column to table EmployeeType holding a unique name for the types that you use, e.g. UniqueName = 'DRIVER', so you don't have to use the ID as a magic number. You see, normally one EmployeeType is as good as the other in a database. If you want to build special logic on a certain entry, you need a handle for this. The unique name is one way to do this, a flag IsDriver = TRUE/FALSE would be another.
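Solution 1 can be sketched roughly as follows. This is only an illustration under assumptions: it runs on SQLite via Python instead of whatever RDBMS the question targets, the type id 1 for drivers and the sample rows are made up, and assign_car is a hypothetical helper.

```python
import sqlite3

DRIVER = 1  # hypothetical id of the 'driver' row in EmployeeType

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE EmployeeType (
    Id   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL UNIQUE
);
CREATE TABLE Employee (
    Id             INTEGER PRIMARY KEY,
    EmployeeTypeId INTEGER NOT NULL REFERENCES EmployeeType (Id),
    UNIQUE (Id, EmployeeTypeId)  -- target for the composite foreign key
);
CREATE TABLE Car (
    Id             INTEGER PRIMARY KEY,
    EmployeeId     INTEGER NOT NULL,
    EmployeeTypeId INTEGER NOT NULL CHECK (EmployeeTypeId = 1),  -- drivers only
    FOREIGN KEY (EmployeeId, EmployeeTypeId)
        REFERENCES Employee (Id, EmployeeTypeId)
);
INSERT INTO EmployeeType VALUES (1, 'driver'), (2, 'clerk');
INSERT INTO Employee VALUES (10, 1), (20, 2);
""")


def assign_car(car_id, employee_id, type_id):
    """Return True if the Car row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO Car VALUES (?, ?, ?)",
                         (car_id, employee_id, type_id))
        return True
    except sqlite3.IntegrityError:
        return False


driver_ok = assign_car(1, 10, DRIVER)    # employee 10 really is a driver
clerk_as_clerk = assign_car(2, 20, 2)    # CHECK rejects any non-driver type
clerk_as_driver = assign_car(3, 20, DRIVER)  # FK fails: (20, 1) is not in Employee
```

The redundant EmployeeTypeId column is the price of keeping the rule declarative; the trigger in Solution 2 avoids the extra column at the cost of procedural code.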

Primary key/foreign Key naming convention [closed]

In our dev group we have a raging debate regarding the naming convention for Primary and Foreign Keys. There are basically two schools of thought in our group:
1:
Primary Table (Employee)
Primary Key is called ID
Foreign table (Event)
Foreign key is called EmployeeID
or
2:
Primary Table (Employee)
Primary Key is called EmployeeID
Foreign table (Event)
Foreign key is called EmployeeID
I prefer not to duplicate the name of the table in any of the columns (So I prefer option 1 above). Conceptually, it is consistent with a lot of the recommended practices in other languages, where you don't use the name of the object in its property names. I think that naming the foreign key EmployeeID (or Employee_ID might be better) tells the reader that it is the ID column of the Employee Table.
Some others prefer option 2 where you name the primary key prefixed with the table name so that the column name is the same throughout the database. I see that point, but you now cannot visually distinguish a primary key from a foreign key.
Also, I think it's redundant to have the table name in the column name, because if you think of the table as an entity and a column as a property or attribute of that entity, you think of it as the ID attribute of the Employee, not the EmployeeID attribute of an employee. I don't go and ask my coworker what his PersonAge or PersonGender is. I ask him what his Age is.
So like I said, it's a raging debate and we go on and on and on about it. I'm interested to get some new perspectives.
If the two columns have the same name in both tables (convention #2), you can use the USING syntax in SQL to save some typing and some boilerplate noise:
SELECT name, address, amount
FROM employees JOIN payroll USING (employee_id)
Another argument in favor of convention #2 is that it's the way the relational model was designed.
The significance of each column is
partially conveyed by labeling it with
the name of the corresponding domain.
It doesn't really matter. I've never run into a system where there is a real difference between choice 1 and choice 2.
Jeff Atwood had a great article a while back on this topic. Basically people debate and argue the most furiously those topics which they cannot be proven wrong on. Or from a different angle, those topics which can only be won through filibuster style endurance based last-man-standing arguments.
Pick one and tell them to focus on issues that actually impact your code.
EDIT: If you want to have fun, have them specify at length why their method is superior for recursive table references.
I think it depends on how your application is put together. If you use ORM or design your tables to represent objects then option 1 may be for you.
I like to code the database as its own layer. I control everything and the app just calls stored procedures. It is nice to have result sets with complete column names, especially when there are many tables joined and many columns returned. With this type of application, I like option 2. I really like to see column names match on joins. I've worked on old systems where they didn't match and it was a nightmare.
Have you considered the following?
Primary Table (Employee)
Primary Key is PK_Employee
Foreign table (Event)
Foreign key is called FK_Employee
Neither convention works in all cases, so why have one at all? Use Common sense...
e.g., for a self-referencing table, when there is more than one FK column that references the same table's PK, you HAVE to violate both "standards", since the two FK columns can't be named the same... e.g., EmployeeTable with EmployeeId PK, SupervisorId FK, MentorId FK, PartnerId FK, ...
I agree that there is little to choose between them. To me a much more significant thing about either standard is the "standard" part.
If people start 'doing their own thing' they should be strung up by their nethers. IMHO :)
If you are looking at application code, not just database queries, some things seem clear to me:
Table definitions usually directly map to a class that describes one object, so they should be singular. To describe a collection of an object, I usually append "Array" or "List" or "Collection" to the singular name, as it more clearly than use of plurals indicates not only that it is a collection, but what kind of a collection it is. In that view, I see a table name as not the name of the collection, but the name of the type of object of which it is a collection. A DBA who doesn't write application code might miss this point.
The data I deal with often uses "ID" for non-key identification purposes. To eliminate confusion between key "ID"s and non-key "ID"s, for the primary key name, we use "Key" (that's what it is, isn't it?) prefixed with the table name or an abbreviation of the table name. This prefixing (and I reserve this only for the primary key) makes the key name unique, which is especially important because we use variable names that are the same as the database column names, and most classes have a parent, identified by the name of the parent key. This also is needed to make sure that it is not a reserved keyword, which "Key" alone is. To facilitate keeping key variable names consistent, and to provide for programs that do natural joins, foreign keys have the same name as is used in the table in which they are the primary key. I have more than once encountered programs which work much better this way using natural joins. On this last point, I admit a problem with self-referencing tables, which I have used. In this case, I would make an exception to the foreign key naming rule. For example, I would use ManagerKey as a foreign key in the Employee table to point to another record in that table.
The convention we use where I work is pretty close to A, with the exception that we name tables in the plural form (i.e., "employees") and use underscores between the table and column name. The benefit of it is that to refer to a column, it's either "employees_id" or "employees.id", depending on how you want to access it. If you need to specify what table the column is coming from, "employees.employees_id" is definitely redundant.
I like convention #2 - in researching this topic, and finding this question before posting my own, I ran into the issue where:
I am selecting * from a table with a large number of columns and joining it to a second table that similarly has a large number of columns. Both tables have an "id" column as the primary key, and that means I have to specifically pick out every column (as far as I know) in order to make those two values unique in the result, i.e.:
SELECT table1.id AS parent_id, table2.id AS child_id
Though using convention #2 means I will still have some columns in the result with the same name, I can now specify which id I need (parent or child) and, as Steven Huwig suggested, the USING statement simplifies things further.
I've always used userId as a PK on one table and userId on another table as a FK. I'm seriously thinking about using userIdPK and userIdFK as names to identify one from the other. It will help me to identify PK and FK quickly when looking at the tables, and it seems like it will clear up code when using PHP/SQL to access data, making it easier to understand, especially when someone else looks at my code.
I use convention #2. I'm working with a legacy data model now where I don't know what "id" stands for in a given table. Where's the harm in being verbose?
How about naming the foreign key
role_id
where role is the role the referenced entity has relative to the table at hand. This solves the issue of recursive references and multiple FKs to the same table.
In many cases it will be identical to the referenced table name. In those cases it becomes identical to one of your proposals.
In any case, having long arguments is a bad idea.
"Where in "employee INNER JOIN order ON order.employee_id = employee.id" is there a need for additional qualification?".
There is no need for additional qualification because the qualification I talked of is already there.
"the reason that a business user refers to Order ID or Employee ID is to provide context, but at a dabase level you already have context because you are refereing to the table".
Pray, tell me, if the column is named 'ID', then how is that "refereing [sic] to the table" done exactly, unless by qualifying this reference to the ID column exactly in the way I talked of ?

Should I use Primary key here?

As an example,
I have 3 tables:
School: ID int, Name varchar
Student: ID int, Name varchar
StudentInSchool: StudentID int, SchoolID int
Now the question is whether I should put a column ID int with a primary key on it in StudentInSchool table? If yes, why?
Will it be helpful in indexing?
Any help appreciated.
Personally, I create composite PK (StudentID and SchoolID) on such junction tables. This also ensures uniqueness.
If, however, uniqueness is not required, you'll have to add an ID column to uniquely identify each row.
Generally speaking, adding a separate ID column will not help much: very few queries (if any) will actually use this column. As for performance, you can create a separate index for each column and you'll be just fine.
Create a primary key on StudentID, SchoolID and a secondary index on SchoolID, or vice versa, depending on what search condition is used more often.
If your table is index organized (ORGANIZATION INDEX in Oracle, CLUSTERED in SQL Server, InnoDB in MySQL), then the secondary index will implicitly carry the PRIMARY KEY columns as its row locator and, hence, all information can be fetched out of the index.
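A small sketch of the composite-key layout suggested above, using SQLite from Python purely for convenience (an assumption; the question does not name an RDBMS). The index name and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE School  (ID INTEGER PRIMARY KEY, Name VARCHAR(100));
CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name VARCHAR(100));
CREATE TABLE StudentInSchool (
    StudentID INT NOT NULL REFERENCES Student (ID),
    SchoolID  INT NOT NULL REFERENCES School (ID),
    PRIMARY KEY (StudentID, SchoolID)  -- composite key doubles as uniqueness rule
);
-- Secondary index for lookups that start from the school side.
CREATE INDEX IX_StudentInSchool_School ON StudentInSchool (SchoolID);
INSERT INTO School VALUES (1, 'North High'), (2, 'South High');
INSERT INTO Student VALUES (1, 'Ann');
INSERT INTO StudentInSchool VALUES (1, 1), (1, 2);
""")

# Re-inserting an existing (StudentID, SchoolID) pair violates the primary key.
try:
    with conn:
        conn.execute("INSERT INTO StudentInSchool VALUES (1, 1)")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

schools_of_ann = [row[0] for row in conn.execute(
    "SELECT SchoolID FROM StudentInSchool WHERE StudentID = 1 ORDER BY SchoolID")]
```

Whether the extra index on SchoolID pays off depends on how often queries start from the school, as the answer above says.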
In this example, unless the StudentInSchool table is going to have other attributes, e.g. timestamps for when the student was in that school to cope with moves, I wouldn't use it and I'd put the schoolID field in the Student table and define it as a foreign key there.
But if this is the design, then yes, you're not going to be losing anything by putting a primary key on the StudentInSchool table.
The answer is, it depends. In most cases the answer is 'No': a compound primary key of (StudentID, SchoolID) will suffice.
But if that intersection table starts to acquire other related data (say, joining date, leaving date) and/or it becomes a parent of related tables (e.g. attendance record) then you may want or need to treat it as a regular table. In which case (StudentID, SchoolID) becomes a business key (i.e. still unique) and you add a synthetic (or surrogate) primary key of Id or whatever.
In terms of pure data integrity: no. It's perfectly sufficient to define the primary key as (StudentID, SchoolID).
However, you don't say which RDBMS you are using. It may be that, for some of them, a single ID column would result in more efficient query plans.
In the case of SQL Server, a composite primary key of two integers is very efficient, and no further indexes should be required on the two columns.
Ok I think there is something missing in the assignment, so I'll try with my poor knowledge of real world :o)
What are students? They go to school(s), they may study at more than one school (especially universities), they may even repeat the same school later, etc.
Is the junction table as-is (with PK over both ids) enough to model these relationships?
short answer: no
long answer: still no, but for subset of simple cases it is sufficient (is yours one of them?).
If you want to extend db later for all these cases, surrogate PK (your ID) will be required. I would put ID there if I have just a doubt it might be required (as there's not much to lose).
As stated in the first sentence - correct answer is: "We don't know" as requirements and context of application are missing.
You could combine StudentID and SchoolID to one primary key.
There are some general rules which
describe when to use indexes. When
dealing with relatively small tables,
indexes do not improve performance. In
general indexes improve performance
when they are created on fields used
in table joins. Use indexes when most
of your database queries retrieve
relatively small datasets, because if
your queries retrieve most of the data
most of the time, the indexes will
actually slow the data retrieval. Use
indexes for columns that have many
different values (there are not many
repeated values within the column).
Although indexes improve search
performance, they slow the updates,
and this might be something worth
considering.
Source: SQL Indexes

Maintaining subclass integrity in a relational database

Let's say I have a table that represents a super class, students. And then I have N tables that represent subclasses of that object (athletes, musicians, etc). How can I express a constraint such that a student must be modeled in one (not more, not less) subclass?
Clarifications regarding comments:
This is being maintained manually, not through an ORM package.
The project this relates to sits atop SQL Server (but it would be nice to see a generic solution)
This may not have been the best example. There are a couple scenarios we can consider regarding subclassing, and I just happened to invent this student/athlete example.
A) In true object-oriented fashion, it's possible that the superclass can exist by itself and need not be modeled in any subclasses.
B) In real life, any object or student can have multiple roles.
C) The particular scenario I was trying to illustrate was requiring that every object be implemented in exactly one subclass. Think of the superclass as an abstract implementation, or just commonalities factored out of otherwise disparate object classes/instances.
Thanks to all for your input, especially Bill.
Each Student record will have a SubClass column (assume for the sake of argument it's a CHAR(1)). {A = Athlete, M=musician...}
Now create your Athlete and Musician tables. They should also have a SubClass column, but there should be a check constraint hard-coding the value for the type of table they represent. For example, you should put a default of 'A' and a CHECK constraint of 'A' for the SubClass column on the Athlete table.
Link your Musician and Athlete tables to the Student table using a COMPOSITE foreign key of StudentID AND Subclass. And you're done! Go enjoy a nice cup of coffee.
CREATE TABLE Student (
    StudentID INT NOT NULL IDENTITY PRIMARY KEY,
    SubClass CHAR(1) NOT NULL,
    Name VARCHAR(200) NOT NULL,
    CONSTRAINT UQ_Student UNIQUE (StudentID, SubClass)
);
CREATE TABLE Athlete (
    StudentID INT NOT NULL PRIMARY KEY,
    SubClass CHAR(1) NOT NULL,
    Sport VARCHAR(200) NOT NULL,
    CONSTRAINT CHK_Jock CHECK (SubClass = 'A'),
    CONSTRAINT FK_Student_Athlete FOREIGN KEY (StudentID, SubClass) REFERENCES Student(StudentID, SubClass)
);
CREATE TABLE Musician (
    StudentID INT NOT NULL PRIMARY KEY,
    SubClass CHAR(1) NOT NULL,
    Instrument VARCHAR(200) NOT NULL,
    CONSTRAINT CHK_Band_Nerd CHECK (SubClass = 'M'),
    CONSTRAINT FK_Student_Musician FOREIGN KEY (StudentID, SubClass) REFERENCES Student(StudentID, SubClass)
);
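To see the composite foreign key and the CHECK working together, here is a rough sketch on SQLite driven from Python (an assumption; the answer targets SQL Server, so IDENTITY is replaced by a plain INTEGER PRIMARY KEY). Only the Athlete table is shown, and the sample students and the add_athlete helper are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Student (
    StudentID INTEGER PRIMARY KEY,
    SubClass  CHAR(1) NOT NULL,
    Name      VARCHAR(200) NOT NULL,
    CONSTRAINT UQ_Student UNIQUE (StudentID, SubClass)
);
CREATE TABLE Athlete (
    StudentID INT NOT NULL PRIMARY KEY,
    SubClass  CHAR(1) NOT NULL DEFAULT 'A',
    Sport     VARCHAR(200) NOT NULL,
    CONSTRAINT CHK_Jock CHECK (SubClass = 'A'),
    CONSTRAINT FK_Student_Athlete FOREIGN KEY (StudentID, SubClass)
        REFERENCES Student (StudentID, SubClass)
);
INSERT INTO Student VALUES (1, 'A', 'Ann'), (2, 'M', 'Max');
""")


def add_athlete(student_id, subclass, sport):
    """Return True if the Athlete row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO Athlete VALUES (?, ?, ?)",
                         (student_id, subclass, sport))
        return True
    except sqlite3.IntegrityError:
        return False


athlete_ok = add_athlete(1, "A", "Rowing")       # Ann is typed 'A' in Student
musician_as_a = add_athlete(2, "A", "Rowing")    # FK fails: (2, 'A') not in Student
musician_as_m = add_athlete(2, "M", "Rowing")    # CHECK fails: Athlete rows must be 'A'
```

This guarantees at most one subclass row per student. It does not guarantee that a subclass row exists at all.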
Here are a couple of possibilities. One is a CHECK in each table that the student_id does not appear in any of the other sister subtype tables. This is probably expensive and every time you need a new subtype, you need to modify the constraint in all the existing tables.
CREATE TABLE athletes (
    student_id INT NOT NULL PRIMARY KEY,
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    CHECK (student_id NOT IN (SELECT student_id FROM musicians
        UNION SELECT student_id FROM slackers
        UNION ...))
);
edit: @JackPDouglas correctly points out that the above form of CHECK constraint is not supported by Microsoft SQL Server. Nor, in fact, is it valid per the SQL-99 standard to reference another table (see http://kb.askmonty.org/v/constraint_type-check-constraint).
SQL-99 defines a metadata object for multi-table constraints. This is called an ASSERTION, however I don't know any RDBMS that implements assertions.
Probably a better way is to make the primary key in the students table a compound primary key whose second column denotes a subtype. Then restrict that column in each child table to a single value corresponding to the subtype represented by the table. edit: no need to make the PK a compound key in child tables.
CREATE TABLE athletes (
    student_id INT NOT NULL PRIMARY KEY,
    student_type CHAR(4) NOT NULL CHECK (student_type = 'ATHL'),
    FOREIGN KEY (student_id, student_type) REFERENCES students(student_id, student_type)
);
Of course student_type could just as easily be an integer, I'm just showing it as a char for illustration purposes.
If you don't have support for CHECK constraints (e.g. MySQL), then you can do something similar in a trigger.
I read your followup about making sure a row exists in some subclass table for every row in the superclass table. I don't think there's a practical way to do this with SQL metadata and constraints. The only option I can suggest to meet this requirement is to use Single-Table Inheritance. Otherwise you need to rely on application code to enforce it.
edit: JackPDouglas also suggests using a design based on Class Table Inheritance. See his example or my examples of the similar technique here or here or here.
If you are interested in data modeling, in addition to object modeling, I suggest you look up "relational modeling generalization specialization" on the web.
There used to be some good resources out there that explain this kind of pattern quite well.
I hope those resources are still there.
Here's a simplified view of what I hope you'll find.
Before you begin designing a database, it's useful to come up with a conceptual data model that connects the values stored in the database back to the subject matter. Making a conceptual data model is really data analysis, not database design. Sometimes it's difficult to keep analysis and design separate.
One way of modeling data at the conceptual level is the Entity-Relationship (ER) model. There are well known patterns for modeling the specialization-generalization situation. Converting those ER patterns to SQL tables (called logical design) is pretty straightforward, although you do have to make some design choices.
The case you gave of a student having possibly several roles like musician probably doesn't illustrate the case you are interested in, if I read you right. You seem to be interested in the case where the subclasses are mutually exclusive. Perhaps the case where a vehicle might be an auto, a truck, or a motorcycle might be easier to discuss.
One difference you are likely to encounter is that the general table for the superclass doesn't really need the type code column. The type of a single superclass instance can be derived by the presence or absence of foreign keys in the various subclass tables. Whether it's smarter to include or omit the type code depends on how you intend to use the data.
Interesting problem. Of course the FK constraints are there for the subtables, so there has to be a student for those.
The main problem is trying to check as it is inserted. The student has to be inserted first so that you don't violate a FK constraint in a subtable so a trigger that does a check wouldn't work.
You could write an app that checks now and then if you are really concerned about this. I think the biggest fear though would be deletions. Someone could delete a subtable entry but not the student. You could have triggers to check when items are deleted from the subtables since that is probably the biggest problem.
I have a db with a table-per-subclass hierarchy like this as well. I use Hibernate and it's mapped properly, so it deletes everything automatically. If doing this by 'hand' then I would make sure to always delete the parent with proper cascades hehe :)
Thanks, Bill. You got me thinking...
The superclass table has a subclass code column. Each of the subclass tables has a foreign key constraint, as well as one that dictates that the id exist with a subset of the superclass table (where code = athlete).
The only missing part here is that it's possible to model a superclass without a subclass. Even if you make the code column mandatory, it could just be an empty join. That can be fixed by adding a constraint that the superclass's ids exist in a union of the ids in the subclass tables. Insertion gets a little hairy with these two constraints if constraints are enforced in the middle of transactions. That or just don't worry about unsubclassed objects.
Edit: Bleh, such a good sounding idea... But impeded by the fact that subqueries that refer to other tables aren't supported. At least not in SQL Server.
"That can be fixed by adding a constraint that the superclass's ids exist in a union of the ids in the subclass tables."
Depending on how much intelligence you want to put into your schema (and how much MS SQL Server lets you put there), you wouldn't actually need to do a union of the subclass tables, since you know that, if the id exists in any subclass table, it must exist in the same subclass as the one identified by the subclass code column.
I would add a Check Constraint possibly.
Create the ForeignKeys as Nullable.
Add a Check to make sure they aren't both null and to make sure they aren't both set.
CONSTRAINT [CK_HasOneForeignKey] CHECK ((FK_First IS NOT NULL OR FK_Second IS NOT NULL) AND NOT (FK_First IS NOT NULL AND FK_Second IS NOT NULL))
I am not sure but I believe this would allow you to set only one key at a time.
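A quick sketch of the "exactly one of two nullable keys" idea, written with IS NULL / IS NOT NULL because a comparison such as x != NULL evaluates to UNKNOWN, and a CHECK constraint does not reject UNKNOWN. It runs on SQLite via Python as a stand-in for SQL Server; the Link table and the accepts helper are hypothetical, and the two columns are left as plain integers rather than real foreign keys to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Link (
    Id        INTEGER PRIMARY KEY,
    FK_First  INT NULL,
    FK_Second INT NULL,
    CONSTRAINT CK_HasOneForeignKey CHECK (
        (FK_First IS NOT NULL AND FK_Second IS NULL) OR
        (FK_First IS NULL AND FK_Second IS NOT NULL)
    )
);
""")


def accepts(first, second):
    """Return True if a row with these two key values passes the CHECK."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Link (FK_First, FK_Second) VALUES (?, ?)",
                (first, second))
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "first only": accepts(1, None),
    "second only": accepts(None, 2),
    "both set": accepts(1, 2),
    "neither set": accepts(None, None),
}
```

With more than two candidate keys the same rule gets verbose quickly, which is one reason the type-code approach above scales better.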

Why specify primary/foreign key attributes in column names

A couple of recent questions discuss strategies for naming columns, and I was rather surprised to discover the concept of embedding the notion of foreign and primary keys in column names. That is
select t1.col_a, t1.col_b, t2.col_z
from t1 inner join t2 on t1.id_foo_pk = t2.id_foo_fk
I have to confess I have never worked on any database system that uses this sort of scheme, and I'm wondering what the benefits are. The way I see it, once you've learnt the N principal tables of a system, you'll write several orders of magnitude more requests with those tables.
To become productive in development, you'll need to learn which tables are the important tables, and which are simple tributaries. You'll want to commit a good number of column names to memory. And one of the basic tasks is to join two tables together. To reduce the learning effort, the easiest thing to do is to ensure that the column name is the same in both tables:
select t1.col_a, t1.col_b, t2.col_z
from t1 inner join t2 on t1.id_foo = t2.id_foo
I posit that, as a developer, you don't need to be reminded that much about which columns are primary keys, which are foreign and which are nothing. It's easy enough to look at the schema if you're curious. When looking at a random
tx inner join ty on tx.id_bar = ty.id_bar
... is it all that important to know which one is the foreign key? Foreign keys are important only to the database engine itself, to allow it to ensure referential integrity and do the right thing during updates and deletes.
What problem is being solved here? (I know this is an invitation to discuss, and feel free to do so. But at the same time, I am looking for an answer, in that I may be genuinely missing something).
I agree with you. Putting this information in the column name smacks of the crappy Hungarian Notation idiocy of the early Windows days.
I agree with you that the foreign key column in a child table should have the same name as the primary key column in the parent table. Note that this permits syntax like the following:
SELECT * FROM foo JOIN bar USING (foo_id);
The USING keyword assumes that a column exists by the same name in both tables, and that you want an equi-join. It's nice to have this available as shorthand for the more verbose:
SELECT * FROM foo JOIN bar ON (foo.foo_id = bar.foo_id);
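One visible difference between the two forms shows up with SELECT *. The sketch below uses SQLite from Python (an assumption; other engines may differ in detail), with made-up columns foo_name, bar_id and bar_name and a hypothetical column_names helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (foo_id INTEGER PRIMARY KEY, foo_name TEXT);
CREATE TABLE bar (
    bar_id   INTEGER PRIMARY KEY,
    foo_id   INTEGER REFERENCES foo (foo_id),
    bar_name TEXT
);
INSERT INTO foo VALUES (1, 'first');
INSERT INTO bar VALUES (10, 1, 'child');
""")


def column_names(sql):
    """Run a query and return the names of its result columns."""
    cursor = conn.execute(sql)
    return [description[0] for description in cursor.description]


# USING merges the join column, so foo_id is returned once.
using_cols = column_names("SELECT * FROM foo JOIN bar USING (foo_id)")

# ON keeps both tables' columns, so the join column comes back from each side.
on_cols = column_names(
    "SELECT * FROM foo JOIN bar ON (foo.foo_id = bar.foo_id)")
```

Here using_cols has one entry fewer than on_cols, because the shared foo_id column is not repeated.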
Note, however, there are cases when you can't name the foreign key the same as the primary key it references. For example, in a table that has a self-reference:
CREATE TABLE Employees (
    emp_id INT PRIMARY KEY,
    manager_id INT REFERENCES Employees(emp_id)
);
Also a table may have multiple foreign keys to the same parent table. It's useful to use the name of the column to describe the nature of the relationship:
CREATE TABLE Bugs (
    ...
    reported_by INT REFERENCES Accounts(account_id),
    assigned_to INT REFERENCES Accounts(account_id),
    ...
);
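With two foreign keys into the same parent, a query has to join that parent once per role, and the role-based column names keep the join readable. A rough sketch on SQLite via Python follows; the columns bug_id, summary and account_name and the sample rows are assumptions added for illustration and are not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Accounts (
    account_id   INTEGER PRIMARY KEY,
    account_name TEXT NOT NULL          -- hypothetical column
);
CREATE TABLE Bugs (
    bug_id      INTEGER PRIMARY KEY,    -- hypothetical column
    summary     TEXT NOT NULL,          -- hypothetical column
    reported_by INT NOT NULL REFERENCES Accounts (account_id),
    assigned_to INT REFERENCES Accounts (account_id)
);
INSERT INTO Accounts VALUES (1, 'alice'), (2, 'bob');
INSERT INTO Bugs VALUES (100, 'crash on save', 1, 2);
""")

# Accounts is joined twice, once per relationship, under different aliases.
row = conn.execute("""
    SELECT b.summary, reporter.account_name, assignee.account_name
    FROM Bugs AS b
    JOIN Accounts AS reporter ON reporter.account_id = b.reported_by
    LEFT JOIN Accounts AS assignee ON assignee.account_id = b.assigned_to
""").fetchone()
```

USING cannot be applied here, since neither foreign key shares its name with account_id.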
I don't like to include the name of the table in the column name. I also eschew the obligatory "id" as the name of the primary key column in every table.
I've espoused most of the ideas proposed here over the 20-ish years I've been developing with SQL databases, I'm embarrassed to say. Most of them delivered few or none of the expected benefits and were with hindsight, a pain in the neck.
Any time I've spent more than a few hours with a schema I've fairly rapidly become familiar with the most important tables and their columns. Once it got to a few months, I'd pretty much have the whole thing in my head.
Who is all this explanation for? Someone who only spends a few minutes with the design isn't going to be doing anything serious anyway. Someone who plans to work with it for a long time will learn it if you named your columns in Sanskrit.
Ignoring compound primary keys, I don't see why something as simple as "id" won't suffice for a primary key, and "_id" for foreign keys.
So a typical join condition becomes customer.id = order.customer_id.
Where more than one association between two tables exists, I'd be inclined to use the association rather than the table name, so perhaps "parent_id" and "child_id" rather than "parent_person_id" etc
I only use the tablename with an Id suffix for the primary key, e.g. CustomerId, and foreign keys referencing that from other tables would also be called CustomerId. When you reference in the application it becomes obvious the table from the object properties, e.g. Customer.TelephoneNumber, Customer.CustomerId, etc.
I used "fk_" on the front end of any foreign keys for a table mostly because it helped me keep it straight when developing the DB for a project at my shop. Having not done any DB work in the past, this did help me. In hindsight, perhaps I didn't need to do that but it was three characters tacked onto some column names so I didn't sweat it.
Being a newcomer to writing DB apps, I may have made a few decisions which would make a seasoned DB developer shudder, but I'm not sure the foreign key thing really is that big a deal. Again, I guess it is a difference in viewpoint on this issue and I'll certainly take what you've written and cogitate on it.
Have a good one!
I agree with you--I take a different approach that I have seen recommended in many corporate environments:
Name columns in the format TableNameFieldName, so if I had a Customer table and UserName was one of my fields, the field would be called CustomerUserName. That means that if I had another table called Invoice, and the customer's user name was a foreign key, I would call it InvoiceCustomerUserName, and when I referenced it, I would call it Invoice.CustomerUserName, which immediately tells me which table it's in.
Also, this naming helps you to keep track of the tables your columns are coming from when you're joining.
I only use FK_ and PK_ in the ACTUAL names of the foreign and primary keys in the DBMS.