Design conditional relationship - sql

I need help with designing my database tables.
Employee
    Id
    EmployeeTypeId
EmployeeType
    Id
    Name
Car
    Id
    EmployeeId
How do I enforce that only one type of employee (driver) can be a foreign key in the Car table or should I redesign the tables?

I consider it a good idea to forearm the database such that implausible data cannot be entered. To enforce this here, however, is a bit tricky...
Solution 1:
Add EmployeeTypeId to the Car table. Then make (EmployeeId, EmployeeTypeId) a foreign key to the Employee table (you may have to create a unique constraint on those two columns in Employee so that the foreign key can reference them). Then add a check constraint on Car.EmployeeTypeId to ensure it's a driver. I know this looks redundant, but it really is no problem: the composite foreign key stops a Car row from recording a type that differs from the employee's actual type, so consistency is still guaranteed. I admit this approach is a bit clumsy, though.
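To make Solution 1 concrete, here is a minimal sketch using SQLite through Python's sqlite3 module (my choice, only so the example runs anywhere; the question does not name a DBMS). The type id 1 for drivers, the 'Clerk' type and the helper try_assign are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE EmployeeType (
    Id   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
CREATE TABLE Employee (
    Id             INTEGER PRIMARY KEY,
    EmployeeTypeId INTEGER NOT NULL REFERENCES EmployeeType(Id),
    UNIQUE (Id, EmployeeTypeId)            -- target of the composite foreign key
);
CREATE TABLE Car (
    Id             INTEGER PRIMARY KEY,
    EmployeeId     INTEGER NOT NULL,
    EmployeeTypeId INTEGER NOT NULL CHECK (EmployeeTypeId = 1),  -- 1 = driver (assumed id)
    FOREIGN KEY (EmployeeId, EmployeeTypeId)
        REFERENCES Employee (Id, EmployeeTypeId)
);
INSERT INTO EmployeeType VALUES (1, 'Driver'), (2, 'Clerk');
INSERT INTO Employee VALUES (10, 1), (20, 2);
""")

def try_assign(car_id, employee_id, type_id):
    """Return True if the Car row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Car VALUES (?, ?, ?)",
                         (car_id, employee_id, type_id))
        return True
    except sqlite3.IntegrityError:
        return False

driver_ok = try_assign(1, 10, 1)        # employee 10 really is a driver
clerk_as_clerk = try_assign(2, 20, 2)   # rejected by the CHECK constraint
clerk_as_driver = try_assign(3, 20, 1)  # rejected by the FK: no Employee row (20, 1)
```

The CHECK catches the honest mistake and the composite foreign key catches an attempt to claim a clerk is a driver, so both constraints are needed.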
Solution 2:
Use a before-insert trigger on the Car table: look up the employee and make sure it is a driver, else throw an exception. This is the better solution in my opinion, if only for its simplicity. You could then add a column to the EmployeeType table holding a unique name for the types that you use, e.g. UniqueName = 'DRIVER', so you don't have to use the ID as a magic number. You see, normally one EmployeeType is as good as another in a database. If you want to build special logic on a certain entry, you need a handle for it. The unique name is one way to do this; a flag IsDriver = TRUE/FALSE would be another.
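A rough sketch of Solution 2, again using SQLite via Python's sqlite3 purely as an assumption so it can be run. Trigger syntax differs between database systems, and the trigger name Car_driver_only and the helper try_insert_car are made up here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeType (
    Id         INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    UniqueName TEXT UNIQUE               -- stable handle such as 'DRIVER'
);
CREATE TABLE Employee (
    Id             INTEGER PRIMARY KEY,
    EmployeeTypeId INTEGER NOT NULL REFERENCES EmployeeType(Id)
);
CREATE TABLE Car (
    Id         INTEGER PRIMARY KEY,
    EmployeeId INTEGER NOT NULL REFERENCES Employee(Id)
);
-- Fires only when the referenced employee is NOT a driver.
CREATE TRIGGER Car_driver_only
BEFORE INSERT ON Car
FOR EACH ROW
WHEN NOT EXISTS (
    SELECT 1
    FROM Employee e
    JOIN EmployeeType t ON t.Id = e.EmployeeTypeId
    WHERE e.Id = NEW.EmployeeId AND t.UniqueName = 'DRIVER'
)
BEGIN
    SELECT RAISE(ABORT, 'Car.EmployeeId must reference a driver');
END;
INSERT INTO EmployeeType VALUES (1, 'Driver', 'DRIVER'), (2, 'Clerk', 'CLERK');
INSERT INTO Employee VALUES (10, 1), (20, 2);
""")

def try_insert_car(car_id, employee_id):
    """Return True if the insert succeeds, False if the trigger aborts it."""
    try:
        with conn:
            conn.execute("INSERT INTO Car VALUES (?, ?)", (car_id, employee_id))
        return True
    except sqlite3.DatabaseError:
        return False

driver_car = try_insert_car(1, 10)  # employee 10 is a driver
clerk_car = try_insert_car(2, 20)   # employee 20 is a clerk, so the trigger aborts
```

Note that this sketch only guards INSERT. A complete version would presumably need a matching trigger for UPDATE on Car, and some rule for what happens when an employee's type changes later.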

Related

What Would Be a Best Practice to Uniquely specify an Entry With a Boolean Flag in the Database?

Let's assume that we have a collection (or table) that is called students, and in our system, we need to persist the best student of all time, which is going to be one and only one among all the students we have.
The first thing that came to my mind was to add IsBestStudent property (flag) to the Student class, but thinking about it I think it is naive to add a new field say for one million students just to have one with the value true.
What would be a good practice to fulfill this requirement?
A simple solution is to create another table:
CREATE TABLE BestStudent (
student_id BIGINT PRIMARY KEY,
FOREIGN KEY (student_id) REFERENCES Student(student_id)
);
This table has just one row, so it must reference only one student. You don't need a new column in the Student table, you only need that table to have a primary key for this little table to reference.
If you really only want to keep the single best student, then you should simply never insert any more rows. Update the current row if a new student beats the former best student's performance.
You might also want to use this table to store a history of the best students, in which case you would keep multiple rows. It's up to you. But the point is that you can use this technique to avoid adding a column to millions of rows in the student table. This table is bound to stay small.
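As a small illustration of the "update the single row" idea, here is a sketch run against SQLite through Python's sqlite3 (an assumption; the answer's DDL is generic SQL). The student names and the helper best_student are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE BestStudent (
    student_id INTEGER PRIMARY KEY,
    FOREIGN KEY (student_id) REFERENCES Student(student_id)
);
INSERT INTO Student VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO BestStudent VALUES (1);      -- the one and only row
""")

def best_student():
    """Look up the current best student's name through the one-row table."""
    row = conn.execute(
        "SELECT s.name FROM BestStudent b JOIN Student s USING (student_id)"
    ).fetchone()
    return row[0]

before = best_student()

# A new student takes the title: update the existing row instead of inserting.
with conn:
    conn.execute("UPDATE BestStudent SET student_id = ?", (2,))

after = best_student()
row_count = conn.execute("SELECT COUNT(*) FROM BestStudent").fetchone()[0]
```

Nothing in this schema stops a second INSERT, so the "only one row" rule here is a convention of the application, exactly as the answer describes.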

Should I use a foreign key or string value from foreign table?

Let's say you have an employee table and a jobs list table. Each employee must have one job in the table. Normally I would give each job an id and reference it as a foreign key in the employee table. One of my colleagues, however, has suggested that we use the jobs list table as a dictionary: before inserting or updating we check that the 'job type' is present in the table, and then insert the job type as a string into the employee table.
As far as I can see
the Pros:
    Faster selects (although a join on a primary key should make marginal difference)
the Cons:
    Slower to select employees by job type
    More difficult to change a job type
Which of the above approaches is best?
I'm not sure that I understand why there's a conflict. You are aware that keys, both primary and foreign, can as easily be string types as number types? You can have the advantages of both systems by making the job type field the primary key in the job types table and the foreign key in the employee table.
The advantage to this approach is that you don't need to join the job type table back to the employee table to get a descriptive name for the job type, while you still use the database to provide the referential integrity that is important to your database design.
The disadvantage is that if you have a very large number of employees you'll use a bit more storage resources and that if you change the job description you'll need to cascade the changes through the employee table.
While it is common to use integers as primary keys (and some frameworks require it), there's no rule that says it has to be that way.
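Here is a sketch of the string-key approach described above, using SQLite via Python's sqlite3 as an assumed environment. The table and column names (job_types, employees, job_type) and the sample rows are illustrative, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE job_types (
    job_type TEXT PRIMARY KEY            -- the descriptive string is the key
);
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    job_type    TEXT NOT NULL
        REFERENCES job_types(job_type) ON UPDATE CASCADE
);
INSERT INTO job_types VALUES ('Engineer'), ('Analyst');
INSERT INTO employees VALUES (1, 'Pat', 'Engineer');
""")

# The descriptive name is readable without joining to job_types.
job_before = conn.execute(
    "SELECT job_type FROM employees WHERE employee_id = 1"
).fetchone()[0]

# The database, not application code, rejects an unknown job type.
try:
    with conn:
        conn.execute("INSERT INTO employees VALUES (2, 'Sam', 'Wizard')")
    unknown_rejected = False
except sqlite3.IntegrityError:
    unknown_rejected = True

# Renaming a job type cascades into the employees table.
with conn:
    conn.execute(
        "UPDATE job_types SET job_type = 'Software Engineer' "
        "WHERE job_type = 'Engineer'"
    )
job_after = conn.execute(
    "SELECT job_type FROM employees WHERE employee_id = 1"
).fetchone()[0]
```

The cascade on rename is the cost mentioned above: with many employees, one change to a job description rewrites many rows.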
If it will make it more difficult to administer the job types (which I'm assuming is something you or someone else will be doing), would it really be worth it for a small performance increase? It also sounds like it might be overly complicating things for a small reward. I'd say the best practice is to stick with a foreign key for simplicity's sake.

Joining 2 primary key columns to 1 foreign key column in the same table

I have a jobs table with a projectmanagerid and a projectdirectorid e.g
jobs
----------------
pk jobid
pk projectmanagerid
pk projectdirectorid
Both these id columns need to link to an employees table using the employeeid pk as the link. Is this good practice or is there a better way?
employees
------------------
pk employeeid
other stuff
This seems OK as long as you're only going to have those two types: Manager and Director. But think about whether you might need to add another employee type, for example Coordinator, in the future. If that's a possibility then you've got a many-to-many relationship between jobs and employees that you should resolve by using an intermediary junction table, perhaps also adding a third table to describe the employee's role on the job (Manager, Director, ...).
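For what the junction-table variant might look like, here is a sketch in SQLite via Python's sqlite3 (assumed). The tables roles and job_assignments, their columns, and the rule "one employee per role per job" are my assumptions to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employees (employeeid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE jobs      (jobid      INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE roles     (roleid     INTEGER PRIMARY KEY, rolename TEXT NOT NULL UNIQUE);

-- Junction table: who fills which role on which job.
CREATE TABLE job_assignments (
    jobid      INTEGER NOT NULL REFERENCES jobs(jobid),
    employeeid INTEGER NOT NULL REFERENCES employees(employeeid),
    roleid     INTEGER NOT NULL REFERENCES roles(roleid),
    PRIMARY KEY (jobid, roleid)          -- assumed rule: one person per role per job
);

INSERT INTO employees VALUES (1, 'Al'), (2, 'Bo'), (3, 'Cy');
INSERT INTO jobs VALUES (100, 'Bridge repair');
INSERT INTO roles VALUES (1, 'Manager'), (2, 'Director'), (3, 'Coordinator');
INSERT INTO job_assignments VALUES (100, 1, 1), (100, 2, 2), (100, 3, 3);
""")

# Adding the Coordinator role needed only new rows, not a new column on jobs.
team = conn.execute("""
    SELECT r.rolename, e.name
    FROM job_assignments a
    JOIN roles r     ON r.roleid = a.roleid
    JOIN employees e ON e.employeeid = a.employeeid
    WHERE a.jobid = 100
    ORDER BY r.rolename
""").fetchall()
```

The trade-off is that "every job must have a manager" is no longer a simple NOT NULL column and has to be enforced some other way.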
Nothing wrong with it and it's perfectly acceptable. The field names are descriptive and thereby signify that you do actually need to have two different FKs pointing at the users table. If they were called x and y, then it'd look weird.
Seems quite reasonable -- and common in many hierarchies -- to have two relationships to the same table. As with all foreign keys, but perhaps even more so in this case, use cascading carefully. I'm guessing that deletion of the manager or director should not result in deletion of the job record.
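One way to read "use cascading carefully", sketched in SQLite via Python's sqlite3 (assumed): keep both foreign keys on jobs but choose non-cascading delete rules. Using RESTRICT for the manager and SET NULL for the director is just an example policy, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employees (employeeid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE jobs (
    jobid             INTEGER PRIMARY KEY,
    projectmanagerid  INTEGER NOT NULL
        REFERENCES employees(employeeid) ON DELETE RESTRICT,
    projectdirectorid INTEGER
        REFERENCES employees(employeeid) ON DELETE SET NULL
);
INSERT INTO employees VALUES (1, 'Al'), (2, 'Bo');
INSERT INTO jobs VALUES (100, 1, 2);
""")

# Deleting the manager is refused while a job still points at them.
try:
    with conn:
        conn.execute("DELETE FROM employees WHERE employeeid = 1")
    manager_delete_blocked = False
except sqlite3.IntegrityError:
    manager_delete_blocked = True

# Deleting the director is allowed; the job keeps existing with no director.
with conn:
    conn.execute("DELETE FROM employees WHERE employeeid = 2")

job_row = conn.execute(
    "SELECT jobid, projectmanagerid, projectdirectorid FROM jobs"
).fetchone()
```

Either way the job record survives, which is the point of avoiding ON DELETE CASCADE here.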
There is a rule of thumb that states a table models an entity/class or the relationship between entities/classes but not both. Therefore, consider creating two relationship tables to model the two relationships, project managers and project directors respectively. I don't recommend Joe Stefanelli's employee_role_id approach. I think you will find that the attributes for each role (yes, relationships do indeed have attributes too) will be too different for the generic table approach to add value.

Primary key/foreign Key naming convention [closed]

In our dev group we have a raging debate regarding the naming convention for Primary and Foreign Keys. There are basically two schools of thought in our group:
1:
    Primary Table (Employee)
        Primary Key is called ID
    Foreign table (Event)
        Foreign key is called EmployeeID
or
2:
    Primary Table (Employee)
        Primary Key is called EmployeeID
    Foreign table (Event)
        Foreign key is called EmployeeID
I prefer not to duplicate the name of the table in any of the columns (So I prefer option 1 above). Conceptually, it is consistent with a lot of the recommended practices in other languages, where you don't use the name of the object in its property names. I think that naming the foreign key EmployeeID (or Employee_ID might be better) tells the reader that it is the ID column of the Employee Table.
Some others prefer option 2 where you name the primary key prefixed with the table name so that the column name is the same throughout the database. I see that point, but you now cannot visually distinguish a primary key from a foreign key.
Also, I think it's redundant to have the table name in the column name, because if you think of the table as an entity and a column as a property or attribute of that entity, you think of it as the ID attribute of the Employee, not the EmployeeID attribute of an employee. I don't go and ask my coworker what his PersonAge or PersonGender is. I ask him what his Age is.
So like I said, it's a raging debate and we go on and on and on about it. I'm interested to get some new perspectives.
If the two columns have the same name in both tables (convention #2), you can use the USING syntax in SQL to save some typing and some boilerplate noise:
SELECT name, address, amount
FROM employees JOIN payroll USING (employee_id)
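To see what USING buys you compared with ON, here is a small comparison run in SQLite via Python's sqlite3 (assumed; other systems that support USING behave similarly for SELECT *, but check yours). The sample row is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE payroll   (employee_id INTEGER REFERENCES employees(employee_id),
                        amount REAL);
INSERT INTO employees VALUES (1, 'Pat', '1 Main St');
INSERT INTO payroll   VALUES (1, 1000.0);
""")

# Convention #2: same column name on both sides, so USING works.
using_cur = conn.execute(
    "SELECT * FROM employees JOIN payroll USING (employee_id)"
)
using_cols = [d[0] for d in using_cur.description]
using_rows = using_cur.fetchall()

# The equivalent ON join returns the join column once per table.
on_cur = conn.execute(
    "SELECT * FROM employees e JOIN payroll p ON p.employee_id = e.employee_id"
)
on_cols = [d[0] for d in on_cur.description]
```

With USING the join column shows up once in the SELECT * result, while the ON form returns it twice.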
Another argument in favor of convention #2 is that it's the way the relational model was designed:
"The significance of each column is partially conveyed by labeling it with the name of the corresponding domain."
It doesn't really matter. I've never run into a system where there is a real difference between choice 1 and choice 2.
Jeff Atwood had a great article a while back on this topic. Basically, people argue most furiously about the topics on which they cannot be proven wrong. Or, from a different angle, the topics where an argument can only be won through filibuster-style, last-man-standing endurance.
Pick one and tell them to focus on issues that actually impact your code.
EDIT: If you want to have fun, have them specify at length why their method is superior for recursive table references.
I think it depends on how your application is put together. If you use ORM or design your tables to represent objects then option 1 may be for you.
I like to code the database as its own layer. I control everything and the app just calls stored procedures. It is nice to have result sets with complete column names, especially when there are many tables joined and many columns returned. With this style of application, I like option 2. I really like to see column names match on joins. I've worked on old systems where they didn't match and it was a nightmare.
Have you considered the following?
Primary Table (Employee)
Primary Key is PK_Employee
Foreign table (Event)
Foreign key is called FK_Employee
Neither convention works in all cases, so why have one at all? Use Common sense...
e.g., for a self-referencing table, when there is more than one FK column that references the same table's PK, you HAVE to violate both "standards", since the two FK columns can't be named the same... e.g., EmployeeTable with EmployeeId PK, SupervisorId FK, MentorId FK, PartnerId FK, ...
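The self-referencing case can be sketched like this in SQLite via Python's sqlite3 (assumed). The column names follow the example above; the sample rows and the Name column are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (
    EmployeeId   INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    -- Both columns reference Employee.EmployeeId, so neither can reuse that
    -- name; each is named after the role the referenced row plays.
    SupervisorId INTEGER REFERENCES Employee(EmployeeId),
    MentorId     INTEGER REFERENCES Employee(EmployeeId)
);
INSERT INTO Employee VALUES (1, 'Al', NULL, NULL);
INSERT INTO Employee VALUES (2, 'Bo', 1, NULL);
INSERT INTO Employee VALUES (3, 'Cy', 1, 2);
""")

# Each self-join needs its own alias and an explicit ON clause.
rows = conn.execute("""
    SELECT e.Name, s.Name AS Supervisor, m.Name AS Mentor
    FROM Employee e
    LEFT JOIN Employee s ON s.EmployeeId = e.SupervisorId
    LEFT JOIN Employee m ON m.EmployeeId = e.MentorId
    ORDER BY e.EmployeeId
""").fetchall()
```

Neither naming convention helps here: USING and natural joins are out, and every join has to spell out which role column it follows.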
I agree that there is little to choose between them. To me a much more significant thing about either standard is the "standard" part.
If people start 'doing their own thing' they should be strung up by their nethers. IMHO :)
If you are looking at application code, not just database queries, some things seem clear to me:
Table definitions usually directly map to a class that describes one object, so they should be singular. To describe a collection of an object, I usually append "Array" or "List" or "Collection" to the singular name, as it more clearly than use of plurals indicates not only that it is a collection, but what kind of a collection it is. In that view, I see a table name as not the name of the collection, but the name of the type of object of which it is a collection. A DBA who doesn't write application code might miss this point.
The data I deal with often uses "ID" for non-key identification purposes. To eliminate confusion between key "ID"s and non-key "ID"s, for the primary key name, we use "Key" (that's what it is, isn't it?) prefixed with the table name or an abbreviation of the table name. This prefixing (and I reserve this only for the primary key) makes the key name unique, which is especially important because we use variable names that are the same as the database column names, and most classes have a parent, identified by the name of the parent key. This also is needed to make sure that it is not a reserved keyword, which "Key" alone is. To facilitate keeping key variable names consistent, and to provide for programs that do natural joins, foreign keys have the same name as is used in the table in which they are the primary key. I have more than once encountered programs which work much better this way using natural joins. On this last point, I admit a problem with self-referencing tables, which I have used. In this case, I would make an exception to the foreign key naming rule. For example, I would use ManagerKey as a foreign key in the Employee table to point to another record in that table.
The convention we use where I work is pretty close to A, with the exception that we name tables in the plural form (i.e., "employees") and use underscores between the table and column name. The benefit of it is that to refer to a column, it's either "employees_id" or "employees.id", depending on how you want to access it. If you need to specify what table the column is coming from, "employees.employees_id" is definitely redundant.
I like convention #2 - in researching this topic, and finding this question before posting my own, I ran into the issue where:
I am selecting * from a table with a large number of columns and joining it to a second table that similarly has a large number of columns. Both tables have an "id" column as the primary key, and that means I have to specifically pick out every column (as far as I know) in order to make those two values unique in the result, i.e.:
SELECT table1.id AS parent_id, table2.id AS child_id
Though using convention #2 means I will still have some columns in the result with the same name, I can now specify which id I need (parent or child) and, as Steven Huwig suggested, the USING statement simplifies things further.
I've always used userId as a PK on one table and userId on another table as a FK. I'm seriously thinking about using userIdPK and userIdFK as names to tell one from the other. It will help me to identify PK and FK quickly when looking at the tables, and it seems like it will clear up code when using PHP/SQL to access data, making it easier to understand. Especially when someone else looks at my code.
I use convention #2. I'm working with a legacy data model now where I don't know what stands for in a given table. Where's the harm in being verbose?
How about naming the foreign key
role_id
where role is the role the referenced entity has relative to the table at hand. This solves the issue of recursive references and multiple FKs to the same table.
In many cases the role will be identical to the referenced table name. In those cases this becomes identical to one of your proposals.
In any case, having long arguments is a bad idea.
"Where in "employee INNER JOIN order ON order.employee_id = employee.id" is there a need for additional qualification?".
There is no need for additional qualification because the qualification I talked of is already there.
"the reason that a business user refers to Order ID or Employee ID is to provide context, but at a dabase level you already have context because you are refereing to the table".
Pray, tell me, if the column is named 'ID', then how is that "refereing [sic] to the table" done exactly, unless by qualifying this reference to the ID column exactly in the way I talked of ?

Factoring out nulls in bill-of-materials style relations

Given the schema
PERSON { name, spouse }
where PERSON.spouse is a foreign key to PERSON.name, NULLs will be necessary when a person is unmarried or we don't have any info.
Going with the argument against nulls, how do you avoid them in this case?
I have an alternate schema
PERSON { name }
SPOUSE { name1, name2 }
where SPOUSE.name* are FKs to PERSON. The problem I see here is that there is no way to ensure someone has only one spouse (even with all possible UNIQUE constraints, it would be possible to have two spouses).
What's the best way to factor out nulls in bill-of-materials style relations?
I think that enforcing no NULLs and no duplicates for this type of relationship makes the schema definition way more complicated than it really needs to be. Even if you allow nulls, it would still be possible for a person to have more than one spouse, or to have conflicting records e.g:
PERSON { A, B }
PERSON { B, C }
PERSON { C, NULL }
You'd need to introduce more data, like gender (or "spouse-numbers" for same-sex marriages?) to ensure that, for example, only Persons of one type are allowed to have a Spouse. The other Person's spouse would be determined by the first person's record. E.g.:
PERSON { A, FEMALE, B }
PERSON { B, MALE, NULL }
PERSON { C, FEMALE, NULL }
... So that only PERSONs who are FEMALE can have a non-null SPOUSE.
But IMHO, that's overcomplicated and non-intuitive even with NULLs. Without NULLs, it's even worse. I would avoid making schema restrictions like this unless you literally have no choice.
Well, first I would use auto-incrementing IDs as, of course, someone could have the same name. But, I assume you intend to do that and won't harp on it. However, how does the argument against NULLs go exactly? I don't have any problem with NULLs and think that is the appropriate solution to this problem.
I'm not sure why no one has pointed this out yet, but it's actually quite easy to ensure that a person has only one spouse, using pretty much the same model that you have in your question.
I'm going to ignore for the moment the use of a name as a primary key (it can change and duplicates are fairly common, so it's a poor choice) and I'm also going to leave out the possible need for historical tracking (you might want to add an effective date of some sort so that you know WHEN they were a spouse - Joe Celko has written some good stuff on temporal modeling, but I don't recall which book it was in at the moment). Otherwise if I got divorced and remarried you would lose that I had another spouse at another time - maybe that isn't important to you though.
Also, you might want to break up name into first_name, middle_name, last_name, prefix, suffix, etc.
Given those caveats...
CREATE TABLE People
(
person_name VARCHAR(100),
CONSTRAINT PK_People PRIMARY KEY (person_name)
)
GO
CREATE TABLE Spouses
(
person_name VARCHAR(100),
spouse_name VARCHAR(100),
CONSTRAINT PK_Spouses PRIMARY KEY (person_name),
CONSTRAINT FK_Spouses_People FOREIGN KEY (person_name) REFERENCES People (person_name)
)
GO
If you wanted to have spouses appear in the People table as well then you could add an FK for that as well. However, at that point you're dealing with a bidirectional link, which becomes a bit more complex.
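To check what the primary key on Spouses does and does not guarantee, here is the same model run in SQLite via Python's sqlite3 (assumed; the GO batch separators above suggest SQL Server, which this sketch does not use). The names and the helper add_spouse are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE People (
    person_name VARCHAR(100),
    CONSTRAINT PK_People PRIMARY KEY (person_name)
);
CREATE TABLE Spouses (
    person_name VARCHAR(100),
    spouse_name VARCHAR(100),
    CONSTRAINT PK_Spouses PRIMARY KEY (person_name),
    CONSTRAINT FK_Spouses_People FOREIGN KEY (person_name)
        REFERENCES People (person_name)
);
INSERT INTO People VALUES ('Alice'), ('Bob'), ('Carol'), ('Dave');
""")

def add_spouse(person, spouse):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO Spouses VALUES (?, ?)", (person, spouse))
        return True
    except sqlite3.IntegrityError:
        return False

first = add_spouse("Alice", "Bob")     # accepted
second = add_spouse("Alice", "Carol")  # rejected: Alice already has a row
shared = add_spouse("Dave", "Bob")     # accepted: spouse_name is unconstrained
```

So the primary key limits each person to one row, but Bob can still appear as the spouse of two people unless spouse_name gets its own constraint, which is the bidirectional problem mentioned above.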
All right, use Auto-IDs and then use a Check Constraint. The "Name1" column (which would only be an int ID) will be forced to have only ODD numbered IDs and Name2 will have only EVEN.
Then create a Unique Constraint for Column1 and Column2.
Well, begin with using a key other than name, perhaps an int seed. But to prevent someone from having more than one spouse, simply add a unique index to the parent (name1) in the spouse table. That will prevent you from ever inserting the same name1 twice.
You need a person TABLE and a separate "Partner_Off" table to define the relationship.
Person (id, name, etc );
Partner_Off (id, partner_id, relationship);
To deal with more complex social situations you would probably need some dates in there. Also, to simplify the SQL, you need one entry for (fred,wilma,husband) and a matching entry for (wilma,fred,wife).
You can use a trigger to enforce the constraint. PostgreSQL has constraint triggers, a particularly nice way to defer the constraint evaluation until the appropriate time in the transaction.
From Fabian Pascal's Practical Issues in Database Management, pp. 66-67:
"Stored procedures—whether triggered or not—are preferable to application level integrity code, but they are practically inferior to and riskier than declarative support because they are more burdensome to write, error prone, and cannot benefit from full DBMS optimization.
...
Choose DBMSs with better declarative integrity support. Given the considerable gaps in such support by products, knowledgeable users would be at least in a position to emulate correctly—albeit with procedural and/or application code—constraints not supported by the DBMS."