Identifying functional dependencies (FDs) - sql

I am working with a table that has a composite primary key composed of two attributes. The table has 10 attributes in total and is in 1NF.
In my situation a fully functional dependency involves the dependent relying on both attributes in my primary key.
A partial dependency relies on either one of the attributes from the primary key.
A transitive dependency involves two or more non-key attributes in a functional dependence where one of the non-key attributes is dependent on a key attribute from my primary key.
It seems that transitive dependencies are pulled out of the table during normalization, but my assignment requires us to identify all functional dependencies before we draw the dependency diagram (after which we normalize the tables). Parentheses identify the primary key attributes:
(Student ID), Student Name, Student Address, Student Major, (Course ID), Course Title, Instructor ID, Instructor Name, Instructor Office, Student_course_grade
Only one class is taught for each course ID.
Students may take up to 4 courses.
Each course may have a maximum of 25 students.
Each course is taught by only one Instructor.
Each student may have only one major.

From your question it seems that you do not have a clear understanding of the basics.
Application relationships & situations
First you have to take what you were told about your application (including business rules) and identify the application relationships (aka associations) (aka relations, in the math sense of association). Each gets a (base) table (aka relation, in the math sense of associated tuples) variable. Such an application relationship can be characterized by a row membership criterion (aka meaning) (aka predicate) that is a statement template. Eg suppose criterion student [si] takes course [ct] has table variable TAKES. The parameters of the criterion are the columns of its table. We can use a table name with columns (like an SQL declaration) as a shorthand for the criterion. Eg TAKES(si,ct). A criterion plus a row makes a statement (aka proposition) about a situation. Eg row (17,'CS101') gives student 17 takes course 'CS101' ie TAKES(17,'CS101'). Rows that give a true statement go in the table and rows that make a false one stay out.
If we can rephrase a criterion as the AND/conjunction of two others then we only need the tables with those other criteria. This is because NATURAL JOIN is defined so that the NATURAL JOIN of two tables containing the rows making their criteria true returns the rows that make the AND/conjunction of their criteria true. So we can NATURAL JOIN the two tables to get back the original. (This is what normalization is doing by decomposing tables into components.)
/* rows where
student with id [si] has name [sn] and address [sa] and major [sm]
and takes course [ci] with title [ct]
from instructor with id [ii] and name [in] and office [io]
with grade [scg]
*/
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)
/* rows where
student with id [si] has name [sn] and address [sa] and major [sm]
and takes course [ci] with grade [scg]
*/
SG(si,sn,sa,sm,ci,scg)
/* rows where
course [ci] with title [ct]
is taught by instructor with id [ii] and name [in] and office [io]
*/
CI(ci,ct,ii,in,io)
Now by the definition of NATURAL JOIN,
the rows where
SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io)
are the rows in SG NATURAL JOIN CI.
And since
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)
when/iff
SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io),
ie since
the rows where
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)
are the rows where
SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io),
we have T = SG NATURAL JOIN CI.
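As a rough illustration of that argument, here is a minimal Python sketch that models tables as lists of dict rows and joins them on their shared columns. The sample rows and the natural_join helper are made up for illustration, and CI is taken to hold only the course and instructor columns named in its criterion.

```python
def natural_join(left, right):
    """Combine every pair of rows that agree on all columns the two tables share."""
    joined = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[c] == r[c] for c in shared):
                row = {**l, **r}
                if row not in joined:  # a relation holds no duplicate rows
                    joined.append(row)
    return joined

# Hypothetical sample data, not taken from the question.
SG = [
    {"si": 1, "sn": "Ann", "sa": "1 Elm St", "sm": "Math", "ci": "CS101", "scg": "A"},
    {"si": 2, "sn": "Bob", "sa": "2 Oak St", "sm": "Physics", "ci": "CS101", "scg": "B"},
    {"si": 1, "sn": "Ann", "sa": "1 Elm St", "sm": "Math", "ci": "MA201", "scg": "B"},
]
CI = [
    {"ci": "CS101", "ct": "Intro to CS", "ii": 10, "in": "Dr. Lee", "io": "R12"},
    {"ci": "MA201", "ct": "Calculus", "ii": 11, "in": "Dr. Kim", "io": "R7"},
]

# T holds exactly the rows that satisfy both criteria at once.
T = natural_join(SG, CI)
```

Each row of T carries all ten columns, and projecting T back onto the columns of SG gives SG again, which is what makes the decomposition lossless for this data.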
Together the application relationships and the situations that can arise determine both the rules and the constraints! Rules and constraints are just things that are true of every application situation or every database state (ie values of one or more base tables), which are a function of the criteria and the possible application situations.
Then we normalize to reduce redundancy. Normalization replaces a table variable by others whose predicates AND/conjoin together to the original's when this is beneficial.
The only time a rule can tell you something that you don't already know from the (putative) criteria and (putative) situations is when you don't really understand the criteria or what situations can turn up, and the a priori rules are clarifying something about that. A person giving you rules is already using application relationships that they assume you understand and they can only have determined that a rule holds by using them and all the application situations that can arise (albeit informally)!
(Unfortunately, many presentations of information modeling don't even mention application relationships. Eg: If someone says "there is a X:Y relationship" then they must already have in mind a particular binary application relationship between entities; knowing it and what application situations can arise, they are reporting that it has a certain cardinality in a certain direction. This will correspond to some application relationship, represented by (a projection of) a table using column sets that identify entities. Plus some presentations/methods call FKs "relationships"--confusing them with those relationships.)
Check out "fact-based" information modeling methods Object-Role Modeling or (its predecessor) NIAM.
FDs & CKs
Given the criterion for putting rows into or leaving them out of a table and all possible situations that can arise, only some values (sets of rows) can ever be in a table variable.
For every subset of columns you need to decide which other columns can only have one value for a given subrow value for those columns. When it can only have one we say that the subset of columns functionally determines that column. We say that there is a FD (functional dependency) columns->column. This is when we can express the table's predicate as "... AND column=F(columns)" for some function F. (F is represented by the projection of the table on the column & columns.) But every superset of that subset will also functionally determine it, so that cuts down on cases. Conversely, if a given set does not determine a column then no subset of the set does. Applying Armstrong's axioms gives all the FDs that hold when given FDs hold. (Algorithms & software are available to apply them & determine FD closures & covers.) Also, you may think in terms of column sets being unique; then all other columns are functionally dependent on that set. Such a set is called a superkey.
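To make the closure idea concrete, here is a minimal Python sketch of the usual attribute-closure loop, which repeatedly applies the given FDs until nothing new is determined. The FD list is only one plausible reading of the question's business rules; in particular, "instructor id determines instructor name and office" is an assumption, not something the rules state.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under the given FDs.

    fds is a list of (determinant, dependent) pairs of sets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole determinant, we also get the dependent.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Assumed FDs for the student/course table (column names as in T above).
FDS = [
    ({"si"}, {"sn", "sa", "sm"}),   # a student has one name, address and major
    ({"ci"}, {"ct", "ii"}),         # a course has one title and one instructor
    ({"ii"}, {"in", "io"}),         # assumption: an instructor has one name and office
    ({"si", "ci"}, {"scg"}),        # one grade per student per course
]

student_closure = closure({"si"}, FDS)
course_closure = closure({"ci"}, FDS)
pair_closure = closure({"si", "ci"}, FDS)
```

Under these assumed FDs, the closure of {si, ci} is all ten columns, so {si, ci} is a superkey, while neither si nor ci alone is.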
Only after you have determined the FDs can you determine the CKs (candidate keys)! A CK is a superkey that contains no smaller superkey. (That a CK and/or superkey is present is also a constraint.) We can pick a CK as PK (primary key). PKs have no other role in relational theory.
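A brute-force way to get from FDs to CKs can be sketched as follows: try column sets from smallest to largest and keep each one whose closure is the whole table and that contains no smaller key. This is a sketch for small tables only, since it enumerates all column subsets, and the FD list is again an assumed reading of the question's rules.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (set, set) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return all superkeys that contain no smaller superkey."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller superkey, so it is not a CK
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

ATTRS = {"si", "sn", "sa", "sm", "ci", "ct", "ii", "in", "io", "scg"}
# Assumed FDs; the instructor FD is inferred, not given in the rules.
FDS = [
    ({"si"}, {"sn", "sa", "sm"}),
    ({"ci"}, {"ct", "ii"}),
    ({"ii"}, {"in", "io"}),
    ({"si", "ci"}, {"scg"}),
]

keys = candidate_keys(ATTRS, FDS)
```

With these FDs the only candidate key found is {si, ci}, which matches the composite primary key in the question.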
A partial dependency relies on either one of the attributes from the Primary key.
Don't use "involve" or "relies on" to give a definition. Say, "when" or "iff" ("if and only if").
Read a definition. A FD that holds is partial when/iff using a proper subset of the determinant gives a FD that holds with the same determined column; otherwise it is full. Note that this does not involve CKs. A relation is in 2NF when all non-prime attributes are fully functionally dependent on every CK.
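The partial/full definition can be checked mechanically once a set of FDs is fixed. The following Python sketch tests whether some proper subset of a determinant already determines the column; the is_partial helper and the FD list are illustrative assumptions, and the function presumes the FD being tested actually holds.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (set, set) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_partial(lhs, attr, fds):
    """True if some proper subset of lhs already determines attr."""
    lhs = sorted(lhs)
    return any(
        attr in closure(subset, fds)
        for size in range(len(lhs))          # sizes 0 .. len-1: proper subsets only
        for subset in combinations(lhs, size)
    )

# Assumed FDs for the student/course table.
FDS = [
    ({"si"}, {"sn", "sa", "sm"}),
    ({"ci"}, {"ct", "ii"}),
    ({"ii"}, {"in", "io"}),
    ({"si", "ci"}, {"scg"}),
]

name_is_partial = is_partial({"si", "ci"}, "sn", FDS)    # si alone determines sn
grade_is_partial = is_partial({"si", "ci"}, "scg", FDS)  # needs both si and ci
```

Under these assumptions only the grade is fully dependent on {si, ci}; every other non-prime column is partially dependent on it, which is why the original table is not in 2NF.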
A transitive dependency involves two or more non-key attributes in a functional dependence where one of the non-key attributes is dependent on a key attribute (from my PK).
Read a definition. S -> T is transitive when/iff there is an X where S -> X and X -> T and not (X -> S) and not (X = T). Note that this does not involve CKs. A relation is in 3NF when all non-prime attributes are non-transitively dependent on every CK.
"1NF" has no single meaning.

I am inferring a functional dependency that was not listed in your business rules. Namely that instructor ID determines instructor name.
If this is true, and if you have both instructor ID and instructor name in the Course table, then this is not in 3NF, because there is a transitive dependency between Course ID, Instructor ID, and Instructor Name.
Why is this harmful? Because duplicating the instructor name in each course an instructor teaches makes updating an instructor name difficult, and possible to do in an inconsistent manner. Inconsistent instructor name is just another bug you have to watch out for, and 3NF obviates the problem. The same argument could be made for Instructor office.
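A small Python sketch of that update problem, using made-up rows: the instructor name is first repeated in every course row, then moved into its own lookup so that a rename touches one place. The column names and the decompose helper are hypothetical.

```python
# Hypothetical Course rows that repeat the instructor name per course.
courses = [
    {"course_id": "CS101", "title": "Intro to CS", "instructor_id": 10, "instructor_name": "Dr. Lee"},
    {"course_id": "CS102", "title": "Data Structures", "instructor_id": 10, "instructor_name": "Dr. Lee"},
    {"course_id": "MA201", "title": "Calculus", "instructor_id": 11, "instructor_name": "Dr. Kim"},
]

def decompose(rows):
    """Split rows into Course(course_id, title, instructor_id) and Instructor(id -> name)."""
    course_table = [
        {k: r[k] for k in ("course_id", "title", "instructor_id")} for r in rows
    ]
    instructor_table = {}
    for r in rows:
        # setdefault keeps the first name seen for an id; a different name later
        # means the duplicated data was already inconsistent.
        existing = instructor_table.setdefault(r["instructor_id"], r["instructor_name"])
        if existing != r["instructor_name"]:
            raise ValueError(f"inconsistent name for instructor {r['instructor_id']}")
    return course_table, instructor_table

course_table, instructor_table = decompose(courses)

# Renaming an instructor is now one assignment instead of one update per course.
instructor_table[10] = "Dr. Lee-Park"

# Joining the two pieces back together shows the new name on every course.
rejoined = [
    {**c, "instructor_name": instructor_table[c["instructor_id"]]} for c in course_table
]
```

In the original single table the same rename would need an update to every course row for that instructor, and missing one would leave two names for the same instructor ID.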

Related

Not sure if this constitutes a transitive dependency

I am a bit stuck designing part of a database.
I have a table called Staff. It has different attributes:
StaffID
First Name
Last Name
Job Title
Department Number
Telephone Number
StaffID is the primary key in this table.
My issue however, is that it is possible to find any information based on the telephone number (i.e. each staff member has a different, unique telephone number).
For example, this means that the First Name or Job Title can be found when we have the Phone Number. However, Phone Number is not a primary key, StaffID is.
I am not sure whether this is a transitive dependency and should be fixed through 3NF by splitting up the table and having the Staff table without the Phone Number and another table with just StaffID and Telephone Number.
A transitive dependency occurs only if you have an indirect relationship between more than 2 attributes that are not part of the key.
In your example, as you explained, the StaffID is part of your dependency, which is fine because it's the primary key.
Also you can look at this question that shows what is wrong with a transitive dependency. It could help put things into perspective.
In your table, if you delete a staff member, you delete all the information (rightly so because you don't need it). If you keep the phone number in a different table and, for instance, delete the entry only in Staff, you're left with a stray phone number. But if your Staff table allowed multiple entries for the same person (but different departments) then the situation would be different.
Other sites that helped me in the past:
https://www.thoughtco.com/transitive-dependency-1019760
https://beginnersbook.com/2015/04/transitive-dependency-in-dbms/
Funnily they always follow the book example : )
In design-theoretical terms, keys are implied by dependencies. If PhoneNumber→StaffID and if StaffID is known to be a key then we can infer that PhoneNumber is also a key. If that is the case then there is no violation of 3NF because the determinants are all keys. Note that the choice of StaffID as primary key is irrelevant here. Normalization treats all keys as equally significant.
In practical database design however, the question arises as to whether PhoneNumber really makes sense as a key. In other words, would you actually want to enforce dependencies like PhoneNumber→StaffID? If, after consideration, you decide that dependency is not applicable then you could discard that dependency (by not making PhoneNumber a key) and the table would still satisfy 3NF with respect to the set of dependencies you have left.
Here's a reason why a dependency like PhoneNumber→StaffID might not be a realistic choice: when I joined my present company I got a staff ID on my first day; I didn't get a phone number until two days later.
It is not a transitive dependency, because there is no dependency between name and phone: if you know the first or last name you can't know the phone number. It is not the same as, for example, Model and Manufacturer: if you know the model is a Mustang then you know the manufacturer is Ford, and, going the other way around, you know that Ford makes Mustangs.
With the columns you mentioned I would have separate tables for departments and job titles, because they do not depend on the PK StaffID. Think about it as removing potential redundancies, you can have five thousand people in there and have job title as a string repeated one thousand times, that is a signal that it needs its own table (2NF).
Transitive dependency means that you have a (set of) attribute(s) that are completely determined by going from a (set of) attribute(s) A -> B and then from B -> C, while you cannot go from B -> A.
In your case, you do indeed have (StaffId) -> (PhoneNumber) and also (PhoneNumber) -> (StaffId). This means you have A -> B and B -> A and hence at this step you can already rule out the transitive dependency.
If you like, you could say that PhoneNumber would be another candidate for PK.
As a background, the problem with transitive dependencies is this: Assume you have a table consisting of "Book Title" (primary key), "Author" and "Gender of Author". Then you certainly have a transitive dependency BT -> A, A -> GoA, hence BT -> GoA.
Now assume that one of your authors is "Andy Smith", Andy being a short name for Andreas. Andreas goes and changes gender, and is now Andrea. Obviously you do not need to change the name, "Andy" works just fine for "Andrea". But you do have to change the Gender. You have to do it for many entries in your table, i.e. for all books from that author.
In this case, you would fix the problem by creating a new table "Author", obviously, and then you'd have only one row for Andy.
Hope that clears it up. It is easy to see that in your example there is no constellation where you have to change many rows due to a phone number change. It's a simple 1:1 relationship between StaffId and PhoneNumber, no problems whatsoever. Both are candidate keys.
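If you want to sanity-check such claims against sample data, a minimal Python sketch like the following tests whether an FD is violated by a set of rows. The staff rows are made up, and note the limitation: data can only refute an FD, it cannot prove that the FD holds in every valid state of the table.

```python
def fd_holds(rows, lhs, rhs):
    """True if equal values on the lhs columns always come with equal values on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault records the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical Staff rows.
staff = [
    {"StaffID": 1, "FirstName": "Ann", "JobTitle": "Clerk", "Phone": "555-0101"},
    {"StaffID": 2, "FirstName": "Bob", "JobTitle": "Clerk", "Phone": "555-0102"},
    {"StaffID": 3, "FirstName": "Ann", "JobTitle": "Manager", "Phone": "555-0103"},
]

id_to_phone = fd_holds(staff, ["StaffID"], ["Phone"])
phone_to_id = fd_holds(staff, ["Phone"], ["StaffID"])
name_to_phone = fd_holds(staff, ["FirstName"], ["Phone"])
```

Here both StaffID -> Phone and Phone -> StaffID survive the check, which is the A -> B and B -> A situation described above, while FirstName -> Phone is refuted by the two rows for "Ann".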

Cross Table Dependency/Constraint in SQL Database

Take the example that I have of a table called classes that holds university classes and a table called students that holds students. A class has many students and a student can only take one class. (1 to many relationship). If I had a column in classes that stored the total number of students a class has, this feels like it should violate 3NF. But the dependency is in a separate table. What is this dependency called? And can we say this is violating 3NF? Because in some sense it has all the problems of a 3NF violation. I was wondering if this was a related case.
TL;DR
But the dependency is in a separate table.
You mean there is a dependency (in the everyday sense) on another table. We say there is a constraint on the two tables. (They depend on each other.) This is in addition to the FK (foreign key) constraint that every students class value is a classes class value.
What is this dependency called?
We can reasonably categorize the constraint as "inter-table". It is that classes equals SELECT class, COUNT(student) AS total FROM classes LEFT JOIN students USING (class) GROUP BY class.
And can we say this is violating 3NF?
The constraint doesn't involve violating a NF. Moreover normalization applies only to a single table and its FDs (functional dependencies).
(A straightforward design is to have base students, base classes1 that is the original classes without total, and VIEW classes AS SELECT class, COUNT(student) AS total FROM classes1 LEFT JOIN students USING (class) GROUP BY class.)
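A minimal sketch of that view-based design, run through Python's sqlite3 module so it can be tried without a server. The column types and sample rows are assumptions; the view counts students per class with COUNT, so a class with no students gets a total of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Base table: the original classes without the stored total.
    CREATE TABLE classes1 (class TEXT PRIMARY KEY);

    -- Each student takes exactly one class.
    CREATE TABLE students (
        student INTEGER PRIMARY KEY,
        class   TEXT NOT NULL REFERENCES classes1(class)
    );

    -- The total is derived on demand, so it can never disagree with students.
    CREATE VIEW classes AS
        SELECT class, COUNT(student) AS total
        FROM classes1 LEFT JOIN students USING (class)
        GROUP BY class;

    -- Hypothetical sample rows.
    INSERT INTO classes1 VALUES ('Algebra'), ('Biology'), ('Chemistry');
    INSERT INTO students VALUES (1, 'Algebra'), (2, 'Algebra'), (3, 'Biology');
""")

totals = dict(conn.execute("SELECT class, total FROM classes ORDER BY class"))
```

Because the total is computed by the view, there is no inter-table constraint left for the application to maintain.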
If I had a column in classes that stored the total number of students a class has, this feels like it should violate 3NF.
Whether a table is in a given NF (normal form) has nothing to do with any other tables. (We say a database is in a given NF when all its tables are.) Whether your design is nevertheless bad is another matter.
Since a class has just one total number of students, there is a FD (functional dependency) of total on class in classes, ie class functionally determines total.
We say that a set of columns functionally determines another set in a table when each subrow for the first always appears with the same subrow for the second. Normalization to higher NFs replaces a table by projections of it that join back to it, per the FDs & JDs (join dependencies) that hold in it. There is redundancy in a database when two tables say the same thing about the business/application situation; but not all redundancy is bad. Learn proper information modeling & database design.
It may or may not violate a NF to have your class student count as a column in classes. What FDs violate a NF depends on all the FDs present and the NF. (And it only makes sense to talk about a particular FD in a particular table violating a particular NF if you are talking about a particular part of a particular definition of that NF.)
(If a DBMS-calculated/computed/generated column violates a NF that would hold without it then that is not a problem, because it is controlled by the DBMS. You can think of the table as a view of the table without the column.)
But the dependency is in a separate table.
When a sequence of database states cannot hold all the values possible per the columns of tables we say constraints hold or the database is constrained. FDs (functional dependencies), MVDs (multi-valued dependencies), JDs (join dependencies), INDs (inclusion dependencies), EQDs (equality dependencies) and other "dependencies" (which technically are expressions given a context) are each associated with certain constraints. CKs (candidate keys), PKs (primary keys), superkeys (SQL PK & UNIQUE NOT NULL), FKs (foreign keys) (which technically are all column sets) & other notions are also each associated with certain constraints. But arbitrary conditions can hold on a sequence of database states.
SQL has a distinct but related notion of a constraint characterized by a name and an expression/condition (constraint in the above sense), declared by appropriate syntax. A state is constrained by column typing, PK, UNIQUE, NOT NULL & CHECK constraints. ASSERTION gives an arbitrary condition on a state but it is not supported by most DBMSs. CASCADES supports some inter-state inter-table constraints. SQL TRIGGERs enforce arbitrary constraints. Indexes also enforce constraints in a DBMS-specific way.
Because in some sense it has all the problems of a 3NF violation.
Your edits improved your question. Using the wrong words or using words in the wrong way at best states something that is not what we mean. But when what we write doesn't make sense it suggests that our problem, whatever else it involves, involves not knowing what the words mean. Forcing ourselves to use words correctly allows others to know what we really mean. Eg here maybe "... in the join of tables ... there would be a 3NF-violating FD ...". Even by explicitly saying that we are unsure we can communicate some of our vague groping without saying something that we don't mean. Eg your "this feels like ...". But it also leads us to clearly organize what we are faced with. This helps not only the problem we are working on but improves our problem solving.
It does not violate normalization, but it would be painful to maintain rather than doing a count in your query.
Note: Junction tables are for many-to-many.

Misconception of what superkey or Boyce Codd Normal form is

At 9:34 in this video the speaker says that all 3 functional dependencies are in Boyce Codd Normal Form. I don't believe it because clearly GPA can't determine the SSN, sName, address and all other attributes in the student table. Either I'm confused about the definition of Boyce Codd Normal Form or what a super key is? Does it only have to be able to uniquely identify certain attributes, not all attributes in the schema? For example GPA does determine priority (which is on the right side of the functional dependency) but not everything else.
For example if I had the relation R(A,B,C,D) and the FDs A->B would we say A is a superkey for B but I thought a super key is for the whole table? To add to my confusion I know for BCNF it can be a (primary) key but you can only have one primary key for the table. Ugh my brain hurts.
"... the speaker says that all 3 functional dependencies are in Boyce Codd Normal Form."
To be in BC normal form is a property that can be had by RELATIONS (relation variables, more specifically, or relation schemas, if that term suits you better), not by functional dependencies. If you find someone talking so sloppily of normalization theory, leave and move onto more accurate explanations.
Whether or not a relation variable is indeed in BC normal form, depends on which functional dependencies are supposed to hold in it. That is why it is utter nonsense to say that functional dependencies are or are not in BC normal form.
"I don't believe it because clearly GPA can't determine the SSN, sName, address and all other attributes in the student table. Either I'm confused about the definition of Boyce Codd Normal Form or what a super key is? Does it only have to be able to uniquly identify certain attributes, not all attributes in the schema?"
A candidate key is a set of attributes of the relation schema (a schema can have more than one such set) that is guaranteed to have unique combinations of attribute values in whatever relation values could validly appear in the relation variable in the database, and that is irreducible: no proper subset of it carries the same guarantee.
In your (A,B,C,D) example, if A->B is the only FD that holds, then the only candidate key is {A,C,D}.
"For example if I had the relation R(A,B,C,D) and the FDs A->B would we say A is a superkey for B"
It is sloppy and confusing to talk of A as being the "key" for B in such a case. People who pretend to be teaching others ought to know this, and people who don't, ought not engage in any teaching until they do know this. It would be better to talk of A as the "determinant" for B in such contexts. The term "key" in the context of relational database design has a very well-defined and precise meaning, and using the same term for other meanings merely confuses people. As evidenced by your question.
"but I thought a super key is for the whole table?"
Yes you thought right.
Back to your (A,B,C,D) example. If we were to split that design into (A,B) and (A,C,D), then we would have a relation schema -the (A,B) one- of which we can say that "{A} is a key" in that schema.
That is actually precisely what the FD A->B means : if you take the projection -of the relation value that would appear in the database in the (A,B,C,D) schema- over the attributes {A,B}, then you should be getting a relation in which no A value appears twice (if it did, then that A value would correspond to >1 distinct B value, meaning that A could not possibly be a determinant for B after all).
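The (A,B,C,D) example can be checked with a short Python sketch. It looks for BCNF violations in a schema by testing every column subset: a subset that determines something new without determining the whole schema is a violating determinant. The bcnf_violations helper is illustrative, and the subset enumeration is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes determined by attrs under fds (list of (set, set) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """List (determinant, dependents) pairs within schema whose determinant is not a superkey."""
    schema = set(schema)
    violations = []
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            lhs = set(combo)
            # Restrict the closure to this schema to get the FDs that hold in the projection.
            determined = closure(lhs, fds) & schema
            if determined != lhs and determined != schema:
                violations.append((lhs, determined - lhs))
    return violations

FDS = [({"A"}, {"B"})]  # the only FD assumed to hold in R(A,B,C,D)

whole = bcnf_violations("ABCD", FDS)   # A -> B holds but {A} is not a superkey
left = bcnf_violations("AB", FDS)      # in (A,B), {A} is a key
right = bcnf_violations("ACD", FDS)    # in (A,C,D), only trivial FDs hold
```

The original schema reports A -> B as a violation, while both projections report none, which matches the statement that {A} is a key of the (A,B) schema.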
"To add to my confusion I know for BCNF it can be a (primary) key but ..."
Now you are being sloppy yourself. What does "it" refer to ?

Compound primary key table with subtypes

A database architect and I were having an argument over whether a table with a compound primary key and subtypes makes sense relationally and whether it is good practice.
Say we have two tables Employee and Project. We create a composite table Employee_Project with a composite primary key back to Employee and Project.
Is there a valid way for Employee_Project to have subtypes? Or can you think of any scenario where a composite key table can have subtypes?
To me a composite key relationship is an 'Is A' relationship (Employee_Project is an Employee and a Project). Subtypes are also an 'Is A' relationship. So if you have a composite key with a subtype, it's two 'Is A' relationships in one sentence, which makes me believe this is a bad practice.
Employee-project is a bit hard, but one can imagine something like this -- although I'm not much of a chemist.
Or something like this, which would require different legal forms (fields) for single person ownership vs joint (time-share).
Or like this, providing that different forms are needed for full time and temp.
Employee projects have subtypes if the candidate subtypes are
not utterly different, but
not exactly alike
That means that
Every employee project has some attributes (columns) in common. So they're not utterly different.
Some employee projects have different attributes than others. So they're not exactly alike.
The determination has to do with common and distinct attributes. It doesn't have anything to do with the number of columns in a candidate key. Do you have employee projects that are not utterly different, but not exactly alike?
The most common business supertype/subtype example concerns organizations and individuals. They're not utterly different.
Both have addresses.
Both have phone numbers.
Both can be plaintiffs and defendants in court.
But they're not exactly alike.
Individuals can go to college.
Organizations can have a CEO.
Individuals can get married.
Individuals can have children.
Organizations (in the USA) can be liquidated.
So you can express individuals and organizations as subtypes of a supertype called, say, "Parties". The attributes all the subtypes have in common relate to the supertype.
Parties have addresses.
Parties have phone numbers.
Parties can be plaintiffs and defendants in court.
Again, this has to do with attributes that are held in common, and attributes that are distinct. It has nothing to do with the number of columns in a candidate key.
To me a composite key relationship is a 'Is A' relationship (Employee_Project is a Employee and a Project).
Database designers don't think that way. We think in terms of a table's predicate.
If an employee can have many projects and a project can have many employees, it is a many-to-many relationship that RDBMSs can only represent easily in one way (the way you have outlined above). You can see in the ER diagram below (employee / departments is one of the classic many-to-many examples) that it does not have a separate ER component. The separate table is a leaky abstraction of RDBMSs (which is probably why you are having a hard time modeling it).
http://www.library.cornell.edu/elicensestudy/dlfdeliverables/fallforum2003/ERD_final.doc
Bridge Entities
When an instance of an entity may be related to multiple instances of another entity and vice versa, that is called a “many-to-many relationship.” In the example below, a supplier may provide many different products, and each type of product may be offered by many suppliers:
While this relationship model is perfectly valid, it cannot be translated directly into a relational database design. In a relational database, relationships are expressed by keys in a table column that point to the correct instance in the related table. A many-to-many relationship does not allow this relationship expression, because each record in each table might have to point to multiple records in the other table.
http://users.csc.calpoly.edu/~jdalbey/205/Lectures/ERD_image004.gif
Here they do not even bother with a separate box, although they add it in later (at this step it is a 'pure' ER diagram). It can also be explicitly represented with a box and a diamond superimposed on each other.

Is there ever a time where using a database 1:1 relationship makes sense?

I was thinking the other day on normalization, and it occurred to me, I cannot think of a time where there should be a 1:1 relationship in a database.
Name:SSN? I'd have them in the same table.
PersonID:AddressID? Again, same table.
I can come up with a zillion examples of 1:many or many:many (with appropriate intermediate tables), but never a 1:1.
Am I missing something obvious?
A 1:1 relationship typically indicates that you have partitioned a larger entity for some reason. Often it is because of performance reasons in the physical schema, but it can happen on the logical side as well if a large chunk of the data is expected to be "unknown" at the same time (in which case you have a 1:0 or 1:1, but no more).
As an example of a logical partition: you have data about an employee, but there is a larger set of data that needs to be collected, if and only if they select to have health coverage. I would keep the demographic data regarding health coverage in a different table to both give easier security partitioning and to avoid hauling that data around in queries unrelated to insurance.
An example of a physical partition would be the same data being hosted on multiple servers. I may keep the health coverage demographic data in another state (where the HR office is, for example) and the primary database may only link to it via a linked server... avoiding replicating sensitive data to other locations, yet making it available for (assuming here rare) queries that need it.
Physical partitioning can be useful whenever you have queries that need consistent subsets of a larger entity.
One reason is database efficiency. Having a 1:1 relationship allows you to split up the fields which will be affected during a row/table lock. If table A has a ton of updates and table B has a ton of reads (or has a ton of updates from another application), then table A's locking won't affect what's going on in table B.
Others bring up a good point. Security can also be a good reason depending on how applications etc. are hitting the system. I would tend to take a different approach, but it can be an easy way of restricting access to certain data. It's really easy to just deny access to a certain table in a pinch.
Sparseness. The data relationship may be technically 1:1, but corresponding rows don't have to exist for every row. So if you have twenty million rows and there's some set of values that only exists for 0.5% of them, the space savings are vast if you push those columns out into a table that can be sparsely populated.
Most of the highly-ranked answers give very useful database tuning and optimization reasons for 1:1 relationships, but I want to focus on nothing but "in the wild" examples where 1:1 relationships naturally occur.
Please note one important characteristic of the database implementation of most of these examples: no historical information is retained about the 1:1 relationship. That is, these relationships are 1:1 at any given point in time. If the database designer wants to record changes in the relationship participants over time, then the relationships become 1:M or M:M; they lose their 1:1 nature. With that understood, here goes:
"Is-A" or supertype/subtype or inheritance/classification relationships: This category is when one entity is a specific type of another entity. For example, there could be an Employee entity with attributes that apply to all employees, and then different entities to indicate specific types of employee with attributes unique to that employee type, e.g. Doctor, Accountant, Pilot, etc. This design avoids multiple nulls since many employees would not have the specialized attributes of a specific subtype. Other examples in this category could be Product as supertype, and ManufacturingProduct and MaintenanceSupply as subtypes; Animal as supertype and Dog and Cat as subtypes; etc. Note that whenever you try to map an object-oriented inheritance hierarchy into a relational database (such as in an object-relational model), this is the kind of relationship that represents such scenarios.
"Boss" relationships, such as manager, chairperson, president, etc., where an organizational unit can have only one boss, and one person can be boss of only one organizational unit. If those rules apply, then you have a 1:1 relationship, such as one manager of a department, one CEO of a company, etc. "Boss" relationships don't only apply to people. The same kind of relationship occurs if there is only one store as the headquarters of a company, or if only one city is the capital of a country, for example.
Some kinds of scarce resource allocation, e.g. one employee can be assigned only one company car at a time (e.g. one truck per trucker, one taxi per cab driver, etc.). A colleague gave me this example recently.
Marriage (at least in legal jurisdictions where polygamy is illegal): one person can be married to only one other person at a time. I got this example from a textbook that used this as an example of a 1:1 unary relationship when a company records marriages between its employees.
Matching reservations: when a unique reservation is made and then fulfilled as two separate entities. For example, a car rental system might record a reservation in one entity, and then an actual rental in a separate entity. Although such a situation could alternatively be designed as one entity, it might make sense to separate the entities since not all reservations are fulfilled, and not all rentals require reservations, and both situations are very common.
I repeat the caveat I made earlier that most of these are 1:1 relationships only if no historical information is recorded. So, if an employee changes their role in an organization, or a manager takes responsibility for a different department, or an employee is reassigned a vehicle, or someone is widowed and remarries, then the relationship participants can change. If the database does not store any previous history about these 1:1 relationships, then they remain legitimate 1:1 relationships. But if the database records historical information (such as adding start and end dates for each relationship), then they pretty much all turn into M:M relationships.
There are two notable exceptions to the historical note: First, some relationships change so rarely that historical information would normally not be stored. For example, most IS-A relationships (e.g. product type) are immutable; that is, they can never change. Thus, the historical record point is moot; these would always be implemented as natural 1:1 relationships. Second, the reservation-rental relationship stores dates separately, since the reservation and the rental are independent events, each with their own dates. Since the entities have their own dates, rather than the 1:1 relationship itself having a start date, these would remain as 1:1 relationships even though historical information is stored.
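The historical caveat can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table and column names (department_manager, department_manager_history, start_date, end_date) are hypothetical and not taken from any schema in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- No history: each department and each manager appears at most once (1:1).
    CREATE TABLE department_manager (
        department_id INTEGER NOT NULL UNIQUE,
        manager_id    INTEGER NOT NULL UNIQUE
    );
    -- With history: the start date is part of the key, so the same
    -- department or manager can appear in several rows over time (M:M).
    CREATE TABLE department_manager_history (
        department_id INTEGER NOT NULL,
        manager_id    INTEGER NOT NULL,
        start_date    TEXT NOT NULL,
        end_date      TEXT,
        PRIMARY KEY (department_id, start_date)
    );
""")

# Without history, a second manager for department 10 is rejected.
conn.execute("INSERT INTO department_manager VALUES (10, 1)")
try:
    conn.execute("INSERT INTO department_manager VALUES (10, 2)")
    second_manager_allowed = True
except sqlite3.IntegrityError:
    second_manager_allowed = False

# With history, the same pairings are recorded as successive rows.
conn.executemany(
    "INSERT INTO department_manager_history VALUES (?, ?, ?, ?)",
    [
        (10, 1, "2020-01-01", "2021-12-31"),
        (10, 2, "2022-01-01", None),
        (20, 1, "2022-01-01", None),
    ],
)
managers_of_10 = conn.execute(
    "SELECT COUNT(DISTINCT manager_id) FROM department_manager_history "
    "WHERE department_id = 10"
).fetchone()[0]
departments_of_1 = conn.execute(
    "SELECT COUNT(DISTINCT department_id) FROM department_manager_history "
    "WHERE manager_id = 1"
).fetchone()[0]
```

Once the dates are stored, one department has had several managers and one manager has had several departments, which is what makes the recorded relationship M:M.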
Your question can be interpreted in several ways, because of the way you worded it. The responses show this.
There can definitely be 1:1 relationships between data items in the real world. No question about it. The "is a" relationship is generally one to one. A car is a vehicle.
One car is one vehicle. One vehicle might be one car. Some vehicles are trucks, in which case one vehicle is not a car. Several answers address this interpretation.
But I think what you really are asking is... when 1:1 relationships exist, should tables ever be split? In other words, should you ever have two tables that contain exactly the same keys? In practice, most of us analyze only primary keys, and not other candidate keys, but that question is slightly different.
Normalization rules for 1NF, 2NF, and 3NF never require decomposing (splitting) a table into two tables with the same primary key. I haven't worked out whether putting a schema in BCNF, 4NF, or 5NF can ever result in two tables with the same keys. Off the top of my head, I'm going to guess that the answer is no.
There is a level of normalization called 6NF. The normalization rule for 6NF can definitely result in two tables with the same primary key. 6NF has the advantage over 5NF that NULLS can be completely avoided. This is important to some, but not all, database designers. I've never bothered to put a schema into 6NF.
In 6NF missing data can be represented by an omitted row, instead of a row with a NULL in some column.
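A minimal sketch of that idea, using Python's sqlite3 module. The tables employee, employee_name and employee_office_phone are hypothetical names chosen for illustration; each attribute table shares the same primary key as the others.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY
    );
    -- One table per attribute, all keyed by employee_id.
    CREATE TABLE employee_name (
        employee_id INTEGER PRIMARY KEY REFERENCES employee(employee_id),
        name        TEXT NOT NULL
    );
    CREATE TABLE employee_office_phone (
        employee_id  INTEGER PRIMARY KEY REFERENCES employee(employee_id),
        office_phone TEXT NOT NULL
    );
""")

conn.executemany("INSERT INTO employee VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO employee_name VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
# Employee 2 has no known office phone: the row is simply omitted, no NULL is stored.
conn.execute("INSERT INTO employee_office_phone VALUES (1, '555-0100')")

with_phone = conn.execute(
    "SELECT employee_id FROM employee_office_phone ORDER BY employee_id"
).fetchall()
without_phone = conn.execute(
    "SELECT e.employee_id FROM employee e "
    "WHERE NOT EXISTS (SELECT 1 FROM employee_office_phone p "
    "                  WHERE p.employee_id = e.employee_id)"
).fetchall()
```

Every column here is declared NOT NULL; the absence of a phone number is expressed only by the absence of a row.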
There are reasons other than normalization for splitting tables. Sometimes split tables result in better performance. With some database engines, you can get the same performance benefits by partitioning the table instead of actually splitting it. This can have the advantage of keeping the logical design easy to understand, while giving the database engine the tools needed to speed things up.
I use them primarily for a few reasons. One is a significant difference in the rate of data change. Some of my tables have audit trails where I track previous versions of records. If I only care to track previous versions of 5 out of 10 columns, splitting those 5 columns into a separate table with an audit trail mechanism on it is more efficient. Also, I may have records (say, for an accounting app) that are write-once. You cannot change the dollar amounts or the account they were for; if you made a mistake, you need to make a corresponding record to write off the incorrect record, then create a correction entry. I have constraints on the table enforcing the fact that the records cannot be updated or deleted, but I may have a couple of attributes for that object that are malleable; those are kept in a separate table without the restriction on modification. Another time I do this is in medical record applications. There is data related to a visit that cannot be changed once it is signed off on, and other data related to a visit that can be changed after sign-off. In that case I will split the data and put a trigger on the locked table that rejects updates once the visit is signed off, while still allowing updates to the data the doctor is not signing off on.
Another poster commented that 1:1 is not normalized. I would disagree with that in some situations, especially subtyping. Say I have an employee table and the primary key is their SSN (it's an example; let's save the debate on whether this is a good key or not for another thread). The employees can be of different types, say temporary or permanent, and if they are permanent they have more fields to be filled out, like office phone number, which should only be not null if the type = 'Permanent'. In a 3rd normal form database the column should depend only on the key, meaning the employee, but it actually depends on employee and type, so a 1:1 relationship is perfectly normal, and desirable in this case. It also prevents overly sparse tables, for example if I have 10 columns that are normally filled but 20 additional columns only for certain types.
The most common scenario I can think of is when you have BLOBs. Let's say you want to store large images in a database (typically, not the best way to store them, but sometimes the constraints make it more convenient). You would typically want the blob to be in a separate table to improve lookups of the non-blob data.
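The medical-record split might look roughly like the following in SQLite, driven from Python's sqlite3 module. The names visit_signed, visit_notes, signed_off and the trigger visit_signed_locked are assumptions made for this sketch, not the poster's actual schema, and other engines have their own trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Data the doctor signs off on; frozen once signed_off = 1.
    CREATE TABLE visit_signed (
        visit_id   INTEGER PRIMARY KEY,
        diagnosis  TEXT NOT NULL,
        signed_off INTEGER NOT NULL DEFAULT 0
    );
    -- 1:1 companion table holding data that stays editable after sign-off.
    CREATE TABLE visit_notes (
        visit_id     INTEGER PRIMARY KEY REFERENCES visit_signed(visit_id),
        billing_note TEXT NOT NULL
    );
    -- Reject any update to a row that was already signed off.
    CREATE TRIGGER visit_signed_locked
    BEFORE UPDATE ON visit_signed
    WHEN OLD.signed_off = 1
    BEGIN
        SELECT RAISE(ABORT, 'visit is signed off');
    END;
""")

conn.execute("INSERT INTO visit_signed (visit_id, diagnosis) VALUES (1, 'flu')")
conn.execute("INSERT INTO visit_notes VALUES (1, 'draft')")
# Signing off is itself an update, allowed because the old row was not yet signed.
conn.execute("UPDATE visit_signed SET signed_off = 1 WHERE visit_id = 1")

try:
    conn.execute("UPDATE visit_signed SET diagnosis = 'cold' WHERE visit_id = 1")
    locked = False
except sqlite3.DatabaseError:
    locked = True

# The companion table has no trigger, so it remains editable.
conn.execute("UPDATE visit_notes SET billing_note = 'final' WHERE visit_id = 1")

diagnosis = conn.execute(
    "SELECT diagnosis FROM visit_signed WHERE visit_id = 1"
).fetchone()[0]
note = conn.execute(
    "SELECT billing_note FROM visit_notes WHERE visit_id = 1"
).fetchone()[0]
```

Keeping the rule on a whole table, rather than on selected columns of one wide table, is what keeps the trigger this short.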
In terms of pure science, yes, they are useless.
In real databases it's sometimes useful to keep a rarely used field in a separate table: to speed up queries using this and only this field; to avoid locks, etc.
Rather than using views to restrict access to fields, it sometimes makes sense to keep restricted fields in a separate table to which only certain users have access.
I can also think of situations where you have an OO model in which you use inheritance, and the inheritance tree has to be persisted to the DB.
For instance, you have a class Bird and Fish which both inherit from Animal.
In your DB you could have an 'Animal' table, which contains the common fields of the Animal class, and the Animal table has a one-to-one relationship with the Bird table, and a one-to-one relationship with the Fish table.
In this case, you don't have to have one Animal table which contains a lot of nullable columns to hold the Bird and Fish-properties, where all columns that contain Fish-data are set to NULL when the record represents a bird.
Instead, you have a record in the Birds-table that has a one-to-one relationship with the record in the Animal table.
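A small sketch of that layout using Python's sqlite3 module. The column names (wingspan_cm, max_depth_m) are made up for illustration; the key point is that each subtype table uses the Animal primary key as both its own primary key and a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE animal (
        animal_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- The primary key doubles as the foreign key, so at most one bird row
    -- can exist per animal row.
    CREATE TABLE bird (
        animal_id   INTEGER PRIMARY KEY REFERENCES animal(animal_id),
        wingspan_cm REAL NOT NULL
    );
    CREATE TABLE fish (
        animal_id   INTEGER PRIMARY KEY REFERENCES animal(animal_id),
        max_depth_m REAL NOT NULL
    );
""")

conn.executemany("INSERT INTO animal VALUES (?, ?)", [(1, "Robin"), (2, "Trout")])
conn.execute("INSERT INTO bird VALUES (1, 25.0)")
conn.execute("INSERT INTO fish VALUES (2, 10.0)")

# Reading a full Bird object means joining the two halves of the record.
birds = conn.execute(
    "SELECT a.name, b.wingspan_cm FROM animal a JOIN bird b USING (animal_id)"
).fetchall()

# A bird row without a matching animal row is rejected by the foreign key.
try:
    conn.execute("INSERT INTO bird VALUES (99, 1.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Neither subtype table needs nullable columns for the other subtype's properties.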
1-1 relationships are also necessary if you have too much information. There is a record size limitation on each record in the table. Sometimes tables are split in two (with the most commonly queried information in the main table) just so that the record size will not be too large. Databases are also more efficient in querying if the tables are narrow.
In SQL it is impossible to enforce a 1:1 relationship between two tables that is mandatory on both sides (unless the tables are read-only). For most practical purposes a "1:1" relationship in SQL really means 1:0|1.
The inability to support mandatory cardinality in referential constraints is one of SQL's serious limitations. "Deferrable" constraints don't really count because they are just a way of saying the constraint is not enforced some of the time.
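To illustrate the 1:0|1 point, here is a sketch with Python's sqlite3 module. The employee and company_car tables are hypothetical. A UNIQUE foreign key stops an employee from having two cars, but nothing declarative forces every employee to have one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE company_car (
        car_id      INTEGER PRIMARY KEY,
        -- NOT NULL + UNIQUE + FK: every car has exactly one employee,
        -- and each employee has at most one car.
        employee_id INTEGER NOT NULL UNIQUE REFERENCES employee(employee_id)
    );
""")

conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.execute("INSERT INTO company_car VALUES (100, 1)")

# The "at most one" side is enforced.
try:
    conn.execute("INSERT INTO company_car VALUES (101, 1)")
    second_car_allowed = True
except sqlite3.IntegrityError:
    second_car_allowed = False

# The "at least one" side is not: employee 2 exists with no car at all.
employees_without_car = conn.execute(
    "SELECT COUNT(*) FROM employee e "
    "WHERE NOT EXISTS (SELECT 1 FROM company_car c "
    "                  WHERE c.employee_id = e.employee_id)"
).fetchone()[0]
```

Making the car mandatory for every employee as well would require each table to reference the other, and then neither row could be inserted first without relaxing a constraint.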
It's also a way to extend a table which is already in production with less (perceived) risk than a "real" database change. Seeing a 1:1 relationship in a legacy system is often a good indicator that fields were added after the initial design.
Most of the time, designs are thought to be 1:1 until someone asks "well, why can't it be 1:many"? Divorcing the concepts from one another prematurely is done in anticipation of this common scenario. Person and Address don't bind so tightly. A lot of people have multiple addresses. And so on...
Usually two separate object spaces imply that one or both can be multiplied (x:many). If two objects were truly, truly 1:1, even philosophically, then it's more of an is-relationship. These two "objects" are actually parts of one whole object.
If you're using the data with one of the popular ORMs, you might want to break up a table into multiple tables to match your Object Hierarchy.
I have found that when I do a 1:1 relationship it's totally for a systemic reason, not a relational reason.
For instance, I've found that putting the reserved aspects of a user in one table and the user-editable fields in a different table makes it much easier to write the permission rules for those fields.
But you are correct: in theory, 1:1 relationships are completely contrived, and are almost a phenomenon. Logically, however, they make the programs and optimizations that abstract the database easier to write.
They are useful for extended information that is only needed in certain scenarios, and in legacy applications and programming languages (such as RPG) where the programs are compiled over the tables (so if the table changes you have to recompile the programs). Tag-along files can also be useful in cases where you have to worry about table size.
Most frequently it is more of a physical than logical construction. It is commonly used to vertically partition a table to take advantage of splitting I/O across physical devices or other query optimizations associated with segregating less frequently accessed data or data that needs to be kept more secure than the rest of the attributes on the same object (SSN, Salary, etc).
The only logical consideration that prescribes a 1-1 relationship is when certain attributes only apply to some of the entities. However, in most cases there is a better/more normalized way to model the data through entity extraction.
The best reason I can see for a 1:1 relationship is a SuperType SubType of database design. I created a Real Estate MLS data structure based on this model. There were five different data feeds; Residential, Commercial, MultiFamily, Hotels & Land.
I created a SuperType called property that contained data that was common to each of the five separate data feeds. This allowed for very fast "simple" searches across all datatypes.
I created five separate SubTypes that stored the unique data elements for each of the five data feeds. Each SuperType record had a 1:1 relationship to the appropriate SubType record.
If a customer wanted a detailed search, they had to select a Super-Sub type, for example PropertyResidential.
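One way such a SuperType/SubType pair could be declared is sketched below with Python's sqlite3 module. This is not the poster's actual MLS schema: the columns kind, list_price and bedrooms are assumptions, and the composite foreign key is one common technique for making sure a subtype row attaches only to a supertype row of the matching type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE property (
        property_id INTEGER PRIMARY KEY,
        kind        TEXT NOT NULL CHECK (kind IN
                        ('Residential', 'Commercial', 'MultiFamily', 'Hotels', 'Land')),
        list_price  INTEGER NOT NULL,
        UNIQUE (property_id, kind)  -- target of the composite foreign key below
    );
    CREATE TABLE property_residential (
        property_id INTEGER PRIMARY KEY,
        kind        TEXT NOT NULL DEFAULT 'Residential' CHECK (kind = 'Residential'),
        bedrooms    INTEGER NOT NULL,
        FOREIGN KEY (property_id, kind) REFERENCES property (property_id, kind)
    );
""")

conn.executemany(
    "INSERT INTO property VALUES (?, ?, ?)",
    [(1, "Residential", 300000), (2, "Land", 50000)],
)
conn.execute("INSERT INTO property_residential (property_id, bedrooms) VALUES (1, 3)")

# A residential detail row cannot be attached to a Land property.
try:
    conn.execute("INSERT INTO property_residential (property_id, bedrooms) VALUES (2, 2)")
    mismatch_rejected = False
except sqlite3.IntegrityError:
    mismatch_rejected = True

# The "simple" search touches only the supertype table, across all feeds.
cheap = conn.execute(
    "SELECT property_id FROM property WHERE list_price < 100000"
).fetchall()
```

A detailed search would join property to the one subtype table the customer selected.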
In my opinion a 1:1 relationship maps class inheritance onto an RDBMS.
There is a table A that contains the common attributes, i.e. the parent class status.
Each inherited class status is mapped on the RDBMS with a table B with a 1:1 relationship to the A table, containing the specialized attributes.
Table A also contains a "type" field that represents the "casting" functionality.
Bye
Mario
You can create a one to one relationship table if there is any significant performance benefit. You can put the rarely used fields into separate table.
1:1 relationships don't really make sense if you're into normalization as anything that would be 1:1 would be kept in the same table.
In the real world, though, it's often different. You may want to break your data up to match your application's interface.
Possibly if you have some kind of typed objects in your database.
Say in a table, T1, you have the columns C1, C2, C3… with a one-to-one relation. It's OK, it's in normalized form. Now say in a table T2, you have columns C1, C2, C3… (the names may differ, but say the types and the roles are the same) with a one-to-one relation too. It's OK for T2 for the same reasons as with T1.
In this case, however, I see a fit for a separate table T3, holding C1, C2, C3…, with a one-to-one relation from T1 to T3 and from T2 to T3. I see an even better fit if there already exists another table with a one-to-many relation over C1, C2, C3…, say from table A to multiple rows in table B. Then, instead of T3, you use B, and have a one-to-one relation from T1 to B, the same from T2 to B, and still the same one-to-many relation from A to B.
I believe normalization does not agree with this, and it may be an idea outside of it: identifying object types and moving objects of the same type to their own storage pool, using a one-to-one relation from some tables and a one-to-many relation from some other tables.
It is not strictly necessary, but it is useful for security purposes, although there are better ways to perform security checks. Imagine you create a key that can only open one door. If the key can open any other door, you should ring the alarm. In essence, you can have a "CitizenTable" and a "VotingTable". Citizen One votes for Candidate One, and that vote is stored in the Voting Table. If citizen one appears in the voting table again, then there should be an alarm. Be advised, this is a one-to-one relationship because we are not referring to the candidate field; we are referring to the voting table and the citizen table.
Example:
Citizen Table
id = 1, citizen_name = "EvryBod"
id = 2, citizen_name = "Lesly"
id = 3, citizen_name = "Wasserman"
Candidate Table
id = 1, citizen_id = 1, candidate_name = "Bern Nie"
id = 2, citizen_id = 2, candidate_name = "Bern Nie"
id = 3, citizen_id = 3, candidate_name = "Hill Arry"
Then, if we see the voting table as so:
Voting Table
id = 1, citizen_id = 1, candidate_name = "Bern Nie"
id = 2, citizen_id = 2, candidate_name = "Bern Nie"
id = 3, citizen_id = 3, candidate_name = "Hill Arry"
id = 4, citizen_id = 3, candidate_name = "Hill Arry"
id = 5, citizen_id = 3, candidate_name = "Hill Arry"
We could say that citizen number 3 is a liar pants on fire who cheated Bern Nie. Just an example.
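The "alarm" can be raised by the database itself if the voting table declares citizen_id as UNIQUE. Below is a sketch with Python's sqlite3 module that replays the voting rows above; the table definitions are my own assumptions about how the example might be declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE citizen (
        id           INTEGER PRIMARY KEY,
        citizen_name TEXT NOT NULL
    );
    CREATE TABLE voting (
        id             INTEGER PRIMARY KEY,
        -- UNIQUE makes citizen-to-vote a 1:0|1 relationship.
        citizen_id     INTEGER NOT NULL UNIQUE REFERENCES citizen(id),
        candidate_name TEXT NOT NULL
    );
""")

conn.executemany(
    "INSERT INTO citizen VALUES (?, ?)",
    [(1, "EvryBod"), (2, "Lesly"), (3, "Wasserman")],
)

votes = [
    (1, 1, "Bern Nie"),
    (2, 2, "Bern Nie"),
    (3, 3, "Hill Arry"),
    (4, 3, "Hill Arry"),
    (5, 3, "Hill Arry"),
]
rejected = []
for vote in votes:
    try:
        conn.execute("INSERT INTO voting VALUES (?, ?, ?)", vote)
    except sqlite3.IntegrityError:
        rejected.append(vote[0])  # the "alarm": remember the rejected vote id

accepted = conn.execute("SELECT COUNT(*) FROM voting").fetchone()[0]
```

With the constraint in place, the repeated votes by citizen 3 never reach the table, so no after-the-fact audit is needed to find them.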
When you are dealing with a database from a third-party product, you probably don't want to alter their database, so as to prevent tight coupling, but you may have data that corresponds 1:1 with their data.
Anywhere that two entirely independent entities share a one-to-one relationship. There must be lots of examples:
person <-> dentist (it's 1:N, so it's wrong!)
person <-> doctor (it's 1:N, so it's also wrong!)
person <-> spouse (it's 1:0|1, so it's mostly wrong!)
EDIT: Yes, those were pretty bad examples, particularly if I was always looking for a 1:1, not a 0 or 1 on either side. I guess my brain was mis-firing :-)
So, I'll try again. It turns out, after a bit of thought, that the only way you can have two separate entities that must (as far as the software goes) be together all of the time is for them to exist together in a higher categorization. Then, if and only if you fall into a lower decomposition, the things are and should be separate, but at the higher level they can't live without each other. Context, then, is the key.
For a medical database you may want to store different information about specific regions of the body, keeping them as a separate entity. In that case, a patient has just one head, and they need to have it, or they are not a patient. (They also have one heart, and a number of other necessary single organs). If you're interested in tracking surgeries for example, then each region should be a unique separate entity.
In a production/inventory system, if you're tracking the assembly of vehicles, then you certainly want to watch the engine progress differently from the car body, yet there is a one-to-one relationship. A car must have an engine, and only one (or it wouldn't be a 'car' anymore). An engine belongs to only one car.
In each case you could produce the separate entities as one big record, but given the level of decomposition, that would be wrong. They are, in these specific contexts, truly independent entities, although they might not appear so at a higher level.
Paul.