Eliminating Transitive Functional Dependencies - schema

(Primary keys in bold.)
In one of my lectures, we took the following schema
R(a, b, c, d, e)
a -> b
e -> b
c, d -> a
c, d -> b
c, d -> e
and took it to 2NF as follows:
R1(c, d, a, e)
c, d - > a and e
R2(a, e, b) (Not in 2NF)
a -> b
e -> b
Naturally, if I want to take my schema to 3NF this causes a problem, since b cannot be partially determined by a and e. What I want to do is simply create separate relations as follows:
R3(e, b)
e -> b
and
R4(a, b)
a -> b
In this instance b is fully functionally dependent the primary key, which brings me to 2NF and the transative dependencies are eleminated for relations 3 and 4, which are in 3NF. However I think it could be argued that this solution is not satisfactory as the value of b could potentially be different for each relation and as there could be anomalies when it is inevitably used as a foriegn key. Any thoughts on this?

We seek decompositions "preserving" FDs and (this is usually not stated explicitly) not introducing other constraints. An FD is preserved when it holds in some component. The idea is that we can check that an FD holds in recompositions by just checking that it holds in the components, rather than having to join then check. We also prefer an FD and its attibutes to be in just one component, or we would need to add a constraint that where the determinant values agree the dependent values agree. There's always a 3NF schema preserving all FDs without introducing other constraints. When an FD cannot be preserved to get to BCNF, there is instead an "equality dependency" introduced that two components must have the same projection on the FD attributes.
We don't normalize to a given NF by moving through lower NFs. That can preclude good higher NF designs arising. We use an algorithm for a given NF.
When some FDs (functional dependencies) hold, others do, per Armstrong's axioms. We must look among all FDs for NF violators and FDs to preserve, not just some given ones that form a cover. Algorithms also take that into account.
See this recent answer.
PS PKs (primary keys) don't matter, CKs (candidate keys) do. There can be more than one and they can be composite. A PK is just some CK you decided to call PK. So highlighting attributes of a PK is in general inadequate. Just list the CKs.
PPS An (update) anomaly is a certain thing, and it's not what you are using "anomaly" for.

Related

Table Normalization, Attributes that have no dependencies

Hi this is my first time posting so sorry if formatting is bad.
I have a question regarding normalization, I came across a problem in homework and I am having issues finding any information on it. This is not in a specific database language but we are working with sql.
I am giving a table like T( A,B,C,D,E,F,G)
and we are given the following dependencies A+B----> C; A----> D, E; D---->E
we are then asked to give the primary keys followed by a 2NF and 3NF
this was my answer originally
for 2nf (A, B, C); (A, D, E)
and 3nf (A, B, C); (A, D, E); (D, E) with primary keys being A and B
I then noticed the table contained two attributes with no relation F and G
now this is where I am having trouble, would these be considered candidate keys or are they dropped in a 2nf or 3nf for possible redundancy? Since we aren't given actual data it is hard for me to understand why a table would have two attributes that have no relation to a primary key.
Thanks

In which normal form are these FDs?

I've been trying to figure out the difference between the 2nd and 3rd Normal Form using this example. The definitions didn't do the trick for me...
These are the functional dependencies:
A is the candidate key. (A --> A,B,C,D)
FDs:
A --> CD
AC --> D
CD --> B
D --> B
My idea: it's in 1st and 2nd, but not in 3rd Normal form because A, the candidate key, doesn't consist of two or more columns. But B is transitively dependent on D. So it's not in 3rd.
Ist that correct? Especially the argument that A consits of less than two columns?
First, let us see what 2NF and 3NF are. From the context of the question it is clear that 1NF is understood, so I will refer to it. If it is unclear as well, let me know, I will clarify that as well.
2NF: R is in second normal form, if and only if it is in first normal form and no non-prime attribute is dependent on any proper subset of any candidate key of the relation.
non-prime attributes are attributes which are not part of any candidate keys. So, if a non-prime attribute can be determined by a functional dependency which holds a non-whole subset of a candidate key, then the relation is not in 2NF.
For example, let's consider an invoices(number, year, age) table where (number, year) is a candidate key. age can be determined by the year alone, so the table is not in 2NF.
In your case, since the key is one dimensional, assuming it is in 1NF, we can say it is in 2NF as well. However, it is in 3NF if and only if it is in 2NF and every non-prime attribute is non transitively dependent on every key.
In your case, A is the key, but since
A -> D -> B
B is transitively dependent on A, so your table is not in 3NF. To achieve 3NF, you will need to create another table, which will be in relation with this one via D and will hold B. Possible solution:
T1(A, C, D)
T2(D, B)
Note, that AC -> D and A -> CD are trivial, since A is the candidate key and the candidate key determines everything else. If that's not the case, you will need to take a look at 1NF as well.

Normalizing & decomposing to BCNF

I've been given the relation and functional dependencies
And am looking to justify what form it is in, and then to transform it into BCNF.
Now I proposed that it was in 3NF, as the second FD is a transitive dependency with a key attribute as its RHS. This second FD also violates BCNF, because C is not a superkey for R.
However - I am unsure how to go about decomposing into BCNF.
If I decompose into;
This voids the first FD, and effectively makes (A,C) the new key - so it doesn't seem correct! Can this relation be converted to BCNF?
Can this relation be converted to BCNF?
Every relation can be converted in BCNF, by applying the “analysis algorithm”, that can be found on any good book on databases.
Note that the relation has two keys, AB and AC, so that all attributes are primes (and for this reason the relation is automatically in 3NF).
You must start by finding all the dependencies that violates the BCNF, in this case only C → B, since C is not a superkey.
Then you decompose the relation in two relations, one contaning C and all the attributes determinates by it (in this case only B), and the other one including all the other attributes plus C.
So the decomposition is actually:
R1(B, C), with key C, with the only (non-trivial) dependency C → B
R2(A, C), with key AC, without (non-trivial) dependencies
Then the decomposition must be repeated for every relation that has some dependency that violates the BCNF, but in this case there is no such relation, because both R1 and R2 are in BCNF.
Finally note that the decomposition does not preserve the dependencies. In fact the dependency AB → C is not preserved in the decomposition.

Functional dependencies keys and normal form

I am trying to understand functional dependencies
Let's say we have R with {A,B,C,D,E} and FDs A->B, BC->E and ED->A.
What are the keys and is R in 3NF or BCNF?
The keys here are — ACD, BCD and ECD. Since each attribute of the relation R comes at least once in each of the keys, all the attributes in your relation R are prime attributes.
Note that if a relation has all prime attributes then it is already in 3NF.
Hence the given relation R is in 3NF.
To be in BCNF, for each functional dependency X->Y, X should be a key. We see that the very first dependency ( A->B ) violates this and hence the relation R is not in BCNF.
The keys are — ACD, BCD and ECD.
Prime attributes will be (A,B,C,D,E) because all are coming in primary key.
Note that if a relation has all prime attributes then it is already in 3NF.
Hence the given relation R is in 3NF.
To be in BCNF, for each functional dependency X->Y, X should be a superkey. We see that the very first dependency ( A->B ) violates this and hence the relation R is not in BCNF.
The candidate keys are - ACD,BCD and ECD.
Prime attributes are (A,B,C,D,E) because they are all in primary keys.
Now, first we check the relation for BCNF
For BCNF, in the FD's the left side in the attribute must be a super key and as you can notice that not any FD follows this condition
For 3NF, in the FD's there are two conditions:
1. Either the left side be a super key
2. If the first conditions fails, then the right side of the same FD must be a prime attribute.
if the relation follows these conditions, then it is in 3NF and as we can notice all the attributes are prime attributes, the following relation R is in 3NF but not in BCNF.

Identifying functional dependencies (FDs)

I am working with a table that has a composite primary key composed of two attributes (with a total of 10) in 1NF form.
In my situation a fully functional dependency involves the dependent relying on both attributes in my primary key.
A partial dependency relies on either one of the attributes from the primary key.
A transitive dependency involves two or more non-key attributes in a functional dependence where one of the non-key attributes is dependent on a key attribute from my primary key.
Pulling the transitive dependencies out of the table, seems do this after normalization, but my assignment requires us to identify all functional dependencies before we draw the dependency diagram (after which we normalize the tables). Parenthesis identify the primary key attributes:
(Student ID), Student Name, Student Address, Student Major, (Course ID), Course Title, Instructor ID, Instructor Name, Instructor Office, Student_course_grade
Only one class is taught for each course ID.
Students may take up to 4 courses.
Each course may have a maximum of 25 students.
Each course is taught by only one Instructor.
Each student may have only one major.
From your question it seems that you do not have a clear understanding of basics.
Application relationships & situations
First you have to take what you were told about your application (including business rules) and identify the application relationships (aka associations) (aka relations, in the math sense of association). Each gets a (base) table (aka relation, in the math sense of associated tuples) variable. Such an application relationship can be characterized by a row membership criterion (aka meaning) (aka predicate) that is a statement template. Eg suppose criterion student [si] takes course [ct] has table variable TAKES. The parameters of the criterion are the columns of its table. We can use a table name with columns (like an SQL declaration) as a shorthand for the criterion. Eg TAKES(si,ct). A criterion plus a row makes a statement (aka proposition) about a situation. Eg row (17,'CS101') gives student 17 takes course 'CS101' ie TAKES(17,'CS101'). Rows that give a true statement go in the table and rows that make a false one stay out.
If we can rephrase a criterion as the AND/conjunction of two others then we only need the tables with those other criteria. This is because NATURAL JOIN is defined so that the NATURAL JOIN of two tables containing the rows making their criteria true returns the rows that make the AND/conjunction of their criteria true. So we can NATURAL JOIN the two tables to get back the original. (This is what normalization is doing by decomposing tables into components.)
/* rows where
student with id [si] has name [sn] and address [sa] and major [sm]
and takes course [ci] with title [ct]
from instructor with id [ii] and name [in] and office [io]
with grade [scg]
*/
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)
/* rows where
student with id [si] has name [sn] and address [sa] and major [sm]
and takes course [ci] with grade [scg]
*/
SG(si,sn,sa,sm,ci,scg)
/* rows where
course [ci] with title [ct]
is taught by instructor with id [ii] and name [in] and office [io]
*/
CI(ci,ct,ii,in,io,scg)
Now by the definition of NATURAL JOIN,
the rows where
SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io,scg)
are the rows in SG NATURAL JOIN CI.
And since
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)
when/iff
SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io,scg),
ie since
the rows where
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)
are the rows where
SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io,scg),
we have T = SG NATURAL JOIN CI.
Together the application relationships and situations that can arise determine both the rules and constraints! They are just things that are true of every application situation or every database state (ie values of one or more base tables) (which are are a function of the criteria and the possible application situations.)
Then we normalize to reduce redundancy. Normalization replaces a table variable by others whose predicates AND/conjoin together to the original's when this is beneficial.
The only time a rule can tell you something that you don't know already know from the (putative) criteria and (putative) situations is when you don't really understand the criteria or what situations can turn up, and the a priori rules are clarifying something about that. A person giving you rules is already using application relationships that they assume you understand and they can only have determined that a rule holds by using them and all the application situations that can arise (albeit informally)!
(Unfortunately, many presentations of information modeling don't even mention application relationships. Eg: If someone says "there is a X:Y relationship" then they must already have in mind a particular binary application relationship between entities; knowing it and what application situations can arise, they are reporting that it has a certain cardinality in a certain direction. This will correspond to some application relationship, represented by (a projection of) a table using column sets that identify entities. Plus some presentations/methods call FKs "relationships"--confusing them with those relationships.)
Check out "fact-based" information modeling methods Object-Role Modeling or (its predecessor) NIAM.
FDs & CKs
Given the criterion for putting rows into or leaving them out of a table and all possible situations that can arise, only some values (sets of rows) can ever be in a table variable.
For every subset of columns you need to decide which other columns can only have one value for a given subrow value for those columns. When it can only have one we say that the subset of columns functionally determines that column. We say that there is a FD (functional dependency) columns->column. This is when we can express the table's predicate as "... AND column=F(columns)" for some function F. (F is represented by the projection of the table on the column & columns.) But every superset of that subset will also functionally determine it, so that cuts down on cases. Conversely, if a given set does not determine a column then no subset of the set does. Applying Armstrong's axioms gives all the FDs that hold when given FDs hold. (Algorithms & software are available to apply them & determine FD closures & covers.) Also, you may think in terms of column sets being unique; then all other columns are functionally dependent on that set. Such a set is called a superkey.
Only after you have determined the FDs can you determine the CKs (candidate keys)! A CK is a superkey that contains no smaller superkey. (That a CK and/or superkey is present is also a constraint.) We can pick a CK as PK (primary key). PKs have no other role in relational theory.
A partial dependency relies on either one of the attributes from the
Primary key.
Don't use "involve" or "relies on" to give a definition. Say, "when" or "iff" ("if and only if").
Read a definition. A FD that holds is partial when/iff using a proper subset of the determinant gives a FD that holds with the same determined column; otherwise it is full. Note that this does not involve CKs. A relation is in 2NF when all non-prime attributes are fully functionally dependent on every CK.
A transitive dependency involves two or more non-key attributes in a
functional dependence where one of the non-key attributes is dependent
on a key attribute (from my PK).
Read a definition. S -> T is transitive when/iff there is an X where S -> X and X -> T and not (X -> S) and not (X = T). Note that this does not involve CKs. A relation is in 3NF when all non-prime attributes are non-transitively dependent on every CK.
"1NF" has no single meaning.
I am inferring a functional dependency that was not listed in your business rules. Namely that instructor ID determines instructor name.
If this is true, and if you have both instructor ID and instructor name in the Course table, then this is not in 3NF, because there is a transitive dependency between Course ID, Instructor ID, and Instructor Name.
Why is this harmful? Because duplicating the instructor name in each course an instructor teaches makes updating an instructor name difficult, and possible to do in an inconsistent manner. Inconsistent instructor name is just another bug you have to watch out for, and 3NF obviates the problem. The same argument could be made for Instructor office.