In which normal form are these FDs? - sql

I've been trying to figure out the difference between the 2nd and 3rd Normal Form using this example. The definitions didn't do the trick for me...
These are the functional dependencies:
A is the candidate key. (A --> A,B,C,D)
FDs:
A --> CD
AC --> D
CD --> B
D --> B
My idea: it's in 1st and 2nd Normal Form, because A, the candidate key, doesn't consist of two or more columns. But B is transitively dependent on A (via D), so it's not in 3rd Normal Form.
Is that correct? Especially the argument that A consists of fewer than two columns?

First, let us see what 2NF and 3NF are. From the context of the question it is clear that 1NF is understood, so I will simply refer to it. If it is unclear as well, let me know and I will clarify that too.
2NF: R is in second normal form, if and only if it is in first normal form and no non-prime attribute is dependent on any proper subset of any candidate key of the relation.
Non-prime attributes are attributes which are not part of any candidate key. So, if a non-prime attribute is determined by a functional dependency whose left side is a proper subset of a candidate key, then the relation is not in 2NF.
For example, let's consider an invoices(number, year, age) table where (number, year) is a candidate key. age can be determined by the year alone, so the table is not in 2NF.
In your case, since the key is a single attribute, it has no non-empty proper subsets, so, assuming the relation is in 1NF, we can say it is in 2NF as well. However, it is in 3NF if and only if it is in 2NF and every non-prime attribute is non-transitively dependent on every key.
In your case, A is the key, but since
A -> D -> B
B is transitively dependent on A, so your table is not in 3NF. To achieve 3NF, you will need to create another table, which will be related to this one via D and will hold B. Possible solution:
T1(A, C, D)
T2(D, B)
Note that AC -> D and A -> CD cause no problems: their left sides contain the candidate key A, and a candidate key determines every other attribute. If that's not the case, you will need to take a look at 1NF as well.
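To double-check the reasoning above, here is a minimal Python sketch that computes attribute closures for the FDs from the question. The helper name closure and the string encoding of attribute sets are my own choices for illustration, not anything standard.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we also get the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs from the question, written as (left side, right side) strings.
FDS = [("A", "CD"), ("AC", "D"), ("CD", "B"), ("D", "B")]
ALL = set("ABCD")

a_plus = closure("A", FDS)          # everything: A is a (candidate) key
d_plus = closure("D", FDS)          # only B and D: D is not a superkey

a_is_key = a_plus == ALL
d_is_superkey = d_plus == ALL
# D -> B has a non-superkey left side and a non-prime right side,
# which is exactly the 3NF violation described above.
print(a_is_key, d_is_superkey, sorted(d_plus))
```

Since A's closure is the whole relation while D's closure is only {B, D}, the chain A -> D -> B is a genuine transitive dependency.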

Related

Eliminating Transitive Functional Dependencies

(Primary keys in bold.)
In one of my lectures, we took the following schema
R(a, b, c, d, e)
a -> b
e -> b
c, d -> a
c, d -> b
c, d -> e
and took it to 2NF as follows:
R1(c, d, a, e)
c, d -> a, e
R2(a, e, b) (Not in 2NF)
a -> b
e -> b
Naturally, if I want to take my schema to 3NF this causes a problem, since b cannot be partially determined by a and e. What I want to do is simply create separate relations as follows:
R3(e, b)
e -> b
and
R4(a, b)
a -> b
In this instance b is fully functionally dependent on the primary key, which brings me to 2NF, and the transitive dependencies are eliminated for relations 3 and 4, which are in 3NF. However, I think it could be argued that this solution is not satisfactory, as the value of b could potentially be different for each relation and as there could be anomalies when it is inevitably used as a foreign key. Any thoughts on this?
We seek decompositions "preserving" FDs and (this is usually not stated explicitly) not introducing other constraints. An FD is preserved when it holds in some component. The idea is that we can check that an FD holds in recompositions by just checking that it holds in the components, rather than having to join and then check. We also prefer an FD and its attributes to be in just one component, or we would need to add a constraint that where the determinant values agree the dependent values agree. There's always a 3NF schema preserving all FDs without introducing other constraints. When an FD cannot be preserved to get to BCNF, there is instead an "equality dependency" introduced that two components must have the same projection on the FD attributes.
We don't normalize to a given NF by moving through lower NFs. That can preclude good higher NF designs arising. We use an algorithm for a given NF.
When some FDs (functional dependencies) hold, others do, per Armstrong's axioms. We must look among all FDs for NF violators and FDs to preserve, not just some given ones that form a cover. Algorithms also take that into account.
See this recent answer.
PS PKs (primary keys) don't matter, CKs (candidate keys) do. There can be more than one and they can be composite. A PK is just some CK you decided to call PK. So highlighting attributes of a PK is in general inadequate. Just list the CKs.
PPS An (update) anomaly is a certain thing, and it's not what you are using "anomaly" for.
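As an illustration of FDs implying other FDs, here is a small Python sketch using the FDs from the question. The helper names (closure, implied, kept_in_one_component) are hypothetical, and the containment test at the end is only a simple sufficient check for preservation, not a full dependency-preservation algorithm.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implied(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds (closure test)."""
    return set(rhs) <= closure(lhs, fds)


# FDs given for R(a, b, c, d, e) in the question.
FDS = [("a", "b"), ("e", "b"), ("cd", "a"), ("cd", "b"), ("cd", "e")]

# c,d -> b already follows from c,d -> a and a -> b, so a cover can drop it.
cover = [fd for fd in FDS if fd != ("cd", "b")]
cd_b_is_redundant = implied(cover, "cd", "b")

# The asker's final components: R1(c, d, a, e), R3(e, b), R4(a, b).
components = ["cdae", "eb", "ab"]


def kept_in_one_component(fd):
    """True if all attributes of the FD sit together in some component."""
    lhs, rhs = fd
    return any(set(lhs) | set(rhs) <= set(c) for c in components)


all_kept = all(kept_in_one_component(fd) for fd in cover)
print(cd_b_is_redundant, all_kept)
```

Here every FD of the reduced cover lies inside one component, so each can be checked locally without joining.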

2nf second normal form difficult exercise

I have R(A,B,C,D) with AB as primary key and AD --> C.
I think it is in 2NF because you cannot determine C with a subset of AB.
from wiki "a table is in 2NF if it is in 1NF and no non-prime attribute is dependent on any proper subset of any candidate key of the table"
but many people say it is only in 1NF because of the definition
"in 2NF if it is in 1NF and every non-prime attribute of the table is dependent on the whole of every candidate key"
so AD is not the whole primary key but just a part of it together with another, non-prime attribute.
Please, if you can, also give some references other than Wikipedia so I can demonstrate my thesis, if it is really correct.
You state as a fact that AB is the primary key for the given relation R. For that to be true there has to be at least one more functional dependency other than AD->C.
In order to explain 2NF, I assume that the missing FD is say B->D. So we have a relation R(A,B,C,D) with FD's :
AD->C
B->D
Then our primary key is AB. Now in simple words 2NF deals with partial dependency, that is, when an attribute depends on part of the primary key. (So if we have a primary key that's just one attribute then the relation R is already in 2NF!)
Formally:
Given a functional dependency X->A of a relation R where:
X is a set of attributes of R
A is a non-prime attribute not in X
then to be in 2NF, X should not be a proper subset of any key.
Coming back to our example. Primary key is AB. So the prime attributes are A and B. The non-prime attributes are C and D.
Let's consider the first FD, AD->C
Here C is a non-prime attribute. To not violate the 2NF condition, AD should not be a proper subset of the primary key AB. AD is not a proper subset of AB, so it does not violate the 2NF condition.
Let's see the next FD, B->D
Here D is a non-prime attribute and B is a proper subset of the primary key AB, and therefore it violates the 2NF condition.
Hence the relation R is not in second normal form.
On the other hand if the set of FD's for R would have been:
AD->C
AB->D
Our primary key is still AB but now the relation R is in second normal form.
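The two cases above can be checked mechanically. This is a minimal Python sketch, assuming AB is the only candidate key in both cases; the function names are made up for the example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(attrs, keys, fds):
    """List (part of key, non-prime attributes it determines) pairs."""
    prime = set().union(*map(set, keys))
    nonprime = set(attrs) - prime
    found = []
    for key in keys:
        members = sorted(key)
        # Every non-empty proper subset of the key.
        for size in range(1, len(members)):
            for part in combinations(members, size):
                determined = (closure(part, fds) - set(part)) & nonprime
                if determined:
                    found.append(("".join(part), "".join(sorted(determined))))
    return found


first = partial_dependencies("ABCD", ["AB"], [("AD", "C"), ("B", "D")])
second = partial_dependencies("ABCD", ["AB"], [("AD", "C"), ("AB", "D")])
print(first)   # B alone determines the non-prime attribute D
print(second)  # no part of the key determines a non-prime attribute
```

An empty result means no non-prime attribute depends on a proper subset of the key, which is the 2NF condition.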

Functional dependencies keys and normal form

I am trying to understand functional dependencies
Let's say we have R with {A,B,C,D,E} and FDs A->B, BC->E and ED->A.
What are the keys and is R in 3NF or BCNF?
The keys here are ACD, BCD and ECD. Since each attribute of the relation R appears in at least one of the keys, all the attributes in your relation R are prime attributes.
Note that if a relation has all prime attributes then it is already in 3NF.
Hence the given relation R is in 3NF.
To be in BCNF, for each non-trivial functional dependency X->Y, X should be a superkey. We see that the very first dependency (A->B) violates this and hence the relation R is not in BCNF.
The keys are ACD, BCD and ECD.
The prime attributes are (A,B,C,D,E) because each of them appears in some candidate key.
Note that if a relation has all prime attributes then it is already in 3NF.
Hence the given relation R is in 3NF.
To be in BCNF, for each functional dependency X->Y, X should be a superkey. We see that the very first dependency ( A->B ) violates this and hence the relation R is not in BCNF.
The candidate keys are - ACD,BCD and ECD.
Prime attributes are (A,B,C,D,E) because each of them appears in some candidate key.
Now, first we check the relation for BCNF.
For BCNF, the left side of every FD must be a superkey, and as you can notice, none of the FDs satisfies this condition.
For 3NF, each FD must satisfy one of two conditions:
1. The left side is a superkey.
2. If the first condition fails, then the right side of the same FD must consist of prime attributes.
If every FD satisfies one of these conditions, then the relation is in 3NF. Since all the attributes are prime attributes, the relation R is in 3NF but not in BCNF.
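For anyone who wants to verify the keys and the two normal-form tests, here is a brute-force Python sketch. It simply tries every attribute subset, which is fine for five attributes but would not scale; the function names are my own.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Find all minimal attribute sets whose closure is the whole relation."""
    everything = set(attrs)
    keys = []
    # Smaller sets are tried first, so any superset of a found key is skipped.
    for size in range(1, len(everything) + 1):
        for combo in combinations(sorted(everything), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == everything:
                keys.append(candidate)
    return keys


ATTRS = "ABCDE"
FDS = [("A", "B"), ("BC", "E"), ("ED", "A")]

keys = candidate_keys(ATTRS, FDS)
key_names = sorted("".join(sorted(k)) for k in keys)
prime = set().union(*keys)

# BCNF: the left side of every (non-trivial) FD must be a superkey.
bcnf_violations = [fd for fd in FDS if closure(fd[0], FDS) != set(ATTRS)]
# 3NF: as BCNF, except an FD is also fine if its new right-side attributes are prime.
nf3_violations = [
    fd for fd in bcnf_violations if not (set(fd[1]) - set(fd[0])) <= prime
]
print(key_names, bcnf_violations, nf3_violations)
```

All three FDs fail the BCNF test, but none fails the 3NF test because every attribute is prime.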

Identifying functional dependencies (FDs)

I am working with a table in 1NF that has 10 attributes and a composite primary key composed of two of them.
In my situation a fully functional dependency involves the dependent relying on both attributes in my primary key.
A partial dependency relies on either one of the attributes from the primary key.
A transitive dependency involves two or more non-key attributes in a functional dependence where one of the non-key attributes is dependent on a key attribute from my primary key.
It seems that transitive dependencies are pulled out of the table during normalization, but my assignment requires us to identify all functional dependencies before we draw the dependency diagram (after which we normalize the tables). Parentheses identify the primary key attributes:
(Student ID), Student Name, Student Address, Student Major, (Course ID), Course Title, Instructor ID, Instructor Name, Instructor Office, Student_course_grade
Only one class is taught for each course ID.
Students may take up to 4 courses.
Each course may have a maximum of 25 students.
Each course is taught by only one Instructor.
Each student may have only one major.
From your question it seems that you do not have a clear understanding of basics.
Application relationships & situations
First you have to take what you were told about your application (including business rules) and identify the application relationships (aka associations) (aka relations, in the math sense of association). Each gets a (base) table (aka relation, in the math sense of associated tuples) variable. Such an application relationship can be characterized by a row membership criterion (aka meaning) (aka predicate) that is a statement template. Eg suppose criterion student [si] takes course [ct] has table variable TAKES. The parameters of the criterion are the columns of its table. We can use a table name with columns (like an SQL declaration) as a shorthand for the criterion. Eg TAKES(si,ct). A criterion plus a row makes a statement (aka proposition) about a situation. Eg row (17,'CS101') gives student 17 takes course 'CS101' ie TAKES(17,'CS101'). Rows that give a true statement go in the table and rows that make a false one stay out.
If we can rephrase a criterion as the AND/conjunction of two others then we only need the tables with those other criteria. This is because NATURAL JOIN is defined so that the NATURAL JOIN of two tables containing the rows making their criteria true returns the rows that make the AND/conjunction of their criteria true. So we can NATURAL JOIN the two tables to get back the original. (This is what normalization is doing by decomposing tables into components.)
/* rows where
student with id [si] has name [sn] and address [sa] and major [sm]
and takes course [ci] with title [ct]
from instructor with id [ii] and name [in] and office [io]
with grade [scg]
*/
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)
/* rows where
student with id [si] has name [sn] and address [sa] and major [sm]
and takes course [ci] with grade [scg]
*/
SG(si,sn,sa,sm,ci,scg)
/* rows where
course [ci] with title [ct]
is taught by instructor with id [ii] and name [in] and office [io]
*/
CI(ci,ct,ii,in,io)
Now by the definition of NATURAL JOIN,
the rows where
SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io)
are the rows in SG NATURAL JOIN CI.
And since
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)
when/iff
SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io),
ie since
the rows where
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)
are the rows where
SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io),
we have T = SG NATURAL JOIN CI.
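To make the T = SG NATURAL JOIN CI claim concrete, here is a small Python sketch. It is only an illustration: the rows are invented, only a few of the columns are kept to stay short, and natural_join and project are hand-written helpers rather than any library's API.

```python
def project(rows, columns):
    """Projection: keep only the given columns and drop duplicate rows."""
    result = []
    for row in rows:
        sub = {c: row[c] for c in columns}
        if sub not in result:
            result.append(sub)
    return result


def natural_join(left, right):
    """Combine every pair of rows that agree on all shared column names."""
    result = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[c] == r[c] for c in shared):
                row = {**l, **r}
                if row not in result:
                    result.append(row)
    return result


# Invented sample rows for a shortened T(si, sn, ci, ct, ii, scg).
T = [
    {"si": 17, "sn": "Ann", "ci": "CS101", "ct": "Databases", "ii": 3, "scg": "A"},
    {"si": 17, "sn": "Ann", "ci": "MA201", "ct": "Algebra", "ii": 5, "scg": "B"},
    {"si": 22, "sn": "Bob", "ci": "CS101", "ct": "Databases", "ii": 3, "scg": "C"},
]

SG = project(T, ["si", "sn", "ci", "scg"])   # student takes course with grade
CI = project(T, ["ci", "ct", "ii"])          # course is taught by instructor

rejoined = natural_join(SG, CI)
same_rows = all(r in T for r in rejoined) and all(r in rejoined for r in T)
print(len(SG), len(CI), len(rejoined), same_rows)
```

The join on the shared column ci rebuilds exactly the original rows, while the course title and instructor are stored once per course in CI instead of once per enrolment.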
Together the application relationships and situations that can arise determine both the rules and constraints! They are just things that are true of every application situation or every database state (ie values of one or more base tables) (which are a function of the criteria and the possible application situations).
Then we normalize to reduce redundancy. Normalization replaces a table variable by others whose predicates AND/conjoin together to the original's when this is beneficial.
The only time a rule can tell you something that you don't already know from the (putative) criteria and (putative) situations is when you don't really understand the criteria or what situations can turn up, and the a priori rules are clarifying something about that. A person giving you rules is already using application relationships that they assume you understand and they can only have determined that a rule holds by using them and all the application situations that can arise (albeit informally)!
(Unfortunately, many presentations of information modeling don't even mention application relationships. Eg: If someone says "there is a X:Y relationship" then they must already have in mind a particular binary application relationship between entities; knowing it and what application situations can arise, they are reporting that it has a certain cardinality in a certain direction. This will correspond to some application relationship, represented by (a projection of) a table using column sets that identify entities. Plus some presentations/methods call FKs "relationships"--confusing them with those relationships.)
Check out "fact-based" information modeling methods Object-Role Modeling or (its predecessor) NIAM.
FDs & CKs
Given the criterion for putting rows into or leaving them out of a table and all possible situations that can arise, only some values (sets of rows) can ever be in a table variable.
For every subset of columns you need to decide which other columns can only have one value for a given subrow value for those columns. When it can only have one we say that the subset of columns functionally determines that column. We say that there is a FD (functional dependency) columns->column. This is when we can express the table's predicate as "... AND column=F(columns)" for some function F. (F is represented by the projection of the table on the column & columns.) But every superset of that subset will also functionally determine it, so that cuts down on cases. Conversely, if a given set does not determine a column then no subset of the set does. Applying Armstrong's axioms gives all the FDs that hold when given FDs hold. (Algorithms & software are available to apply them & determine FD closures & covers.) Also, you may think in terms of column sets being unique; then all other columns are functionally dependent on that set. Such a set is called a superkey.
Only after you have determined the FDs can you determine the CKs (candidate keys)! A CK is a superkey that contains no smaller superkey. (That a CK and/or superkey is present is also a constraint.) We can pick a CK as PK (primary key). PKs have no other role in relational theory.
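As a rough illustration of "only one value for a given subrow value", here is a Python sketch that tests an FD against one sample table value. Keep in mind that a sample can only refute an FD; whether an FD really holds depends on the criterion and all situations that can arise, not on one set of rows. The rows and the column name iname are invented for the example.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        determinant = tuple(row[c] for c in lhs)
        dependent = tuple(row[c] for c in rhs)
        # Remember the first dependent value seen for each determinant value.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Invented sample rows: course id, instructor id, instructor name.
rows = [
    {"ci": "CS101", "ii": 3, "iname": "Smith"},
    {"ci": "MA201", "ii": 5, "iname": "Jones"},
    {"ci": "CS102", "ii": 3, "iname": "Smith"},
]

course_determines_instructor = fd_holds(rows, ["ci"], ["ii"])
instructor_determines_course = fd_holds(rows, ["ii"], ["ci"])
instructor_determines_name = fd_holds(rows, ["ii"], ["iname"])
print(course_determines_instructor,
      instructor_determines_course,
      instructor_determines_name)
```

Instructor 3 teaches two courses in this sample, so ii -> ci is refuted, while ci -> ii and ii -> iname are merely consistent with the data.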
"A partial dependency relies on either one of the attributes from the primary key."
Don't use "involve" or "relies on" to give a definition. Say, "when" or "iff" ("if and only if").
Read a definition. A FD that holds is partial when/iff using a proper subset of the determinant gives a FD that holds with the same determined column; otherwise it is full. Note that this does not involve CKs. A relation is in 2NF when all non-prime attributes are fully functionally dependent on every CK.
"A transitive dependency involves two or more non-key attributes in a functional dependence where one of the non-key attributes is dependent on a key attribute (from my PK)."
Read a definition. S -> T is transitive when/iff there is an X where S -> X and X -> T and not (X -> S) and not (X = T). Note that this does not involve CKs. A relation is in 3NF when all non-prime attributes are non-transitively dependent on every CK.
"1NF" has no single meaning.
I am inferring a functional dependency that was not listed in your business rules. Namely that instructor ID determines instructor name.
If this is true, and if you have both instructor ID and instructor name in the Course table, then this is not in 3NF, because there is a transitive dependency between Course ID, Instructor ID, and Instructor Name.
Why is this harmful? Because duplicating the instructor name in each course an instructor teaches makes updating an instructor name difficult, and makes it possible to do so inconsistently. An inconsistent instructor name is just another bug you have to watch out for, and 3NF obviates the problem. The same argument could be made for Instructor office.

Misconception of what superkey or Boyce Codd Normal form is

At 9:34 in this video the speaker says that all 3 functional dependencies are in Boyce Codd Normal Form. I don't believe it because clearly GPA can't determine the SSN, sName, address and all other attributes in the student table. Either I'm confused about the definition of Boyce Codd Normal Form or what a super key is? Does it only have to be able to uniquely identify certain attributes, not all attributes in the schema? For example GPA does determine priority (which is on the right side of the functional dependency) but not everything else.
For example if I had the relation R(A,B,C,D) and the FDs A->B would we say A is a superkey for B but I thought a super key is for the whole table? To add to my confusion I know for BCNF it can be a (primary) key but you can only have one primary key for the table. Ugh my brain hurts.
"... the speaker says that all 3 functional dependencies are in Boyce Codd Normal Form."
To be in BC normal form is a property that can be had by RELATIONS (relation variables, more specifically, or relation schemas, if that term suits you better), not by functional dependencies. If you find someone talking so sloppily of normalization theory, leave and move onto more accurate explanations.
Whether or not a relation variable is indeed in BC normal form, depends on which functional dependencies are supposed to hold in it. That is why it is utter nonsense to say that functional dependencies are or are not in BC normal form.
"I don't believe it because clearly GPA can't determine the SSN, sName, address and all other attributes in the student table. Either I'm confused about the definition of Boyce Codd Normal Form or what a super key is? Does it only have to be able to uniquely identify certain attributes, not all attributes in the schema?"
A candidate key is an irreducible set of attributes of the relation schema (there can be more than one such set) that is guaranteed to have unique combinations of attribute values in whatever relation values could validly appear in the relation variable in the database. A superkey is any set of attributes that contains a candidate key.
In your (A,B,C,D) example, if A->B is the only FD that holds, then the only candidate key is {A,C,D}.
"For example if I had the relation R(A,B,C,D) and the FDs A->B would we say A is a superkey for B"
It is sloppy and confusing to talk of A as being the "key" for B in such a case. People who pretend to be teaching others ought to know this, and people who don't, ought not engage in any teaching until they do know this. It would be better to talk of A as the "determinant" for B in such contexts. The term "key" in the context of relational database design has a very well-defined and precise meaning, and using the same term for other meanings merely confuses people. As evidenced by your question.
"but I thought a super key is for the whole table?"
Yes you thought right.
Back to your (A,B,C,D) example. If we were to split that design into (A,B) and (A,C,D), then we would have a relation schema -the (A,B) one- of which we can say that "{A} is a key" in that schema.
That is actually precisely what the FD A->B means : if you take the projection -of the relation value that would appear in the database in the (A,B,C,D) schema- over the attributes {A,B}, then you should be getting a relation in which no A value appears twice (if it did, then that A value would correspond to >1 distinct B value, meaning that A could not possibly be a determinant for B after all).
"To add to my confusion I know for BCNF it can be a (primary) key but ..."
Now you are being sloppy yourself. What does "it" refer to ?