How to normalize a schema into 2NF and 3NF? - schema

I have the schema R=(A,B,C,D,E) and the set of functional dependencies F={AE->B, AB->C, AC->D}. The key is AE.
I need to transform it into 2NF and 3NF.
I did this:
R1=(A,C,D)
R2=(A,E,B)
But I am not sure it's OK, because I lost the dependency AB->C.
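The concern about the lost dependency can be checked mechanically. The following is a minimal sketch, not a definitive answer: it computes attribute closures and runs the usual dependency-preservation test against the decomposition from the question. The function names `closure` and `preserved`, and the three-relation alternative (one relation per functional dependency, in the spirit of 3NF synthesis), are assumptions introduced for illustration.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserved(fd, fds, decomposition):
    """Check whether `fd` is still enforceable using only the projected relations."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for relation in decomposition:
            # What the attributes known so far determine inside this relation.
            gained = closure(z & set(relation), fds) & set(relation)
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z


F = [("AE", "B"), ("AB", "C"), ("AC", "D")]

key_closure = closure("AE", F)      # attributes determined by AE
attempt = ["ACD", "AEB"]            # the decomposition from the question
lost = [fd for fd in F if not preserved(fd, F, attempt)]

synthesis = ["AEB", "ABC", "ACD"]   # hypothetical alternative: one relation per FD
lost_after = [fd for fd in F if not preserved(fd, F, synthesis)]

print(sorted(key_closure), lost, lost_after)
```

With these inputs, `lost` contains only AB->C, which matches the worry in the question, while the three-relation split keeps every dependency and still contains the key AE in one relation.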

Related

Database; 2NF to 3NF questions about (Primary) Keys

I'm currently learning about databases and 1NF, 2NF and 3NF. So far 1NF is easy, but I get confused going to 2NF and 3NF regarding primary keys and foreign keys.
I've read a lot of posts on here but I couldn't figure it out.
In these pictures is a mock up of the database and tables I want to create. I tried to make it as far as I could in 2NF and 3NF (both shown there)
https://imgur.com/a/PPvwaLY
Questions:
1. Could a foreign key be part of a composite primary key, like I did with 'member_number' & 'email_nr' for instance?
2. Regarding question 1: this gives me a unique key, but is it possible for SQL to increment one column depending on the value of another column, when both are part of a composite key? (Example in the picture: member_number 1234 is coupled with email_nr 1 and 2 for unique IDs. When I add another email to the list, could it automatically create the pair 1234 & 3?)
3. Is there anything other than the keys that I may have done wrong?
Sorry if this looks a lot like questions previously asked/answered but I could not find an answer to this...
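As a rough illustration of questions 1 and 2, here is a minimal sketch using Python's built-in `sqlite3` module. The table names `member` and `member_email`, the `email` column, and the helper `add_email` are assumptions made up for this example; only `member_number` and `email_nr` come from the question. It shows a foreign key that is part of a composite primary key, and one possible way to derive the next `email_nr` per member at insert time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE member (
        member_number INTEGER PRIMARY KEY
    );
    CREATE TABLE member_email (
        member_number INTEGER NOT NULL REFERENCES member(member_number),
        email_nr      INTEGER NOT NULL,
        email         TEXT    NOT NULL,
        PRIMARY KEY (member_number, email_nr)  -- the foreign key is part of the key
    );
""")


def add_email(conn, member_number, email):
    """Insert an email using the next free email_nr for this member."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            """
            INSERT INTO member_email (member_number, email_nr, email)
            SELECT ?, COALESCE(MAX(email_nr), 0) + 1, ?
            FROM member_email
            WHERE member_number = ?
            """,
            (member_number, email, member_number),
        )


conn.execute("INSERT INTO member VALUES (1234)")
conn.execute("INSERT INTO member VALUES (5678)")
for address in ("a@example.com", "b@example.com", "c@example.com"):
    add_email(conn, 1234, address)
add_email(conn, 5678, "d@example.com")

rows = conn.execute(
    "SELECT member_number, email_nr FROM member_email "
    "ORDER BY member_number, email_nr"
).fetchall()
print(rows)
```

Note that a "MAX + 1" scheme reuses a number if the highest-numbered row is deleted, and on a server with concurrent writers two sessions could compute the same value; the composite primary key would then reject the second insert rather than silently allow a duplicate.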

When does repeating data get removed: before/after normalization?

Given this logical design:
R(a,b, c, d)
a is the only key. I can't underline it using this editor.
a->b
a->c
a->d
It's in BCNF because there are no composite keys, no transitive dependencies, and no partial key dependencies.
However, we still have repeating data across rows in the attributes b, c, and d.
Do we introduce surrogate keys and rewrite it this way:
R(a, bID, cID, dID)
R1(bID, b)
R2(cID, c)
R3(dID, d)
If so, does that happen before or after normalization?
The point of normalization is not to remove repetition. It is to remove inappropriate dependencies. If every non-key attribute is fully functionally dependent on the primary key (and nothing else) then it doesn't matter for purposes of normalization that from one row to another in a table that some column data may be the same. That sameness is incidental.
Here is the thing you have to think about when looking at repetition and deciding whether it is incidental or meaningful. Consider the case of an update to a non-key column.
In one scenario, let's say that the non-key column is a person's name. What happens in your system when someone changes their name? If the old value is "Doug" and the new value is "Bob" do you want every instance of "Doug" to be replaced by "Bob"? Maybe you do, but I'm guessing you probably don't. If you were to create surrogate keys and normalize out the non-key value to another table then you would be incorrectly changing values that you don't mean to change.
In another scenario, let's say the non-key column is a municipality name. What happens in your system when you change a municipality name? Let's say the old value is "New Berlin" and the new value is "Kitchener". Would you want every instance of "New Berlin" to become "Kitchener"? Maybe so (perhaps not; it depends on your business rules). If you do want to change every instance, then what you've discovered is that the municipality name may not be fully functionally dependent on your primary key. In that case you should normalize it out to a new table.
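The two scenarios can be sketched with Python's built-in `sqlite3` module. The table and column names (`resident`, `municipality`, `first_name`, and so on) are assumptions invented for this illustration and do not come from the question. The person's name stays in the row it describes, while the municipality name lives in its own table and is referenced by key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE municipality (
        municipality_id INTEGER PRIMARY KEY,
        name            TEXT NOT NULL
    );
    CREATE TABLE resident (
        resident_id     INTEGER PRIMARY KEY,
        first_name      TEXT NOT NULL,  -- repetition here is incidental
        municipality_id INTEGER NOT NULL REFERENCES municipality(municipality_id)
    );
    INSERT INTO municipality VALUES (1, 'New Berlin');
    INSERT INTO resident VALUES (1, 'Doug', 1), (2, 'Doug', 1);
""")

# Renaming the municipality is one fact, so it is one row update.
conn.execute("UPDATE municipality SET name = 'Kitchener' WHERE municipality_id = 1")
# Renaming one person touches only that person's row.
conn.execute("UPDATE resident SET first_name = 'Bob' WHERE resident_id = 1")

rows = conn.execute("""
    SELECT r.resident_id, r.first_name, m.name
    FROM resident r JOIN municipality m USING (municipality_id)
    ORDER BY r.resident_id
""").fetchall()
print(rows)
```

After both updates, every resident sees the new municipality name, but the second "Doug" is untouched, which is the distinction between meaningful and incidental repetition.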
You have asked when this should happen (before or after normalization). The answer is that it happens as part of the normalization. The act of moving data off into a separate relation in order to avoid a partial or transitive functional dependency is itself the act of normalizing your database schema. Is this part of 2NF or 3NF? It depends. If your non-key attribute is partially functionally dependent on the key then it's during 2NF. If it's transitively dependent (i.e. dependent on another non-key attribute or attributes) then it's during 3NF.
You should perform normalization as part of the logical modeling process as much as possible. When you get to the physical model you are more likely to introduce denormalization for one practical reason or another. Denormalization (in transaction processing systems) is something you should generally do only when you find that you have to. 3NF or higher is a good stake in the ground for OLTP systems. Therefore, in most cases you will have built your logical and your physical schemas before you start denormalizing.

Database Schema Normalization Violation

To my knowledge the following doesn't violate 1NF or 2NF. Is 3NF violated because LocationID and LocationName isn't separated into a different table?
Department (DNum PK, DName, LocationID, LocationName)
You can't be sure whether a relation is normalized or not based only on a list of attribute names. What really matters is which dependencies apply to those attributes. For example, given the following set of dependencies
F: {DNum}->{LocationID}->{DNum,DName,LocationName}
we can say that Department satisfies 3NF (and therefore 2NF) with respect to F, because under F both DNum and LocationID are keys for Department and there are no functional dependencies other than the ones implied by those keys. The choice of primary key is irrelevant as far as normalization is concerned, because a relation may have more than one key and all the keys are equally significant.
Alternatively, given the following set of dependencies
G: {DNum}->{DName,LocationID,LocationName}, {LocationID}->{LocationName}
the Department relation violates 3NF with respect to G because LocationID is a non-key determinant.
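The difference between F and G can be made concrete with a small check. This is a sketch under stated assumptions: the function name `third_nf_violations` is made up for illustration, and the candidate keys are passed in as given in the text above (DNum and LocationID under F, only DNum under G) rather than computed.

```python
def third_nf_violations(fds, candidate_keys):
    """List (determinant, attribute) pairs that break 3NF.

    An FD X -> A is acceptable when it is trivial, when X contains a
    candidate key (X is a superkey), or when A belongs to some candidate key.
    """
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        if any(key <= lhs for key in candidate_keys):
            continue  # determinant is a superkey
        for attr in sorted(rhs - lhs - prime):
            violations.append((sorted(lhs), attr))
    return violations


F = [
    ({"DNum"}, {"LocationID"}),
    ({"LocationID"}, {"DNum", "DName", "LocationName"}),
]
G = [
    ({"DNum"}, {"DName", "LocationID", "LocationName"}),
    ({"LocationID"}, {"LocationName"}),
]

under_f = third_nf_violations(F, [{"DNum"}, {"LocationID"}])
under_g = third_nf_violations(G, [{"DNum"}])
print(under_f)
print(under_g)
```

The same attribute list yields no violation under F and one violation (LocationID -> LocationName) under G, which is why the attribute names alone cannot settle the question.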
Yes, 3NF is violated.
3NF requires 2NF and that no non-key field should depend on another non-key field.
LocationName depends on LocationID (I'm assuming that LocationID is the PK of a location), and that's the violation.
I explained 1NF, 2NF and 3NF earlier in another answer.

Identifying relationships losing meaning in relating entity

So as you can see I have an Identifying 1 to many relationship in the tables above.
If I were to change this relationship to an identifying 1 to 1 relationship, then the auto_leads table would still contain two composite primary keys from its parent leads table. In other words, nothing would change.
Does an identifying relationship have any meaning in the context of relational models? It doesn't appear to change its effect with respect to relationships.
Identifying relationship is an ER-modelling concept which arises because ER modelling assumes there is some semantic significance to having a primary key for each entity. Primary keys have no special role in relational database design and therefore the concept of an identifying relationship is usually of no great importance.
Consider the example of a table with two candidate keys, A and B. A is also a foreign key. According to ER-modelling convention if A is chosen as a primary key then the foreign key relationship is an identifying one. If A is an alternate key then the relationship is deemed to be non-identifying. Yet the form, function, integrity constraints and presumably the business meaning is exactly the same in both cases. The concept of identifying relationships is only as important as you want it to be.

Misconception of what superkey or Boyce Codd Normal form is

At 9:34 in this video the speaker says that all 3 functional dependencies are in Boyce Codd Normal Form. I don't believe it because clearly GPA can't determine the SSN, sName, address and all other attributes in the student table. Either I'm confused about the definition of Boyce Codd Normal Form or about what a super key is. Does it only have to be able to uniquely identify certain attributes, not all attributes in the schema? For example, GPA does determine priority (which is on the right side of the functional dependency) but not everything else.
For example, if I had the relation R(A,B,C,D) and the FD A->B, would we say A is a superkey for B? I thought a super key is for the whole table. To add to my confusion, I know for BCNF it can be a (primary) key, but you can only have one primary key for the table. Ugh my brain hurts.
"... the speaker says that all 3 functional dependencies are in Boyce Codd Normal Form."
To be in BC normal form is a property that can be had by RELATIONS (relation variables, more specifically, or relation schemas, if that term suits you better), not by functional dependencies. If you find someone talking so sloppily of normalization theory, leave and move onto more accurate explanations.
Whether or not a relation variable is indeed in BC normal form, depends on which functional dependencies are supposed to hold in it. That is why it is utter nonsense to say that functional dependencies are or are not in BC normal form.
"I don't believe it because clearly GPA can't determine the SSN, sName, address and all other attributes in the student table. Either I'm confused about the definition of Boyce Codd Normal Form or what a super key is? Does it only have to be able to uniquly identify certain attributes, not all attributes in the schema?"
An irreducible candidate key is that set (not necessarily unique) of attributes of the relation schema that is guaranteed to have unique combinations of attribute values in whatever relation values could validly appear in the relation variable in the database.
In your (A,B,C,D) example, if A->B is the only FD that holds, then the only candidate key is {A,C,D}.
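That claim can be verified by brute force. The following is a minimal sketch; the helper names `closure` and `candidate_keys` are assumptions for illustration. It tries attribute subsets from smallest to largest and keeps those whose closure covers the whole schema and that do not contain a smaller key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Enumerate irreducible keys by trying subsets in order of size."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not irreducible
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


keys = candidate_keys("ABCD", [("A", "B")])
print(keys)
```

For R(A,B,C,D) with only A->B, the single result is {A, C, D}: none of A, C or D is determined by anything else, so each must be in every key. The enumeration is exponential in the number of attributes, which is acceptable only for small textbook schemas.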
"For example if I had the relation R(A,B,C,D) and the FDs A->B would we say A is a superkey for B"
It is sloppy and confusing to talk of A as being the "key" for B in such a case. People who pretend to be teaching others ought to know this, and people who don't, ought not engage in any teaching until they do know this. It would be better to talk of A as the "determinant" for B in such contexts. The term "key" in the context of relational database design has a very well-defined and precise meaning, and using the same term for other meanings merely confuses people. As evidenced by your question.
"but I thought a super key is for the whole table?"
Yes, you thought right.
Back to your (A,B,C,D) example. If we were to split that design into (A,B) and (A,C,D), then we would have a relation schema -the (A,B) one- of which we can say that "{A} is a key" in that schema.
That is actually precisely what the FD A->B means : if you take the projection -of the relation value that would appear in the database in the (A,B,C,D) schema- over the attributes {A,B}, then you should be getting a relation in which no A value appears twice (if it did, then that A value would correspond to >1 distinct B value, meaning that A could not possibly be a determinant for B after all).
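The projection argument can be tried on sample data. This is only a sketch with made-up rows and hypothetical helper names (`project`, `fd_holds`): it projects a relation value onto the attributes of the FD and checks that no determinant value appears twice in the result.

```python
def project(rows, attrs):
    """Project dict rows onto `attrs`; a set removes duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}


def fd_holds(rows, lhs, rhs):
    """True if, in the projection over lhs + rhs, no lhs value appears twice."""
    projected = project(rows, lhs + rhs)
    lhs_values = [t[:len(lhs)] for t in projected]
    return len(lhs_values) == len(set(lhs_values))


rows = [
    {"A": 1, "B": "x", "C": 10, "D": 100},
    {"A": 1, "B": "x", "C": 20, "D": 100},
    {"A": 2, "B": "y", "C": 10, "D": 200},
]
holds = fd_holds(rows, ["A"], ["B"])

# Adding a row where A=2 maps to a different B breaks A->B.
broken_rows = rows + [{"A": 2, "B": "z", "C": 30, "D": 300}]
holds_after = fd_holds(broken_rows, ["A"], ["B"])
print(holds, holds_after)
```

A check like this can only refute an FD for a particular relation value. Whether A->B is supposed to hold for every valid value is a design decision, not something sample data can prove.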
"To add to my confusion I know for BCNF it can be a (primary) key but ..."
Now you are being sloppy yourself. What does "it" refer to?