Not sure if this constitutes a transitive dependency - sql

I am a bit stuck designing part of a database.
I have a table called Staff. It has different attributes:
StaffID
First Name
Last Name
Job Title
Department Number
Telephone Number
StaffID is the primary key in this table.
My issue, however, is that it is possible to find any information based on the telephone number (i.e. each staff member has a different, unique telephone number).
For example, this means that the First Name or Job Title can be found when we have the Telephone Number. However, Telephone Number is not the primary key; StaffID is.
I am not sure whether this is a transitive dependency that should be fixed through 3NF by splitting up the table, leaving the Staff table without the Telephone Number and creating another table with just StaffID and Telephone Number.

A transitive dependency occurs only if there is an indirect relationship between more than two attributes that are not part of the key.
In your example, as you explained, the StaffID is part of your dependency, which is fine because it's the primary key.
Also you can look at this question that shows what is wrong with a transitive dependency. It could help put things into perspective.
In your table, if you delete a staff member, you delete all of their information (rightly so, because you no longer need it). If you keep the phone number in a different table and, for instance, delete the entry only in Staff, you're left with an orphaned phone number. But if your Staff table allowed multiple entries for the same person (with different departments), then the situation would be different.
Other sites that helped me in the past:
https://www.thoughtco.com/transitive-dependency-1019760
https://beginnersbook.com/2015/04/transitive-dependency-in-dbms/
Funnily, they always follow the same textbook example :)

In design-theoretical terms, keys are implied by dependencies. If PhoneNumber→StaffID and if StaffID is known to be a key then we can infer that PhoneNumber is also a key. If that is the case then there is no violation of 3NF because the determinants are all keys. Note that the choice of StaffID as primary key is irrelevant here. Normalization treats all keys as equally significant.
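To make this inference concrete, here is a minimal Python sketch of the standard attribute-closure computation. It assumes just two dependencies for the Staff table, StaffID→(every other column) and PhoneNumber→StaffID (the uniqueness described in the question); the column names are condensed from the question.

```python
# Staff attributes (names condensed from the question's column list).
ALL = frozenset({"StaffID", "FirstName", "LastName", "JobTitle",
                 "DepartmentNumber", "PhoneNumber"})

# Assumed FDs as (determinant, dependent attributes) pairs.
FDS = [
    (frozenset({"StaffID"}), ALL - {"StaffID"}),
    (frozenset({"PhoneNumber"}), frozenset({"StaffID"})),
]


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once a determinant is covered, add everything it determines.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_superkey(attrs):
    """A set is a superkey when its closure is the whole table."""
    return closure(attrs, FDS) == ALL


print(is_superkey({"StaffID"}))                # True
print(is_superkey({"PhoneNumber"}))            # True: PhoneNumber -> StaffID -> rest
print(is_superkey({"FirstName", "LastName"}))  # False
```

Dropping the PhoneNumber→StaffID entry from FDS (that is, deciding not to make PhoneNumber a key) makes is_superkey({"PhoneNumber"}) return False.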
In practical database design, however, the question arises as to whether PhoneNumber really makes sense as a key. In other words, would you actually want to enforce dependencies like PhoneNumber→StaffID? If, after consideration, you decide that dependency is not applicable, then you could discard that dependency (by not making PhoneNumber a key) and the table would still satisfy 3NF with respect to the set of dependencies you have left.
Here's a reason why a dependency like PhoneNumber→StaffID might not be a realistic choice: when I joined my present company I got a staff ID on my first day; I didn't get a phone number until two days later.

It is not, because there is no dependency between the phone number and the first or last name: if you know the name, you can't know the phone number. It is not the same as, for example, Model and Manufacturer: if you know the model is a Mustang, then you know the manufacturer is Ford, and the other way around, you know that Ford makes Mustangs.
With the columns you mentioned, I would have separate tables for departments and job titles, because they do not depend on the PK StaffID. Think about it as removing potential redundancies: you can have five thousand people in there with the same job title string repeated one thousand times, and that is a signal that it needs its own table (2NF).

Transitive dependency means that you have a (set of) attribute(s) C that is completely determined by going from a (set of) attribute(s) A -> B and then from B -> C, while you cannot go from B -> A.
In your case, you do indeed have (StaffId) -> (PhoneNumber) and also (PhoneNumber) -> (StaffId). This means you have A -> B and B -> A and hence at this step you can already rule out the transitive dependency.
If you like, you could say that PhoneNumber would be another candidate for PK.
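The test described here (A -> B and B -> C, but not B -> A) can be sketched in Python with attribute closures. The dependencies are assumptions: for Staff, StaffID determines every column and PhoneNumber -> StaffID; for the Book example below, BookTitle -> Author and Author -> GenderOfAuthor, with attribute names made up for illustration.

```python
def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_transitive(a, b, c, fds):
    """True when A -> B and B -> C hold but B -> A does not."""
    a, b, c = set(a), set(b), set(c)
    return (b <= closure(a, fds)
            and c <= closure(b, fds)
            and not a <= closure(b, fds))


# Assumed Staff FDs: StaffID -> everything, PhoneNumber -> StaffID.
staff_fds = [
    (frozenset({"StaffID"}), frozenset({"FirstName", "LastName", "JobTitle",
                                        "DepartmentNumber", "PhoneNumber"})),
    (frozenset({"PhoneNumber"}), frozenset({"StaffID"})),
]
# Assumed Book FDs (hypothetical attribute names).
book_fds = [
    (frozenset({"BookTitle"}), frozenset({"Author"})),
    (frozenset({"Author"}), frozenset({"GenderOfAuthor"})),
]

staff_result = is_transitive({"StaffID"}, {"PhoneNumber"}, {"FirstName"}, staff_fds)
book_result = is_transitive({"BookTitle"}, {"Author"}, {"GenderOfAuthor"}, book_fds)
print(staff_result)  # False: PhoneNumber -> StaffID also holds
print(book_result)   # True
```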
As a background, the problem with transitive dependencies is this: Assume you have a table consisting of "Book Title" (primary key), "Author" and "Gender of Author". Then you certainly have a transitive dependency BT -> A, A -> GoA, hence BT -> GoA.
Now assume that one of your authors is "Andy Smith", Andy being a short name for Andreas. Andreas goes and changes gender, and is now Andrea. Obviously you do not need to change the name, "Andy" works just fine for "Andrea". But you do have to change the Gender. You have to do it for many entries in your table, i.e. for all books from that author.
In this case, you would fix the problem by creating a new table "Author", obviously, and then you'd have only one row for Andy.
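A small sketch of the same scenario using Python's built-in sqlite3 module, with hypothetical book titles and a simplified gender column: in the single-table design the change must be applied to every book by the author, whereas after splitting out an Author table it touches exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Original design: author gender repeated on every book row (hypothetical data).
    CREATE TABLE book_flat (title TEXT PRIMARY KEY, author TEXT, author_gender TEXT);
    INSERT INTO book_flat VALUES ('Book One', 'Andy Smith', 'male'),
                                 ('Book Two', 'Andy Smith', 'male');

    -- Normalized design: gender stored once per author.
    CREATE TABLE author (name TEXT PRIMARY KEY, gender TEXT);
    CREATE TABLE book (title TEXT PRIMARY KEY,
                       author TEXT REFERENCES author(name));
    INSERT INTO author VALUES ('Andy Smith', 'male');
    INSERT INTO book VALUES ('Book One', 'Andy Smith'), ('Book Two', 'Andy Smith');
""")

flat_rows = conn.execute(
    "UPDATE book_flat SET author_gender = 'female' WHERE author = 'Andy Smith'"
).rowcount
norm_rows = conn.execute(
    "UPDATE author SET gender = 'female' WHERE name = 'Andy Smith'"
).rowcount

# Every book sees the new value through the join.
genders = conn.execute(
    "SELECT DISTINCT a.gender FROM book AS b JOIN author AS a ON a.name = b.author"
).fetchall()

print(flat_rows, norm_rows)  # 2 1
print(genders)               # [('female',)]
```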
Hope that clears it up. It is easy to see that in your example there is no constellation where you have to change many rows due to a phone number change. It's a simple 1:1 relationship between StaffId and PhoneNumber, no problems whatsoever. Both are candidate keys.

Related

SQL Join to either table, Best way or alternative design

I am designing a database for a system and I came up with the following three tables
My problem is that an Address can belong to either a Person or a Company (or other things in the future). So how do I model this?
I discarded putting the address information in both tables (Person and Company) because it would be repeated.
I thought of adding two columns (PersonId and CompanyId) to the Address table and keeping one of them null, but then I would need to add one column for every future relation like this that appears (for example, an asset can have an address where it is located).
The last option that occurred to me was to create two columns, one called Type and the other Id, so a pair of values would represent a single record in the target table, for example Type=Person,Id=5 and Type=Company,Id=9. This way I can join the right table using the type, and there will only be two columns no matter how many tables relate to this table. But then I cannot have constraints, which reduces data integrity.
I don't know if I am designing this properly. I think this should be a common issue (I've faced it at least three times during this small design, in objects like Contact information, etc.), but I could not find much information or many examples that resemble mine.
Thanks for any guidance that you can give me
There are several basic approaches you could take, depending on how much you want to future proof your system.
In general, Has-One relationships are modeled by a foreign key on the owning entity, pointing to the primary key on the owned entity. So you would have an AddressId on both Company and Person, which would be a foreign key to Address.Id. The complexity in your case is how to handle the fact that a person can have multiple addresses. If you are 100% sure that there will only ever be a home and a work address, you could put two foreign key columns on Person, but this becomes a big problem if there's a third, fourth, fifth, etc. address. The other option is to create a join table, PersonAddress, with three columns: a PersonId, an AddressId and an AddressType, to indicate whether it's a home, work, or other address.
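A minimal sqlite3 sketch of the join-table option, assuming hypothetical Name/Street columns and sample rows. The primary key (PersonId, AddressType) is an assumption that a person has at most one address of each type; a CompanyAddress table would follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE Person  (Id INTEGER PRIMARY KEY, Name   TEXT NOT NULL);
    CREATE TABLE Address (Id INTEGER PRIMARY KEY, Street TEXT NOT NULL);

    -- Join table: one row per (person, address role).
    CREATE TABLE PersonAddress (
        PersonId    INTEGER NOT NULL REFERENCES Person(Id),
        AddressId   INTEGER NOT NULL REFERENCES Address(Id),
        AddressType TEXT    NOT NULL,
        PRIMARY KEY (PersonId, AddressType)  -- assumption: one address per type
    );

    -- Hypothetical sample data.
    INSERT INTO Person  VALUES (1, 'Alice');
    INSERT INTO Address VALUES (1, '1 Main St'), (2, '2 Work Rd');
    INSERT INTO PersonAddress VALUES (1, 1, 'home'), (1, 2, 'work');
""")

rows = conn.execute("""
    SELECT pa.AddressType, a.Street
    FROM PersonAddress AS pa JOIN Address AS a ON a.Id = pa.AddressId
    WHERE pa.PersonId = 1
    ORDER BY pa.AddressType
""").fetchall()
print(rows)  # [('home', '1 Main St'), ('work', '2 Work Rd')]

# Unlike a Type/Id pair, the join table can carry real foreign keys.
try:
    conn.execute("INSERT INTO PersonAddress VALUES (99, 1, 'home')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```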

Database Relation Anomalies

I am tasked to find anomalies within this relation. I have identified a few insertion, deletion and update anomalies.
Commission Percentage: the percentage of the total sales made by a salesperson that is paid as commission to that salesperson.
Year of Hire: the year the salesperson was first hired
Department Number: the number of the department where the salesperson works
Manager Name: name of the manager of the department
However, I am confused about one of the anomalies that I pulled out. Below is the statement:
There cannot be a manager with the same name as another manager in the company, as there is no primary identifier for the manager entity except for the name, which can be duplicated within the company.
May I know how should I phrase the above statement and under which (update/deletion/insertion) anomaly should I include it in?
Thank you
May I request additional assistance below as well:
How would you change the current design and how does your new design address the problems you have identified with the current design.
My current design is splitting it into 3 relations:
Salesperson(salespersonNumber, salespersonName, commissionPercentage, YearOfHire, departmentNumber)
Product(productNumber, productName, unitPrice)
Manager(managerNumber, managerName, departmentNumber)
However, I am missing the quantity entity.
Quantity requires composite key of productNumber & salespersonNumber.
Should I make it in another relation by itself?
Quantity(productNumber, salespersonNumber)
Anomalies
When identifying (potential) anomalies, you're listing dependent attributes that are affected by the anomalies (you forgot Salesperson Name, by the way). Specifically, you listed attributes that depended on a subset of the key (Salesperson Number, Product Number), thus violating 2NF. You're on the right track.
However, be careful not to confuse attributes with anomalies. An update anomaly would be if 1 of the 3 instances of Bilstein got changed. The (assumed) functional dependency Salesperson Name depends on Salesperson Number would be broken and the data would be inconsistent (Salesperson Number 437 would be associated with more than one name). Remember that normalization aims to eliminate redundant associations.
Identity
The problem with identifying managers by name indicates a poor modeling decision. As you stated, a company's set of managers isn't uniquely identified by name, so there's a mismatch between the logical data model and the world it models. This won't cause insert, update or delete anomalies as long as we use different values for different managers, but it will prevent convenient identification of managers. Possible improvements would be to use multiple attributes (abstract domains are often easily identified by a combination of attributes, but natural domains like people usually aren't, e.g. Manager Name, Birthdate would be more identifying but still not a good solution), turn the Manager Name into a surrogate key (e.g. Scott1, Scott2), or introduce a new surrogate key (e.g. a numeric ID).
Proposed improvement
Your proposed design normalizes the original table as well as addressing the identification problem. It's a good answer except for two issues: it doesn't include the association between Salesperson and Manager, and in your Quantity relation, you forgot to include the quantity as a dependent attribute.
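A sketch of the Quantity relation with the missing dependent attribute added, using Python's sqlite3 module. Column types and all sample values are hypothetical, except salesperson 437 named Bilstein, taken from the example above; the Salesperson–Manager association is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Salesperson (
        salespersonNumber    INTEGER PRIMARY KEY,
        salespersonName      TEXT NOT NULL,
        commissionPercentage REAL,
        yearOfHire           INTEGER,
        departmentNumber     INTEGER
    );
    CREATE TABLE Product (
        productNumber INTEGER PRIMARY KEY,
        productName   TEXT NOT NULL,
        unitPrice     REAL
    );
    -- Composite key plus the dependent attribute that was missing.
    CREATE TABLE Quantity (
        productNumber     INTEGER NOT NULL REFERENCES Product(productNumber),
        salespersonNumber INTEGER NOT NULL REFERENCES Salesperson(salespersonNumber),
        quantity          INTEGER NOT NULL,
        PRIMARY KEY (productNumber, salespersonNumber)
    );
    -- Hypothetical sample data.
    INSERT INTO Salesperson VALUES (437, 'Bilstein', 10.0, 2010, 1);
    INSERT INTO Product VALUES (1, 'Product A', 9.99);
    INSERT INTO Quantity VALUES (1, 437, 12);
""")

# The same product/salesperson pair cannot be recorded twice.
try:
    conn.execute("INSERT INTO Quantity VALUES (1, 437, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

qty = conn.execute(
    "SELECT quantity FROM Quantity WHERE productNumber = 1 AND salespersonNumber = 437"
).fetchone()[0]
print(duplicate_rejected, qty)  # True 12
```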
Good job so far, hope this helps.

How To Interpret This Diagram

I am in a database course this semester and I'm trying to figure out how to interpret this diagram
I know the key symbols represent either a primary or foreign key, but I can't tell which ones are which. I think the tables that have the 2 perpendicular lines have at least one foreign key from the table where the line came from, but I am not 100% sure. That's about all I (think) I understand.
What I really need is someone to either tell me the name of this type of diagram and/or how to interpret it so that I can write the SQL script to represent it.
A key symbol means Primary Key (PK).
A Foreign Key (FK) doesn't have any symbol, but you can guess it. For example, student.dept_name is an FK referencing department.dept_name.
The arrow going from department to student means one department has 0 to N students.
There are two symbols at the start of the lines: one with a circle and another with double lines. My guess is that one is 0..N and the other 1..N, but without knowing how you made the diagram, I can't be sure.
This diagram is called an ER (Entity-Relationship) diagram.
Each box is a table (entity) you have to create in your script; then create the PKs, and then define the FKs.
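As a sketch of that order (tables and PKs first, then FKs), here is the department/student pair run through Python's sqlite3 module. Only dept_name comes from the diagram discussion; the student ID column and the sample values are hypothetical. SQLite does not support ALTER TABLE ... ADD CONSTRAINT, so the FK is declared inline; other DBMSs also let you add it afterwards with ALTER TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- 1. Referenced (parent) table with its PK.
    CREATE TABLE department (
        dept_name TEXT PRIMARY KEY
    );
    -- 2. Referencing table: its own PK (hypothetical ID column) plus the FK.
    CREATE TABLE student (
        ID        TEXT PRIMARY KEY,
        dept_name TEXT REFERENCES department(dept_name)
    );
    -- Hypothetical sample data.
    INSERT INTO department VALUES ('Comp. Sci.');
    INSERT INTO student VALUES ('S1', 'Comp. Sci.'), ('S2', 'Comp. Sci.');
""")

# One department, many students.
count = conn.execute(
    "SELECT COUNT(*) FROM student WHERE dept_name = 'Comp. Sci.'"
).fetchone()[0]

try:  # a student cannot reference a department that does not exist
    conn.execute("INSERT INTO student VALUES ('S3', 'History')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(count, fk_rejected)  # 2 True
```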
This is what I'm seeing in the diagram, although it may not be perfect because not all the keys have their direct lines to their primary sources. The >O or O< indicates there are required to be many of whatever it's against. An example is there are many students in a department. The O| or |O indicates there must be 1 and only 1. Each student must be registered with a department, but they can only be registered with one department. The || indicates a one to many plurality. Each course can have one or many sections (typically determined by how many students wish to attend that course).
For the issue of the keys, there is no apparent distinction between primary and foreign keys. I would assume that in each case, if a key is simply called ID or includes part of the table name (such as department: dept_name), then it is the primary key and all others are foreign keys. Again, it is somewhat difficult to tell since not all relationships are mapped in this particular diagram (such as teaches/takes and the key set course_id, semester, and year), but in these cases we assume that it's a composite key (values in multiple fields make up a unique record) rather than a single primary key (although there appears to be a single primary key in the section table). In such cases, simply saying section 01 or 01O doesn't mean anything and will likely return as many rows as there are class titles with a section number equivalent to those. You would have to specify course_id = 'CIT261', sec_id = '01O', semester = 'fall', year = 2015 for the first section of an online CIT261 course during the current semester, which should return a single row.
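The composite-key lookup can be sketched with Python's sqlite3 module. The column names come from the paragraph above; the extra rows ('spring' 2016 and course CIT111) are hypothetical, added only so the section number alone is ambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE section (
        course_id TEXT,
        sec_id    TEXT,
        semester  TEXT,
        year      INTEGER,
        PRIMARY KEY (course_id, sec_id, semester, year)  -- composite key
    );
    -- Hypothetical sample rows.
    INSERT INTO section VALUES
        ('CIT261', '01O', 'fall',   2015),
        ('CIT261', '01O', 'spring', 2016),
        ('CIT111', '01O', 'fall',   2015);
""")

# The section number alone is ambiguous...
by_sec = conn.execute("SELECT * FROM section WHERE sec_id = '01O'").fetchall()

# ...while the full composite key pins down exactly one row.
by_key = conn.execute(
    "SELECT * FROM section WHERE course_id = ? AND sec_id = ? "
    "AND semester = ? AND year = ?",
    ("CIT261", "01O", "fall", 2015),
).fetchall()

print(len(by_sec), len(by_key))  # 3 1
```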
Another interesting note, it would appear that the advisor table satisfies a many to many relationship, and does not contain a primary key, but another composite key, but it doesn't seem to be a solid model as academic advisors are generally tied to the student via the department. This may be meant to reflect the instructors' TA.
I hope this points you in the right direction.

Identifying functional dependencies (FDs)

I am working with a table in 1NF that has a composite primary key composed of two attributes (out of 10 attributes in total).
In my situation a fully functional dependency involves the dependent relying on both attributes in my primary key.
A partial dependency relies on either one of the attributes from the primary key.
A transitive dependency involves two or more non-key attributes in a functional dependence where one of the non-key attributes is dependent on a key attribute from my primary key.
Pulling the transitive dependencies out of the table seems to happen during normalization, but my assignment requires us to identify all functional dependencies before we draw the dependency diagram (after which we normalize the tables). Parentheses identify the primary key attributes:
(Student ID), Student Name, Student Address, Student Major, (Course ID), Course Title, Instructor ID, Instructor Name, Instructor Office, Student_course_grade
Only one class is taught for each course ID.
Students may take up to 4 courses.
Each course may have a maximum of 25 students.
Each course is taught by only one Instructor.
Each student may have only one major.
From your question it seems that you do not have a clear understanding of basics.
Application relationships & situations
First you have to take what you were told about your application (including business rules) and identify the application relationships (aka associations) (aka relations, in the math sense of association). Each gets a (base) table (aka relation, in the math sense of associated tuples) variable. Such an application relationship can be characterized by a row membership criterion (aka meaning) (aka predicate) that is a statement template. Eg suppose criterion student [si] takes course [ct] has table variable TAKES. The parameters of the criterion are the columns of its table. We can use a table name with columns (like an SQL declaration) as a shorthand for the criterion. Eg TAKES(si,ct). A criterion plus a row makes a statement (aka proposition) about a situation. Eg row (17,'CS101') gives student 17 takes course 'CS101' ie TAKES(17,'CS101'). Rows that give a true statement go in the table and rows that make a false one stay out.
If we can rephrase a criterion as the AND/conjunction of two others then we only need the tables with those other criteria. This is because NATURAL JOIN is defined so that the NATURAL JOIN of two tables containing the rows making their criteria true returns the rows that make the AND/conjunction of their criteria true. So we can NATURAL JOIN the two tables to get back the original. (This is what normalization is doing by decomposing tables into components.)
/* rows where
student with id [si] has name [sn] and address [sa] and major [sm]
and takes course [ci] with title [ct]
from instructor with id [ii] and name [in] and office [io]
with grade [scg]
*/
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)
/* rows where
student with id [si] has name [sn] and address [sa] and major [sm]
and takes course [ci] with grade [scg]
*/
SG(si,sn,sa,sm,ci,scg)
/* rows where
course [ci] with title [ct]
is taught by instructor with id [ii] and name [in] and office [io]
*/
CI(ci,ct,ii,in,io)
Now by the definition of NATURAL JOIN,
the rows where
SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io)
are the rows in SG NATURAL JOIN CI.
And since
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)
when/iff
SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io),
ie since
the rows where
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)
are the rows where
SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io),
we have T = SG NATURAL JOIN CI.
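The claim T = SG NATURAL JOIN CI can be checked on sample data with Python's built-in sqlite3 module. This is a sketch with made-up rows; the column `in` is renamed `iname` because IN is an SQL keyword, and SG and CI are built as projections of T.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (si, sn, sa, sm, ci, ct, ii, iname, io, scg);
    -- Hypothetical sample rows.
    INSERT INTO T VALUES
        (1, 'Ann', '1 A St', 'Math', 'CS101', 'Databases',  10, 'Smith', 'R1', 'A'),
        (2, 'Bob', '2 B St', 'Art',  'CS101', 'Databases',  10, 'Smith', 'R1', 'B'),
        (1, 'Ann', '1 A St', 'Math', 'CS102', 'Algorithms', 11, 'Jones', 'R2', 'B');

    -- Components whose predicates AND together to T's predicate.
    CREATE TABLE SG AS SELECT DISTINCT si, sn, sa, sm, ci, scg FROM T;
    CREATE TABLE CI AS SELECT DISTINCT ci, ct, ii, iname, io FROM T;
""")

original = set(conn.execute("SELECT * FROM T").fetchall())
rejoined = set(conn.execute(
    "SELECT si, sn, sa, sm, ci, ct, ii, iname, io, scg "
    "FROM SG NATURAL JOIN CI"  # joins on the only shared column, ci
).fetchall())
ci_rows = conn.execute("SELECT COUNT(*) FROM CI").fetchone()[0]

print(rejoined == original)  # True
print(ci_rows)               # 2: instructor data stored once per course
```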
Together the application relationships and situations that can arise determine both the rules and constraints! They are just things that are true of every application situation or every database state (ie values of one or more base tables) (which are a function of the criteria and the possible application situations.)
Then we normalize to reduce redundancy. Normalization replaces a table variable by others whose predicates AND/conjoin together to the original's when this is beneficial.
The only time a rule can tell you something that you don't already know from the (putative) criteria and (putative) situations is when you don't really understand the criteria or what situations can turn up, and the a priori rules are clarifying something about that. A person giving you rules is already using application relationships that they assume you understand, and they can only have determined that a rule holds by using them and all the application situations that can arise (albeit informally)!
(Unfortunately, many presentations of information modeling don't even mention application relationships. Eg: If someone says "there is a X:Y relationship" then they must already have in mind a particular binary application relationship between entities; knowing it and what application situations can arise, they are reporting that it has a certain cardinality in a certain direction. This will correspond to some application relationship, represented by (a projection of) a table using column sets that identify entities. Plus some presentations/methods call FKs "relationships"--confusing them with those relationships.)
Check out "fact-based" information modeling methods Object-Role Modeling or (its predecessor) NIAM.
FDs & CKs
Given the criterion for putting rows into or leaving them out of a table and all possible situations that can arise, only some values (sets of rows) can ever be in a table variable.
For every subset of columns you need to decide which other columns can only have one value for a given subrow value for those columns. When it can only have one we say that the subset of columns functionally determines that column. We say that there is a FD (functional dependency) columns->column. This is when we can express the table's predicate as "... AND column=F(columns)" for some function F. (F is represented by the projection of the table on the column & columns.) But every superset of that subset will also functionally determine it, so that cuts down on cases. Conversely, if a given set does not determine a column then no subset of the set does. Applying Armstrong's axioms gives all the FDs that hold when given FDs hold. (Algorithms & software are available to apply them & determine FD closures & covers.) Also, you may think in terms of column sets being unique; then all other columns are functionally dependent on that set. Such a set is called a superkey.
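The attribute-closure computation, which yields the same determined attributes as exhaustively applying Armstrong's axioms to a fixed left-hand side, is short enough to sketch in Python. The FDs below are assumptions read off the question's business rules, plus instructor ID -> instructor name/office (inferred, as another answer notes); attribute names follow the abbreviations used above.

```python
ALL = frozenset({"si", "sn", "sa", "sm", "ci", "ct", "ii", "in", "io", "scg"})

# Assumed FDs as (determinant, dependent attributes) pairs.
FDS = [
    (frozenset({"si"}), frozenset({"sn", "sa", "sm"})),  # one name/address/major per student
    (frozenset({"ci"}), frozenset({"ct", "ii"})),        # one title, one instructor per course
    (frozenset({"ii"}), frozenset({"in", "io"})),        # assumed: instructor ID -> name, office
    (frozenset({"si", "ci"}), frozenset({"scg"})),       # one grade per student per course
]


def closure(attrs, fds):
    """Smallest superset of `attrs` closed under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_superkey(attrs):
    """All other columns are functionally dependent on `attrs`."""
    return closure(attrs, FDS) == ALL


print(sorted(closure({"ci"}, FDS)))     # ['ci', 'ct', 'ii', 'in', 'io']
print(is_superkey({"si", "ci"}))        # True
print(is_superkey({"si"}))              # False
print(is_superkey({"si", "ci", "sn"}))  # True: supersets of a superkey stay superkeys
```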
Only after you have determined the FDs can you determine the CKs (candidate keys)! A CK is a superkey that contains no smaller superkey. (That a CK and/or superkey is present is also a constraint.) We can pick a CK as PK (primary key). PKs have no other role in relational theory.
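Finding the CKs from the FDs can be sketched as a brute-force search for minimal superkeys, checking smaller column sets first. This is fine for a 10-column table but exponential in general; the FDs are the same assumptions as in the closure sketch (business rules plus instructor ID -> name/office).

```python
from itertools import combinations

ALL = frozenset({"si", "sn", "sa", "sm", "ci", "ct", "ii", "in", "io", "scg"})
FDS = [  # assumed FDs, as in the closure sketch
    (frozenset({"si"}), frozenset({"sn", "sa", "sm"})),
    (frozenset({"ci"}), frozenset({"ct", "ii"})),
    (frozenset({"ii"}), frozenset({"in", "io"})),
    (frozenset({"si", "ci"}), frozenset({"scg"})),
]


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """All superkeys that contain no smaller superkey, smallest first."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller superkey, so it is not minimal
            if closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


print([sorted(k) for k in candidate_keys(ALL, FDS)])  # [['ci', 'si']]
```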
A partial dependency relies on either one of the attributes from the primary key.
Don't use "involve" or "relies on" to give a definition. Say, "when" or "iff" ("if and only if").
Read a definition. A FD that holds is partial when/iff using a proper subset of the determinant gives a FD that holds with the same determined column; otherwise it is full. Note that this does not involve CKs. A relation is in 2NF when all non-prime attributes are fully functionally dependent on every CK.
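That definition can be sketched in Python: an FD lhs -> attr that holds is partial when some proper subset of lhs already determines attr. The FDs are the same assumptions as above, and with a single CK, {si, ci}, the non-prime attributes are simply the remaining columns.

```python
from itertools import combinations

ALL = frozenset({"si", "sn", "sa", "sm", "ci", "ct", "ii", "in", "io", "scg"})
FDS = [  # assumed FDs, as in the earlier sketches
    (frozenset({"si"}), frozenset({"sn", "sa", "sm"})),
    (frozenset({"ci"}), frozenset({"ct", "ii"})),
    (frozenset({"ii"}), frozenset({"in", "io"})),
    (frozenset({"si", "ci"}), frozenset({"scg"})),
]


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_partial(lhs, attr, fds):
    """Assuming lhs -> attr holds: partial iff a proper subset of lhs determines attr."""
    return any(attr in closure(sub, fds)
               for size in range(len(lhs))          # sizes 0 .. len(lhs) - 1
               for sub in combinations(sorted(lhs), size))


CK = frozenset({"si", "ci"})
non_prime = ALL - CK  # only one CK, so everything else is non-prime
partial = sorted(a for a in non_prime if is_partial(CK, a, FDS))
print(partial)                 # ['ct', 'ii', 'in', 'io', 'sa', 'sm', 'sn']
print("in 2NF:", not partial)  # in 2NF: False
```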
A transitive dependency involves two or more non-key attributes in a functional dependence where one of the non-key attributes is dependent on a key attribute (from my PK).
Read a definition. S -> T is transitive when/iff there is an X where S -> X and X -> T and not (X -> S) and not (X = T). Note that this does not involve CKs. A relation is in 3NF when all non-prime attributes are non-transitively dependent on every CK.
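For one given candidate X, the definition above translates directly into a check. This Python sketch only tests the X you supply (finding whether some such X exists is a separate search), and it uses the course-level FDs assumed earlier: course ID -> title, instructor ID, and instructor ID -> name, office.

```python
# Assumed course-level FDs.
FDS = [
    (frozenset({"ci"}), frozenset({"ct", "ii"})),
    (frozenset({"ii"}), frozenset({"in", "io"})),
]


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def determines(s, t, fds):
    return set(t) <= closure(s, fds)


def transitive_via(s, x, t, fds):
    """S -> X and X -> T and not (X -> S) and not (X = T)."""
    s, x, t = set(s), set(x), set(t)
    return (determines(s, x, fds) and determines(x, t, fds)
            and not determines(x, s, fds) and x != t)


via_ii_name = transitive_via({"ci"}, {"ii"}, {"in"}, FDS)
via_ii_title = transitive_via({"ci"}, {"ii"}, {"ct"}, FDS)
print(via_ii_name)   # True
print(via_ii_title)  # False: ii does not determine ct
```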
"1NF" has no single meaning.
I am inferring a functional dependency that was not listed in your business rules. Namely that instructor ID determines instructor name.
If this is true, and if you have both instructor ID and instructor name in the Course table, then this is not in 3NF, because there is a transitive dependency between Course ID, Instructor ID, and Instructor Name.
Why is this harmful? Because duplicating the instructor name in each course an instructor teaches makes updating an instructor name difficult, and possible to do in an inconsistent manner. Inconsistent instructor name is just another bug you have to watch out for, and 3NF obviates the problem. The same argument could be made for Instructor office.
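The inconsistency can be reproduced in a few lines with Python's sqlite3 module (hypothetical data; `iname` stands for the instructor name column, since IN is an SQL keyword). A partial update leaves one instructor ID with two names, while in the decomposed design the name lives in a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Course table that still carries the instructor's name (not in 3NF).
    CREATE TABLE course (ci TEXT PRIMARY KEY, ct TEXT, ii INTEGER, iname TEXT);
    INSERT INTO course VALUES ('CS101', 'Databases',  10, 'Smith'),
                              ('CS102', 'Algorithms', 10, 'Smith');
""")

# A careless update fixes the name on only one of the instructor's courses.
conn.execute("UPDATE course SET iname = 'Smyth' WHERE ci = 'CS101'")
names = conn.execute(
    "SELECT COUNT(DISTINCT iname) FROM course WHERE ii = 10").fetchone()[0]
print(names)  # 2: one instructor ID, two different names

# 3NF design: the name is stored once, so it cannot disagree with itself.
conn.executescript("""
    CREATE TABLE instructor (ii INTEGER PRIMARY KEY, iname TEXT);
    CREATE TABLE course3nf (ci TEXT PRIMARY KEY, ct TEXT,
                            ii INTEGER REFERENCES instructor(ii));
    INSERT INTO instructor VALUES (10, 'Smith');
    INSERT INTO course3nf VALUES ('CS101', 'Databases', 10), ('CS102', 'Algorithms', 10);
    UPDATE instructor SET iname = 'Smyth' WHERE ii = 10;
""")
names_3nf = conn.execute(
    "SELECT COUNT(DISTINCT i.iname) FROM course3nf AS c "
    "JOIN instructor AS i ON i.ii = c.ii WHERE c.ii = 10").fetchone()[0]
print(names_3nf)  # 1
```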

Table needs more than one identifier

I am in no way a SQL expert so I am sure I did something wrong. I have read a few questions on here about needing a primary key. The way I created this table I can't find a way to actually have a unique key. It is a survey type database. I have a table for the main details like date, triage number, and the person involved. Another table for the questions results and another for the comments. I would have made the triage unique but more than one person can be involved so the same triage number would be used more than once. The people involved can appear more than once as well. The only truly unique thing is combining the person with the triage. I thought about an auto key but it would serve no purpose. Can using two identifiers be an acceptable practice for a survey type table?
The important part:
"... more than one person can be involved so the same triage number would be used more than once. The people involved can appear more than once as well."
Based on your comments, data in these two fields, for example:
Triage  Person
------  -------
1       PersonA
1       PersonB
...
7       PersonA
7       PersonB
is fine in that Triage and Person can make a composite key, provided each person recorded in the Person field is uniquely identifiable. That is, if each Person value is a name like "John Smith", you may have a problem if there are two or more John Smiths answering the survey. So your Person value itself has to identify people uniquely. Assuming the triage numbers are distinct (i.e., no triage number represents more than one semantically relevant triage position), these two fields as the composite key will work for you if and only if your survey never needs to record the same triage-person combination more than once.
The foreign key for each of your other tables ought to be the main table's composite key, but if the other two tables can be merged into the main one, consider merging them to reduce the join burden. E.g., if the comments table stores only comments in a single field and nothing more, why not include that field in the main table and get rid of the comments table?
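A minimal sqlite3 sketch of that layout: (triage, person) as the composite primary key, and a comments table whose foreign key uses both columns. Table names, the survey_date column and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE survey (
        triage      INTEGER NOT NULL,
        person      TEXT    NOT NULL,  -- must identify a person uniquely
        survey_date TEXT,
        PRIMARY KEY (triage, person)
    );
    -- Child table refers to both key columns.
    CREATE TABLE survey_comment (
        triage INTEGER NOT NULL,
        person TEXT    NOT NULL,
        body   TEXT    NOT NULL,
        FOREIGN KEY (triage, person) REFERENCES survey (triage, person)
    );
    -- Hypothetical sample data.
    INSERT INTO survey VALUES (1, 'PersonA', '2015-06-01'), (1, 'PersonB', '2015-06-01'),
                              (7, 'PersonA', '2015-06-08'), (7, 'PersonB', '2015-06-08');
    INSERT INTO survey_comment VALUES (1, 'PersonA', 'Quick response');
""")


def rejected(sql):
    """True if SQLite refuses the statement with a constraint error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


dup = rejected("INSERT INTO survey VALUES (1, 'PersonA', '2015-06-02')")
orphan = rejected("INSERT INTO survey_comment VALUES (2, 'PersonA', 'No such survey')")
print(dup, orphan)  # True True
```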
Your question is quite general and I don't have enough information to give you a definite answer but hopefully my comments below can help.
It is not a problem to use a composite primary key (key consisting of 2 or more columns). It is more often used in linking tables, e.g. in many-to-many relationships.
One thing that you should consider is that if you want to also refer to a table with a composite primary key from other tables, you will have to refer to 2 columns in the foreign key, all the joins, etc. It may be easier to create a separate column for a primary key (e.g. autoincrementing number).
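A sketch of the surrogate-key alternative in Python's sqlite3, with hypothetical table names and values. The UNIQUE (triage, person) constraint is an addition not mentioned above: without it, the autoincrementing key alone would allow duplicate triage/person pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE survey (
        id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- single-column surrogate key
        triage INTEGER NOT NULL,
        person TEXT    NOT NULL,
        UNIQUE (triage, person)                    -- still enforce the natural key
    );
    CREATE TABLE survey_comment (
        survey_id INTEGER NOT NULL REFERENCES survey(id),  -- one-column FK
        body      TEXT    NOT NULL
    );
    INSERT INTO survey (triage, person) VALUES (1, 'PersonA'), (1, 'PersonB');
""")

survey_id = conn.execute(
    "SELECT id FROM survey WHERE triage = 1 AND person = 'PersonB'").fetchone()[0]
conn.execute("INSERT INTO survey_comment VALUES (?, 'Follow up')", (survey_id,))

# Joins now need only one column.
rows = conn.execute("""
    SELECT s.triage, s.person, c.body
    FROM survey_comment AS c JOIN survey AS s ON s.id = c.survey_id
""").fetchall()
print(rows)  # [(1, 'PersonB', 'Follow up')]

try:
    conn.execute("INSERT INTO survey (triage, person) VALUES (1, 'PersonA')")
    dup_rejected = False
except sqlite3.IntegrityError:
    dup_rejected = True
print(dup_rejected)  # True
```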