ERD Mapping - Relational Database - sql

With in my ERD I have a disjoint mandatory relationship. See the following example.
** Assume "Teen" and "Adult" entities have their own unique attributes.
How to map this kind of a relation ship. And how to implement this with in tables.
[As I know you have to create two tables called "Teen" and "Adult". Both have the same primary key "st_id". But the problem is, in both tables the "st_id" has to be unique since the relationship is mandatory. How to implement that?]
Example:
**Teen**
st_id | name
001 AA
002 BB
**Adult**
st_id | name
(as the st_id you can't have the value 001 again since it is already in the Teen table)

What is stopping you from having three tables:
STUDENT [PK: st_id] (optionally also an attribute of type BOOLEAN or similar, "is_teen" or "is_adult")
TEEN [PK: st_id]
ADULT [PK: st_id]
We will add one row into STUDENT for every new student; each time you add a student, then create a TEEN row if the new student is a teen, otherwise create a new ADULT row.
In either case, the st_id to use will be that of the new student. The st_ids in TEEN and STUDENT will be mutually exclusive, but all st_ids will point back to the appropriate STUDENT row.
Your ERD snippet appears to be for a Logical Data Model; I have copied that into the physical implementation directly. In the real world, often we would "roll up" (denormalize) the TEEN and ADULT attributes into the STUDENT entity to limit the amount of joining needed to get to all the info. available for a given student.
Optionally, you may choose to define a pair of foreign key relationships between STUDENT and TEEN and between STUDENT and ADULT.

Why not use 1 table for teen and adult like this
person
------
id int
name varchar
is_adult bit
study
-------
id int
person_id int
...

Related

Design a table to store different values for single entity?

I am designing a database where I have a doubt. My requirement is to store the subject score for each student. I can achieve in two ways like below.
student_id and each subject as column and store one record for each student.
student_id,subject_name,score as columns and store one record per subject.
I need help in understanding the pros and cons of each implementation type.
Or table for Students:
StudentID - primary key
StudentName
etc.
and one for Subjects:
SubjectID - primary key
SubjectName
etc.
and one for Scores:
SubjectID
StudentID
Score
etc. (might be you want date here)
PrimaryKey (SubjectID, StudentID, SemesterID?)
Think about the last table - it will combine student and subject details given a score for each entity but you may need to add some date here, or exam ID or something else as one student may have score for same subject during the years (for example on math).
I think you're going to need 3 tables:
students
subjects
scores
students and subjects are master tables. They're going to provide student_id and subject_id for scores table as composite key. Here's the relational database design.
The point of seperating them into 3 tables is to avoid data anomalies. There are 3 anomalies:
Insert anomaly
Delete anomaly
Update anomaly
To learn more about data anomalies: https://www.geeksforgeeks.org/anomalies-in-relational-model/

Why is this table not normalized?

I am taking a database course and I am studying table normalization.
Could anyone explain to me, why the second table in the first row on the right not normalized?
It is not normalized because
For a student who has signed for more than one course, the entries in the table will be:
23 Jake Smith CS101 B+
23 Jake Smith B102 C+
Clearly the data is being repeated(redundant data). It is leading to anomalies(insert, update, delete anomalies).
Ex:When you have to change the name of a Student say Jake Smith, you have to modify all of the rows,this is called an update anomalie.
Normalization is used to avoid these kind of anomalies and redundant data storage.
The table on the right hand side in the second row handles this situation in a better way, as it stores id, name and DOB in a separate table, the edits can be made easily using id attribute on a single row.
There are several normal forms like 1NF, 2NF, 3NF etc. Each normal form has some constraints associated with it. Each Higher form being stricter than the previous one.
I suppose it is table for students grades. It is not normalized because it contains students names directly, instead of references to students records.
It's better not to include student_name into this table, but store all students data in separate students table and reference it by student_id foreign key (something like first table in second row except the ids.).
It's not normalised because neither id nor student_name is the key (both have duplicates) so the key must be one of those (probably id) together with the course code. The other one (name) then doesn't depend on that key, but just on id.
The simple rule for 3NF is that every non-key column must depend on "the key, the whole key, and nothing but the key" - to which we all solemnly intone "so help me Codd"!
The higher normal forms deal with dependencies inside the parts of a key.
Because in your first right table you have twice values
23 - j.smith
that is repeated and do not adhere to Codd 1 normal form

More joins or more columns?

I have a very basic question, which would be a more efficient design, something that involves more joins, or just adding columns to one larger table?
For instance, if we had a table that stored relatives like below:
Person | Father | Mother | Cousing | Etc.
________________________________________________
Would it be better to list the name, age, etc. directly in that table.. or better to have a person table with their name, age, etc., and linked by person_id or something?
This may be a little simplistic of an example, since there are more than just those two options. But for the sake of illustration, assume that the relationships cannot be stored in the person table.
I'm doing the latter of the two choices above currently, but I'm curious if this will get to a point where the performance will suffer, either when the person table gets large enough or when there are enough linked columns in the relations table.
Id' go for more "Normality" to increase flexibility and reduce data duplication.
PERSON:
ID
First Name
Last Name
Person_Relations
PersonID
RelationID
TypeID
Relation_Type
TypeID
Description
This way you could support any relationship (4th cousin mothers side once removed) without change code.
It is a much more flexible design to separate out the details of each person from the table relating them together. Typically, this will lead to less data consumption.
You could even go one step further and have three tables: one for people, one for relationship_types, and one for relationships.
People would have all the individual identifying info -- age, name, etc.
Relationship_types would have a key, a label, and potentially a description. This table is for elaborating the details of each possible relationship. So you would have a row for 'parent', a row for 'child', a row for 'sibling', etc.
Then the Relationships table has a four fields: one for the key of each person in the relationship, one for the key of the relationship_type, and one for its own key. Note that you need to be explicit in how you name the person columns to make it clear which party is which part of the relationship (i.e. saying that A and B have a 'parent' relationship only makes sense if you indicate which person is the parent vs which has the parent).
Depending on how you plan to use the data a better structure may be
a table for Person ( id , name etc )
a table for relationships (person_a_id, person_b_id, relation_type
etc)
where person_a_id and person_b_id relate to id in person
sample data may look like
Person
ID Name
1 Frank
2 Suzy
3 Emma
Relationship
A B Relationship
1 2 Wife
2 1 Husband
1 3 Daughter
2 3 Daughter
3 1 Father
3 2 Mother

Question on Database Modeling

Tables:
Students
Professors
Entries (there’s no physical table intry in the database for entries yet, this table is on the front-end, so it is probably composed from multiple helper tables if we need them. Just need to create valid erd)
Preambula:
One student can have an association to many professors
One professor can have an association to many students
One entry can have 0,1 or more Students or professors in it.
Professor is required to be associated with one or more students
Student is not required to have an association with any professor
It should be more like this (front-end entry table):
Any professor in this table must have an associated name in the table.( For example Wandy is associated to Alex)
It is not required for student (but possible) to have associated professors in this table
One row (for example Linda (Student), Kelly (Professor),Victor (Professor))
Cannot be associated between each other in any manner.
But it is absolutely fine if Linda associated with David.
The problem is that I do not quite understand how one column can have ids of different tables (And those are multiple!) And do not quite understand how to build valid erd for that.
I will answer any additional questions you need. Thanks a lot!
If you simply want an association between Students and Professors - just make a many-to-many relationship in ERD. In logical (relational) schema it will make an intermediate table with foreign keys to Student and Professor tables.
But from your example it looks like you need to design the DB for your "PeopleEntries", which is not straightforward. ERD seems to have the following entities:
Students(ID, name)
Professors(ID,
name)
PeopleEntries(ID, LoveCats,
LoveDogs, LoveAnts)
Relationships (considering people cannot appear in entries more than once):
Students Many - 1 PeopleEntries
Professors Many - 1 PeopleEntries
Students Many - Many Professors
Relational schema would contain tables (foreign keys according to erd relationships):
Students(ID, name, PeopleEntryID FK)
Professors(ID, name, PeopleEntryID
FK)
PeopleEntries(ID, LoveCats, LoveDogs,
LoveAnts)
StudentProfessor(StudentID FK,
ProfessorID FK)
I don't know how to implement the constraint, disallowing association between people from the same entry, on conceptual level (ER-diagram). On physical level you can implement the logic in triggers or update procedures to check this.
As per my quick understanding,
Create a table with following columns
PersonName
Designation
.....
Create one more table
PersonName
LinksTo
In the second table each person entry will have multiple records based on the relation
You want a junction table:
ID StudentID ProfessorID
0 23 34
1 22 34
2 12 33
3 12 34
In the table above, one professor has 3 students, one student has two professors.
StudentID and ProfessorID should together be a unique index to avoid duplicate relationships.

How to enforce DB integrity with non-unique foreign keys?

I want to have a database table that keeps data with revision history (like pages on Wikipedia). I thought that a good idea would be to have two columns that identify the row: (name, version). So a sample table would look like this:
TABLE PERSONS:
id: int,
name: varchar(30),
version: int,
... // some data assigned to that person.
So if users want to update person's data, they don't make an UPDATE -- instead, they create a new PERSONS row with the same name but different version value. Data shown to the user (for given name) is the one with highest version.
I have a second table, say, DOGS, that references persons in PERSONS table:
TABLE DOGS:
id: int,
name: varchar(30),
owner_name: varchar(30),
...
Obviously, owner_name is a reference to PERSONS.name, but I cannot declare it as a Foreign Key (in MS SQL Server), because PERSONS.name is not unique!
Question: How, then, in MS SQL Server 2008, should I ensure database integrity (i.e., that for each DOG, there exists at least one row in PERSONS such that its PERSON.name == DOG.owner_name)?
I'm looking for the most elegant solution -- I know I could use triggers on PERSONS table, but this is not as declarative and elegant as I want it to be. Any ideas?
Additional Information
The design above has the following advantage that if I need to, I can "remember" a person's current id (or (name, version) pair) and I'm sure that data in that row will never be changed. This is important e.g. if I put this person's data as part of a document that is then printed and in 5 years someone might want to print a copy of it exactly unchanged (e.g. with the same data as today), then this will be very easy for them to do.
Maybe you can think of a completely different design that achieves the same purpose and its integrity can be enforced easier (preferably with foreign keys or other constraints)?
Edit: Thanks to Michael Gattuso's answer, I discovered another way this relationship can be described. There are two solutions, which I posted as answers. Please vote which one you like better.
In your parent table, create a unique constraint on (id, version). Add version column to your child table, and use a check constraint to make sure that it is always 0. Use a FK constraint to map (parentid, version) to your parent table.
Alternatively you could maintain a person history table for the data that has historic value. This way you keep your Persons and Dogs table tidy and the references simple but also have access to the historically interesting information.
Okay, first thing is that you need to normalize your tables. Google "database normalization" and you'll come up with plenty of reading. The PERSONS table, in particular, needs attention.
Second thing is that when you're creating foreign key references, 99.999% of the time you want to reference an ID (numeric) value. I.e., [DOGS].[owner] should be a reference to [PERSONS].[id].
Edit: Adding an example schema (forgive the loose syntax). I'm assuming each dog has only a single owner. This is one way to implement Person history. All columns are not-null.
Persons Table:
int Id
varchar(30) name
...
PersonHistory Table:
int Id
int PersonId (foreign key to Persons.Id)
int Version (auto-increment)
varchar(30) name
...
Dogs Table:
int Id
int OwnerId (foreign key to Persons.Id)
varchar(30) name
...
The latest version of the data would be stored in the Persons table directly, with older data stored in the PersonHistory table.
I would use and association table to link the many versions to the one pk.
A project I have worked on addressed a similar problem. It was a biological records database where species names can change over time as new research improved understanding of taxonomy.
However old records needed to remain related to the original species names. It got complicated but the basic solution was to have a NAME table that just contained all unique species names, a species table that represented actual species and a NAME_VERSION table that linked the two together. At any one time there would be a preferred name (ie the currently accepted scientific name for the species) which was a boolean field held in name_version.
In your example this would translate to a Details table (detailsid, otherdetails columns) a link table called DetailsVersion (detailsid, personid) and a Person Table (personid, non-changing data). Relate dogs to Person.
Persons
id (int),
name,
.....
activeVersion (this will be UID from personVersionInfo)
note: Above table will have 1 row for each person. will have original info with which person was created.
PersonVersionInfo
UID (unique identifier to identify person + version),
id (int),
name,
.....
versionId (this will be generated for each person)
Dogs
DogID,
DogName
......
PersonsWithDogs
UID,
DogID
EDIT: You will have to join PersonWithDogs, PersionVersionInfo, Dogs to get the full picture (as of today). This kind of structure will help you link a Dog to the Owner (with a specific version).
In case the Person's info changes and you wish to have latest info associated with the Dog, you will have to Update PersonWithDogs table to have the required UID (of the person) for the given Dog.
You can have restrictions such as DogID should be unique in PersonWithDogs.
And in this structure, a UID (person) can have many Dogs.
Your scenarios (what can change/restrictions etc) will help in designing the schema better.
Thanks to Michael Gattuso's answer, I discovered another way this relationship can be described. There are two solutions, this is the first of them. Please vote which one you like better.
Solution 1
In PERSONS table, we leave only the name (unique identifier) and a link to current person's data:
TABLE PERSONS:
name: varchar(30),
current_data_id: int
We create a new table, PERSONS_DATA, that contains all data history for that person:
TABLE PERSONS_DATA:
id: int
version: int (auto-generated)
... // some data, like address, etc.
DOGS table stays the same, it still points to a person's name (FK to PERSONS table).
ADVANTAGE: for each dog, there exists at least one PERSONS_DATA row that contains data of its owner (that's what I wanted)
DISADVANTAGE: if you want to change a person's data, you have to:
add a new PERSONS_DATA row
update PERSONS entry for this person to point to the new PERSONS_DATA row.
Thanks to Michael Gattuso's answer, I discovered another way this relationship can be described. There are two solutions, this is the second of them. Please vote which one you like better.
Solution 2
In PERSONS table, we leave only the name (unique identifier) and a link to the first (not current!) person's data:
TABLE PERSONS:
name: varchar(30),
first_data_id: int
We create a new table, PERSONS_DATA, that contains all data history for that person:
TABLE PERSONS_DATA:
id: int
name: varchar(30)
version: int (auto-generated)
... // some data, like address, etc.
DOGS table stays the same, it still points to a person's name (FK to PERSONS table).
ADVANTAGES:
for each dog, there exists at least one PERSONS_DATA row that contains data of its owner (that's what I wanted)
if I want to change a person's data, I don't have to update the PERSONS row, only add a new PERSONS_DATA row
DISADVANTAGE: to retrieve current person's data, I have to either:
choose PERSONS_DATA with given name and highest version (may be expensive)
choose PERSONS_DATA with special version, e.g. "-1", but then I would have to update two PERSONS_DATA rows each time I add new PERSONS_DATA, and in this solution I wanted to avoid having to update 2 rows...
What do you think?