SQL database structure with two changing properties

Let's assume I am building the backend of a university management software.
I have a users table with the following columns:
id
name
birthday
last_english_grade
last_it_grade
profs table columns:
id
name
birthday
I'd like to have a third table from which I can determine all professors teaching a student, so that I can assign multiple teachers to each student.
Those professors may change at any time.
New students may be added at any time too.
What's the best way to achieve this?

The canonical way to do this would be to introduce a third junction table, which exists mainly to relate users to professors:
CREATE TABLE users_profs (
    user_id INT NOT NULL REFERENCES users (id),
    prof_id INT NOT NULL REFERENCES profs (id),
    PRIMARY KEY (user_id, prof_id)
);
The primary key of this junction table is the combination of a user and professor ID. Note that this table is fairly lean, and avoids the problem of repeating metadata for a given user or professor. Rather, user/professor information remains in your two original tables, and does not get repeated.
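A minimal sketch of the junction table in use. It relies on Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for whatever DBMS you use, and the sample names and IDs are made up. It shows that assigning, changing, and listing a student's professors only ever touches users_profs.

```python
import sqlite3

# In-memory stand-in database; sample names and IDs are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE profs (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per (student, professor) pair; the composite PK blocks duplicates.
CREATE TABLE users_profs (
    user_id INTEGER NOT NULL REFERENCES users (id),
    prof_id INTEGER NOT NULL REFERENCES profs (id),
    PRIMARY KEY (user_id, prof_id)
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO profs VALUES (?, ?)",
                 [(10, "Smith"), (11, "Jones"), (12, "Lee")])

# Assigning professors is just inserting pairs.
conn.executemany("INSERT INTO users_profs VALUES (?, ?)", [(1, 10), (1, 11), (2, 11)])

# Changing a professor is a delete plus an insert; users and profs are untouched.
conn.execute("DELETE FROM users_profs WHERE user_id = 1 AND prof_id = 10")
conn.execute("INSERT INTO users_profs VALUES (1, 12)")


def profs_of(user_id):
    """Names of all professors currently assigned to a student."""
    rows = conn.execute(
        """SELECT p.name
           FROM profs p
           JOIN users_profs up ON up.prof_id = p.id
           WHERE up.user_id = ?
           ORDER BY p.name""",
        (user_id,),
    )
    return [name for (name,) in rows]


print(profs_of(1))  # ['Jones', 'Lee']
```

Adding a new student is an ordinary insert into users; the junction table only grows when professors are assigned.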

Related

sql many to many relationship table design

I am learning SQL and practicing table design with different scenarios. I have one scenario for which I could not find a suitable table structure.
The scenario is as follows: I want to store the dependencies of user journeys in SQL. For example, in the customer creation journey, valid sector, language and country codes must already exist in the system. Similarly, to create a new (bank) account, we need to create the sector, language and country, followed by the customer.
So far I have come up with the following table design, but I am sure it is not good: it has no primary key and does not follow normalization standards.
journey  | dependent | order
CUSTOMER | SECTOR    | 0
CUSTOMER | LANGUAGE  | 1
CUSTOMER | COUNTRY   | 2
ACCOUNT  | CUSTOMER  | 0
I understand that this is a many-to-many relationship, as one journey can have many dependents and one dependent can be associated with multiple journeys. How can I design these tables efficiently in SQL?
You need an intermediate (join) table that looks like this:
Table name: journey_dependent
Columns: journey_id (FK to journey), dependent_id (FK to dependent)
You can check more here - https://www.baeldung.com/jpa-many-to-many#1-modeling-a-many-to-many-relationship
If journey and dependent are the primary keys of their original tables, this table has two foreign keys, and you can create a composite primary key on those two columns.
The order column may belong in the dependent table. If it does not, the join table carries extra information (order), so it is no longer a pure relationship table, and you could optionally add a technical primary key column (auto-increment) to it.
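A sketch of the join table with the order kept on it, using SQLite through Python's sqlite3 as a stand-in. Two assumptions are not in the question: journeys and dependents live in a single table (CUSTOMER appears on both sides of the sample data), and the order column is renamed step_order because ORDER is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Assumption: one table holds every code, since CUSTOMER is both a journey and a dependent.
CREATE TABLE journey (id TEXT PRIMARY KEY);
CREATE TABLE journey_dependent (
    journey_id   TEXT    NOT NULL REFERENCES journey (id),
    dependent_id TEXT    NOT NULL REFERENCES journey (id),
    step_order   INTEGER NOT NULL,          -- "order" is a reserved word
    PRIMARY KEY (journey_id, dependent_id), -- composite PK from the two FKs
    UNIQUE (journey_id, step_order)         -- no two steps share a position
);
""")
conn.executemany("INSERT INTO journey VALUES (?)",
                 [(c,) for c in ("CUSTOMER", "SECTOR", "LANGUAGE", "COUNTRY", "ACCOUNT")])
conn.executemany("INSERT INTO journey_dependent VALUES (?, ?, ?)", [
    ("CUSTOMER", "SECTOR", 0),
    ("CUSTOMER", "LANGUAGE", 1),
    ("CUSTOMER", "COUNTRY", 2),
    ("ACCOUNT", "CUSTOMER", 0),
])


def dependencies(journey_id):
    """Direct dependencies of a journey, in the order they must be created."""
    rows = conn.execute(
        "SELECT dependent_id FROM journey_dependent "
        "WHERE journey_id = ? ORDER BY step_order",
        (journey_id,),
    )
    return [dep for (dep,) in rows]
```

Note that dependencies("ACCOUNT") returns only the direct dependency CUSTOMER; walking the whole chain down to sector, language and country would take a recursive query, such as a recursive common table expression.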

Generate connections table in SQL

I looked for this and did not find a solution that would apply to my scenario.
I'm building a database of game devs and I wish to generate a connections table:
I have the following:
Employee (name, date of birth, department they work at, task they do)
Department (department name)
Task (task name)
and I need to generate a connections table that shows which department contributes to which task. I would do that by checking each employee's department (only one) and task (also only one); on a match, the department contributes to that task.
That is the idea, but I have no clue how to code it in Oracle.
SELECT DISTINCT "department they work at", "task they do"
FROM Employee;
You should first work out an entity-relationship diagram that lists the entities you use, their attributes (and primary keys), and the relationships between those entities. Relationships can be one-to-one, one-to-many (or many-to-one), or many-to-many.
In the last case (an M:N relationship), the database implementation requires an extra table to record the relationship.
A 1:N relationship is implemented by adding a foreign key in the child table that references the primary key of the parent table.
EDIT: I see that you have now supplied some details, and it is clear that the EMPLOYEE table is in fact the connection table. You can simply query that table and show DEPTID and TASKID (the primary keys of their respective tables) to get the connections between departments and tasks. See the query in the other answer, and just add an ORDER BY on DEPTID to show the results in DEPTID order.
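A sketch of treating EMPLOYEE as the connection table, run in SQLite through Python's sqlite3 (the question targets Oracle, but the query is plain SQL). DEPTID and TASKID follow the answer; the empid column, the sample rows, and the omission of date of birth are assumptions for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (deptid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE task       (taskid INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Each employee has exactly one department and one task: two 1:N foreign keys.
CREATE TABLE employee (
    empid  INTEGER PRIMARY KEY,
    name   TEXT    NOT NULL,
    deptid INTEGER NOT NULL REFERENCES department (deptid),
    taskid INTEGER NOT NULL REFERENCES task (taskid)
);
INSERT INTO department VALUES (1, 'Art'), (2, 'Engineering');
INSERT INTO task VALUES (1, 'Level design'), (2, 'Rendering');
INSERT INTO employee VALUES
    (1, 'Ana', 1, 1), (2, 'Ben', 2, 2), (3, 'Cai', 2, 1), (4, 'Dee', 2, 2);
""")

# DISTINCT collapses employees who share a department/task pair (Ben and Dee).
connections = conn.execute(
    "SELECT DISTINCT deptid, taskid FROM employee ORDER BY deptid, taskid"
).fetchall()
print(connections)  # [(1, 1), (2, 1), (2, 2)]
```

If you want the connections to behave like a table, the same SELECT could be saved as a view (CREATE VIEW ... AS SELECT ...), so it stays current as employees change.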

Query to retrieve data using two foreign keys

I'm working on a football statistics database, and in the table to store results of matches, I have two references to the primary key of a team table: one home, one away.
My intention is to create a query which returns the name of both of the teams, along with other details, but I can't think of a way to achieve this WITH the team names (my attempts so far can only produce one team name, with the other an ID number). I'll give the relation structure if this wasn't clear:
(Primary keys: team_id and match_id; foreign keys are marked with an asterisk.)
team(team_id, team_name, venue)
match(match_id, home_team*, away_team*, home_score, away_score, date)
My desired output would be a table with these columns:
home_team_name, home_team_score, away_team_score, away_team_name, date, venue
Is this possible with my tables, or should I change the way I store results?
When joining the team table to the match table in a query, you need to join the team table twice, using a different alias each time (one for the home team, one for the away team).
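A sketch of the double join in SQLite via Python's sqlite3, using the question's table and column names with made-up sample data. It assumes the venue to report is the home team's. The table name match is quoted because MATCH is a keyword in SQLite and in some other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team (team_id INTEGER PRIMARY KEY, team_name TEXT, venue TEXT);
CREATE TABLE "match" (
    match_id   INTEGER PRIMARY KEY,
    home_team  INTEGER REFERENCES team (team_id),
    away_team  INTEGER REFERENCES team (team_id),
    home_score INTEGER,
    away_score INTEGER,
    date       TEXT
);
INSERT INTO team VALUES (1, 'Rovers', 'North Park'), (2, 'United', 'South Ground');
INSERT INTO "match" VALUES (100, 1, 2, 3, 1, '2024-03-02');
""")

# team is joined twice, once per role, under the aliases home_t and away_t.
rows = conn.execute("""
    SELECT home_t.team_name AS home_team_name,
           m.home_score     AS home_team_score,
           m.away_score     AS away_team_score,
           away_t.team_name AS away_team_name,
           m.date,
           home_t.venue     -- assumption: report the home team's venue
    FROM "match" m
    JOIN team home_t ON home_t.team_id = m.home_team
    JOIN team away_t ON away_t.team_id = m.away_team
""").fetchall()
print(rows)  # [('Rovers', 3, 1, 'United', '2024-03-02', 'North Park')]
```

No schema change is needed: the two foreign keys are enough, and each alias resolves one of them.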

How to enforce DB integrity with non-unique foreign keys?

I want to have a database table that keeps data with revision history (like pages on Wikipedia). I thought that a good idea would be to have two columns that identify the row: (name, version). So a sample table would look like this:
TABLE PERSONS:
id: int,
name: varchar(30),
version: int,
... // some data assigned to that person.
So if users want to update a person's data, they don't run an UPDATE; instead, they create a new PERSONS row with the same name but a different version value. The data shown to the user (for a given name) is the row with the highest version.
I have a second table, say, DOGS, that references persons in PERSONS table:
TABLE DOGS:
id: int,
name: varchar(30),
owner_name: varchar(30),
...
Obviously, owner_name is a reference to PERSONS.name, but I cannot declare it as a Foreign Key (in MS SQL Server), because PERSONS.name is not unique!
Question: How, then, in MS SQL Server 2008, should I ensure database integrity (i.e., that for each DOG there exists at least one row in PERSONS such that PERSONS.name = DOGS.owner_name)?
I'm looking for the most elegant solution. I know I could use triggers on the PERSONS table, but that is not as declarative and elegant as I want it to be. Any ideas?
Additional Information
The design above has the advantage that, if I need to, I can "remember" a person's current id (or (name, version) pair) and be sure that the data in that row will never change. This is important, e.g., if I include this person's data in a document that is printed, and in 5 years someone wants to print an exact copy of it (with the same data as today): that will be very easy to do.
Maybe you can think of a completely different design that achieves the same purpose and its integrity can be enforced easier (preferably with foreign keys or other constraints)?
Edit: Thanks to Michael Gattuso's answer, I discovered another way this relationship can be described. There are two solutions, which I posted as answers. Please vote which one you like better.
In your parent table, create a unique constraint on (id, version). Add a version column to your child table, and use a check constraint to make sure that it is always 0. Then use a FK constraint to map the child's (parentid, version) to the parent table's (id, version).
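A sketch of that constraint setup in SQLite via Python's sqlite3. The column names are assumptions: person_id is the identifier shared by all versions of a person (the answer's "id"), and owner_id/owner_version are the dog's composite foreign key. It relies on every person having a version-0 row for dogs to reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE persons (
    row_id    INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL,   -- identity shared by all versions of a person
    version   INTEGER NOT NULL,
    name      TEXT    NOT NULL,
    UNIQUE (person_id, version)   -- makes (person_id, version) a valid FK target
);
CREATE TABLE dogs (
    dog_id        INTEGER PRIMARY KEY,
    name          TEXT    NOT NULL,
    owner_id      INTEGER NOT NULL,
    owner_version INTEGER NOT NULL DEFAULT 0 CHECK (owner_version = 0),
    FOREIGN KEY (owner_id, owner_version) REFERENCES persons (person_id, version)
);
INSERT INTO persons VALUES (1, 1, 0, 'Ann Smith'), (2, 1, 1, 'Ann Jones');
""")


def insert_dog(dog_id, name, owner_id, owner_version=0):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO dogs VALUES (?, ?, ?, ?)",
                         (dog_id, name, owner_id, owner_version))
        return True
    except sqlite3.IntegrityError:
        return False


ok_rex = insert_dog(1, "Rex", owner_id=1)                     # version-0 row exists
ok_fido = insert_dog(2, "Fido", owner_id=99)                  # no such person
ok_spot = insert_dog(3, "Spot", owner_id=1, owner_version=1)  # CHECK fails
```

The trade-off is that a person's version-0 row can never be deleted while a dog points at it, which is exactly the integrity guarantee the question asks for.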
Alternatively you could maintain a person history table for the data that has historic value. This way you keep your Persons and Dogs table tidy and the references simple but also have access to the historically interesting information.
Okay, first thing is that you need to normalize your tables. Google "database normalization" and you'll come up with plenty of reading. The PERSONS table, in particular, needs attention.
Second thing is that when you're creating foreign key references, 99.999% of the time you want to reference an ID (numeric) value. I.e., [DOGS].[owner] should be a reference to [PERSONS].[id].
Edit: Adding an example schema (forgive the loose syntax). I'm assuming each dog has only a single owner. This is one way to implement Person history. All columns are not-null.
Persons Table:
int Id
varchar(30) name
...
PersonHistory Table:
int Id
int PersonId (foreign key to Persons.Id)
int Version (auto-increment)
varchar(30) name
...
Dogs Table:
int Id
int OwnerId (foreign key to Persons.Id)
varchar(30) name
...
The latest version of the data would be stored in the Persons table directly, with older data stored in the PersonHistory table.
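A sketch of the update path for this design in SQLite via Python's sqlite3. The update_person_name helper is hypothetical, and Version is computed per person as one more than that person's highest archived version rather than with a database auto-increment. The archive and the overwrite run in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Persons (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE PersonHistory (
    Id       INTEGER PRIMARY KEY,
    PersonId INTEGER NOT NULL REFERENCES Persons (Id),
    Version  INTEGER NOT NULL,
    Name     TEXT    NOT NULL,
    UNIQUE (PersonId, Version)
);
CREATE TABLE Dogs (
    Id      INTEGER PRIMARY KEY,
    OwnerId INTEGER NOT NULL REFERENCES Persons (Id),  -- simple numeric FK
    Name    TEXT    NOT NULL
);
""")


def update_person_name(person_id, new_name):
    """Archive the current Persons row into PersonHistory, then overwrite it."""
    with conn:  # both statements commit together or not at all
        conn.execute("""
            INSERT INTO PersonHistory (PersonId, Version, Name)
            SELECT p.Id,
                   COALESCE((SELECT MAX(h.Version) FROM PersonHistory h
                             WHERE h.PersonId = p.Id), 0) + 1,
                   p.Name
            FROM Persons p WHERE p.Id = ?""", (person_id,))
        conn.execute("UPDATE Persons SET Name = ? WHERE Id = ?", (new_name, person_id))


conn.execute("INSERT INTO Persons VALUES (1, 'Ann Smith')")
conn.execute("INSERT INTO Dogs VALUES (1, 1, 'Rex')")
update_person_name(1, "Ann Jones")
update_person_name(1, "Ann Brown")
```

Dogs never needs to change: it keeps pointing at Persons.Id, while older states of the owner remain readable in PersonHistory.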
I would use an association table to link the many versions to the one PK.
A project I have worked on addressed a similar problem. It was a biological records database where species names can change over time as new research improved understanding of taxonomy.
However, old records needed to remain related to the original species names. It got complicated, but the basic solution was to have a NAME table containing all unique species names, a SPECIES table representing actual species, and a NAME_VERSION table linking the two. At any one time there would be a preferred name (i.e. the currently accepted scientific name for the species), indicated by a boolean field in NAME_VERSION.
In your example this would translate to a Details table (detailsid, other details columns), a link table called DetailsVersion (detailsid, personid), and a Person table (personid, non-changing data). Relate dogs to Person.
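A sketch of that translation in SQLite via Python's sqlite3, carrying the preferred-name flag from the species example over to DetailsVersion. The is_preferred column, the partial unique index that allows at most one preferred row per person (SQL Server 2008 calls these filtered indexes), and the helper functions are assumptions, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Person  (personid  INTEGER PRIMARY KEY, birthday TEXT);  -- non-changing data
CREATE TABLE Details (detailsid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE DetailsVersion (
    detailsid    INTEGER NOT NULL REFERENCES Details (detailsid),
    personid     INTEGER NOT NULL REFERENCES Person (personid),
    is_preferred INTEGER NOT NULL DEFAULT 0 CHECK (is_preferred IN (0, 1)),
    PRIMARY KEY (detailsid, personid)
);
-- At most one preferred Details row per person (partial / filtered unique index).
CREATE UNIQUE INDEX one_preferred_per_person
    ON DetailsVersion (personid) WHERE is_preferred = 1;
CREATE TABLE Dog (dogid INTEGER PRIMARY KEY, name TEXT,
                  personid INTEGER NOT NULL REFERENCES Person (personid));

INSERT INTO Person VALUES (1, '1990-05-01');
INSERT INTO Details VALUES (1, 'Ann Smith'), (2, 'Ann Jones');
INSERT INTO DetailsVersion VALUES (1, 1, 1), (2, 1, 0);
INSERT INTO Dog VALUES (1, 'Rex', 1);
""")


def set_preferred(personid, detailsid):
    """Switch a person's preferred Details row in one transaction."""
    with conn:
        conn.execute("UPDATE DetailsVersion SET is_preferred = 0 WHERE personid = ?",
                     (personid,))
        conn.execute("UPDATE DetailsVersion SET is_preferred = 1 "
                     "WHERE personid = ? AND detailsid = ?", (personid, detailsid))


def current_owner_name(dogid):
    """The owner's currently preferred name; old Details rows stay untouched."""
    row = conn.execute("""
        SELECT d.name
        FROM Dog g
        JOIN DetailsVersion dv ON dv.personid = g.personid AND dv.is_preferred = 1
        JOIN Details d ON d.detailsid = dv.detailsid
        WHERE g.dogid = ?""", (dogid,)).fetchone()
    return row[0] if row else None


before = current_owner_name(1)
set_preferred(1, 2)
after = current_owner_name(1)
```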
Persons
id (int),
name,
.....
activeVersion (this will be UID from personVersionInfo)
Note: the table above has one row per person, holding the original information with which the person was created.
PersonVersionInfo
UID (unique identifier to identify person + version),
id (int),
name,
.....
versionId (this will be generated for each person)
Dogs
DogID,
DogName
......
PersonsWithDogs
UID,
DogID
EDIT: You will have to join PersonsWithDogs, PersonVersionInfo, and Dogs to get the full picture (as of today). This structure lets you link a dog to its owner at a specific version.
If the person's info changes and you want the latest info associated with the dog, you will have to update the PersonsWithDogs table to hold the required UID (of the person) for that dog.
You can add restrictions, such as DogID being unique in PersonsWithDogs.
And in this structure, a UID (person) can have many Dogs.
Your scenarios (what can change/restrictions etc) will help in designing the schema better.
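A sketch of this schema in SQLite via Python's sqlite3, with hypothetical sample rows. The UNIQUE constraint on PersonsWithDogs.DogID is the restriction the answer suggests, and the UPDATE re-points one dog to its owner's active version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Persons (
    id            INTEGER PRIMARY KEY,
    name          TEXT,
    activeVersion INTEGER REFERENCES PersonVersionInfo (UID)
);
CREATE TABLE PersonVersionInfo (
    UID       INTEGER PRIMARY KEY,   -- identifies person + version
    id        INTEGER NOT NULL REFERENCES Persons (id),
    name      TEXT    NOT NULL,
    versionId INTEGER NOT NULL,
    UNIQUE (id, versionId)
);
CREATE TABLE Dogs (DogID INTEGER PRIMARY KEY, DogName TEXT NOT NULL);
CREATE TABLE PersonsWithDogs (
    UID   INTEGER NOT NULL REFERENCES PersonVersionInfo (UID),
    DogID INTEGER NOT NULL UNIQUE REFERENCES Dogs (DogID)  -- one owner per dog
);

INSERT INTO Persons VALUES (1, 'Ann Smith', NULL);
INSERT INTO PersonVersionInfo VALUES (100, 1, 'Ann Smith', 1), (101, 1, 'Ann Jones', 2);
UPDATE Persons SET activeVersion = 101 WHERE id = 1;
INSERT INTO Dogs VALUES (1, 'Rex'), (2, 'Fido');
INSERT INTO PersonsWithDogs VALUES (100, 1), (101, 2);
""")


def dog_owners():
    """Each dog with the owner version it is linked to (joins the three tables)."""
    return conn.execute("""
        SELECT d.DogName, v.name, v.versionId
        FROM PersonsWithDogs pd
        JOIN PersonVersionInfo v ON v.UID = pd.UID
        JOIN Dogs d ON d.DogID = pd.DogID
        ORDER BY d.DogID""").fetchall()


before = dog_owners()
# Re-point Rex (DogID 1) to the active version of person 1.
with conn:
    conn.execute("UPDATE PersonsWithDogs "
                 "SET UID = (SELECT activeVersion FROM Persons WHERE id = ?) "
                 "WHERE DogID = ?", (1, 1))
after = dog_owners()
```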
Thanks to Michael Gattuso's answer, I discovered another way this relationship can be described. There are two solutions, this is the first of them. Please vote which one you like better.
Solution 1
In PERSONS table, we leave only the name (unique identifier) and a link to current person's data:
TABLE PERSONS:
name: varchar(30),
current_data_id: int
We create a new table, PERSONS_DATA, that contains all data history for that person:
TABLE PERSONS_DATA:
id: int
version: int (auto-generated)
... // some data, like address, etc.
DOGS table stays the same, it still points to a person's name (FK to PERSONS table).
ADVANTAGE: for each dog, there exists at least one PERSONS_DATA row that contains data of its owner (that's what I wanted)
DISADVANTAGE: if you want to change a person's data, you have to:
add a new PERSONS_DATA row
update PERSONS entry for this person to point to the new PERSONS_DATA row.
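A sketch of Solution 1 in SQLite via Python's sqlite3. It adds a person_name column to PERSONS_DATA (an assumption, so that older rows can still be traced to their person), uses address to stand in for "some data", and computes version in code instead of relying on auto-generation. The two-step change runs in one transaction, so both steps commit together or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE PERSONS_DATA (
    id          INTEGER PRIMARY KEY,
    person_name TEXT    NOT NULL,  -- assumption: lets history rows be traced back
    version     INTEGER NOT NULL,
    address     TEXT               -- stands in for "some data"
);
CREATE TABLE PERSONS (
    name            TEXT PRIMARY KEY,
    current_data_id INTEGER NOT NULL REFERENCES PERSONS_DATA (id)
);
CREATE TABLE DOGS (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    owner_name TEXT NOT NULL REFERENCES PERSONS (name)  -- a real FK now
);
""")


def add_person(name, address):
    with conn:
        cur = conn.execute("INSERT INTO PERSONS_DATA (person_name, version, address) "
                           "VALUES (?, 1, ?)", (name, address))
        conn.execute("INSERT INTO PERSONS VALUES (?, ?)", (name, cur.lastrowid))


def change_person(name, address):
    """Add a new PERSONS_DATA row and re-point PERSONS to it, atomically."""
    with conn:
        (next_version,) = conn.execute(
            "SELECT MAX(version) + 1 FROM PERSONS_DATA WHERE person_name = ?",
            (name,)).fetchone()
        cur = conn.execute("INSERT INTO PERSONS_DATA (person_name, version, address) "
                           "VALUES (?, ?, ?)", (name, next_version, address))
        conn.execute("UPDATE PERSONS SET current_data_id = ? WHERE name = ?",
                     (cur.lastrowid, name))


def current_address(name):
    row = conn.execute("SELECT d.address FROM PERSONS p "
                       "JOIN PERSONS_DATA d ON d.id = p.current_data_id "
                       "WHERE p.name = ?", (name,)).fetchone()
    return row[0] if row else None


add_person("ann", "1 Old Road")
with conn:
    conn.execute("INSERT INTO DOGS VALUES (1, 'Rex', 'ann')")
change_person("ann", "2 New Street")
```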
Thanks to Michael Gattuso's answer, I discovered another way this relationship can be described. There are two solutions, this is the second of them. Please vote which one you like better.
Solution 2
In PERSONS table, we leave only the name (unique identifier) and a link to the first (not current!) person's data:
TABLE PERSONS:
name: varchar(30),
first_data_id: int
We create a new table, PERSONS_DATA, that contains all data history for that person:
TABLE PERSONS_DATA:
id: int
name: varchar(30)
version: int (auto-generated)
... // some data, like address, etc.
DOGS table stays the same, it still points to a person's name (FK to PERSONS table).
ADVANTAGES:
for each dog, there exists at least one PERSONS_DATA row that contains data of its owner (that's what I wanted)
if I want to change a person's data, I don't have to update the PERSONS row, only add a new PERSONS_DATA row
DISADVANTAGE: to retrieve current person's data, I have to either:
choose PERSONS_DATA with given name and highest version (may be expensive)
choose PERSONS_DATA with special version, e.g. "-1", but then I would have to update two PERSONS_DATA rows each time I add new PERSONS_DATA, and in this solution I wanted to avoid having to update 2 rows...
What do you think?
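A sketch of Solution 2 in SQLite via Python's sqlite3. The UNIQUE (name, version) constraint creates an index that the "highest version" lookup can use, which usually keeps that query cheap. The PERSONS_DATA.name foreign key is declared DEFERRABLE INITIALLY DEFERRED (an assumption) so a person's first data row can be inserted before the PERSONS row that references it; address stands in for "some data".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE PERSONS (
    name          TEXT PRIMARY KEY,
    first_data_id INTEGER NOT NULL REFERENCES PERSONS_DATA (id)
);
CREATE TABLE PERSONS_DATA (
    id      INTEGER PRIMARY KEY,
    -- Checked at COMMIT, so the first data row may precede its PERSONS row.
    name    TEXT NOT NULL REFERENCES PERSONS (name) DEFERRABLE INITIALLY DEFERRED,
    version INTEGER NOT NULL,
    address TEXT,
    UNIQUE (name, version)  -- its index serves the "highest version" lookup
);
CREATE TABLE DOGS (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    owner_name TEXT NOT NULL REFERENCES PERSONS (name)
);
""")


def add_person(name, address):
    with conn:  # one transaction, so the deferred FK is checked at commit
        cur = conn.execute("INSERT INTO PERSONS_DATA (name, version, address) "
                           "VALUES (?, 1, ?)", (name, address))
        conn.execute("INSERT INTO PERSONS VALUES (?, ?)", (name, cur.lastrowid))


def change_person(name, address):
    """Only a new PERSONS_DATA row; the PERSONS row is never updated."""
    with conn:
        conn.execute("""INSERT INTO PERSONS_DATA (name, version, address)
                        SELECT ?, MAX(version) + 1, ? FROM PERSONS_DATA WHERE name = ?""",
                     (name, address, name))


def current_address(name):
    row = conn.execute("SELECT address FROM PERSONS_DATA WHERE name = ? "
                       "ORDER BY version DESC LIMIT 1", (name,)).fetchone()
    return row[0] if row else None


add_person("ann", "1 Old Road")
change_person("ann", "2 New Street")
with conn:
    conn.execute("INSERT INTO DOGS VALUES (1, 'Rex', 'ann')")
```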

Can a many-to-many join table have more than two columns?

I have some tables that benefit from many-to-many tables, for example the team table.
A team member can hold more than one 'position' in the team, and all the positions are listed in the positions table. Previously held positions are also stored, for which I have a separate table, so I have:
member table (containing team details)
positions table (containing positions)
member_to_positions table (id of member and id of position)
member_to_previous_positions (id of member and id of position)
Simple. However, the crux is that a team member can belong to many teams, aghhh.
I already have a team_to_member look-up table.
Now the problem is how to tie a position to a team. A member may have been team leader on one team, and currently be radio man and press officer on a different team. How do I pull the info per member to show his current positions, and also his history, including past teams?
Do I need to add a position_to_team table and somehow cross-reference it, or can I add the team to the member_to_positions table?
It's all very confusing, this normalization.
Yes, a many-to-many junction table can have additional attributes (columns).
For example, if there's a table called PassengerFlight table that's keyed by PassengerID and FlightID, there could be a third column showing the status of the given passenger on the given flight. Two different statuses might be "confirmed" and "wait listed", each of them coded somehow.
In addition, there can be ternary relationships, relationships that involve three entities and not just two. These tables are going to have three foreign keys that taken together are the primary key for the relationship table.
It's perfectly legitimate to have a TeamPositionMember table, with the columns
Team_Id
Position_Code
Member_Id
Start_Date
End_Date NULLABLE
And a surrogate ID column for the primary key, if you want; otherwise it's a 3-field composite primary key. (You'll want a uniqueness constraint on those fields anyway.)
With this arrangement, you can have a team with any set of positions. A team can have zero or more persons per position. A person can fill zero or more positions for zero or more teams.
EDIT:
If you want dates, just revise as shown above, and add Start_Date to the PK to allow the same person to hold the same position at different times.
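A sketch of the TeamPositionMember table in SQLite via Python's sqlite3, with Start_Date in the primary key as the edit suggests. Foreign keys to the team, position, and member tables are omitted to keep it short, and the sample rows follow the question's story: team leader on one team, now radio man and press officer on another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE TeamPositionMember (
    Team_Id       INTEGER NOT NULL,
    Position_Code TEXT    NOT NULL,
    Member_Id     INTEGER NOT NULL,
    Start_Date    TEXT    NOT NULL,  -- ISO-8601 text sorts and compares correctly
    End_Date      TEXT,              -- NULL means the position is still held
    PRIMARY KEY (Team_Id, Position_Code, Member_Id, Start_Date)
)""")
conn.executemany("INSERT INTO TeamPositionMember VALUES (?, ?, ?, ?, ?)", [
    (1, "LEADER", 7, "2020-01-01", "2022-06-30"),  # past: team leader on team 1
    (2, "RADIO", 7, "2022-07-01", None),           # current: radio man on team 2
    (2, "PRESS", 7, "2023-01-15", None),           # current: press officer on team 2
])


def positions(member_id, current):
    """(team, position, start, end) rows for a member, current or past only."""
    # The condition is one of two fixed strings, never user input.
    condition = "End_Date IS NULL" if current else "End_Date IS NOT NULL"
    return conn.execute(
        "SELECT Team_Id, Position_Code, Start_Date, End_Date FROM TeamPositionMember "
        f"WHERE Member_Id = ? AND {condition} ORDER BY Team_Id, Start_Date",
        (member_id,),
    ).fetchall()
```

Because Start_Date is part of the key, the member could later become leader of team 1 again simply by inserting a row with a new Start_Date.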
My first thought:
Give your many-to-many teams/members table an ID column. Every team-to-member relationship now has an ID.
Then create a many-to-many linking positions to team-member relationships.
This way, teams can have multiple members, members can have multiple teams, and members can have multiple positions on a per-team basis.
Now everything is nice and DRY, and all the linking up seems to work. Does that sound right to anyone else?
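A sketch of that layout in SQLite via Python's sqlite3, keeping the question's team_to_member table name and giving it a team_member_id column. The team_member_position table name and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE team      (team_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE member    (member_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE positions (position_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- The existing team/member link, now with its own ID.
CREATE TABLE team_to_member (
    team_member_id INTEGER PRIMARY KEY,
    team_id        INTEGER NOT NULL REFERENCES team (team_id),
    member_id      INTEGER NOT NULL REFERENCES member (member_id),
    UNIQUE (team_id, member_id)
);
-- Positions hang off a team-member relationship, not off the member alone.
CREATE TABLE team_member_position (
    team_member_id INTEGER NOT NULL REFERENCES team_to_member (team_member_id),
    position_id    INTEGER NOT NULL REFERENCES positions (position_id),
    PRIMARY KEY (team_member_id, position_id)
);
INSERT INTO team VALUES (1, 'Alpha'), (2, 'Bravo');
INSERT INTO member VALUES (7, 'Sam');
INSERT INTO positions VALUES (1, 'Team leader'), (2, 'Radio man'), (3, 'Press officer');
INSERT INTO team_to_member VALUES (1, 1, 7), (2, 2, 7);
INSERT INTO team_member_position VALUES (1, 1), (2, 2), (2, 3);
""")


def member_positions(member_id):
    """(team name, position title) pairs for one member across all teams."""
    return conn.execute("""
        SELECT t.name, p.title
        FROM team_to_member tm
        JOIN team t ON t.team_id = tm.team_id
        JOIN team_member_position tmp ON tmp.team_member_id = tm.team_member_id
        JOIN positions p ON p.position_id = tmp.position_id
        WHERE tm.member_id = ?
        ORDER BY t.name, p.title""", (member_id,)).fetchall()
```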
Sounds like you need a many-to-many positions to teams table now.
Your team_to_member table can indeed have an extra column position_id to describe (or in this case point to) the position the member has within that team.
Get rid of the member_to_previous_positions table. Just use member_to_positions with these columns:
MemberToPositionID (autoincrement OK only)
MemberID
PositionID
StartDate
EndDate
Then to find current positions, you do:
select *
from member_to_positions
where EndDate is null