PHP Junction Table Relations (Many to Many), grasping concept - sql

So I've tried searching and have yet to find out how to grasp this entirely.
I'm reorganising my database because I was storing user id's as comma separated values in a column withing that row to control permissions. To me, this seems like a better and faster(hardware) way, but I'm moving towards this "proper" way now.
I understand that you need 3 tables. This is what I have.
Table 1. members -> ID | user_name
Table 2. teams -> ID | team_name
Table 3. team_members -> ID | team_fk | member_fk
I understand how to store data in another column and use sql data to display it. What I'm confused about is why I have to link(relation) the columns to the ID's of the other tables. I could get the data without using the relation. I'm confused by what it even does.
Furthermore, I would like to have multiple values that determine permissions for each team. Would I do:
Table 3. team_members -> ID | team_fk | member_fk | leader_fk | captain_fk
^setting 0 or 1(true or false) for the leader and captain.
Or would I create a table(like team_leaders, team_captains) for each permission?
Thanks for the help!
Ryan

It seems that "leader", "captain and "regular member" are roles in your team. So you can create table team_roles, or just assign roles as strings to your relation table, i.e.
team_members -> ID | team_fk | member_fk | role
The key thing about this is to keep your database [normalised]https://en.wikipedia.org/wiki/Database_normalization. It is really easier to work with normalised database in most cases.

What I'm confused about is why I have to link(relation) the columns to the ID's of the other tables. I could get the data without using the relation.
You don't have to declare columns as foreign keys. It's just a good idea. It serves the following purposes:
It tells readers of the schema how the tables are related to each other. If you name the columns well, this is redundant -- team_fk is pretty obviously a reference to the teams table.
It enables automatic integrity checks by the database. If you try to create a team_members row that contains a team_fk or member_fk that isn't in the corresponding table, it will report an error. Note that in MySQL, this checking is only done by the InnoDB engine, not MyISAM.
Indexes are automatically created for the foreign key columns, which helps to optimize queries between the tables.
Table 3. team_members -> ID | team_fk | member_fk | leader_fk | captain_fk
If leader and captain are just true/false values, they aren't foreign keys. A foreign key column contains a reference to a key in another table. So I would call these is_leader and is_captain.
But you should only put these values in the team_members table if a team can have multiple captains and leaders. If there's just one of each, they should be in the teams table:
teams -> ID | team_name | leader_fk | captain_fk
where leader_fk and captain_fk are IDs from the members table. This will ensure that you can't inadvertently assign is_captain = 1 to multiple members from the same team.

Related

How to control version of records with referenced records (Foreign keys)

I am developing an application and in which I have multiple tables,
Table Name : User_master
|Id (PK,AI) | Name | Email | Phone | Address
Table Name: User_auth
|Id (PK,AI) | user_id(FK_userMaster) | UserName | password | Active_Status
Table Name: Userbanking_details
|Id (PK,AI) | user_id(FK_userMaster) | Bank Name | account Name | IFSC
Now, what I want is to save all the updates done in records should not be updated directly instead it should control the version that means I want to track the log of all previous updates user has done.
Which means if user updates the address, then also previous address record history should be stored into the table.
I have tried it by adding fields version_name, version_latest, updated_version_of field and insert new record when update like
|Id (PK,AI) | Name | Email | Phone | Address |version_name |version_latest| updated_version_of
1 | ABC |ABC#gm.com|741852|LA |1 |0 |1
2 | ABC |ABC#gm.com|852741|NY |2 |1 |1
Now the problem comes here is the user table is in FK with other two listed tables so when updating the record their relationship will be lost because of new ID.
I want to preserve the old data shown as old and new updated records will be in effect only with new transactions.
How can I achieve this?
Depending upon your use case you can make a json field in your tables for storing the previous states or a new identical history table for each table.
Dump the entire hash into the history column everytime the user updates anything.
Or insert a new row in the history table for each update in the original.
Storing historical records and current records in the same table, is not a good practice, in a transactional system.
The reasons are:
There will be more I/O due to scanning more number of pages to identify a record
Additional maintenance effort on the table
Transactions getting bigger, longer and cause time out issues
Additional effort of cascading referential integrity changes to child tables
I would suggest to keep historical records in a separate table. You can have OUTPUT clause to have the historical records to be captured and inserted into separate table. In that way, your referential integrity will remain the same. In the historical table, you don't need to have PK defined.
A below sample for using OUTPUT clause with UPDATE. You can read more about OUTPUT clause here
DECLARE #Updated table( [ID] int,
[Name_old] varchar(50),
[Email_old] varchar(50),
[Phone_old] varchar(50),
[Address_old] varchar(50),
[ModifiedDate_old] datetime);
Update User_Master
Set Email= 'NewEmail#Email.com', Name = 'newName', Phone='NewPhone', Address='NewAddress'
ModifiedDate=Getdate()
OUTPUT deleted.Id as Id, deleted.Name as Name_old, deleted.Email as email_old ,
deleted.ModifiedDate as ModifiedDate_old, deleted.Phone as phone_old, deleted.Address AS Address_old, deleted.ModifiedDate as modifiedDate_old
INTO #updated
Where [Id]=1;
INSERT INTO User_Master_History
SELECT * FROM #updated;
When I have faced this situation in the past I have solved it in the following ways:
First Method
Recommended method.
Have a second table which acts as a change history. Because you are not adding rows to the main table your foreign keys maintain integrity.
There are now mechanisms in SQL Server to do this automatically.
SQL Server 2016 Temporal Tables
SQL Server 2017 Change Data Capture
Second Method
I don't recommend this as a good design, but it does work.
Treat one record as the primary record, and this record maintains a foreign key relationship with records in other related tables which are subject to change tracking.
Always update this primary record with any changes thereby maintaining integrity of the foreign keys.
Add a self-referencing key to this table e.g. Parent and a date-of-change column.
Each time the primary record is updated, store the old values into a new record in the same table, and set the Parent value to the id of the primary record. Again the primary record never changes, and therefore your foreign key relationships maintain integrity.
Using the date-of-change column in conjunction with the change history allows you to reconstruct the exact values at any point in time.

What should a relationships table look like - Need confirmation of my technique

Lets say I have 3 models:
User
Page
Comments
I asked a question based on if I should have each model keep track of its relationships: SQL relationships and best practices
an example of this would be a "Pages" table that states who its author was... The problem seemed to be that if 2 users were the author of the one page, you'd have to add a new specific table called PageRelationshipsWithUsers that might have a reference to the PageID and the UserID that created it and a separate row for the co-author.
Understandably this sounds a bit naff. I would end up with a heck load of relation tables and most likely, it could be replaced with just the one multi-purpose relationship table... So I decided to come up with a relationships table like the following:
Relationships Table New
RelationshipID | ItemID | LinkID | ItemType | LinkType | Status
-----------------------------------------------------------------------------
1 | 23(PageID) | 7(UserID) | ("Page") | ("User") | TRUE
2 | 22(CommentID) | 7(UserID) | ("Comment") | ("User") | TRUE
3 | 22(CommentID) | 23(PageID) | ("Comment") | ("Page") | TRUE
however, I would very much appreciate some input as to how good of an idea laying out my relationships table like this is.
Any thoughts?
Answer was told to me by a work colleague:
Imagine the above relationships table for the model "Book"
A User can Rent a book, so the relation is User -> Book...
But what if he can buy a book too: User->Book....
Ooops, we need a new relationship... and considering this relationship table was supposed to be the 1 size fits all, we now have a requirement to add a new separate table... whoops.
So the answer is NO NO NO. don't, it's naughty. Keep your relationship tables separate and specific.
Your suggestion for a relationship table is not optimal for several reasons:
It's difficult to write queries that join tables through the relationship table, as you will need filters on the ItemType and LinkType columns, which is not intuitive when writing queries.
If a need arises to add new entities in the future, that use different datatypes for their primary keys, you cannot easily store ID's of various datatypes in your ItemID and LinkID columns.
You cannot create explicit foreign keys in your database, to enforce referential integrity, which is possibly the best reason to avoid the design you suggest.
Query performance might suffer.
When normalizing a database, you should not be afraid to have many tables. Just make sure to use a naming convention that makes sense and is self-documenting. For example, you could name the relation table between authors and pages "PageAuthors", instead of "Pages".

What is the best way to copy data from related tables to another related tables?

What is the best way to copy data from related tables to another related tables with same schema. Table are connected with one-to-many relationship.
Consider following schema
firm
id | name | city.id (FK)
employee
id | lastname | firm.id (FK)
firm2
id | name | city_id (FK)
employee2
id | lastname |firm2.id (FK)
What I want to do is to copy rows from firm with specific city.id to firm2 and and their employees assosiated with firm to table employee2.
I use posgresql 9.0 so I have to call SELECT nextval('seq_name') to get new id for table.
Right now I perform this query simply iterating over all rows in Java backend server, but on huge amount of data (50 000 employee and 2000 of firms) it takes too much time ( 1-3 minutes).
I'm wondering is there another more tricky way to do it, for example select data into temp table? Or probably use store procedure and iterate over rows with cursror to avoid buffering on my backend server?
This is one problem caused by simply using a sequence or identity value as your sole primary key in a table.
If there is a real-life unique index/primary key, then you can join on that. The other option would be to create a mapping table as you fill in the tables with sequences then you can insert into the children tables' FKs by joining to the mapping tables. It doesn't completely remove the need for looping, but at least some of the inserts get moved out into a set-based approach.

Storing a hierarchy tree in SQL where each son can have several fathers

I'm trying to store a hierarchical tree in SQL. In my case, the same son can have many fathers (the tree represents a VLSI design where the same cells can be used several times in different designs).
All models I've found on the web describe the employee/manager relationship where each employee has one manager. In my case, the number of fathers can be quite large and if I try to store all of them in a table field, they can exceed the character limit of the field.
Can anyone suggest a better method for storing this tree ?
Thanks,
Meir
One possible way to store this relationship in a relational database would be to create two tables - EMPLOYEE_TABLE and EMPLOYEE_MANAGERS_TABLE
create table EMPLOYEE_TABLE(
emp_id number,
emp_name varchar(200),
primary key(emp_id)
)
create table EMPLOYEE_MANAGERS_TABLE (
id number,
emp_id number,
manager_id number,
primary key(id),
foreign key(emp_id) references employee_table(emp_id),
foreign key(manager_id) references employee_table(emp_id)
)
EMPLOYEE_MANAGERS_TABLE will contain one row per employee_manager relationship.
You can apply the same schema to store the father-son relationship where a son can have more than 1 father.
Use a link table. I'm going to assume you are talking about people and will use that vernacular.
You have a person_table listing all the people and their respective id's. You then have a father_son_table describing the links between each person. Eg.
person_table
id | Name
1 | Matthew
2 | Mark
3 | Luke
4 | John
Say Matthew is Mark's father and Mark was father to Luke and John. In the father_son_table you would have:
father_son_table
id | father_id | son_id
1 | 1 | 2
2 | 2 | 3
3 | 2 | 4
Here you can define as many fathers and sons as you wish.
the number of fathers can be quite large and if I try to store all of them in a table field
eh? Your data is not normalised if you're trying to put multiple values in the same field.
While you say its hierarchical this usually implies that a node has a single 'parent' and 0 or more descendants. If that's not the case then its NOT a hierarchical data model - its a M:N relationship.
Or do you mean that there each node exists in more than one hierarchy?
The question is imposible to answer unless you provide an accurate description of the relationship between records.
You find some clever methods for tree handling in the book of Joe Celko:
http://www.amazon.com/Joe-Celkos-SQL-Smarties-Programming/dp/1558605762
...however, I don't know if it covers your problem
You might want to consider what queries you will most frequently be running against this table. Different strategies for storing hierarchies have advantages / disadvantages based on how the hierarchy is used.
Also, any single-parent strategy for storing a hierarchy could be adapted to handle multiple parents simply by treating each element of the tree as a pointer. Pointers under different parents could each point to the same record.
I would go for additional many-to-many connection table with father_id and son_id columns.

redundant column

I have a database that has two tables, these tables look like this
codes
id | code | member_id
1 | 123 | 2
2 | 234 | 1
3 | 345 |
4 | 456 | 3
members
id | code_id | other info
1 | 2 | blabla
2 | 1 | blabla
3 | 4 | blabla
the basic idea is that if a code is taken then its member id field is filled in, however this is creating a circle link (members points to codes, codes points to members) is there a different way of doing this? is this actually a bad thing?
Update
To answer your questions there are three different code tables with approx 3.5 million codes each, each table is searched depending on different criteria, if the member_id column is empty then the code is unclaimed, else, the code is claimed, this is done so that when we are searching the database we do not need to include another table to tell if it it claimed.
the members table contains the claimants for every single code, so all 10.5 million members
the additional info has things like mobile, flybuys.
the mobile is how we identify the member, but each entry is considered a different member.
It's a bad thing because you can end up with anomalies. For example:
codes
id | code | member_id
1 | 123 | 2
members
id | code_id | other info
2 | 4 | blabla
See the anomaly? Code 1 references its corresponding member, but that member doesn't reference the same code in return. The problem with anomalies is you can't tell which one is the correct, intended reference and which one is a mistake.
Eliminating redundant columns reduces the chance for anomalies. This is a simple process that follows a few very well defined rules, called rules of normalization.
In your example, I would drop the codes.member_id column. I infer that a member must reference a code, but a code does not necessarily reference a member. So I would make members.code_id reference codes.id. But it could go the other way; you don't give enough information for the reader to be sure (as #OMG Ponies commented).
Yeah, this is not good because it presents opportunities for data integrity problems. You've got a one-to-one relationship, so either remove Code_id from the members table, or member_id from the codes table. (in this case it seems like it would make more sense to drop code_id from members since it sounds like you're more frequently going to be querying codes to see which are not assigned than querying members to see which have no code, but you can make that call)
You could simply drop the member_id column and use a foreign key relationship (or its absence) to signify the relationship or lack thereof. The code_id column would then be used as a foreign key to the code. Personally, I do think it's bad simply because it makes it more work to ensure that you don't have corrupt relationships in the DB -- i.e., you have to check that the two columns are synchronized between the tables -- and it doesn't really add anything in the general case. If you are running into performance problems, then you may need to denormalize, but I'd wait until it was definitely a problem (and you'd likely replicate more than just the id in that case).
It depends on what you're doing. If each member always gets exactly one unique code then just put the actual code in the member table.
If there are a set of codes and several members share a code (but each member still has just one) then remove the member_id from the codes table and only store the unique codes. Access a specific code through a member. (you can still join the code table to search on codes)
If a member can have multiple codes then remove the code_id from the member table and the member_id from the code table can create a third table that relates members to codes. Each record in the member table should be a unique record and each record in the code table should be a unique record.
What is the logic behind having the member code in the code table?
It's unnecessary since you can always just do a join if you need both pieces of information.
By having it there you create the potential for integrity issues since you need to update BOTH tables whenever an update is made.
Yes this is a bad idea. Never set up a database to have circular references if you can help it. Now any change has to be made both places and if one place is missed, you have a severe data integrity problem.
First question can each code be assigned to more than one member? Or can each member have more than one code? (this includes over time as well as at any one moment if you need historical records of who had what code when))If the answer to either is yes, then your current structure cannot work. If the answer to both is no, why do you need two tables?
If you can have mulitple codes and multiple members you need a bridging table that has memberid and code id. If you can have multiple members assigned one code, put the code id in the members table. If it is the other way it should be the memberid in the code table. Then properly set up the foreign key relationship.
#Bill Karwin correctly identifies this as a probably design flaw which will lead to anomalies.
Assuming code and member are distinct entities, I would create a thrid table...
What is the relationship between a code and member called? An oath? If this is a real life relationship, someone with domain knowledge in the business will be able to give it a name. If not look for further design flaws:
oaths
code_id | member_id
1 | 2
2 | 1
4 | 3
The data suggest that a unique constraint is required for (code_id, member_id).
Once the data is 'scrubbed', drop the columns codes.member_id and members.code_id.