Storing a hierarchy tree in SQL where each son can have several fathers - sql

I'm trying to store a hierarchical tree in SQL. In my case, the same son can have many fathers (the tree represents a VLSI design where the same cells can be used several times in different designs).
All models I've found on the web describe the employee/manager relationship where each employee has one manager. In my case, the number of fathers can be quite large and if I try to store all of them in a table field, they can exceed the character limit of the field.
Can anyone suggest a better method for storing this tree ?
Thanks,
Meir

One possible way to store this relationship in a relational database is to create two tables, EMPLOYEE_TABLE and EMPLOYEE_MANAGERS_TABLE:
create table EMPLOYEE_TABLE (
    emp_id number,
    emp_name varchar(200),
    primary key (emp_id)
);
-- one row per employee/manager pair
create table EMPLOYEE_MANAGERS_TABLE (
    id number,
    emp_id number,
    manager_id number,
    primary key (id),
    foreign key (emp_id) references EMPLOYEE_TABLE(emp_id),
    foreign key (manager_id) references EMPLOYEE_TABLE(emp_id)
);
EMPLOYEE_MANAGERS_TABLE will contain one row per employee_manager relationship.
You can apply the same schema to store the father-son relationship where a son can have more than 1 father.
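Applied to the cell case from the question, a minimal sketch could look like the following. It uses Python's built-in sqlite3 module only so that it can be run as-is; the table names cell and cell_usage and the sample cell names are made up for illustration and do not come from the question.

```python
import sqlite3

# Hypothetical schema: "cell" and "cell_usage" are illustrative names.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE cell (
        cell_id   INTEGER PRIMARY KEY,
        cell_name TEXT NOT NULL
    );
    -- One row per (parent, child) pair, so a child can have any number of parents.
    CREATE TABLE cell_usage (
        parent_id INTEGER NOT NULL REFERENCES cell(cell_id),
        child_id  INTEGER NOT NULL REFERENCES cell(cell_id),
        PRIMARY KEY (parent_id, child_id)
    );
""")
conn.executemany("INSERT INTO cell VALUES (?, ?)",
                 [(1, "alu"), (2, "fpu"), (3, "adder")])
# The "adder" cell is used by both "alu" and "fpu".
conn.executemany("INSERT INTO cell_usage VALUES (?, ?)", [(1, 3), (2, 3)])


def parents_of(child_id):
    """Return the names of all cells that directly use the given cell."""
    rows = conn.execute(
        "SELECT c.cell_name FROM cell_usage u "
        "JOIN cell c ON c.cell_id = u.parent_id "
        "WHERE u.child_id = ? ORDER BY c.cell_name",
        (child_id,),
    )
    return [name for (name,) in rows]
```

Because every parent is its own row, there is no field-length limit to run into, however many parents a cell has.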

Use a link table. I'm going to assume you are talking about people and will use that vernacular.
You have a person_table listing all the people and their respective IDs. You then have a father_son_table describing the links between each person. E.g.:
person_table
id | Name
1 | Matthew
2 | Mark
3 | Luke
4 | John
Say Matthew is Mark's father and Mark was father to Luke and John. In the father_son_table you would have:
father_son_table
id | father_id | son_id
1 | 1 | 2
2 | 2 | 3
3 | 2 | 4
Here you can define as many fathers and sons as you wish.
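If you also need every ancestor and not just the direct fathers, one option is a recursive query over the link table. The sketch below reuses the sample rows above and assumes a database with recursive common table expressions (SQLite here, through Python's sqlite3 module); the function name ancestors_of is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE father_son_table (
        id        INTEGER PRIMARY KEY,
        father_id INTEGER NOT NULL REFERENCES person_table(id),
        son_id    INTEGER NOT NULL REFERENCES person_table(id)
    );
    INSERT INTO person_table VALUES (1,'Matthew'),(2,'Mark'),(3,'Luke'),(4,'John');
    INSERT INTO father_son_table VALUES (1,1,2),(2,2,3),(3,2,4);
""")


def ancestors_of(person_id):
    """Return the names of all direct and indirect fathers, sorted by name."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestor(id) AS (
            -- direct fathers
            SELECT father_id FROM father_son_table WHERE son_id = ?
            UNION
            -- fathers of anyone already found; UNION drops repeats
            SELECT f.father_id
            FROM father_son_table f
            JOIN ancestor a ON f.son_id = a.id
        )
        SELECT p.name
        FROM ancestor a
        JOIN person_table p ON p.id = a.id
        ORDER BY p.name
        """,
        (person_id,),
    )
    return [name for (name,) in rows]
```

Using UNION instead of UNION ALL matters once a node has several fathers, since the same ancestor can then be reached along more than one path.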

the number of fathers can be quite large and if I try to store all of them in a table field
Eh? Your data is not normalised if you're trying to put multiple values in the same field.
You say it's hierarchical, but that usually implies that a node has a single 'parent' and 0 or more descendants. If that's not the case then it's NOT a hierarchical data model, it's an M:N relationship.
Or do you mean that each node exists in more than one hierarchy?
The question is impossible to answer unless you provide an accurate description of the relationship between records.

You can find some clever methods for tree handling in Joe Celko's book:
http://www.amazon.com/Joe-Celkos-SQL-Smarties-Programming/dp/1558605762
...however, I don't know if it covers your problem

You might want to consider what queries you will most frequently be running against this table. Different strategies for storing hierarchies have advantages / disadvantages based on how the hierarchy is used.
Also, any single-parent strategy for storing a hierarchy could be adapted to handle multiple parents simply by treating each element of the tree as a pointer. Pointers under different parents could each point to the same record.

I would go for an additional many-to-many connection table with father_id and son_id columns.

Related

PHP Junction Table Relations (Many to Many), grasping concept

So I've tried searching and have yet to find out how to grasp this entirely.
I'm reorganising my database because I was storing user IDs as comma-separated values in a column within that row to control permissions. To me, this seems like a better and faster (hardware) way, but I'm moving towards this "proper" way now.
I understand that you need 3 tables. This is what I have.
Table 1. members -> ID | user_name
Table 2. teams -> ID | team_name
Table 3. team_members -> ID | team_fk | member_fk
I understand how to store data in another column and use sql data to display it. What I'm confused about is why I have to link(relation) the columns to the ID's of the other tables. I could get the data without using the relation. I'm confused by what it even does.
Furthermore, I would like to have multiple values that determine permissions for each team. Would I do:
Table 3. team_members -> ID | team_fk | member_fk | leader_fk | captain_fk
^setting 0 or 1(true or false) for the leader and captain.
Or would I create a table(like team_leaders, team_captains) for each permission?
Thanks for the help!
Ryan
It seems that "leader", "captain" and "regular member" are roles in your team. So you can create a team_roles table, or just store the role as a string in your relation table, i.e.
team_members -> ID | team_fk | member_fk | role
The key thing is to keep your database normalised (https://en.wikipedia.org/wiki/Database_normalization). In most cases a normalised database is easier to work with.
What I'm confused about is why I have to link(relation) the columns to the ID's of the other tables. I could get the data without using the relation.
You don't have to declare columns as foreign keys. It's just a good idea. It serves the following purposes:
It tells readers of the schema how the tables are related to each other. If you name the columns well, this is redundant -- team_fk is pretty obviously a reference to the teams table.
It enables automatic integrity checks by the database. If you try to create a team_members row that contains a team_fk or member_fk that isn't in the corresponding table, it will report an error. Note that in MySQL, this checking is only done by the InnoDB engine, not MyISAM.
In MySQL (InnoDB), indexes are automatically created for the foreign key columns, which helps to optimize queries between the tables. Other databases may require you to create those indexes yourself.
Table 3. team_members -> ID | team_fk | member_fk | leader_fk | captain_fk
If leader and captain are just true/false values, they aren't foreign keys. A foreign key column contains a reference to a key in another table. So I would call these is_leader and is_captain.
But you should only put these values in the team_members table if a team can have multiple captains and leaders. If there's just one of each, they should be in the teams table:
teams -> ID | team_name | leader_fk | captain_fk
where leader_fk and captain_fk are IDs from the members table. This will ensure that you can't inadvertently assign is_captain = 1 to multiple members from the same team.
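To see what the automatic integrity check buys you, here is a small sketch using SQLite through Python's sqlite3 module (chosen only because it runs anywhere; the question is about MySQL, where the behaviour depends on the storage engine as noted above). The sample rows, the role column, the UNIQUE constraint and the helper try_add are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE members (id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
    CREATE TABLE teams   (id INTEGER PRIMARY KEY, team_name TEXT NOT NULL);
    CREATE TABLE team_members (
        id        INTEGER PRIMARY KEY,
        team_fk   INTEGER NOT NULL REFERENCES teams(id),
        member_fk INTEGER NOT NULL REFERENCES members(id),
        role      TEXT NOT NULL DEFAULT 'member',
        UNIQUE (team_fk, member_fk)   -- a member joins a given team only once
    );
    INSERT INTO members VALUES (1, 'ryan');
    INSERT INTO teams   VALUES (10, 'red');
    INSERT INTO team_members (team_fk, member_fk, role) VALUES (10, 1, 'captain');
""")


def try_add(team_id, member_id):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO team_members (team_fk, member_fk) VALUES (?, ?)",
                (team_id, member_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this setup, try_add(10, 99) is rejected because there is no member 99, and the database does that check for you instead of your PHP code.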

Relational model : Company has multiple companies

My problem is the following:
How should I represent this in a relational model:
An HQ has zero or more (0,N) companies, and each company depends on one and only one HQ.
Knowing that: an HQ has many fields similar to those of a company.
A) Should I create 2 tables ? One called HQ and another company.
B) Should it be a recursive on the same table ?
C) Is there another way to represent this relation ?
Using the same table with a parent field works very well on its own if the HQ has all the same fields as the rest. However, if there are attributes of an HQ that are not shared by a company, as you say, then you'll also need a separate table for the HQ-specific data. So yes, 2 tables. But take jbarker's idea as a starting point. Then add an HQ table with a companyID foreign key. An HQ record will have the companyID of the company that is an HQ, which, as he says, will have a value of NULL for the parent.
As for your question about recursivity, you'll have recursive relationships or "self joins" for the company data, and not for HQ-specific data.
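A rough sketch of that layout, run through Python's sqlite3 module so it can be tried directly. The column board_size is a hypothetical HQ-only attribute and the company names are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (
        company_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        parent_id  INTEGER REFERENCES company(company_id)  -- NULL for an HQ
    );
    -- Only companies that are an HQ get a row here.
    CREATE TABLE hq (
        company_id INTEGER PRIMARY KEY REFERENCES company(company_id),
        board_size INTEGER   -- hypothetical HQ-only attribute
    );
    INSERT INTO company VALUES
        (1, 'Acme HQ', NULL), (2, 'Acme East', 1), (3, 'Acme West', 1);
    INSERT INTO hq VALUES (1, 7);
""")


def subsidiaries(hq_name):
    """Self join: list the companies whose parent is the named HQ."""
    rows = conn.execute(
        """
        SELECT child.name
        FROM company AS parent
        JOIN company AS child ON child.parent_id = parent.company_id
        WHERE parent.name = ?
        ORDER BY child.name
        """,
        (hq_name,),
    )
    return [name for (name,) in rows]
```

The self join touches only the company table; the hq table is joined in only when the HQ-specific columns are needed.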

What should a relationships table look like - Need confirmation of my technique

Let's say I have 3 models:
User
Page
Comments
I asked a question based on if I should have each model keep track of its relationships: SQL relationships and best practices
an example of this would be a "Pages" table that states who its author was... The problem seemed to be that if 2 users were the author of the one page, you'd have to add a new specific table called PageRelationshipsWithUsers that might have a reference to the PageID and the UserID that created it and a separate row for the co-author.
Understandably this sounds a bit naff. I would end up with a heck load of relation tables and most likely, it could be replaced with just the one multi-purpose relationship table... So I decided to come up with a relationships table like the following:
Relationships Table New
RelationshipID | ItemID        | LinkID     | ItemType    | LinkType | Status
-----------------------------------------------------------------------------
1              | 23(PageID)    | 7(UserID)  | ("Page")    | ("User") | TRUE
2              | 22(CommentID) | 7(UserID)  | ("Comment") | ("User") | TRUE
3              | 22(CommentID) | 23(PageID) | ("Comment") | ("Page") | TRUE
however, I would very much appreciate some input as to how good of an idea laying out my relationships table like this is.
Any thoughts?
Answer was told to me by a work colleague:
Imagine the above relationships table for the model "Book"
A User can Rent a book, so the relation is User -> Book...
But what if he can buy a book too: User->Book....
Ooops, we need a new relationship... and considering this relationship table was supposed to be the 1 size fits all, we now have a requirement to add a new separate table... whoops.
So the answer is NO NO NO. don't, it's naughty. Keep your relationship tables separate and specific.
Your suggestion for a relationship table is not optimal for several reasons:
It's difficult to write queries that join tables through the relationship table, as you will need filters on the ItemType and LinkType columns, which is not intuitive when writing queries.
If a need arises to add new entities in the future, that use different datatypes for their primary keys, you cannot easily store ID's of various datatypes in your ItemID and LinkID columns.
You cannot create explicit foreign keys in your database, to enforce referential integrity, which is possibly the best reason to avoid the design you suggest.
Query performance might suffer.
When normalizing a database, you should not be afraid to have many tables. Just make sure to use a naming convention that makes sense and is self-documenting. For example, you could name the relation table between authors and pages "PageAuthors", instead of "Pages".
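For comparison, a specific relation table such as PageAuthors might look like the sketch below (SQLite via Python's sqlite3 module, purely so it is runnable). The Users and Pages column names and the sample rows are assumptions; only the IDs 23 and 7 echo the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce the FKs
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
    CREATE TABLE Pages (PageID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
    -- One dedicated table per relationship, with real foreign keys on both sides.
    CREATE TABLE PageAuthors (
        PageID INTEGER NOT NULL REFERENCES Pages(PageID),
        UserID INTEGER NOT NULL REFERENCES Users(UserID),
        PRIMARY KEY (PageID, UserID)
    );
    INSERT INTO Users VALUES (7, 'ann'), (8, 'bob');
    INSERT INTO Pages VALUES (23, 'Home');
    INSERT INTO PageAuthors VALUES (23, 7), (23, 8);
""")


def authors_of(page_id):
    """Return the names of all authors of a page, sorted by name."""
    rows = conn.execute(
        """
        SELECT u.Name
        FROM PageAuthors pa
        JOIN Users u ON u.UserID = pa.UserID
        WHERE pa.PageID = ?
        ORDER BY u.Name
        """,
        (page_id,),
    )
    return [name for (name,) in rows]
```

The join needs no ItemType or LinkType filter, and a second relationship between the same two entities simply becomes another small table with the same shape.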

More joins or more columns?

I have a very basic question, which would be a more efficient design, something that involves more joins, or just adding columns to one larger table?
For instance, if we had a table that stored relatives like below:
Person | Father | Mother | Cousin | Etc.
________________________________________________
Would it be better to list the name, age, etc. directly in that table.. or better to have a person table with their name, age, etc., and linked by person_id or something?
This may be a little simplistic of an example, since there are more than just those two options. But for the sake of illustration, assume that the relationships cannot be stored in the person table.
I'm doing the latter of the two choices above currently, but I'm curious if this will get to a point where the performance will suffer, either when the person table gets large enough or when there are enough linked columns in the relations table.
I'd go for more "normality" to increase flexibility and reduce data duplication.
PERSON:
    ID
    First Name
    Last Name
Person_Relations:
    PersonID
    RelationID
    TypeID
Relation_Type:
    TypeID
    Description
This way you could support any relationship (4th cousin, mother's side, once removed) without changing code.
It is a much more flexible design to separate out the details of each person from the table relating them together. Typically, this will lead to less data consumption.
You could even go one step further and have three tables: one for people, one for relationship_types, and one for relationships.
People would have all the individual identifying info -- age, name, etc.
Relationship_types would have a key, a label, and potentially a description. This table is for elaborating the details of each possible relationship. So you would have a row for 'parent', a row for 'child', a row for 'sibling', etc.
Then the Relationships table has four fields: one for the key of each person in the relationship, one for the key of the relationship_type, and one for its own key. Note that you need to be explicit in how you name the person columns to make it clear which party is which part of the relationship (i.e. saying that A and B have a 'parent' relationship only makes sense if you indicate which person is the parent vs which has the parent).
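Here is one way the three tables could be sketched, using SQLite through Python's sqlite3 module. The column names subject_id and object_id are just one possible way of making the direction explicit, and the sample people and the helper related are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE relationship_types (
        id    INTEGER PRIMARY KEY,
        label TEXT NOT NULL UNIQUE
    );
    -- Read each row as: subject IS THE <label> OF object.
    CREATE TABLE relationships (
        id         INTEGER PRIMARY KEY,
        subject_id INTEGER NOT NULL REFERENCES people(id),
        object_id  INTEGER NOT NULL REFERENCES people(id),
        type_id    INTEGER NOT NULL REFERENCES relationship_types(id)
    );
    INSERT INTO people VALUES (1, 'Frank'), (2, 'Suzy'), (3, 'Emma');
    INSERT INTO relationship_types VALUES (1, 'parent'), (2, 'spouse');
    INSERT INTO relationships (subject_id, object_id, type_id)
    VALUES (1, 3, 1), (2, 3, 1), (1, 2, 2);
""")


def related(object_name, label):
    """Names of the people who are the `label` of `object_name`."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM relationships r
        JOIN people s             ON s.id = r.subject_id
        JOIN people o             ON o.id = r.object_id
        JOIN relationship_types t ON t.id = r.type_id
        WHERE o.name = ? AND t.label = ?
        ORDER BY s.name
        """,
        (object_name, label),
    )
    return [name for (name,) in rows]
```

Adding a new kind of relationship is then an INSERT into relationship_types, not a schema change.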
Depending on how you plan to use the data, a better structure may be:
a table for Person (id, name, etc.)
a table for relationships (person_a_id, person_b_id, relation_type, etc.)
where person_a_id and person_b_id relate to id in Person
sample data may look like
Person
ID Name
1 Frank
2 Suzy
3 Emma
Relationship
A B Relationship
1 2 Wife
2 1 Husband
1 3 Daughter
2 3 Daughter
3 1 Father
3 2 Mother

Question on Database Modeling

Tables:
Students
Professors
Entries (there's no physical table entry in the database for entries yet; this table is on the front end, so it is probably composed from multiple helper tables if we need them. I just need to create a valid ERD)
Preamble:
One student can have an association to many professors
One professor can have an association to many students
One entry can have 0,1 or more Students or professors in it.
Professor is required to be associated with one or more students
Student is not required to have an association with any professor
It should be more like this (front-end entry table):
Any professor in this table must have an associated name in the table (for example, Wandy is associated with Alex).
A student is not required to have associated professors in this table, but it is possible.
People in one row (for example Linda (Student), Kelly (Professor), Victor (Professor)) cannot be associated with each other in any manner.
But it is absolutely fine if Linda is associated with David.
The problem is that I do not quite understand how one column can hold IDs from different tables (and several of them!), and I do not quite understand how to build a valid ERD for that.
I will answer any additional questions you need. Thanks a lot!
If you simply want an association between Students and Professors - just make a many-to-many relationship in ERD. In logical (relational) schema it will make an intermediate table with foreign keys to Student and Professor tables.
But from your example it looks like you need to design the DB for your "PeopleEntries", which is not straightforward. ERD seems to have the following entities:
Students(ID, name)
Professors(ID, name)
PeopleEntries(ID, LoveCats, LoveDogs, LoveAnts)
Relationships (considering people cannot appear in entries more than once):
Students Many - 1 PeopleEntries
Professors Many - 1 PeopleEntries
Students Many - Many Professors
Relational schema would contain tables (foreign keys according to erd relationships):
Students(ID, name, PeopleEntryID FK)
Professors(ID, name, PeopleEntryID FK)
PeopleEntries(ID, LoveCats, LoveDogs, LoveAnts)
StudentProfessor(StudentID FK, ProfessorID FK)
I don't know how to implement the constraint, disallowing association between people from the same entry, on conceptual level (ER-diagram). On physical level you can implement the logic in triggers or update procedures to check this.
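As a rough illustration of the trigger approach, the sketch below uses SQLite trigger syntax through Python's sqlite3 module; other databases have their own trigger dialects. The trigger name, the helper can_link, and the assumption that David is a professor in a different entry are all made up for the example, and the Love* columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PeopleEntries (ID INTEGER PRIMARY KEY);
    CREATE TABLE Students (
        ID INTEGER PRIMARY KEY, name TEXT,
        PeopleEntryID INTEGER REFERENCES PeopleEntries(ID)
    );
    CREATE TABLE Professors (
        ID INTEGER PRIMARY KEY, name TEXT,
        PeopleEntryID INTEGER REFERENCES PeopleEntries(ID)
    );
    CREATE TABLE StudentProfessor (
        StudentID   INTEGER NOT NULL REFERENCES Students(ID),
        ProfessorID INTEGER NOT NULL REFERENCES Professors(ID),
        PRIMARY KEY (StudentID, ProfessorID)
    );
    -- Reject a link when both people belong to the same entry.
    CREATE TRIGGER no_same_entry_link
    BEFORE INSERT ON StudentProfessor
    WHEN (SELECT PeopleEntryID FROM Students   WHERE ID = NEW.StudentID)
       = (SELECT PeopleEntryID FROM Professors WHERE ID = NEW.ProfessorID)
    BEGIN
        SELECT RAISE(ABORT, 'student and professor share an entry');
    END;
    INSERT INTO PeopleEntries VALUES (1), (2);
    INSERT INTO Students   VALUES (1, 'Linda', 1);
    -- Assumption: David is a professor who sits in a different entry.
    INSERT INTO Professors VALUES (1, 'Kelly', 1), (2, 'David', 2);
""")


def can_link(student_id, professor_id):
    """Try to insert the association; return False if the database refuses it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO StudentProfessor VALUES (?, ?)",
                (student_id, professor_id),
            )
        return True
    except sqlite3.DatabaseError:  # covers the error raised by the trigger
        return False
```

This covers inserts only; a complete version would also need a matching check when a person is moved to another entry.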
As per my quick understanding,
Create a table with following columns
PersonName
Designation
.....
Create one more table
PersonName
LinksTo
In the second table each person entry will have multiple records based on the relation
You want a junction table:
ID StudentID ProfessorID
0 23 34
1 22 34
2 12 33
3 12 34
In the table above, one professor has 3 students, one student has two professors.
StudentID and ProfessorID should together be a unique index to avoid duplicate relationships.
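A quick sketch of that junction table with the unique index in place, using SQLite through Python's sqlite3 module. The index name and the helper functions are made up for the example, and the foreign keys to the Student and Professor tables are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StudentProfessor (
        ID          INTEGER PRIMARY KEY,
        StudentID   INTEGER NOT NULL,
        ProfessorID INTEGER NOT NULL
    );
    -- The same (student, professor) pair may appear only once.
    CREATE UNIQUE INDEX ux_student_professor
        ON StudentProfessor (StudentID, ProfessorID);
    INSERT INTO StudentProfessor VALUES
        (0, 23, 34), (1, 22, 34), (2, 12, 33), (3, 12, 34);
""")


def students_per_professor():
    """Map each ProfessorID to the number of students linked to it."""
    return dict(conn.execute(
        "SELECT ProfessorID, COUNT(*) FROM StudentProfessor GROUP BY ProfessorID"
    ))


def add_link(student_id, professor_id):
    """Return True if the link was stored, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO StudentProfessor (StudentID, ProfessorID) VALUES (?, ?)",
                (student_id, professor_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```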