If in a databse we have a parent table and two children tables. Is it better to use joins to get the children or add a flag to distinguish them ?
For example, the parent table is Person[Person_Name, Person_ID]. The first child table is Employee[Person_ID, Employee_ID, Department] and the other child is Customer[Person_ID, Location, Rank].
So, is it a good thing to add flag [isEmployee] or [isCustomer] to the parent table (Person) and save the effort of Joining the tables on "person_Id" ?
Another case would be with one child, for example, the parent table would be Member[Member_Name, Member_ID] and a child table GoldenMember[Member_ID, Phone_Number, EMail].
Now in this case, if I want to show the info of a specific Member, I need to do a join between tables to see whether it's a Golden Memmber or not, but if the flag "isGolden" was in the table (Member) it would save us a join?
So, which is better and why ??
Thanks in advance :)
There is no "better" unless you provide criteria for measurement of "goodness".
SQL's support for entity subtyping is inadequate. You can hack your way around any of the shortcomings that there are, but each hack will do no more than introduce new problems of its own.
Additional "Type" columns on the top level introduce the problem of database updating becoming more complex. Defective update procedures will corrupt the database's integrity.
Leaving out the additional "Type" columns at the top level will make the problem of formulating read queries more complex (more joins, notably). Many people would add here "and degrade performance", but it's unlikely that you will suffer noticeably from this.
Choose which difficulty is the easiest to live with in your particular use case.
Related
Let's say I have a database in which one entity (i.e. table) inherits from another one, for example:
Table 1, named person: (name,surname)
Table 2, named car_owner
In this case, car_owner inherits from person, i.e. a car-owner IS a person.
I'm now in a point where I have to decide whether I should:
create the table car_owner, even though it has no extra columns except the ones in person, although in the future this might change => doing this results in car_owner = table with columns (id,person_id), where person_id is FK to person
or
leave only the person table for now and only do (1) when/if extra information regarding a car-owner will appear => note that if I do this FKs to a car-owner from other tables would actually be FKs to the person table
The tables I'm dealing with have different names and semantics and the choice between (1) and (2) is not clear, because the need for extra columns in car_owner might never pop-up.
Conceptually, (1) seems to be the right choice, but I guess what I'm asking is if there are any serious issues I might run into later if I instead resort to (2)
I would suggest that option 1 is the better answer. While it creates more work to join the tables for queries, it is neater to put "optional" data in it's own table. And if more types of persons are required (pedestrian, car_driver, car_passenger) they can be accommodated with more child tables. You can always use a view to make them look like one table.
BTW for databases, we say Parent and Child, not "inherets".
To answer the part about problems/consequences of option 2 - well, none too serious. This is a database, so you can always re-arrange things later, but there will be a price to pay in rewriting queries and code if you restructure tables. Why I don't like Option 2 is because it can lead to extra tables not re-using the person part. If it looks like that table is for car_owners, I might make an entirely new table for car_passengers with duplication of all the person columns.
In short, nothing too tragic should happen with either approach, they are each preferable for different reasons and the drawbacks are mainly inconvenience and potential future messiness.
i am trying to design a table which contains sections and each section contains tasks and each task contains sub tasks and so on. I would like to do it under one table. Please let me know the best single table approach which is scalable. I am pretty new to database design. Also please suggest if single table is not the best approach then what could be the best approach to do this. I am using db2.
Put quite simply, I would say use 1 table for tasks.
In addition to all its various other attributes, each task should have a primary identifier, and another column to optionally contain the identifier of its parent task.
If you are using DB2 for z/OS, then you will use a recursive query with a common table expression. Otherwise you you can use a hierarchical recursive query in DB2 for i, or possibly in DB2 for LUW (Linux, Unix, Windows).
Other designs requiring more tables, each specializing in a certain part of the task:subtask relationship, may needlessly introduce issues or limitations.
There are a few ways to do this.
One idea is to use two tables: Sections and Tasks
There could be a one to many relationship between the two. The Task table could be designed as a tree with a TaskId and a ParentTaksId which means you can have Tasks that go n-levels deep (sub tasks of sub tasks og sub tasks etc). Every Task except for the root task will have a parent.
I guess you can also solve this by using a single table where you just add a section column to the Task table I described above.
If you are going to put everything into one table although convenient will be inefficient in the long run. This would mean you will be storing unnecessary repeated groups of data in your database which would not be processor and memory friendly at all. It would in fact violate the Normalization rules and to be more specific the 1st Normal Form which says that there should be no repeating groups that could be found in your table. And it would actually also violate the 3rd Normal Form which means there will be no (transitional) dependency of a non-primary key to another non-primary key.
To give you an illustration, I will put your design into one table. Although I will be guessing on the possible fields but just bear with it because this is for the sake of discussion. Look at the graphics below:
If you look the graphics above (although this is rather small you could download the image and see it closer for yourself), the SectionName, Taskname, TaskInitiator, TaskStartDate and TaskEndDate are unnecessary repeated which as I mentioned earlier a violation of the 1st Normal Form.
Secondly, Taskname, TaskInitiator, TaskStartDate and TaskEndDate are functionally dependent on TaskID which is not a primary key instead of SectionID which in this case should be the primary key (if on a separate table). This is violation of 3rd Normal Form which says that there should be no Transitional Dependence or non-primary key should be dependent on
another non-primary key.
Although there are instances that you have to de-normalized but I believe this one should be normalized. In my own estimation there should be three tables involved in your design, namely, Sections,Tasks and SubTasks that would like the one below.
Section is related to Tasks, that is, a section could have many Tasks.
And Task is related to Sub-Tasks, that is, a Task could have many Sub-tasks.
If I understand correctly the original poster does not know, how many levels of hierarchy will be needed (hence "and so on"). His problem is to create a design that can hold a structure of any depth.
Imho that is a complex issue that does not have a single answer. When implementing such a design you need to count such factors as:
Will the structure be fairly constant? (How many writes?)
How often will this structure be read?
What operations will need to be possible? (Get all children objects of a given object? Get the parent object? Get the direct children?)
If the structure will be constant You could use the nested set model (http://en.wikipedia.org/wiki/Nested_set_model)
In this way the table has a 'left' and 'right' column. The parent object has its left and right column encompasing the values of any of its children object.
In that way you can list all the children of an object using a query like this:
SELECT child.id
FROM table AS parent
JOIN table AS child
ON child.left BETWEEN parent.left AND parent.right
AND child.right BETWEEN parent.left AND parent.right
WHERE
parent.id = #searchId
This design can be VERY fast to read, but is also EXTREMELY costly when the structure changes (for example when adding a child to any object You will have to update any object with a 'right' value that is higher than the inserted one).
If you need to be able to make changes to structure in real time you should probably use a design with two tables - one holding the objects, the second the structure (something like parentId, childId, differenceInHierarchyLevels).
I'm having what some would call a rather strange problem/question.
Suppose I have a table, which may reference one (and only one) of many different other tables. How would I do that in the best way?? I'm looking for a solution which should work in a majority of databases (MS SQL, MySQL, PostgreSQL etc). The way I see it, there are a couple of different solutions (is any better than the other?):
Have one column for each possible reference. Only one of these columns may contain a value for any given row, all others are null. Allows for strict foreign keys, but it gets tedious when the number of "many" (possible referenced tables) gets large
Have a two column relationship, i.e. one column "describing" which table is referenced, and one referencing the instance (row in that table). Easily extended when the number of "many" (referenced tables) grows, though I can't perform single query lookup in a straightforward way (either left join all possible tables, or union multiple queries which joins towards one table each)
??
Make sense? What's best practise (if any) in this case?
I specifically want to be able to query data from the referenced entity, without really knowing which of the tables are being referenced.
How would you do?
Both of these methods are suitable in any relational database, so you don't have to worry about that consideration. Both result in rather cumbersome queries. For the first method:
select . . .
from t left outer join
ref1
on t.ref1id = ref1.ref1id left outer join
ref2
on t.ref2id = ref2.ref2id . . .
For the second method:
select . . .
from t left outer join
ref1
on t.anyid = ref1.ref1id and anytype = 'ref1' left outer join
ref2
on t.anyid = ref2.ref2id and anytype = 'ref2' . . .
So, from the perspective of query simplicity, I don't see a major advantage for one versus the other. The second version has a small disadvantage -- when writing queries, you have to remember what the name is for the join. This might get lost over time. (Of course, you can use constraints or triggers to ensure that only a fixed set of values make it into the column.)
From the perspective of query performance, the first version has a major advantage. You can identify the column as a foreign key and the database can keep statistics on it. This can help the database choose the right join algorithm, for instance. The second method does not readily offer this possibility.
From the perspective of data size, the first version requires storing the id for each of the possible values. The second is more compact. From the perspective of maintainability, the first is hard to add a new object type; the second is easy.
If you have a set of things that are similar to each other, then you can consider storing them in a single table. Attributes that are not appropriate can be NULLed out. You can even create views for the different flavors of the thing. One table may or may not be an option.
In other words, there is no right answer to this question. As with many aspects of database design, it depends on how the data is going to be used. Absent other information, I would probably first try to coerce the data into a single table. If that is just not reasonable, I would go with the first option if the number of tables can be counted on one hand, and the second if there are more tables.
1)
This is legitimate for small number of static tables. If you anticipate a number of new tables might need to be added in the future, take a look at 3) below...
2)
Please don't do that. You'd be forfeiting the declarative FOREIGN KEYs, which is one of the most important mechanisms for maintaining data integrity.
3)
Use inheritance. More info in this post:
What is the best design for a database table that can be owned by two different resources, and therefore needs two different foreign keys?
You might also be interested in looking at:
Implementing comments and Likes in database
Multiple one to many relationship design
How to avoid multiple tables tables to relations M: M?
database table design thoughts
Relating two database tables (associating an employee with an activity)
How to structure table Activities in a database?
Currently I have multiple n:m relationships between 2 tables:
Users --> Favourites(user_id,post_id) <-- Posts
Users --> Follow(user_id,post_id) <-- Posts
Would you rather have 2 join tables or just one join table with an attribute which marks the type of the join, so something like:
Users --> Users_Posts (user_id,post_id,type(VALUES="favourite,follow") <-- Posts
It's not exactly the same example as I have in my application, but I think you can get the idea.
I don't think there is one "right" answer here. I think it depends. If you use a single table with a "relationship type" column and frequently want to extract just relationships of a single type--say just favorites--then each query against that table will need to apply a WHERE clause to filter out the types you don't want. That might make for slow queries if you don't properly index the non-key "relationship type" column. Also, it makes it so future developers will always need to know and remember to filter to the types they want or they may unexpectedly get relationships data back that they don't intend. Having two separate tables is easier to understand. For example, it is easier for me to quickly know what to expect in a "Favorites" table than in a "Users_Posts" table, so separate tables may communicate the differences more quickly.
On the other hand, if you frequently need to select both relationship types in a single set, then having them in a single table is simpler because you don't need to worry about doing a UNION to combine the data from two tables into a single view. What if there were 10,000 different possible relationship types? Would you want 10,000 different tables, or would you prefer a single table? Most people would prefer a single table in that case.
So I think it depends on many factors, such as expected usage, size, etc. The "right" answer is more of an art than a science.
I need to store info about county, municipality and city in Norway in a mysql database. They are related in a hierarchical manner (a city belongs to a municipality which again belongs to a county).
Is it best to store this as three different tables and reference by foreign key, or should I store them in one table and relate them with a parent_id field?
What are the pros and cons of either solution? (both structural end efficiency wise)
If you've really got a limit of these three levels (county, municipality, city), I think you'll be happiest with three separate tables with foreign keys reaching up one level each. This will make queries almost trivial to write.
Using a single table with a parent_id field referencing the same table allows you to represent arbitrary tree structures, but makes querying to extract the full path from node to root an iterative process best handled in your application code.
The separate table solution will be much easier to use.
three different tables:
more efficient, if your application mostly accesses information about only one entity (county, municipality, city)
owner-member-relationship is a clear and elegant model ;)
County, Municipality, and City don't sound like they are the same kind of data ; so, I would use three different tables : one per data-type.
And, then, I would indeed use foreign keys between those.
Efficiency-speaking, not sure it'll change much :
you'll do joins on 3 tables instead of joining 3 times on the same table ; I suppose it's quite the same.
it might make a little difference when you need to work on only one of those three type of data ; but with the right indexes, the differences should be minimal.
But, structurally speaking, if those are three different kind of entities, it makes sense to use three different tables.
I would recommend for using three different tables as they are three different entities.
I would use only one table in those cases you don´t know the depth of the hierarchy, but it is not case.
I would put them in three different tables, just on the grounds that it is 3 different concepts. This will hamper speed and will complicate your queries. However given that MySQL does not have any special support for hirachical queries (like Oracle's connect by statement) these would be complicated anyway.
Different tables: it's just "right". I doubt you'll see any performance gains/losses either way but this is one where modelling it properly up-front will probably save you lots of headaches later on. For one thing it'll make SQL SELECTs easier to write and read.
You'll get different opinions coming back to you on this but my personal preference would be to have separate tables because they are separate entities.
In reality you need to think about the queries you will doing on this data and usually your answer will come from that. With separate tables your queries will look much cleaner and in the end your not saving yourself anything because you'll still be joining tables together, even if they are the same table.
I would use three separate tables, since you know exactly what categories of information you are working with, and won't need to dynamically alter the 'depth' of your hierarchy.
It'll also make the data simpler to manage, as you'll be able to tell if the data is for a city, municipality or a county just by knowing the table (and without having to discern the 'depth' of a record in the hierarchy first!).
Since you'll probably be doing self joins anyway to get the hierarchy to work, I'd doubt there would be any benefits from having all the data in a single table.
In dataware housing applications, adherents of the Kimball methodology might place these fields in the same attribute table:
create table city (
id int not null,
county varchar(50) not null,
municipality varchar(50),
city varchar(50),
primary key(id)
);
The idea being that attibutes should never be more than l join away from the fact table.
I just state this as an alternative view. I would go with the 3 table design personally.
This is a case of ‘Database Normalization’, which is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. The purpose is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.
Multiple tables will help in the situation if the task has been distributed among different developers, or users at different levels require different rights to view and change the data or the small tables help when you need this data for other purposes as well or so.
My vote would be for multiple tables - with data appropriately distributed.