Imagine I have a Parent.hasMany(Child) relationship. If I have an API to query a Parent but also need to surface how many children that parent has, I see two immediate options:
Run a query on COUNT(child.id) (I feel this must be very hard to scale as we add more and more children for a given Parent).
Maybe have an n_count attribute defined on the Parent and run a SQL transaction to modify the count on the parent every time a Child is created/deleted.
Which is the better option here, or is there a third and best way?
Storing n_count in the parent is generally considered undesirable because it is redundant information (information which can be obtained more reliably by counting the child records). Having said that, if the updates to the parent and child rows (n_count) are controlled so as to guarantee correct updates (by database triggers, for example), then this can be called a type of 'controlled de-normalisation' and used for performance improvement (it only improves read queries; updates/inserts will be slower, of course).
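For example, here is a minimal sketch of trigger-based maintenance of the count, assuming SQLite-style trigger syntax and illustrative table/column names (parent, child, n_count are assumptions, not taken from your schema):

-- Illustrative schema; names and types are assumptions.
CREATE TABLE parent (
    id      INTEGER PRIMARY KEY,
    n_count INTEGER NOT NULL DEFAULT 0   -- denormalised child count
);

CREATE TABLE child (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parent(id)
);

-- Keep n_count in step with child inserts/deletes so reads never need COUNT(*).
CREATE TRIGGER child_after_insert AFTER INSERT ON child
BEGIN
    UPDATE parent SET n_count = n_count + 1 WHERE id = NEW.parent_id;
END;

CREATE TRIGGER child_after_delete AFTER DELETE ON child
BEGIN
    UPDATE parent SET n_count = n_count - 1 WHERE id = OLD.parent_id;
END;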
Related
I want to have a database with two tables, like so:
Parent
-------
Pk
UgliestChildPk
StrongestChildPk
Child
------
Pk
ParentPk
I'm working with SQLite, although that probably doesn't pertain much.
In my model, a Parent always has at least one Child, in which case the ugliest one and the strongest one would be the Parent's only Child.
I need to be able to retrieve all of a Parent's Children (via the Child.ParentPk foreign key). I also need to be able to efficiently retrieve a Parent's ugliest and strongest child.
I need to be able to add and remove children, as well as change which ones are the ugliest and strongest.
I gather it raises a flag that the Parent and Child tables both refer to each other. Is there a better way to accomplish this sort of relationship?
I could add an IsUgliest and IsStrongest column to the Child table, but I want to efficiently ensure that there is one and only one ugliest and strongest child for each Parent (although they could be the same).
I also don't want to have to add indexes to the IsUgliest and IsStrongest columns in order to retrieve them quickly. The schema I've described avoids that, since the Pk columns are implicitly indexed anyway.
Is there a better way?
Any suggestions?
I prefer option 2 below as it would be the easiest to maintain and craft logic for.
Option 1
One way to accomplish this would be to declare a [strongest] column as NOT NULL UNIQUE when creating the table. You would then decide how you wanted to rank the strongest, e.g. 1 being the strongest.
Although this would allow you to do what you are asking, you would need some logic to rearrange the values in the table whenever you add or remove children. This would allow you to easily reassign the ugliest child based on the ascending/descending order you set for your ugliness/strength score.
Option 2
A second way to do this would be to still use the IsStrongest and IsUgliest columns, but set their type to Boolean, and then perform a check to make sure only one child is set to true when attempting to add a new ugliest or strongest child.
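An alternative to an application-side check, if your SQLite version supports partial indexes (3.8.0 and later), is to let the schema enforce the rule itself. A rough sketch, assuming the Booleans are stored as 0/1:

-- Illustrative DDL; column types are assumptions.
CREATE TABLE Child (
    Pk          INTEGER PRIMARY KEY,
    ParentPk    INTEGER NOT NULL REFERENCES Parent(Pk),
    IsStrongest INTEGER NOT NULL DEFAULT 0,  -- 0 or 1
    IsUgliest   INTEGER NOT NULL DEFAULT 0   -- 0 or 1
);

-- At most one strongest and one ugliest child per parent.
CREATE UNIQUE INDEX one_strongest_per_parent ON Child(ParentPk) WHERE IsStrongest = 1;
CREATE UNIQUE INDEX one_ugliest_per_parent   ON Child(ParentPk) WHERE IsUgliest = 1;

These partial indexes also make finding a parent's strongest or ugliest child fast, although they are exactly the kind of extra indexes the question hoped to avoid.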
Option 3
Use the table structure you provided to quickly keep track of the strongest and ugliest child right in the parent table. You would probably need an alternate way of tracking each child's strength and beauty so that you can easily determine which child is the parent's ugliest/strongest.
Additionally
If you have the possibility of having more than one parent, you should use a linking table that looks like this:
Parent_Child
------------
PCPK
Parent_FK
Child_FK
If we have a parent table and two child tables in a database, is it better to use joins to get the children, or to add a flag to distinguish them?
For example, the parent table is Person[Person_Name, Person_ID]. The first child table is Employee[Person_ID, Employee_ID, Department] and the other child is Customer[Person_ID, Location, Rank].
So, is it a good idea to add a flag [isEmployee] or [isCustomer] to the parent table (Person) and save the effort of joining the tables on "Person_ID"?
Another case would be with one child, for example, the parent table would be Member[Member_Name, Member_ID] and a child table GoldenMember[Member_ID, Phone_Number, EMail].
Now in this case, if I want to show the info for a specific Member, I need to do a join between the tables to see whether they are a Golden Member or not; but if the flag "isGolden" were in the Member table, it would save us a join?
So, which is better, and why?
Thanks in advance :)
There is no "better" unless you provide criteria for measurement of "goodness".
SQL's support for entity subtyping is inadequate. You can hack your way around any of the shortcomings that there are, but each hack will do no more than introduce new problems of its own.
Additional "Type" columns on the top level introduce the problem of database updating becoming more complex. Defective update procedures will corrupt the database's integrity.
Leaving out the additional "Type" columns at the top level will make the problem of formulating read queries more complex (more joins, notably). Many people would add here "and degrade performance", but it's unlikely that you will suffer noticeably from this.
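To illustrate the extra joins with the tables from the question: without "Type" columns on Person, a read query has to probe each subtype table, roughly like this (the IsEmployee/IsCustomer values here are derived in the query, not stored):

SELECT p.Person_ID,
       p.Person_Name,
       CASE WHEN e.Person_ID IS NOT NULL THEN 1 ELSE 0 END AS IsEmployee,
       CASE WHEN c.Person_ID IS NOT NULL THEN 1 ELSE 0 END AS IsCustomer
FROM Person p
LEFT JOIN Employee e ON e.Person_ID = p.Person_ID
LEFT JOIN Customer c ON c.Person_ID = p.Person_ID
WHERE p.Person_ID = ?

With stored "Type" columns the read becomes a single-table lookup, but every insert into or delete from Employee or Customer must then also keep the flags on Person correct.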
Choose which difficulty is the easiest to live with in your particular use case.
I am trying to design a table which contains sections; each section contains tasks, each task contains sub-tasks, and so on. I would like to do it with one table. Please let me know the best single-table approach that is scalable. I am pretty new to database design. Also, if a single table is not the best approach, please suggest what would be. I am using DB2.
Put quite simply, I would say use 1 table for tasks.
In addition to all its various other attributes, each task should have a primary identifier, and another column to optionally contain the identifier of its parent task.
If you are using DB2 for z/OS, then you will use a recursive query with a common table expression. Otherwise you can use a hierarchical recursive query in DB2 for i, or possibly in DB2 for LUW (Linux, Unix, Windows).
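A rough sketch of the single-table design and the recursive query, using illustrative names (the exact syntax may differ slightly between DB2 platforms):

-- One table holds sections, tasks and sub-tasks;
-- ParentTaskId is NULL for a top-level section.
CREATE TABLE Tasks (
    TaskId       INTEGER NOT NULL PRIMARY KEY,
    ParentTaskId INTEGER,
    TaskName     VARCHAR(100) NOT NULL
);

-- All descendants of task 1, to any depth, via a recursive common table expression.
WITH descendants (TaskId, TaskName, Depth) AS (
    SELECT TaskId, TaskName, 0
    FROM Tasks
    WHERE TaskId = 1
    UNION ALL
    SELECT t.TaskId, t.TaskName, d.Depth + 1
    FROM Tasks t, descendants d
    WHERE t.ParentTaskId = d.TaskId
)
SELECT TaskId, TaskName, Depth FROM descendants;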
Other designs requiring more tables, each specializing in a certain part of the task:subtask relationship, may needlessly introduce issues or limitations.
There are a few ways to do this.
One idea is to use two tables: Sections and Tasks
There could be a one-to-many relationship between the two. The Task table could be designed as a tree with a TaskId and a ParentTaskId, which means you can have Tasks that go n levels deep (sub-tasks of sub-tasks of sub-tasks, etc.). Every Task except for the root task will have a parent.
I guess you can also solve this by using a single table where you just add a section column to the Task table I described above.
Putting everything into one table, although convenient, will be inefficient in the long run. It would mean storing unnecessary repeated groups of data in your database, which is not processor- or memory-friendly at all. It would in fact violate the normalization rules, specifically the 1st Normal Form, which says that there should be no repeating groups in your table. It would also violate the 3rd Normal Form, which says there should be no (transitive) dependency of a non-primary-key column on another non-primary-key column.
To give you an illustration, I will put your design into one table. I will be guessing at the possible fields, but bear with it, because this is for the sake of discussion. Look at the illustration below:
If you look at the illustration above, SectionName, Taskname, TaskInitiator, TaskStartDate and TaskEndDate are unnecessarily repeated, which, as I mentioned earlier, is a violation of the 1st Normal Form.
Secondly, Taskname, TaskInitiator, TaskStartDate and TaskEndDate are functionally dependent on TaskID, which is not a primary key, rather than on SectionID, which in this case should be the primary key (if on a separate table). This is a violation of the 3rd Normal Form, which says that there should be no transitive dependence, i.e. no non-primary key should be dependent on another non-primary key.
Although there are instances where you have to de-normalize, I believe this one should be normalized. In my own estimation there should be three tables involved in your design, namely Sections, Tasks and SubTasks, like the ones below.
Section is related to Tasks, that is, a section could have many Tasks.
And Task is related to Sub-Tasks, that is, a Task could have many Sub-tasks.
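Since the referenced image may not come through here, a rough DDL sketch of those three tables (the column names are my guesses, based on the fields mentioned above):

CREATE TABLE Sections (
    SectionID   INTEGER NOT NULL PRIMARY KEY,
    SectionName VARCHAR(100) NOT NULL
);

CREATE TABLE Tasks (
    TaskID        INTEGER NOT NULL PRIMARY KEY,
    SectionID     INTEGER NOT NULL REFERENCES Sections(SectionID),
    Taskname      VARCHAR(100),
    TaskInitiator VARCHAR(100),
    TaskStartDate DATE,
    TaskEndDate   DATE
);

CREATE TABLE SubTasks (
    SubTaskID   INTEGER NOT NULL PRIMARY KEY,
    TaskID      INTEGER NOT NULL REFERENCES Tasks(TaskID),
    SubTaskName VARCHAR(100)
);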
If I understand correctly, the original poster does not know how many levels of hierarchy will be needed (hence "and so on"). His problem is to create a design that can hold a structure of any depth.
IMHO that is a complex issue that does not have a single answer. When implementing such a design you need to consider factors such as:
Will the structure be fairly constant? (How many writes?)
How often will this structure be read?
What operations will need to be possible? (Get all children objects of a given object? Get the parent object? Get the direct children?)
If the structure will be fairly constant, you could use the nested set model (http://en.wikipedia.org/wiki/Nested_set_model).
In this model the table has a 'left' and a 'right' column. A parent object's left and right values encompass the values of all of its children.
In that way you can list all the children of an object using a query like this:
SELECT child.id
FROM table AS parent
JOIN table AS child
ON child.left BETWEEN parent.left AND parent.right
AND child.right BETWEEN parent.left AND parent.right
WHERE
parent.id = #searchId
This design can be VERY fast to read, but it is also EXTREMELY costly when the structure changes (for example, when adding a child to any object you will have to update every object with a 'right' value higher than that of the inserted one).
If you need to be able to make changes to structure in real time you should probably use a design with two tables - one holding the objects, the second the structure (something like parentId, childId, differenceInHierarchyLevels).
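A sketch of that second variant, often called a closure table, with illustrative names:

CREATE TABLE objects (
    id   INTEGER NOT NULL PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

-- One row per ancestor/descendant pair, including each object
-- paired with itself (differenceInHierarchyLevels = 0).
CREATE TABLE object_paths (
    parentId                    INTEGER NOT NULL REFERENCES objects(id),
    childId                     INTEGER NOT NULL REFERENCES objects(id),
    differenceInHierarchyLevels INTEGER NOT NULL,
    PRIMARY KEY (parentId, childId)
);

-- All descendants of object 1:
SELECT o.id, o.name
FROM object_paths p
JOIN objects o ON o.id = p.childId
WHERE p.parentId = 1 AND p.differenceInHierarchyLevels > 0;

-- Direct children only:
SELECT o.id, o.name
FROM object_paths p
JOIN objects o ON o.id = p.childId
WHERE p.parentId = 1 AND p.differenceInHierarchyLevels = 1;

Adding a new leaf only requires inserting one path row per ancestor (plus the self row), so structural changes are far cheaper than with nested sets.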
I would like to refresh an entity and all its child collections. What is the best way to do this? I'm talking about NHibernate :)
I've read about session.Evict, session.Refresh...
But I'm still not sure if doing like:
void RefreshEntity<T>(T entity)
{
    session.Evict(entity);   // detach the possibly-stale instance from the session
    session.Refresh(entity); // then try to re-read its state from the database
}
would work exactly how I want it to work
Is it going to work? If not, what else can I do?
Refresh after Evict probably won't work.
Theoretically, Refresh alone should be enough. However, it has known issues when elements of child collections have been deleted.
Evict followed by Get usually gets things done.
Refresh(parentObject) would be a good option, but for me, it first fetched all children one by one with single requests. No batching, no subquery, no join. Very bad!
It helped to .Clear() the child collection of the parent object; I also evicted the child objects before.
(These children had previously been changed by an HQL update, because multiple inserts via parent/children SaveOrUpdate would cause expensive clustered index rebuilds.)
EDIT: I removed the HQL update again, since that query (decrementing the index by a unique, large number) was more expensive than hundreds of single-row updates in a batch. So I ended up with a simple SaveOrUpdate(parentObject), with no need to refresh.
The reason was a child collection with a unique constraint on ParentID and Index (a sequential number), which would result in uniqueness violations while updating the changed child items. So the index was first incremented by 1000000 (or some arbitrarily high number) for all children, and then, after the changes, decremented again.
If I have a parent and a child table filled with data, is it trivial to add a new table between them?
For example, before introduction the relationship is:
Parent -> Child
Then:
Parent -> New Table -> Child
In this case I'm referring to SQLite3 so a Child in this schema has a Foreign Key which matches the Primary Key of the Parent Table.
Thanks!
This may be too obvious, but...
How trivial it is will be dependent on how much code has already been written that needs to change with this. If this is a new app, with little code written, and you're just beginning to work on the design, then yes it's trivial. If you have tons of functions (whether it's external code, or DB code like stored procedures and views) accessing these tables expecting the original relationship then it becomes less trivial.
Changing it in the database should be relatively trivial, assuming you know enough SQL to populate the new table and set up the relations.
As with all development challenges, you just need to look at what will be affected, how, and determine how you're going to account for those changes.
All of this is a really long-winded way of saying "it depends on your situation".
I am not disagreeing with David at all, just being precise re a couple of aspects of the change.
If you have implemented reasonable standards, then the only code affected will be code that addresses the changed columns in Child (not New_Table). If you have not, then an unknown amount of code, which should not need to change, will have to change.
The second consideration is the quality of the Primary Key in Child. If you have Natural Relational Keys, the addition of New_Table has less impact and no data changes are required. If you have IDENTITY-type keys, then you may need to reload, or worse, "re-factor" the keys.
Last, introducing New_Table is a correction of a Normalisation error, which is a good thing. Consequently, certain Child.columns will become New_Table.columns, and New_Table can be loaded from the existing data. You need to do that correctly and completely in order to realise the performance gain from the correction. That may mean changing a couple more code segments.
If you have ANSI SQL, all the tasks are fairly straight-forward and easy.
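As a rough sketch of those tasks in SQLite (the names below are illustrative, and the real columns to move into New_Table depend on which Normalisation error is being corrected):

BEGIN TRANSACTION;

-- New intermediate table between Parent and Child.
CREATE TABLE New_Table (
    Id       INTEGER PRIMARY KEY,
    ParentId INTEGER NOT NULL REFERENCES Parent(Id)
);

-- Populate it from the existing data (here: one row per parent that has children).
INSERT INTO New_Table (ParentId)
SELECT DISTINCT ParentId FROM Child;

-- SQLite cannot redefine a foreign key in place, so rebuild Child to point at New_Table.
CREATE TABLE Child_new (
    Id         INTEGER PRIMARY KEY,
    NewTableId INTEGER NOT NULL REFERENCES New_Table(Id)
);

INSERT INTO Child_new (Id, NewTableId)
SELECT c.Id, n.Id
FROM Child c
JOIN New_Table n ON n.ParentId = c.ParentId;

DROP TABLE Child;
ALTER TABLE Child_new RENAME TO Child;

COMMIT;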