TASK
I am currently trying to work out a viable structure for a simple application for the costing of jobs. I have decided to create one table to house all the operations and then link the operations together via a ParentID field. Below is a simplified structure of this table:
As you can see, the primary key is an auto-incrementing integer field, which keeps it unique. Any operation that stems from another operation stores that operation's ID in its ParentID field, which gives a simple breakdown of the workflow. The table also has a field for costs, and this is the field I am most interested in.
THE PROBLEM
I would like to run a query where I could throw in an operation ID and it would recursively run through that operation AND all of its children and its children's children etc. This would then accumulate all of the cost fields in the records that it retrieves. The only way I can think to do this is through recursive loops which in my opinion are not the best way to do this.
THE QUESTION
So, my question is, is there a way to do this without recursive loops? If there is not, can anyone suggest the cleanest and quickest way with the loops?
This kind of query is recursive by definition. There is no way to get that information without recursion using that table structure.
You could make another table in which you would store all hierarchy information. On inserting an Operation you would have to add a parent, grandparent, grand-...-parent recursively, which may also not be a good idea, because the table would grow very large very quickly. It would make the queries much simpler though.
And a side note: I'd suggest naming the ParentID ParentOperationID. ParentID is too general.
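If the database supports recursive common table expressions, the recursion can at least live inside a single query rather than in application loops. Below is a minimal sketch using Python's built-in sqlite3 module. The table name Operation, the Cost column, the sample rows, and the total_cost helper are all assumptions made for illustration, since the original table layout is not shown. SQL Server uses the same construct without the RECURSIVE keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Operation (
    OperationID INTEGER PRIMARY KEY AUTOINCREMENT,
    ParentOperationID INTEGER REFERENCES Operation(OperationID),
    Cost REAL NOT NULL
);
-- Hypothetical sample data: operation 1 has children 2 and 3,
-- operation 2 has child 4, and operation 5 is unrelated.
INSERT INTO Operation (OperationID, ParentOperationID, Cost) VALUES
    (1, NULL, 100.0),
    (2, 1, 20.0),
    (3, 1, 30.0),
    (4, 2, 5.0),
    (5, NULL, 999.0);
""")

def total_cost(conn, operation_id):
    """Sum Cost over the given operation and all of its descendants."""
    row = conn.execute("""
        WITH RECURSIVE subtree(OperationID, Cost) AS (
            -- Anchor: the operation we start from.
            SELECT OperationID, Cost FROM Operation WHERE OperationID = ?
            UNION ALL
            -- Recursive step: children of rows already in the subtree.
            SELECT o.OperationID, o.Cost
            FROM Operation AS o
            JOIN subtree AS s ON o.ParentOperationID = s.OperationID
        )
        SELECT COALESCE(SUM(Cost), 0) FROM subtree
    """, (operation_id,)).fetchone()
    return row[0]

print(total_cost(conn, 1))  # cost of operation 1 plus operations 2, 3 and 4
```

The query is still recursive, but the database does the traversal in one round trip, so no cursor or client-side loop is needed.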
I am attempting to store hierarchical data in SQL and have resolved to use
an object table, where all of the main data will be
and a closure table, defining the relationships between the objects (read more on closure tables here [slides 40 to 68]).
After quite a bit of research, a closure table seemed to suit my needs well. One thing that I kept reading, however, is that if you want to query the direct ancestor / descendant of a particular node, you can use a depth column in your closure table (see slide 68 from the above link). I need this depth column to facilitate exactly this type of query. This is all well and good, but one of the main attractions of the closure table in the first place was the ease with which one could both query and modify the data contained therein. Adding a depth column seems to completely destroy the ease with which one can modify data (imagine adding a new node and offsetting an entire branch of the tree).
So - I'm considering modifying my closure table to define relations only between a node and its immediate ancestor / descendant. This allows me to still easily traverse the tree. Querying data seems relatively easy. Modifying data is not as easy as the original closure table without the depth field, but significantly easier than the one with the depth field. It seems like a fair compromise (almost between a closure table and an adjacency list).
Am I overlooking something though? Am I losing one of the key advantages of the closure table by doing it this way? Does anyone see any inherent risks in doing it this way that may come back to haunt me later?
I believe the key advantage you are losing is that if you want to know all of the descendants or ancestors of a node, you now have to do a lot more traversals.
For example, if you start with the following simple tree:
A->B->C->D
To get all descendants of A you have to go A->B then B->C then C->D. So, three queries, as opposed to a single query if following the normal pattern.
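For comparison, here is a rough sketch of the normal closure-table pattern with a depth column, applied to the same A->B->C->D chain, using Python's built-in sqlite3 module. The table names node and closure and the helpers add_node and descendants are made up for this example. Adding a leaf only copies the parent's ancestor rows with depth + 1, so no existing rows need to change in that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id TEXT PRIMARY KEY);
CREATE TABLE closure (
    ancestor   TEXT NOT NULL REFERENCES node(id),
    descendant TEXT NOT NULL REFERENCES node(id),
    depth      INTEGER NOT NULL,
    PRIMARY KEY (ancestor, descendant)
);
""")

def add_node(conn, node_id, parent_id=None):
    """Insert a leaf node, its self row, and one closure row per ancestor."""
    conn.execute("INSERT INTO node (id) VALUES (?)", (node_id,))
    conn.execute(
        "INSERT INTO closure (ancestor, descendant, depth) VALUES (?, ?, 0)",
        (node_id, node_id),
    )
    if parent_id is not None:
        # Every path ending at the parent is extended by one step to the new node.
        conn.execute("""
            INSERT INTO closure (ancestor, descendant, depth)
            SELECT ancestor, ?, depth + 1 FROM closure WHERE descendant = ?
        """, (node_id, parent_id))

def descendants(conn, node_id, depth=None):
    """All descendants in one query; pass depth=1 for direct children only."""
    sql = "SELECT descendant FROM closure WHERE ancestor = ? AND depth > 0"
    params = [node_id]
    if depth is not None:
        sql += " AND depth = ?"
        params.append(depth)
    sql += " ORDER BY depth"
    return [row[0] for row in conn.execute(sql, params)]

for child, parent in [("A", None), ("B", "A"), ("C", "B"), ("D", "C")]:
    add_node(conn, child, parent)

print(descendants(conn, "A"))           # every descendant of A, one query
print(descendants(conn, "A", depth=1))  # direct children of A only
```

Moving an existing subtree is the expensive case for this design, because the closure rows that cross the moved branch have to be deleted and rebuilt.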
I need to create a sql stored procedure (Sql Server 2008 - T-SQL) which copies a node in an adjacency model.
Table can be seen as having two columns, Id and ParentId (FK to Id). Copying means that also all subordinates need to be copied.
I think that using WITH is a good start, but I'm curious if I can do this copy without using Cursors.
The fundamental problem with adjacency lists is there is no general way in SQL to extract an entire sub tree, so you already have a problem of identifying all the rows you need to duplicate without resorting to a cursor.
If possible migrate your adjacency list to a nested set model which allows you to easily identify all the nodes of a subtree. However, the maintenance of a nested set model is more complex for general inserts and deletes.
EDIT: As pointed out by 'a_horse_with_no_name', there is a way in general SQL to process adjacency lists: recursive common table expressions.
Copying a whole sub-tree is a bit of a problem because when you copy your sub-tree you are either
denormalizing data or
using it as a template of some sorts.
In either case you are dragging data through inconsistent state at some point - which indicates some problems with your design (for example do your records need to have multiple parents or not? if yes, then you should consider redesigning).
So, you should update the question with a more complete example of what you are trying to do.
One solution would be to use a temporary table. Selecting the rows for the insert should not be a problem; the hard part is updating the referenced IDs.
So
WITH INSERT into temporary table
UPDATE the IDs
INSERT into original table
DELETE temp records
The procedure needs to go like this because it would be hard to change the IDs (both record IDs and ID referring to parent) in initial WITH INSERT. However it might be possible, if there was a nice function that depended only on max_id or only on old IDs.
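A sketch of the same idea follows, with one deliberate swap: instead of a temporary table, the old-to-new ID mapping is held in a Python dict, and the example runs against SQLite through Python's sqlite3 module rather than SQL Server 2008. The table name Node, the sample rows, and the copy_subtree helper are assumptions for illustration. The subtree is read with a recursive common table expression ordered by level, so each parent is copied before its children.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Node (
    Id INTEGER PRIMARY KEY,
    ParentId INTEGER REFERENCES Node(Id)
);
-- Hypothetical sample tree: 1 -> (2, 3), 2 -> 4
INSERT INTO Node (Id, ParentId) VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
""")

def copy_subtree(conn, root_id):
    """Duplicate root_id and all of its descendants; return the new root's Id."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(Id, ParentId, Level) AS (
            SELECT Id, ParentId, 0 FROM Node WHERE Id = ?
            UNION ALL
            SELECT n.Id, n.ParentId, s.Level + 1
            FROM Node AS n
            JOIN subtree AS s ON n.ParentId = s.Id
        )
        SELECT Id, ParentId FROM subtree ORDER BY Level
    """, (root_id,)).fetchall()
    if not rows:
        raise ValueError(f"no node with Id {root_id}")

    new_id = {}  # old Id -> Id of its copy
    for old_id, old_parent in rows:
        # The copied root stays under the original parent. Every other copy
        # points at the copy of its parent, which already exists because
        # the rows are ordered by level.
        parent = old_parent if old_id == root_id else new_id[old_parent]
        cur = conn.execute("INSERT INTO Node (ParentId) VALUES (?)", (parent,))
        new_id[old_id] = cur.lastrowid
    return new_id[root_id]

new_root = copy_subtree(conn, 2)
```

This issues one INSERT per copied row, so it is not purely set-based. A temporary mapping table, as outlined in the steps above, would keep the whole procedure inside the database.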
I need to build a schema structure to support a table of contents (so the level of sections / sub-sections could change for each book or document I add). One of my first thoughts was that I could use a recursive table to handle that. I want to make sure that my structure is normalized, so I was trying to stay away from denormalizing the table of contents data into a single table (and then having to add columns when there are more sub-sections).
It doesn't seem right to build a recursive table and could be kind of ugly to populate.
Just wanted to get some thoughts on some alternate solutions or if a recursive table is ok.
Thanks,
S
It helps that SQL Server 2008 has both the recursive WITH clause and hierarchyid to make working with hierarchical data easier - I was pointing out to someone yesterday that MySQL doesn't have either, making things difficult...
The most important thing is to review your data - if you can normalize it to be within a single table, great. But don't shoehorn it in to fit a single table setup - if it needs more tables, then design it that way. The data & usage will show you the correct way to model things.
When in doubt, keep it simple. Where you've a collection of similar items, e.g. employees then a table that references itself makes sense. Whilst here you can argue (quite rightly) that each item within the table is a 'section' of some form or another, unless you're comfortable with modelling the data as sections and handling the different types of sections through relationships to these entities, I would avoid the complexity of a self-referencing table and stick with a normalized approach.
I've got a database table that represents a bunch of trees. The first three columns are GUIDs that look like this:
NODE_ID (PK)
PARENT_NODE_ID (FK to same table, references NODE_ID)
TREE_ID (FK to another table)
It's possible to move a node to a different tree. The tricky part is bringing all its child-nodes with it. That takes a recursive update. (And yes, I realize this is kinda bad design in the first place. I didn't design it. I just have to maintain it, and I can't change the database schema.)
It would be nice if I could do the update in SQL, as a stored procedure. But I can't think of how to implement the recursive operation required in set logic, without employing a cursor. Does anyone know of a reasonably simple way to pull this off?
If you are using Postgres or MS SQL 2005 or later, you can drive the update with a recursive common table expression. Otherwise, you may want to consider using a method other than an adjacency list. I saw a presentation a few weeks ago about these issues and about storing hierarchical data. Here is a link:
http://www.slideshare.net/billkarwin/practical-object-oriented-models-in-sql
Start at slide 40.
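As a rough illustration of such a recursive update, here is a sketch against SQLite using Python's sqlite3 module. The table name nodes, the short string IDs standing in for GUIDs, and the move_subtree helper are assumptions, not part of the original schema. Postgres and SQL Server have their own syntax for combining a CTE with UPDATE, so treat this as the shape of the idea rather than portable SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    NODE_ID        TEXT PRIMARY KEY,
    PARENT_NODE_ID TEXT REFERENCES nodes(NODE_ID),
    TREE_ID        TEXT NOT NULL
);
-- Hypothetical data: tree t1 is n1 -> (n2 -> n3, n4); tree t2 is just m1.
INSERT INTO nodes VALUES
    ('n1', NULL, 't1'),
    ('n2', 'n1', 't1'),
    ('n3', 'n2', 't1'),
    ('n4', 'n1', 't1'),
    ('m1', NULL, 't2');
""")

def move_subtree(conn, node_id, new_parent_id, new_tree_id):
    """Re-parent node_id, then set TREE_ID on it and all of its descendants."""
    conn.execute(
        "UPDATE nodes SET PARENT_NODE_ID = ? WHERE NODE_ID = ?",
        (new_parent_id, node_id),
    )
    # One set-based UPDATE; the subquery walks the subtree, no cursor needed.
    conn.execute("""
        UPDATE nodes SET TREE_ID = ?
        WHERE NODE_ID IN (
            WITH RECURSIVE subtree(NODE_ID) AS (
                SELECT NODE_ID FROM nodes WHERE NODE_ID = ?
                UNION ALL
                SELECT n.NODE_ID
                FROM nodes AS n
                JOIN subtree AS s ON n.PARENT_NODE_ID = s.NODE_ID
            )
            SELECT NODE_ID FROM subtree
        )
    """, (new_tree_id, node_id))

move_subtree(conn, "n2", "m1", "t2")
```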
I feel that this is likely a common problem, but from my Google searching I can't find a solution specific to my problem.
I have a list of Organizations (table) in my database and I need to be able to run queries based on their hierarchy. For example, if you query the highest Organization, I would want to return the IDs of all the Organizations listed under that Organization. Further, if I query an Organization somewhere in the middle of the hierarchy, I want only the Organization IDs listed under that Organization.
What is the best way to a) set up the database schema and b) query? I want to only have to send the topmost Organization ID and then get the IDs under that Organization.
I think that makes sense, but I can clarify if necessary.
As promised in my comment, I dug up an article on how to store hierarchies in a database in a way that allows an arbitrary subtree to be retrieved with a single range query. I think it will suit your needs much better than the answer currently marked as accepted, both in ease of use and speed of access. I could swear I saw this same concept on Wikipedia originally, but I can't find it now. It's apparently called a "modified preorder tree traversal". The gist of it is that you number each node in the tree twice while doing a depth-first traversal: once on the way down, and once on the way back up (i.e. when you're unrolling the stack, in a recursive implementation). This means that the descendants of a given node have all their numbers in between the two numbers of that node. Throw an index on those columns and you've got really fast lookups. I'm sure that's a terrible explanation, so read the article, which goes into more depth and includes pictures.
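To make the numbering concrete, here is a small sketch in Python with the built-in sqlite3 module. The column names lft and rgt, the org table, the sample organizations, and both helper functions are assumptions chosen for illustration and are not taken from the article.

```python
import sqlite3

def number_tree(children, root):
    """Return {node: (lft, rgt)} from a depth-first walk of the tree."""
    bounds = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        lft = counter                  # numbered on the way down
        for child in children.get(node, []):
            visit(child)
        counter += 1
        bounds[node] = (lft, counter)  # numbered on the way back up

    visit(root)
    return bounds

# Hypothetical hierarchy: Company -> (Sales -> (Europe, Asia), Support)
children = {"Company": ["Sales", "Support"], "Sales": ["Europe", "Asia"]}
bounds = number_tree(children, "Company")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE org (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
conn.execute("CREATE INDEX org_lft_rgt ON org (lft, rgt)")
conn.executemany(
    "INSERT INTO org VALUES (?, ?, ?)",
    [(name, lft, rgt) for name, (lft, rgt) in bounds.items()],
)

def descendants(conn, name):
    """Names of all nodes whose numbers fall strictly inside name's range."""
    return [row[0] for row in conn.execute("""
        SELECT d.name
        FROM org AS d
        JOIN org AS p ON d.lft > p.lft AND d.rgt < p.rgt
        WHERE p.name = ?
        ORDER BY d.lft
    """, (name,))]

print(descendants(conn, "Sales"))  # the organizations nested under Sales
```

The trade-off is on the write side: inserting or moving a node means renumbering every node to its right in the traversal.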
One simple way is to store the organization's parentage in a text field, like:
SALES-EUROPE-NORTH
To search for every sales organization, you can query on SALES-%. For each European sales org, query on SALES-EUROPE-%.
If you rename an organization, take care to update its child organizations as well.
This keeps it simple, without recursion, at the cost of some flexibility.
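A minimal sketch of this path approach, using Python's sqlite3 module, might look like the following. The org table, the path column, and the helpers orgs_under and rename are hypothetical names. It also assumes organization names never contain '-', '%' or '_', since those characters would confuse the prefix match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE org (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL)")
conn.executemany("INSERT INTO org (path) VALUES (?)", [
    ("SALES",), ("SALES-EUROPE",), ("SALES-EUROPE-NORTH",),
    ("SALES-ASIA",), ("SUPPORT",),
])

def orgs_under(conn, path):
    """Paths of every organization below the given path."""
    return [row[0] for row in conn.execute(
        "SELECT path FROM org WHERE path LIKE ? ORDER BY path",
        (path + "-%",),
    )]

def rename(conn, old_path, new_path):
    """Rename an organization and rewrite the path prefix of all its children."""
    conn.execute("""
        UPDATE org
        SET path = ? || substr(path, ?)
        WHERE path = ? OR path LIKE ?
    """, (new_path, len(old_path) + 1, old_path, old_path + "-%"))

print(orgs_under(conn, "SALES-EUROPE"))  # organizations below SALES-EUROPE
rename(conn, "SALES-EUROPE", "SALES-EMEA")
```

Because the pattern has a fixed prefix, some databases can satisfy it from an index on path, though whether that happens depends on the engine and its collation settings.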
The easy way is to have a ParentID column, which is a foreign key to the ID column in the same table, NULL for root nodes. But this method has some drawbacks.
Nested sets are an efficient way to store trees in a relational database.
You could give Organization an id primary key and a parent foreign key that references id. Then for the query, use (if your database backend supports them) recursive queries, also known as recursive Common Table Expressions.