I've got a database table that represents a bunch of trees. The first three columns are GUIDs that look like this:
NODE_ID (PK)
PARENT_NODE_ID (FK to same table, references NODE_ID)
TREE_ID (FK to another table)
It's possible to move a node to a different tree. The tricky part is bringing all its child-nodes with it. That takes a recursive update. (And yes, I realize this is kinda bad design in the first place. I didn't design it. I just have to maintain it, and I can't change the database schema.)
It would be nice if I could do the update in SQL, as a stored procedure. But I can't think of how to implement the recursive operation required in set logic, without employing a cursor. Does anyone know of a reasonably simple way to pull this off?
If you are using Postgres or MS SQL Server 2005 (or later) you can use a recursive common table expression (CTE) to drive the update; otherwise, you may want to consider a method other than an adjacency list. I saw a presentation a few weeks ago about exactly these issues of storing hierarchical data. Here is a link:
http://www.slideshare.net/billkarwin/practical-object-oriented-models-in-sql
Start at slide 40.
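To make that concrete: with a recursive CTE the whole move becomes a single set-based UPDATE, no cursor needed. A sketch using the column names from the question; the table name NODES and the parameter names are assumptions:

```sql
-- Sketch: move @NodeId and all of its descendants to @NewTreeId.
-- Table name NODES is an assumption; the columns are from the question.
CREATE PROCEDURE MoveSubtree
    @NodeId    UNIQUEIDENTIFIER,
    @NewTreeId UNIQUEIDENTIFIER
AS
BEGIN
    WITH Subtree AS (
        -- anchor: the node being moved
        SELECT NODE_ID
        FROM NODES
        WHERE NODE_ID = @NodeId
        UNION ALL
        -- recursive step: every node whose parent is already in the set
        SELECT n.NODE_ID
        FROM NODES n
        INNER JOIN Subtree s ON n.PARENT_NODE_ID = s.NODE_ID
    )
    UPDATE n
    SET n.TREE_ID = @NewTreeId
    FROM NODES n
    INNER JOIN Subtree s ON n.NODE_ID = s.NODE_ID;
END
```

You would typically also update the moved node's own PARENT_NODE_ID in the same procedure (to NULL, or to some node in the target tree), since its old parent stays behind.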
TASK
I am currently trying to work out a viable structure for a simple job-costing application. I have decided to create one table to house all the operations and then link the operations together via a ParentID field. Below is a simplified structure of this table:
As you can see, the primary key is an auto-incrementing integer field that keeps each row unique. Any operation that stems off another operation holds that operation's ID in its ParentID field, creating a simple breakdown of the work flow. This table also has a field for costs, which is the field I am most interested in.
THE PROBLEM
I would like to run a query where I could throw in an operation ID and it would recursively run through that operation AND all of its children, its children's children, etc., accumulating the cost fields of every record it retrieves. The only way I can think to do this is through recursive loops, which in my opinion are not the best way to do it.
THE QUESTION
So, my question is, is there a way to do this without recursive loops? If there is not, can anyone suggest the cleanest and quickest way with the loops?
This kind of query is recursive by definition. Unless your engine supports recursive queries (recursive CTEs in SQL Server 2005+ or Postgres 8.4+), there is no way to get that information from that table structure in a single set-based statement.
You could make another table in which you would store all hierarchy information. On inserting an Operation you would have to add a parent, grandparent, grand-...-parent recursively, which may also not be a good idea, because the table would grow very large very quickly. It would make the queries much simpler though.
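That said, on an engine with recursive CTE support the accumulation itself can be done in one statement, with no looping in application code. A sketch with illustrative table and column names (Operations, OperationID, ParentID, Cost):

```sql
-- Sum the cost of one operation and all of its descendants.
WITH OperationTree AS (
    -- anchor: the operation ID you "throw in"
    SELECT OperationID, Cost
    FROM Operations
    WHERE OperationID = @RootOperationID
    UNION ALL
    -- recursive step: children of anything already collected
    SELECT o.OperationID, o.Cost
    FROM Operations o
    INNER JOIN OperationTree t ON o.ParentID = t.OperationID
)
SELECT SUM(Cost) AS TotalCost
FROM OperationTree;
```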
And a side note: I'd suggest renaming ParentID to ParentOperationID. ParentID is too general.
I am attempting to store hierarchical data in SQL and have resolved to use
an object table, where all of the main data will be stored,
and a closure table, defining the relationships between the objects (read more on closure tables here [slides 40 to 68]).
After quite a bit of research, a closure table seemed to suit my needs well. One thing that I kept reading, however, is that if you want to query the direct ancestor / descendant of a particular node, you can use a depth column in your closure table (see slide 68 from the above link). I need this depth column to facilitate exactly that type of query. This is all well and good, but one of the main attractions of the closure table in the first place was the ease with which one could both query and modify the data contained therein. And adding a depth column seems to completely destroy the ease with which one can modify the data (imagine adding a new node and having to offset an entire branch of the tree).
So - I'm considering modifying my closure table to define relations only between a node and its immediate ancestor / descendant. This allows me to still easily traverse the tree. Querying data seems relatively easy. Modifying data is not as easy as the original closure table without the depth field, but significantly easier than the one with the depth field. It seems like a fair compromise (almost between a closure table and an adjacency list).
Am I overlooking something, though? Am I losing one of the key advantages of the closure table by doing it this way? Does anyone see any inherent risks in this approach that may come back to haunt me later?
I believe the key advantage you are losing is that if you want to know all of the descendants or ancestors of a node, you now have to do a lot more traversals.
For example, if you start with the following simple tree:
A->B->C->D
To get all descendants of A you have to go A->B then B->C then C->D. So, three queries, as opposed to a single query if following the normal pattern.
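To make the trade-off concrete, here is the same "all descendants of A" question under both designs (table and column names are illustrative):

```sql
-- Full closure table (every ancestor/descendant pair stored):
-- all descendants of A is a single lookup.
SELECT descendant
FROM TreeClosure
WHERE ancestor = 'A';

-- Immediate-parent-only closure: you are back to recursion,
-- e.g. a recursive CTE walking down one level at a time.
WITH Descendants AS (
    SELECT descendant
    FROM TreeClosure
    WHERE ancestor = 'A'
    UNION ALL
    SELECT c.descendant
    FROM TreeClosure c
    INNER JOIN Descendants d ON c.ancestor = d.descendant
)
SELECT descendant
FROM Descendants;
```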
SQL Server 2005.
In our application, we have an entity with a parent table as well as several child tables. We would like to track revisions made to this entity. After going back and forth, we've narrowed it down to two approaches to choose from.
1. Have one history table for the entity. Before a sproc updates the table, retrieve the entire current state of the entity from the parent table and all child tables, XMLize it, and stick it into the history table as the XML data type. Include some columns to query by, as well as a revision number/created date.
2. For each table, create a matching history table with the same columns, plus a revision number/created date. Before a sproc updates a single table, retrieve the existing state of the record for that one table and copy it into the history table. So, it's a little bit like SVN. If I want to get an entity at revision Y, I need to get the history record in each table with the maximum revision number that is not greater than Y. An entity might have 50 revision records in one table, but only 3 revision records in a child table, etc. I would probably want to persist the revision counter for the entire entity somewhere.
Both approaches seem to have their headaches, but I still prefer solution #2 to solution #1. This is a database that's already huge, and already suffers from performance issues. Bloating it with XML blobs on every revision (and there will be plenty) seems like a horrible way to go. Creating history tables for everything is a cost I'm willing to eat, as long as there's not a better way to do this.
Any suggestions?
Thanks,
Tedderz
Number 2 is almost certainly the way to go, and I do something like this with my history tables, though I use an "events" table as well to correlate the changes with one another instead of using a timestamp. I guess this is what you mean by a "revision counter". My "events" table contains a unique ID, a timestamp (of course), the application user responsible for the change, and an "action" designator which represents the application-level action that the user made which caused the change to happen.
Why #2? Because you can more easily partition the table to archive or roll-off old entries. Because it's easier to index. Because it's a WHOLE lot easier to query. Because it has less overhead than XML and is a lot smaller.
Also, consider using triggers instead of coding a stored procedure to do all of this. Triggers are usually best avoided, but for auditing like this they're a fairly lightweight and robust way to capture every change.
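A minimal sketch of the trigger approach, assuming a hypothetical Widget table and a matching Widget_History table with the same columns plus a revision date:

```sql
-- Copy the pre-change row(s) into the history table on every update/delete.
-- Table and column names here are placeholders for your own schema.
CREATE TRIGGER trg_Widget_History
ON Widget
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO Widget_History (WidgetID, Name, Price, RevisionDate)
    SELECT WidgetID, Name, Price, GETDATE()
    FROM deleted;   -- "deleted" holds the rows as they were before the change
END
```

An "events" table like the one described above would add a foreign key from Widget_History to the event row, so all table-level changes made by one application action share an event ID.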
I need to create a sql stored procedure (Sql Server 2008 - T-SQL) which copies a node in an adjacency model.
The table can be seen as having two columns, Id and ParentId (FK to Id). Copying means that all subordinates need to be copied as well.
I think that using WITH is a good start, but I'm curious if I can do this copy without using Cursors.
The fundamental problem with adjacency lists is that there is no general way in plain SQL to extract an entire subtree, so you already have the problem of identifying all the rows you need to duplicate without resorting to a cursor.
If possible, migrate your adjacency list to a nested set model, which allows you to easily identify all the nodes of a subtree. However, maintaining a nested set model is more complex for general inserts and deletes.
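For comparison, in a nested set model (lft/rgt columns; names illustrative) the whole subtree falls out of a single range predicate:

```sql
-- Every node whose position lies inside the root's (lft, rgt) interval
-- is a descendant of the root -- no recursion required.
SELECT child.*
FROM Nodes AS parent
INNER JOIN Nodes AS child
    ON child.lft BETWEEN parent.lft AND parent.rgt
WHERE parent.Id = @RootId;
```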
EDIT: As pointed out by 'a_horse_with_no_name' there is a way in general SQL to process adjacency lists, recursive common table expressions.
Copying a whole subtree is a bit of a problem, because when you copy your subtree you are either
denormalizing data, or
using it as a template of some sort.
In either case you are dragging data through an inconsistent state at some point, which indicates some problems with your design (for example, do your records need to have multiple parents or not? If yes, you should consider redesigning).
So, you should update the question with a more complete example of what you are trying to do.
One solution would be to use a temporary table. Selecting for the insert should not be a problem; it is just updating the referenced IDs that would be.
So:
1. WITH ... INSERT into the temporary table
2. UPDATE the IDs
3. INSERT into the original table
4. DELETE the temp records
The procedure needs to go like this because it would be hard to change the IDs (both the record IDs and the IDs referring to parents) in the initial WITH ... INSERT. It might be possible, though, if there were a nice function that depended only on max_id or only on the old IDs.
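If the keys happen to be GUIDs (or anything you can generate up front), the steps above collapse into two statements: build an old-to-new ID map for the whole subtree, then insert with parents remapped through that map. A sketch with illustrative names:

```sql
-- Step 1: collect the subtree and assign each row a fresh ID.
DECLARE @Map TABLE (
    OldId       UNIQUEIDENTIFIER,
    NewId       UNIQUEIDENTIFIER,
    OldParentId UNIQUEIDENTIFIER
);

WITH Subtree AS (
    SELECT Id, ParentId FROM Nodes WHERE Id = @SourceId
    UNION ALL
    SELECT n.Id, n.ParentId
    FROM Nodes n
    INNER JOIN Subtree s ON n.ParentId = s.Id
)
INSERT INTO @Map (OldId, NewId, OldParentId)
SELECT Id, NEWID(), ParentId
FROM Subtree;

-- Step 2: insert the copies, remapping each parent through the map.
-- The copied root has no parent in the map, so it falls back to @TargetParentId.
INSERT INTO Nodes (Id, ParentId)
SELECT m.NewId, COALESCE(p.NewId, @TargetParentId)
FROM @Map m
LEFT JOIN @Map p ON m.OldParentId = p.OldId;
```

With IDENTITY keys you cannot pre-generate the new IDs this way; on SQL Server 2008 the usual workaround is MERGE with an OUTPUT clause, which can capture old and new key pairs in one pass.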
I need to build a schema structure to support a table of contents (where the depth of sections / sub-sections could differ for each book or document I add). One of my first thoughts was that I could use a recursive (self-referencing) table to handle that. I want to make sure that my structure is normalized, so I was trying to stay away from denormalizing the table-of-contents data into a single flat table (and then having to add columns whenever there are more sub-sections).
It doesn't seem quite right to build a recursive table, and it could be kind of ugly to populate.
Just wanted to get some thoughts on some alternate solutions or if a recursive table is ok.
Thanks,
S
It helps that SQL Server 2008 has both the recursive WITH clause and hierarchyid to make working with hierarchical data easier - I was pointing out to someone yesterday that MySQL doesn't have either, making things difficult...
The most important thing is to review your data - if you can normalize it to be within a single table, great. But don't shoehorn it in to fit a single table setup - if it needs more tables, then design it that way. The data & usage will show you the correct way to model things.
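A minimal sketch of what the hierarchyid route looks like on SQL Server 2008 (table and column names are illustrative):

```sql
-- One row per TOC entry; the hierarchyid value encodes both the
-- position in the tree and the document order.
CREATE TABLE TocSection (
    SectionNode  hierarchyid PRIMARY KEY,
    SectionLevel AS SectionNode.GetLevel(),   -- computed nesting depth
    DocumentId   INT NOT NULL,
    Title        NVARCHAR(200) NOT NULL
);

-- A section and everything nested beneath it, in document order:
SELECT Title, SectionNode.ToString() AS Path
FROM TocSection
WHERE SectionNode.IsDescendantOf(@Section) = 1
ORDER BY SectionNode;
```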
When in doubt, keep it simple. Where you have a collection of similar items, e.g. employees, a table that references itself makes sense. Here you can argue (quite rightly) that each item within the table is a 'section' of some form or another; but unless you're comfortable modelling the data as generic sections and handling the different types of section through relationships to those entities, I would avoid the complexity of a self-referencing table and stick with a normalized approach.