What is the best way to setup hierarchy in sql server 2005? - sql-server-2005

I am trying to setup an hierarchial structure for company. I was using SQL Server 2008 and used hierarchy id to set it up. But now I need to move back to SQL Server 2005...and "just do it"... Anyways I thought of setting it up this simple way -
Id | ParentId | CompanyName | Desc
where ParentId is a int field which will store id of the parent. I guess the root will have its ParentId as zero. Are there better ways to setup the hierarchy structure?
I really don't have that complex requirements for hierarchy... I want to know just what would make I guess traversing though the hierarchy easier and working with it more efficient.

The "simple way" is fine and works well with CTEs (Common Table Expression). However, as suggested by Kev, there are other ways which have their pros and cons.
So in the end it depends on your exact requirements, and how much insert vs. hierarchical queries will be done on the data because the performance of the different approaches can vary a lot in this regard.

Joe Celko's method of using nested sets where your table has a "left" and "right" column referring to the hierarchy is how I have usually seen it done
Joe Celko will probably explain it better than I can
Nested sets

Unfortunately, as far as I know the way you are setting it up is the correct way. You can't traverse the links as easily now because you lose GetAncestor and GetDescendant. A decent replacement is to use CTEs to replace GetAncestor and GetDescendant, and use them recursively.
Here is an example (using a menu hierarchy):
WITH MenuCTE(MenuKey, ParentMenuKey, MenuName) AS
(
-- Anchor Query
SELECT MenuKey, ParentMenuKey, MenuName FROM Menu WHERE MenuKey = 1
UNION ALL
-- Recursive Query
SELECT m.MenuKey, m.ParentMenuKey, m.MenuName FROM Menu m
INNER JOIN MenuCTE r ON m.ParentMenuKey = r.MenuKey
)
SELECT MenuKey, ParentMenuKey, MenuName FROM MenuCTE
This article should help (example is from here):
http://www.infoq.com/news/2007/10/CTE

Related

store hierarchy information in a MS Access DB for faster queries

I'm trying to figure out how I can store hierarchical type information in a MS Access DB so that queries will be faster. An use case example might make more sense.
I have a table that has two fields
a name
a hierarchy
a hierarchy is an X # of level folder structure:
\a\b\c\d
\a\b\c\d\e
\a\b\c\d\f\g
\a\b\h
\a\b\i\j
you get the idea
the table will be filled with 300,000 rows
each row will have a name and a hierarchy
At this point:
if I want to find all the names that are in a hierarchy, including sub-hierarchies I can run a like query: where [hierarchy] like '\a\b\*'
I can even do wildcard joins even though MS Access's query design GUI doesn't handle it and I have to use the SQL view: join on [hierarchy] like '\a\b\*'.
But it can be very slow. Especially if my joins get complex.
So I thought maybe there is a way to create another table that would all the hierarchies and it would maintain parent/child relationships and the first table would reference a row in it. And then, somehow, I could use it to find rows in the first table that match hierarchies and sub-hierarchies in the second table.
However, I have no clue if this is even possible and how I would go about it. Any advice is appreciated.
In Oracle we use the hierarchal structure where each row has a reference to its parent. Then with the CONNECT BY clause you can connect these rows to each-other.
You should take a look here: simulation of connect-by in sql-server

Selecting all descendant rows from an Oracle table representing a tree structure

I have a table MYTYPE in Oracle 10g representing a tree structure, which is something like this:
ID | PARENTID | DETAIL
I would like to select all rows in MYTYPE which are descendants of a particular ID, such that I can create queries elsewhere such as:
SELECT *
FROM MYDETAIL
WHERE MYTYPEID IN [all MYTYPE which are descendants of some ID];
What is a cost-efficient way of building the descendant set, preferably without using PL/SQL?
Oracle didn't support the ANSI hierarchical syntax of using a recursive Subquery Factoring (CTE in SQL Server syntax) until 11g R2, so you have to use Oracle's native CONNECT BY syntax (supported since v2):
SELECT t.*
FROM MYTABLE t
START WITH t.parentid = ?
CONNECT BY PRIOR t.id = t.parentid
Replace the question mark with the parent you want to find the hierarchical data based on.
Reference:
ASK TOM: "connect by"
Managing hierarchical data using ID,ParentID columns in an RDBMS is known as the Adjacency List model. While very easy to implement and maintain (i.e. insert, update, delete), it's expensive to determine lineage (i.e. ancestors and descendants). As other answers already have written, Oracle's CONNECT BY will work, but this is an expensive operation. You may be better off representing your data differently.
For your case, the easiest solution might adding a what's called a Hierarchy Bridge table to your schema and adding a LEVEL column to your original table. The table has columns ID,DescendantID whereby selecting on ID gives all descendant records, and selecting by DescentantID gives all ancestor records. LEVEL is necessary on the base table to order records. In this way you make a tradeoff of expensive updates for cheap reads, which is what your question implies you want.
Other possibilities that involve changing your base data include Nested Set and Materialized Path representations. That offer similar tradeoffs of more expensive writes for much cheaper reads. For a complete list of options, pros and cons, and some implementation notes, see my previous question on the topic.
Oracle can do recursive queries.
Try looking into start with ... connect by, something like this:
Select *
from MYDETAIL
Starting with PARENTID= 1 --or whatever the root ID is
connect by PARENTID = prior ID
http://psoug.org/reference/connectby.html
Here is the details for 'connect by' features in oracle.
http://psoug.org/reference/connectby.html

When to use recursive table

I have a need to build a schema structure to support table of contents (so the level of sections / sub-sections could change for each book or document I add)...one of my first thoughts was that I could use a recursive table to handle that. I want to make sure that my structure is normalized, so I was trying to stay away from deonormalising the table of contents data into a single table (then have to add columns when there are more sub-sections).
It doesn't seem right to build a recursive table and could be kind of ugly to populate.
Just wanted to get some thoughts on some alternate solutions or if a recursive table is ok.
Thanks,
S
It helps that SQL Server 2008 has both the recursive WITH clause and hierarchyid to make working with hierarchical data easier - I was pointing out to someone yesterday that MySQL doesn't have either, making things difficult...
The most important thing is to review your data - if you can normalize it to be within a single table, great. But don't shoehorn it in to fit a single table setup - if it needs more tables, then design it that way. The data & usage will show you the correct way to model things.
When in doubt, keep it simple. Where you've a collection of similar items, e.g. employees then a table that references itself makes sense. Whilst here you can argue (quite rightly) that each item within the table is a 'section' of some form or another, unless you're comfortable with modelling the data as sections and handling the different types of sections through relationships to these entities, I would avoid the complexity of a self-referencing table and stick with a normalized approach.

Any way to recursively update a tree in SQL?

I've got a database table that represents a bunch of trees. The first three columns are GUIDs that look like this:
NODE_ID (PK)
PARENT_NODE_ID (FK to same table, references NODE_ID)
TREE_ID (FK to another table)
It's possible to move a node to a different tree. The tricky part is bringing all its child-nodes with it. That takes a recursive update. (And yes, I realize this is kinda bad design in the first place. I didn't design it. I just have to maintain it, and I can't change the database schema.)
It would be nice if I could do the update in SQL, as a stored procedure. But I can't think of how to implement the recursive operation required in set logic, without employing a cursor. Does anyone know of a reasonably simple way to pull this off?
If you are using Postgres or MS SQL 2005 you can use a recursive update, otherwise, you may want to consider using a method other than an adjacency list. I saw a presentation a few weeks ago speaking about these issues and storing hierarchical data. Here is a link:
http://www.slideshare.net/billkarwin/practical-object-oriented-models-in-sql
Start # slide 40

Which are the SQL improvements you are waiting for?

Dealing with SQL shows us some limitations and gives us an opportunity to imagine what could be.
Which improvements to SQL are you waiting for? Which would you put on top of the wish list?
I think it can be nice if you post in your answer the database your feature request lacks.
T-SQL Specific: A decent way to select from a result set returned by a stored procedure that doesn't involve putting it into a temporary table or using some obscure function.
SELECT * FROM EXEC [master].[dbo].[xp_readerrorlog]
I know it's wildly unrealistic, but I wish they'd make the syntax of INSERT and UPDATE consistent. Talk about gratuitous non-orthogonality.
Operator to manage range of dates (or numbers):
where interval(date0, date1) intersects interval(date3, date4)
EDIT: Date or numbers, of course are the same.
EDIT 2: It seems Oracle have something to go, the undocumented OVERLAPS predicate. More info here.
A decent way of walking a tree with hierarchical data. Oracle has CONNECT BY but the simple and common structure of storing an object and a self-referential join back to the table for 'parent' is hard to query in a natural way.
More SQL Server than SQL but better integration with Source Control. Preferably SVN rather than VSS.
Implicit joins or what it should be called (That is, predefined views bound to the table definition)
SELECT CUSTOMERID, SUM(C.ORDERS.LINES.VALUE) FROM
CUSTOMER C
A redesign of the whole GROUP BY thing so that every expression in the SELECT clause doesn't have to be repeated in the GROUP BY clause
Some support for let expressions or otherwise more legal places to use an alias, a bit related to the GROUP BY thing, but I find other times what I just hate Oracle for forcing me to use an outer select just to reference a big expression by alias.
I would like to see the ability to use Regular Expressions in string handling.
A way of dynamically specifying columns/tables without having to resort to full dynamic sql that executes in another context.
Ability to define columns based on other columns ad infinitum (including disambiguation).
This is a contrived example and not a real world case, but I think you'll see where I'm going:
SELECT LTRIM(t1.a) AS [a.new]
,REPLICATE(' ', 20 - LEN([a.new])) + [a.new] AS [a.conformed]
,LEN([a.conformed]) as [a.length]
FROM t1
INNER JOIN TABLE t2
ON [a.new] = t2.a
ORDER BY [a.new]
instead of:
SELECT LTRIM(t1.a) AS [a.new]
,REPLICATE(' ', 20 - LEN(LTRIM(t1.a))) + LTRIM(t1.a) AS [a.conformed]
,LEN(REPLICATE(' ', 20 - LEN(LTRIM(t1.a))) + LTRIM(t1.a)) as [a.length]
FROM t1
INNER JOIN TABLE t2
ON LTRIM(t1.a) = t2.a
ORDER BY LTRIM(t1.a)
Right now, in SQL Server 2005 and up, I would use a CTE and build up in successive layers.
I'd like the vendors to actually standardise their SQL. They're all guilty of it. The LIMIT/OFFSET clause from MySQL and PostGresql is a good solution that no-one else appears to do. Oracle has it's own syntax for explicit JOINs whilst everyone else uses ANSI-92. MySQL should deprecate the CONCAT() function and use || like everyone else. And there are numerous clauses and statements that are outside the standard that could be wider spread. MySQL's REPLACE is a good example. There's more, with issues about casting and comparing types, quirks of column types, sequences, etc etc etc.
parameterized order by, as in:
select * from tableA order by #columName
Support in SQL to specify if you want your query plan to be optimized to return the first rows quickly, or all rows quickly.
Oracle has the concept of FIRST_ROWS hint, but a standard approach in the language would be useful.
Automatic denormalization.
But I may be dreaming.
Improved pivot tables. I'd like to tell it to automatically create the columns based on the keys found in the data.
On my wish list is a database supporting sub-queries in CHECK-constraints, without having to rely on materialized view tricks. And a database which supports the SQL standard's "assertions", i.e. constraints which may span more than one table.
Something else: A metadata-related function which would return the possible values of a given column, if the set of possible values is low. I.e., if a column has a foreign key to another column, it would return the existing values in the column being referred to. Of if the column has a CHECK-constraint like "CHECK foo IN(1,2,3)", it would return 1,2,3. This would make it easier to create GUI elements based on a table schema: If the function returned a list of two values, the programmer could decide that a radio button widget would be relevant - or if the function returned - e.g. - 10 values, the application showed a dropdown-widget instead. Etc.
UPSERT or MERGE in PostgreSQL. It's the one feature whose absence just boggles my mind. Postgres has everything else; why can't they get their act together and implement it, even in limited form?
Check constraints with subqueries, I mean something like:
CHECK ( 1 > (SELECT COUNT(*) FROM TABLE WHERE A = COLUMN))
These are all MS Sql Server/T-SQL specific:
"Natural" joins based on an existing Foreign Key relationship.
Easily use a stored proc result as a resultset
Some other loop construct besides while
Unique constraints across non NULL values
EXCEPT, IN, ALL clauses instead of LEFT|RIGHT JOIN WHERE x IS [NOT] NULL
Schema bound stored proc (to ease #2)
Relationships, schema bound views, etc. across multiple databases
WITH clause for other statements other than SELECT, it means for UPDATE and DELETE.
For instance:
WITH table as (
SELECT ...
)
DELETE from table2 where not exists (SELECT ...)
Something which I call REFERENCE JOIN. It joins two tables together by implicitly using the FOREIGN KEY...REFERENCES constraint between them.
A relational algebra DIVIDE operator. I hate always having to re-think how to do all elements of table a that are in all of given from table B.
http://www.tc.umn.edu/~hause011/code/SQLexample.txt
String Agregation on Group by (In Oracle is possible with this trick):
SELECT deptno, string_agg(ename) AS employees
FROM emp
GROUP BY deptno;
DEPTNO EMPLOYEES
---------- --------------------------------------------------
10 CLARK,KING,MILLER
20 SMITH,FORD,ADAMS,SCOTT,JONES
30 ALLEN,BLAKE,MARTIN,TURNER,JAMES,WARD
More OOP features:
stored procedures and user functions
CREATE PROCEDURE tablename.spname ( params ) AS ...
called via
EXECUTE spname
FROM tablename
WHERE conditions
ORDER BY
which implicitly passes a cursor or a current record to the SP. (similar to inserted and deleted pseudo-tables)
table definitions with inheritance
table definition as derived from base table, inheriting common columns etc
Btw, this is not necessarily real OOP, but only syntactic sugar on existing technology, but it would simplify development a lot.
Abstract tables and sub-classing
create abstract table person
(
id primary key,
name varchar(50)
);
create table concretePerson extends person
(
birth date,
death date
);
create table fictionalCharacter extends person
(
creator int references concretePerson.id
);
Increased temporal database support in Sql Server. Intervals, overlaps, etc.
Increased OVER support in Sql Server, including LAG, LEAD, and TOP.
Arrays
I'm not sure what's holding this back but lack of arrays lead to temp tables and related mess.
Some kind of UPGRADE table which allows to make changes on the table to be like the given:
CREATE OR UPGRADE TABLE
(
a VARCHAR,
---
)
My wish list (for SQLServer)
Ability to store/use multiple execution plans for a stored procedure concurrently and have the system automatically understand the best stored plan to use at each execution.
Currently theres one plan - if it is no longer optimal its used anyway or a brand new one is computed in its place.
Native UTF-8 storage
Database mirroring with more than one standby server and the ability to use a recovery model approaching 'simple' provided of course all servers are up and the transaction commits everywhere.
PCRE in replace functions
Some clever way of reusing fragments of large sql queries, stored match conditions, select conditions...etc. Similiar to functions but actually implemented more like preprocessor macros.
Comments for check constraints. With this feature, an application (or the database itself when raising an error) can query the metadata and retrieve that comment to show it to the user.
Automated dba notification in the case where the optimizer generates a plan different that the plan that that the query was tested with.
In other words, every query can be registered. At that time, the plan is saved. Later when the query is executed, if there is a change to the plan, the dba receives a notice, that something unexpected occurred.