Building a hierarchical tree with a single SQL query - sql

I have a SQL table with the following structure.
id - int
par - int (relational to id)
name - varchar
Column par contains references to id or NULL if no reference, this table is meant to build an hierarchical tree.
Then, given the data:
id par name
1 NULL John
2 NULL Mario
3 1 George
4 3 Alfred
5 4 Nicole
6 2 Margaret
I want to retrieve a hierarchical tree, up to the last parent, from a given single id.
Example, I want to know the tree from Nicole to the last parent. So the query result will be:
id par name
5 4 Nicole
4 3 Alfred
3 1 George
1 NULL John
I would normally do this with a SQL query repeating over and over and building the tree server side but I do not want that now.
Is there any way to achieve this with a single SQL query?
I need this for either MySQL or PgSQL.
And I want to know also, if possible, is it also widely supported? In which versions of either MySQL or PgSQL can I expect support?

It is possible with a single query in Postgres using a recursive common table expression. This is not possible in MySQL as it is one of the few database to not support recursive CTEs.
It would look something like this (not tested)
WITH RECURSIVE tree (id, par, name) AS (
SELECT id, par, name
FROM the_table
WHERE name = 'Nicole'
UNION ALL
SELECT id, par, name
FROM the_table tt
JOIN tree tr ON tr.id = tt.par
)
SELECT *
FROM tree

For Postgres, see http://www.postgresql.org/docs/8.4/static/queries-with.html
MySQL doesn't support this syntax (unless it's in a beta/development tree somewhere). Oracle has something similar using connect by prior.

This article is probably what you need to look at:
http://explainextended.com/2009/03/17/hierarchical-queries-in-mysql/

In Oracle, this is done via:
SELECT [[LEVEL,]] id, par, name FROM my_table
START WITH name = 'Nicole'
CONNECT BY [[NOCYCLE]] id = PRIOR par
[[ORDER SIBLINGS BY name ASC]]
(my [[…]] syntax denotes optional query bits.
MySQL is planning to integrate such a feature. For PostgreSQL there is another answer helping you.

Related

How to join 2 tables without common fields?

There are 2 tables:
Table 1: first_names
id | first_name
1 | Joey
7 | Ross
17| Chandler
Table 2: last_names
id | first_name
2 | Tribbiani
7 | Geller
25| Bing
Desired result:
id | full_name
1 | Joey Tribbiani
2 | Ross Geller
3 | Chandler Bing
Task:
Write the solution using only the simplest SQL syntax. Using store procedures, declaring variables, ROW_NUMBER(), RANK() functions are forbidden.
I have solution using ROW_NUMBER() function, but no ideas about solving this task using only the simplest SQL syntax.
P.S. I'm only trainee and it's my first question on stackoverflow
Simple join will suffice here
select * from first_names fn
join last_names ln on fn.id = ln.id - 1
But your question is very unclear though. Because join here is based rather on knowledge about Friends series rather than concrete logic...
You must create an id to join the tables.
This can be the order number in the table based in ids:
select
f.counter id, concat(f.first_name, ' ', l.last_name) full_name
from (
select t.*, (select count(*) from first_names where id < t.id) + 1 counter
from first_names t
) f inner join (
select t.*, (select count(*) from last_names where id < t.id) + 1 counter
from last_names t
) l
on l.counter = f.counter
See the demo.
Results:
> id | full_name
> -: | :-------------
> 1 | Joey Tribbiani
> 2 | Ross Geller
> 3 | Chandler Bing
Honestly, this is a stupid solution; it's vastly inefficient to ROW_NUMBER, and I wouldn't be surprised if LEAD is "not allowed" as ROW_NUMBER isn't. The fact that you were told to "use the simpliest SQL" means that the SQL you want to use is a subquery/CTE and ROW_NUMBER; that is as simple as this can really go. Anything else add a layer on unneeded complexity and will likely just make the query suffer from performance degradation. This one, for example, means you need to scan both tables twice; where as with ROW_NUMBER it would be once.
CREATE TABLE FirstNames (id int, FirstName varchar(10));
CREATE TABLE LastNames (id int, LastName varchar(10));
INSERT INTO FirstNames
VALUES(1,'Joey'),
(7,'Ross'),
(17,'Chandler');
INSERT INTO LastNames
VALUES (2,'Tribbiani'),
(7,'Geller'),
(25,'Bing');
GO
WITH CTE AS(
SELECT FN.id,
FN.FirstName,
LN.LastName
FROM FirstNames FN
LEFT JOIN LastNames LN ON FN.id = LN.id
UNION ALL
SELECT LN.id,
FN.FirstName,
LN.LastName
FROM LastNames LN
LEFT JOIN FirstNames FN ON LN.id = FN.id
WHERE FN.id IS NULL),
FullNames AS(
SELECT C.id,
C.FirstName,
ISNULL(C.LastName, LEAD(C.LastName) OVER (ORDER BY id)) AS LastName
FROM CTE C)
SELECT *
FROM FullNames FN
WHERE FN.FirstName IS NOT NULL
ORDER BY FN.id;
GO
DROP TABLE FirstNames;
DROP TABLE LastNames;
To answer the "Task" given:
"Task: Write the solution using only the simplest SQL syntax. Using store procedures, declaring variables, ROW_NUMBER(), RANK() functions are forbidden."
My answer would be the below?
"Why is this a requirement? SQL Server has supported ROW_NUMBER for 14 years, since SQL Server 2005. If you can't use ROW_NUMBER this infers you're using SQL Server 2000. This is actually a big security problem for the company, as 2000 has been out of support for close to a decade. Legislation like GDPR require a company to keep the technology they use secure, and it is very unlikely that this is therefore being met.
If this is the case, the solution if not the find a way around using ROW_NUMBER but to get the company back up to do date. The latest version of SQL Server that you can upgrade to from SQL Server 2000 is 2008; which also runs out of support on July 16 of this year. We'll need to get an instance up and running and get the existing features into this new server ASAP and get QA testing done as soon as possible. This needs to be the highest priority thing. After that we need to repeat the cycle to another version of SQL Server. The latest is 2017, which does support migration from 2008.
Once we've done that, we can then actually make use of ROW_NUMBER in the query; providing the simplest solution and also bringing the company back into a secure environment."
Sometimes requirements need to be challenged. From experience management can make some "stupid" requirements, because they don't understand the technology. When you're in an IT role, sometimes you will need to question those requirements and explain why the requirement isn't actually a good idea. Then, instead, you can aid Management to find the correct solution for the problem. At the end of the day, what they might be trying to fix could be an XY problem; and part of your troubleshooting will be to find out what X really is.

Recursive query

I have a table which contains the following fields
Supervisorid
Empid
This is just like a referral program. A guy can refer 3 guys under him i.e, 3 is referring three guys namely 4 5 8 similarly 4 is referring 9 10 and 11 likewise 8 is referring 12, 13 it goes like this..
I want a query to get all EmpId under Supervisor 3
Do you want us to write the solution for you, or explain a bit how recursive queries can be built up ?
An example of how they are built up is on http://publib.boulder.ibm.com/infocenter/db2luw/v8//topic/com.ibm.db2.udb.doc/ad/samples/clp/s-flt-db2.htm.
The IBM DB2 redbook has an entire chapter on SQL recursion.
The gist is that the following steps are generally involved:
you define a "seed". SELECT SUPID, EMPID, 1 AS LVL FROM EMP WHERE SUPID = 3;
you assign to this a name. WITH SRC AS <your seed here>
you define the way to go to the 'next level', starting from the seed, using the assigned name. SELECT SRC.SUPID, F.EMPID, SRC.LVL+1 FROM SRC, EMP WHERE SRC.EMPID=EMP.SUPID
you combine the two together (inside the WITH clause) WITH SRC AS <your seed here> UNION ALL <the other SELECT here>
(optionally) you define which columns to select. SELECT EMPID, LVL FROM SRC.

What the simplest way to sub-query a variable number of rows into fields of the parent query?

What the simplest way to sub-query a variable number of rows into fields of the parent query?
PeopleTBL
NameID int - unique
Name varchar
Data: 1,joe
2,frank
3,sam
HobbyTBL
HobbyID int - unique
HobbyName varchar
Data: 1,skiing
2,swimming
HobbiesTBL
NameID int
HobbyID int
Data: 1,1
2,1
2,2
The app defines 0-2 Hobbies per NameID.
What the simplest way to query the Hobbies into fields retrieved with "Select * from PeopleTBL"
Result desired based on above data:
NameID Name Hobby1 Hobby2
1 joe skiing
2 frank skiing swimming
3 sam
I'm not sure if I understand correctly, but if you want to fetch all the hobbies for a person in one row, the following query might be useful (MySQL):
SELECT NameID, Name, GROUP_CONCAT(HobbyName) AS Hobbies
FROM PeopleTBL
JOIN HobbiesTBL USING NameID
JOIN HobbyTBL USING HobbyID
Hobbies column will contain all hobbies of a person separated by ,.
See documentation for GROUP_CONCAT for details.
I don't know what engine are you using, so I've provided an example with MySQL (I don't know what other sql engines support this).
Select P.NameId, P.Name
, Min( Case When H2.HobbyId = 1 Then H.HobbyName End ) As Hobby1
, Min( Case When H2.HobbyId = 2 Then H.HobbyName End ) As Hobby2
From HobbyTbl As H
Join HobbiesTbl As H2
On H2.HobbyId = H.HobbyId
Join PeopleTbl As P
On P.NameId = H2.NameId
Group By P.NameId, P.Name
What you are seeking is called a crosstab query. As long as the columns are static, you can use the above solution. However, if you want to dynamic build the columns, you need to build the SQL statement in middle-tier code or use a reporting tool.

Loop through without Cursor in SQL Server 2005

I have a table OrganisationStructure like this:
OrganisationID INT
ParentOrganisationID INT
OrganisationName VARCHAR(64)
1 | 0 | Company
2 | 1 | IT DIVISION
3 | 2 | IT SYSTEM BUSINESS UNIT
4 | 1 | MARKETING DIVISION
5 | 4 | DONATION BUSINESS UNIT
I want to have a query that if the app passing let say OrganisatinID = 1 means that it will loop (looking at parent/child) through till end of this table and grap all possible Returned OrganisatioIDs = (1, 2, 3, 4, 5).
Other if passing OrganisationID = 2 then Returned OrganisationID = (2, 3)
Other if passing OrganisationID = 3 then Returned OrganisationID = 3
Any ideas to do this without cursor?
Thanks
You can use SQL 2005 CTEs to make the SQL engine do it recursively.
An enumeration of basic approaches is at http://blogs.msdn.com/anthonybloesch/archive/2006/02/15/Hierarchies-in-SQL-Server-2005.aspx
Celko also has a trees in SQL book which covers all of this to the nth degree.
Or you can brute force it by selecting each level into a local table variable and then looping, inserting children with a select, until your ##ROWCOUNT is zero (i.e., you're not finding any more children). If you don't have a lot of data, this is easy to code, but you hinted that you're looking for performance by saying you dont want a cursor.
declare #rootID int;
select #rootID = 4;
with cte_anchor as (
SELECT OrganisationID
, ParentOrganisationID
, OrganisationName
FROM Organisation
WHERE OrganisationID = #rootID)
, cte_recursive as (
SELECT OrganisationID
, ParentOrganisationID
, OrganisationName
FROM cte_anchor
UNION ALL
SELECT o.OrganisationID
, o.ParentOrganisationID
, o.OrganisationName
FROM Organisation o JOIN cte_recursive r
ON o.ParentOrganisationID = r.OrganisationID)
SELECT * FROM cte_recursive
In SqlServer 2005 with Common Table Expressions is possible to do recursive queries. For an example see 'Recursive Common Table Expressions' in Common Table Expressions (CTE) in SQL Server 2005 from 4guysfromrolla.
How many levels deep can your parent child structure go ?
You could do a self-join on the table to line up grand-parent / parent / child entities, but that's limited by the number of levels deep your parent/child relationships can go.
I know you've stated SQL 2005 but just so you're aware this kind of tree structure mapping is exactly what the new HierarchyID (Video Here) in Sql 2008 is for.
Try this for 3 levels using plain vanilla simple brute force - you can add levels as required.
SELECT DISTINCT OrganizationID
FROM
(
SELECT
ParentOrganizationID
FROM OrganizationStructure
WHERE ParentOrganizationID = #arg
UNION ALL
SELECT
OrganizationID
FROM OrganizationStructure
WHERE ParentOrganizationID = #arg
UNION ALL
SELECT os2.OrganizationID
FROM OrganizationStructure os
JOIN OrganizationStructure os2 ON os.OrganizationID = is2.ParentOrganizationID
WHERE os.ParentOrganizationID = #arg
) data
I believe the question is answered well enough, however if you're interested in alternative methods of structuring your data for better effect, google for 'evolt ways to work with hierarchical data'
I'm not allowed to post links yet :)

SQL - How to store and navigate hierarchies?

What are the ways that you use to model and retrieve hierarchical info in a database?
I like the Modified Preorder Tree Traversal Algorithm. This technique makes it very easy to query the tree.
But here is a list of links about the topic which I copied from the Zend Framework (PHP) contributors webpage (posted there by Posted by Laurent Melmoux at Jun 05, 2007 15:52).
Many of the links are language agnostic:
There is 2 main representations and algorithms to represent hierarchical structures with databases :
nested set also known as modified preorder tree traversal algorithm
adjacency list model
It's well explained here:
http://www.sitepoint.com/article/hierarchical-data-database
Managing Hierarchical Data in MySQL
http://www.evolt.org/article/Four_ways_to_work_with_hierarchical_data/17/4047/index.html
Here are some more links that I've collected:
http://en.wikipedia.org/wiki/Tree_%28data_structure%29
http://en.wikipedia.org/wiki/Category:Trees_%28structure%29
adjacency list model
http://www.sqlteam.com/item.asp?ItemID=8866
nested set
http://www.sqlsummit.com/AdjacencyList.htm
http://www.edutech.ch/contribution/nstrees/index.php
http://www.phpriot.com/d/articles/php/application-design/nested-trees-1/
http://www.dbmsmag.com/9604d06.html
http://en.wikipedia.org/wiki/Tree_traversal
http://www.cosc.canterbury.ac.nz/mukundan/dsal/BTree.html (applet java montrant le fonctionnement )
Graphes
http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html
Classes :
Nested Sets DB Tree Adodb
http://www.phpclasses.org/browse/package/2547.html
Visitation Model ADOdb
http://www.phpclasses.org/browse/package/2919.html
PEAR::DB_NestedSet
http://pear.php.net/package/DB_NestedSet
utilisation : https://www.entwickler.com/itr/kolumnen/psecom,id,26,nodeid,207.html
PEAR::Tree
http://pear.php.net/package/Tree/download/0.3.0/
http://www.phpkitchen.com/index.php?/archives/337-PEARTree-Tutorial.html
nstrees
http://www.edutech.ch/contribution/nstrees/index.php
The definitive pieces on this subject have been written by Joe Celko, and he has worked a number of them into a book called Joe Celko's Trees and Hierarchies in SQL for Smarties.
He favours a technique called directed graphs. An introduction to his work on this subject can be found here
What's the best way to represent a hierachy in a SQL database? A generic, portable technique?
Let's assume the hierachy is mostly read, but isn't completely static. Let's say it's a family tree.
Here's how not to do it:
create table person (
person_id integer autoincrement primary key,
name varchar(255) not null,
dob date,
mother integer,
father integer
);
And inserting data like this:
person_id name dob mother father
1 Pops 1900/1/1 null null
2 Grandma 1903/2/4 null null
3 Dad 1925/4/2 2 1
4 Uncle Kev 1927/3/3 2 1
5 Cuz Dave 1953/7/8 null 4
6 Billy 1954/8/1 null 3
Instead, split your nodes and your relationships into two tables.
create table person (
person_id integer autoincrement primary key,
name varchar(255) not null,
dob date
);
create table ancestor (
ancestor_id integer,
descendant_id integer,
distance integer
);
Data is created like this:
person_id name dob
1 Pops 1900/1/1
2 Grandma 1903/2/4
3 Dad 1925/4/2
4 Uncle Kev 1927/3/3
5 Cuz Dave 1953/7/8
6 Billy 1954/8/1
ancestor_id descendant_id distance
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
1 3 1
2 3 1
1 4 1
2 4 1
1 5 2
2 5 2
4 5 1
1 6 2
2 6 2
3 6 1
you can now run arbitary queries that don't involve joining the table back on itself, which would happen if you have the heirachy relationship in the same row as the node.
Who has grandparents?
select * from person where person_id in
(select descendant_id from ancestor where distance=2);
All your descendants:
select * from person where person_id in
(select descendant_id from ancestor
where ancestor_id=1 and distance>0);
Who are uncles?
select decendant_id uncle from ancestor
where distance=1 and ancestor_id in
(select ancestor_id from ancestor
where distance=2 and not exists
(select ancestor_id from ancestor
where distance=1 and ancestor_id=uncle)
)
You avoid all the problems of joining a table to itself via subqueries, a common limitation is 16 subsuqeries.
Trouble is, maintaining the ancestor table is kind of hard - best done with a stored procedure.
I've got to disagree with Josh. What happens if you're using a huge hierarchical structure like a company organization. People can join/leave the company, change reporting lines, etc... Maintaining the "distance" would be a big problem and you would have to maintain two tables of data.
This query (SQL Server 2005 and above) would let you see the complete line of any person AND calculates their place in the hierarchy and it only requires a single table of user information. It can be modified to find any child relationship.
--Create table of dummy data
create table #person (
personID integer IDENTITY(1,1) NOT NULL,
name varchar(255) not null,
dob date,
father integer
);
INSERT INTO #person(name,dob,father)Values('Pops','1900/1/1',NULL);
INSERT INTO #person(name,dob,father)Values('Grandma','1903/2/4',null);
INSERT INTO #person(name,dob,father)Values('Dad','1925/4/2',1);
INSERT INTO #person(name,dob,father)Values('Uncle Kev','1927/3/3',1);
INSERT INTO #person(name,dob,father)Values('Cuz Dave','1953/7/8',4);
INSERT INTO #person(name,dob,father)Values('Billy','1954/8/1',3);
DECLARE #OldestPerson INT;
SET #OldestPerson = 1; -- Set this value to the ID of the oldest person in the family
WITH PersonHierarchy (personID,Name,dob,father, HierarchyLevel) AS
(
SELECT
personID
,Name
,dob
,father,
1 as HierarchyLevel
FROM #person
WHERE personID = #OldestPerson
UNION ALL
SELECT
e.personID,
e.Name,
e.dob,
e.father,
eh.HierarchyLevel + 1 AS HierarchyLevel
FROM #person e
INNER JOIN PersonHierarchy eh ON
e.father = eh.personID
)
SELECT *
FROM PersonHierarchy
ORDER BY HierarchyLevel, father;
DROP TABLE #person;
FYI: SQL Server 2008 introduces a new HierarchyID data type for this sort of situation. Gives you control over where in the "tree" your row sits, horizontally as well as vertically.
Oracle: SELECT ... START WITH ... CONNECT BY
Oracle has an extension to SELECT that allows easy tree-based retrieval. Perhaps SQL Server has some similar extension?
This query will traverse a table where the nesting relationship is stored in parent and child columns.
select * from my_table
start with parent = :TOP
connect by prior child = parent;
http://www.adp-gmbh.ch/ora/sql/connect_by.html
I prefer a mix of the techinques used by Josh and Mark Harrison:
Two tables, one with the data of the Person and other with the hierarchichal info (person_id, parent_id [, mother_id]) if the PK of this table is person_id, you have a simple tree with only one parent by node (which makes sense in this case, but not in other cases like accounting accounts)
This hiarchy table can be transversed by recursive procedures or if your DB supports it by sentences like SELECT... BY PRIOR (Oracle).
Other posibility is if you know the max deep of the hierarchy data you want to mantain is use a single table with a set of columns per level of hierarchy
We had the same issue when we implemented a tree component for [fleXive] and used the nested set tree model approach mentioned by tharkun from the MySQL docs.
In addition to speed things (dramatically) up we used a spreaded approach which simply means we used the maximum Long value for the top level right bounds which allows us to insert and move nodes without recalculating all left and right values. Values for left and right are calculated by dividing the range for a node by 3 und use the inner third as bounds for the new node.
A java code example can be seen here.
If you're using SQL Server 2005 then this link explains how to retrieve hierarchical data.
Common Table Expressions (CTEs) can be your friends once you get comfortable using them.