Generating a hierarchy - sql

I got the following question at a job interview and it completely stumped me, so I'm wondering if anybody out there can help explain it to me. Say I have the following table:
employees
--------------------------
id | name | reportsTo
--------------------------
1 | Alex | 2
2 | Bob | NULL
3 | Charlie | 5
4 | David | 2
5 | Edward | 8
6 | Frank | 2
7 | Gary | 8
8 | Harry | 2
9 | Ian | 8
The question was to write a SQL query that returned a table with a column for each employee's name and a column showing how many people are above that employee in the organization: i.e.,
hierarchy
--------------------------
name | hierarchyLevel
--------------------------
Alex | 1
Bob | 0
Charlie | 3
David | 1
Edward | 2
Frank | 1
Gary | 2
Harry | 1
Ian | 2
I can't even figure out where to begin writing this as a SQL query (a cursor, maybe?). Can anyone help me out in case I get asked a similar question to this again? Thanks.

The simplest example would be to use a (real or temporary) table, and add one level at a time (fiddle):
INSERT INTO hierarchy
SELECT id, name, 0
FROM employees
WHERE reportsTo IS NULL;
WHILE ((SELECT COUNT(1) FROM employees) <> (SELECT COUNT(1) FROM hierarchy))
BEGIN
INSERT INTO hierarchy
SELECT e.id, e.name, h.hierarchylevel + 1
FROM employees e
INNER JOIN hierarchy h ON e.reportsTo = h.id
AND NOT EXISTS(SELECT 1 FROM hierarchy hh WHERE hh.id = e.id)
END
Other solutions will be slightly different for each RDBMS. As one example, in SQL Server, you can use a recursive CTE to expand it (fiddle):
;WITH expanded AS
(
SELECT id, name, 0 AS level
FROM employees
WHERE reportsTo IS NULL
UNION ALL
SELECT e.id, e.name, level + 1 AS level
FROM expanded x
INNER JOIN employees e ON e.reportsTo = x.id
)
SELECT *
FROM expanded
ORDER BY id
Other solutions include recursive stored procedures, or even using dynamic SQL to iteratively increase the number of joins until everybody is accounted for.
Of course all these examples assume there are no cycles and everyone can be traced up the chain to a head honcho (reportsTo = NULL).

Related

In a query (no editing of tables) how do I join data without any similarities?

I Have a query that finds a table, here's an example one.
Name |Age |Hair |Happy | Sad |
Jon | 15 | Black |NULL | NULL|
Kyle | 18 |Blonde |YES |NULL |
Brad | 17 | Blue |NULL |YES |
Name and age come from one table in a database, hair color comes from a second which is joined, and happy and sad come from a third table.My goal would be to make the first line of the chart like this:
Name |Age |Hair |Happy |Sad |
Jon | 15 |Black |Yes |Yes |
Basically I want to get rid of the rows under the first and get the non NULL data joined to the right. The problem is that there is no column where the Yes values are on the Jon row, so I have no idea how to get them there. Any suggestions?
PS. With the data I am using I can't just put a 'YES' in the 'Jon' row and call it a day, I would need to find the specific value from the lower rows and somehow get that value in the boxes that are NULL.
Do you just want COALESCE()?
COALESCE(Happy, 'Yes') as happy
COALESCE() replaces a NULL value with another value.
If you want to join on a NULL value work with nested selects. The inner select gets an Id for NULLs, the outer select joins
select COALESCE(x.Happy, yn_table.description) as happy, ...
from
(select
t1.Happy,
CASE WHEN t1.Happy is null THEN 1 END as happy_id
from t1 ...) x
left join yn_table
on x.xhappy_id = yn_table.id
If you apply an ORDER BY to the query, you can then select the first row relative to this order with WHERE rownum = 1. If you don't apply an ORDER BY, then the order is random.
After reading your new comment...
the sense is that in my real data the yes under the other names will be a number of a piece of equipment. I want the numbers of the equipment in one row instead of having like 8 rows with only 4 ' yes' values and the rest null.
... I come to the conclusion that this a XY problem.
You are asking about a detail you think will solve your problem, instead of explaining the problem and asking how to solve it.
If you want to store several pieces of equipment per person, you need three tables.
You need a Person table, an Article table and a junction table relating articles to persons to equip them. Let's call this table Equipment.
Person
------
PersonId (Primary Key)
Name
optional attributes like age, hair color
Article
-------
ArticleId (Primary Key)
Description
optional attributes like weight, color etc.
Equipment
---------
PersonId (Primary Key, Foreign Key to table Person)
ArticleId (Primary Key, Foreign Key to table Article)
Quantity (optional, if each person can have only one of each article, we don't need this)
Let's say we have
Person: PersonId | Name
1 | Jon
2 | Kyle
3 | Brad
Article: ArticleId | Description
1 | Hat
2 | Bottle
3 | Bag
4 | Camera
5 | Shoes
Equipment: PersonId | ArticleId | Quantity
1 | 1 | 1
1 | 4 | 1
1 | 5 | 1
2 | 3 | 2
2 | 4 | 1
Now Jon has a hat, a camera and shoes. Kyle has 2 bags and one camera. Brad has nothing.
You can query the persons and their equipment like this
SELECT
p.PersonId, p.Name, a.ArticleId, a.Description AS Equipment, e.Quantity
FROM
Person p
LEFT JOIN Equipment e
ON p.PersonId = e.PersonId
LEFT JOIN Article a
ON e.ArticleId = a.ArticleId
ORDER BY p.Name, a.Description
The result will be
PersonId | Name | ArticleId | Equipment | Quantity
---------+------+-----------+-----------+---------
3 | Brad | NULL | NULL | NULL
1 | Jon | 4 | Camera | 1
1 | Jon | 1 | Hat | 1
1 | Jon | 5 | Shoes | 1
2 | Kyle | 3 | Bag | 2
2 | Kyle | 4 | Camera | 1
See example: http://sqlfiddle.com/#!4/7e05d/2/0
Since you tagged the question with the oracle tag, you could just use NVL(), which allows you to specify a value that would replace a NULL value in the column you select from.
Assuming that you want the 1st row because it contains the smallest age:
- wrap your query inside a CTE
- in another CTE get the 1st row of the query
- in another CTE get the max values of Happy and Sad of your query (for your sample data they both are 'YES')
- cross join the last 2 CTEs.
with
cte as (
<your query here>
),
firstrow as (
select name, age, hair from cte
order by age
fetch first row only
),
maxs as (
select max(happy) happy, max(sad) sad
from cte
)
select f.*, m.*
from firstrow f cross join maxs m
You can try this:
SELECT A.Name,
A.Age,
B.Hair,
C.Happy,
C.Sad
FROM A
INNER JOIN B
ON A.Name = B.Name
INNER JOIN C
ON A.Name = B.Name
(Assuming that Name is the key columns in the 3 tables)

How do I query for records that all have the same relationships (MS Access)?

I'm trying to query for records that share common relationships. In order to avoid gratuitous context, here is a hypothetical example from my favorite old-school Nintendo game:
Consider a table of boxers:
tableBoxers
ID | boxerName
----------------
1 | Little Mac
2 | King Hippo
3 | Von Kaiser
4 | Don Flamenco
5 | Bald Bull
Now I have a relationship table that links them together
boxingMatches
boxerID1 | boxerID2
-----------------------
1 | 3
2 | 5
2 | 4
5 | 1
4 | 1
Since I don't want to discriminate between ID1 and ID2, I create a query that UNIONs them together:
SELECT firstID AS boxerID1, secondID AS boxerID2 FROM
(
SELECT boxerID1 AS firstID, boxerID2 AS boxerID FROM boxingMatches
UNION ALL
SELECT boxerID2 AS firstID, boxerID1 AS secondID FROM boxingMatches
) ORDER BY firstID, secondID
I get:
queryBoxingMatches
boxerID1 | boxerID2
-----------------------
1 | 3
1 | 4
1 | 5
2 | 4
2 | 5
3 | 1
4 | 1
4 | 2
5 | 1
5 | 2
Now I have VBA script where a user can select the boxers he's interested in. Let's say he selects Little Mac (1) and King Hippo (2). This gets appended into a temporary table:
summaryRequest
boxerID
--------
1
2
Using table [summaryRequest] and [queryBoxingMatches], how do I find out whom Little Mac (1) and King Hippo (2) have similarly fought against? The result should be Bald Bull (5) and Don Flamenco (4).
Bear in mind that [summaryRequest] could have 0 or more records. I have considered an INTERSECT, but I'm not sure that's the right function for this. I've tried using COUNT numerous ways, but it gives undesired data when there are multiple relationships (e.g. if Little Mac fought Bald Bull twice and King Hippo only fought him once).
I can't help but feel like the answer is plain and simple and I'm just overthinking it. Any help is appreciated. Thanks.
Not sure if that works like this in MS Access:
SELECT boxerID2, COUNT(*) as cnt
FROM queryBoxingMatches
WHERE boxerID1 IN (SELECT boxerID FROM summaryRequest)
GROUP BY boxerID2
HAVING COUNT(DISTINCT boxerID1) = (SELECT COUNT(DISTINCT boxerID)
FROM summaryRequest)
Okay, I think I found the answer:
SELECT boxerID2 FROM
(
SELECT boxerID2, COUNT(boxerID2) AS getCount FROM queryBoxingMatches
WHERE boxerID1 IN (SELECT ID FROM summaryRequest)
GROUP BY boxerID2
)
WHERE getCount = (SELECT COUNT(*) FROM summaryRequest)
It's a play off of COUNT(), which I don't like because COUNT() doesn't guarantee a full-fledged relationship--just a pattern.

CTE to represent a logical table for the rows in a table which have the max value in one column

I have an "insert only" database, wherein records aren't physically updated, but rather logically updated by adding a new record, with a CRUD value, carrying a larger sequence. In this case, the "seq" (sequence) column is more in line with what you may consider a primary key, but the "id" is the logical identifier for the record. In the example below,
This is the physical representation of the table:
seq id name | CRUD |
----|-----|--------|------|
1 | 10 | john | C |
2 | 10 | joe | U |
3 | 11 | kent | C |
4 | 12 | katie | C |
5 | 12 | sue | U |
6 | 13 | jill | C |
7 | 14 | bill | C |
This is the logical representation of the table, considering the "most recent" records:
seq id name | CRUD |
----|-----|--------|------|
2 | 10 | joe | U |
3 | 11 | kent | C |
5 | 12 | sue | U |
6 | 13 | jill | C |
7 | 14 | bill | C |
In order to, for instance, retrieve the most recent record for the person with id=12, I would currently do something like this:
SELECT
*
FROM
PEOPLE P
WHERE
P.ID = 12
AND
P.SEQ = (
SELECT
MAX(P1.SEQ)
FROM
PEOPLE P1
WHERE P.ID = 12
)
...and I would receive this row:
seq id name | CRUD |
----|-----|--------|------|
5 | 12 | sue | U |
What I'd rather do is something like this:
WITH
NEW_P
AS
(
--CTE representing all of the most recent records
--i.e. for any given id, the most recent sequence
)
SELECT
*
FROM
NEW_P P2
WHERE
P2.ID = 12
The first SQL example using the the subquery already works for us.
Question: How can I leverage a CTE to simplify our predicates when needing to leverage the "most recent" logical view of the table. In essence, I don't want to inline a subquery every single time I want to get at the most recent record. I'd rather define a CTE and leverage that in any subsequent predicate.
P.S. While I'm currently using DB2, I'm looking for a solution that is database agnostic.
This is a clear case for window (or OLAP) functions, which are supported by all modern SQL databases. For example:
WITH
ORD_P
AS
(
SELECT p.*, ROW_NUMBER() OVER ( PARTITION BY id ORDER BY seq DESC) rn
FROM people p
)
,
NEW_P
AS
(
SELECT * from ORD_P
WHERE rn = 1
)
SELECT
*
FROM
NEW_P P2
WHERE
P2.ID = 12
PS. Not tested. You may need to explicitly list all columns in the CTE clauses.
I guess you already put it together. First find the max seq associated with each id, then use that to join back to the main table:
WITH newp AS (
SELECT id, MAX(seq) AS latestseq
FROM people
GROUP BY id
)
SELECT p.*
FROM people p
JOIN newp n ON (n.latestseq = p.seq)
ORDER BY p.id
What you originally had would work, or moving the CTE into the "from" clause. Maybe you want to use a timestamp field rather than a sequence number for the ordering?
Following up from #Glenn's answer, here is an updated query which meets my original goal and is on par with #mustaccio's answer, but I'm still not sure what the performance (and other) implications of this approach vs the other are.
WITH
LATEST_PERSON_SEQS AS
(
SELECT
ID,
MAX(SEQ) AS LATEST_SEQ
FROM
PERSON
GROUP BY
ID
)
,
LATEST_PERSON AS
(
SELECT
P.*
FROM
PERSON P
JOIN
LATEST_PERSON_SEQS L
ON
(
L.LATEST_SEQ = P.SEQ)
)
SELECT
*
FROM
LATEST_PERSON L2
WHERE
L2.ID = 12

How can I find all columns A whose subcategories B are all related to the same column C?

I'm trying to better understand relational algebra and am having trouble solving the following type of question:
Suppose there is a column A (Department), a column B (Employees) and a column C (Managers). How can I find all of the departments who only have one manager for all of their employees? An example is provided below:
Department | Employees | Managers
-------------+-------------+----------
A | John | Bob
A | Sue | Sam
B | Jim | Don
B | Alex | Don
C | Jason | Xie
C | Greg | Xie
In this table, the result I should get are all tuples containing departments B and C because all of their employees are managed by the same person (Don and Xie respectively). Department A however, would not be returned because it's employees have multiple managers.
Any help or pointers would be appreciated.
Such problems usually call for a self-join.
Joining the relation onto itself on Department, then filtering out the tuples where the Managers are equal would yield us all the unwanted tuples, which we can just subtract from the original relations.
Here's how I'd do it:
First we make a copy of table T, and call it T2, then take a cross product of T and T2. From the result we select all the rows where T1.Manager /= T2.Manager but T1.Department=T2.Department, yielding us these tuples:
T1.Department | T1.Employees| T1.Managers | T2.Managers | T2.Employees | T2.Department
--------------+-------------+-------------+-------------+--------------+--------------
A | John | Bob | Sam | Sue | A
A | Sue | Sam | Bob | John | A
Departments A and B aren't present because their T1.Manager always equals T2.Manager.
Then we just subtract this result the original set to get the answer.
If your RDBMS supports common table expressions:
with C as (
select department, manager, count(*) as cnt
from A
group by department, manager
),
B as (
select department, count(*) as cnt
from A group by department
)
select A.*
from A
join C on A.department = C.department
join B on A.department = B.department
where B.cnt = C.cnt;

SQL Server: Select hierarchically related items from one table

Say, I have an organizational structure that is 5 levels deep:
CEO -> DeptHead -> Supervisor -> Foreman -> Worker
The hierarchy is stored in a table Position like this:
PositionId | PositionCode | ManagerId
1 | CEO | NULL
2 | DEPT01 | 1
3 | DEPT02 | 1
4 | SPRV01 | 2
5 | SPRV02 | 2
6 | SPRV03 | 3
7 | SPRV04 | 3
... | ... | ...
PositionId is uniqueidentifier. ManagerId is the ID of employee's manager, referring PositionId from the same table.
I need a SQL query to get the hierarchy tree going down from a position, provided as parameter, including the position itself. I managed to develop this:
-- Select the original position itself
SELECT
'Rank' = 0,
Position.PositionCode
FROM Position
WHERE Position.PositionCode = 'CEO' -- Parameter
-- Select the subordinates
UNION
SELECT DISTINCT
'Rank' =
CASE WHEN Pos2.PositionCode IS NULL THEN 0 ELSE 1+
CASE WHEN Pos3.PositionCode IS NULL THEN 0 ELSE 1+
CASE WHEN Pos4.PositionCode IS NULL THEN 0 ELSE 1+
CASE WHEN Pos5.PositionCode IS NULL THEN 0 ELSE 1
END
END
END
END,
'PositionCode' = RTRIM(ISNULL(Pos5.PositionCode, ISNULL(Pos4.PositionCode, ISNULL(Pos3.PositionCode, Pos2.PositionCode)))),
FROM Position Pos1
LEFT JOIN Position Pos2
ON Pos1.PositionId = Pos2.ManagerId
LEFT JOIN Position Pos3
ON Pos2.PositionId = Pos3.ManagerId
LEFT JOIN Position Pos4
ON Pos3.PositionId = Pos4.ManagerId
LEFT JOIN Position Pos5
ON Pos4.PositionId = Pos5.ManagerId
WHERE Pos1.PositionCode = 'CEO' -- Parameter
ORDER BY Rank ASC
It works not only for 'CEO' but for any position, displaying its subordinates. Which gives me the following output:
Rank | PositionCode
0 | CEO
... | ...
2 | SPRV55
2 | SPRV68
... | ...
3 | FRMN10
3 | FRMN12
... | ...
4 | WRKR01
4 | WRKR02
4 | WRKR03
4 | WRKR04
My problems are:
The output does not include intermediate nodes - it will only output end nodes, i.e. workers and intermediate managers which have no subordinates. I need all intermediate managers as well.
I have to manually UNION the row with original position on top of the output. I there any more elegant way to do this?
I want the output to be sorted in hieararchical tree order. Not all DeptHeads, then all Supervisors, then all Foremen then all workers, but like this:
Rank | PositionCode
0 | CEO
1 | DEPT01
2 | SPRV01
3 | FRMN01
4 | WRKR01
4 | WRKR02
... | ...
3 | FRMN02
4 | WRKR03
4 | WRKR04
... | ...
Any help would be greatly appreciated.
Try a recursive CTE, the example on TechNet is almost identical to your problem I believe:
http://technet.microsoft.com/en-us/library/ms186243(v=sql.105).aspx
Thx, everyone suggesting CTE. I got the following code and it's working okay:
WITH HierarchyTree (PositionId, PositionCode, Rank)
AS
(
-- Anchor member definition
SELECT PositionId, PositionCode,
0 AS Rank
FROM Position AS e
WHERE PositionCode = 'CEO'
UNION ALL
-- Recursive member definition
SELECT e.PositionId, e.PositionCode,
Rank + 1
FROM Position AS e
INNER JOIN HierarchyTree AS d
ON e.ManagerId = d.PositionId
)
SELECT Rank, PositionCode
FROM HierarchyTree
GO
I had a similar problem to yours on a recent project but with a variable recursion length - typically between 1 and 10 levels.
I wanted to simplify the SQL side of things so I put some extra work into the logic of storing the recursive elements by storing a "hierarchical path" in addition to the direct manager Id.
So a very contrived example:
Employee
Id | JobDescription | Hierarchy | ManagerId
1 | DIRECTOR | 1\ | NULL
2 | MANAGER 1 | 1\2\ | 1
3 | MANAGER 2 | 1\3\ | 1
4 | SUPERVISOR 1 | 1\2\4 | 2
5 | SUPERVISOR 2 | 1\3\5 | 3
6 | EMPLOYEE 1 | 1\2\4\6 | 4
7 | EMPLOYEE 2 | 1\3\5\7 | 5
This means you have the power to very quickly query any level of the tree and get all descendants by using a LIKE query on the Hierarchy column
For example
SELECT * FROM dbo.Employee WHERE Hierarchy LIKE '\1\2\%'
would return
MANAGER 1
SUPERVISOR 1
EMPLOYEE 1
Additionally you can also easily get one level of the tree by using the ManagerId column.
The downside to this approach is you have to construct the hierarchy when inserting or updating records but believe me when I say this storage structure saved me a lot of pain later on without the need for unnecessary query complexity.
One thing to note is that my approach gives you the raw data - I then parse the result set into a recursive strongly typed structure in my services layer. As a rule I don't tend to format output in SQL.