How to Traverse a Tree / Work with Hierarchical Data in SQL Code

Say I have an employee table, with a record for each employee in my company and a column for supervisor (as seen below). I would like to prepare a report that lists the names and titles for each step in a supervision line. E.g. for Dick Robbins, id #15, I'd like a list of each supervisor in his "chain of command," all the way up to the president, Big Cheese. I'd like to avoid using cursors, but if that's the only way to do this, then that's OK.
id | fname   | lname   | title           | supervisorid
---+---------+---------+-----------------+-------------
1  | big     | cheese  | president       | 1
2  | jim     | william | vice president  | 1
3  | sally   | carr    | vice president  | 1
4  | ryan    | allan   | senior manager  | 2
5  | mike    | miller  | manager         | 4
6  | bill    | bryan   | manager         | 4
7  | cathy   | maddy   | foreman         | 5
8  | sean    | johnson | senior mechanic | 7
9  | andrew  | koll    | senior mechanic | 7
10 | sarah   | ryans   | mechanic        | 8
11 | dana    | bond    | mechanic        | 9
12 | chris   | mcall   | technician      | 10
13 | hannah  | ryans   | technician      | 10
14 | matthew | miller  | technician      | 11
15 | dick    | robbins | technician      | 11
The real data probably won't be more than 10 levels deep, but I'd rather not just do 10 outer joins. I was hoping there was something better than that, and less involved than cursors.
Thanks for any help.

This is basically a port of the accepted answer on my question that I linked to in the OP comments.
You can use common table expressions (CTEs):
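WITH Family AS
(
    -- anchor: the supervisor whose subordinates we want
    SELECT e.id, e.supervisorid, 0 AS Depth
    FROM Employee e
    WHERE e.id = @SupervisorID
    UNION ALL
    -- recursive step: everyone who reports to a row already found
    SELECT e2.id, e2.supervisorid, Family.Depth + 1
    FROM Employee e2
    JOIN Family
      ON Family.id = e2.supervisorid
    WHERE e2.id <> e2.supervisorid  -- the president is his own supervisor; skip that self-loop
)
SELECT *
FROM Family;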
For more, see: Recursive Queries Using Common Table Expressions
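The CTE above walks down from a supervisor to everyone beneath them. The question asks for the opposite direction: Dick Robbins's chain of command up to the president. That only needs the join flipped.

Below is a minimal sketch run through Python's built-in sqlite3 module so it is self-contained. These names are assumptions for the demo:
- the table name `employee`;
- the helper `chain_of_command`;
- SQLite's `WITH RECURSIVE` spelling (SQL Server omits the RECURSIVE keyword).

```python
import sqlite3

# The employee table from the question: (id, fname, lname, title, supervisorid).
EMPLOYEES = [
    (1, "big", "cheese", "president", 1),
    (2, "jim", "william", "vice president", 1),
    (3, "sally", "carr", "vice president", 1),
    (4, "ryan", "allan", "senior manager", 2),
    (5, "mike", "miller", "manager", 4),
    (6, "bill", "bryan", "manager", 4),
    (7, "cathy", "maddy", "foreman", 5),
    (8, "sean", "johnson", "senior mechanic", 7),
    (9, "andrew", "koll", "senior mechanic", 7),
    (10, "sarah", "ryans", "mechanic", 8),
    (11, "dana", "bond", "mechanic", 9),
    (12, "chris", "mcall", "technician", 10),
    (13, "hannah", "ryans", "technician", 10),
    (14, "matthew", "miller", "technician", 11),
    (15, "dick", "robbins", "technician", 11),
]

# Walk *up* the hierarchy: anchor on the employee, then repeatedly join to
# the row whose id equals the current row's supervisorid.
CHAIN_SQL = """
WITH RECURSIVE chain(id, fname, lname, title, supervisorid, depth) AS (
    SELECT id, fname, lname, title, supervisorid, 0
    FROM employee
    WHERE id = ?
    UNION ALL
    SELECT e.id, e.fname, e.lname, e.title, e.supervisorid, c.depth + 1
    FROM employee e
    JOIN chain c ON e.id = c.supervisorid
    WHERE c.id <> c.supervisorid  -- the president supervises himself: stop there
)
SELECT depth, fname, lname, title FROM chain ORDER BY depth
"""


def chain_of_command(conn, employee_id):
    """Return (depth, fname, lname, title) rows from the employee up to the top."""
    return conn.execute(CHAIN_SQL, (employee_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT,"
    " title TEXT, supervisorid INTEGER)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", EMPLOYEES)

for depth, fname, lname, title in chain_of_command(conn, 15):
    print(depth, fname, lname, title)
```

For id 15 this prints eight rows, from dick robbins (depth 0) up to big cheese (depth 7). The `c.id <> c.supervisorid` guard matters because the president is stored as his own supervisor. Without it, the recursion would keep revisiting that row.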

You might be interested in the "Materialized Path" solution. It slightly denormalizes the table, but it can be used on any type of SQL database and saves you from having to write recursive queries. In fact, it can even be used on NoSQL databases.
You just need to add a column which holds the entire ancestry of the object. For example, the table below includes a column named tree_path:
+----+-----------+--------+-----------+
| id | value     | parent | tree_path |
+----+-----------+--------+-----------+
| 1  | Some Text | 0      |           |
| 2  | Some Text | 0      |           |
| 3  | Some Text | 2      | -2-       |
| 4  | Some Text | 2      | -2-       |
| 5  | Some Text | 3      | -2-3-     |
| 6  | Some Text | 3      | -2-3-     |
| 7  | Some Text | 1      | -1-       |
+----+-----------+--------+-----------+
Selecting all the descendants of the record with id=2 looks like this:
SELECT * FROM comment_table WHERE tree_path LIKE '-2-%' ORDER BY tree_path ASC
To build a tree, you can sort by tree_path to get an array that's fairly easy to convert to a tree.
You can also index tree_path and the index can be used when the wildcard is not at the beginning.
For example, tree_path LIKE '-2-%' can use the index, but tree_path LIKE '%-2-' cannot.
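Here is a minimal sketch of the materialized-path idea using Python's sqlite3 module. The helper names `insert_node` and `descendants` are hypothetical.

One detail is worth noting. The `'-2-%'` pattern above works because record 2 is a root. For a non-root node, the prefix has to be the node's own tree_path followed by its id. For record 3 that is `-2-3-`, which is also the string its children store.

```python
import sqlite3


def insert_node(conn, node_id, value, parent):
    """Insert a row, deriving tree_path from the parent's row (parent 0 = root)."""
    if parent == 0:
        path = ""  # roots have an empty path, as in the table above
    else:
        (parent_path,) = conn.execute(
            "SELECT tree_path FROM comment_table WHERE id = ?", (parent,)
        ).fetchone()
        # A child's path is its parent's path plus the parent id:
        #   ''    + 2 -> '-2-'
        #   '-2-' + 3 -> '-2-3-'
        path = (parent_path or "-") + f"{parent}-"
    conn.execute(
        "INSERT INTO comment_table VALUES (?, ?, ?, ?)",
        (node_id, value, parent, path),
    )


def descendants(conn, node_id):
    """Return (id, tree_path) for every descendant of node_id, in tree order."""
    (own_path,) = conn.execute(
        "SELECT tree_path FROM comment_table WHERE id = ?", (node_id,)
    ).fetchone()
    prefix = (own_path or "-") + f"{node_id}-"  # '-2-' for id 2, '-2-3-' for id 3
    return conn.execute(
        "SELECT id, tree_path FROM comment_table"
        " WHERE tree_path LIKE ? ORDER BY tree_path, id",
        (prefix + "%",),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment_table"
    " (id INTEGER PRIMARY KEY, value TEXT, parent INTEGER, tree_path TEXT)"
)
# Same rows as the table above, inserted parents first.
for node_id, parent in [(1, 0), (2, 0), (3, 2), (4, 2), (5, 3), (6, 3), (7, 1)]:
    insert_node(conn, node_id, "Some Text", parent)

print(descendants(conn, 2))  # -> [(3, '-2-'), (4, '-2-'), (5, '-2-3-'), (6, '-2-3-')]
```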

A recursive function that returns the supervisor (if any) or NULL would work. It could also be a stored procedure that invokes itself, combining the results with UNION.
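The answer gives no code, so the following only illustrates the idea. The recursion here lives in application code (Python with sqlite3) rather than in a SQL function or stored procedure. Each call looks up one supervisor and recurses until it gets None. The `supervisor_of` and `chain` names are hypothetical, and only the id/supervisorid columns from the question are loaded.

```python
import sqlite3

# Only the (id, supervisorid) pairs from the question's employee table.
PAIRS = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 4), (6, 4), (7, 5), (8, 7),
         (9, 7), (10, 8), (11, 9), (12, 10), (13, 10), (14, 11), (15, 11)]


def supervisor_of(conn, employee_id):
    """Return the supervisor's id, or None at the top of the tree."""
    row = conn.execute(
        "SELECT supervisorid FROM employee WHERE id = ?", (employee_id,)
    ).fetchone()
    # Unknown id, NULL supervisor, or the self-supervised president: stop.
    if row is None or row[0] is None or row[0] == employee_id:
        return None
    return row[0]


def chain(conn, employee_id):
    """Return employee_id followed by every supervisor above it (recursively)."""
    boss = supervisor_of(conn, employee_id)
    if boss is None:
        return [employee_id]
    return [employee_id] + chain(conn, boss)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, supervisorid INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", PAIRS)
print(chain(conn, 15))  # -> [15, 11, 9, 7, 5, 4, 2, 1]
```

This issues one query per level, so it suits shallow hierarchies like the roughly 10 levels mentioned in the question.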

SQL is a language for performing set operations, and recursion is not one of them. Further, many database systems limit recursion in stored procedures as a safety measure to prevent rogue code from running away with precious server resources.
So, when working with SQL, always think 'flat', not 'hierarchical'. I would highly recommend the 'tree_path' method that has been suggested. I have used the same approach, and it works wonderfully and, crucially, very robustly.

Related

Postgres rank() without duplicates

I'm ranking race data for a series of cycling events. Racers win various amounts of points for their position in races. I want to retain the discrete event scoring, but also rank each racer in the series. For example, consider a sub-query that returns this:
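License # | Rider Name | Total Points | Race Points | Race ID
----------+------------+--------------+-------------+--------
123       | Joe        | 25           | 5           | 567
123       | Joe        | 25           | 12          | 234
123       | Joe        | 25           | 8           | 987
456       | Ahmed      | 20           | 12          | 567
456       | Ahmed      | 20           | 8           | 234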
You can see Joe has 25 points, as he won 5, 12, and 8 points in three races. Ahmed has 20 points, as he won 12 and 8 points in two races.
Now for the ranking, what I'd like is:
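Place | License # | Rider Name | Total Points | Race Points | Race ID
------+-----------+------------+--------------+-------------+--------
1     | 123       | Joe        | 25           | 5           | 567
1     | 123       | Joe        | 25           | 12          | 234
1     | 123       | Joe        | 25           | 8           | 987
2     | 456       | Ahmed      | 20           | 12          | 567
2     | 456       | Ahmed      | 20           | 8           | 234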
But if I use rank() and order by "Total Points", I get:
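Place | License # | Rider Name | Total Points | Race Points | Race ID
------+-----------+------------+--------------+-------------+--------
1     | 123       | Joe        | 25           | 5           | 567
1     | 123       | Joe        | 25           | 12          | 234
1     | 123       | Joe        | 25           | 8           | 987
4     | 456       | Ahmed      | 20           | 12          | 567
4     | 456       | Ahmed      | 20           | 8           | 234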
Which makes sense, since there are three "ties" at 25 points.
dense_rank() solves this problem, but if there are legitimate ties across different racers, I want there to be gaps in the rank (e.g. if Joe and Ahmed both had 25 points, the next racer would be in third place, not second).
The easiest way to solve this, I think, would be to issue two queries: one with the "duplicate" racers eliminated, and a second one that retains the individual race data, which I need for the points breakdown display.
I can also probably, given enough effort, think of a way to do this in a single query, but I'm wondering if I'm not just missing something really obvious that could accomplish this in a single, relatively simple query.
Any suggestions?
You have to break this into steps to get what you want, but that can be done in a single query with common table expressions:
with riders as ( -- one row per rider
    select distinct license, rider, total_points
    from racers
), places as ( -- rank the riders, not the race rows, so ties leave gaps
    select license, rider, rank() over (order by total_points desc) as place
    from riders
)
select p.place, r.* -- join the places back onto every race row
from places p
join racers r on (r.license, r.rider) = (p.license, p.rider)
order by p.place, r.license;
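To check the approach without a Postgres instance, here is the same two-CTE query run through Python's sqlite3. Window functions need SQLite 3.25 or later.

A few details are assumptions for the demo:
- The table name `racers` is assumed.
- The third rider, Mia, is hypothetical. She is added with 25 points so that there is a genuine tie across racers and the gap after it is visible.
- The join uses two plain equality conditions instead of the row comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE racers (license INTEGER, rider TEXT, total_points INTEGER,"
    " race_points INTEGER, race_id INTEGER)"
)
conn.executemany(
    "INSERT INTO racers VALUES (?, ?, ?, ?, ?)",
    [
        (123, "Joe", 25, 5, 567),
        (123, "Joe", 25, 12, 234),
        (123, "Joe", 25, 8, 987),
        (456, "Ahmed", 20, 12, 567),
        (456, "Ahmed", 20, 8, 234),
        (789, "Mia", 25, 25, 567),  # hypothetical rider tied with Joe
    ],
)

PLACES_SQL = """
WITH riders AS (          -- one row per rider
    SELECT DISTINCT license, rider, total_points FROM racers
), places AS (            -- rank riders (not race rows), so ties leave gaps
    SELECT license, rider,
           RANK() OVER (ORDER BY total_points DESC) AS place
    FROM riders
)
SELECT p.place, r.license, r.rider, r.total_points, r.race_points, r.race_id
FROM places p
JOIN racers r ON r.license = p.license AND r.rider = p.rider
ORDER BY p.place, r.license, r.race_id
"""

for row in conn.execute(PLACES_SQL):
    print(row)
```

Joe and Mia both get place 1, and Ahmed gets place 3, with every individual race row kept.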

PowerBI Report or SQL Query Grouping Data Spanning Columns

I'm wracking my brain trying to figure this out. I have a dataset / table that looks like this:
ID | Person1 | Person2 | Person3 | EffortPerPerson
01 | Bob | Ann | Frank | 2
02 | Frank | Bob | Joe | 3
03 | Ann | Joe | Beth | 1
I'm trying to add up "Effort" for each person. For example, Bob is 2+3, Joe is 3+1, etc. My goal is to produce a Power BI scatter plot showing total Effort for each person.
In a perfect world, the query shouldn't care how many "Person" fields there are. It should just count up the Effort value for every row that the individual's name appears.
I thought GROUP BY would work, but obviously that's only for one column, and I can't wrap my head around how to make nested queries work here.
Anyone have any ideas? Thanks in advance!
As Nick suggested, you should go with the Unpivot transformation. Go to Edit Queries and select the Transform tab.
Select the columns you want to transform into rows, open the dropdown menu under Unpivot Columns, and select "Unpivot Only Selected Columns".
And that's it! Power BI will aggregate the values for you.
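If the totals are needed on the SQL side rather than in Power Query, the same unpivot can be written by hand with UNION ALL. Below is a sketch using Python's sqlite3 with the question's three rows. The table and column names (`effort`, `person1` to `person3`, `effort_per_person`) are assumptions.

Unlike Power Query's unpivot, this version does care how many Person columns there are: each one needs its own SELECT branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE effort (id TEXT, person1 TEXT, person2 TEXT, person3 TEXT,"
    " effort_per_person INTEGER)"
)
conn.executemany(
    "INSERT INTO effort VALUES (?, ?, ?, ?, ?)",
    [
        ("01", "Bob", "Ann", "Frank", 2),
        ("02", "Frank", "Bob", "Joe", 3),
        ("03", "Ann", "Joe", "Beth", 1),
    ],
)

# Unpivot into one (person, effort) row per PersonN column, then aggregate.
# UNION ALL (not UNION) so identical (person, effort) pairs are not collapsed.
TOTALS_SQL = """
SELECT person, SUM(effort_per_person) AS total_effort
FROM (
    SELECT person1 AS person, effort_per_person FROM effort
    UNION ALL
    SELECT person2, effort_per_person FROM effort
    UNION ALL
    SELECT person3, effort_per_person FROM effort
) AS unpivoted
WHERE person IS NOT NULL
GROUP BY person
ORDER BY person
"""

totals = dict(conn.execute(TOTALS_SQL).fetchall())
print(totals)  # -> {'Ann': 3, 'Beth': 1, 'Bob': 5, 'Frank': 5, 'Joe': 4}
```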

SQL join tables with wildcard (MS Access)

How do I join the following tables with wildcards? I would like to get all distinct rows from the People table whose PersonName contains a SearchedName from the SearchedPeople table.
SearchedPeople:
SearchedName
--------
Andrew
John
John Smith
People:
ID | PersonName       | Attribute | Age
---+------------------+-----------+----
1  | John Smith       | 1         | 23
2  | John Smith Jr    | 3         | 25
3  | John Smith Jr II | 4         | 73
4  | Kevin            | 2         | 21
5  | Andrew Smith     | 1         | 14
6  | Marco            | 5         | 90
Desired output:
PersonName       | Attribute | Age
-----------------+-----------+----
John Smith       | 1         | 23
John Smith Jr    | 3         | 25
John Smith Jr II | 4         | 73
Andrew Smith     | 1         | 14
Here is the code I have so far, which doesn't work. It returns three empty rows (why is that?).
SELECT b.PersonName, b.Attribute, b.Age
FROM SearchedPeople a
LEFT JOIN People b ON "%"&a.SearchedName&"%" like b.PersonName
It returns three empty rows because you don't select any columns from table a (SearchedPeople) and the LEFT JOIN didn't produce a match, so every column from People comes back empty.
The underlying reason is that your criteria are in the wrong order: you are searching for PersonName inside the string %SearchedName%, and you need to switch that around. Also, Access prefers the asterisk * as its wildcard over %, unless you make some changes to the query or to the MS Access configuration (see the comment from Parafait below).
I just tested this:
SELECT a.SearchedName
,b.PersonName, b.Attribute, b.Age
FROM
SearchedPeople a
LEFT JOIN People b
ON b.PersonName LIKE ("*" & a.SearchedName & "*")
Edit:
Good MS Access-specific information from a comment by @Parafait, pasted into the answer in case the comment ever gets deleted:
Use ALIKE and percent wildcards work. And if OP connects to MS Access via OLEDB and not the GUI .exe program, the % wildcard is required for LIKE statements in coded SQL. OP can also change the database settings to ANSI-92 mode to always use % wildcards.
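Here is a sketch of the corrected join, run through Python's sqlite3 instead of Access. It therefore uses `%` and `||` where Access would use `*` and `&`. It also uses an INNER JOIN with SELECT DISTINCT, to match the question's "all distinct rows" requirement. Without DISTINCT, "John Smith" would appear twice, because both "John" and "John Smith" match it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SearchedPeople (SearchedName TEXT)")
conn.execute(
    "CREATE TABLE People (ID INTEGER, PersonName TEXT, Attribute INTEGER, Age INTEGER)"
)
conn.executemany(
    "INSERT INTO SearchedPeople VALUES (?)", [("Andrew",), ("John",), ("John Smith",)]
)
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?, ?)",
    [
        (1, "John Smith", 1, 23),
        (2, "John Smith Jr", 3, 25),
        (3, "John Smith Jr II", 4, 73),
        (4, "Kevin", 2, 21),
        (5, "Andrew Smith", 1, 14),
        (6, "Marco", 5, 90),
    ],
)

# The searched column goes on the left of LIKE; the wildcard pattern built
# from SearchedName goes on the right (SQLite: % and ||; Access: * and &).
MATCH_SQL = """
SELECT DISTINCT b.PersonName, b.Attribute, b.Age
FROM SearchedPeople AS a
INNER JOIN People AS b
    ON b.PersonName LIKE ('%' || a.SearchedName || '%')
ORDER BY b.PersonName
"""

for row in conn.execute(MATCH_SQL):
    print(row)
```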

How do you read two-way tables?

I want to know: what are two-way tables in SQL?
And how can I read these two-way tables?
A two-way table is not a way of storing data but of displaying it. It doesn't say anything about how the data is stored.
Let's say we store persons along with their IQ and the country they live in. The table may look like this:
name       | iq  | country
-----------+-----+--------
John Smith | 125 | GB
Mary Jones | 150 | GB
Juan Lopez | 150 | ES
Liz Allen  | 125 | GB
The two-way table to show the relation between IQ and country would be:
   | 125 | 150
---+-----+----
GB |   2 |   1
ES |   0 |   1
or
    | GB | ES
----+----+---
125 |  2 |  0
150 |  1 |  1
In order to retrieve this data from the database you might write this query:
select iq, country, count(*)
from persons
group by iq, country;
SQL is meant to retrieve data; it is not really meant to care about its presentation, the layout. So you'd write a program (in PHP, C#, Java, whatever) that sends the query to the database, receives the data, and fills a GUI grid in a loop.
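Here is a minimal sketch of that client-side step, using Python's sqlite3 for both the database and the "program". The GROUP BY query only returns the IQ/country pairs that actually exist. The loop fills in the grid and uses 0 for combinations with no rows, such as 125/ES.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (name TEXT, iq INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [
        ("John Smith", 125, "GB"),
        ("Mary Jones", 150, "GB"),
        ("Juan Lopez", 150, "ES"),
        ("Liz Allen", 125, "GB"),
    ],
)

# The database returns only the combinations that exist (no row for 125/ES).
rows = conn.execute(
    "SELECT iq, country, COUNT(*) FROM persons GROUP BY iq, country"
).fetchall()

# The program side: collect the distinct values of each dimension, then
# fill the grid in a loop, defaulting missing combinations to 0.
iqs = sorted({iq for iq, _, _ in rows})
countries = sorted({country for _, country, _ in rows})
counts = {(iq, country): n for iq, country, n in rows}
grid = {c: [counts.get((iq, c), 0) for iq in iqs] for c in countries}

print("   | " + " | ".join(str(iq) for iq in iqs))
for country in countries:
    print(f"{country} | " + " | ".join(f"{n:3}" for n in grid[country]))
```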
In rare cases SQL can provide the layout itself, i.e. give you the data in columns and rows. This is when the attributes of one dimension are known beforehand. This is usually not the case with IQs or countries as in the example given (i.e. you wouldn't have known which countries and which IQs are present in the table if I hadn't shown you). But of course you could have retrieved either the countries or the IQs first and then built the main query dynamically (with conditional aggregation or pivot). Another case when values are known beforehand is booleans, e.g. a flag in the persons table to show whether a person is homeless. If you wanted results for how many homeless persons live in which countries, you could easily write a query with two columns for homeless and not homeless.
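Here is a sketch of the dynamic approach: query the distinct values first, then build the main query from them with conditional aggregation. It uses Python's sqlite3, and the `iq_125`-style column aliases are an assumption. The IQ values are passed as bound parameters; only the integer value is placed into the alias text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (name TEXT, iq INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [
        ("John Smith", 125, "GB"),
        ("Mary Jones", 150, "GB"),
        ("Juan Lopez", 150, "ES"),
        ("Liz Allen", 125, "GB"),
    ],
)

# Step 1: fetch the values that will become columns.
iqs = [iq for (iq,) in conn.execute("SELECT DISTINCT iq FROM persons ORDER BY iq")]

# Step 2: one conditional-aggregation column per value; each ? is bound to
# the matching entry of iqs, in the same order.
columns = ", ".join(
    f"SUM(CASE WHEN iq = ? THEN 1 ELSE 0 END) AS iq_{int(iq)}" for iq in iqs
)
pivot_sql = f"SELECT country, {columns} FROM persons GROUP BY country ORDER BY country"
pivot = conn.execute(pivot_sql, iqs).fetchall()
print(pivot)  # -> [('ES', 0, 1), ('GB', 2, 1)]
```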
As mentioned, the fact that you can display data in a two-way table doesn't say anything about how this data is stored. Above I showed you a one-table example. But let's say you have stores in many cities and want to know in which cities thinner or thicker people live. You decide to check which t-shirt sizes you sold in which cities. So you take your clients' orders and look up the clients and the cities they live in. You also look up the order details and the items they refer to, then take all items of type t-shirt. There are many tables involved, but the result is again a two-way table showing the relation of two attributes. E.g.:
city | S | M | L | XL
------------+-----+-----+-----+-----
New York | 5% | 8% | 7% | 10%
Los Angeles | 10% | 7% | 7% | 8%
Chicago | 1% | 4% | 6% | 11%
Houston | 2% | 2% | 5% | 7%

SQL : Group By on range of dynamic values

This is similar to some other questions here, but those use CASE, which I cannot use. This is on Oracle, and I will be running the query from an Excel sheet. (By the way, queries run from Excel do not support WITH, which makes life much harder.)
I have a range of dates in one big table, like 1/3/2011, 4/5/2012, 7/1/2013, 9/1/2013, ...
Then I have another table with hours worked by employees on certain dates. What I need is the sum of the hours worked by each employee in each intervening time period. The tables look like this:
Dates
1-May-2011
5-Aug-2011
4-Apr-2012
....
and another
Employee Hours Date
Sam 4 1-Jan-2011
Sam 7 5-Jan-2011
Mary 12 7-Jan-2012
Mary 5 12-Dec-2013
......
So the result should be:
Employee | Hours in Date Range | Till
---------+---------------------+-----------
Sam      | 11                  | 1-May-2011
Sam      | 0                   | 5-Aug-2011
Sam      | 0                   | 4-Apr-2012
Mary     | 0                   | 1-May-2011
Mary     | 0                   | 5-Aug-2011
Mary     | 12                  | 4-Apr-2012
...
Any pointers on how to achieve this please?
I'm unfamiliar with Oracle SQL and its abilities/limitations, but since you asked for pointers, here's my take:
Join the tables (INNER JOIN) with the join rule being EmployeeHours.Date < Dates.Dates. Then GROUP BY Employee, Dates.Dates and select the grouping columns + SUM(Hours). What you'd end up with (using your sample data) is:
Employee | Dates | Hours
Sam | 1-May-2011 | 11
Sam | 5-Aug-2011 | 11
Sam | 4-Apr-2012 | 11
Mary | 1-May-2011 | 0
Mary | 5-Aug-2011 | 0
Mary | 4-Apr-2012 | 12
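With other (more complex) data there will be more "interesting" results, but basically each row contains the total hours up to that point. (Strictly, an INNER JOIN returns no rows for Mary's first two dates, since she has no earlier hours; to get those 0 rows, LEFT JOIN the hours onto a cross join of employees and dates.)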
You could then use that as an input to an outer query to find MAX(Hours) for all rows where Dates < currentDates and subtract that from your result.
Again, this is not a complete answer, but it's a direction that should work.
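Here is a sketch of both steps, run through Python's sqlite3 with the question's sample rows. A few choices are specific to this demo:
- Dates are stored as ISO strings, so text comparison matches date order.
- Inline views replace WITH, and COALESCE replaces CASE, to respect the constraints in the question.
- Employees are cross joined with dates and the hours are LEFT JOINed, so that the zero rows appear.

It has not been tested against Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dates (d TEXT)")
conn.execute("CREATE TABLE hours (employee TEXT, hours INTEGER, work_date TEXT)")
conn.executemany(
    "INSERT INTO dates VALUES (?)", [("2011-05-01",), ("2011-08-05",), ("2012-04-04",)]
)
conn.executemany(
    "INSERT INTO hours VALUES (?, ?, ?)",
    [
        ("Sam", 4, "2011-01-01"),
        ("Sam", 7, "2011-01-05"),
        ("Mary", 12, "2012-01-07"),
        ("Mary", 5, "2013-12-12"),
    ],
)

# Step 1: cumulative hours worked before each date, for every employee/date pair.
CUMULATIVE = """
SELECT e.employee, d.d AS till, COALESCE(SUM(h.hours), 0) AS cum
FROM (SELECT DISTINCT employee FROM hours) e
CROSS JOIN dates d
LEFT JOIN hours h ON h.employee = e.employee AND h.work_date < d.d
GROUP BY e.employee, d.d
"""

# Step 2: subtract the largest earlier cumulative total for the same employee.
PERIODS = f"""
SELECT c.employee,
       c.cum - COALESCE((SELECT MAX(p.cum) FROM ({CUMULATIVE}) p
                         WHERE p.employee = c.employee AND p.till < c.till), 0)
           AS period_hours,
       c.till
FROM ({CUMULATIVE}) c
ORDER BY c.employee, c.till
"""

rows = conn.execute(PERIODS).fetchall()
for row in rows:
    print(row)
```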