Recursive CTE Concept Confusion - sql
I am trying to understand the concepts of using CTE in my SQL code. I have gone through a number of online posts explaining the concept but I cannot grasp how it iterates to present the hierarchical data. One of the widely used examples to explain the R-CTE is the Employee and ManagerID Example as below:
USE AdventureWorks
GO
WITH Emp_CTE AS (
SELECT EmployeeID, ContactID, LoginID, ManagerID, Title, BirthDate
FROM HumanResources.Employee
WHERE ManagerID IS NULL
UNION ALL
SELECT e.EmployeeID, e.ContactID, e.LoginID, e.ManagerID, e.Title, e.BirthDate
FROM HumanResources.Employee e
INNER JOIN Emp_CTE ecte ON ecte.EmployeeID = e.ManagerID
)
SELECT *
FROM Emp_CTE
GO
The anchor query will grab the manager. After that I can't understand how it would bring the other employees if the recursive query is calling the anchor query again and again and the anchor query just has a single record which is the manager.
So you want to understand a recursive CTE.
It's simple really.
First there's the seed query which gets the original records.
SELECT EmployeeID, ContactID, LoginID, ManagerID, Title, BirthDate
FROM HumanResources.Employee
WHERE ManagerID IS NULL
In your case that is the employees without a manager, which would be the boss(es).
To demonstrate with a simplified example:
EmployeeID LoginID ManagerID Title
---------- ------- --------- ------------
101 boss NULL The Boss
The second query looks for employees that have the previous record as a manager.
SELECT e.EmployeeID, e.ContactID, e.LoginID, e.ManagerID, e.Title, e.BirthDate
FROM HumanResources.Employee e
INNER JOIN Emp_CTE ecte ON ecte.EmployeeID = e.ManagerID
Since it's a recursive CTE, the CTE uses itself in the second query.
You could see it as a loop, where it uses the previous records to get the next.
For the first iteration of that recursive loop you could get something like this:
EmployeeID LoginID ManagerID Title
---------- ------- --------- ------------
102 head1 101 Top Manager 1
103 head2 101 Top Manager 2
For the second iteration it would use the records from that first iteration to find the next.
EmployeeID LoginID ManagerID Title
---------- ------- --------- ------------
104 bob 102 Department Manager 1
105 hilda 102 Department Manager 2
108 john 103 Department Manager 4
109 jane 103 Department Manager 5
For the 3rd iteration it would use the records from the 2nd iteration.
...
And this continues until there are no more employees to join on the ManagerID.
Then after all the looping, the CTE will return all the records that were found through all those iterations.
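The loop described above can be sketched outside the database. This is only an illustration in plain Python, not how any particular engine is implemented; the rows are the sample employees from the simplified tables above, and the function name `run_recursive_cte` is made up for this sketch.

```python
# Sample rows from the simplified tables above: (EmployeeID, LoginID, ManagerID)
EMPLOYEES = [
    (101, "boss", None),
    (102, "head1", 101),
    (103, "head2", 101),
    (104, "bob", 102),
    (105, "hilda", 102),
    (108, "john", 103),
    (109, "jane", 103),
]


def run_recursive_cte(rows):
    """Mimic the level-by-level evaluation of the employee CTE (hypothetical helper)."""
    # Seed (anchor) query: employees without a manager.
    previous = [r for r in rows if r[2] is None]
    result = list(previous)
    iterations = []
    # Recursive query: join the table to the rows found in the PREVIOUS step only.
    while previous:
        previous_ids = {r[0] for r in previous}
        current = [r for r in rows if r[2] in previous_ids]
        if current:
            iterations.append([r[0] for r in current])
        result.extend(current)
        previous = current  # the next pass sees only these new rows
    return result, iterations


all_rows, levels = run_recursive_cte(EMPLOYEES)
print(levels)                    # employee ids found in each iteration
print([r[0] for r in all_rows])  # everything the CTE returns at the end
```

The point of the sketch is the line `previous = current`: each pass joins against the rows of the pass before it, not against the seed rows again, and the loop ends when a pass finds nothing.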
Well, a short introduction to recursive CTEs:
A recursive CTE is iterative rather than truly recursive. The anchor query is run first to get an initial result set. With this set we can dive deeper. Try these simple cases:
Just a counter, not even a JOIN needed...
The 1 of the anchor will lead to a 2 in the UNION ALL. This 2 is passed into the UNION ALL again and will be returned as a 3 and so on...
WITH recCTE AS
(
SELECT 1 AS MyCounter
UNION ALL
SELECT recCTE.MyCounter+1
FROM recCTE
WHERE recCTE.MyCounter<10
)
SELECT * FROM recCTE;
A counter of 2 columns
This is exactly the same as above. But we have two columns and deal with them separately.
WITH recCTE AS
(
SELECT 1 AS MyCounter1, 10 AS MyCounter2
UNION ALL
SELECT recCTE.MyCounter1+1,recCTE.MyCounter2+1
FROM recCTE
WHERE recCTE.MyCounter1<10
)
SELECT * FROM recCTE;
Now we have two rows in the initial query
Run on its own, the initial query returns two rows, both with MyCounter = 1 and with two different values in the Nmbr column.
WITH recCTE AS
(
SELECT MyCounter=1, Nmbr FROM(VALUES(1),(10)) A(Nmbr)
UNION ALL
SELECT recCTE.MyCounter+1, recCTE.Nmbr+1
FROM recCTE
WHERE recCTE.MyCounter<10
)
SELECT * FROM recCTE ORDER BY MyCounter,Nmbr;
Now we get 20 rows back, not 10 as in the examples before. This is because both rows of the anchor are used independently.
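To check the row count without a SQL Server instance, the same idea can be run against an in-memory SQLite database from Python. This is a sketch under assumptions: SQLite's dialect differs from T-SQL, so the query uses WITH RECURSIVE and a helper CTE named `seed` (my own name) in place of the derived table with a column alias list.

```python
import sqlite3

# Assumption: SQLite syntax, not T-SQL. "seed" stands in for the
# FROM (VALUES (1), (10)) A(Nmbr) derived table of the original query.
QUERY = """
WITH RECURSIVE
seed(Nmbr) AS (VALUES (1), (10)),
recCTE(MyCounter, Nmbr) AS (
    SELECT 1, Nmbr FROM seed
    UNION ALL
    SELECT MyCounter + 1, Nmbr + 1
    FROM recCTE
    WHERE MyCounter < 10
)
SELECT MyCounter, Nmbr FROM recCTE ORDER BY MyCounter, Nmbr;
"""

conn = sqlite3.connect(":memory:")
rows = conn.execute(QUERY).fetchall()
conn.close()

print(len(rows))   # number of rows returned
print(rows[:4])    # the first two counter values, one row per anchor row
```

Each anchor row starts its own chain of ten counter values, which is where the doubled row count comes from.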
We can use the recursive CTE in a JOIN
In this example we will create a derived set first, then we will join this to the recursive CTE. Guess why the first row carries "X" instead of "A"?
WITH SomeSet AS (SELECT * FROM (VALUES(1,'A'),(2,'B'),(3,'C'),(4,'D'),(5,'E'),(6,'F'),(7,'G'),(8,'H'),(9,'I'),(10,'J')) A(id,Letter))
,recCTE AS
(
SELECT MyCounter=1, Nmbr,'X' AS Letter FROM(VALUES(1),(10)) A(Nmbr)
UNION ALL
SELECT recCTE.MyCounter+1, recCTE.Nmbr+1, SomeSet.Letter
FROM SomeSet
INNER JOIN recCTE ON SomeSet.id=recCTE.MyCounter+1
WHERE recCTE.MyCounter<10
)
SELECT * FROM recCTE ORDER BY MyCounter,Nmbr;
This will use a self-referring join to simulate your hierarchy, but with one gap-less chain
WITH SomeSet AS (SELECT * FROM (VALUES(1,'A',NULL),(2,'B',1),(3,'C',2),(4,'D',3),(5,'E',4),(6,'F',5),(7,'G',6),(8,'H',7),(9,'I',8),(10,'J',9)) A(id,Letter,Previous))
,recCTE AS
(
SELECT id,Letter,Previous,' ' PreviousLetter FROM SomeSet WHERE Previous IS NULL
UNION ALL
SELECT SomeSet.id,SomeSet.Letter,SomeSet.Previous,recCTE.Letter
FROM SomeSet
INNER JOIN recCTE ON SomeSet.Previous=recCTE.id
)
SELECT * FROM recCTE;
And now almost the same as before, but with several elements with the same "previous".
This is, in principle, your hierarchy.
WITH SomeSet AS (SELECT * FROM (VALUES(1,'A',NULL),(2,'B',1),(3,'C',2),(4,'D',2),(5,'E',2),(6,'F',3),(7,'G',3),(8,'H',4),(9,'I',1),(10,'J',9)) A(id,Letter,Previous))
,recCTE AS
(
SELECT id,Letter,Previous,' ' PreviousLetter FROM SomeSet WHERE Previous IS NULL
UNION ALL
SELECT SomeSet.id,SomeSet.Letter,SomeSet.Previous,recCTE.Letter
FROM SomeSet
INNER JOIN recCTE ON SomeSet.Previous=recCTE.id
)
SELECT * FROM recCTE
Conclusion
The key points
The anchor query must return at least one row, but may return many
The second part must match the column list (as any UNION ALL query)
The second part must refer to the cte in its FROM-clause
either directly, or
through a JOIN
The second part will be called over and over using the result of the call before
Each row is handled separately (a hidden RBAR, i.e. "row by agonizing row" processing)
You can start with a Manager (top-most-node) and walk down by querying for employees with this manager id, or
You can start with the lowest level of the hierarchy (the rows whose id is not used as a manager id by any other row) and move up the list
As it is a hidden RBAR you can use this for row-by-row actions like string accumulation.
An example for the last statement
See how the column LetterPath is built.
WITH SomeSet AS (SELECT * FROM (VALUES(1,'A',NULL),(2,'B',1),(3,'C',2),(4,'D',2),(5,'E',2),(6,'F',3),(7,'G',3),(8,'H',4),(9,'I',1),(10,'J',9)) A(id,Letter,Previous))
,recCTE AS
(
SELECT id,Letter,Previous,' ' PreviousLetter,CAST(Letter AS VARCHAR(MAX)) AS LetterPath FROM SomeSet WHERE Previous IS NULL
UNION ALL
SELECT SomeSet.id,SomeSet.Letter,SomeSet.Previous,recCTE.Letter,recCTE.LetterPath + SomeSet.Letter
FROM SomeSet
INNER JOIN recCTE ON SomeSet.Previous=recCTE.id
)
SELECT * FROM recCTE
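The key points also mention walking up from the lowest level instead of down from the top. Here is a minimal sketch of that direction, again run through SQLite from Python rather than SQL Server. The starting id 8 (letter H) and the `Steps` column are choices made for this illustration only; the data is the same SomeSet as above.

```python
import sqlite3

# Assumption: SQLite syntax. The anchor picks one leaf (id 8, letter 'H');
# the recursive part joins each found row to its "Previous" row, moving upwards.
QUERY = """
WITH RECURSIVE
SomeSet(id, Letter, Previous) AS (
    VALUES (1,'A',NULL),(2,'B',1),(3,'C',2),(4,'D',2),(5,'E',2),
           (6,'F',3),(7,'G',3),(8,'H',4),(9,'I',1),(10,'J',9)
),
recCTE(id, Letter, Previous, Steps) AS (
    SELECT id, Letter, Previous, 0 FROM SomeSet WHERE id = 8
    UNION ALL
    SELECT s.id, s.Letter, s.Previous, r.Steps + 1
    FROM SomeSet s
    INNER JOIN recCTE r ON s.id = r.Previous
)
SELECT Letter FROM recCTE ORDER BY Steps;
"""

conn = sqlite3.connect(":memory:")
path_up = [row[0] for row in conn.execute(QUERY)]
conn.close()

print(path_up)  # letters from the leaf up to the root
```

Only the join condition changes compared with the top-down queries: the CTE's Previous column is matched against the table's id, so each step finds the parent of the row found before. The recursion stops at the root because its Previous is NULL and matches nothing.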
It's all about the recursive step. First, the root (anchor) query is used for the first step of the recursion:
SELECT EmployeeID, ContactID, LoginID, ManagerID, Title, BirthDate
FROM HumanResources.Employee
WHERE ManagerID IS NULL
This provides the first set of records.
The second set of records is queried based on the first set (the anchor), so it returns all employees whose manager is in the first set.
The second step of the recursion is based on the second result set, not the anchor.
The third step is based on the third result set, etc.
Related
SQL query to clean up/omit missing values depending on another table
I'm looking for a query that is able to omit certain values (which are missing in another table). Trying to explain it using an example:
Table 1 - Person
ID Name
-- ----
1 Jane
2 Joe
3 Jose
Table 2 - Schedule
Date Employees
---- --------------
1/1 Jane,Joe,Jose
2/1 Alice,Jane
3/1 Joe,Bob,Jose
4/1 Alice,Bob
Expected result - missing values omitted
Date Employees
---- --------------
1/1 Jane,Joe,Jose
2/1 Jane
3/1 Joe,Jose
4/1
Is that even possible to achieve with SQL, and if so, how?
Disclaimer: I do not have any impact on the design of the tables. I know that the structure is far from ideal, but there is no way to change it.
You want a normalized schedule table. You can create that on the fly with a recursive query or a combination of a lateral cross join and unnesting an array that you create from the substrings. Put this in a CTE (WITH clause) and then do your aggregation.
With an array and UNNEST
with good_schedule as (
select s.date, e.employee_name
from schedule s
cross join lateral unnest(string_to_array(employees, ',')) as e(employee_name)
)
select s.date, string_agg(p.name, ',' order by p.name) as employees
from good_schedule s
left outer join person p on p.name = s.employee_name
group by s.date
order by s.date;
With a recursive CTE
with recursive good_schedule(date, employees, employee_name, pos) as (
select date, employees, split_part(employees, ',', 1), 1
from schedule s
union all
select date, employees, split_part(employees, ',', pos+1) as employee_name, pos+1
from good_schedule
where split_part(employees, ',', pos+1) <> ''
)
select s.date, string_agg(p.name, ',' order by p.name) as employees
from good_schedule s
left outer join person p on p.name = s.employee_name
group by s.date
order by s.date;
Demo: https://dbfiddle.uk/A42E5oYh
Get result from Multiple Select statement on a single Query
I'm trying to create a stored procedure that will display the users managed by a manager. I have tried to use a CTE but was still unsuccessful.
What I want:
First query: select the row whose user name = #name
Second query: return users that are managed by (first query ManagerId)
Third query: return all users that are managed by each (second query ManagerId)
This is the structure of the data:
SQL query:
WITH EmployeeCTE AS
(
(SELECT UserId, Email, ManagerId, Name FROM Table1 WHERE DisplayName LIKE '%Paul%') tbl1
(SELECT UserId, Email, ManagerId, Name FROM Table1 WHERE ManagerId = tbl1.UserId) tbl2
(SELECT UserId, Email, ManagerId, Name FROM Table1 WHERE ManagerId = tbl1.UserId) tbl3
)
--Lastly
SELECT * FROM EmployeeCTE
Please help anyone.
You simply need a proper recursive CTE, something like this:
WITH EmployeeCTE AS
(
-- "anchor" for the query
SELECT UserId, Email, ManagerId, Name, [Level] = 1
FROM dbo.Table1
WHERE Name LIKE '%Paul%'
-- I would personally probably use this condition instead
-- ManagerId IS NULL
UNION ALL
-- recursive part
SELECT t1.UserId, t1.Email, t1.ManagerId, t1.Name, e.[Level] + 1
FROM dbo.Table1 t1
INNER JOIN EmployeeCTE e ON t1.ManagerId = e.UserId
)
SELECT *
FROM EmployeeCTE
This selects the "anchor" row (or rows), and then "recurses" the manager/employee relationship based on Employee.ManagerId = Manager.UserId.
I've also added the Level column so you can see on which level of the hierarchy each entry is located: the "anchor" is level 1, and each further level down is incremented by 1.
PS: if you need to limit the returned data set to the root level plus a maximum of 2 levels down, you can use the Level column to do so in the final SELECT that selects from the CTE:
WITH EmployeeCTE AS
(
--- as above
)
SELECT *
FROM EmployeeCTE
WHERE [Level] <= 3 -- select root level (1) and max. of 2 levels down
SQL CASE Statement - Return first match (ignore other matches)
I have a simple problem - I have two tables (table A and B) with records for staff members in each. A staff member may be reflected in both tables. I'm trying to put together a case statement that returns the first match for an employee from Table A and then exits the case statement (i.e., do not try to find that same employee in Table B). Right now, my current code returns matches from both Table A and Table B for that employee. How can I stop this?
How about something like this:
with AllStaff as (
select 1 as Level, StaffId, Name from TableA
union all
select 2 as Level, StaffId, Name from TableB
), DistinctStaff as (
select distinct StaffId from AllStaff
)
select s.StaffId, sRec.*
from DistinctStaff as s
outer apply (select top(1) * from AllStaff as a where a.StaffId = s.StaffId order by a.Level) as sRec
Sql max trophy count
I created a database in SQL about basketball. My teacher gave me a task: I need to print out the basketball players from my database with the max trophy count. So I wrote this bit of code:
select surname, count(player_id) as trophy_count
from dbo.Players p
left join Trophies t on player_id=p.id
group by p.surname
and SQL gave me this:
but I want SQL to print only this:
I read about selects within selects, but I don't know how they work. I tried, but it doesn't work.
Use TOP:
SELECT TOP 1 surname, COUNT(player_id) AS trophy_count -- or TOP 1 WITH TIES
FROM dbo.Players p
LEFT JOIN Trophies t ON t.player_id = p.id
GROUP BY p.surname
ORDER BY COUNT(player_id) DESC;
If you want to get all ties for the highest count, then use SELECT TOP 1 WITH TIES.
;WITH CTE AS
(
select surname, count(player_id) as trophy_count
from dbo.Players p
left join Trophies t on t.player_id = p.id
group by p.surname
)
select *
from CTE
where trophy_count = (select max(trophy_count) from CTE)
While SELECT TOP ... WITH TIES works (and is probably more efficient), I would say this code is probably more useful in the real world, as it can be used to find the max, min or a specific trophy count with a very simple modification. It basically does your GROUP BY first and then lets you specify which results you want back. In this instance you can use:
max(trophy_count) - get the maximum
min(trophy_count) - get the minimum
a number, i.e. where trophy_count = 3 - get a specific trophy count
avg(trophy_count) - get the average trophy count
There are many others. Google "SQL Aggregate functions".
You will eventually go down the rabbit hole of needing to subsection this (for example by week or by league). Then you are going to want to use window functions with a CTE or subquery. For your example:
;with cte_base as
(
-- Set your detail here (this step is only needed if you are looking at aggregates)
select p.surname, count(t.player_id) Ct
from dbo.Players p
left join Trophies t on t.player_id = p.id
group by p.surname
)
, cte_ranked as
-- Dense_rank is chosen because of ties
-- Add a partition by clause (for example by league) to break out your detail
(
select *, dr = DENSE_RANK() over (order by Ct desc)
from cte_base
)
select *
from cte_ranked
where dr = 1 -- Bring back only the top-ranked rows
This is by far overkill, but it helps you lay the foundation to handle much more complicated queries. Tim Biegeleisen's answer is more than adequate to answer your question.
Random join in oracle
I have three queries and I want result rows consisting of entries of these queries randomly joined next to each other. I don't want to union the results, but to join them in a more or less random way (the original distribution may be kept, or can be unified across all). I tried the following:
select * from
( SELECT street, number FROM Addresses WHERE valid = '1' order by Dbms_Random.Value ) q1
, ( select prename from person order by Dbms_Random.Value ) q2
, ( select surname from person order by Dbms_Random.Value ) q3
My result set however does not look random at all:
Main street, 1, Andre, Smith
Main street, 1, Andre, Warnes
Main street, 1, Andre, Jackson
Main street, 1, Andre, Macallister
Removing the ORDER BY from the queries and applying it to the result of the cartesian product is extremely inefficient, as the tables are large, and especially their cartesian product.
Colin 't Hart diagnosed the problem, and suggested a workaround using rownum. But the solution is slightly more complicated than that, because the ROWNUM is assigned before the ORDER BY if they both appear in the same SELECT. The solution is to add one extra subquery level.
with randomAddress as(
select rownum id, street, num
from (
select * from addresses where valid=1 order by dbms_random.random
)
),
randomPrename as(
select rownum id, prename
from (
select * from person order by dbms_random.random
)
),
randomSurname as(
select rownum id, surname
from (
select * from person order by dbms_random.random
)
)
select street, num, prename, surname
from randomAddress
join randomPrename using(id)
join randomSurname using(id)
;
This solution will always return a number of random rows that is equal to the number of rows in the smallest table. No row will be used more than once. Here is the SQL Fiddle.
The number of rows returned by the GWu solution will vary depending on how many rows are assigned the same random number. Some rows may be used multiple times, and other rows not at all. You should also have an idea of how many rows are in the tables to use that solution.
You could move Dbms_Random.Value to a column in your subquery and join by it. This will randomize the result and also get rid of the order by:
select * from
( SELECT street, snumber, ROUND(Dbms_Random.Value(1,10)) n FROM Addresses WHERE valid = '1' ) q1
, ( select prename, ROUND(Dbms_Random.Value(1,10)) n from person ) q2
, ( select surname, ROUND(Dbms_Random.Value(1,10)) n from person ) q3
where q1.n = q2.n and q2.n = q3.n
;
(see also http://www.sqlfiddle.com/#!4/a26d0/9)
The value 10 in ROUND(Dbms_Random.Value(1,10)) is just an assumption; change it to your number of expected or available records.
Note that this solution reuses results from each subquery, so for example a prename might be used more than once or not at all, but that was also the case in your original cartesian join. Colin's approach ensures uniqueness, if you need that.
The problem you're having is that while each table is being ordered randomly, you still have a cartesian product, so the top rows will have the same values in the first 2 columns with only the last column varying. If you select the pseudo column ROWNUM (you'll need to alias it, e.g. as row_number), and then join the 3 tables on row_number, you should get a random combination of data from your 3 tables. But you'll be limited to a total number of rows equal to the number of rows in the smallest table.
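The row-number idea can be illustrated outside Oracle. The following is a plain Python sketch with made-up sample data, not the Oracle query itself: each list is shuffled independently (the ORDER BY on a random value), then rows are paired by position (the join on the row number). The function name `random_join` and the extra sample values are hypothetical.

```python
import random


def random_join(streets, prenames, surnames, seed=None):
    """Pair rows from three independently shuffled lists by position (hypothetical helper)."""
    rng = random.Random(seed)
    # Shuffle each column on its own, like ordering each subquery randomly.
    shuffled = [rng.sample(col, len(col)) for col in (streets, prenames, surnames)]
    # zip pairs by position, like joining on the row number,
    # and stops at the end of the shortest list.
    return list(zip(*shuffled))


# Made-up sample data for illustration only.
streets = ["Main street 1", "Oak avenue 7", "Hill road 3"]
prenames = ["Andre", "Bea", "Carl", "Dana"]
surnames = ["Smith", "Warnes", "Jackson", "Macallister"]

rows = random_join(streets, prenames, surnames, seed=42)
for row in rows:
    print(row)
```

As in the answers above, the result has as many rows as the smallest input, and no input row is used twice.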