Recursive CTE with three tables - sql

I'm using SQL Server 2008 R2 SP1.
I would like to recursively find the first non-null manager for a certain organizational unit by "walking up the tree".
I have one table containing organizational units, "ORG"; one table containing the parent of each org. unit in "ORG", let's call that table "ORG_PARENTS"; and one table containing the manager of each organizational unit, let's call that table "ORG_MANAGERS".
ORG has a column ORG_ID:
ORG_ID
1
2
3
ORG_PARENTS has two columns.
ORG_ID, ORG_PARENT
1, NULL
2, 1
3, 2
MANAGERS has two columns.
ORG_ID, MANAGER
1, John Doe
2, Jane Doe
3, NULL
I'm trying to create a recursive query that will find the first non-null manager for a certain organizational unit.
Basically if I do a query today for the manager for ORG_ID=3 I will get NULL.
SELECT MANAGER FROM ORG_MANAGERS WHERE ORG_ID = '3'
I want the query to use the ORG_PARENTS table to get the parent for ORG_ID=3, in this case get "2" and repeat the query against the ORG_MANAGERS table with ORG_ID=2 and return in this example "Jane Doe".
In case the query also returns NULL I want to repeat the process with the parent of ORG_ID=2, i.e. ORG_ID=1 and so on.
My CTE attempts so far have failed, one example is this:
WITH BOSS (MANAGER, ORG_ID, ORG_PARENT)
AS
( SELECT m.MANAGER, m.ORG_ID, p.ORG_PARENT
FROM dbo.MANAGERS m INNER JOIN
dbo.ORG_PARENTS p ON p.ORG_ID = m.ORG_ID
UNION ALL
SELECT m1.MANAGER, m1.ORG_ID, b.ORG_PARENT
FROM BOSS b
INNER JOIN dbo.MANAGERS m1 ON m1.ORG_ID = b.ORG_PARENT
)
SELECT * FROM BOSS WHERE ORG_ID = 3
It returns:
Msg 530, Level 16, State 1, Line 4
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
MANAGER ORG_ID ORG_PARENT
NULL 3 2

You need to keep track of the original ID you start with. Try this:
DECLARE @ORG_PARENTS TABLE (ORG_ID INT, ORG_PARENT INT )
DECLARE @MANAGERS TABLE (ORG_ID INT, MANAGER VARCHAR(100))
INSERT @ORG_PARENTS (ORG_ID, ORG_PARENT)
VALUES (1, NULL)
, (2, 1)
, (3, 2)
INSERT @MANAGERS (ORG_ID, MANAGER)
VALUES (1, 'John Doe')
, (2, 'Jane Doe')
, (3, NULL)
;
WITH BOSS
AS
(
SELECT m.MANAGER, m.ORG_ID AS ORI, m.ORG_ID, p.ORG_PARENT, 1 cnt
FROM @MANAGERS m
INNER JOIN @ORG_PARENTS p
ON p.ORG_ID = m.ORG_ID
UNION ALL
SELECT m1.MANAGER, b.ORI, m1.ORG_ID, OP.ORG_PARENT, cnt +1
FROM BOSS b
INNER JOIN @ORG_PARENTS AS OP
ON OP.ORG_ID = b.ORG_PARENT
INNER JOIN @MANAGERS m1
ON m1.ORG_ID = OP.ORG_ID
)
SELECT *
FROM BOSS
WHERE ORI = 3
Results in:
+----------+-----+--------+------------+-----+
| MANAGER | ORI | ORG_ID | ORG_PARENT | cnt |
+----------+-----+--------+------------+-----+
| NULL | 3 | 3 | 2 | 1 |
| Jane Doe | 3 | 2 | 1 | 2 |
| John Doe | 3 | 1 | NULL | 3 |
+----------+-----+--------+------------+-----+
General tips:
Don't predefine the columns of a CTE; it's not necessary, and it makes maintenance annoying.
With a recursive CTE, always keep a counter, so you can limit the recursion depth and keep track of how deep you are.
Edit:
By the way, if you want the first non-null manager, you can do this, for example (there are many ways):
SELECT BOSS.*
FROM BOSS
INNER JOIN (
SELECT BOSS.ORI
, MIN(BOSS.cnt) cnt
FROM BOSS
WHERE BOSS.MANAGER IS NOT NULL
GROUP BY BOSS.ORI
) X
ON X.ORI = BOSS.ORI
AND X.cnt = BOSS.cnt
WHERE BOSS.ORI IN (3)
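
The same walk-up idea can be sketched as a small runnable example. This is only an illustration, assuming Python's built-in sqlite3 module as a stand-in for SQL Server; the lowercase table names and the first_manager helper are hypothetical. SQLite spells the CTE as WITH RECURSIVE and uses LIMIT where T-SQL would use TOP.

```python
import sqlite3

# In-memory stand-in for the ORG_PARENTS and MANAGERS tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE org_parents (org_id INTEGER, org_parent INTEGER);
    CREATE TABLE managers (org_id INTEGER, manager TEXT);
    INSERT INTO org_parents VALUES (1, NULL), (2, 1), (3, 2);
    INSERT INTO managers VALUES (1, 'John Doe'), (2, 'Jane Doe'), (3, NULL);
""")


def first_manager(conn, org_id):
    """Hypothetical helper: nearest non-NULL manager, walking up from org_id."""
    row = conn.execute(
        """
        WITH RECURSIVE boss AS (
            -- anchor: the unit we start from
            SELECT m.manager AS manager, m.org_id AS org_id,
                   p.org_parent AS org_parent, 1 AS cnt
            FROM managers m
            JOIN org_parents p ON p.org_id = m.org_id
            WHERE m.org_id = ?
            UNION ALL
            -- recursive step: move to the parent unit, counting the depth
            SELECT m.manager, m.org_id, p.org_parent, b.cnt + 1
            FROM boss b
            JOIN org_parents p ON p.org_id = b.org_parent
            JOIN managers m ON m.org_id = p.org_id
            WHERE b.cnt < 100  -- depth guard, as suggested in the tips
        )
        SELECT manager FROM boss
        WHERE manager IS NOT NULL
        ORDER BY cnt
        LIMIT 1
        """,
        (org_id,),
    ).fetchone()
    return row[0] if row else None


print(first_manager(conn, 3))
```

Filtering on the starting unit inside the anchor keeps the recursion limited to one branch, instead of building the chain for every unit and filtering afterwards.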

Text search in PostgreSQL: How to order rows by column

I have a table with people's three desired job positions, ranked from first to third.
The job positions are in a separate table called "job_positions":
job_position_id job_position_title
1 bar manager
2 barista
3 waiter
4 server
The "people" table contains the person_id with the IDs of the job positions they have chosen.
person_id first_position_id second_position_id third_position_id
1 1 2 3
2 2 4
I want to search this table for a job position and order the results so that a person who has that job as their first position is ranked higher than those who have it as their second or third position.
So in this example, if I search for "barista", I expect the person_id 2 to be displayed first, then person_id 1.
This is my SQL code:
SELECT person_id,
TS_RANK_CD(TO_TSVECTOR('english', a.job_position_title), query_first, 1) AS first,
TS_RANK_CD(TO_TSVECTOR('english', b.job_position_title), query_second, 1) AS second,
TS_RANK_CD(TO_TSVECTOR('english', c.job_position_title), query_third, 1) AS third
FROM people
LEFT JOIN job_positions a
ON people.first_position_id = a.job_position_id
LEFT JOIN PHRASETO_TSQUERY ('barista') AS query_first
ON TO_TSVECTOR ('english', a.job_position_title) @@ query_first
LEFT JOIN job_positions b
ON people.second_position_id = b.job_position_id
LEFT JOIN PHRASETO_TSQUERY ('barista') AS query_second
ON TO_TSVECTOR ('english', b.job_position_title) @@ query_second
LEFT JOIN job_positions c
ON people.third_position_id = c.job_position_id
LEFT JOIN PHRASETO_TSQUERY ('barista') AS query_third
ON TO_TSVECTOR ('english', c.job_position_title) @@ query_third
WHERE (TO_TSVECTOR (a.job_position_title) @@ query_first OR TO_TSVECTOR (b.job_position_title) @@ query_second OR TO_TSVECTOR (c.job_position_title) @@ query_third)
The SQL returns the correct matches, but not ranked like they should be. Can I add some kind of score/weight to the columns, to rank them by that score?
I replicated your case with
create table job_positions (job_position_id int, job_position_title varchar);
insert into job_positions values (1, 'bar manager');
insert into job_positions values (2, 'barista');
insert into job_positions values (3, 'waiter');
insert into job_positions values (4, 'server');
create table people (person_id int, first_position_id int, second_position_id int, third_position_id int);
insert into people values (1,1,2,3);
insert into people values (2,2,4,NULL);
If I understand you correctly, you want to order based on the position. If that is true, you can solve the problem by unpivoting the three position columns:
with unpivoting as (
select person_id, 1 as position, first_position_id as job_position_id from people UNION ALL
select person_id, 2 as position, second_position_id as job_position_id from people UNION ALL
select person_id, 3 as position, third_position_id as job_position_id from people
)
select job_position_title, unpivoting.job_position_id, position, person_id
from unpivoting
join job_positions on unpivoting.job_position_id = job_positions.job_position_id
order by unpivoting.job_position_id, position, person_id;
with the expected result
job_position_title | job_position_id | position | person_id
--------------------+-----------------+----------+-----------
bar manager | 1 | 1 | 1
barista | 2 | 1 | 2
barista | 2 | 2 | 1
waiter | 3 | 3 | 1
server | 4 | 2 | 2
(5 rows)
You want to unpivot the job preferences for each user. In fact, you might want to store the data in the unpivoted way -- which is more commonly called normalized.
In Postgres, you can use a lateral join to unpivot:
select p.*
from people p cross join lateral
(values (1, p.first_position_id),
(2, p.second_position_id),
(3, p.third_position_id)
) v(ord, job_position_id) join
job_positions jp
using (job_position_id)
where jp.job_position_title = ?
order by v.ord;
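
To make the ordering concrete, here is a small runnable sketch of the unpivot-then-filter approach. It assumes Python's built-in sqlite3 module rather than PostgreSQL (SQLite has no lateral join, so it uses the UNION ALL form), and the people_for_job helper is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job_positions (job_position_id INTEGER, job_position_title TEXT);
    INSERT INTO job_positions VALUES
        (1, 'bar manager'), (2, 'barista'), (3, 'waiter'), (4, 'server');
    CREATE TABLE people (person_id INTEGER, first_position_id INTEGER,
                         second_position_id INTEGER, third_position_id INTEGER);
    INSERT INTO people VALUES (1, 1, 2, 3), (2, 2, 4, NULL);
""")


def people_for_job(conn, title):
    """Hypothetical helper: (person_id, rank) pairs, best rank first."""
    return conn.execute(
        """
        WITH unpivoted AS (
            -- one row per (person, preference rank, job)
            SELECT person_id, 1 AS ord, first_position_id AS job_position_id FROM people
            UNION ALL
            SELECT person_id, 2, second_position_id FROM people
            UNION ALL
            SELECT person_id, 3, third_position_id FROM people
        )
        SELECT u.person_id, u.ord
        FROM unpivoted u
        JOIN job_positions jp ON jp.job_position_id = u.job_position_id
        WHERE jp.job_position_title = ?
        ORDER BY u.ord, u.person_id
        """,
        (title,),
    ).fetchall()


print(people_for_job(conn, "barista"))  # [(2, 1), (1, 2)]
```

Rows with a NULL position drop out of the inner join, so a person with only two choices needs no special handling.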

Find Last Record in Chain - a Customer Merge Process

I am importing customer data from another vendor's system, and we have merge processes that we use to identify potential duplicate customer accounts and then merge them if they meet certain criteria, such as the same first name, last name, SSN and DOB. In this process, I am seeing that we are creating chains: for instance, Customer A is merged to Customer B, who is then merged to Customer C.
What I am hoping to do is to identify these chains and update each customer record to point to the last record in the chain. So in my example above, Customer A and Customer B would both have Customer C's ID in their MergedTo field.
CustID FName LName CustStatusType isMerged MergedTo
1 Kevin Smith M 1 2
2 Kevin Smith M 1 3
3 Kevin Smith M 1 4
4 Kevin Smith O 0 NULL
5 Mary Jones O 0 NULL
6 Wyatt Earp M 1 7
7 Wyatt Earp O 1 NULL
8 Bruce Wayn M 1 10
9 Brice Wayne M 1 10
10 Bruce Wane M 1 11
11 Bruce Wayne O 1 NULL
CustStatusType indicates if the customer account is open ("O") or merged ("M"). And then we have an isMerged field as a BIT field that indicates whether the account has been merged and finally the MergedTo field that indicates what customer account the record was merged to.
With the example provided, what I would like to achieve is to have CustIDs 1 and 2 have their MergedTo set to 4, while CustID 3 could either be updated or left as is. CustIDs 4, 5, and 6 are fine and do not need to be updated. But for CustIDs 8 through 10, I would like MergedTo to be set to 11, as in the table below.
CustID FName LName CustStatusType isMerged MergedTo
1 Kevin Smith M 1 4
2 Kevin Smith M 1 4
3 Kevin Smith M 1 4
4 Kevin Smith O 0 NULL
5 Mary Jones O 0 NULL
6 Wyatt Earp M 1 7
7 Wyatt Earp O 1 NULL
8 Bruce Wayn M 1 11
9 Brice Wayne M 1 11
10 Bruce Wane M 1 11
11 Bruce Wayne O 1 NULL
I haven't been able to figure out how to achieve this with TSQL - suggestions?
Test Data:
DROP TABLE IF EXISTS #Customers;
CREATE TABLE #Customers
(
CustomerID INT ,
FirstName VARCHAR (25) ,
LastName VARCHAR (25) ,
CustomerStatusTypeID VARCHAR (1) ,
isMerged BIT ,
MergedTo INT
);
INSERT INTO #Customers
VALUES ( 1, 'Kevin', 'Smith', 'M', 1, 2 ) ,
( 2, 'Kevin', 'Smith', 'M', 1, 3 ) ,
( 3, 'Kevin', 'Smith', 'M', 1, 4 ) ,
( 4, 'Kevin', 'Smith', 'O', 0, NULL ) ,
( 5, 'Mary', 'Jones', 'O', 0, NULL ) ,
( 6, 'Wyatt', 'Earp', 'M', 1, 7 ) ,
( 7, 'Wyatt', 'Earp', 'O', 1, NULL ) ,
( 8, 'Bruce', 'Wayn', 'M', 1, 10 ) ,
( 9, 'Brice', 'Wayne', 'M', 1, 10 ) ,
( 10, 'Bruce', 'Wane', 'M', 1, 11 ) ,
( 11, 'Bruce', 'Wayne', 'O', 1, NULL );
SELECT *
FROM #Customers;
DROP TABLE #Customers;
For your example soundex() seems good enough. It returns a code, that is based on the word's pronunciation in English. Use it on the first and last name to join the customer table and a subquery which queries the customer table adding the row_number() partitioned by the Soundex of the names and order descending by the ID -- to number the "latest" record with 1. For the join condition use the Soundex of the names, a row number of 1 and of course inequality of the IDs.
UPDATE c1
SET c1.mergedto = x.customerid
FROM #customers c1
LEFT JOIN (SELECT c2.customerid,
soundex(c2.firstname) sefn,
soundex(c2.lastname) seln,
row_number() OVER (PARTITION BY soundex(c2.firstname),
soundex(c2.lastname)
ORDER BY c2.customerid DESC) rn
FROM #customers c2) x
ON x.sefn = soundex(c1.firstname)
AND x.seln = soundex(c1.lastname)
AND x.rn = 1
AND x.customerid <> c1.customerid;
I don't really get the concept behind the customerstatustypeid and ismerged columns. As far as I understand, they could both be derived from whether mergedto is null or not, but neither the sample data nor the expected result supports that. Since these columns apparently don't change between your sample input and output, I just left them alone.
If Soundex proves to be insufficient for your needs, you may want to look for other string distance metrics, like the Levenshtein distance. AFAIK there's no implementation of that included in SQL Server, but search engines may spit out implementations by third parties, or maybe there's something that can be used via CLR. Or you roll your own, of course.
The query below finds the latest CustomerID that matches each customer and returns it in the Ref column:
select *
, Ref = (select top 1 CustomerID from #Customers where soundex(FirstName) = soundex(ma.FirstName) and soundex(LastName) = soundex(ma.LastName) order by CustomerID desc)
from #Customers ma
Using the update below, you can set the MergedTo column:
;with ct as (
select *
, Ref = (select top 1 CustomerID from #Customers where soundex(FirstName) = soundex(ma.FirstName) and soundex(LastName) = soundex(ma.LastName) order by CustomerID desc)
from #Customers ma
)
update c1
set c1.MergedTo = iif(c1.CustomerID = ct.Ref, null, ct.Ref)
from #Customers c1
inner join ct on ct.CustomerID = c1.CustomerID
Recursion can be used for this:
WITH CTE as
(
SELECT P.CustomerID, P.MergedTo, CAST(P.CustomerID AS VarChar(Max)) as Levels
FROM #Customers P
WHERE P.MergedTo IS NULL
UNION ALL
SELECT P1.CustomerID, P1.MergedTo, M.Levels + ', ' + CAST(P1.CustomerID AS VarChar(Max))
FROM #Customers P1
INNER JOIN CTE M ON M.CustomerID = P1.MergedTo
)
SELECT
CustomerID
, MergedTo
, x -- "end of chain"
, Levels
FROM CTE
CROSS APPLY (
SELECT LEFT(levels,charindex(',',levels+',')-1) x
) a
WHERE MergedTo IS NOT NULL
Result:
+----+------------+----------+----+------------+
| | CustomerID | MergedTo | x | levels |
+----+------------+----------+----+------------+
| 1 | 10 | 11 | 11 | 11, 10 |
| 2 | 8 | 10 | 11 | 11, 10, 8 |
| 3 | 9 | 10 | 11 | 11, 10, 9 |
| 4 | 6 | 7 | 7 | 7, 6 |
| 5 | 3 | 4 | 4 | 4, 3 |
| 6 | 2 | 3 | 4 | 4, 3, 2 |
| 7 | 1 | 2 | 4 | 4, 3, 2, 1 |
+----+------------+----------+----+------------+
Note that the string Levels is built up by the recursion, and because of the way it is concatenated, its first part is always the "end of chain" (see column x). That first part is extracted using a CROSS APPLY, although using an APPLY isn't essential.
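
As a variation on the recursion above, the end of the chain can be carried down as its own column instead of being parsed out of a string, and then written back. The following is a sketch only, assuming Python's built-in sqlite3 module in place of SQL Server and a reduced two-column customers table; flatten_chains is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (CustomerID INTEGER PRIMARY KEY, MergedTo INTEGER);
    INSERT INTO customers VALUES
        (1, 2), (2, 3), (3, 4), (4, NULL), (5, NULL), (6, 7),
        (7, NULL), (8, 10), (9, 10), (10, 11), (11, NULL);
""")


def flatten_chains(conn):
    """Hypothetical helper: point every merged row at the end of its chain."""
    pairs = conn.execute(
        """
        WITH RECURSIVE chain AS (
            -- anchor: rows that were never merged are their own chain end
            SELECT CustomerID, CustomerID AS final_id
            FROM customers
            WHERE MergedTo IS NULL
            UNION ALL
            -- recursive step: anything merged into a known row inherits its end
            SELECT c.CustomerID, ch.final_id
            FROM customers c
            JOIN chain ch ON c.MergedTo = ch.CustomerID
        )
        SELECT final_id, CustomerID FROM chain WHERE CustomerID <> final_id
        """
    ).fetchall()
    conn.executemany("UPDATE customers SET MergedTo = ? WHERE CustomerID = ?", pairs)
    conn.commit()


flatten_chains(conn)
print(conn.execute(
    "SELECT CustomerID, MergedTo FROM customers ORDER BY CustomerID").fetchall())
```

Because the recursion starts from the unmerged rows and each row has a single MergedTo, it always terminates; rows caught in a cycle are never reached and are left unchanged.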

Displaying whole table after stripping characters in SQL Server

This question has 2 parts.
Part 1
I have a table "Groups":
group_ID person
-----------------------
1 Person 10
2 Person 11
3 Jack
4 Person 12
Note that not all data in the "person" column have the same format.
In SQL Server, I have used the following query to strip the "Person " characters out of the person column:
SELECT
REPLACE([person],'Person ','')
AS [person]
FROM Groups
I did not use UPDATE in the query above as I do not want to alter the data in the table.
The query returned this result:
person
------
10
11
12
However, I would like this result instead:
group_ID person
-------------------
1 10
2 11
3 Jack
4 12
What should be my query to achieve this result?
Part 2
I have another table "Details":
detail_ID group1 group2
-------------------------------
100 1 2
101 3 4
From the intended result in Part 1, where the numbers in the "person" column correspond to those in "group1" and "group2" of table "Details", how do I selectively convert the numbers in "person" to integers and join them with "Details"?
Note that all data under "person" in Part 1 are strings (nvarchar(100)).
Here is the intended query output:
detail_ID group1 group2
-------------------------------
100 10 11
101 Jack 12
Note that I do not wish to permanently alter anything in both tables and the intended output above is just a result of a SELECT query.
I don't think first part will be a problem here. Your query is working fine with your expected result.
Schema:
CREATE TABLE #Groups (group_ID INT, person VARCHAR(50));
INSERT INTO #Groups
SELECT 1,'Person 10'
UNION ALL
SELECT 2,'Person 11'
UNION ALL
SELECT 3,'Jack'
UNION ALL
SELECT 4,'Person 12';
CREATE TABLE #Details(detail_ID INT,group1 INT, group2 INT);
INSERT INTO #Details
SELECT 100, 1, 2
UNION ALL
SELECT 101, 3, 4 ;
Part 1:
For me, your query gives exactly what you are expecting once group_ID is included in the SELECT list:
SELECT group_ID,REPLACE([person],'Person ','') AS person
FROM #Groups
+----------+--------+
| group_ID | person |
+----------+--------+
| 1 | 10 |
| 2 | 11 |
| 3 | Jack |
| 4 | 12 |
+----------+--------+
Part 2:
;WITH CTE AS(
SELECT group_ID
,REPLACE([person],'Person ','') AS person
FROM #Groups
)
SELECT D.detail_ID, G1.person AS group1, G2.person AS group2
FROM #Details D
INNER JOIN CTE G1 ON D.group1 = G1.group_ID
INNER JOIN CTE G2 ON D.group2 = G2.group_ID
Result:
+-----------+--------+--------+
| detail_ID | group1 | group2 |
+-----------+--------+--------+
| 100 | 10 | 11 |
| 101 | Jack | 12 |
+-----------+--------+--------+
Try the following query; it should give you the desired output.
;WITH MT AS
(
SELECT
group_ID, REPLACE([person],'Person ','') AS [person]
FROM Groups
)
SELECT D.detail_ID, MT1.person AS group1, MT2.person AS group2
FROM
Details D
INNER JOIN MT MT1 ON MT1.group_ID = D.group1
INNER JOIN MT MT2 ON MT2.group_ID = D.group2
The first query works
declare @T table (id int primary key, name varchar(10));
insert into @T values
(1, 'Person 10')
, (2, 'Person 11')
, (3, 'Jack')
, (4, 'Person 12');
declare @G table (id int primary key, grp1 int, grp2 int);
insert into @G values
(100, 1, 2)
, (101, 3, 4);
with cte as
( select t.id, t.name, ltrim(rtrim(replace(t.name, 'person', ''))) as sp
from @T t
)
-- select * from cte order by cte.id;
select g.id, c1.sp as grp1, c2.sp as grp2
from @G g
join cte c1
on c1.id = g.grp1
join cte c2
on c2.id = g.grp2
order
by g.id;
id grp1 grp2
----------- ----------- -----------
100 10 11
101 Jack 12

List the names of people that have never scored above 3

From a table like this:
Name | Score
------ | ------
Bill | 1
Bill | 2
Bill | 1
Steve | 1
Steve | 4
Steve | 1
Return the names of people that have never scored above 3
Answer would be:
Name |
------ |
Bill |
The key is to get the maximum score for each person, then filter to those whose maximum is at most 3. To get the maximum you need an aggregate (GROUP BY and MAX). To filter on an aggregate you must use HAVING rather than WHERE. So you would end up with:
SELECT Name, MAX(Score) AS HighScore
FROM Table
GROUP BY Name
HAVING MAX(Score) <= 3;
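
For a quick check of the GROUP BY / HAVING approach against the sample data, here is a minimal sketch using Python's built-in sqlite3 module (an assumption made only so it runs anywhere; the scores table name is hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (Name TEXT, Score INTEGER);
    INSERT INTO scores VALUES
        ('Bill', 1), ('Bill', 2), ('Bill', 1),
        ('Steve', 1), ('Steve', 4), ('Steve', 1);
""")

# One row per person; keep only people whose best score never exceeded 3.
names = [row[0] for row in conn.execute(
    "SELECT Name FROM scores GROUP BY Name HAVING MAX(Score) <= 3 ORDER BY Name"
)]
print(names)  # ['Bill']
```

A plain WHERE Score <= 3 would not work here, because it filters individual rows and would still return Steve for his scores of 1.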
one solution would be:
SELECT DISTINCT name
FROM mytable
WHERE Name NOT IN
( SELECT Name
FROM mytable
WHERE score > 3
)
sample table :
DECLARE @Table1 TABLE
(Name varchar(5), Score int)
;
INSERT INTO @Table1
(Name, Score)
VALUES
('Bill', 1),
('Bill', 2),
('Bill', 1),
('Steve', 1),
('Steve', 4),
('Steve', 1)
;
Script :
;with CTE AS (
select Name,Score from @Table1
GROUP BY Name,Score
HAVING (Score) > 3 )
Select
NAME,
Score
from @Table1 T
where not EXISTS
(select name from CTE
where name = T.Name )
Result :
NAME Score
Bill 1
Bill 2
Bill 1
SELECT name
FROM table_name
GROUP BY name
HAVING MAX(score) <= 3

SQL Server : query multivalued table recursive

I have a SQL problem which I cannot solve.
There are two tables, mms and mms_mv, which are linked via object_id.
mms_mv is a multivalue table; it contains group memberships and group managers, where a manager can itself be another group.
This runs on SQL Server.
mms:
|object_id|attribute_type|objectSid|
| 1 | user | a |
| 2 | group | b |
| 3 | group | c |
| 4 | group | d |
| 5 | group | f |
mms_mv:
|object_id|attribute_name|reference_id|
| 2 | member | 1 |
| 3 | manager | 1 |
| 4 | manager | 2 |
I am trying to find out which groups a user can manage, either directly or indirectly via nested groups.
In the example above, user 1 is a member of group 2, and group 2 is manager of group 4.
User 1 is also manager of group 3 directly.
Which groups can be managed by the user?
So the output I need is groups 3 and 4.
select
accountname, objectsid, mms1.reference_id as ManagerID,
mms2.object_id
from
dbo.mms_mv_link as mms1 with (nolock)
inner join
dbo.mms_metaverse as mms2 with (nolock) on mms1.object_id = mms2.object_id
where
mms2.object_type ='group'
and mms1.attribute_name = 'manager'
and mms1.reference_id in (1, 3)
This is the best I came up with: it finds which of the group IDs and user IDs I submit are manager of a group. I used another lookup to get the groups a user is in.
My problem is the nested groups. After long thinking and googling, I am not sure whether it is even possible to write such a query.
I can find all groups a user is a member of, but I also need the groups in which these groups are members.
I would be happy if anyone has some ideas or hints to help me figure this one out.
I would also be happy with a recommendation for a good SQL book that covers such complex queries.
Thank you all for helping me.
I think that the following recursive CTE will give you what you want:
;WITH cte AS
(
SELECT m2.object_id AS groupID, m2.attribute_name
FROM @mms AS m1
INNER JOIN @mms_mv AS m2 ON m1.object_id = m2.reference_id
INNER JOIN @mms m3 ON m2.object_id = m3.object_id
WHERE m1.attribute_type = 'user' AND m3.attribute_type = 'group'
UNION ALL
SELECT m.object_id AS groupID, m.attribute_name
FROM cte AS c
INNER JOIN @mms_mv AS m ON c.groupID = m.reference_id
)
SELECT *
FROM cte
WHERE attribute_name <> 'member'
The so-called 'anchor' query of the CTE returns all groups that any user either manages or is a member of. Using recursion, we get all other groups managed by either the groups of the original set or by any 'intermediate' set.
With this data as input:
DECLARE @mms TABLE (object_id INT, attribute_type VARCHAR(10), objectSid VARCHAR(10))
DECLARE @mms_mv TABLE (object_id INT, attribute_name VARCHAR(10), reference_id INT)
INSERT @mms VALUES
( 1, 'user', 'a'),
( 2, 'group', 'b'),
( 3, 'group', 'c'),
( 4, 'group', 'd'),
( 5, 'group', 'f')
INSERT @mms_mv VALUES
( 2, 'member', 1),
( 3, 'manager', 1),
( 4, 'manager', 2),
( 5, 'manager', 3)
the above query yields the following output:
groupID attribute_name
----------------------
3 manager
5 manager
4 manager
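
The same traversal can be tried for a single user with the sketch below. It assumes Python's built-in sqlite3 module instead of SQL Server, and managed_groups is a hypothetical helper name. One deliberate difference: it uses UNION rather than UNION ALL, which SQLite allows in a recursive CTE and which stops the recursion if groups reference each other in a loop; SQL Server requires UNION ALL there, so a depth counter or MAXRECURSION would be needed instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mms_mv (object_id INTEGER, attribute_name TEXT, reference_id INTEGER);
    INSERT INTO mms_mv VALUES
        (2, 'member', 1), (3, 'manager', 1), (4, 'manager', 2), (5, 'manager', 3);
""")


def managed_groups(conn, user_id):
    """Hypothetical helper: IDs of groups the user manages directly or via nesting."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach AS (
            -- anchor: groups the user is a member or manager of
            SELECT object_id AS group_id, attribute_name
            FROM mms_mv
            WHERE reference_id = ?
            UNION
            -- recursive step: groups that reference an already reached group
            SELECT m.object_id, m.attribute_name
            FROM reach r
            JOIN mms_mv m ON m.reference_id = r.group_id
        )
        SELECT DISTINCT group_id
        FROM reach
        WHERE attribute_name = 'manager'
        ORDER BY group_id
        """,
        (user_id,),
    ).fetchall()
    return [r[0] for r in rows]


print(managed_groups(conn, 1))  # [3, 4, 5]
```

Group 2 is reached through membership only, so it is used as a stepping stone but filtered out of the final result.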