T-SQL Execute Recursion for Each Value in Multiple Columns - sql

I have a dataset with two columns. A person in the first column may be in the span of control of a person in the second column (i.e. everyone is in Michael's span of control, Dwight & Stanley are in Jim's span of control):
source_id source target_id target
1 Michael Scott 5 Kelly Kapoor
3 Dwight Schrute 2 Jim Halpert
4 Stanley Hudson 2 Jim Halpert
2 Jim Halpert 5 Kelly Kapoor
I have a table that lists each person and their supervisor:
person_id person supervisor_id supervisor
1 Michael Scott 0 None
2 Jim Halpert 1 Michael Scott
3 Dwight Schrute 2 Jim Halpert
4 Stanley Hudson 2 Jim Halpert
6 Ryan Howard 1 Michael Scott
5 Kelly Kapoor 6 Ryan Howard
I have a block of code that uses recursion to find a single person's span of control from the preceding table:
with anchor as
(
select person_id, supervisor_id from table where unique_id = @ID
union all
select a.person_id, a.supervisor_id
from table a
inner join Anchor b ON b.person_id = a.supervisor_id
)
select a.person_id
from anchor a
This block of code can be turned into a stored procedure or a function. Running it for Jim (for example), returns:
person_id
3 (Dwight Schrute)
4 (Stanley Hudson)
How do I compare each value in the initial dataset (from both the source and target columns) to the values in the column returned by the preceding block of code? In other words, for each row in source, I need to check if that name is within the span of control of target. In addition, for each row in target, I need to check if that name is within the span of control of source.
Desired End Result:
source_id source target_id target isEitherPersonInOther'sSOC
1 Michael Scott 5 Kelly Kapoor Yes
3 Dwight Schrute 2 Jim Halpert Yes
4 Stanley Hudson 2 Jim Halpert Yes
2 Jim Halpert 5 Kelly Kapoor No
I know iterations are bad (i.e. running the stored procedure for each row with a cursor or while loop). I also know cross apply and a function together may work, but I have been unable to figure my way through that.
Thank you for any insight you all may have!

What it looks like you need is to take your recursive CTE and output ID pairs into a temp table. I'm picturing something like this:
DECLARE @id int
DECLARE cx CURSOR FOR
SELECT person_id FROM PersonSupervisor
CREATE TABLE #tmpPS (Supervisor int, personInChain int)
OPEN cx
FETCH NEXT FROM cx
INTO @id
WHILE @@FETCH_STATUS = 0
BEGIN
-- expand the chain below the current person and store one row per pair
;with anchor as
(
select person_id, supervisor_id from PersonSupervisor where person_id = @id
union all
select a.person_id, a.supervisor_id
from PersonSupervisor a
inner join Anchor b ON b.person_id = a.supervisor_id
)
INSERT INTO #tmpPS
select @id, a.person_id
from anchor a;
FETCH NEXT FROM cx
INTO @id
END
Close cx
Deallocate cx
This creates a table of all relationships, recursively expanded. Then you can output whether any given person is either above or below a given other person with this subquery. Add it to whatever query outputs your base grid:
SELECT
SourceId,
Source,
TargetId,
Target,
CASE
WHEN EXISTS (SELECT 1 FROM #tmpPS WHERE Supervisor = SourceId and PersonInChain = TargetId)
THEN 'Yes'
WHEN EXISTS (SELECT 1 FROM #tmpPS WHERE Supervisor = TargetId and PersonInChain = SourceId)
THEN 'Yes'
ELSE 'No'
END as [isEitherPersonInOther'sSOC]
FROM ....
This also suggests a variant that reports the direction of the relationship: if the first EXISTS matches, TargetId is a subordinate of SourceId; if the second EXISTS matches, TargetId is a superior of SourceId.
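As a set-based alternative to the cursor, the same pair table can probably be filled by a single recursive CTE. This is only a sketch: it assumes the table is PersonSupervisor(person_id, supervisor_id) as in the cursor above, that supervisor_id = 0 means "no supervisor" as in the sample data, and that it is run instead of the cursor block (both create #tmpPS). Unlike the cursor version it does not store a (person, person) self pair.

```sql
-- Sketch: every (supervisor, person anywhere below them) pair in one statement.
-- Assumes PersonSupervisor(person_id, supervisor_id); 0 = no supervisor.
WITH chain AS
(
    -- direct reports
    SELECT supervisor_id AS Supervisor, person_id AS personInChain
    FROM PersonSupervisor
    WHERE supervisor_id <> 0
    UNION ALL
    -- reports of reports, carried up to the original supervisor
    SELECT c.Supervisor, p.person_id
    FROM chain AS c
    JOIN PersonSupervisor AS p ON p.supervisor_id = c.personInChain
)
SELECT Supervisor, personInChain
INTO #tmpPS
FROM chain;
```

With the sample data this should leave no row linking Jim (2) and Kelly (5) in either direction, so the CASE expression above returns 'No' for that row.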

There has to be a better way to do this but this is definitely one way to go about it -- note the code creates permanent tables to use in a function:
create table dbo.[source]
(
source_id int,
[source] nvarchar(500),
target_id int,
[target] nvarchar(500)
)
insert into [source]
select 1, 'Michael Scott', 5, 'Kelly Kapoor'
union all select 3,'Dwight Schrute',2,'Jim Halpert'
union all select 4 ,'Stanley Hudson',2,'Jim Halpert'
union all select 2 ,'Jim Halpert',5,'Kelly Kapoor'
create table dbo.supervisors
(
person_id int,
person nvarchar(500),
supervisor_id int,
supervisor nvarchar(500)
)
insert into dbo.supervisors
select 1,'Michael Scott', 0,'None'
union all select 2,'Jim Halpert',1,'Michael Scott'
union all select 3,'Dwight Schrute',2,'Jim Halpert'
union all select 4,'Stanley Hudson',2,'Jim Halpert'
union all select 6,'Ryan Howard',1,'Michael Scott'
union all select 5 ,'Kelly Kapoor',6,'Ryan Howard'
go
create function dbo.fn_isinspanofcontrol
(
@sourceid int,
@targetid int
)
RETURNS varchar(1)
as
begin
declare @retVal varchar(1)
declare @tbl table
(
person_id int
)
-- is the target somewhere below the source?
;with anchor as
(
select person_id, supervisor_id from supervisors where person_id = @sourceid
union all
select a.person_id, a.supervisor_id
from supervisors a
inner join Anchor b ON b.person_id = a.supervisor_id
)
insert into @tbl
select a.person_id
from anchor a
where
a.person_id = @targetid
-- is the source somewhere below the target?
;with anchor as
(
select person_id, supervisor_id from supervisors where person_id = @targetid
union all
select a.person_id, a.supervisor_id
from supervisors a
inner join Anchor b ON b.person_id = a.supervisor_id
)
insert into @tbl
select a.person_id
from anchor a
where
a.person_id = @sourceid
if exists( select 1
from @tbl
)
begin
set @retVal = 'Y'
end
else
begin
set @retVal = 'N'
end
return @retVal
end
go
select
*, dbo.fn_isinspanofcontrol(source_id,target_id)
from [source]
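Since the question mentions CROSS APPLY with a function, here is a minimal sketch of that route using an inline table-valued function and OUTER APPLY. The function name dbo.fn_span_of_control and the output column name are made up for illustration; the tables dbo.supervisors and dbo.[source] are the ones created above.

```sql
-- Hypothetical inline TVF: everyone below @root_id (the root itself is excluded).
CREATE FUNCTION dbo.fn_span_of_control (@root_id int)
RETURNS TABLE
AS
RETURN
(
    WITH span AS
    (
        SELECT person_id FROM dbo.supervisors WHERE supervisor_id = @root_id
        UNION ALL
        SELECT s.person_id
        FROM dbo.supervisors AS s
        JOIN span AS sp ON s.supervisor_id = sp.person_id
    )
    SELECT person_id FROM span
);
GO

-- One probe per direction; hit is NULL when the other person is not in the span.
SELECT s.source_id, s.[source], s.target_id, s.[target],
       CASE WHEN below_source.hit = 1 OR below_target.hit = 1
            THEN 'Yes' ELSE 'No' END AS in_span_of_control
FROM dbo.[source] AS s
OUTER APPLY (SELECT TOP (1) 1 AS hit
             FROM dbo.fn_span_of_control(s.source_id) AS f
             WHERE f.person_id = s.target_id) AS below_source
OUTER APPLY (SELECT TOP (1) 1 AS hit
             FROM dbo.fn_span_of_control(s.target_id) AS f
             WHERE f.person_id = s.source_id) AS below_target;
```

An inline function is usually easier for the optimizer to work with than a scalar function called once per row, but the hierarchy is still walked twice per row, so a precomputed pair table may be the better choice for large inputs.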

Related

How can I create 1 ID for a group of customers in a column?

I have a list of customers who have different contracts with my service company.
Sometimes we can have different customers per contracts. Example:
Karen and her boyfriend Will have a contract.
Sometimes a group of customers can have different contracts. Example: Karen and Will have multiple contracts with me.
Here is the table:
idCustomer idContract NameCust
-----------------------------------------
1 A Karen
1 B Will
2 A Karen
2 B Will
3 C Steph
4 C Peter
But because Karen and Will can have multiple contracts, I want a unique id for them and other group of customers. Result table I want:
idCustomer idContract NameCust Customer_GroupID
-----------------------------------------------------
1 A Karen 1
1 B Will 1
2 A Karen 1
2 B Will 1
3 C Steph 2
4 C Peter 2
I'm stuck: nothing I have tried gives me the result I need. I found a forum post that used the Dense_Rank function, and this is what I tried:
SELECT
RANK() OVER (ORDER BY idCustomers) AS Customer_GroupID,
IdCustomers,
IdContract
FROM
Table
Here is the result:
Cust_GroupID idCustomer idContract
--------------------------------------
1 1 A
2 1 B
1 2 A
2 2 B
3 3 C
3 4 C
I even tried multiple SELECTs and NOT EXISTS, but nothing worked.
It seems that I have understood your requirement. Still, you should provide a little more sample data to make it absolutely clear to everyone; append a few more varied rows to the existing sample.
You should test with different sample data and let me know if it's not working.
Sample Data,
create table #test(idCustomer int,idContract varchar(50) , NameCust varchar(50))
insert into #test (idCustomer ,idContract , NameCust ) VALUES
(1,'A','Karen')
,(1,'B','Will' )
,(2,'A','Karen')
,(2,'B','Will' )
,(3,'C','Steph')
,(4,'C','Peter')
Method 1- SET BASED Approach,
;with CTE as
(
select *
,ROW_NUMBER()over(order by idCustomer,idContract)rn
from #test
)
,CTE1 as
(
select t.idCustomer,t.idContract,t.NameCust
, isnull(t1.idCustomer,t.idCustomer)customerGroupID
from CTE t
outer apply(
select top 1 idCustomer
from CTE t1
where t1.rn< t.rn
and((t.idCustomer=t1.idCustomer)
or (t.idContract=t1.idContract))
order by t1.rn
)t1
)
,CTE2 AS(
select *
,DENSE_RANK()OVER( order by customerGroupID )Customer_GroupID
from CTE1
)
select * from CTE2
Method 2- RBAR (using cursor),
create table #test1(id int identity(1,1),idCustomer int
,idContract varchar(50) , NameCust varchar(50),customer_Groupid int)
insert into #test1 (idCustomer ,idContract
, NameCust,customer_Groupid )
select idCustomer ,idContract , NameCust,null
from #test
DECLARE @idCustomer INT
DECLARE @idContract varchar(50)
DECLARE @id INT
declare @customer_Groupid int
DECLARE @getCustomer CURSOR
SET @getCustomer = CURSOR FOR
SELECT id, idCustomer,idContract
FROM #test1
OPEN @getCustomer
FETCH NEXT
FROM @getCustomer INTO @id, @idCustomer,@idContract
WHILE @@FETCH_STATUS = 0
BEGIN
select top 1 @customer_Groupid=customer_Groupid
from #test1 where id<@id order by id desc
if not exists(select 1 from #test1 where id<@id
and (idCustomer=@idCustomer or idContract=@idContract))
BEGIN
select top 1 @customer_Groupid=customer_Groupid
from #test1 where id<@id order by id desc
if(@customer_Groupid is not null)
set @customer_Groupid=@customer_Groupid+1
end
if(@customer_Groupid is null)
set @customer_Groupid=1
update #test1 set customer_Groupid=@customer_Groupid where id=@id
FETCH NEXT
FROM @getCustomer INTO @id, @idCustomer,@idContract
END
CLOSE @getCustomer
DEALLOCATE @getCustomer
select * from #test1
drop table #test1
drop table #test
There seems to be a need to create an interim table of NameCustGroup like
NameCustList idCustomer idContract
Karen,Will 1 A,B
Karen,Will 2 A,B
Peter,Steph 3 C
Peter,Steph 4 C
And then use that to create
Customer_GroupID idCustomer idContract
1 1 A,B
1 2 A,B
2 3 C
2 4 C
The interim table seems hardest, because the NameCustList needs to be formed either by Common idCustomer or Common idContract. This is a work in progress... ...
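One way to sidestep the interim name list is to treat this as a connected-components problem: rows belong to the same group when they share an idCustomer or an idContract, directly or through other rows. The sketch below is an assumption-laden illustration, not a tuned solution: the working table #cust and its grp column are hypothetical, each (idCustomer, idContract) pair is assumed to be unique, and the loop simply pushes the smallest label through each group until nothing changes.

```sql
-- Hypothetical working copy of the sample data; grp starts as the row's own idCustomer.
CREATE TABLE #cust (idCustomer int, idContract varchar(50), NameCust varchar(50), grp int);
INSERT INTO #cust (idCustomer, idContract, NameCust, grp) VALUES
 (1,'A','Karen',1),(1,'B','Will',1),
 (2,'A','Karen',2),(2,'B','Will',2),
 (3,'C','Steph',3),(4,'C','Peter',4);

WHILE 1 = 1
BEGIN
    -- give each row the smallest label found among rows sharing its customer or contract
    UPDATE c
    SET grp = m.min_grp
    FROM #cust AS c
    JOIN (
        SELECT a.idCustomer, a.idContract, MIN(b.grp) AS min_grp
        FROM #cust AS a
        JOIN #cust AS b
          ON b.idCustomer = a.idCustomer OR b.idContract = a.idContract
        GROUP BY a.idCustomer, a.idContract
    ) AS m
      ON m.idCustomer = c.idCustomer AND m.idContract = c.idContract
    WHERE m.min_grp < c.grp;

    IF @@ROWCOUNT = 0 BREAK;   -- labels are stable, every group is fully merged
END;

SELECT idCustomer, idContract, NameCust,
       DENSE_RANK() OVER (ORDER BY grp) AS Customer_GroupID
FROM #cust
ORDER BY idCustomer, idContract;

DROP TABLE #cust;
```

On the six sample rows this should converge after one updating pass and number the groups 1, 1, 1, 1, 2, 2. The OR join is quadratic, so larger tables would likely need indexes or a different strategy.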

How to synthesize attribute for joined tables

I have a view defined like this:
CREATE VIEW [dbo].[PossiblyMatchingContracts] AS
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts
FROM [dbo].AllContracts AS C
INNER JOIN [dbo].AllContracts AS CC
ON C.SecondaryMatchCodeFB = CC.SecondaryMatchCodeFB
OR C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeLB
OR C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeBB
OR C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeBB
OR C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeLB
WHERE C.UniqueID NOT IN
(
SELECT UniqueID FROM [dbo].DefinitiveMatches
)
AND C.AssociatedUser IS NULL
AND C.UniqueID <> CC.UniqueID
This basically finds contracts where, for example, the first name and the birthday match. This works great. Now I want to add a synthetic attribute to each row, with the value taken from only one source row.
Let me give you an example to make it clearer. Suppose I have the following table:
UniqueID | FirstName | LastName | Birthday
1 | Peter | Smith | 1980-11-04
2 | Peter | Gray | 1980-11-04
3 | Peter | Gray-Smith| 1980-11-04
4 | Frank | May | 1985-06-09
5 | Frank-Paul| May | 1985-06-09
6 | Gina | Ericson | 1950-11-04
The resulting view should look like this:
UniqueID | PossiblyMatchingContracts | SyntheticID
1 | 2 | PeterSmith1980-11-04
1 | 3 | PeterSmith1980-11-04
2 | 1 | PeterSmith1980-11-04
2 | 3 | PeterSmith1980-11-04
3 | 1 | PeterSmith1980-11-04
3 | 2 | PeterSmith1980-11-04
4 | 5 | FrankMay1985-06-09
5 | 4 | FrankMay1985-06-09
6 | NULL | NULL [or] GinaEricson1950-11-04
Notice that the SyntheticID column uses ONLY values from one of the matching source rows. It doesn't matter which one. I am exporting this view to another application and need to be able to identify each "match group" afterwards.
Is it clear what I mean? Any ideas how this could be done in sql?
Maybe it helps to elaborate a bit on the actual use case:
I am importing contracts from different systems. To account for the possibility of typos or people that have married but the last name was only updated in one system, I need to find so called 'possible matches'. Two or more contracts are considered a possible match if they contain the same birthday plus the same first, last or birth name. That implies, that if contract A matches contract B, contract B also matches contract A.
The target system uses multivalue reference attributes to store these relationships. The ultimate goal is to create user objects for these contracts. The first catch is that there shall be only one user object for multiple matching contracts; that is why I'm creating these matches in the view. The second catch is that user objects are created by workflows, which run in parallel for each contract. To avoid creating multiple user objects for matching contracts, each workflow needs to check whether there is already a matching user object, or another workflow that is about to create said user object. Because the workflow engine is extremely slow compared to SQL, the workflows should not repeat the whole matching test. So the idea is to let the workflow check only for the 'syntheticID'.
I have solved it with a multi-step approach:
1. Create the list of possible 1st level matches for each contract
2. Create the base groups list, assigning a different group to each contract (as if they were not related to anybody)
3. Iterate the matches list, updating the group list when more contracts need to be added to a group
4. Recursively build up the SyntheticID from the final group list
5. Output results
First of all, let me explain what I have understood, so you can tell if my approach is correct or not.
1) matching propagates in "cascade"
I mean, if "Peter Smith" is grouped up with "Peter Gray", it means that all Smith and all Gray are related (if they have the same birth date) so Luke Smith can be in the same group of John Gray
2) I have not understood what you mean with "Birth Name"
You say contracts match on "first, last or birth name". Sorry, I'm Italian; I thought birth name and first name were the same, and there is no such column in your data. Maybe it is related to the dash between names?
When FirstName is Frank-Paul it means it should match both Frank and Paul?
When LastName is Gray-Smith it means it should match both Gray and Smith?
In the following code I have simply ignored this problem, but it could be handled if needed (I already made an attempt: splitting the names, unpivoting them and treating them as double matches).
Step Zero: some declaration and prepare base data
declare @cli as table (UniqueID int primary key, FirstName varchar(20), LastName varchar(20), Birthday varchar(20))
declare @comb as table (id1 int, id2 int, done bit)
declare @grp as table (ix int identity primary key, grp int, id int, unique (grp,ix))
declare @str_id as table (grp int primary key, SyntheticID varchar(1000))
declare @id1 as int, @g int
;with
t as (
select *
from (values
(1 , 'Peter' , 'Smith' , '1980-11-04'),
(2 , 'Peter' , 'Gray' , '1980-11-04'),
(3 , 'Peter' , 'Gray-Smith', '1980-11-04'),
(4 , 'Frank' , 'May' , '1985-06-09'),
(5 , 'Frank-Paul', 'May' , '1985-06-09'),
(6 , 'Gina' , 'Ericson' , '1950-11-04')
) x (UniqueID , FirstName , LastName , Birthday)
)
insert into @cli
select * from t
Step One: Create the list of possible 1st level matches for each contract
;with
p as(select UniqueID, Birthday, FirstName, LastName from @cli),
m as (
select p.UniqueID UniqueID1, p.FirstName FirstName1, p.LastName LastName1, p.Birthday Birthday1, pp.UniqueID UniqueID2, pp.FirstName FirstName2, pp.LastName LastName2, pp.Birthday Birthday2
from p
join p pp on (pp.Birthday=p.Birthday) and (pp.FirstName = p.FirstName or pp.LastName = p.LastName)
where p.UniqueID<=pp.UniqueID
)
insert into @comb
select UniqueID1,UniqueID2,0
from m
Step Two: Create the base groups list
insert into @grp
select ROW_NUMBER() over(order by id1), id1 from @comb where id1=id2
Step Three: Iterate the matches list updating the group list
Only loop on contracts that have possible matches and updates only if needed
set @id1 = 0
while not(@id1 is null) begin
set @id1 = (select top 1 id1 from @comb where id1<>id2 and done=0)
if not(@id1 is null) begin
set @g = (select grp from @grp where id=@id1)
update g set grp= @g
from @grp g
inner join @comb c on g.id = c.id2
where c.id2<>@id1 and c.id1=@id1
and grp<>@g
update @comb set done=1 where id1=@id1
end
end
Step Four: Build up the SyntheticID
Recursively add ALL (distinct) first and last names of group to SyntheticID.
I used '_' as separator for birth date, first names and last names, and ',' as separator for the list of names to avoid conflicts.
;with
c as(
select c.*, g.grp
from @cli c
join @grp g on g.id = c.UniqueID
),
d as (
select *, row_number() over (partition by g order by t,s) n1, row_number() over (partition by g order by t desc,s desc) n2
from (
select distinct c.grp g, 1 t, FirstName s from c
union
select distinct c.grp, 2, LastName from c
) l
),
r as (
select d.*, cast(CONVERT(VARCHAR(10), t.Birthday, 112) + '_' + s as varchar(1000)) Names, cast(0 as bigint) i1, cast(0 as bigint) i2
from d
join @cli t on t.UniqueID=d.g
where n1=1
union all
select d.*, cast(r.names + IIF(r.t<>d.t,'_',',') + d.s as varchar(1000)), r.n1, r.n2
from d
join r on r.g = d.g and r.n1=d.n1-1
)
insert into @str_id
select g, Names
from r
where n2=1
Step Five: Output results
select c.UniqueID, case when id2=UniqueID then id1 else id2 end PossibleMatchingContract, s.SyntheticID
from @cli c
left join @comb cb on c.UniqueID in(id1,id2) and id1<>id2
left join @grp g on c.UniqueID = g.id
left join @str_id s on s.grp = g.grp
Here are the results
UniqueID PossibleMatchingContract SyntheticID
1 2 1980-11-04_Peter_Gray,Gray-Smith,Smith
1 3 1980-11-04_Peter_Gray,Gray-Smith,Smith
2 1 1980-11-04_Peter_Gray,Gray-Smith,Smith
2 3 1980-11-04_Peter_Gray,Gray-Smith,Smith
3 1 1980-11-04_Peter_Gray,Gray-Smith,Smith
3 2 1980-11-04_Peter_Gray,Gray-Smith,Smith
4 5 1985-06-09_Frank,Frank-Paul_May
5 4 1985-06-09_Frank,Frank-Paul_May
6 NULL 1950-11-04_Gina_Ericson
I think that in this way the resulting SyntheticID should also be "unique" for each group
This creates a synthetic value and is easy to change to suit your needs.
DECLARE @T TABLE (
UniqueID INT
,FirstName VARCHAR(200)
,LastName VARCHAR(200)
,Birthday DATE
)
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 1,'Peter','Smith','1980-11-04'
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 2,'Peter','Gray','1980-11-04'
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 3,'Peter','Gray-Smith','1980-11-04'
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 4,'Frank','May','1985-06-09'
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 5,'Frank-Paul','May','1985-06-09'
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 6,'Gina','Ericson','1950-11-04'
DECLARE @PossibleMatches TABLE (UniqueID INT,[PossibleMatch] INT,SynKey VARCHAR(2000)
)
-- same first name, last name and birthday
INSERT INTO @PossibleMatches
SELECT t1.UniqueID [UniqueID],t2.UniqueID [Possible Matches],'Ln=' + t1.LastName + ' Fn=' + t1.FirstName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM @T t1
INNER JOIN @T t2 ON t1.Birthday=t2.Birthday
AND t1.FirstName=t2.FirstName
AND t1.LastName=t2.LastName
AND t1.UniqueID<>t2.UniqueID
-- same first name and birthday
INSERT INTO @PossibleMatches
SELECT t1.UniqueID [UniqueID],t2.UniqueID [Possible Matches],'Fn=' + t1.FirstName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM @T t1
INNER JOIN @T t2 ON t1.Birthday=t2.Birthday
AND t1.FirstName=t2.FirstName
AND t1.UniqueID<>t2.UniqueID
-- same last name and birthday
INSERT INTO @PossibleMatches
SELECT t1.UniqueID,t2.UniqueID,'Ln=' + t1.LastName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM @T t1
INNER JOIN @T t2 ON t1.Birthday=t2.Birthday
AND t1.LastName=t2.LastName
AND t1.UniqueID<>t2.UniqueID
-- rows without any match still get a key of their own
INSERT INTO @PossibleMatches
SELECT t1.UniqueID,pm.UniqueID,'Ln=' + t1.LastName + ' Fn=' + t1.FirstName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM @T t1
LEFT JOIN @PossibleMatches pm on pm.UniqueID=t1.UniqueID
WHERE pm.UniqueID IS NULL
SELECT *
FROM @PossibleMatches
ORDER BY UniqueID,[PossibleMatch]
I think this will work for you
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts,
FIRST_VALUE(CC.FirstName+CC.LastName+CC.Birthday)
OVER (PARTITION BY C.UniqueID ORDER BY CC.UniqueID) as SyntheticID
FROM
[dbo].AllContracts AS C INNER JOIN
[dbo].AllContracts AS CC ON
C.SecondaryMatchCodeFB = CC.SecondaryMatchCodeFB OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeLB OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeBB OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeBB OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeLB
WHERE
C.UniqueID NOT IN(
SELECT UniqueID FROM [dbo].DefinitiveMatches)
AND C.AssociatedUser IS NULL
You can try this:
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts,
FIRST_VALUE(CC.FirstName+CC.LastName+CC.Birthday)
OVER (PARTITION BY C.UniqueID ORDER BY CC.UniqueID) as SyntheticID
FROM
[dbo].AllContracts AS C
INNER JOIN
[dbo].AllContracts AS CC
ON
C.SecondaryMatchCodeFB = CC.SecondaryMatchCodeFB
OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeLB
OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeBB
OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeBB
OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeLB
WHERE
C.UniqueID NOT IN
(
SELECT UniqueID FROM [dbo].DefinitiveMatches
)
AND
C.AssociatedUser IS NULL
This will generate one extra row (because we left out C.UniqueID <> CC.UniqueID) but will give you a good solution.
Here is an example using sample data extracted from your original post. The idea: generate all SyntheticID values in a CTE, query all records with a "PossibleMatch", and UNION that with all records which are not yet included:
DECLARE @t TABLE(
UniqueID int
,FirstName nvarchar(20)
,LastName nvarchar(20)
,Birthday datetime
)
INSERT INTO @t VALUES (1, 'Peter', 'Smith', '1980-11-04');
INSERT INTO @t VALUES (2, 'Peter', 'Gray', '1980-11-04');
INSERT INTO @t VALUES (3, 'Peter', 'Gray-Smith', '1980-11-04');
INSERT INTO @t VALUES (4, 'Frank', 'May', '1985-06-09');
INSERT INTO @t VALUES (5, 'Frank-Paul', 'May', '1985-06-09');
INSERT INTO @t VALUES (6, 'Gina', 'Ericson', '1950-11-04');
WITH ctePrep AS(
SELECT UniqueID, FirstName, LastName, BirthDay,
ROW_NUMBER() OVER (PARTITION BY FirstName, BirthDay ORDER BY FirstName, BirthDay) AS k,
FirstName+LastName+CONVERT(nvarchar(10), Birthday, 126) AS SyntheticID
FROM @t
),
cteKeys AS(
SELECT FirstName, BirthDay, SyntheticID
FROM ctePrep
WHERE k = 1
),
cteFiltered AS(
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts,
keys.SyntheticID
FROM @t AS C
JOIN @t AS CC ON C.FirstName = CC.FirstName
AND C.Birthday = CC.Birthday
JOIN cteKeys AS keys ON keys.FirstName = c.FirstName
AND keys.Birthday = c.Birthday
WHERE C.UniqueID <> CC.UniqueID
)
SELECT UniqueID, PossiblyMatchingContracts, SyntheticID
FROM cteFiltered
UNION ALL
SELECT UniqueID, NULL, FirstName+LastName+CONVERT(nvarchar(10), Birthday, 126) AS SyntheticID
FROM @t
WHERE UniqueID NOT IN (SELECT UniqueID FROM cteFiltered)
Hope this helps. The result looked OK to me:
UniqueID PossiblyMatchingContracts SyntheticID
---------------------------------------------------------------
2 1 PeterSmith1980-11-04
3 1 PeterSmith1980-11-04
1 2 PeterSmith1980-11-04
3 2 PeterSmith1980-11-04
1 3 PeterSmith1980-11-04
2 3 PeterSmith1980-11-04
4 NULL FrankMay1985-06-09
5 NULL Frank-PaulMay1985-06-09
6 NULL GinaEricson1950-11-04
Tested in SSMS, it works perfect. :)
--create table structure
create table #temp
(
uniqueID int,
firstname varchar(15),
lastname varchar(15),
birthday date
)
--insert data into the table
insert #temp
select 1, 'peter','smith','1980-11-04'
union all
select 2, 'peter','gray','1980-11-04'
union all
select 3, 'peter','gray-smith','1980-11-04'
union all
select 4, 'frank','may','1985-06-09'
union all
select 5, 'frank-paul','may','1985-06-09'
union all
select 6, 'gina','ericson','1950-11-04'
select * from #temp
--solution is as below
select ab.uniqueID
, PossiblyMatchingContracts
, c.firstname+c.lastname+cast(c.birthday as varchar) as synID
from
(
select a.uniqueID
, case
when a.uniqueID < min(b.uniqueID)over(partition by a.uniqueid)
then a.uniqueID
else min(b.uniqueID)over(partition by a.uniqueid)
end as SmallestID
, b.uniqueID as PossiblyMatchingContracts
from #temp a
left join #temp b
on (a.firstname = b.firstname OR a.lastname = b.lastname) AND a.birthday = b.birthday AND a.uniqueid <> b.uniqueID
) as ab
left join #temp c
on ab.SmallestID = c.uniqueID
Say we have following table (a VIEW in your case):
UniqueID PossiblyMatchingContracts SyntheticID
1 2 G1
1 3 G2
2 1 G3
2 3 G4
3 1 G4
3 4 G6
4 5 G7
5 4 G8
6 NULL G9
In your case you can set the initial SyntheticID to a string like PeterSmith1980-11-04, built from the row that UniqueID identifies on each line. Here is a recursive CTE query that divides all lines into unconnected groups and selects MAX(SyntheticID) within each group as the new SyntheticID for all lines in that group.
WITH CTE AS
(
SELECT CAST(','+CAST(UniqueID AS Varchar(100)) +','+ CAST(PossiblyMatchingContracts as Varchar(100))+',' as Varchar(MAX)) as GroupCont,
SyntheticID
FROM PossiblyMatchingContracts
UNION ALL
SELECT CAST(GroupCont+CAST(UniqueID AS Varchar(100)) +','+ CAST(PossiblyMatchingContracts as Varchar(100))+',' AS Varchar(MAX)) as GroupCont,
pm.SyntheticID
FROM CTE
JOIN PossiblyMatchingContracts as pm
ON
(
CTE.GroupCont LIKE '%,'+CAST(pm.UniqueID AS Varchar(100))+',%'
OR
CTE.GroupCont LIKE '%,'+CAST(pm.PossiblyMatchingContracts AS Varchar(100))+',%'
)
AND NOT
(
CTE.GroupCont LIKE '%,'+CAST(pm.UniqueID AS Varchar(100))+',%'
AND
CTE.GroupCont LIKE '%,'+CAST(pm.PossiblyMatchingContracts AS Varchar(100))+',%'
)
)
SELECT pm.UniqueID,
pm.PossiblyMatchingContracts,
ISNULL(
(SELECT MAX(SyntheticID) FROM CTE WHERE
(
CTE.GroupCont LIKE '%,'+CAST(pm.UniqueID AS Varchar(100))+',%'
OR
CTE.GroupCont LIKE '%,'+CAST(pm.PossiblyMatchingContracts AS Varchar(100))+',%'
))
,pm.SyntheticID) as SyntheticID
FROM PossiblyMatchingContracts pm

Select all hierarchy level and below SQL Server

I am having a difficult time with this one. I have seen a few examples on how to obtain all child records from a self referencing table given a parent and even how to get the parents of child records.
What I am trying to do is return a record and all child records given the ID.
To put this into context - I have a corporate hierarchy. Where:
#Role Level#
--------------------
Corporate 0
Region 1
District 2
Rep 3
What I need is a procedure that (1) figures out what level the record is and (2) retrieves that record and all children records.
The idea being a Region can see all districts and reps in a district, Districts can see their reps. Reps can only see themselves.
I have table:
ID ParentId Name
-------------------------------------------------------
1 Null Corporate HQ
2 1 South Region
3 1 North Region
4 1 East Region
5 1 West Region
6 3 Chicago District
7 3 Milwaukee District
8 3 Minneapolis District
9 6 Gold Coast Dealer
10 6 Blue Island Dealer
How do I do this:
CREATE PROCEDURE GetPositions
@id int
AS
BEGIN
--What is the most efficient way to do this--
END
GO
For example, for @id = 3 I would want to return:
3, 6, 7, 8, 9, 10
I'd appreciate any help or ideas on this.
You could do this via a recursive CTE:
DECLARE @id INT = 3;
WITH rCTE AS(
SELECT *, 0 AS Level FROM tbl WHERE Id = @id
UNION ALL
SELECT t.*, r.Level + 1 AS Level
FROM tbl t
INNER JOIN rCTE r
ON t.ParentId = r.ID
)
SELECT * FROM rCTE OPTION(MAXRECURSION 0);
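To match the procedure shape asked for in the question, the recursive CTE can be wrapped roughly as follows. This is a sketch that assumes the table is called tbl with columns ID, ParentId and Name, as in the query above; Level here is the depth relative to the starting row, not the corporate role level.

```sql
CREATE PROCEDURE dbo.GetPositions
    @id int
AS
BEGIN
    SET NOCOUNT ON;

    WITH rCTE AS
    (
        -- the requested row itself
        SELECT ID, ParentId, Name, 0 AS Level
        FROM tbl
        WHERE ID = @id
        UNION ALL
        -- every row whose parent is already in the result
        SELECT t.ID, t.ParentId, t.Name, r.Level + 1
        FROM tbl AS t
        INNER JOIN rCTE AS r ON t.ParentId = r.ID
    )
    SELECT ID, ParentId, Name, Level
    FROM rCTE
    ORDER BY Level, ID
    OPTION (MAXRECURSION 0);
END
GO

EXEC dbo.GetPositions @id = 3;
```

With the sample rows from the question, the call for @id = 3 should return IDs 3, 6, 7, 8, 9 and 10. MAXRECURSION 0 removes the default limit of 100 levels, so a cycle in ParentId would loop indefinitely; a finite limit is safer if the data is not trusted.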
Assuming that you're on a reasonably modern version of SQL Server, you can use the hierarchyid datatype with a little bit of elbow grease. First, the setup:
alter table [dbo].[yourTable] add [path] hierarchyid null;
Next, we'll populate the new column:
with cte as (
select *, cast(concat('/', ID, '/') as varchar(max)) as [path]
from [dbo].[yourTable]
where [ParentID] is null
union all
select child.*,
cast(concat(parent.path, child.ID, '/') as varchar(max)) as [path]
from [dbo].[yourTable] as child
join cte as parent
on child.ParentID = parent.ID
)
update t
set path = c.path
from [dbo].[yourTable] as t
join cte as c
on t.ID = c.ID;
This is just a bog standard recursive table expression with one calculated column that represents the hierarchy. That's the hard part. Now, your procedure can look something like this:
create procedure dbo.GetPositions ( @id int ) as
begin
declare @h hierarchyid
set @h = (select [path] from [dbo].[yourTable] where ID = @id);
select ID, ParentID, Name
from [dbo].[yourTable]
where [path].IsDescendantOf(@h) = 1;
end
So, to wrap up, all you're doing with the hierarchyid is storing the lineage for a given row so that you don't have to calculate it on the fly at select time.
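The stored path can also answer the "what level is this record" part of the question. A small sketch, assuming the [path] column was populated as above so that Corporate HQ sits at '/1/': GetLevel() counts '/1/' as level 1, so subtracting 1 lines it up with the 0-based role levels in the question. The column aliases are made up for illustration.

```sql
SELECT ID,
       Name,
       [path].ToString()     AS path_text,   -- e.g. '/1/3/6/'
       [path].GetLevel() - 1 AS role_level   -- 0 = Corporate, 1 = Region, ...
FROM [dbo].[yourTable]
ORDER BY [path];                             -- depth-first order of the hierarchy
```

Note that hierarchyid method names are case-sensitive, so ToString, GetLevel and IsDescendantOf must be spelled exactly as shown.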

Dividing one SQL column into subgroups with headings

I have two tables
STATUS
SNO | STATUS | DEPARTMENT_ID
1 In progress 1
2 Assigned 2
3 Quoted 2
4 Development 3
DEPARTMENTS
SNO | DEPARTMENT |
1 DESIGNING
2 MARKETING
3 PRODUCTION
Now I want a result like this using SQL stored procedure
Some Custom Column Name | DEPARTMENT_ID
DESIGNING -
In Progress 1
MARKETING -
Assigned 2
Quoted 2
PRODUCTION -
Development 3
The custom column will be used to populate a Telerik RadComboBox with DESIGNING, MARKETING and PRODUCTION acting as separators between statuses.
Select Department, -1 from Department_Table
Union
Select StatusName, Department_ID from Status_Table
Please elaborate on your question so that we can provide a better answer. Currently it seems you just want to return the joined data of both tables.
Often, this type of operation is more easily done at the application level. You can do it in SQL, using union all and order by, however:
select CustomColumnName, department_id
from ((select s.status as CustomColumnName, s.department_id, d.sno as department_sno, 1 as ordering
from status s join
departments d
on s.department_id = d.sno
) union all
(select d.department, NULL, d.sno, 0 as ordering
from departments d
)
) dd
order by department_sno, ordering, CustomColumnName;
Note: this treats the - as NULL.
Try this. Does it work with other sample data?
DECLARE @STATUS TABLE (
SNO INT
,[STATUS] VARCHAR(50)
,DEPARTMENT_ID INT
)
INSERT INTO @STATUS
VALUES (1,'In progress' ,1)
,(2,'Assigned',2)
,(3,'Quoted',2)
,(4,'Development',3)
DECLARE @DEPARTMENT TABLE (SNO INT,DEPARTMENT VARCHAR(50))
INSERT INTO @DEPARTMENT
VALUES ( 1,'DESIGNING'),(2,'MARKETING')
,(3,'PRODUCTION')
--select * from @STATUS
--select * from @DEPARTMENT
;
WITH CTE
AS (
SELECT DEPARTMENT [CustomeColumn]
,'-' DEPARTMENT_ID
,sno
,0 AS ordering -- heading rows sort first within their department
FROM @DEPARTMENT
UNION ALL
SELECT [STATUS]
,cast(DEPARTMENT_ID AS VARCHAR(10))
,(
SELECT sno
FROM @DEPARTMENT
WHERE sno = a.DEPARTMENT_ID
)
,1
FROM @STATUS A
)
SELECT *
FROM CTE
ORDER BY sno, ordering

Need query to select direct and indirect customerID aliases

I need a query that will return all related alias id's from either column. Shown here are some alias customer ids, among thousands of other rows. If the input parameter to a query is id=7, I need a query that would return 5 rows (1,5,7,10,22). That is because they are all aliases of one-another. For example, 22 and 10 are indirect aliases of 7.
CustomerAlias
--------------------------
AliasCuID AliasCuID2
--------------------------
1 5
1 7
5 7
10 5
22 1
Here is an excerpt from the customer table.
Customer
----------------------------------
CuID CuFirstName CuLastName
----------------------------------
1 Mike Jones
2 Fred Smith
3 Jack Jackson
4 Emily Simpson
5 Mike Jones
6 Beth Smith
7 Mike jones
8 Jason Robard
9 Emilie Jiklonmie
10 Michael jones
11 Mark Lansby
12 Scotty Slash
13 Emilie Jiklonmy
22 mike jones
I've been able to come close, but I cannot seem to select the indirectly related aliases correctly. Given this query:
SELECT DISTINCT Customer.CuID, Customer.CuFirstName, Customer.CuLastName
FROM Customer WHERE
(Customer.CuID = 7) OR (Customer.CuID IN
(SELECT AliasCuID2
FROM CustomerAlias AS CustomerAlias_2
WHERE (AliasCuID = 7))) OR (Customer.CuID IN
(SELECT AliasCuID
FROM CustomerAlias AS CustomerAlias_1
WHERE (AliasCuID2 = 7)))
Returns 3 out of 5 of the desired ids of course. This lacks the indirectly related aliased id of 10 and 22 in the result rows.
1 Mike Jones
5 Mike Jones
7 Mike jones
* Based on suggestions below, I am trying a CTE hierarchical query.
I have this now after following some suggestions. It works for some, as long as the records in the table reference enough immediate ids. But, if the query uses id=10, then it still comes up short, just by the nature of the data.
DECLARE @id INT
SET @id = 10;
DECLARE @tmp TABLE ( a1 INT, a2 INT, Lev INT );
WITH Results (AliasCuID, AliasCuID2, [Level]) AS (
SELECT AliasCuID,
AliasCuID2,
0 as [Level]
FROM CustomerAlias
WHERE AliasCuID = @id OR AliasCuID2 = @id
UNION ALL
-- Recursive step
SELECT a.AliasCuID,
a.AliasCuID2,
r.[Level] + 1 AS [Level]
FROM CustomerAlias a
INNER JOIN Results r ON a.AliasCuID = r.AliasCuID2 )
INSERT INTO @tmp
SELECT * FROM Results;
WITH Results3 (AliasCuID, AliasCuID2, [Level]) AS (
SELECT AliasCuID,
AliasCuID2,
0 as [Level]
FROM CustomerAlias
WHERE AliasCuID = @id OR AliasCuID2 = @id
UNION ALL
-- Recursive step
SELECT a.AliasCuID,
a.AliasCuID2,
r.[Level] + 1 AS [Level]
FROM CustomerAlias a
INNER JOIN Results3 r ON a.AliasCuID2 = r.AliasCuID )
INSERT INTO @tmp
SELECT * FROM Results3;
SELECT DISTINCT a1 AS id FROM @tmp
UNION ALL
SELECT DISTINCT a2 AS id FROM @tmp
ORDER BY id
Note that this is a simplified query that just gives a list of related ids.
---
id
---
5
5
7
10
But, it is still unable to pull in ids 1 and 22.
This is not an easy problem to solve unless you have some idea of the depth of your search (https://stackoverflow.com/a/7569520/1803682) - which it looks like you do not - and take a brute force approach to it.
Assuming you do not know the depth you will need to write a stored proc. I followed this approach for a nearly identical problem: https://dba.stackexchange.com/questions/7147/find-highest-level-of-a-hierarchical-field-with-vs-without-ctes/7161#7161
UPDATE
If you don't care about the chain of how the aliases were created, I would run a script recursively to make them all refer to a single (master?) record. Then the search is easy and quick. It is not a solution if you care about how the aliases get traversed, though.
I created a SQL Fiddle for SQL Server 2012. Please let me know if you can or cannot access it.
My thought here was that you'd want to just keep checking the left and right branches recursively, separately. This logic probably falls apart if the relationships bounce between left and right. You could set up a third CTE to reference the first two, but joining on left to right and right to left, but ain't nobody got time for that.
The code is below as well.
CREATE TABLE CustomerAlias
(
AliasCuID INT,
AliasCuID2 INT
)
GO
INSERT INTO CustomerAlias
SELECT 1,5
UNION SELECT 1, 7
UNION SELECT 5, 7
UNION SELECT 10, 5
UNION SELECT 22, 1
GO
DECLARE @Value INT
SET @Value = 7
; WITH LeftAlias AS
(
SELECT AliasCuID
, AliasCuID2
FROM CustomerAlias
WHERE AliasCuID2 = @Value
UNION ALL
SELECT a.AliasCuID
, a.AliasCuID2
FROM CustomerAlias a
JOIN LeftAlias b
ON a.AliasCuID = b.AliasCuID2
)
, RightAlias AS
(
SELECT AliasCuID
, AliasCuID2
FROM CustomerAlias
WHERE AliasCuID = @Value
UNION ALL
SELECT a.AliasCuID
, a.AliasCuID2
FROM CustomerAlias a
JOIN RightAlias b
ON a.AliasCuID2 = b.AliasCuID
)
SELECT DISTINCT A
FROM
(
SELECT A = AliasCuID
FROM LeftAlias
UNION ALL
SELECT A = AliasCuID2
FROM LeftAlias
UNION ALL
SELECT A = AliasCuID
FROM RightAlias
UNION ALL
SELECT A = AliasCuID2
FROM RightAlias
) s
ORDER BY A
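For relationships that bounce between the left and right columns, one option is to treat every alias row as an edge in both directions and walk the graph with a single recursive CTE, carrying a list of visited ids so the walk cannot loop. This is a sketch under assumptions: the CTE names edges and walk and the visited column are made up, and because it enumerates paths it can get expensive on densely connected alias sets.

```sql
DECLARE @Value int = 10;

WITH edges AS
(
    -- each alias pair, usable in both directions
    SELECT AliasCuID AS a, AliasCuID2 AS b FROM CustomerAlias
    UNION ALL
    SELECT AliasCuID2, AliasCuID FROM CustomerAlias
),
walk AS
(
    -- start at the requested id; visited is a ',id,id,' list of ids on this path
    SELECT @Value AS id,
           CAST(',' + CAST(@Value AS varchar(20)) + ',' AS varchar(max)) AS visited
    UNION ALL
    SELECT e.b,
           CAST(w.visited + CAST(e.b AS varchar(20)) + ',' AS varchar(max))
    FROM walk AS w
    JOIN edges AS e ON e.a = w.id
    WHERE w.visited NOT LIKE '%,' + CAST(e.b AS varchar(20)) + ',%'
)
SELECT DISTINCT id
FROM walk
ORDER BY id
OPTION (MAXRECURSION 0);
```

With the five sample alias rows, starting from 10 (or from 7) this should return 1, 5, 7, 10 and 22, including the ids that the separate left and right walks miss.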