I have a table with multiple insurance policies per client. One Policy per record.
I need to represent all the policies in one row per client.
I have read all the similar questions and they may be close but I don't seem to be able to translate the answers into my situation.
The table that I have looks like this:
Client-ID Ins-Company Policy-Number Start-Date
1 BCBS BCBS1 2018-01-01
1 Aetna Aetna1 2017-01-01
1 Self-Pay N/A 2016-01-01
2 Self-Pay N/A 2015-01-01
3 BCBS BCBS3 2014-01-01
3 Self-Pay N/A 2013-01-01
Expected Result:
Client-ID Ins-Co1 Policy1 Start1 Ins-Co2 Policy2 Start2 Ins-Co3 Policy3 Start3
1 BCBS BCBS1 2018-01-01 Aetna Aetna1 2017-01-01 Self-Pay N/A 2016-01-01
2 Self-Pay N/A 2015-01-01
3 BCBS BCBS3 2014-01-01 Self-Pay N/A 2013-01-01
I need to create another table with these records.
I created the result I needed with the following code.
First, I created a View of the source table that added a Row Number using the following Function:
ROW_NUMBER() OVER(PARTITION BY a.Client_ID ORDER BY a.Billing_Order ASC) AS Row_No
and built the following Code
DECLARE @sql varchar(max)
DECLARE @colList varchar(max)
--create dynamic list of columns
SELECT @colList =
STUFF
(
(
SELECT + ',' +
quotename(colName + Row_No)
FROM Credible_Client_Insurance_Raw_Data_Query
CROSS APPLY
(
SELECT 1 As Ord, 'Payer_ID' ColName UNION ALL
SELECT 2 As Ord, 'Billing_Order' UNION ALL
SELECT 3 As Ord, 'Insurance_ID' UNION ALL
SELECT 4 As Ord, 'Group_No' UNION ALL
SELECT 5 As Ord, 'Copay_Fee' UNION ALL
SELECT 6 As Ord, 'Start_Date'
) v
GROUP BY colName, Ord, Row_No
ORDER BY Row_No, Ord
for xml path(''), type
).value('/','varchar(max)'),1,1,''
)
--unpivot columns into rows and then apply pivot
SET @sql
= '
SELECT Client_ID, ' + @colList + '
FROM
(
SELECT Client_ID, ColVal,
colName + Row_No ColName
FROM Credible_Client_Insurance_Raw_Data_Query
CROSS APPLY
(
SELECT Payer_ID As ColVal, ''Payer_ID'' ColName UNION ALL
SELECT Billing_Order, ''Billing_Order'' UNION ALL
SELECT Insurance_ID, ''Insurance_ID'' UNION ALL
SELECT Group_No, ''Group_No'' UNION ALL
SELECT CAST(Copay_Fee AS VARCHAR), ''Copay_Fee'' UNION ALL
SELECT CAST(Start_Date AS VARCHAR), ''Start_Date''
) v
) A
PIVOT
(
MAX(ColVal) FOR ColName IN (' + @colList + ')
) P1 '
EXEC(@sql)
Code was copied from another question
Pivot on multiple columns based on single column
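For readers who want to see the shape of the transformation without the dynamic SQL, here is a minimal Python sketch of the same idea: number each client's rows, then spread them into numbered column groups. It is not the T-SQL above. The function name `pivot_policies`, the sample rows (taken from the simplified example table) and the newest-first ordering are assumptions made for illustration; the real query orders by Billing_Order.

```python
from collections import defaultdict

# Sample rows from the example table: (client_id, company, policy_number, start_date).
policies = [
    (1, "BCBS", "BCBS1", "2018-01-01"),
    (1, "Aetna", "Aetna1", "2017-01-01"),
    (1, "Self-Pay", "N/A", "2016-01-01"),
    (2, "Self-Pay", "N/A", "2015-01-01"),
    (3, "BCBS", "BCBS3", "2014-01-01"),
    (3, "Self-Pay", "N/A", "2013-01-01"),
]


def pivot_policies(rows):
    """Return one dict per client with numbered column groups."""
    by_client = defaultdict(list)
    for client_id, company, policy, start in rows:
        by_client[client_id].append((company, policy, start))

    result = {}
    for client_id, items in by_client.items():
        # Assumption: newest policy first, as in the expected result.
        items.sort(key=lambda item: item[2], reverse=True)
        wide = {}
        # enumerate(..., start=1) plays the role of ROW_NUMBER() per client.
        for row_no, (company, policy, start) in enumerate(items, start=1):
            wide[f"Ins-Co{row_no}"] = company
            wide[f"Policy{row_no}"] = policy
            wide[f"Start{row_no}"] = start
        result[client_id] = wide
    return result


wide_rows = pivot_policies(policies)
```

Clients with fewer policies simply have fewer keys, which corresponds to the NULL columns the pivot produces.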
Now I have a follow up question:
Now that I created a query that provides the result I need
I need to place the result into a table.
I cannot take this query and make it into a VIEW because it starts with a Declare Statement ( Illegal for Views )
So how do I transfer this query data into a Table?
I have a view defined like this:
CREATE VIEW [dbo].[PossiblyMatchingContracts] AS
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts
FROM [dbo].AllContracts AS C
INNER JOIN [dbo].AllContracts AS CC
ON C.SecondaryMatchCodeFB = CC.SecondaryMatchCodeFB
OR C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeLB
OR C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeBB
OR C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeBB
OR C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeLB
WHERE C.UniqueID NOT IN
(
SELECT UniqueID FROM [dbo].DefinitiveMatches
)
AND C.AssociatedUser IS NULL
AND C.UniqueID <> CC.UniqueID
This basically finds contracts where, for example, the first name and the birthday match. This works great. Now I want to add a synthetic attribute to each row with the value from only one source row.
Let me give you an example to make it clearer. Suppose I have the following table:
UniqueID | FirstName | LastName | Birthday
1 | Peter | Smith | 1980-11-04
2 | Peter | Gray | 1980-11-04
3 | Peter | Gray-Smith| 1980-11-04
4 | Frank | May | 1985-06-09
5 | Frank-Paul| May | 1985-06-09
6 | Gina | Ericson | 1950-11-04
The resulting view should look like this:
UniqueID | PossiblyMatchingContracts | SyntheticID
1 | 2 | PeterSmith1980-11-04
1 | 3 | PeterSmith1980-11-04
2 | 1 | PeterSmith1980-11-04
2 | 3 | PeterSmith1980-11-04
3 | 1 | PeterSmith1980-11-04
3 | 2 | PeterSmith1980-11-04
4 | 5 | FrankMay1985-06-09
5 | 4 | FrankMay1985-06-09
6 | NULL | NULL [or] GinaEricson1950-11-04
Notice that the SyntheticID column uses ONLY values from one of the matching source rows. It doesn't matter which one. I am exporting this view to another application and need to be able to identify each "match group" afterwards.
Is it clear what I mean? Any ideas how this could be done in sql?
Maybe it helps to elaborate a bit on the actual use case:
I am importing contracts from different systems. To account for the possibility of typos or people that have married but the last name was only updated in one system, I need to find so called 'possible matches'. Two or more contracts are considered a possible match if they contain the same birthday plus the same first, last or birth name. That implies, that if contract A matches contract B, contract B also matches contract A.
The target system uses multivalue reference attributes to store these relationships. The ultimate goal is to create user objects for these contracts. The first catch is that there shall be only one user object for multiple matching contracts. Thus I'm creating these matches in the view. The second catch is that the creation of user objects happens in workflows, which run in parallel for each contract. To avoid creating multiple user objects for matching contracts, each workflow needs to check whether there is already a matching user object or another workflow that is about to create said user object. Because the workflow engine is extremely slow compared to SQL, the workflows should not repeat the whole matching test. So the idea is to let the workflow check only for the 'syntheticID'.
I have solved it with a multi-step approach:
Create the list of possible 1st level matches for each contract
Create the base groups list, assigning a different group for each contract (as if they were not related to anybody)
Iterate the matches list, updating the group list when more contracts need to be added to a group
Recursively build up the SyntheticID from the final group list
Output results
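The grouping part of these steps amounts to finding connected groups of contracts. As a rough illustration only, here is a Python sketch that uses a union-find structure instead of the table-variable loop shown further down. The matching rule (same birthday plus same first or last name), the helper names `find` and `group_contracts`, and the choice of the smallest UniqueID as group representative are assumptions for this sketch.

```python
# Sample contracts from the question: id -> (first name, last name, birthday).
contracts = {
    1: ("Peter", "Smith", "1980-11-04"),
    2: ("Peter", "Gray", "1980-11-04"),
    3: ("Peter", "Gray-Smith", "1980-11-04"),
    4: ("Frank", "May", "1985-06-09"),
    5: ("Frank-Paul", "May", "1985-06-09"),
    6: ("Gina", "Ericson", "1950-11-04"),
}


def find(parent, x):
    # Follow parent links to the group's root, shortening the path on the way.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_contracts(rows):
    """Map every contract id to the smallest id of its match group."""
    parent = {cid: cid for cid in rows}
    ids = sorted(rows)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            first_a, last_a, born_a = rows[a]
            first_b, last_b, born_b = rows[b]
            # Assumed rule: same birthday plus same first or last name.
            if born_a == born_b and (first_a == first_b or last_a == last_b):
                root_a, root_b = find(parent, a), find(parent, b)
                # Keep the smaller id as root so the label is deterministic.
                parent[max(root_a, root_b)] = min(root_a, root_b)
    return {cid: find(parent, cid) for cid in rows}


groups = group_contracts(contracts)
# Build the SyntheticID from the representative row only.
synthetic_ids = {cid: "".join(contracts[root]) for cid, root in groups.items()}
```

Because matches are merged transitively, a contract that matches only one member of a group still ends up with the group's SyntheticID.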
First of all, let me explain what I have understood, so you can tell if my approach is correct or not.
1) matching propagates in "cascade"
I mean, if "Peter Smith" is grouped up with "Peter Gray", it means that all Smith and all Gray are related (if they have the same birth date) so Luke Smith can be in the same group of John Gray
2) I have not understood what you mean with "Birth Name"
You say contracts match on "first, last or birth name". Sorry, I'm Italian; I thought birth name and first name were the same, and your data has no such column. Maybe it is related to the dash between names?
When FirstName is Frank-Paul it means it should match both Frank and Paul?
When LastName is Gray-Smith it means it should match both Gray and Smith?
In the following code I have simply ignored this problem, but it could be handled if needed (I already tried it, splitting the names, unpivoting them and treating them as a double match).
Step Zero: declare some variables and prepare the base data
declare @cli as table (UniqueID int primary key, FirstName varchar(20), LastName varchar(20), Birthday varchar(20))
declare @comb as table (id1 int, id2 int, done bit)
declare @grp as table (ix int identity primary key, grp int, id int, unique (grp,ix))
declare @str_id as table (grp int primary key, SyntheticID varchar(1000))
declare @id1 as int, @g int
;with
t as (
select *
from (values
(1 , 'Peter' , 'Smith' , '1980-11-04'),
(2 , 'Peter' , 'Gray' , '1980-11-04'),
(3 , 'Peter' , 'Gray-Smith', '1980-11-04'),
(4 , 'Frank' , 'May' , '1985-06-09'),
(5 , 'Frank-Paul', 'May' , '1985-06-09'),
(6 , 'Gina' , 'Ericson' , '1950-11-04')
) x (UniqueID , FirstName , LastName , Birthday)
)
insert into @cli
select * from t
Step One: Create the list of possible 1st level matches for each contract
;with
p as(select UniqueID, Birthday, FirstName, LastName from @cli),
m as (
select p.UniqueID UniqueID1, p.FirstName FirstName1, p.LastName LastName1, p.Birthday Birthday1, pp.UniqueID UniqueID2, pp.FirstName FirstName2, pp.LastName LastName2, pp.Birthday Birthday2
from p
join p pp on (pp.Birthday=p.Birthday) and (pp.FirstName = p.FirstName or pp.LastName = p.LastName)
where p.UniqueID<=pp.UniqueID
)
insert into @comb
select UniqueID1,UniqueID2,0
from m
Step Two: Create the base groups list
insert into @grp
select ROW_NUMBER() over(order by id1), id1 from @comb where id1=id2
Step Three: Iterate the matches list updating the group list
Only loop on contracts that have possible matches, and update only if needed
set @id1 = 0
while not(@id1 is null) begin
set @id1 = (select top 1 id1 from @comb where id1<>id2 and done=0)
if not(@id1 is null) begin
set @g = (select grp from @grp where id=@id1)
update g set grp= @g
from @grp g
inner join @comb c on g.id = c.id2
where c.id2<>@id1 and c.id1=@id1
and grp<>@g
update @comb set done=1 where id1=@id1
end
end
Step Four: Build up the SyntheticID
Recursively add ALL (distinct) first and last names of the group to the SyntheticID.
I used '_' as separator for birth date, first names and last names, and ',' as separator for the list of names to avoid conflicts.
;with
c as(
select c.*, g.grp
from @cli c
join @grp g on g.id = c.UniqueID
),
d as (
select *, row_number() over (partition by g order by t,s) n1, row_number() over (partition by g order by t desc,s desc) n2
from (
select distinct c.grp g, 1 t, FirstName s from c
union
select distinct c.grp, 2, LastName from c
) l
),
r as (
select d.*, cast(CONVERT(VARCHAR(10), t.Birthday, 112) + '_' + s as varchar(1000)) Names, cast(0 as bigint) i1, cast(0 as bigint) i2
from d
join @cli t on t.UniqueID=d.g
where n1=1
union all
select d.*, cast(r.names + IIF(r.t<>d.t,'_',',') + d.s as varchar(1000)), r.n1, r.n2
from d
join r on r.g = d.g and r.n1=d.n1-1
)
insert into @str_id
select g, Names
from r
where n2=1
Step Five: Output results
select c.UniqueID, case when id2=UniqueID then id1 else id2 end PossibleMatchingContract, s.SyntheticID
from @cli c
left join @comb cb on c.UniqueID in(id1,id2) and id1<>id2
left join @grp g on c.UniqueID = g.id
left join @str_id s on s.grp = g.grp
Here are the results:
UniqueID PossibleMatchingContract SyntheticID
1 2 1980-11-04_Peter_Gray,Gray-Smith,Smith
1 3 1980-11-04_Peter_Gray,Gray-Smith,Smith
2 1 1980-11-04_Peter_Gray,Gray-Smith,Smith
2 3 1980-11-04_Peter_Gray,Gray-Smith,Smith
3 1 1980-11-04_Peter_Gray,Gray-Smith,Smith
3 2 1980-11-04_Peter_Gray,Gray-Smith,Smith
4 5 1985-06-09_Frank,Frank-Paul_May
5 4 1985-06-09_Frank,Frank-Paul_May
6 NULL 1950-11-04_Gina_Ericson
I think that in this way the resulting SyntheticID should also be "unique" for each group
This creates a synthetic value and is easy to change to suit your needs.
DECLARE @T TABLE (
UniqueID INT
,FirstName VARCHAR(200)
,LastName VARCHAR(200)
,Birthday DATE
)
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 1,'Peter','Smith','1980-11-04'
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 2,'Peter','Gray','1980-11-04'
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 3,'Peter','Gray-Smith','1980-11-04'
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 4,'Frank','May','1985-06-09'
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 5,'Frank-Paul','May','1985-06-09'
INSERT INTO @T(UniqueID,FirstName,LastName,Birthday) SELECT 6,'Gina','Ericson','1950-11-04'
DECLARE @PossibleMatches TABLE (UniqueID INT,[PossibleMatch] INT,SynKey VARCHAR(2000))
INSERT INTO @PossibleMatches
SELECT t1.UniqueID [UniqueID],t2.UniqueID [Possible Matches],'Ln=' + t1.LastName + ' Fn=' + t1.FirstName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM @T t1
INNER JOIN @T t2 ON t1.Birthday=t2.Birthday
AND t1.FirstName=t2.FirstName
AND t1.LastName=t2.LastName
AND t1.UniqueID<>t2.UniqueID
INSERT INTO @PossibleMatches
SELECT t1.UniqueID [UniqueID],t2.UniqueID [Possible Matches],'Fn=' + t1.FirstName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM @T t1
INNER JOIN @T t2 ON t1.Birthday=t2.Birthday
AND t1.FirstName=t2.FirstName
AND t1.UniqueID<>t2.UniqueID
INSERT INTO @PossibleMatches
SELECT t1.UniqueID,t2.UniqueID,'Ln=' + t1.LastName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM @T t1
INNER JOIN @T t2 ON t1.Birthday=t2.Birthday
AND t1.LastName=t2.LastName
AND t1.UniqueID<>t2.UniqueID
INSERT INTO @PossibleMatches
SELECT t1.UniqueID,pm.UniqueID,'Ln=' + t1.LastName + ' Fn=' + t1.FirstName + ' DoB=' + CONVERT(VARCHAR,t1.Birthday,102) [SynKey]
FROM @T t1
LEFT JOIN @PossibleMatches pm on pm.UniqueID=t1.UniqueID
WHERE pm.UniqueID IS NULL
SELECT *
FROM @PossibleMatches
ORDER BY UniqueID,[PossibleMatch]
I think this will work for you
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts,
FIRST_VALUE(CC.FirstName+CC.LastName+CC.Birthday)
OVER (PARTITION BY C.UniqueID ORDER BY CC.UniqueID) as SyntheticID
FROM
[dbo].AllContracts AS C INNER JOIN
[dbo].AllContracts AS CC ON
C.SecondaryMatchCodeFB = CC.SecondaryMatchCodeFB OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeLB OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeBB OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeBB OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeLB
WHERE
C.UniqueID NOT IN(
SELECT UniqueID FROM [dbo].DefinitiveMatches)
AND C.AssociatedUser IS NULL
You can try this:
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts,
FIRST_VALUE(CC.FirstName+CC.LastName+CC.Birthday)
OVER (PARTITION BY C.UniqueID ORDER BY CC.UniqueID) as SyntheticID
FROM
[dbo].AllContracts AS C
INNER JOIN
[dbo].AllContracts AS CC
ON
C.SecondaryMatchCodeFB = CC.SecondaryMatchCodeFB
OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeLB
OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeBB
OR
C.SecondaryMatchCodeLB = CC.SecondaryMatchCodeBB
OR
C.SecondaryMatchCodeBB = CC.SecondaryMatchCodeLB
WHERE
C.UniqueID NOT IN
(
SELECT UniqueID FROM [dbo].DefinitiveMatches
)
AND
C.AssociatedUser IS NULL
This will generate one extra row per contract (because we left out C.UniqueID <> CC.UniqueID) but will give you the right result.
The following is an example with some data extracted from your original post. The idea: generate all SyntheticID values in a CTE, query all records with a "PossibleMatch", and UNION them with all records that are not yet included:
DECLARE @t TABLE(
UniqueID int
,FirstName nvarchar(20)
,LastName nvarchar(20)
,Birthday datetime
)
INSERT INTO @t VALUES (1, 'Peter', 'Smith', '1980-11-04');
INSERT INTO @t VALUES (2, 'Peter', 'Gray', '1980-11-04');
INSERT INTO @t VALUES (3, 'Peter', 'Gray-Smith', '1980-11-04');
INSERT INTO @t VALUES (4, 'Frank', 'May', '1985-06-09');
INSERT INTO @t VALUES (5, 'Frank-Paul', 'May', '1985-06-09');
INSERT INTO @t VALUES (6, 'Gina', 'Ericson', '1950-11-04');
WITH ctePrep AS(
SELECT UniqueID, FirstName, LastName, BirthDay,
ROW_NUMBER() OVER (PARTITION BY FirstName, BirthDay ORDER BY FirstName, BirthDay) AS k,
FirstName+LastName+CONVERT(nvarchar(10), Birthday, 126) AS SyntheticID
FROM @t
),
cteKeys AS(
SELECT FirstName, BirthDay, SyntheticID
FROM ctePrep
WHERE k = 1
),
cteFiltered AS(
SELECT
C.UniqueID,
CC.UniqueID AS PossiblyMatchingContracts,
keys.SyntheticID
FROM @t AS C
JOIN @t AS CC ON C.FirstName = CC.FirstName
AND C.Birthday = CC.Birthday
JOIN cteKeys AS keys ON keys.FirstName = c.FirstName
AND keys.Birthday = c.Birthday
WHERE C.UniqueID <> CC.UniqueID
)
SELECT UniqueID, PossiblyMatchingContracts, SyntheticID
FROM cteFiltered
UNION ALL
SELECT UniqueID, NULL, FirstName+LastName+CONVERT(nvarchar(10), Birthday, 126) AS SyntheticID
FROM @t
WHERE UniqueID NOT IN (SELECT UniqueID FROM cteFiltered)
Hope this helps. The result looked OK to me:
UniqueID PossiblyMatchingContracts SyntheticID
---------------------------------------------------------------
2 1 PeterSmith1980-11-04
3 1 PeterSmith1980-11-04
1 2 PeterSmith1980-11-04
3 2 PeterSmith1980-11-04
1 3 PeterSmith1980-11-04
2 3 PeterSmith1980-11-04
4 NULL FrankMay1985-06-09
5 NULL Frank-PaulMay1985-06-09
6 NULL GinaEricson1950-11-04
Tested in SSMS, it works perfect. :)
--create table structure
create table #temp
(
uniqueID int,
firstname varchar(15),
lastname varchar(15),
birthday date
)
--insert data into the table
insert #temp
select 1, 'peter','smith','1980-11-04'
union all
select 2, 'peter','gray','1980-11-04'
union all
select 3, 'peter','gray-smith','1980-11-04'
union all
select 4, 'frank','may','1985-06-09'
union all
select 5, 'frank-paul','may','1985-06-09'
union all
select 6, 'gina','ericson','1950-11-04'
select * from #temp
--solution is as below
select ab.uniqueID
, PossiblyMatchingContracts
, c.firstname+c.lastname+cast(c.birthday as varchar) as synID
from
(
select a.uniqueID
, case
when a.uniqueID < min(b.uniqueID)over(partition by a.uniqueid)
then a.uniqueID
else min(b.uniqueID)over(partition by a.uniqueid)
end as SmallestID
, b.uniqueID as PossiblyMatchingContracts
from #temp a
left join #temp b
on (a.firstname = b.firstname OR a.lastname = b.lastname) AND a.birthday = b.birthday AND a.uniqueid <> b.uniqueID
) as ab
left join #temp c
on ab.SmallestID = c.uniqueID
Result capture is attached below:
Say we have the following table (a VIEW in your case):
UniqueID PossiblyMatchingContracts SyntheticID
1 2 G1
1 3 G2
2 1 G3
2 3 G4
3 1 G4
3 4 G6
4 5 G7
5 4 G8
6 NULL G9
In your case you can set the initial SyntheticID to a string like PeterSmith1980-11-04, using the UniqueID of each line. Here is a recursive CTE query. It divides all lines into unconnected groups and selects MAX(SyntheticID) in the current group as the new SyntheticID for all lines in this group.
WITH CTE AS
(
SELECT CAST(','+CAST(UniqueID AS Varchar(100)) +','+ CAST(PossiblyMatchingContracts as Varchar(100))+',' as Varchar(MAX)) as GroupCont,
SyntheticID
FROM PossiblyMatchingContracts
UNION ALL
SELECT CAST(GroupCont+CAST(UniqueID AS Varchar(100)) +','+ CAST(PossiblyMatchingContracts as Varchar(100))+',' AS Varchar(MAX)) as GroupCont,
pm.SyntheticID
FROM CTE
JOIN PossiblyMatchingContracts as pm
ON
(
CTE.GroupCont LIKE '%,'+CAST(pm.UniqueID AS Varchar(100))+',%'
OR
CTE.GroupCont LIKE '%,'+CAST(pm.PossiblyMatchingContracts AS Varchar(100))+',%'
)
AND NOT
(
CTE.GroupCont LIKE '%,'+CAST(pm.UniqueID AS Varchar(100))+',%'
AND
CTE.GroupCont LIKE '%,'+CAST(pm.PossiblyMatchingContracts AS Varchar(100))+',%'
)
)
SELECT pm.UniqueID,
pm.PossiblyMatchingContracts,
ISNULL(
(SELECT MAX(SyntheticID) FROM CTE WHERE
(
CTE.GroupCont LIKE '%,'+CAST(pm.UniqueID AS Varchar(100))+',%'
OR
CTE.GroupCont LIKE '%,'+CAST(pm.PossiblyMatchingContracts AS Varchar(100))+',%'
))
,pm.SyntheticID) as SyntheticID
FROM PossiblyMatchingContracts pm
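To make the intent of the query easier to check, here is a small Python sketch of the same idea: split the pairs into unconnected groups and give every member the maximum SyntheticID found in its group. It uses a breadth-first search rather than the recursive CTE with LIKE matching, and the function name `max_id_per_group` is made up for this illustration. The rows are the sample table from this answer.

```python
from collections import defaultdict, deque

# Rows of the sample table: (UniqueID, PossiblyMatchingContracts, SyntheticID).
pairs = [
    (1, 2, "G1"), (1, 3, "G2"), (2, 1, "G3"), (2, 3, "G4"), (3, 1, "G4"),
    (3, 4, "G6"), (4, 5, "G7"), (5, 4, "G8"), (6, None, "G9"),
]


def max_id_per_group(rows):
    """Return {UniqueID: largest SyntheticID within its connected group}."""
    neighbours = defaultdict(set)
    labels = defaultdict(set)
    for uid, match, syn in rows:
        neighbours.setdefault(uid, set())  # unmatched contracts get a node too
        labels[uid].add(syn)
        if match is not None:
            neighbours[uid].add(match)
            neighbours[match].add(uid)

    best = {}
    for start in list(neighbours):
        if start in best:
            continue
        # Breadth-first search collects one connected group.
        group, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbours[node] - group:
                group.add(other)
                queue.append(other)
        top = max(syn for member in group for syn in labels[member])
        for member in group:
            best[member] = top
    return best


best = max_id_per_group(pairs)
```

With the sample rows, contracts 1 to 5 are all linked through the chain 3-4 and therefore share one SyntheticID, while contract 6 keeps its own.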
I have the following data that I would like to pivot and get a count based on the pivoted results.
DECLARE @tempMusicSchoolStudent TABLE
(school VARCHAR(50),
studentname VARCHAR(50),
instrumentname VARCHAR(255),
expertise INT)
INSERT INTO @tempMusicSchoolStudent(school, studentname, instrumentname, expertise)
SELECT 'Foster','Matt','Guitar','10'
UNION
SELECT 'Foster','Jimmy','Guitar','5'
UNION
SELECT 'Foster','Jimmy','Keyboard','8'
UNION
SELECT 'Foster','Ryan','Keyboard','9'
UNION
SELECT 'Midlothean','Kyle','Keyboard','10'
UNION
SELECT 'Midlothean','Mary','Guitar','4'
UNION
SELECT 'Midlothean','Mary','Keyboard','7'
Raw data:
I'd like the results to look like the data below....
I got this data using the SQL query below. The problem with this query is that I have a dynamic number of instruments (I've only shown 2 in this example for simplicity's sake). I'd like to use pivot because it will be cleaner dynamic SQL. Otherwise I would have to dynamically left join the table to itself for each instrument.
SELECT
t.school, t.instrumentname, t.expertise,
t1.instrumentname, t1.expertise,
COUNT(DISTINCT t.studentname) [DistinctStudentCount]
FROM
@tempMusicSchoolStudent t
LEFT JOIN
@tempMusicSchoolStudent t1 ON t1.school = t.school
AND t1.studentname = t.studentname
AND t.instrumentname <> t1.instrumentname
GROUP BY
t.school, t.instrumentname, t.expertise, t1.instrumentname, t1.expertise
ORDER BY
t.school, t.instrumentname, t.expertise, t1.instrumentname, t1.expertise
If anyone has any ideas on how I can do this in a cleaner way than dynamically left joining the table to itself it would be much appreciated. Thanks.
You just need conditional aggregation:
SELECT t.school, t.instrumentname, t.expertise, t.instrumentname,
COUNT(DISTINCT t.studentname) as DistinctStudentCount
FROM @tempMusicSchoolStudent t
GROUP BY t.school, t.instrumentname, t.expertise, t.instrumentname;
You have rows with NULL values. It is entirely unclear where those come from. Your question is focused on some notion of "pivoting" where it seems that you only need aggregation. But it doesn't explain where the NULL rows come from.
You can try to make it dynamic for multiple instruments.
;with cte
as
(
SELECT * from
(SELECT * FROM @tempMusicSchoolStudent t) x
PIVOT
(MAX(expertise) FOR instrumentname in ([Guitar], [Keyboard])) y
)
SELECT school, studentname,
expertise = case when Guitar is not null then 'Guitar' else NULL end,
Guitar AS instrumentname,
expertise = case when Keyboard is not null then 'Keyboard' else NULL end,
Keyboard AS instrumentname,
count(distinct studentname) AS [DistinctStudentCount]
from cte
group by school,studentname, Guitar, Keyboard
OUTPUT:
Foster Jimmy Guitar 5 Keyboard 8 1
Foster Matt Guitar 10 NULL NULL 1
Foster Ryan NULL NULL Keyboard 9 1
Midlothean Kyle NULL NULL Keyboard 10 1
Midlothean Mary Guitar 4 Keyboard 7 1
Here's the solution I was looking for, I had to use unpivot + pivot.
The real thing that I was struggling with was selecting multiple values for the column that is being pivoted, instead of the max value.
So in this case I wanted multiple "expertise" numbers under a given "instrument expertise" column. Not just the maximum expertise for that instrument.
The first key to understanding the solution is that the pivot statement is doing an implicit group by on the columns being selected. So in order to achieve multiple values under your pivoted column you have to keep the integrity of the column you are grouping on by including some type of dense_rank/rank/row_number. This basically represents changes in the value of the column you are pivoting on and is then included in the implicit group by the pivot is doing, which results in getting multiple values in the pivoted column, not just the max.
So in the code below the "expertisenum" column is keeping the integrity of the expertise data.
DECLARE @tempMusicSchoolStudent TABLE
(school VARCHAR(50),
studentname VARCHAR(50),
instrumentname VARCHAR(255),
expertise INT)
INSERT INTO @tempMusicSchoolStudent(school, studentname, instrumentname, expertise)
SELECT 'Foster','Matt','Guitar','10'
UNION
SELECT 'Foster','Jimmy','Guitar','5'
UNION
SELECT 'Foster','Jimmy','Keyboard','8'
UNION
SELECT 'Foster','Ryan','Keyboard','9'
UNION
SELECT 'Midlothean','Kyle','Keyboard','10'
UNION
SELECT 'Midlothean','Mary','Guitar','4'
UNION
SELECT 'Midlothean','Mary','Keyboard','7'
SELECT school, [Guitar expertise], [Keyboard expertise], COUNT(*) [Count]
FROM
(
SELECT school,[expertiseNum],
CASE WHEN [Columns]='expertise' THEN instrumentname + ' expertise'
END [Columns1], [Values] AS [Values1]
FROM
(
SELECT school, studentname, instrumentname, DENSE_RANK() OVER(PARTITION BY school,instrumentname ORDER BY expertise) AS [expertiseNum],
CONVERT(VARCHAR(255),expertise) AS [expertise]
FROM @tempMusicSchoolStudent
) x
UNPIVOT (
[Values] FOR [Columns] IN ([expertise])
) unpvt
) p
PIVOT (
MAX([Values1]) FOR [Columns1] IN ([Guitar expertise], [Keyboard expertise])
) pvt
GROUP BY school,[Guitar expertise], [Keyboard expertise]
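The effect of the rank column can also be seen outside SQL. The following Python sketch is only an illustration of why keeping a rank in the grouping key preserves several values per pivoted column; it is not a translation of the query above, and `pivot_with_rank` is a name invented here. The position of a value among the sorted distinct values stands in for DENSE_RANK().

```python
from collections import defaultdict

# Same sample data as above: (school, student, instrument, expertise).
students = [
    ("Foster", "Matt", "Guitar", 10),
    ("Foster", "Jimmy", "Guitar", 5),
    ("Foster", "Jimmy", "Keyboard", 8),
    ("Foster", "Ryan", "Keyboard", 9),
    ("Midlothean", "Kyle", "Keyboard", 10),
    ("Midlothean", "Mary", "Guitar", 4),
    ("Midlothean", "Mary", "Keyboard", 7),
]


def pivot_with_rank(rows):
    """Pivot instruments into columns, one output row per (school, rank)."""
    # Collect the distinct expertise values per (school, instrument).
    values = defaultdict(set)
    for school, _student, instrument, expertise in rows:
        values[(school, instrument)].add(expertise)

    pivoted = defaultdict(dict)
    for (school, instrument), distinct in values.items():
        # The 1-based position in sorted order mirrors DENSE_RANK().
        for rank, expertise in enumerate(sorted(distinct), start=1):
            # (school, rank) acts as the implicit GROUP BY key of the pivot.
            pivoted[(school, rank)][instrument] = expertise
    return dict(pivoted)


pivoted = pivot_with_rank(students)
```

Without the rank in the key, each school would collapse to a single row and only one value per instrument could survive.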
I have a database table in which multiple customers can be assigned to multiple types. I am having trouble formulating a query that will exclude all customer records that match a certain type. For example:
ID CustomerName Type
=========================
111 John Smith TFS-A
111 John Smith PRO
111 John Smith RWAY
222 Jane Doe PRO
222 Jane Doe TFS-A
333 Richard Smalls PRO
444 Bob Rhoads PRO
555 Jacob Jones TFS-B
555 Jacob Jones TFS-A
What I want is to pull only those people who are marked PRO but not marked TFS. If they are PRO and TFS, exclude them.
Any help is greatly appreciated.
You can get all 'PRO' customers and use NOT EXISTS clause to exclude the ones that are also 'TFS':
SELECT DISTINCT ID, CustomerName
FROM mytable AS t1
WHERE [Type] = 'PRO' AND NOT EXISTS (SELECT 1
FROM mytable AS t2
WHERE t1.ID = t2.ID AND [Type] LIKE 'TFS%')
SQL Fiddle Demo
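If you want to try the NOT EXISTS pattern without a SQL Server instance, here is a minimal sketch that runs an equivalent query against an in-memory SQLite database from Python. The table name `mytable` follows the query above; using SQLite and the `ORDER BY` are assumptions made only so the example is self-contained and deterministic.

```python
import sqlite3

# Sample rows from the question: (ID, CustomerName, Type).
rows = [
    (111, "John Smith", "TFS-A"),
    (111, "John Smith", "PRO"),
    (111, "John Smith", "RWAY"),
    (222, "Jane Doe", "PRO"),
    (222, "Jane Doe", "TFS-A"),
    (333, "Richard Smalls", "PRO"),
    (444, "Bob Rhoads", "PRO"),
    (555, "Jacob Jones", "TFS-B"),
    (555, "Jacob Jones", "TFS-A"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ID INTEGER, CustomerName TEXT, Type TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

# Keep PRO customers for whom no TFS row with the same ID exists.
query = """
SELECT DISTINCT t1.ID, t1.CustomerName
FROM mytable AS t1
WHERE t1.Type = 'PRO'
  AND NOT EXISTS (SELECT 1
                  FROM mytable AS t2
                  WHERE t2.ID = t1.ID
                    AND t2.Type LIKE 'TFS%')
ORDER BY t1.ID
"""
pro_only = conn.execute(query).fetchall()
conn.close()
```

Qualifying the column in the subquery as `t2.Type` makes it explicit that the TFS check looks at the other rows of the same customer.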
solution using EXCEPT
WITH TestData
AS (
SELECT *
FROM (
VALUES ( 111, 'John Smith', 'TFS-A' )
, ( 111, 'John Smith', 'PRO' )
, ( 111, 'John Smith', 'RWAY' )
, ( 222, 'Jane Doe', 'PRO' )
, ( 222, 'Jane Doe', 'TFS-A' )
, ( 333, 'Richard Smalls', 'PRO' )
, ( 444, 'Bob Rhoads', 'PRO' )
, ( 555, 'Jacob Jones', 'TFS-B' )
, ( 555, 'Jacob Jones', 'TFS-A' ))
AS t (ID, CustomerName, [Type])
)
SELECT ID, CustomerName
FROM TestData
WHERE [Type] = 'PRO'
EXCEPT
SELECT ID, CustomerName
FROM TestData
WHERE [Type] LIKE 'TFS%'
output result
Select DISTINCT(Customername),ID
FROM tablename
WHERE NOT (ID IN (SELECT ID FROM tablename WHERE type='PRO')
AND ID IN (SELECT ID FROM tablename WHERE type LIKE 'TFS%'))
EDIT: now added working TFS clause
Get all customers that do not have TYPE PRO AND TFS for example
SQLFIDDLE:http://sqlfiddle.com/#!9/da4f9/2
Try This :
SELECT *
FROM table a
WHERE Type = 'PRO'
AND NOT EXISTS(SELECT 1
FROM table b
WHERE a.ID = b.ID
AND LEFT(Type, 3) = 'TFS')
I know this question has been answered, but my answer is different. Everyone else's solution involves two queries, which is what I call "double-dipping": you have to access the same table twice. It's better to avoid this when possible for better performance. Check this out:
SELECT ID,
CustomerName,
MIN([type]) AS [Type] --doesn't matter if it's MIN or MAX
FROM yourTable
WHERE [Type] = 'PRO' --only load values that matter. Ignore RWAY
OR [Type] LIKE 'TFS-_' --notice I use a "_" instead of "%". That because "_" is a wildcard for a single character
--instead of wildcard looking for any number of characters because normally it's best to be as narrow as possible to be more efficient
GROUP BY ID,CustomerName
HAVING SUM(CASE
WHEN [Type] = 'PRO' --This is where it returns values that only have type PRO
THEN 9999
ELSE 1
END
) = 9999
So let me explain my funky HAVING logic. As you can see, it's a SUM(): a PRO row counts as 9999 and a TFS-_ row counts as 1. So when the sum is EXACTLY 9999, the customer has one PRO row and no TFS row, and it's good. The reason I can't just use COUNT(*) = 1 is that a customer with only one TFS row and no PRO row would be returned, which of course would be incorrect.
Results:
ID CustomerName Type
----------- -------------- -----
444 Bob Rhoads PRO
333 Richard Smalls PRO
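The same single-pass idea can be written with two explicit counters instead of the 9999 weight. This is a variant, not the query above: it is a small sketch run against an in-memory SQLite database from Python, and the table name `yourTable` plus the `ORDER BY` are assumptions for the example.

```python
import sqlite3

# Sample rows from the question: (ID, CustomerName, Type).
rows = [
    (111, "John Smith", "TFS-A"),
    (111, "John Smith", "PRO"),
    (111, "John Smith", "RWAY"),
    (222, "Jane Doe", "PRO"),
    (222, "Jane Doe", "TFS-A"),
    (333, "Richard Smalls", "PRO"),
    (444, "Bob Rhoads", "PRO"),
    (555, "Jacob Jones", "TFS-B"),
    (555, "Jacob Jones", "TFS-A"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (ID INTEGER, CustomerName TEXT, Type TEXT)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)

# One scan of the table: count PRO rows and TFS rows per customer,
# then keep customers with at least one PRO row and no TFS row.
query = """
SELECT ID, CustomerName
FROM yourTable
GROUP BY ID, CustomerName
HAVING SUM(CASE WHEN Type = 'PRO' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN Type LIKE 'TFS%' THEN 1 ELSE 0 END) = 0
ORDER BY ID
"""
pro_without_tfs = conn.execute(query).fetchall()
conn.close()
```

Two counters avoid relying on a magic number and still work if a customer happens to have more than one PRO row.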
I am getting the following output
Code Manager Employee
1 A Emp1
1 A Emp2
1 A Emp3
2 B Emp4
2 B Emp5
but I want result as
Code Manager Employee
1 A Emp1
Emp2
Emp3
2 B Emp4
Emp5
The Code and Manager columns should not repeat. They should be blank.
select case when Code = lag(Code) over(order by Code, Manager, Employee)
then null
else Code
end as Code,
case when Manager = lag(Manager) over(order by Code, Manager, Employee)
then null
else Manager
end as Manager,
Employee
from YourTable Y
order by Y.Code, Y.Manager, Y.Employee
Try on SQL Fiddle
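If the blanking ends up being done in application code rather than in SQL, the comparison with the previous row is only a few lines. Here is a minimal Python sketch of that logic; the function name `blank_repeats` is hypothetical and the rows are the sample data from the question.

```python
# Sample rows from the question: (Code, Manager, Employee).
rows = [
    (1, "A", "Emp1"),
    (1, "A", "Emp2"),
    (1, "A", "Emp3"),
    (2, "B", "Emp4"),
    (2, "B", "Emp5"),
]


def blank_repeats(sorted_rows):
    """Replace Code and Manager with None when they repeat the previous row."""
    output = []
    previous = None  # plays the role of LAG(): the row before the current one
    for code, manager, employee in sorted_rows:
        same_code = previous is not None and previous[0] == code
        same_manager = previous is not None and previous[1] == manager
        output.append((
            None if same_code else code,
            None if same_manager else manager,
            employee,
        ))
        previous = (code, manager, employee)
    return output


# Sorting first corresponds to ORDER BY Code, Manager, Employee.
display_rows = blank_repeats(sorted(rows))
```

As in the query above, the result only makes sense in the given sort order, because a blank cell refers to the row printed before it.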
You need something like Access or Crystal Reports to do this sort of formatting. It's not possible in plain SQL.
That is not possible by SQL. You should manually loop the data in code after receiving it from database.
Edit
After comments by Vashh and Lieven, I realized that it is possible. So if he needs it for display purposes, he can either use Func (suggested by Vaassh), the join with NULL (suggested by Lieven), or maybe loop over the rows and add them to a DataGridView, a table or whatever he wants to use.
For the fun of it, following is one way to do it but in the end, this is better done in the end-user application
;WITH YourTable (Code, Manager, Employee) AS(
SELECT * FROM (VALUES
(1, 'A', 'Emp1')
, (1, 'A', 'Emp2')
, (1, 'A', 'Emp3')
, (2, 'B', 'Emp4')
, (2, 'B', 'Emp5')
) a (b, c, d)
)
, q AS (
SELECT rn = ROW_NUMBER() OVER (ORDER BY Code, Manager, Employee), *
FROM YourTable
)
SELECT Code = CASE WHEN q1.Code = q2.Code THEN NULL ELSE q1.Code END
, Manager = CASE WHEN q1.Code = q2.Code THEN NULL ELSE q1.Manager END
, q1.Employee
FROM q q1
LEFT OUTER JOIN q q2 ON q1.rn = q2.rn + 1
I know you're asking for an answer in Oracle, but maybe this SQL Server example will help you (if you really, really need to do it like this and not in a reporting environment):
DECLARE @TBL TABLE(
Code INT,
Manager CHAR(1),
Employee VARCHAR(4))
INSERT @TBL VALUES (1,'A','Emp1')
INSERT @TBL VALUES (1,'A','Emp2')
INSERT @TBL VALUES (1,'A','Emp3')
INSERT @TBL VALUES (2,'B','Emp4')
INSERT @TBL VALUES (2,'B','Emp5')
;WITH cte
AS (SELECT Code
,Manager
,Employee
,ROW_NUMBER() OVER(ORDER BY Code) rownum
FROM @TBL)
SELECT CASE curr.Code
WHEN prev.Code THEN ''
ELSE CAST(curr.Code AS VARCHAR(20))
END AS _Code
,CASE curr.Manager
WHEN prev.Manager THEN ''
ELSE curr.Manager
END AS _Manager
,curr.Employee
FROM cte curr
LEFT JOIN cte prev
ON curr.rownum = prev.rownum + 1
If you're just using SQL Server 2005/2008, you can achieve this with the following.
declare @table table (
Code int,
Manager Varchar(1),
Employee varchar(10)
)
insert into @table values
(1,'A','Emp1'),
(1,'A','Emp2'),
(1,'A','Emp3'),
(2,'A','Emp4'),
(2,'A','Emp5')
select * from @table
select
case when number=1 then Code else null end as Code,
case when number=1 then Manager else null end as Manager,
employee
from (
select *,
row_number() over (partition by code, manager order by code,manager) as number
from @table
) x
Which will give you:
(5 row(s) affected)
Code Manager Employee
----------- ------- ----------
1 A Emp1
1 A Emp2
1 A Emp3
2 A Emp4
2 A Emp5
(5 row(s) affected)
Code Manager employee
----------- ------- ----------
1 A Emp1
NULL NULL Emp2
NULL NULL Emp3
2 A Emp4
NULL NULL Emp5
(5 row(s) affected)
Done:)
You can change NULL values to '' if you want to.
WITH
CTE1 AS (
SELECT DISTINCT [Code], [Manager],
( SELECT TOP 1 Employee
FROM [dbo].[table] t2
WHERE t1.Code = t2.Code AND t1.Manager = t2.Manager
ORDER BY Employee) AS [Employee]
FROM [dbo].[table] t1)
,
CTE2 AS (
SELECT * FROM [dbo].[table]
EXCEPT
SELECT * FROM CTE1)
SELECT * FROM CTE1
UNION
SELECT NULL as Code, NULL as Manager, Employee
FROM CTE2
ORDER BY Employee
My SQL*Plus is a little rusty, but if that's the tool that you're using then it should be fairly simple using the BREAK command.
As mentioned in one of the comments above, this is best left to your reporting tool rather than done in the actual SQL, because an individual row without all the values doesn't make any sense outside the context of the result set.
BREAK on code on manager
SELECT code, manager, employee
FROM yourTable y
order by code, manager;
Please note that my SQL*Plus is a little rusty, so this might not work exactly, but details of the BREAK command can be found in the Oracle documentation.