I need to pivot some data when doing a select query. I'm using SQL Server 2014. Here is the format of the original data.
StudentID | DocumentType | PersonID
---------- ------------- --------
00001 DocA 2222
00001 DocB 2222
00002 DocB 2222
00002 DocA 3333
00003 DocA 4444
And I want it to display like...
StudentID | DocumentTypeAPersonID | DocumentTypeBPersonID
--------- --------------------- -----------------------
00001 2222 2222
00002 3333 2222
00003 4444 NULL
Sometimes a student will have both document types. Sometimes they will only have one. Not sure if the "missing" document type would show up as NULL or just blank in that field.
This way might save you some code:
SELECT StudentID,
DocumentTypeAPersonID = MAX(CASE WHEN DocumentType ='DocA' THEN PersonID END),
DocumentTypeBPersonID = MAX(CASE WHEN DocumentType ='DocB' THEN PersonID END)
FROM MyTable
GROUP BY StudentID
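To check what happens to the "missing" document type, here is a minimal sketch that runs the same conditional aggregation against the sample data. It assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for SQL Server, and it uses the portable `expr AS alias` form instead of the T-SQL `alias = expr` form. A CASE with no ELSE yields NULL, and MAX over only NULLs is NULL, so the missing type comes back as NULL rather than blank.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server for this demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (StudentID TEXT, DocumentType TEXT, PersonID INTEGER)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        ("00001", "DocA", 2222),
        ("00001", "DocB", 2222),
        ("00002", "DocB", 2222),
        ("00002", "DocA", 3333),
        ("00003", "DocA", 4444),
    ],
)

# One MAX(CASE ...) per output column; rows that do not match the
# CASE contribute NULL, which MAX ignores.
pivot_rows = conn.execute(
    """
    SELECT StudentID,
           MAX(CASE WHEN DocumentType = 'DocA' THEN PersonID END) AS DocumentTypeAPersonID,
           MAX(CASE WHEN DocumentType = 'DocB' THEN PersonID END) AS DocumentTypeBPersonID
    FROM MyTable
    GROUP BY StudentID
    ORDER BY StudentID
    """
).fetchall()

for row in pivot_rows:
    print(row)  # NULL shows up as None in Python
```

Student 00003 has no DocB row, so its third column is None (NULL).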
Here you go.
SELECT StudentID, DocA, DocB FROM
(
SELECT StudentID, DocumentType, PersonID
FROM myTable
) t
pivot
(
MAX(PersonID)
FOR DocumentType IN (DocA, DocB)
) p
This is a static pivot, meaning that you have to list the columns you want to pivot by hand. If, for example, you also have a DocC, then just add it to both column lists:
SELECT StudentID, DocA, DocB, DocC FROM
(
SELECT StudentID, DocumentType, PersonID
FROM myTable
) t
pivot
(
MAX(PersonID)
FOR DocumentType IN (DocA, DocB, DocC)
) p
I have a query that returns rows whose count is greater than 1, but I need the result to be grouped on one particular column (RollNo). How can I achieve this?
Table Studies
NAME RollNo DeptType InternalStaff_1 InternalStaff_2
----------- ----------- ----------- --------------- ---------------
Anu 5 CompSci Eve Antony
Joy 13 Architecture Elizabeth George
Adam 2 Mech Grady Lisa
Adam 2 Mech Grady Kim
Anu 5 CompSci Eve Antony
The query below gives me a count, but not the one I expected:
SELECT DISTINCT S.Name
, S.RollNo
, COUNT(S.RollNo) AS [Count]
, S.DeptType
, S.InternalStaff_1
, S.InternalStaff_2
FROM DataMining.dbo.Studies S
WHERE StartDate >= '20210325'--#StartDate
AND StartDate <= '20210407'--#EndDate
GROUP BY S.Name, S.RollNo, S.DeptType, S.InternalStaff_1, S.InternalStaff_2
HAVING COUNT(S.RollNo) > 1
ORDER BY RollNo
The query gave me the below result
NAME RollNo Count DeptType InternalStaff_1 InternalStaff_2
----------- ----------- ----------- ----------- --------------- ---------------
Anu 5 2 CompSci Eve Antony
But the expected result is
NAME RollNo Count DeptType InternalStaff_1 InternalStaff_2
----------- ----------- ----------- ----------- --------------- ---------------
Anu 5 2 CompSci Eve Antony
Adam 2 2 Mech Grady NULL
As you can see, the expected result includes Adam, whose two rows have different InternalStaff_2 names; the present result leaves him out.
How can I overcome this?
Note: I need the results to be grouped by RollNo, but I also need InternalStaff_2 to be included in the result.
Hmmm . . . If I understand correctly, you want NULL if the internal staff columns do not match. That would be:
SELECT S.Name, S.RollNo, COUNT(*) AS [Count], S.DeptType,
(CASE WHEN MIN(S.InternalStaff_1) = MAX(S.InternalStaff_1) THEN MIN(S.InternalStaff_1) END) as InternalStaff_1,
(CASE WHEN MIN(S.InternalStaff_2) = MAX(S.InternalStaff_2) THEN MIN(S.InternalStaff_2) END) as InternalStaff_2
FROM DataMining.dbo.Studies S
WHERE StartDate >= '20210325' AND --#StartDate
StartDate <= '20210407' --#EndDate
GROUP BY S.Name, S.RollNo, S.DeptType
HAVING COUNT(*) > 1
ORDER BY RollNo;
Here is a db<>fiddle that shows that this basically works.
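As a rough check of the MIN = MAX idea, the sketch below runs the same query shape against the five sample rows. The assumptions are mine: an in-memory SQLite database stands in for SQL Server, the StartDate filter is left out because the sample data has no dates, and the count column is named row_count.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; no StartDate column.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Studies (
        Name TEXT, RollNo INTEGER, DeptType TEXT,
        InternalStaff_1 TEXT, InternalStaff_2 TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO Studies VALUES (?, ?, ?, ?, ?)",
    [
        ("Anu", 5, "CompSci", "Eve", "Antony"),
        ("Joy", 13, "Architecture", "Elizabeth", "George"),
        ("Adam", 2, "Mech", "Grady", "Lisa"),
        ("Adam", 2, "Mech", "Grady", "Kim"),
        ("Anu", 5, "CompSci", "Eve", "Antony"),
    ],
)

# MIN = MAX holds only when every non-NULL value in the group is the same;
# otherwise the CASE has no matching branch and returns NULL.
duplicate_rows = conn.execute(
    """
    SELECT Name, RollNo, COUNT(*) AS row_count, DeptType,
           CASE WHEN MIN(InternalStaff_1) = MAX(InternalStaff_1)
                THEN MIN(InternalStaff_1) END AS InternalStaff_1,
           CASE WHEN MIN(InternalStaff_2) = MAX(InternalStaff_2)
                THEN MIN(InternalStaff_2) END AS InternalStaff_2
    FROM Studies
    GROUP BY Name, RollNo, DeptType
    HAVING COUNT(*) > 1
    ORDER BY RollNo
    """
).fetchall()

for row in duplicate_rows:
    print(row)
```

Because InternalStaff_2 is no longer in the GROUP BY, Adam's two rows fall into one group, and the differing names collapse to NULL.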
Like this?
I am not sure I understand your requirements, but for the InternalStaff_2 column you could also use STRING_AGG() in place of the nested subquery in the following script.
SELECT DISTINCT S.Name
, S.RollNo
, COUNT(S.RollNo) AS [Count]
, S.DeptType
, S.InternalStaff_1
, (SELECT CASE WHEN COUNT(DISTINCT InternalStaff_2) = 1 THEN MIN(InternalStaff_2) ELSE null END
FROM #temp AS i
WHERE i.NAME = S.NAME and i.RollNo = S.RollNo and i.DeptType = S.DeptType and i.InternalStaff_1 = S.InternalStaff_1) as InternalStaff_2
FROM #temp S
GROUP BY S.[Name], S.RollNo, S.DeptType, S.InternalStaff_1
HAVING COUNT(S.RollNo) > 1
ORDER BY RollNo
Googling SQL PIVOT brings up answers to situations more complex than mine, with aggregations. I did find this simple SQL Pivot Query, but it pivots a single table, whereas I have two; it uses a rank partition, which I'm not sure is necessary; and I can't actually get it to work. It's also 5 years old, and I'm hoping there's an easier way.
I am sure this is a duplicate question so if someone can find it then please do!
People table:
PersonID
========
1
2
3
Device table:
DeviceID | PersonID
===================
1111 1
2222 1
3333 1
123 2
456 2
9999 3
I do a join like this:
SELECT p.PersonID, d.DeviceID FROM People p
LEFT JOIN Device d on d.PersonID = p.PersonID
Which gives me:
PersonID | DeviceID
===================
1 1111
1 2222
1 3333
2 123
2 456
3 9999
I know what you're thinking: that's just the Device table. But this is a minimal version of the query and tables; there's much more going on in the real ones.
I want to join the People table to the Device table and get three device columns.
Must I use PIVOT to get results like this? (There will always be a maximum of three devices per person.)
PersonID | 1 | 2 | 3
===============================================
1 1111 2222 3333
2 123 456
3 9999
(Where the blanks would be NULL)
I'm trying:
SELECT PersonID, [1], [2], [3]
FROM (
SELECT p.PersonID, d.DeviceID FROM People p
LEFT JOIN Device d on d.PersonID = p.PersonID) AS r
PIVOT
(
MAX(DeviceID)
FOR DeviceID IN([1], [2], [3])
) AS p;
But it's giving me NULL for all three columns.
The value list in the pivot clause must contain actual values from the column being pivoted. [1], [2], [3] are PersonID values, not DeviceID values, so FOR DeviceID IN ([1], [2], [3]) matches nothing, hence all the NULL values.
Here is my solution. I constructed a new key_ column to pivot around.
Sample data with added person names
declare @person table
(
personid int,
personname nvarchar(100)
);
insert into @person (personid, personname) values
(1, 'Ann'),
(2, 'Britt'),
(3, 'Cedric');
declare @device table
(
personid int,
deviceid int
);
insert into @device (personid, deviceid) values
(1, 1111),
(1, 2222),
(1, 3333),
(2, 123),
(2, 456),
(3, 9999);
Solution
Run the CTE part on its own to see the intermediate result table. The key_ column contains values like DEVICE_*, which are the same values used in the for key_ in part of the pivot clause.
with base as
(
select p.personname,
d.deviceid,
'DEVICE_' + convert(char, ROW_NUMBER() over(partition by p.personname order by d.deviceid)) as 'key_'
from @person p
join @device d
on d.personid = p.personid
)
select piv.personname, piv.DEVICE_1, piv.DEVICE_2, piv.DEVICE_3
from base
pivot( max(deviceid) for key_ in ([DEVICE_1], [DEVICE_2], [DEVICE_3]) ) piv;
Result
The intermediate CTE result table
personname deviceid key_
---------- ----------- ----------
Ann 1111 DEVICE_1
Ann 2222 DEVICE_2
Ann 3333 DEVICE_3
Britt 123 DEVICE_1
Britt 456 DEVICE_2
Cedric 9999 DEVICE_1
The final result
personname DEVICE_1 DEVICE_2 DEVICE_3
---------- ----------- ----------- -----------
Ann 1111 2222 3333
Britt 123 456 NULL
Cedric 9999 NULL NULL
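The same numbering idea also works without the PIVOT keyword. The sketch below is an alternative, not the query above: it numbers each person's devices with ROW_NUMBER() and then spreads them into columns with MAX(CASE ...). It assumes an in-memory SQLite database (3.25 or later, for window functions) in place of SQL Server, and the column name slot is made up for the illustration.

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+ for window functions) stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (personid INTEGER, personname TEXT)")
conn.execute("CREATE TABLE device (personid INTEGER, deviceid INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, "Ann"), (2, "Britt"), (3, "Cedric")],
)
conn.executemany(
    "INSERT INTO device VALUES (?, ?)",
    [(1, 1111), (1, 2222), (1, 3333), (2, 123), (2, 456), (3, 9999)],
)

# Step 1 (CTE): give each device a per-person sequence number ("slot").
# Step 2: turn slot 1..3 into three columns with conditional aggregation.
device_rows = conn.execute(
    """
    WITH base AS (
        SELECT p.personname,
               d.deviceid,
               ROW_NUMBER() OVER (PARTITION BY p.personid
                                  ORDER BY d.deviceid) AS slot
        FROM person p
        JOIN device d ON d.personid = p.personid
    )
    SELECT personname,
           MAX(CASE WHEN slot = 1 THEN deviceid END) AS DEVICE_1,
           MAX(CASE WHEN slot = 2 THEN deviceid END) AS DEVICE_2,
           MAX(CASE WHEN slot = 3 THEN deviceid END) AS DEVICE_3
    FROM base
    GROUP BY personname
    ORDER BY personname
    """
).fetchall()

for row in device_rows:
    print(row)
```

Either form needs the generated sequence number: the pivot has to key on a value that actually exists in the rows, not on the device IDs themselves.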
I'm trying to match and align data or, put another way, to count occurrences and then list the values for which those occurrences occur.
Or, in a question: "How many times does each ID value occur, and for what names?"
For example, with this input
Name ID
-------------
jim 123
jim 234
jim 345
john 123
john 345
jane 234
jane 345
jan 45678
I want the output to be:
count ID name name name
------------------------------------
3 345 jim john jane
2 123 jim john
2 234 jim jane
1 45678 jan
Or similarly, the input could be (noticing that the ID values are not aligned),
jim john jane jan
----------------------------
123 345 234 45678
234 123 345
345
but that seems to complicate things.
The closest I have come to the desired result is this SQL:
select ID, count(ID) as count
from table
group by ID
order by count desc
which outputs
ID count
------------
345 3
123 2
234 2
45678 1
I'll appreciate help.
You seem to want a pivot. In SQL, you have to specify the number of columns in advance (unless you construct the query as a string).
But the idea is:
select ID, count(*) as cnt,
max(case when seqnum = 1 then name end) as name_1,
max(case when seqnum = 2 then name end) as name_2,
max(case when seqnum = 3 then name end) as name_3
from (select t.*,
row_number() over (partition by id order by id) as seqnum -- arbitrary ordering
from table t
) t
group by ID
order by cnt desc;
If you have an unknown number of columns, you can aggregate the values into an array:
select ID, count(*) as cnt,
array_agg(name order by name) as names
from table t
group by ID
order by cnt desc;
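For a quick way to try the "collect the names per ID" idea, here is a minimal sketch. array_agg is not available in SQLite, so this assumes GROUP_CONCAT as a stand-in, run through Python's sqlite3 module; the table name names_by_id is made up, and the names are split and sorted in Python because GROUP_CONCAT does not promise an order.

```python
import sqlite3

# Assumption: SQLite's GROUP_CONCAT stands in for array_agg;
# names_by_id is a hypothetical table name for the sample input.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names_by_id (name TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO names_by_id VALUES (?, ?)",
    [
        ("jim", 123), ("jim", 234), ("jim", 345),
        ("john", 123), ("john", 345),
        ("jane", 234), ("jane", 345),
        ("jan", 45678),
    ],
)

raw_rows = conn.execute(
    """
    SELECT id, COUNT(*) AS cnt, GROUP_CONCAT(name) AS names
    FROM names_by_id
    GROUP BY id
    ORDER BY cnt DESC, id
    """
).fetchall()

# Split the comma-separated string and sort it for a stable result.
id_counts = [(id_, cnt, sorted(names.split(","))) for id_, cnt, names in raw_rows]

for row in id_counts:
    print(row)
```

This gives one row per ID with its count and the list of names, however many names there are, at the cost of getting a list rather than separate columns.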
The query would look similar to this, if that's what you're looking for.
SELECT
name,
id,
COUNT(id) as count
FROM
dataSet
WHERE
dataSet.name = 'input'
AND dataSet.id = 'input'
GROUP BY
name,
id
I have a table which fills up with lots of transactions monthly, like below.
Name ID Date OtherColumn
_________________________________________________
John Smith 11111 2012-11-29 Somevalue
John Smith 11111 2012-11-30 Somevalue
Adam Gray 22222 2012-12-11 Somevalue
Tim Blue 33333 2012-12-15 Somevalue
John NewName 11111 2013-01-01 Somevalue
Adam Gray 22222 2013-01-02 Somevalue
From this table I want to create a dimension table with the unique names and IDs. The problem is that a person can change his or her name, like "John" in the example above. The IDs are otherwise always unique. In those cases I want to use only the newest name (the one with the latest date).
So that I end up with a table like this:
Name ID
______________________
John NewName 11111
Adam Gray 22222
Tim Blue 33333
How do I go about achieving this?
Can I do it in a single query?
Use a CTE for this. It simplifies ranking and window functions.
;WITH CTE as
(SELECT
RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Date] DESC),
ID,
Name
FROM
YourTable)
SELECT
Name,
ID
FROM
CTE
WHERE
RN = 1
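To see the ranking step in action, the following sketch runs the same query shape over the sample transactions. It assumes an in-memory SQLite database (3.25 or later, for window functions) in place of SQL Server, stores the dates as ISO text so they sort correctly, and uses the portable `AS RN` alias form.

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+) stands in for SQL Server;
# dates are ISO-8601 text, which sorts chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Name TEXT, ID INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        ("John Smith", 11111, "2012-11-29"),
        ("John Smith", 11111, "2012-11-30"),
        ("Adam Gray", 22222, "2012-12-11"),
        ("Tim Blue", 33333, "2012-12-15"),
        ("John NewName", 11111, "2013-01-01"),
        ("Adam Gray", 22222, "2013-01-02"),
    ],
)

# RN = 1 marks the newest row within each ID.
latest_names = conn.execute(
    """
    WITH CTE AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date DESC) AS RN,
               ID,
               Name
        FROM YourTable
    )
    SELECT Name, ID
    FROM CTE
    WHERE RN = 1
    ORDER BY ID
    """
).fetchall()

for row in latest_names:
    print(row)
```

ROW_NUMBER always assigns exactly one row the value 1 per ID, so this returns one row per ID even if two rows share the latest date.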
I think creating a table is a bad idea, but this is how you get the most recent name.
select name
from yourtable yt join
(select id, max(date) maxdate
from yourtable
group by id ) temp on temp.id = yt.id and yt.date = maxdate
JNK's CTE solution is equivalent to the following.
SELECT
Name,
ID
FROM (
SELECT
RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Date] DESC),
Name,
ID
FROM theTable
) AS ranked
WHERE RN = 1
I'm still trying to think of a way to get rid of the partition function without introducing possible duplicates.
I have a table of bank staff information that looks like this:
branchNumber Position firstName lastName staffNumber
------------ -------- --------- -------- -----------
25 Manager john doe 11111
25 Secretary robert paulson 11112
25 Secretary cindy lu 11113
66 Manager tim timson 22223
66 Manager jacob jacobson 22224
66 Secretary henry henryson 22225
66 Supervisor paul paulerton 22226
I am actually done with this, but I completed the assignment using SQL common table expressions, and I can't use them in this project. I need the results in this format:
branchNumber numOfManagers numOfSecretaries numOfSupervisors totalEmployees
------------ ------------- ---------------- ---------------- --------------
25 1 2 0 3
66 2 1 1 4
My issue is getting multiple columns with information from a row, I have this so far,
SELECT branchNumber, COUNT(*) AS numOfManagers
FROM Staff
WHERE position = 'Manager'
GROUP BY branchNumber, Position;
This outputs the correct information for numOfManagers, but producing the next three columns without CTEs eludes me. I tried subselects too, with no luck. Does anybody have any ideas?
You can use something like this:
select branchnumber,
sum(case when Position ='Manager' then 1 else 0 end) numofManagers,
sum(case when Position ='Secretary' then 1 else 0 end) numofSecretaries,
sum(case when Position ='Supervisor' then 1 else 0 end) numofSupervisors,
count(*) totalEmployees
from yourtable
group by branchnumber
See SQL Fiddle with Demo
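As a sanity check of the SUM(CASE ...) version, here is a small sketch that runs it on the staff rows from the question. It assumes an in-memory SQLite database standing in for the real one, a table named Staff as in the question, and an added ORDER BY so the output order is predictable.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Staff (
        branchNumber INTEGER, Position TEXT,
        firstName TEXT, lastName TEXT, staffNumber INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?, ?, ?)",
    [
        (25, "Manager", "john", "doe", 11111),
        (25, "Secretary", "robert", "paulson", 11112),
        (25, "Secretary", "cindy", "lu", 11113),
        (66, "Manager", "tim", "timson", 22223),
        (66, "Manager", "jacob", "jacobson", 22224),
        (66, "Secretary", "henry", "henryson", 22225),
        (66, "Supervisor", "paul", "paulerton", 22226),
    ],
)

# Each SUM adds 1 for matching rows and 0 otherwise, so a branch with
# no supervisors gets 0 rather than NULL.
branch_counts = conn.execute(
    """
    SELECT branchNumber,
           SUM(CASE WHEN Position = 'Manager'    THEN 1 ELSE 0 END) AS numOfManagers,
           SUM(CASE WHEN Position = 'Secretary'  THEN 1 ELSE 0 END) AS numOfSecretaries,
           SUM(CASE WHEN Position = 'Supervisor' THEN 1 ELSE 0 END) AS numOfSupervisors,
           COUNT(*) AS totalEmployees
    FROM Staff
    GROUP BY branchNumber
    ORDER BY branchNumber
    """
).fetchall()

for row in branch_counts:
    print(row)
```

The ELSE 0 is what makes branch 25 report 0 supervisors; without it the sum over no matching rows would be NULL.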
Or you can use the PIVOT function:
select branchnumber,
[Manager], [Secretary], [Supervisor],
TotalEmployees
from
(
select t1.branchnumber,
t1.position,
t2.TotalEmployees
from yourtable t1
inner join
(
select branchnumber, count(*) TotalEmployees
from yourtable
group by branchnumber
) t2
on t1.branchnumber = t2.branchnumber
) x
pivot
(
count(position)
for position in ([Manager], [Secretary], [Supervisor])
) p;
See SQL Fiddle with Demo