How do I transpose multiple rows to columns in SQL - sql

My first time reading a question on here.
I am working at a university and I have a table of student IDs and their supervisors, some of the students have one supervisor and some have two or three depending on their subject.
The table looks like this
ID Supervisor
1 John Doe
2 Peter Jones
2 Sarah Jones
3 Peter Jones
3 Sarah Jones
4 Stephen Davies
4 Peter Jones
4 Sarah Jones
5 John Doe
I want to create a view that turns that into this:
ID Supervisor 1 Supervisor 2 Supervisor 3
1 John Doe
2 Peter Jones Sarah Jones
3 Peter Jones Sarah Jones
4 Stephen Davies Peter Jones Sarah Jones
5 John Doe
I have looked at PIVOT functions, but don't think it matches my needs.
Any help is greatly appreciated.

PIVOT was the right clue, it only needs a little 'extra' :)
DECLARE #tt TABLE (ID INT,Supervisor VARCHAR(128));
INSERT INTO #tt(ID,Supervisor)
VALUES
(1,'John Doe'),
(2,'Peter Jones'),
(2,'Sarah Jones'),
(3,'Peter Jones'),
(3,'Sarah Jones'),
(4,'Stephen Davies'),
(4,'Peter Jones'),
(4,'Sarah Jones'),
(5,'John Doe');
SELECT
*
FROM
(
SELECT
ID,
'Supervisor ' + CAST(ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Supervisor) AS VARCHAR(128)) AS supervisor_id,
Supervisor
FROM
#tt
) AS tt
PIVOT(
MAX(Supervisor) FOR
supervisor_id IN ([Supervisor 1],[Supervisor 2],[Supervisor 3])
) AS piv;
Result:
ID Supervisor 1 Supervisor 2 Supervisor 3
1 John Doe NULL NULL
2 Peter Jones Sarah Jones NULL
3 Peter Jones Sarah Jones NULL
4 Peter Jones Sarah Jones Stephen Davies
5 John Doe NULL NULL
You will notice that the assignment to Supervisor X is done by ordering by the Supervisor-VARCHAR. If you want the ordering done differently, you might want to include an [Ordering] column; then change to ROW_NUMBER() OVER(PARTITION BY ID ORDER BY [Ordering]). Eg an [Ordering] column could be an INT IDENTITY(1,1). I'll leave that as an excercise to you if that's what's really needed.

Related

Pandas - join two dataframes based on one column from table and combining two columns from another table

I am trying to join two tables based on one column from one table and two column from other table
Table a
name, designation
Mr. james john, manager
Mr. jim james, tester
Mr. abe james, developer
Table b
first name, last name, emp id
james,john,1
jim, james,2
abe,james,3
I want to join table a name column with table b combining "Mr. "+first name+last name.
Here's a way to do what your question asks:
res = a.join(b.assign(name='Mr. ' + b['first name'] + ' ' + b['last name']).set_index('name'), on='name')
Input:
dataframe a:
name designation
0 Mr. james john manager
1 Mr. jim james tester
2 Mr. abe james developer
dataframe b:
first name last name emp id
0 james john 1
1 jim james 2
2 abe james 3
Output:
name designation first name last name emp id
0 Mr. james john manager james john 1
1 Mr. jim james tester jim james 2
2 Mr. abe james developer abe james 3

Pivot Rows into column value

I have data like below:
Employer_ID Gender First_Name Last_Name Keywords
----------- ------ ---------- ---------- ---------
101 M Ian SMITH Environment
101 M Ian SMITH Global warmimg
101 M Ian SMITH Earth
101 M Ian SMITH Air
101 M Ian SMITH Sound pollution
102 M Scott Tiger Heart attack
102 M Scott Tiger Medical
102 M Scott Tiger Heart surgery
I would like to have output as below. Group by Employer_Id, Gender, First_Name and Last_Name. All relevant Keywords should be merged to produce one row per Employer_Id, Gender, First_Name and Last_Name:-
Employer_ID Gender First_Name Last_Name Keywords
----------- ------ ---------- --------- ---------
101 M Ian SMITH Environment Global warmimg Earth Air Sound pollution
102 M Scott Tiger Heart attack Medical Heart surgery
You can achieve it using String AGG as suggested by #Zhorov,
SELECT Employer_ID,Gender,First_Name,Last_Name,STRING_AGG(Keywords ,' ') AS Keywords
FROM #Temp
GROUP BY Employer_ID,Gender,First_Name,Last_Name

Find duplicate batches based on multiple columns

I have a table that contains a series of related records (batches). Each batch has a unique id and can contain customer payments. I want to find if a batch is duplicate even if it is submitted on different days.
A batch can have 1 or more records. Here is sample data set:
BatchId InputAmount CustomerName BatchDate
------- ----------- ------------ ----------
182944 $475.00 Barry Smith 16-Mar-2019
182944 $260.00 John Smith 16-Mar-2019
182944 $265.00 Jane Smith 16-Mar-2019
182944 $400.00 Sara Smith 16-Mar-2019
182944 $175.00 Andy Smith 16-Mar-2019
182945 $475.00 Barry Smith 16-Mar-2019
182945 $260.00 John Smith 16-Mar-2019
182945 $265.00 Jane Smith 16-Mar-2019
182945 $400.00 Sara Smith 16-Mar-2019
182945 $175.00 Andy Smith 16-Mar-2019
183194 $100.00 Paul Green 21-Mar-2019
183195 $100.00 Nancy Green 21-Mar-2019
183197 $150.00 John Brown 20-Mar-2019
183197 $210.00 Sarah Brown 20-Mar-2019
183198 $150.00 John Brown 21-Mar-2019
183198 $210.00 Sarah Brown 21-Mar-2019
183200 $125.00 John Doe 20-Mar-2019
183200 $110.00 Sarah Doe 20-Mar-2019
183202 $125.00 John Doe 21-Mar-2019
183202 $110.00 Sarah Doe 21-Mar-2019
183202 $115.00 Paul Rudd 21-Mar-2019
Batches (182944, 182945) and (183197,183198) are duplicate while the other batches are not.
I thought maybe I could create a summary table with counts and sums and get close but I'm having trouble finding the true duplicates by including the names as well.
DECLARE #Summaries TABLE(
BatchId INT,
BatchDate DATETIME,
BatchCount INT,
BatchAmount MONEY)
-- Summarize the Data so we can look for duplicates
INSERT INTO #Summaries
SELECT a.BatchId, a.BatchDate, COUNT(*) AS RecordCount, SUM(a.InputAmount) AS BatchAmount
FROM Batches a
WHERE a.BatchDate BETWEEN '20190316' and '20190321'
GROUP BY a.BatchId, a.BatchDate
ORDER BY a.BatchId DESC
-- find the potential duplicate batches based on the Counts and Sums
SELECT A.* FROM #Summaries A
INNER JOIN (SELECT BatchCount, BatchAmount, BatchDate FROM #Summaries
GROUP BY BatchCount, BatchAmount, BatchDate
HAVING COUNT(*) > 1) B
ON A.BatchCount = B.BatchCount
AND A.BatchAmount = B.BatchAmount
WHERE DATEDIFF(DAY, a.BatchDate, b.BatchDate) BETWEEN -1 AND 1
Thank you for the help. I'm using a SQL Server 2012 database.
you can try like below
with cte as
(select BatchId from table_name
group by BatchId
having count(*)>1
) select * from table_name a where a.BatchId in (select BatchId from cte)

SQL aggregation without min and max

I'm relatively new to SQL. I currently have the following CoursesTbl
StudentName CourseID InstructorName
Harry Potter 180 John Wayne
Harry Potter 181 Tiffany Williams
John Williams 180 Robert Smith
John Williams 181 Bob Adams
Now what I really want is this:
StudentName Course1(180) Course2(181)
Harry Potter John Wayne Tiffany Williams
John Williams Robert Smith Bob Adams
I've tried this query:
Select StudentName, Min(InstructorName) as Course1, Max(InstructorName) as
Course2 from CoursesTbl
Group By StudentName
Now it's clear to me that I need to group by the Student Name. But using Min and Max messes up the instructor order.
i.e. Min for Harry is John Wayne and Max is Tiffany Williams
Min for John Williams is Bob Adams and Max is Robert Smith.
So it does not display instructors in the correct order.
Can anyone please suggest how this could be fixed?
You can use conditional aggregation with a CASE statement along with an aggregate function to PIVOT the data into columns:
select
[StudentName],
Course1 = max(case when CourseId = 180 then InstructorName end),
Course2 = max(case when CourseId = 181 then InstructorName end)
from #Table1
group by StudentName
See Demo. You could also use the PIVOT function to get the result:
select
StudentName,
Course1 = [180],
Course2 = [181]
from
(
select StudentName,
CourseId,
InstructorName
from #Table1
) d
pivot
(
max(InstructorName)
for CourseId in ([180], [181])
) piv
Another Demo.

SQL Query: How to select multiple instances of a single item without collapsing into a group?

I'm trying to do with following with an SQL query in Impala. I've got a single data table that has (among other things) two columns with values that intersect multiple times. For example, let's say we have a table with two columns for related names and phone numbers:
Names Phone Numbers
John Smith (123) 456-7890
Rob Johnson (123) 456-7890
Greg Jackson (123) 456-7890
Tom Green (123) 456-7890
Jack Mathis (123) 456-7890
John Smith (234) 567-8901
Rob Johnson (234) 567-8901
Joe Wolf (234) 567-8901
Mike Thomas (234) 567-8901
Jim Moore (234) 567-8901
John Smith (345) 678-9012
Rob Johnson (345) 678-9012
Toby Ellis (345) 678-9012
Sam Wharton (345) 678-9012
Bob Thompson (345) 678-9012
John Smith (456) 789-0123
Rob Johnson (456) 789-0123
Kelly Howe (456) 789-0123
Hank Rehms (456) 789-0123
Jim Fellows (456) 789-0123
What I need to get from this table is a selection of each item from the Name column that has multiple entries from the Phone Numbers column associated with it, like this:
Names Phone Numbers
John Smith (123) 456-7890
John Smith (234) 567-8901
John Smith (345) 678-9012
John Smith (456) 789-0123
Rob Johnson (123) 456-7890
Rob Johnson (234) 567-8901
Rob Johnson (345) 678-9012
Rob Johnson (456) 789-0123
This is the query I've got so far, but it's not quite giving me the results I'm looking for:
SELECT a.name, a.phone_number, b.phone_number, b.count1
FROM databasename a
INNER JOIN (
SELECT phone_number, COUNT(phone_number) as count1
FROM databasename
GROUP BY phone_number
) b
ON a.phone_number = b.phone_number;
Any ideas on how to improve my query to get the results I'm looking for?
Thank you.
Working with your query...
This generates a subset by name of users having more than 1 phone number it then joins back to the entire set based on name returning all phone numbers for users having more than 1 phone number. however if a user has the same phone number listed more than once it would get returned. to eliminate those if needed, add distinct to the count in the inline view.
SELECT a.name, a.phone_number
FROM databasename a
INNER JOIN (
SELECT name, COUNT(phone_number) as count1
FROM databasename
GROUP BY name
having COUNT(phone_number) > 1
) b
on a.name = b.name
Order by a.name, a.phone_Number
One method is to use exists:
select t.*
from tablename t
where exists (select 1 from tablename t2 where t2.name = t.name and t2.phonenumber <> t.phonenumber)
SELECT DISTINCT x.*
FROM my_table x
JOIN my_table y
ON y.name = x.name
AND y.phone <> x.phone;