Pivot Rows into column value - sql

I have data like below:
Employer_ID Gender First_Name Last_Name Keywords
----------- ------ ---------- ---------- ---------
101 M Ian SMITH Environment
101 M Ian SMITH Global warmimg
101 M Ian SMITH Earth
101 M Ian SMITH Air
101 M Ian SMITH Sound pollution
102 M Scott Tiger Heart attack
102 M Scott Tiger Medical
102 M Scott Tiger Heart surgery
I would like the output below: one row per Employer_ID, Gender, First_Name and Last_Name, with all of that group's Keywords merged into a single value:
Employer_ID Gender First_Name Last_Name Keywords
----------- ------ ---------- --------- ---------
101 M Ian SMITH Environment Global warmimg Earth Air Sound pollution
102 M Scott Tiger Heart attack Medical Heart surgery

You can achieve this with STRING_AGG (available in SQL Server 2017 and later), as suggested by @Zhorov:
SELECT Employer_ID,Gender,First_Name,Last_Name,STRING_AGG(Keywords ,' ') AS Keywords
FROM #Temp
GROUP BY Employer_ID,Gender,First_Name,Last_Name
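To try the grouping without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under assumptions, not the answer's exact query: SQLite's GROUP_CONCAT stands in for STRING_AGG, the table name temp_keywords is hypothetical, only a subset of the sample rows is loaded, and '; ' is used as the separator so that multi-word keywords stay distinguishable.

```python
import sqlite3

# In-memory SQLite database standing in for the asker's SQL Server table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_keywords ("
    " Employer_ID INTEGER, Gender TEXT, First_Name TEXT,"
    " Last_Name TEXT, Keywords TEXT)"
)
conn.executemany(
    "INSERT INTO temp_keywords VALUES (?, ?, ?, ?, ?)",
    [
        (101, "M", "Ian", "SMITH", "Environment"),
        (101, "M", "Ian", "SMITH", "Earth"),
        (101, "M", "Ian", "SMITH", "Air"),
        (102, "M", "Scott", "Tiger", "Heart attack"),
        (102, "M", "Scott", "Tiger", "Medical"),
    ],
)

# GROUP_CONCAT is SQLite's counterpart of STRING_AGG: one output row per
# group, with that group's Keywords joined by the separator.
merged = conn.execute(
    """
    SELECT Employer_ID, Gender, First_Name, Last_Name,
           GROUP_CONCAT(Keywords, '; ') AS Keywords
    FROM temp_keywords
    GROUP BY Employer_ID, Gender, First_Name, Last_Name
    ORDER BY Employer_ID
    """
).fetchall()

for row in merged:
    print(row)
```

The order of the keywords inside each merged value is not guaranteed here. On SQL Server, STRING_AGG accepts a WITHIN GROUP (ORDER BY ...) clause if a specific order matters.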

Related

Inputting data into a table from two different tables

I have these 3 tables with data
SQL> select * from subject;
SUBJECTID LNAME FNAME PROJID
---------- ------------ ---------- ----------
10011 Indy Eva XYZ01
20022 Jordan Sam XYZ01
30033 Jordan Mary XYZ01
40044 Belmont Renee XYZ02
50055 Pissaro Becky XYZ02
60066 Nadal Becky XYZ03
70077 Bardot Brigitte XYZ03
80088 null Eva XYZ03
90099 Garnet Larry XYZ04
10111 Isner Monica XYZ04
11011 Dupont Marty XYZ05
11 rows selected.
SQL> select * from project;
PROJID MEDICNAME PURPOSE START_DATE END_DATE PI_ID
---------- ---------- ------------ ----------- ----------- ----------
XYZ02 Medic1 diabetes 01-oct-2018 31-jul-2022 10001
XYZ01 Medic1 foot 01-sep-2019 31-jul-2021 10001
XYZ04 Medic3 spleen 10-jan-2019 31-jul-2021 10001
XYZ05 Medic5 spleen 10-jul-2020 1-jan-2021 10002
XYZ03 Medic3 lung 01-nov-2016 31-dec-2022 10002
SQL> select * from researcher;
PID LNAME FNAME
---------- ------------ ----------
10001 Elgar Dawn
10002 Jordan Daniel
10003 Jordan Athena
10004 Rivers Karen
10005 Gomez Tracy
10006 Gomez Jenny
10007 Perry Eva
10008 McHale Vicky
8 rows selected.
and then created a fourth table that looks like this
SQL> CREATE TABLE n_subject
2 (SubjID number(7),
3 Lastname varchar2(12),
4 Firstname varchar2(10));
I want to populate my new table with the Subjects who were involved in projects that were led by Dawn Elgar (PID is 10001). Is there a way to do that across 3 tables? I am close with code that looks like this
SQL> insert into n_subject (subjid, lastname, firstname)
2 select subjectid, lname, fname
3 from subject where projid = 'XYZ01' or projid = 'XYZ02' or projid = 'XYZ04';
but am trying to get the data in there across all the three tables instead, using the ProjectId and the PID. Is this possible?
You can use INSERT ... SELECT with an inner join. Since the question uses Oracle, the table names are written without square brackets:
INSERT INTO n_subject
(subjid, lastname, firstname)
SELECT s.subjectid,
s.lname,
s.fname
FROM subject s
JOIN project p
ON p.projid = s.projid
WHERE p.pi_id = '10001';
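The question also asks about going across all three tables. A minimal sketch of that, assuming the researcher should be looked up by name rather than by a hard-coded PID, is below. It runs on SQLite through Python's sqlite3 module purely so it can be executed anywhere; the project table is trimmed to the two columns the join needs and only a subset of the sample rows is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE subject (subjectid INTEGER, lname TEXT, fname TEXT, projid TEXT);
    CREATE TABLE project (projid TEXT, pi_id INTEGER);
    CREATE TABLE researcher (pid INTEGER, lname TEXT, fname TEXT);
    CREATE TABLE n_subject (subjid INTEGER, lastname TEXT, firstname TEXT);

    INSERT INTO subject VALUES
        (10011, 'Indy', 'Eva', 'XYZ01'),
        (40044, 'Belmont', 'Renee', 'XYZ02'),
        (60066, 'Nadal', 'Becky', 'XYZ03'),
        (90099, 'Garnet', 'Larry', 'XYZ04'),
        (11011, 'Dupont', 'Marty', 'XYZ05');
    INSERT INTO project VALUES
        ('XYZ01', 10001), ('XYZ02', 10001), ('XYZ03', 10002),
        ('XYZ04', 10001), ('XYZ05', 10002);
    INSERT INTO researcher VALUES
        (10001, 'Elgar', 'Dawn'), (10002, 'Jordan', 'Daniel');
    """
)

# subject -> project via projid, project -> researcher via pi_id = pid.
# The lead researcher is resolved by name instead of listing project IDs.
conn.execute(
    """
    INSERT INTO n_subject (subjid, lastname, firstname)
    SELECT s.subjectid, s.lname, s.fname
    FROM subject s
    JOIN project p ON p.projid = s.projid
    JOIN researcher r ON r.pid = p.pi_id
    WHERE r.fname = 'Dawn' AND r.lname = 'Elgar'
    """
)

inserted = conn.execute(
    "SELECT subjid, lastname, firstname FROM n_subject ORDER BY subjid"
).fetchall()
print(inserted)
```

Only subjects on projects XYZ01, XYZ02 and XYZ04 end up in n_subject, which matches the hard-coded list in the asker's own attempt.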

Self JOIN to find the parent detail which matches with the row data

I am trying to query in MS SQL and I can not resolve it. I have a table employees:
Id Name Surname FatherName MotherName WifeName Pincode isChild
-- ------- ------- ---------- ---------- -------- ------- -------
1 John Green James Sue null 101011 1
2 Michael Sloan Barry Lilly null 101011 1
3 Sally Green Andrew Molly Jemi 101011 1
4 Barry Sloan Soul Paul Lilly 101011 0
5 James Green Ned White Sue 101011 0
I want a query that returns the parent rows: rows whose Name and WifeName match the FatherName and MotherName of a child row. For example, John (Id 1) has FatherName James and MotherName Sue, which matches Id 5, where Name is James and WifeName is Sue. So my query should return (this is my expected result):
Id Name Surname FatherName MotherName WifeName Pincode isChild
-- ------- ------- ---------- ---------- -------- ------- -------
5 James Green Ned White Sue 101011 0
4 Barry Sloan Soul Paul Lilly 101011 0
I tried the query below, but it only checks for James. How can I change it so that it checks all the names and returns the expected result?
select * FROM employees
where first_name like '%James%'
and wife_name like '%Sue%'
and pincode=101011;
Any tips on this would be really helpful. I am new to joins and need help writing a self join to get the result.
…
select *
from thetable as p -- the parent/father
where exists -- with one child at least
(
select *
from thetable as c
where c.fathername = p.name
and c.mothername = p.wifename
-- lastname?
)
Too long for a comment, and not intended as a slam against what you are working with. Please take it as constructive criticism.
The table design is VERY POOR, and correcting it should come first, before you get too deep into whatever you are working on. A more typical design would be a single table of people. The relationships can then be modeled in a couple of ways. One is to add two additional IDs to each person's record: FatherID and MotherID. These IDs join a child directly to its parents, instead of relying on strings to match against. Take a surname like Smith or Jones and consider how many people named "John Smith" may exist. There are a lot, even if the probability of also matching a wife's name of Sue, Mary or whatever else is lower, and that could still lead to multiple possibilities. Yes, you are adding a PIN, but even a computer can generate a random pin of 1234.
By having the IDs, there is NO ambiguity about who the relationship is with.
If the data were slightly altered to something like
Id Name Surname FatherID MotherID SpouseID
-- ------- ------- ---------- ---------- --------
1 John Green 5 6 null
2 Michael Sloan 4 3 null
3 Lilly Sloan null null 4
4 Barry Sloan null null 3
5 James Green 9 10 6
6 Sue Green 7 8 5
7 Bill Jones null null 8
8 Martha Jones null null 7
9 Brian Green null null 10
10 Beth Smith-Green null null 9
So, in this modified example, you can see right away that ID#1 John Green has Father (ID#5) James and Mother (ID#6) Sue. James, in turn, is a child of Father (ID#9) Brian and Mother (ID#10) Beth. The scenario therefore goes up to the grandparent level: James and Sue are each also children of their respective parents, with Sue's parents (ID#7 Bill and ID#8 Martha) having the Jones surname.
For Michael Sloan, the parents are #4 Barry and #3 Lilly.
I additionally added a SpouseID. This avoids copying people's names redundantly all over the table. You can then query on the parents' respective IDs instead of making a hopeful guess with LIKE on a name.
So, even though this does not solve your relatively simple query, fixing the underlying foundation of your database and its relations will, long-term, ease your querying in the future.
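As a rough illustration of how the ID-based design would be queried, here is a sketch using Python's sqlite3 module and the sample rows above. The table name people is an assumption made for this example; the column names follow the modified table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (
        Id INTEGER PRIMARY KEY, Name TEXT, Surname TEXT,
        FatherID INTEGER, MotherID INTEGER, SpouseID INTEGER
    );
    INSERT INTO people VALUES
        (1, 'John', 'Green', 5, 6, NULL),
        (2, 'Michael', 'Sloan', 4, 3, NULL),
        (3, 'Lilly', 'Sloan', NULL, NULL, 4),
        (4, 'Barry', 'Sloan', NULL, NULL, 3),
        (5, 'James', 'Green', 9, 10, 6),
        (6, 'Sue', 'Green', 7, 8, 5),
        (7, 'Bill', 'Jones', NULL, NULL, 8),
        (8, 'Martha', 'Jones', NULL, NULL, 7),
        (9, 'Brian', 'Green', NULL, NULL, 10),
        (10, 'Beth', 'Smith-Green', NULL, NULL, 9);
    """
)

# Each person joins to their parents by ID, so no name matching is needed.
# LEFT JOINs keep people whose parents are not recorded (NULL IDs).
parents = conn.execute(
    """
    SELECT c.Name, f.Name AS Father, m.Name AS Mother
    FROM people c
    LEFT JOIN people f ON f.Id = c.FatherID
    LEFT JOIN people m ON m.Id = c.MotherID
    ORDER BY c.Id
    """
).fetchall()

for row in parents:
    print(row)
```

Joining the same table once more on f.FatherID or m.FatherID would reach the grandparent level in the same way.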
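Try this (T1 is the child row, T2 the matching parent row):
SELECT DISTINCT -- a parent with several children is returned once
T2.*
FROM employees T1
JOIN employees T2 ON T2.Name = T1.FatherName
AND T2.WifeName = T1.MotherName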
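The self join can be checked against the question's sample rows with a small sketch like the one below, run on SQLite through Python's sqlite3 module. The extra conditions on Surname, Pincode and isChild are assumptions added here to reduce false matches between unrelated people with the same first names; they are not part of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (
        Id INTEGER, Name TEXT, Surname TEXT, FatherName TEXT,
        MotherName TEXT, WifeName TEXT, Pincode INTEGER, isChild INTEGER
    );
    INSERT INTO employees VALUES
        (1, 'John', 'Green', 'James', 'Sue', NULL, 101011, 1),
        (2, 'Michael', 'Sloan', 'Barry', 'Lilly', NULL, 101011, 1),
        (3, 'Sally', 'Green', 'Andrew', 'Molly', 'Jemi', 101011, 1),
        (4, 'Barry', 'Sloan', 'Soul', 'Paul', 'Lilly', 101011, 0),
        (5, 'James', 'Green', 'Ned', 'White', 'Sue', 101011, 0);
    """
)

# c is the child row, p is the candidate parent (father) row.
fathers = conn.execute(
    """
    SELECT DISTINCT p.Id, p.Name, p.Surname, p.WifeName
    FROM employees c
    JOIN employees p
      ON p.Name = c.FatherName
     AND p.WifeName = c.MotherName
     AND p.Surname = c.Surname   -- assumption: child shares the surname
     AND p.Pincode = c.Pincode   -- assumption: same pincode
    WHERE c.isChild = 1
    ORDER BY p.Id DESC
    """
).fetchall()
print(fathers)
```

Sally (Id 3) produces no match because no row has Name Andrew and WifeName Molly, so only Ids 5 and 4 are returned, as in the expected result.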

Find duplicate batches based on multiple columns

I have a table that contains a series of related records (batches). Each batch has a unique id and can contain customer payments. I want to find if a batch is duplicate even if it is submitted on different days.
A batch can have 1 or more records. Here is sample data set:
BatchId InputAmount CustomerName BatchDate
------- ----------- ------------ ----------
182944 $475.00 Barry Smith 16-Mar-2019
182944 $260.00 John Smith 16-Mar-2019
182944 $265.00 Jane Smith 16-Mar-2019
182944 $400.00 Sara Smith 16-Mar-2019
182944 $175.00 Andy Smith 16-Mar-2019
182945 $475.00 Barry Smith 16-Mar-2019
182945 $260.00 John Smith 16-Mar-2019
182945 $265.00 Jane Smith 16-Mar-2019
182945 $400.00 Sara Smith 16-Mar-2019
182945 $175.00 Andy Smith 16-Mar-2019
183194 $100.00 Paul Green 21-Mar-2019
183195 $100.00 Nancy Green 21-Mar-2019
183197 $150.00 John Brown 20-Mar-2019
183197 $210.00 Sarah Brown 20-Mar-2019
183198 $150.00 John Brown 21-Mar-2019
183198 $210.00 Sarah Brown 21-Mar-2019
183200 $125.00 John Doe 20-Mar-2019
183200 $110.00 Sarah Doe 20-Mar-2019
183202 $125.00 John Doe 21-Mar-2019
183202 $110.00 Sarah Doe 21-Mar-2019
183202 $115.00 Paul Rudd 21-Mar-2019
Batches (182944, 182945) and (183197, 183198) are duplicates, while the other batches are not.
I thought maybe I could create a summary table with counts and sums and get close but I'm having trouble finding the true duplicates by including the names as well.
DECLARE @Summaries TABLE(
BatchId INT,
BatchDate DATETIME,
BatchCount INT,
BatchAmount MONEY)
-- Summarize the Data so we can look for duplicates
INSERT INTO @Summaries
SELECT a.BatchId, a.BatchDate, COUNT(*) AS RecordCount, SUM(a.InputAmount) AS BatchAmount
FROM Batches a
WHERE a.BatchDate BETWEEN '20190316' and '20190321'
GROUP BY a.BatchId, a.BatchDate
ORDER BY a.BatchId DESC
-- find the potential duplicate batches based on the Counts and Sums
SELECT A.* FROM @Summaries A
INNER JOIN (SELECT BatchCount, BatchAmount, BatchDate FROM @Summaries
GROUP BY BatchCount, BatchAmount, BatchDate
HAVING COUNT(*) > 1) B
ON A.BatchCount = B.BatchCount
AND A.BatchAmount = B.BatchAmount
WHERE DATEDIFF(DAY, a.BatchDate, b.BatchDate) BETWEEN -1 AND 1
Thank you for the help. I'm using a SQL Server 2012 database.
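You can try something like the following. Note that it returns every batch that has more than one row, so it narrows down the candidates but does not by itself compare one batch against another:
with cte as
(select BatchId from table_name
group by BatchId
having count(*)>1
) select * from table_name a where a.BatchId in (select BatchId from cte)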
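One way to compare whole batches is to pair up rows that agree on amount and customer name, then keep the batch pairs where every row found a partner and both batches have the same size. The sketch below shows that idea on SQLite through Python's sqlite3 module, so it is not SQL Server 2012 syntax verbatim, although it only uses a CTE, joins and GROUP BY. It assumes that an (InputAmount, CustomerName) combination appears at most once per batch, ignores BatchDate, and trims the five-row batches from the question to two rows each.

```python
import sqlite3

rows = [
    (182944, 475, "Barry Smith", "2019-03-16"),
    (182944, 260, "John Smith", "2019-03-16"),
    (182945, 475, "Barry Smith", "2019-03-16"),
    (182945, 260, "John Smith", "2019-03-16"),
    (183194, 100, "Paul Green", "2019-03-21"),
    (183195, 100, "Nancy Green", "2019-03-21"),
    (183197, 150, "John Brown", "2019-03-20"),
    (183197, 210, "Sarah Brown", "2019-03-20"),
    (183198, 150, "John Brown", "2019-03-21"),
    (183198, 210, "Sarah Brown", "2019-03-21"),
    (183200, 125, "John Doe", "2019-03-20"),
    (183200, 110, "Sarah Doe", "2019-03-20"),
    (183202, 125, "John Doe", "2019-03-21"),
    (183202, 110, "Sarah Doe", "2019-03-21"),
    (183202, 115, "Paul Rudd", "2019-03-21"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Batches ("
    " BatchId INTEGER, InputAmount INTEGER,"
    " CustomerName TEXT, BatchDate TEXT)"
)
conn.executemany("INSERT INTO Batches VALUES (?, ?, ?, ?)", rows)

duplicates = conn.execute(
    """
    WITH sizes AS (
        SELECT BatchId, COUNT(*) AS n FROM Batches GROUP BY BatchId
    )
    SELECT a.BatchId AS BatchA, b.BatchId AS BatchB
    FROM Batches a
    JOIN Batches b
      ON a.BatchId < b.BatchId              -- each pair only once
     AND a.InputAmount = b.InputAmount
     AND a.CustomerName = b.CustomerName
    JOIN sizes sa ON sa.BatchId = a.BatchId
    JOIN sizes sb ON sb.BatchId = b.BatchId
    WHERE sa.n = sb.n                        -- same number of rows
    GROUP BY a.BatchId, b.BatchId, sa.n
    HAVING COUNT(*) = sa.n                   -- every row found a partner
    ORDER BY a.BatchId
    """
).fetchall()
print(duplicates)
```

Batches 183200 and 183202 share two rows but differ in size, and 183194 and 183195 differ in customer name, so neither pair is reported.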

How to make a DISTINCT CONCAT statement?

New SQL developer here. How do I make a DISTINCT CONCAT statement?
Here is my statement without the DISTINCT keyword:
COLUMN Employee FORMAT a25;
SELECT CONCAT(CONCAT(EMPLOYEEFNAME, ' '), EMPLOYEELNAME) AS "Employee", JOBTITLE "Job Title"
FROM Employee
ORDER BY EMPLOYEEFNAME;
Here is its output:
Employee Job Title
------------------------- -------------------------
Bill Murray Cable Installer
Bill Murray Cable Installer
Bob Smith Project Manager
Bob Smith Project Manager
Frank Herbert Network Specilist
Henry Jones Technical Support
Homer Simpson Programmer
Jane Doe Programmer
Jane Doe Programmer
Jane Doe Programmer
Jane Fonda Project Manager
John Jameson Cable Installer
John Jameson Cable Installer
John Carpenter Technical Support
John Carpenter Technical Support
John Jameson Cable Installer
John Carpenter Technical Support
John Carpenter Technical Support
Kathy Smith Network Specilist
Mary Jane Project Manager
Mary Jane Project Manager
21 rows selected
If I were to use the DISTINCT keyword, I should have only 11 rows selected. However, if I use SELECT DISTINCT CONCAT I get an error.
One option is to use GROUP BY:
SELECT CONCAT(CONCAT(EMPLOYEEFNAME, ' '), EMPLOYEELNAME) AS "Employee",
JOBTITLE AS "Job Title"
FROM Employee
GROUP BY CONCAT(CONCAT(EMPLOYEEFNAME, ' '), EMPLOYEELNAME),
JOBTITLE
ORDER BY "Employee"
Another option, if you really want to use DISTINCT, would be to subquery your current query. Because the alias "Employee" is quoted, it has to be quoted in the outer query as well:
SELECT DISTINCT t."Employee",
t."Job Title"
FROM
(
SELECT CONCAT(CONCAT(EMPLOYEEFNAME, ' '), EMPLOYEELNAME) AS "Employee",
JOBTITLE AS "Job Title"
FROM Employee
) t
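DISTINCT applies to the whole selected row, including computed columns, so it can also sit directly in front of the concatenation. The sketch below checks that behaviour on SQLite through Python's sqlite3 module. The || operator stands in for the nested CONCAT calls here, and only a few of the sample rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    " EMPLOYEEFNAME TEXT, EMPLOYEELNAME TEXT, JOBTITLE TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [
        ("Bill", "Murray", "Cable Installer"),
        ("Bill", "Murray", "Cable Installer"),
        ("Bob", "Smith", "Project Manager"),
        ("Bob", "Smith", "Project Manager"),
        ("Jane", "Doe", "Programmer"),
        ("Jane", "Fonda", "Project Manager"),
    ],
)

# DISTINCT removes rows that are identical across ALL selected columns,
# here the concatenated name plus the job title.
distinct_rows = conn.execute(
    """
    SELECT DISTINCT EMPLOYEEFNAME || ' ' || EMPLOYEELNAME AS "Employee",
           JOBTITLE AS "Job Title"
    FROM Employee
    ORDER BY "Employee"
    """
).fetchall()

for row in distinct_rows:
    print(row)
```

Six input rows collapse to four, one per distinct name and job title combination.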

How do I transpose multiple rows to columns in SQL

My first time reading a question on here.
I am working at a university and I have a table of student IDs and their supervisors. Some of the students have one supervisor and some have two or three, depending on their subject.
The table looks like this
ID Supervisor
1 John Doe
2 Peter Jones
2 Sarah Jones
3 Peter Jones
3 Sarah Jones
4 Stephen Davies
4 Peter Jones
4 Sarah Jones
5 John Doe
I want to create a view that turns that into this:
ID Supervisor 1 Supervisor 2 Supervisor 3
1 John Doe
2 Peter Jones Sarah Jones
3 Peter Jones Sarah Jones
4 Stephen Davies Peter Jones Sarah Jones
5 John Doe
I have looked at PIVOT functions, but don't think it matches my needs.
Any help is greatly appreciated.
PIVOT was the right clue, it only needs a little 'extra' :)
DECLARE @tt TABLE (ID INT,Supervisor VARCHAR(128));
INSERT INTO @tt(ID,Supervisor)
VALUES
(1,'John Doe'),
(2,'Peter Jones'),
(2,'Sarah Jones'),
(3,'Peter Jones'),
(3,'Sarah Jones'),
(4,'Stephen Davies'),
(4,'Peter Jones'),
(4,'Sarah Jones'),
(5,'John Doe');
SELECT
*
FROM
(
SELECT
ID,
'Supervisor ' + CAST(ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Supervisor) AS VARCHAR(128)) AS supervisor_id,
Supervisor
FROM
@tt
) AS tt
) AS tt
PIVOT(
MAX(Supervisor) FOR
supervisor_id IN ([Supervisor 1],[Supervisor 2],[Supervisor 3])
) AS piv;
Result:
ID Supervisor 1 Supervisor 2 Supervisor 3
1 John Doe NULL NULL
2 Peter Jones Sarah Jones NULL
3 Peter Jones Sarah Jones NULL
4 Peter Jones Sarah Jones Stephen Davies
5 John Doe NULL NULL
You will notice that the assignment to Supervisor X is done by ordering on the Supervisor VARCHAR column, which is why Stephen Davies ends up as Supervisor 3 for ID 4. If you want a different ordering, you could add an [Ordering] column and change the window to ROW_NUMBER() OVER(PARTITION BY ID ORDER BY [Ordering]). E.g. the [Ordering] column could be an INT IDENTITY(1,1). I'll leave that as an exercise for you if that's what's really needed.
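For reference, a sketch of the ordering-column variant is below, run on SQLite through Python's sqlite3 module. Two substitutions are made and should be read as assumptions: SQLite has no PIVOT operator, so conditional aggregation (MAX over CASE) is used in its place, and an INTEGER PRIMARY KEY AUTOINCREMENT column plays the role of INT IDENTITY(1,1). Window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ordering records insertion order, like an IDENTITY column would.
conn.execute(
    "CREATE TABLE tt ("
    " Ordering INTEGER PRIMARY KEY AUTOINCREMENT,"
    " ID INTEGER, Supervisor TEXT)"
)
conn.executemany(
    "INSERT INTO tt (ID, Supervisor) VALUES (?, ?)",
    [
        (1, "John Doe"),
        (2, "Peter Jones"), (2, "Sarah Jones"),
        (3, "Peter Jones"), (3, "Sarah Jones"),
        (4, "Stephen Davies"), (4, "Peter Jones"), (4, "Sarah Jones"),
        (5, "John Doe"),
    ],
)

# Number each student's supervisors by insertion order, then spread the
# numbered rows into columns with one MAX(CASE ...) per target column.
pivoted = conn.execute(
    """
    SELECT ID,
           MAX(CASE WHEN rn = 1 THEN Supervisor END) AS "Supervisor 1",
           MAX(CASE WHEN rn = 2 THEN Supervisor END) AS "Supervisor 2",
           MAX(CASE WHEN rn = 3 THEN Supervisor END) AS "Supervisor 3"
    FROM (
        SELECT ID, Supervisor,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Ordering) AS rn
        FROM tt
    ) AS numbered
    GROUP BY ID
    ORDER BY ID
    """
).fetchall()

for row in pivoted:
    print(row)
```

With this ordering, ID 4 comes out as Stephen Davies, Peter Jones, Sarah Jones, matching the layout requested in the question.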