SQL listing combinations in different columns only once

My DB looks as follows:
**Customer** | **Order** | **Product**
cid (PK) | oid (PK) | pid (PK)
fname | cid (FK) | pname
lname | pid (FK) | pprice
Now I'm using a query to get the following results:
**fname | lname | oid | pname | pprice**
Bill | Gates | 1111 | Router | 40,-
Bill | Gates | 1112 | Laptop | 699,-
Steve | Jobs | 1113 | Tablet | 1299,-
Steve | Jobs | 1114 | Watch | 699,-
However, what I want is to list the first and last name only once if the person has more than one order. How can I achieve this?
Example expected output:
**fname | lname | oid | pname | pprice**
Bill | Gates | 1111 | Router | 40,-
 | | 1112 | Laptop | 699,-
Steve | Jobs | 1113 | Tablet | 1299,-
 | | 1114 | Watch | 699,-

This problem can be solved with the LAG function.
LAG returns the value of a given column from the previous row.
We can then compare that value with the value in the current row,
and use a CASE expression to return a value depending on the result of the comparison.
Here is a query that gives the desired result:
SELECT
    -- Blank the name when the previous row (by oid) belongs to the same person.
    -- On the first row LAG returns NULL, so the ELSE branch applies.
    CASE
        WHEN lag(concat(cop.fname, cop.lname)) OVER (ORDER BY cop.oid) = concat(cop.fname, cop.lname) THEN null
        ELSE cop.fname
    END AS fname,
    CASE
        WHEN lag(concat(cop.fname, cop.lname)) OVER (ORDER BY cop.oid) = concat(cop.fname, cop.lname) THEN null
        ELSE cop.lname
    END AS lname,
    cop.oid,
    cop.pname,
    cop.pprice
FROM
    (SELECT c.fname, c.lname, o.oid, p.pname, p.pprice
     FROM customer c
     LEFT JOIN myorder o ON c.cid = o.cid
     LEFT JOIN product p ON o.pid = p.pid) cop
-- Sort the output the same way as the window so the blanks line up
ORDER BY cop.oid
It is not a good idea to use ORDER as a table name because ORDER is a reserved keyword in SQL, so I used MYORDER as the name of the table containing orders.
The query above works on Oracle Database.
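For anyone who wants to try the LAG idea without an Oracle instance, here is a minimal sketch that runs the same comparison through Python's sqlite3 module. It assumes SQLite 3.25 or newer (needed for window functions). The table `cop` is a hypothetical stand-in for the joined result set, and the names are glued together with `||` because that is how SQLite concatenates.

```python
import sqlite3

# Hypothetical in-memory copy of the question's joined result set.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cop (fname TEXT, lname TEXT, oid INTEGER, pname TEXT, pprice INTEGER);
    INSERT INTO cop VALUES
        ('Bill',  'Gates', 1111, 'Router', 40),
        ('Bill',  'Gates', 1112, 'Laptop', 699),
        ('Steve', 'Jobs',  1113, 'Tablet', 1299),
        ('Steve', 'Jobs',  1114, 'Watch',  699);
""")

# LAG looks at the previous row (ordered by oid); if it names the same
# person, the name columns are returned as NULL.
query = """
    SELECT
        CASE WHEN LAG(fname || ' ' || lname) OVER (ORDER BY oid) = fname || ' ' || lname
             THEN NULL ELSE fname END AS fname,
        CASE WHEN LAG(fname || ' ' || lname) OVER (ORDER BY oid) = fname || ' ' || lname
             THEN NULL ELSE lname END AS lname,
        oid, pname, pprice
    FROM cop
    ORDER BY oid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The name columns come back as NULL (None in Python) on repeated rows; a client can render those as blanks.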

Use ROW_NUMBER() OVER (PARTITION BY fname, lname ORDER BY oid) AS RN in your query to number the rows for each person. Then, in an outer query (or in a follow-up query if you dump the results to a temporary table), use a CASE expression to set fname and lname to an empty string whenever RN is not 1.
You also need an additional column, such as a customer number, to sort on, so you would order by CustomerNo, RN, Lname, Fname.
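A rough sketch of the ROW_NUMBER variant, again run through Python's sqlite3 module (SQLite 3.25+ assumed). The table name `orders_view` is hypothetical, and `cid` plays the role of the customer number used for the final sort.

```python
import sqlite3

# Hypothetical pre-joined rows; cid stands in for the customer number.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_view (cid INTEGER, fname TEXT, lname TEXT, oid INTEGER, pname TEXT);
    INSERT INTO orders_view VALUES
        (1, 'Bill',  'Gates', 1111, 'Router'),
        (1, 'Bill',  'Gates', 1112, 'Laptop'),
        (2, 'Steve', 'Jobs',  1113, 'Tablet'),
        (2, 'Steve', 'Jobs',  1114, 'Watch');
""")

# Inner query numbers each person's rows; outer query keeps the name
# only on the first row of each group and sorts by customer, then rn.
query = """
    SELECT CASE WHEN rn = 1 THEN fname ELSE '' END AS fname,
           CASE WHEN rn = 1 THEN lname ELSE '' END AS lname,
           oid, pname
    FROM (
        SELECT cid, fname, lname, oid, pname,
               ROW_NUMBER() OVER (PARTITION BY fname, lname ORDER BY oid) AS rn
        FROM orders_view
    ) AS numbered
    ORDER BY cid, rn
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```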

Related

Transpose in Postgresql

I am trying to design a database of customer details, where customers can have up to two different phone numbers.
When I run the Select * command to bring out the customers that match criteria, I get this:
Name | Number
James | 12344532
James | 23232422
I would like it to display all customers with two numbers this way:
Name | Number | Number
James | 12344532 | 23232422
John | 32443322 |
Jude | 12121212 | 23232422
I am using a PostgreSQL server with Azure Data Studio.
Please assist.
I tried using this command:
SELECT name.name,
    min(details.number) AS number1,
    max(details.number) AS number2
FROM name
JOIN details
    ON name.id = details.id
GROUP BY name.name
I got this:
Name | Number | Number
James | 12344532 | 23232422
John | 32443322 | 32443322
Jude | 12121212 | 23232422
Customers with just one phone number get that number duplicated across both columns. How do I avoid this?
I would aggregate the numbers into an array, then extract the array elements:
select n.name,
       d.numbers[1] as number_1,
       d.numbers[2] as number_2
from name n
  join (
    select id, array_agg(number) as numbers
    from details
    group by id
  ) d on d.id = n.id
order by name;
This is also easy to extend if you have more than two numbers.
Try the following query:
SELECT
    Name,
    MIN(CASE WHEN rn = 1 THEN Number END) AS Number1,
    MIN(CASE WHEN rn = 2 THEN Number END) AS Number2
FROM
    (SELECT
        n.name AS Name, d.number AS Number,
        ROW_NUMBER() OVER (PARTITION BY n.name ORDER BY d.number) AS rn
    FROM name n
    JOIN details d ON d.id = n.id) t
GROUP BY Name
This query uses ROW_NUMBER() to number each customer's phone numbers. The numbering is ordered by the Number column, so a customer's lowest number gets row number 1, the second lowest gets row number 2, and so on.
The outer query then groups by customer name and uses MIN() over a CASE expression to pick out the first and second numbers by row number. A customer with only one number gets NULL in the second column, which is the desired output.
Note: The query partitions by name, so it assumes customer names are unique, and it shows only the two lowest numbers for a customer who has more. If the same number is stored twice for a customer, it appears in both columns.
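As a quick check of the ROW_NUMBER pivot, here is a small sketch using Python's sqlite3 module (SQLite 3.25+ assumed) in place of PostgreSQL. It assumes the `name` and `details` tables are joined on `id`, as in the question; the id values are made up, and the partition is on `id` rather than on the name so that two customers sharing a name would not be merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE name (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE details (id INTEGER, number TEXT);
    INSERT INTO name VALUES (1, 'James'), (2, 'John'), (3, 'Jude');
    INSERT INTO details VALUES
        (1, '12344532'), (1, '23232422'),
        (2, '32443322'),
        (3, '12121212'), (3, '23232422');
""")

# Number each customer's phone numbers, then spread rn 1 and rn 2
# into separate columns with conditional aggregation.
query = """
    SELECT name,
           MIN(CASE WHEN rn = 1 THEN number END) AS number1,
           MIN(CASE WHEN rn = 2 THEN number END) AS number2
    FROM (
        SELECT n.id, n.name, d.number,
               ROW_NUMBER() OVER (PARTITION BY n.id ORDER BY d.number) AS rn
        FROM name AS n
        JOIN details AS d ON d.id = n.id
    ) AS t
    GROUP BY id, name
    ORDER BY name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

John, who has a single number, gets NULL (None) in the second column instead of a repeated value.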

SQL - how to transpose only some row values into column headers without pivot

I have a table similar to this:
stud_ID | first_name | last_name | email | col_num | user_value
1 | tom | smith | | 50 | Retail
1 | tom | smith | | 60 | Product
2 | Sam | wright | | 50 | Retail
2 | Sam | wright | | 60 | Sale
but need to convert it to the following (basically transposing 'col_num' into column headers, with 50 becoming Function and 60 becoming Department):
stud_ID | first_name | last_name | email | Function | Department
1 | tom | smith | | Retail | Product
2 | Sam | wright | | Retail | Sale
Unfortunately, PIVOT doesn't work on my system. Is there any other way to do this?
The code that I have so far (sorry for the long list):
SELECT c.person_id_external as stu_id,
c.lname,
c.fname,
c.mi,
a.cpnt_id,
a.cpnt_typ_id,
a.rev_dte,
a.rev_num,
cp.cpnt_title AS cpnt_desc,
a.compl_dte,
a.CMPL_STAT_ID,
b.cmpl_stat_desc,
b.PROVIDE_CRDT,
b.INITIATE_LEVEL1_SURVEY,
b.INITIATE_LEVEL3_SURVEY,
a.SCHD_ID,
a.TOTAL_HRS,
a.CREDIT_HRS,
a.CPE_HRS,
a.CONTACT_HRS,
a.TUITION,
a.INST_NAME,
--a.COMMENTS,
a.BASE_STUD_ID,
a.BASE_CPNT_TYP_ID,
a.BASE_CPNT_ID,
a.BASE_REV_DTE,
a.BASE_CMPL_STAT_ID,
a.BASE_COMPL_DTE,
a.ES_USER_NAME,
a.INTERNAL,
a.GRADE_OPT,
a.GRADE,
a.PMT_ORDER_TICKET_NO,
a.TICKET_SEQUENCE,
a.ORDER_ITEM_ID,
a.ESIG_MESSAGE,
a.ESIG_MEANING_CODE_ID,
a.ESIG_MEANING_CODE_DESC,
a.CPNT_KEY,
a.CURRENCY_CODE,
c.EMP_STAT_ID,
c.EMP_TYP_ID,
c.JL_ID,
c.JP_ID,
c.TARGET_JP_ID,
c.JOB_TITLE,
c.DMN_ID,
c.ORG_ID,
c.REGION_ID,
c.CO_ID,
c.NOTACTIVE,
c.ADDR,
c.CITY,
c.STATE,
c.POSTAL,
c.CNTRY,
c.SUPER,
c.COACH_STUD_ID,
c.HIRE_DTE,
c.TERM_DTE,
c.EMAIL_ADDR,
c.RESUME_LOCN,
c.COMMENTS,
c.SHIPPING_NAME,
c.SHIPPING_CONTACT_NAME,
c.SHIPPING_ADDR,
c.SHIPPING_ADDR1,
c.SHIPPING_CITY,
c.SHIPPING_STATE,
c.SHIPPING_POSTAL,
c.SHIPPING_CNTRY,
c.SHIPPING_PHON_NUM,
c.SHIPPING_FAX_NUM,
c.SHIPPING_EMAIL_ADDR,
c.STUD_PSWD,
c.PIN,
c.PIN_DATE,
c.ENCRYPTED,
c.HAS_ACCESS,
c.BILLING_NAME,
c.BILLING_CONTACT_NAME,
c.BILLING_ADDR,
c.BILLING_ADDR1,
c.BILLING_CITY,
c.BILLING_STATE,
c.BILLING_POSTAL,
c.BILLING_CNTRY,
c.BILLING_PHON_NUM,
c.BILLING_FAX_NUM,
c.BILLING_EMAIL_ADDR,
c.SELF_REGISTRATION,
c.SELF_REGISTRATION_DATE,
c.ACCESS_TO_ORG_FIN_ACT,
c.NOTIFY_DEV_PLAN_ITEM_ADD,
c.NOTIFY_DEV_PLAN_ITEM_MOD,
c.NOTIFY_DEV_PLAN_ITEM_REMOVE,
c.NOTIFY_WHEN_SUB_ITEM_COMPLETE,
c.NOTIFY_WHEN_SUB_ITEM_FAILURE,
c.LOCKED,
c.PASSWORD_EXP_DATE,
c.SECURITY_QUESTION,
c.SECURITY_ANSWER,
c.ROLE_ID,
c.IMAGE_ID,
c.GENDER,
c.PAST_SERVICE,
c.LST_UNLOCK_TSTMP,
c.MANAGE_SUB_SP,
c.MANAGE_OWN_SP,
d.col_num,
d.user_value
FROM pa_cpnt_evthst a,
pa_cmpl_stat b,
pa_student c,
pv_course cp,
pa_stud_user d
WHERE a.cmpl_stat_id = b.cmpl_stat_id
AND a.stud_id = c.stud_id
AND cp.cpnt_typ_id(+) = a.cpnt_typ_id
AND cp.cpnt_id(+) = a.cpnt_id
AND cp.rev_dte(+) = a.rev_dte
AND a.CPNT_TYP_ID != 'SYSTEM_PROGRAM_ENTITY'
AND c.stud_id = d.stud_id
AND d.col_num in ('10','30','50','60')
I would just use conditional aggregation:
select stud_ID, first_name, last_name, email,
       max(case when col_num = 50 then user_value end) as function,
       max(case when col_num = 60 then user_value end) as department
from t
group by stud_ID, first_name, last_name, email;
Your code seems to have nothing to do with the sample data. I do notice however that you are using implicit join syntax. You really need to learn how to use proper, explicit, standard JOIN syntax.
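To see the conditional aggregation at work on the sample data, here is a minimal sketch using Python's sqlite3 module as a stand-in database. The table name `t` follows the answer; the email column is left NULL because the sample shows no values for it, and the aliases are double-quoted as a precaution in case a dialect treats FUNCTION as a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (stud_ID INTEGER, first_name TEXT, last_name TEXT,
                    email TEXT, col_num INTEGER, user_value TEXT);
    INSERT INTO t VALUES
        (1, 'tom', 'smith',  NULL, 50, 'Retail'),
        (1, 'tom', 'smith',  NULL, 60, 'Product'),
        (2, 'Sam', 'wright', NULL, 50, 'Retail'),
        (2, 'Sam', 'wright', NULL, 60, 'Sale');
""")

# Each CASE yields user_value only for its own col_num; MAX then
# collapses the group to the single non-NULL value.
query = """
    SELECT stud_ID, first_name, last_name, email,
           MAX(CASE WHEN col_num = 50 THEN user_value END) AS "Function",
           MAX(CASE WHEN col_num = 60 THEN user_value END) AS "Department"
    FROM t
    GROUP BY stud_ID, first_name, last_name, email
    ORDER BY stud_ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```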
I'm assuming you have SQL Server 2000 or another version without PIVOT (it was introduced in SQL Server 2005). What you need to do in that case is create a script with a cursor.
The cursor builds up a statement string, something like this:
string var = "CREATE TABLE #Report (Col1 VARCHAR(20), Col2 VARCHAR(20), " + ColumnName
That way you can create a temp table on the fly; at the end you select from the temp table to get your pivoted result.
It's not that easy if you are not familiar with cursors.
OR
if there are only a few values in your 'pivot' column and the set is not going to grow, you can also do something like this:
Pivot using SQL Server 2000
I can't follow your code, so I'll just assume the table in the sample data is called stud (because of stud_id).
Here is what I think can do the work of a pivot:
SELECT ISNULL(s1.stud_ID, s2.stud_ID) AS stud_ID,
    ISNULL(s1.first_name, s2.first_name) AS first_name,
    ISNULL(s1.last_name, s2.last_name) AS last_name,
    ISNULL(s1.email, s2.email) AS email,
    s1.user_value AS [Function], s2.user_value AS Department
-- Filter each copy before joining so unmatched rows from the other col_num do not leak in
FROM (SELECT * FROM stud WHERE col_num = 50) s1
FULL OUTER JOIN (SELECT * FROM stud WHERE col_num = 60) s2
    ON s1.stud_ID = s2.stud_ID -- Assuming stud_ID identifies a student; otherwise join on all key columns
Explanation: This simulates what PIVOT does. For every column you want, you add another copy of the table to the JOIN and constrain it to a single value of col_num. The FULL OUTER JOIN matters when one side is missing: for example, if a student has no row for 50, the s1 columns come back NULL and the ISNULL calls pull the values from s2 instead.
Note: If you need more than two new columns, use COALESCE instead of ISNULL, since it accepts more than two arguments.

SQL: Finding duplicate records based on custom criteria

I need to find duplicates across two tables based on custom criteria. The following rule determines whether a record is a duplicate; if it is, only the most recent one should be shown:
If the Employee Name and all EmployeePolicy CoverageId(s) exactly match another record, then it is considered a duplicate.
--Employee Table
EmployeeId Name Salary
543 John 54000
785 Alex 63000
435 John 75000
123 Alex 88000
333 John 67000
--EmployeePolicy Table
EmployeePolicyId EmployeeId CoverageId
1 543 8888
2 543 7777
3 785 5555
4 435 8888
5 435 7777
6 123 4444
7 333 8888
8 333 7776
For example, the duplicates in the example above are the following:
EmployeeId Name Salary
543 John 54000
435 John 75000
This is because they are the only ones that have a matching name in the Employee table as well as both have the same exact CoverageIds in the EmployeePolicy table.
Note: EmployeeId 333 also with Name = John is not a match because both of his CoverageIDs are not the same as the other John's CoverageIds.
At first I tried to find duplicates the old-fashioned way, by grouping records and using HAVING COUNT(*) > 1, but quickly realized that it would not work: my criteria define a duplicate in English, but in SQL the rows have different CoverageIds, so they are NOT considered duplicates.
By that same accord, I tried something like:
-- Create a TMP table
INSERT INTO #tmp
SELECT *
FROM Employee e join EmployeePolicy ep on e.EmployeeId = ep.EmployeeId
SELECT info.*
FROM
(
SELECT
tmp.*,
ROW_NUMBER() OVER(PARTITION BY tmp.Name, tmp.CoverageId ORDER BY tmp.EmployeeId DESC) AS RowNum
FROM #tmp tmp
) info
WHERE
info.RowNum = 1 AND
Again, this does not work because SQL does not see this as duplicates. Not sure how to translate my English definition of duplicate into SQL definition of duplicate.
Any help is most appreciated.
The easiest way is to concatenate the policies into a string. That, alas, is cumbersome in SQL Server. Here is a set-based approach:
with ep as (
    select ep.*, count(*) over (partition by employeeid) as cnt
    from employeepolicy ep
)
select ep.employeeid, ep2.employeeid
from ep join
     ep ep2
     on ep.employeeid < ep2.employeeid and
        ep.CoverageId = ep2.CoverageId and
        ep.cnt = ep2.cnt
group by ep.employeeid, ep2.employeeid, ep.cnt
having count(*) = ep.cnt -- all match; qualified because both ep and ep2 expose cnt
The idea is to match coverages between different employees. One simple criterion is that the number of coverages needs to match. The query then checks that the number of matching coverages equals that count.
Note: This puts each pair of employee ids in a single row, and the query itself does not compare names. You can join back to the employees table to get the additional information.
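The set-based matching can be tried on the question's sample data with the sketch below, which uses Python's sqlite3 module (SQLite 3.25+ assumed) instead of SQL Server. One deliberate difference from the query above: the CTE also joins to Employee so that the pairing can require matching names, which is an addition for illustration and not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER);
    CREATE TABLE EmployeePolicy (EmployeePolicyId INTEGER PRIMARY KEY,
                                 EmployeeId INTEGER, CoverageId INTEGER);
    INSERT INTO Employee VALUES
        (543, 'John', 54000), (785, 'Alex', 63000), (435, 'John', 75000),
        (123, 'Alex', 88000), (333, 'John', 67000);
    INSERT INTO EmployeePolicy VALUES
        (1, 543, 8888), (2, 543, 7777), (3, 785, 5555), (4, 435, 8888),
        (5, 435, 7777), (6, 123, 4444), (7, 333, 8888), (8, 333, 7776);
""")

# Pair up employees that share a name and a coverage, then keep only the
# pairs where the number of shared coverages equals each one's total.
query = """
    WITH ep AS (
        SELECT p.EmployeeId, p.CoverageId, e.Name,
               COUNT(*) OVER (PARTITION BY p.EmployeeId) AS cnt
        FROM EmployeePolicy AS p
        JOIN Employee AS e ON e.EmployeeId = p.EmployeeId
    )
    SELECT a.EmployeeId, b.EmployeeId
    FROM ep AS a
    JOIN ep AS b
      ON a.EmployeeId < b.EmployeeId
     AND a.Name = b.Name
     AND a.CoverageId = b.CoverageId
     AND a.cnt = b.cnt
    GROUP BY a.EmployeeId, b.EmployeeId, a.cnt
    HAVING COUNT(*) = a.cnt
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```

On the sample data only employees 435 and 543 pair up; employee 333 shares one coverage with them but not both.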
I have not tested the T-SQL but I believe the following should give you the output you are looking for.
;WITH CTE_Employee
AS
(
SELECT E.[Name]
,E.[EmployeeId]
,P.[CoverageId]
,E.[Salary]
FROM Employee E
INNER JOIN EmployeePolicy P ON E.EmployeeId = P.EmployeeId
)
, CTE_DuplicateCoverage
AS
(
SELECT E.[Name]
,E.[CoverageId]
FROM CTE_Employee E
GROUP BY E.[Name], E.[CoverageId]
HAVING COUNT(*) > 1
)
SELECT E.[EmployeeId]
,E.[Name]
,MAX(E.[Salary]) AS [Salary]
FROM CTE_Employee E
INNER JOIN CTE_DuplicateCoverage D ON E.[Name] = D.[Name] AND E.[CoverageId] = D.[CoverageId]
GROUP BY E.[EmployeeId], E.[Name]
HAVING COUNT(*) > 1
ORDER BY E.[EmployeeId]

sql select tuples and group by id

I have the current database schema
EMPLOYEES
ID | NAME | JOB
JOBS
ID | JOBNAME | PRICE
I want a query that goes through each employee and gets all their jobs, grouped so that each employee ID is returned once, followed by all the jobs that employee has. For example, if the employee with ID 1 had jobs (ID, JOBNAME) of (1, Roofing) and (1, Brick laying),
I want it to return something like
1 Roofing Bricklaying
I was trying
SELECT ID,JOBNAME FROM JOBS WHERE ID IN (SELECT ID FROM EMPLOYEES) GROUP BY ID;
but get the error
not a GROUP BY expression
Hope this is clear enough, if not please say and I'll try to explain better
EDIT:
WITH ALL_JOBS AS
(
    SELECT ID, LISTAGG(JOBNAME || ' ') WITHIN GROUP (ORDER BY ID) JOBNAMES FROM JOBS GROUP BY ID
)
-- ID exists in both ALL_JOBS and EMPLOYEES, so it has to be qualified
SELECT A.ID, A.JOBNAMES FROM ALL_JOBS A, EMPLOYEES B
WHERE A.ID = B.ID
GROUP BY A.ID, A.JOBNAMES;
In the WITH clause, I group by ID and concatenate the JOBNAME values belonging to each ID (appending ' ' to each so the values stay distinguishable).
For example, if we have
ID NAME
1 Roofing
1 Brick laying
2 Michael
2 Schumacher
we will get the result set as
ID NAME
1 Roofing Brick laying
2 Michael Schumacher
Then I join this result set with the EMPLOYEES table on ID.
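The same aggregate-then-join idea can be sketched outside Oracle with Python's sqlite3 module, where GROUP_CONCAT plays the role of LISTAGG. The employee names, the third job and the prices below are invented for illustration, and JOBS.ID is assumed to hold the employee ID as in the question. SQLite does not promise an order inside GROUP_CONCAT here, so the sketch compares the jobs as sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEES (ID INTEGER, NAME TEXT);
    CREATE TABLE JOBS (ID INTEGER, JOBNAME TEXT, PRICE INTEGER);
    INSERT INTO EMPLOYEES VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO JOBS VALUES
        (1, 'Roofing', 100), (1, 'Brick laying', 80), (2, 'Plumbing', 90);
""")

# One output row per employee, with that employee's job names joined
# into a single string.
query = """
    SELECT e.ID, GROUP_CONCAT(j.JOBNAME, ', ') AS JOBNAMES
    FROM EMPLOYEES AS e
    JOIN JOBS AS j ON j.ID = e.ID
    GROUP BY e.ID
    ORDER BY e.ID
"""
result = {emp_id: set(names.split(", ")) for emp_id, names in conn.execute(query)}
print(result)
```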
You need to add JOBNAME to the GROUP BY clause as well.
SELECT ID,JOBNAME FROM JOBS WHERE ID IN (SELECT ID FROM EMPLOYEES) GROUP BY ID,JOBNAME;

How do I write a standard SQL GROUP BY that includes columns not in the GROUP BY clause

Let's say I have a table called Customer, defined like this:
Id Name DepartmentId Hired
1 X 101 2001/01/01
2 Y 102 2002/01/01
3 Z 102 2003/01/01
And I want to retrieve the date of the last hiring in each department.
Obviously I would do this
SELECT c.DepartmentId, MAX(c.Hired)
FROM Customer c
GROUP BY c.DepartmentId
Which returns:
101 2001/01/01
102 2003/01/01
But what do I do if I want to return the name of the person hired? I.e., I would want this result set:
101 2001/01/01 X
102 2003/01/01 Z
Note that the following does not work, as it would return three rows rather than the two I'm looking for:
SELECT c.DepartmentId, c.Name, MAX(c.Hired)
FROM Customer c
GROUP BY c.DepartmentId
I can't remember seeing a query that achieves this.
NOTE: It's not acceptable to join on the Hired field, as that would not be guaranteed to be accurate.
A subselect would do the job and would handle the case where more than one person was hired in the same department on the same day:
SELECT c.DepartmentId, c.Name, c.Hired from Customer c,
(SELECT DepartmentId, MAX(Hired) as MaxHired
FROM Customer
GROUP BY DepartmentId) as sub
WHERE c.DepartmentId = sub.DepartmentId AND c.Hired = sub.MaxHired
Standard SQL:
select *
from Customer C
where exists
(
-- Linq to Sql puts NULL here instead ;-)
-- In fact, you can even put 1/0 here and it would not cause a division-by-zero error
-- An RDBMS does not evaluate the select clause of an EXISTS subquery
SELECT NULL
FROM Customer
where c.DepartmentId = DepartmentId
GROUP BY DepartmentId
having c.Hired = MAX(Hired)
)
If SQL Server happens to support tuple testing, this is the most succinct:
select *
from Customer
where (DepartmentId, Hired) in
(select DepartmentId, MAX(Hired)
from Customer
group by DepartmentId)
SELECT a.*
FROM Customer AS a
JOIN
(SELECT DepartmentId, MAX(Hired) AS Hired
FROM Customer GROUP BY DepartmentId) AS b
USING (DepartmentId,Hired);
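To compare two of the approaches above on the question's sample rows, here is a small sketch using Python's sqlite3 module as a stand-in database (an assumption; the question does not name its RDBMS). It runs the derived-table join and the tuple IN form, which SQLite accepts from version 3.15, and checks that they agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT,
                           DepartmentId INTEGER, Hired TEXT);
    INSERT INTO Customer VALUES
        (1, 'X', 101, '2001/01/01'),
        (2, 'Y', 102, '2002/01/01'),
        (3, 'Z', 102, '2003/01/01');
""")

# Approach 1: join each row to its department's latest hire date.
join_query = """
    SELECT c.DepartmentId, c.Hired, c.Name
    FROM Customer AS c
    JOIN (SELECT DepartmentId, MAX(Hired) AS MaxHired
          FROM Customer
          GROUP BY DepartmentId) AS sub
      ON c.DepartmentId = sub.DepartmentId AND c.Hired = sub.MaxHired
    ORDER BY c.DepartmentId
"""

# Approach 2: tuple (row value) membership test.
tuple_query = """
    SELECT DepartmentId, Hired, Name
    FROM Customer
    WHERE (DepartmentId, Hired) IN (SELECT DepartmentId, MAX(Hired)
                                    FROM Customer
                                    GROUP BY DepartmentId)
    ORDER BY DepartmentId
"""
join_rows = conn.execute(join_query).fetchall()
tuple_rows = conn.execute(tuple_query).fetchall()
print(join_rows)
```

Both forms return every person tied for the latest hire date in a department, so a department can yield more than one row.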