Finding Duplicates of Multiple Rows in DB2

I have a table in which duplicates can span multiple rows.
For example, let's take an Employee table with departments:
DEPTId Name SALARY
1 TOM 121
1 MARK 21
1 SALLIE 34
2 JAY 342
2 BRITNEY 3
3 TOM 121
3 MARK 21
3 SALLIE 34
4 MARK 21
4 SALLIE 34
5 MARK 21
5 SALLIE 34
5 TOM 121
5 BRITNEY 3
Here, when I pass DeptId 3, I need to get DeptId 1 back, because department 3 is essentially the same as department 1.
Department 5 is not the same as 1 because it has more rows. A department is a duplicate only if all of its rows match.
How can I find this using a single query?

You are trying to compare two sets within sets (employees within departments).
A set-based approach is to do a self-join on the table, matching by employee, and then group by the pair of departments. If the two departments have the same employees, then all employees will match. That is, there will be no cases where an employee in one department does not match an employee in the other.
The having clause tests for this condition.
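This version of the query uses a driver table that pairs each department with every other department. Each employee of the first department is left-joined to the employee with the same name and salary in the second department. A pair is kept only if no employee of the first department is left unmatched and the second department has no extra employees, which is what the having clause checks (this assumes a department never contains the same name and salary twice):
select driver.deptid1, driver.deptid2
from (select d1.deptid as deptid1, d2.deptid as deptid2
      from (select distinct deptid from employees) d1 cross join
           (select distinct deptid from employees) d2
      where d1.deptid <> d2.deptid
     ) driver join
     employees e1
     on e1.deptid = driver.deptid1 left outer join
     employees e2
     on driver.deptid2 = e2.deptid and e1.name = e2.name and e1.salary = e2.salary
-- to look up a single department, add: where driver.deptid1 = 3
group by driver.deptid1, driver.deptid2
-- every employee of deptid1 has a match in deptid2 ...
having SUM(case when e2.name is null then 1 else 0 end) = 0 and
-- ... and deptid2 has exactly as many employees as were matched
       COUNT(*) = (select count(*) from employees e where e.deptid = driver.deptid2)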
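A minimal sketch of this equality test, using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for DB2 (SQLite, the table name employees and the function name find_duplicate_departments are assumptions for illustration). A pair of departments is kept only when every (name, salary) row of the first has a match in the second and both have the same number of rows; it assumes no department lists the same (name, salary) pair twice.

```python
import sqlite3

# Sample data from the question: (DEPTId, Name, SALARY).
ROWS = [
    (1, "TOM", 121), (1, "MARK", 21), (1, "SALLIE", 34),
    (2, "JAY", 342), (2, "BRITNEY", 3),
    (3, "TOM", 121), (3, "MARK", 21), (3, "SALLIE", 34),
    (4, "MARK", 21), (4, "SALLIE", 34),
    (5, "MARK", 21), (5, "SALLIE", 34), (5, "TOM", 121), (5, "BRITNEY", 3),
]

# Pair every department with every other one, left-join each employee of
# deptid1 to a matching (name, salary) row in deptid2, and keep the pair only
# if nothing is unmatched and deptid2 has no extra rows.
QUERY = """
select driver.deptid1, driver.deptid2
from (select d1.deptid as deptid1, d2.deptid as deptid2
      from (select distinct deptid from employees) d1
           cross join (select distinct deptid from employees) d2
      where d1.deptid <> d2.deptid) driver
     join employees e1
       on e1.deptid = driver.deptid1
     left outer join employees e2
       on e2.deptid = driver.deptid2
      and e2.name = e1.name
      and e2.salary = e1.salary
where driver.deptid1 = ?
group by driver.deptid1, driver.deptid2
having sum(case when e2.name is null then 1 else 0 end) = 0
   and count(*) = (select count(*) from employees e
                   where e.deptid = driver.deptid2)
"""


def find_duplicate_departments(dept_id):
    """Return the IDs of departments whose rows exactly match dept_id's rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table employees (deptid int, name text, salary int)")
    conn.executemany("insert into employees values (?, ?, ?)", ROWS)
    result = [row[1] for row in conn.execute(QUERY, (dept_id,))]
    conn.close()
    return result


print(find_duplicate_departments(3))  # [1]
print(find_duplicate_departments(5))  # []
```

Department 5 is rejected against department 3 by the row-count check, not by an unmatched row: all three of department 3's rows do appear in department 5.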

Related

Not getting data when trying to create multiple inner join on 2 columns from the same table

I have the following tables and data:
Employee:
Id DepartmentId
1 100
2 100
3 100
LeaveRequest:
Id SentFromEmployeeId SentToEmployeeId
1 1 2
2 1 2
3 2 3
LeaveUpdateLogs:
Id RequestedDate LeaveRequestId Status
1 2021-11-01 11:55:56 1 Pending
2 2021-11-02 10:55:56 1 Accepted
3 2021-11-03 11:55:56 1 Accepted
4 2021-11-04 09:55:56 1 Declined
5 2021-11-05 10:55:56 1 Closed
6 2021-11-06 05:55:56 2 Pending
7 2021-11-07 05:55:56 2 Accepted
8 2021-11-08 02:55:56 2 Accepted
9 2021-11-09 05:55:56 2 Declined
10 2021-11-10 05:55:56 2 Closed
Now I want to calculate the following statistics for a particular department:
Total number of requests sent and received for DepartmentId 100.
But I am confused about how to get the "Sent" and "Received" data. This is what I tried:
select SentFromEmployeeId,SentToEmployeeId,* from LeaveUpdateLogs l
inner join LeaveRequest lr on l.LeaveRequestId = lr.Id
inner join Employee e1 on e1.Id = lr.SentFromEmployeeId
inner join Employee e2 on e2.Id = lr.SentToEmployeeId
where (l.RequestedDate >= '2021-11-01' and l.RequestedDate < '2021-11-16')
and (e1.DepartmentId =100 or e2.DepartmentId = 100)
But this doesn't return any data, although I have 25 records between these two dates. When I comment out these three lines:
inner join Employee e1 on e1.Id = lr.SentFromEmployeeId
inner join Employee e2 on e2.Id = lr.SentToEmployeeId
(e1.DepartmentId =100 or e2.DepartmentId = 100)
then the query works fine, but I want to get the data for a particular department.
Can someone please help me fix this logic?

Since I think you are only asking how to join on multiple values, here are a couple of ways you could tackle this:
inner join Employee e1 on e1.Id in (lr.SentFromEmployeeId, lr.SentToEmployeeId)
Or
inner join Employee e1 on e1.Id = lr.SentFromEmployeeId
OR e1.Id = lr.SentToEmployeeId
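To see the IN join at work, here is a minimal sketch using Python's built-in sqlite3 module (SQLite standing in for the asker's database). It loads only the Employee and LeaveRequest sample rows, ignores LeaveUpdateLogs and the date filter, and counts requests sent and received by one department with conditional aggregation; the function name sent_and_received is made up for illustration. Because a request between two employees of the same department joins twice, once per role, the CASE expressions decide which role each joined row counts toward.

```python
import sqlite3


def sent_and_received(department_id):
    """Return (sent, received) request counts for one department."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table Employee (Id int, DepartmentId int);
        create table LeaveRequest (Id int, SentFromEmployeeId int,
                                   SentToEmployeeId int);
        insert into Employee values (1, 100), (2, 100), (3, 100);
        insert into LeaveRequest values (1, 1, 2), (2, 1, 2), (3, 2, 3);
    """)
    # One joined row per (request, involved employee); the CASE expressions
    # check whether that employee was the sender or the recipient.
    row = conn.execute("""
        select sum(case when e.Id = lr.SentFromEmployeeId then 1 else 0 end),
               sum(case when e.Id = lr.SentToEmployeeId then 1 else 0 end)
        from LeaveRequest lr
        inner join Employee e
            on e.Id in (lr.SentFromEmployeeId, lr.SentToEmployeeId)
        where e.DepartmentId = ?
    """, (department_id,)).fetchone()
    conn.close()
    return row


print(sent_and_received(100))  # (3, 3)
```

SUM over no rows is NULL, so a department with no requests comes back as (None, None); wrap the sums in COALESCE if you need zeros.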

SQL: Retrieve all records that are joined to every record

Sorry, couldn't think of a better title.
I have 3 tables in Oracle XE: an EMPLOYEE table, a PROJECT table and a WORKS_ON table. An EMPLOYEE can work on many PROJECTs. I am trying to get the names of the employees who are working on all the projects.
EMPLOYEE Table
Emp_ID EMP_Name
1 Esther
2 Peter
3 Joan
4 Roger
5 Liam
PROJECT Table
Project_ID
1
2
3
WORKS_ON Table
Emp_ID Project_ID
1 3
2 1
2 2
2 3
3 1
3 2
4 1
4 2
4 3
Given this data, my result should be Peter and Roger.
I started with the following, but got stuck:
SELECT EMPLOYEE.EMP_Name
FROM EMPLOYEE INNER JOIN
(PROJECT INNER JOIN WORKS_ON ON PROJECT.Project_ID = WORKS_ON.Project_ID) ON
EMPLOYEE.Emp_ID = WORKS_ON.Emp_ID
WHERE WORKS_ON.Project_ID In (SELECT DISTINCT Project_ID FROM PROJECT);
Obviously this retrieves the names of all employees who work on any project, repeated once per project, which is not what I want.

You can leave the project table out of it.
SELECT e.emp_id, COUNT(project_id)
FROM employee e
INNER JOIN works_on wo ON wo.emp_id = e.emp_id
GROUP BY e.emp_id
HAVING COUNT( project_id ) = (SELECT COUNT(*) FROM project);

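You need to generate all combinations of employees and projects with a cross join, left join the WORKS_ON table, and then compare row counts for each employee. An employee qualifies when none of their (employee, project) combinations is missing a WORKS_ON row, that is, when COUNT(*) equals COUNT(w.Project_ID).
SELECT e.EMP_Name
FROM EMPLOYEE e
CROSS JOIN PROJECT p
LEFT JOIN WORKS_ON w ON p.Project_ID = w.Project_ID AND e.Emp_ID = w.Emp_ID
GROUP BY e.Emp_ID, e.EMP_Name
HAVING COUNT(*) = COUNT(w.Project_ID)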
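Both answers implement relational division. A minimal sketch running both against the question's sample data with Python's built-in sqlite3 module (SQLite standing in for Oracle XE; employees_on_all_projects is a hypothetical name). The count-based version uses COUNT(DISTINCT ...) so a duplicated WORKS_ON row cannot inflate the count, and it assumes every Project_ID in WORKS_ON also exists in PROJECT.

```python
import sqlite3


def employees_on_all_projects():
    """Return the qualifying names from the count-based and cross-join queries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table EMPLOYEE (Emp_ID int, EMP_Name text);
        create table PROJECT (Project_ID int);
        create table WORKS_ON (Emp_ID int, Project_ID int);
        insert into EMPLOYEE values
            (1, 'Esther'), (2, 'Peter'), (3, 'Joan'), (4, 'Roger'), (5, 'Liam');
        insert into PROJECT values (1), (2), (3);
        insert into WORKS_ON values
            (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2),
            (4, 1), (4, 2), (4, 3);
    """)
    # Count-based: distinct projects per employee must equal all projects.
    by_count = conn.execute("""
        select e.EMP_Name
        from EMPLOYEE e
        inner join WORKS_ON wo on wo.Emp_ID = e.Emp_ID
        group by e.Emp_ID, e.EMP_Name
        having count(distinct wo.Project_ID) = (select count(*) from PROJECT)
        order by e.EMP_Name
    """).fetchall()
    # Cross-join: every (employee, project) pair must have a WORKS_ON row.
    by_cross_join = conn.execute("""
        select e.EMP_Name
        from EMPLOYEE e
        cross join PROJECT p
        left join WORKS_ON w
            on p.Project_ID = w.Project_ID and e.Emp_ID = w.Emp_ID
        group by e.Emp_ID, e.EMP_Name
        having count(*) = count(w.Project_ID)
        order by e.EMP_Name
    """).fetchall()
    conn.close()
    return [r[0] for r in by_count], [r[0] for r in by_cross_join]


print(employees_on_all_projects())  # (['Peter', 'Roger'], ['Peter', 'Roger'])
```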

SQL count 2 equal columns and select other columns

I have two separate tables, one with vacancies and one with applications to those vacancies. I want to select several columns from the vacancies table, plus another column that counts how many applications there are for each vacancy. My vacancy table looks like this:
ID Active StartDate JobID JobTypeID HoursPerWeek
1 1 2017-02-28 2 CE 0
2 1 2017-02-15 4 CE 40
3 1 2017-02-14 1 CE 40
4 1 2017-02-28 1 CE 48
My applications table looks like this:
ID VacancyID Forename Surname EmailAddress TelephoneNumber
1 1 John Smith jsmith#gmail.com 447777777777
2 2 John Smith jsmith#gmail.com 447748772641
3 2 John Smith jsmith#gmail.com 447777777777
4 2 John Smith jsmith#gmail.com 447700123456
5 4 John Smith jsmith#gmail.com 447400123569
6 4 John Smith jsmith#gmail.com 447400126547
7 4 John Smith jsmith#gmail.com 447555123654
I want a table that looks like this:
ID Active StartDate JobID HoursPerWeek NumberOfApplicants
1 1 2017-02-28 2 0 1
2 1 2017-02-15 4 40 3
3 1 2017-02-14 1 40 0
4 1 2017-02-28 1 48 3
How can I select that table using joins and count the number of applicants whose VacancyID equals the ID in the vacancy table? I have tried:
select Vacancy.ID, VacancyID, count(*) as NumberOfApplications from VacancyApplication
join Vacancy on Vacancy.ID=VacancyID
group by VacancyID, Vacancy.ID
This obviously doesn't select all the other columns, and it also does not return ID 3 because there are 0 applications for it. I want ID 3 to be there with a value of 0, along with all the other columns. How do I do this? I've tried various forms of grouping and selecting, but I'm quite new to SQL, so I'm not really sure how this can be done.

Use RIGHT JOIN instead of INNER JOIN and count the vacancyid column from the vacancyapplication table. For the non-matching records you will get a count of 0.
SELECT v.id, v.Active, v.StartDate, v.JobID, v.HoursPerWeek,
Count(va.vacancyid) AS NumberOfApplications
FROM vacancyapplication va
RIGHT JOIN vacancy v
ON v.id = va.vacancyid
GROUP BY v.id, v.Active, v.StartDate, v.JobID, v.HoursPerWeek
Start using alias names; they make the query more readable.

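Hoping I understood your problem correctly, please try the query below. Vacancy has to be the preserved (left) table so that vacancies without applications are kept, and counting VacancyID rather than * gives them 0:
select Vacancy.ID, count(VacancyApplication.VacancyID) as NumberOfApplications from Vacancy
left join VacancyApplication on Vacancy.ID = VacancyApplication.VacancyID
group by Vacancy.ID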

You can use COUNT as a window function with the OVER clause, thus eliminating the need for GROUP BY. DISTINCT then collapses the join's one row per application back to one row per vacancy:
SELECT DISTINCT v.ID,
v.Active,
v.StartDate,
v.JobID,
v.JobTypeID,
v.HoursPerWeek,
COUNT(va.ID) OVER(PARTITION BY v.ID) AS NumberOfApplications
FROM Vacancy v
LEFT JOIN vacancyapplication va ON(v.ID = va.VacancyID)

Use a left join and table aliases:
select v.ID, count(va.VacancyID) as NumberOfApplications
from Vacancy v left join
VacancyApplication va
on v.ID = va.VacancyID
group by v.ID;
You seem to want all the columns. You could include them in the group by. However, a correlated subquery or outer apply is simpler:
select v.*, va.cnt
from vacancy v outer apply
(select count(*) as cnt
from VacancyApplication va
where v.ID = va.VacancyID
) va;
This is probably more efficient anyway, especially if you have an index on VacancyApplication(VacancyID).
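The key points in these answers are keeping Vacancy as the preserved side of the outer join and counting a column from VacancyApplication rather than *. A minimal sketch checking that against the question's sample data, using Python's built-in sqlite3 module as a stand-in for the asker's database (the function name applications_per_vacancy is hypothetical, and VacancyApplication is trimmed to its ID and VacancyID columns):

```python
import sqlite3


def applications_per_vacancy():
    """Return one row per vacancy with its number of applications."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table Vacancy (ID int, Active int, StartDate text, JobID int,
                              JobTypeID text, HoursPerWeek int);
        create table VacancyApplication (ID int, VacancyID int);
        insert into Vacancy values
            (1, 1, '2017-02-28', 2, 'CE', 0),
            (2, 1, '2017-02-15', 4, 'CE', 40),
            (3, 1, '2017-02-14', 1, 'CE', 40),
            (4, 1, '2017-02-28', 1, 'CE', 48);
        insert into VacancyApplication values
            (1, 1), (2, 2), (3, 2), (4, 2), (5, 4), (6, 4), (7, 4);
    """)
    # LEFT JOIN keeps vacancy 3; COUNT(va.VacancyID) skips the NULL the join
    # produces for it, so it reports 0 (COUNT(*) would report 1).
    rows = conn.execute("""
        select v.ID, v.Active, v.StartDate, v.JobID, v.HoursPerWeek,
               count(va.VacancyID) as NumberOfApplicants
        from Vacancy v
        left join VacancyApplication va on va.VacancyID = v.ID
        group by v.ID, v.Active, v.StartDate, v.JobID, v.HoursPerWeek
        order by v.ID
    """).fetchall()
    conn.close()
    return rows


for row in applications_per_vacancy():
    print(row)
```

The printed rows match the table the asker wants, including (3, 1, '2017-02-14', 1, 40, 0) for the vacancy with no applications.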

How to join multiple tables and retrieve an aggregate

I have 3 tables and need to retrieve each EmployeeID, their Name, and their total WorkTime. My table structure is as follows:
DEPT TABLE
ID DEPTNAME DESIGNATION
1 MG MANAGER
2 AN ANALYTICS
3 DV DEVELOPER
4 PM PM
WORK TABLE
EMPID WORKTIME ID(FK TO TABLE DEPT) DATE
1 8 1 09/15/2014
2 7 2 09/15/2014
1 6 1 09/16/2014
2 8 2 09/16/2014
EMP TABLE
EMPID NAME
1 SK
2 TK
3 MK
4 CK
I want all the Employee names with ID and the total working time, as below:
EMPID NAME WORKTIME NOOFDATESWORKS
1 SK 14(8+6) 2
2 TK 15(7+8) 2
3 MK 0
4 CK 0
Please note: employees can work for multiple departments.

SELECT E.EmpID,
E.Name,
COALESCE(W.TotalWorkTime, 0) AS TotalWorkTime
FROM Emp E
LEFT JOIN ( SELECT EmpID, SUM(WorkTime) TotalWorkTime
FROM Work
GROUP BY EmpID) W
ON E.EmpID = W.EmpID
By the way, shouldn't the department ID be on the Emp table rather than the Work table? As it is, it doesn't make much sense to me.
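The answer returns only the total work time. A minimal sketch that also fills in the NOOFDATESWORKS column from the question, using Python's built-in sqlite3 module as a stand-in for the asker's database (work_summary is a hypothetical name). It aggregates Work per employee before the left join, counts distinct dates because an employee can work for several departments on the same day, and uses COALESCE to turn the NULLs for MK and CK into 0.

```python
import sqlite3


def work_summary():
    """Return (EmpID, Name, total work time, number of dates worked)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table Emp (EmpID int, Name text);
        create table Work (EmpID int, WorkTime int, ID int, "Date" text);
        insert into Emp values (1, 'SK'), (2, 'TK'), (3, 'MK'), (4, 'CK');
        insert into Work values
            (1, 8, 1, '09/15/2014'), (2, 7, 2, '09/15/2014'),
            (1, 6, 1, '09/16/2014'), (2, 8, 2, '09/16/2014');
    """)
    # Aggregate per employee first, then left join so employees with no
    # Work rows still appear; COALESCE replaces their NULLs with 0.
    rows = conn.execute("""
        select e.EmpID, e.Name,
               coalesce(w.TotalWorkTime, 0) as WorkTime,
               coalesce(w.DaysWorked, 0) as NoOfDatesWorked
        from Emp e
        left join (select EmpID,
                          sum(WorkTime) as TotalWorkTime,
                          count(distinct "Date") as DaysWorked
                   from Work
                   group by EmpID) w
          on e.EmpID = w.EmpID
        order by e.EmpID
    """).fetchall()
    conn.close()
    return rows


for row in work_summary():
    print(row)
```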

List the name of the division that has the most employees working on projects

There are three tables relevant to this query: the division table, which holds the division name and division ID; the workon table, which holds the project IDs and the IDs of the employees working on each project; and the employee table, which holds the employee ID, division ID and name. I'm trying to find the division that has the most employees who work on projects.
This is my code:
select distinct
(dname) as "Division Name"
from
employee e, division d
where
d.did = e.did and
d.did in (
select did from employee where empid in (
select empid from workon having count(pid) >= all(pid)
)
)
I'm supposed to get the answer "human resources" but I cannot seem to get that answer no matter what code I use.
Workon table
PID EMPID HOURS
3 1 30
2 3 40
5 4 30
6 6 60
4 3 70
2 4 45
5 3 90
3 3 100
6 8 30
4 4 30
5 8 30
6 7 30
6 9 40
5 9 50
4 6 45
2 7 30
2 8 30
2 9 30
1 9 30
1 8 30
1 7 30
1 5 30
1 6 30
2 6 30
Employee Table
EMPID NAME SALARY DID
1 kevin 32000 2
2 joan 42000 1
3 brian 37000 3
4 larry 82000 5
5 harry 92000 4
6 peter 45000 2
7 peter 68000 3
8 smith 39000 4
9 chen 71000 1
10 kim 46000 5
11 smith 46000 1
Division
DID DNAME MANAGERID
1 engineering 2
2 marketing 1
3 human resource 3
4 Research and development 5
5 accounting 4

Here is a first query, written against my own sample schema:
select d.id, d.name, p.maxcounts
from dept d,
(select we.dep, max(we.counts) as maxcounts
from (select w.eid, count(w.pid) as counts,
e.dep as dep from employee e, workon w
where e.id = w.eid
group by e.dep) as we) as p
where d.id = p.dep
;
RESULTS:
ID NAME MAXCOUNTS
111 human resoruces 5
Following is an edit based on your own data:
There are three ways you can achieve this: use nested selects, save MAX(count) into a variable, or order the data descending and limit it to one row.
Method 1:
-- using nested select
-- subquery 1: shows how the final answer is derived
select e.dep, count(w.eid) as num_emp
from employee e, workon w
where e.id = w.eid
group by e.dep
order by e.dep
;
-- results of subquery 1:
DEP NUM_EMP
1 4
2 5
3 7
4 5
5 3
-- Final nested select query
select ee.dep, dd.name, count(ww.eid)
from employee ee, dept dd, workon ww
where ee.id = ww.eid
and ee.dep = dd.id
group by ee.dep, dd.name
having count(ww.eid) =
(select distinct max(t.num_emp)
from (select e.dep, count(w.eid) as num_emp
from employee e, workon w
where e.id = w.eid
group by e.dep
order by e.dep)as t)
;
-- results using nested selects
DEP NAME COUNT(WW.EID)
3 human resource 7
-- Method 2: query using a variable
select max(x.num_emp) into @myvar from
(select e.dep, count(w.eid) as num_emp
from employee e, workon w
where e.id = w.eid
group by e.dep) as x
;
select x.dep, x.name, x.num_emp as num_emp from
(select e.dep, d.name, count(w.pid) as num_emp
from employee e, workon w, dept d
where e.id = w.eid
and e.dep = d.id
group by e.dep, d.name) as x
where x.num_emp = @myvar
;
-- results using variable
DEP NAME NUM_EMP
3 human resource 7
-- Method 3: query using LIMIT 1 on the rows ordered descending
select e.dep, d.name, count(w.eid) as num_emp
from employee e, workon w, dept d
where e.id = w.eid
and e.dep = d.id
group by e.dep, d.name
order by num_emp desc
limit 1
;
-- results using order by desc and limit 1:
DEP NAME NUM_EMP
3 human resource 7
When using Method 3, note that two departments may tie for the highest number of employees working on projects, and LIMIT 1 returns only one of them. If that matters to you, use the nested-select or variable method instead, since both return every tied department.
PS: I don't have the privilege of being on Stack Overflow full time, so sorry for getting back to you late :)
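The answer's queries use its own column names (dep, id, name, eid). Here is a minimal sketch against the question's column names, using Python's built-in sqlite3 module as a stand-in for MySQL; top_divisions and MEASURES are hypothetical names. It also makes the tie issue concrete: count(w.eid) counts project assignments (rows in workon), which gives human resource 7 as expected, whereas counting distinct employees on this data gives a three-way tie at 2 between marketing, human resource and Research and development. Comparing against MAX() in a subquery returns all tied divisions.

```python
import sqlite3

DIVISIONS = [(1, "engineering"), (2, "marketing"), (3, "human resource"),
             (4, "Research and development"), (5, "accounting")]
# (EMPID, DID) pairs from the Employee table; names and salaries omitted.
EMPLOYEES = [(1, 2), (2, 1), (3, 3), (4, 5), (5, 4), (6, 2),
             (7, 3), (8, 4), (9, 1), (10, 5), (11, 1)]
# (PID, EMPID) pairs from the Workon table; HOURS omitted.
WORKON = [(3, 1), (2, 3), (5, 4), (6, 6), (4, 3), (2, 4), (5, 3), (3, 3),
          (6, 8), (4, 4), (5, 8), (6, 7), (6, 9), (5, 9), (4, 6), (2, 7),
          (2, 8), (2, 9), (1, 9), (1, 8), (1, 7), (1, 5), (1, 6), (2, 6)]

# Two readings of "employees working on projects".
MEASURES = {
    "assignments": "count(*)",               # rows in workon
    "employees": "count(distinct e.EMPID)",  # distinct people
}


def top_divisions(measure="assignments"):
    """Return every division tied for the highest count, as (name, count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table division (DID int, DNAME text)")
    conn.execute("create table employee (EMPID int, DID int)")
    conn.execute("create table workon (PID int, EMPID int)")
    conn.executemany("insert into division values (?, ?)", DIVISIONS)
    conn.executemany("insert into employee values (?, ?)", EMPLOYEES)
    conn.executemany("insert into workon values (?, ?)", WORKON)
    # Comparing with MAX() in a subquery keeps ties, unlike LIMIT 1.
    query = f"""
        with counts as (
            select d.DNAME as dname, {MEASURES[measure]} as n
            from division d
            join employee e on e.DID = d.DID
            join workon w on w.EMPID = e.EMPID
            group by d.DID, d.DNAME
        )
        select dname, n from counts
        where n = (select max(n) from counts)
        order by dname
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(top_divisions("assignments"))  # [('human resource', 7)]
print(top_divisions("employees"))
```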
