Count specific duplicates in Oracle - sql

Hi, I am having trouble counting the number of employees (EmpID) that have a phone number (PhoneNum) which is also assigned to some other employee, but only for a specific organization (OrgID).
My Oracle tables look like this:
TABLE OrgEmployees (OrgID, EmpID, ...)
TABLE PhoneNums (ID, EmpID, PhoneNum, ...)
Sample data for the specific organization:
SELECT pn.EmpID, pn.PhoneNum FROM PhoneNums pn
WHERE EmpID IN (SELECT DISTINCT EmpID FROM OrgEmployees oe
WHERE oe.OrgID = 'XY');
EmpID PhoneNum
723 963264
731 963264
973 963276
729 963276
103 963450
725 963450
722 963460
731 963460
722 963462
731 963462
427 995487
295 995487
771 123151
503 123151
721 963265
104 963266
Correct result on above set of data should be 14.
My attempts went like this:
SELECT pn.PhoneNum, count(pn.EmpID) FROM PhoneNums pn
WHERE pn.EmpID IN (SELECT oe.EmpID FROM OrgEmployees oe
WHERE oe.OrgID = 'XY')
GROUP BY pn.PhoneNum
HAVING count (*) > 1
ORDER BY pn.PhoneNum;
But how can I take into account whether the EmpIDs are the same or not?
Thank you in advance

I think you want count(distinct):
SELECT pn.PhoneNum, COUNT(DISTINCT pn.EmpID)
FROM PhoneNums pn
WHERE pn.EmpID IN (SELECT oe.EmpID
FROM OrgEmployees oe
WHERE oe.OrgID = 'XY'
)
GROUP BY pn.PhoneNum
HAVING COUNT(DISTINCT pn.EmpID) > 1
ORDER BY pn.PhoneNum;
I would be more inclined to write this using JOIN rather than IN:
SELECT pn.PhoneNum, COUNT(DISTINCT pn.EmpID)
FROM PhoneNums pn JOIN
OrgEmployees oe
ON oe.OrgID = 'XY' AND pn.EmpID = oe.EmpID
GROUP BY pn.PhoneNum
HAVING COUNT(DISTINCT pn.EmpID) > 1
ORDER BY pn.PhoneNum;
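The queries above return one row per shared phone number, while the question asks for a single total. A minimal sketch of one way to get it is to sum the per-number counts in an outer query. It uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made only so the example is runnable), loads the sample data from the question, and assumes every listed employee belongs to organization 'XY'.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OrgEmployees (OrgID TEXT, EmpID INTEGER);
    CREATE TABLE PhoneNums (ID INTEGER PRIMARY KEY, EmpID INTEGER, PhoneNum TEXT);
""")

# (EmpID, PhoneNum) pairs copied from the sample data in the question.
pairs = [
    (723, "963264"), (731, "963264"), (973, "963276"), (729, "963276"),
    (103, "963450"), (725, "963450"), (722, "963460"), (731, "963460"),
    (722, "963462"), (731, "963462"), (427, "995487"), (295, "995487"),
    (771, "123151"), (503, "123151"), (721, "963265"), (104, "963266"),
]
conn.executemany("INSERT INTO PhoneNums (EmpID, PhoneNum) VALUES (?, ?)", pairs)

# Assumption: every employee in the sample belongs to organization 'XY'.
conn.executemany(
    "INSERT INTO OrgEmployees (OrgID, EmpID) VALUES ('XY', ?)",
    [(emp_id,) for emp_id in sorted({emp for emp, _ in pairs})],
)

# Inner query: distinct employees per shared number; outer query: their sum.
total_sql = """
    SELECT SUM(emp_count)
    FROM (
        SELECT COUNT(DISTINCT pn.EmpID) AS emp_count
        FROM PhoneNums pn
        JOIN OrgEmployees oe
          ON oe.OrgID = 'XY' AND pn.EmpID = oe.EmpID
        GROUP BY pn.PhoneNum
        HAVING COUNT(DISTINCT pn.EmpID) > 1
    )
"""
total = conn.execute(total_sql).fetchone()[0]
print(total)  # 14 for the sample data
```

Note that 14 counts employee/number pairs. Employees 731 and 722 each appear under more than one shared number, so the number of distinct employees in this sample is 11.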

Related

SQL: Finding duplicate records based on custom criteria

I need to find duplicates based on two tables and based on custom criteria. The following determines whether it's a duplicate, and if so, show only the most recent one:
If the Employee Name and all EmployeePolicy CoverageId(s) exactly match another record, then that's considered a duplicate.
--Employee Table
EmployeeId Name Salary
543 John 54000
785 Alex 63000
435 John 75000
123 Alex 88000
333 John 67000
--EmployeePolicy Table
EmployeePolicyId EmployeeId CoverageId
1 543 8888
2 543 7777
3 785 5555
4 435 8888
5 435 7777
6 123 4444
7 333 8888
8 333 7776
For example, the duplicates in the example above are the following:
EmployeeId Name Salary
543 John 54000
435 John 75000
This is because they are the only ones that have a matching name in the Employee table as well as both have the same exact CoverageIds in the EmployeePolicy table.
Note: EmployeeId 333, also with Name = John, is not a match because his CoverageIds are not all the same as the other Johns' CoverageIds.
At first I tried to find duplicates the old-fashioned way, by grouping records and using having count(*) > 1, but I quickly realized that this would not work: while my criteria define a duplicate in English, in SQL the CoverageIDs are different, so the rows are NOT considered duplicates.
Along the same lines, I tried something like:
-- Create a TMP table
INSERT INTO #tmp
SELECT *
FROM Employee e join EmployeePolicy ep on e.EmployeeId = ep.EmployeeId
SELECT info.*
FROM
(
SELECT
tmp.*,
ROW_NUMBER() OVER(PARTITION BY tmp.Name, tmp.CoverageId ORDER BY tmp.EmployeeId DESC) AS RowNum
FROM #tmp tmp
) info
WHERE
info.RowNum = 1 AND
Again, this does not work because SQL does not see these as duplicates. I'm not sure how to translate my English definition of a duplicate into a SQL definition of a duplicate.
Any help is most appreciated.
The easiest way is to concatenate the policies into a string. That, alas, is cumbersome in SQL Server. Here is a set-based approach:
with ep as (
select ep.*, count(*) over (partition by employeeid) as cnt
from employeepolicy ep
)
select ep.employeeid, ep2.employeeid
from ep join
ep ep2
on ep.employeeid < ep2.employeeid and
ep.CoverageId = ep2.CoverageId and
ep.cnt = ep2.cnt
group by ep.employeeid, ep2.employeeid, ep.cnt
having count(*) = ep.cnt -- all match
The idea is to match the coverages of different employees. A simple criterion is that the number of coverages must match. The query then checks that the number of matching coverages equals that count.
Note: This puts the employee id pairs in a single row. You can join back to the employees table to get the additional information.
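To make the pair-matching idea concrete, here is a small sketch that runs it against the sample data from the question. It uses Python's built-in sqlite3 module instead of SQL Server (an assumption made for runnability; the window function needs SQLite 3.25 or later), and it adds the name comparison that the question asks for directly in the join. It also assumes an employee never has the same CoverageId twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER);
    CREATE TABLE EmployeePolicy (
        EmployeePolicyId INTEGER PRIMARY KEY, EmployeeId INTEGER, CoverageId INTEGER);
    INSERT INTO Employee VALUES
        (543, 'John', 54000), (785, 'Alex', 63000), (435, 'John', 75000),
        (123, 'Alex', 88000), (333, 'John', 67000);
    INSERT INTO EmployeePolicy VALUES
        (1, 543, 8888), (2, 543, 7777), (3, 785, 5555), (4, 435, 8888),
        (5, 435, 7777), (6, 123, 4444), (7, 333, 8888), (8, 333, 7776);
""")

pairs_sql = """
    WITH ep AS (
        -- One row per policy, tagged with the employee's name and policy count.
        SELECT p.EmployeeId, p.CoverageId, e.Name,
               COUNT(*) OVER (PARTITION BY p.EmployeeId) AS cnt
        FROM EmployeePolicy p
        JOIN Employee e ON e.EmployeeId = p.EmployeeId
    )
    SELECT a.EmployeeId, b.EmployeeId
    FROM ep a
    JOIN ep b
      ON a.EmployeeId < b.EmployeeId    -- each pair once, no self-pairs
     AND a.Name = b.Name                -- same name
     AND a.CoverageId = b.CoverageId    -- a shared coverage
     AND a.cnt = b.cnt                  -- same number of coverages
    GROUP BY a.EmployeeId, b.EmployeeId, a.cnt
    HAVING COUNT(*) = a.cnt             -- every coverage matched
"""
duplicate_pairs = conn.execute(pairs_sql).fetchall()
print(duplicate_pairs)  # [(435, 543)]
```

Employee 333 shares only coverage 8888 with the other two Johns, so its match count (1) never reaches its policy count (2) and it drops out.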
I have not tested the T-SQL but I believe the following should give you the output you are looking for.
;WITH CTE_Employee
AS
(
SELECT E.[Name]
,E.[EmployeeId]
,P.[CoverageId]
,E.[Salary]
FROM Employee E
INNER JOIN EmployeePolicy P ON E.EmployeeId = P.EmployeeId
)
, CTE_DuplicateCoverage
AS
(
SELECT E.[Name]
,E.[CoverageId]
FROM CTE_Employee E
GROUP BY E.[Name], E.[CoverageId]
HAVING COUNT(*) > 1
)
SELECT E.[EmployeeId]
,E.[Name]
,MAX(E.[Salary]) AS [Salary]
FROM CTE_Employee E
INNER JOIN CTE_DuplicateCoverage D ON E.[Name] = D.[Name] AND E.[CoverageId] = D.[CoverageId]
GROUP BY E.[EmployeeId], E.[Name]
HAVING COUNT(*) > 1
ORDER BY E.[EmployeeId]

Longest item in each group

I am trying to find which activity took the longest (1) by facility (giving me 6 different activities) and (2) by facility and department (giving me 11 different activities).
This code gives me only one response:
SELECT NOC.FCILTY_ID, NAC.ACTIVITY_ID, NAC.ELAPSED_SECONDS
FROM NAC, NOC
WHERE NAC.OBS_ID=NOC.OBS_ID
AND NAC.ELAPSED_SECONDS IN (SELECT MAX(NAC.ELAPSED_SECONDS) FROM NAC, NOC
GROUP BY NOC.FCILTY_ID)
ORDER BY NOC.FCILTY_ID;
An example of some of the data and the code to retrieve some of the data is given below.
SELECT NAC.OBS_ID, NOC.FCILTY_ID, NOC.DEPT_NO, NAC.ACTIVITY_ID, NAC.ACTIVE_SECONDS, NAC.CAT
FROM NAC, NOC
WHERE NAC.OBS_ID = NOC.OBS_ID;
OBS_ID FCILTY_ID DEPT_NO ACTIVITY_ID ACTIVE_SECONDS CAT
1 A a 132 73.9999584 Motion
2 A a 133 92.000016 Operations
3 A a 134 198.0000288 Operations
4 A a 135 54.9999936 Error/Defect
5 A a 136 79.0000128 Error/Defect
6 A a 137 57.9999744 Operations
Use a CTE to add a ROW_NUMBER for each desired grouping: rnf for facility, and rnfd for facility and department.
WITH CTE AS
(SELECT NAC.OBS_ID, NOC.FCILTY_ID, NOC.DEPT_NO, NAC.ACTIVITY_ID, NAC.ACTIVE_SECONDS, NAC.CAT,
ROW_NUMBER() OVER(PARTITION BY NOC.FCILTY_ID ORDER BY NAC.ACTIVE_SECONDS DESC) as rnf,
ROW_NUMBER() OVER(PARTITION BY NOC.FCILTY_ID, NOC.DEPT_NO ORDER BY NAC.ACTIVE_SECONDS DESC) as rnfd
FROM NAC, NOC
WHERE NAC.OBS_ID = NOC.OBS_ID)
-- The NAC/NOC aliases are not visible outside the CTE, so select the bare column names
SELECT OBS_ID, FCILTY_ID, DEPT_NO, ACTIVITY_ID, ACTIVE_SECONDS, CAT FROM CTE
WHERE rnf=1 OR rnfd =1
EDIT
For 2 separate queries
..WHERE rnf=1
..WHERE rnfd =1
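As a quick check of the two separate filters, here is a sketch using Python's built-in sqlite3 module (a stand-in for the real database; window functions need SQLite 3.25 or later). The rows are hypothetical, invented only to have two facilities and several departments, and the tables are cut down to the columns that matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE NOC (OBS_ID INTEGER, FCILTY_ID TEXT, DEPT_NO TEXT);
    CREATE TABLE NAC (OBS_ID INTEGER, ACTIVITY_ID INTEGER, ACTIVE_SECONDS REAL);
    -- Hypothetical rows, not taken from the question.
    INSERT INTO NOC VALUES
        (1, 'A', 'a'), (2, 'A', 'a'), (3, 'A', 'b'),
        (4, 'B', 'a'), (5, 'B', 'a'), (6, 'B', 'c');
    INSERT INTO NAC VALUES
        (1, 132, 74.0), (2, 134, 198.0), (3, 140, 120.0),
        (4, 150, 60.0), (5, 151, 95.0), (6, 160, 30.0);
""")

# Shared CTE: rank rows inside each facility and each facility/department.
ranked_cte = """
    WITH ranked AS (
        SELECT noc.FCILTY_ID, noc.DEPT_NO, nac.ACTIVITY_ID, nac.ACTIVE_SECONDS,
               ROW_NUMBER() OVER (PARTITION BY noc.FCILTY_ID
                                  ORDER BY nac.ACTIVE_SECONDS DESC) AS rnf,
               ROW_NUMBER() OVER (PARTITION BY noc.FCILTY_ID, noc.DEPT_NO
                                  ORDER BY nac.ACTIVE_SECONDS DESC) AS rnfd
        FROM NAC nac
        JOIN NOC noc ON nac.OBS_ID = noc.OBS_ID
    )
"""

# Query 1: longest activity per facility.
by_facility = conn.execute(
    ranked_cte
    + "SELECT FCILTY_ID, ACTIVITY_ID FROM ranked WHERE rnf = 1 ORDER BY FCILTY_ID"
).fetchall()

# Query 2: longest activity per facility and department.
by_department = conn.execute(
    ranked_cte
    + "SELECT FCILTY_ID, DEPT_NO, ACTIVITY_ID FROM ranked WHERE rnfd = 1 "
      "ORDER BY FCILTY_ID, DEPT_NO"
).fetchall()

print(by_facility)    # [('A', 134), ('B', 151)]
print(by_department)  # [('A', 'a', 134), ('A', 'b', 140), ('B', 'a', 151), ('B', 'c', 160)]
```

ROW_NUMBER always picks exactly one row per partition, so if two activities tie for the longest time only one of them is returned.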
You need to join to a subquery. Here is one way.
with maxInterval as
(select cat theCat, max(active_seconds) longestTime
from etc
group by cat
)
select whatever
from yourTables join maxInterval on cat = theCat
and active_seconds = longestTime
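The same join-to-a-subquery pattern, grouped by facility rather than by category, might look like the sketch below. It runs on Python's built-in sqlite3 module, and the single `activity` table, its columns, and its rows are all hypothetical, chosen to include a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and rows, not taken from the question.
    CREATE TABLE activity (facility TEXT, activity_id INTEGER, active_seconds REAL);
    INSERT INTO activity VALUES
        ('A', 132, 74.0), ('A', 134, 198.0),
        ('B', 150, 60.0), ('B', 151, 95.0), ('B', 152, 95.0);
""")

longest_sql = """
    WITH max_interval AS (
        -- One row per facility with its longest time.
        SELECT facility AS the_facility, MAX(active_seconds) AS longest_time
        FROM activity
        GROUP BY facility
    )
    SELECT a.facility, a.activity_id, a.active_seconds
    FROM activity a
    JOIN max_interval m
      ON a.facility = m.the_facility
     AND a.active_seconds = m.longest_time
    ORDER BY a.facility, a.activity_id
"""
longest = conn.execute(longest_sql).fetchall()
print(longest)  # [('A', 134, 198.0), ('B', 151, 95.0), ('B', 152, 95.0)]
```

Unlike a ROW_NUMBER filter, this version returns every row that ties for the maximum, which is why facility B shows up twice.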

SQL to retrieve one of several rows grouped by 1 field while retaining multiple columns

I have not been able to work out how to write a GROUP BY query in Access 2013 SQL that retrieves just 1 row per "group", chosen by the MAX value in one of the columns, while retaining all the other columns.
I want to return only 1 [ClientID] row with the MAX of [SupervisorID]. I don't care which [DispositionID] is returned.
Here is part of what I have for data (there are many more columns than this that I must retain):
ClientID LastName FirstName Sex DispositionID SupervisorID
6263 Junk Danny M 1222 322
6263 Junk Danny M 1223 Null
6263 Junk Danny M 1220 322
6260 Fake Angie F 1208 206
6244 Junker Adam M 1153 322
6244 Junker Adam M 1148 Null
6257 Junkly Summer F 1218 Null
6257 Junkly Summer F 1179 Null
[ClientID], [LastName], [FirstName], and [Sex] are pulled from Table A.
[DispositionID] is pulled from Table B (linked on [ClientID]).
[SupervisorID] is pulled from Table C (linked on [DispositionID]).
The target query would return this:
ClientID LastName FirstName Sex DispositionID SupervisorID
6263 Junk Danny M 1220 322
6260 Fake Angie F 1208 206
6244 Junker Adam M 1153 322
6257 Junkly Summer F 1179 Null
Once I have the target query, I'll have to do a UNION of that query with another (similar) query, although I imagine that will not influence how to accomplish this first query.
Once the two queries are UNIONed, I'll have to filter that query to show just rows with either a specific [SupervisorID] (e.g., 322) OR a Null [SupervisorID].
I also tried the SELECT DISTINCT but that still returned multiple [ClientID]s instead of just one.
Try:
select x.clientid,
x.lastname,
x.firstname,
x.sex,
min(y.dispositionid) as dispositionid,
x.supervisorid
from (select clientid,
lastname,
firstname,
sex,
max(supervisorid) as supervisorid
from tbl
group by clientid, lastname, firstname, sex) x
inner join tbl y
on x.clientid = y.clientid
and x.supervisorid = y.supervisorid
group by x.clientid, x.lastname, x.firstname, x.sex, x.supervisorid
union all
select x.clientid,
x.lastname,
x.firstname,
x.sex,
min(x.dispositionid),
x.supervisorid
from tbl x
where not exists (select 1
from tbl y
where y.supervisorid is not null
and y.clientid = x.clientid)
group by x.clientid, x.lastname, x.firstname, x.sex, x.supervisorid
Fiddle: http://sqlfiddle.com/#!2/1d315/12/0
(matches your desired output)
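The reason for the two branches is that NULL never equals NULL in a join condition, so clients whose SupervisorID is always Null fall out of the first branch and have to be picked up by the NOT EXISTS branch. Here is a trimmed-down sketch of the query (fewer columns, same logic) run against the sample data with Python's built-in sqlite3 module. SQLite is only a stand-in here; Access SQL may need adjustments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (clientid INTEGER, dispositionid INTEGER, supervisorid INTEGER)")
# Sample rows from the question; None is stored as NULL.
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (6263, 1222, 322), (6263, 1223, None), (6263, 1220, 322),
    (6260, 1208, 206),
    (6244, 1153, 322), (6244, 1148, None),
    (6257, 1218, None), (6257, 1179, None),
])

one_row_sql = """
    -- Branch 1: clients that have at least one non-NULL supervisor.
    SELECT x.clientid, MIN(y.dispositionid) AS dispositionid, x.supervisorid
    FROM (SELECT clientid, MAX(supervisorid) AS supervisorid
          FROM tbl
          GROUP BY clientid) x
    JOIN tbl y
      ON x.clientid = y.clientid
     AND x.supervisorid = y.supervisorid
    GROUP BY x.clientid, x.supervisorid
    UNION ALL
    -- Branch 2: clients whose supervisor is NULL on every row.
    SELECT x.clientid, MIN(x.dispositionid), x.supervisorid
    FROM tbl x
    WHERE NOT EXISTS (SELECT 1
                      FROM tbl y
                      WHERE y.supervisorid IS NOT NULL
                        AND y.clientid = x.clientid)
    GROUP BY x.clientid, x.supervisorid
"""
# Sort in Python by clientid so the printed order is predictable.
rows = sorted(conn.execute(one_row_sql).fetchall(), key=lambda r: r[0])
print(rows)
# [(6244, 1153, 322), (6257, 1179, None), (6260, 1208, 206), (6263, 1220, 322)]
```

Client 6257 comes only from the second branch, and the other three clients come only from the first.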
This is a bit tricky because of the NULL values. This should work:
select d.*
from data as d inner join
(select clientId, max(dispositionid) as maxd, max(supervisorid) as maxs
from data
group by clientid
) as c
on d.clientId = c.clientId and
(d.supervisorId = c.maxs or
c.maxs is null and d.dispositionid = c.maxd
);

SQL find entire row where only 2 columns values

I'm attempting to
select columns Age, Height, House_number, Street
from my_table
where count(combination of House_number, Street)
occurs more than once.
My table looks like this
Age, Height, House_number, Street
15 178 6 Mc Gill Crst
85 166 6 Mc Gill Crst
85 166 195 Mc Gill Crst
18 151 99 Moon Street
52 189 14a Grimm Lane
My desired outcome looks like this
Age, Height, House_number, Street
15 178 6 Mc Gill Crst
85 166 6 Mc Gill Crst
Stuck!
The best way to do this is with window functions, assuming your database supports them:
select Age, Height, House_number, Street
from (select t.*, count(*) over (partition by house_number, street) as cnt
from my_table t
) t
where cnt > 1
This uses a window function (called an analytic function in Oracle). The expression count(*) over (partition by house_number, street) counts the number of rows for each house_number and street combination. It is similar to a group by, but it adds the count to each row rather than collapsing multiple rows into one.
Once you have that, it is easy to simply choose the rows where the value is greater than 1.
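Here is a small runnable sketch of that query against the sample table, using Python's built-in sqlite3 module (a stand-in for whichever database is actually in use; window functions need SQLite 3.25 or later). For comparison it also runs a GROUP BY plus JOIN variant, which does not need window functions and should return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# House_number is stored as text because of values like '14a'.
conn.execute(
    "CREATE TABLE my_table (Age INTEGER, Height INTEGER, House_number TEXT, Street TEXT)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", [
    (15, 178, "6", "Mc Gill Crst"),
    (85, 166, "6", "Mc Gill Crst"),
    (85, 166, "195", "Mc Gill Crst"),
    (18, 151, "99", "Moon Street"),
    (52, 189, "14a", "Grimm Lane"),
])

# Window-function version: attach the group size to every row, then filter.
window_sql = """
    SELECT Age, Height, House_number, Street
    FROM (SELECT t.*,
                 COUNT(*) OVER (PARTITION BY House_number, Street) AS cnt
          FROM my_table t) counted
    WHERE cnt > 1
    ORDER BY Age
"""
window_rows = conn.execute(window_sql).fetchall()

# GROUP BY + JOIN version: find the repeated combinations, then join back.
join_sql = """
    SELECT t1.Age, t1.Height, t1.House_number, t1.Street
    FROM my_table t1
    JOIN (SELECT House_number, Street
          FROM my_table
          GROUP BY House_number, Street
          HAVING COUNT(*) > 1) t2
      ON t1.House_number = t2.House_number
     AND t1.Street = t2.Street
    ORDER BY t1.Age
"""
join_rows = conn.execute(join_sql).fetchall()

print(window_rows)  # [(15, 178, '6', 'Mc Gill Crst'), (85, 166, '6', 'Mc Gill Crst')]
```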
Since you haven't mentioned the RDBMS you are using, the query below should work on most RDBMSs that support multi-column IN comparisons.
SELECT *
FROM tableName
WHERE (House_number, Street) IN
(
SELECT House_number, STREET
FROM tableName
GROUP BY House_number, STREET
HAVING COUNT(*) >= 2
)
Sounds like you need a NOT DISTINCT. The following might give you what you need: Multiple NOT distinct
If your database does not support window functions, you can use a subquery with a JOIN. The subquery gets the list of house_number and street combinations that occur more than once, and this result is then joined back to your table:
select t1.age,
t1.height,
t1.house_number,
t1.street
from my_table t1
inner join
(
select house_number, street
from my_table
group by house_number, street
having count(*) > 1
) t2
on t1.house_number = t2.house_number
and t1.street = t2.street

Sub Query having group by and count

tbl_Offer
OFID bigint
Offer_Text text
OFID Offer_Text
------- ----------
1014 Test1
1015 Test2
tbl_TransactionDishout
offerNo TerminalID Created
---------------------------------
1014 170924690436418 2010-05-25 12:51:59.547
tblVTSettings
gid mid tid
-----------------------
50 153 119600317313328
104 158 160064024922223
76 162 256674529511898
1111 148 123909123909123
These are the three tables.
Now I want the information for all deals (offers), separated by school (see gid, restricted to (50,76,104)).
These are the three schools: (50,76,104)
The output should have these fields:
OfferID(OFID), School the offer is for, Offer_Text, Number of time the offer is.
The query might look something like this:
SELECT OFID, Offer_Text,
Counter =
(
SELECT COUNT(*) FROM dbo.tbl_TransactionDishout t
WHERE t.OfferNo = CAST(OFID AS NVARCHAR(30))
and t.TerminalID in
(select TID from tblVTSettings where gid in (50,76,104))
)
FROM dbo.tbl_Offer
Where EXISTS (SELECT * FROM dbo.tbl_TransactionDishout
WHERE OfferNo = CAST(OFID AS NVARCHAR(30)))
Please try this.
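SELECT o.OFID
,ts.gid AS 'School the offer is for'
,CAST(o.Offer_Text AS NVARCHAR(MAX)) AS Offer_Text
,COUNT(*) AS 'Number of time the offer is'
FROM tbl_Offer o -- alias renamed: TO is a reserved word in T-SQL
JOIN tbl_TransactionDishout tt
ON o.OFID = tt.offerNo
JOIN tblVTSettings ts
ON ts.tid = tt.TerminalID
WHERE ts.gid IN (50, 76, 104)
-- Every non-aggregated column must be grouped; a text column has to be cast first
GROUP BY o.OFID, ts.gid, CAST(o.Offer_Text AS NVARCHAR(MAX))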
Try:
SELECT o.OFID,
s.gid,
o.Offer_Text,
count(*) over (partition by o.OFID) number_schools,
count(*) over (partition by s.gid) number_offers
FROM tbl_Offer o
JOIN tbl_TransactionDishout d ON o.OFID = d.offerNo
JOIN tblVTSettings s ON s.tid = d.TerminalID
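If one row per offer and school is wanted, rather than counts repeated on every transaction row, the same joins can be combined with a GROUP BY. The sketch below uses Python's built-in sqlite3 module in place of SQL Server. The offers and settings rows come from the question, but the transaction rows are hypothetical (the single transaction in the question matches no terminal in tblVTSettings), and offerNo is assumed to be stored as an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_Offer (OFID INTEGER, Offer_Text TEXT);
    CREATE TABLE tbl_TransactionDishout (offerNo INTEGER, TerminalID INTEGER);
    CREATE TABLE tblVTSettings (gid INTEGER, mid INTEGER, tid INTEGER);

    INSERT INTO tbl_Offer VALUES (1014, 'Test1'), (1015, 'Test2');
    INSERT INTO tblVTSettings VALUES
        (50, 153, 119600317313328), (104, 158, 160064024922223),
        (76, 162, 256674529511898), (1111, 148, 123909123909123);

    -- Hypothetical transactions, invented for this example.
    INSERT INTO tbl_TransactionDishout VALUES
        (1014, 119600317313328), (1014, 119600317313328),
        (1014, 256674529511898),
        (1015, 160064024922223),
        (1015, 123909123909123);  -- terminal of gid 1111, filtered out below
""")

per_school_sql = """
    SELECT o.OFID, s.gid, o.Offer_Text, COUNT(*) AS times_used
    FROM tbl_Offer o
    JOIN tbl_TransactionDishout d ON o.OFID = d.offerNo
    JOIN tblVTSettings s ON s.tid = d.TerminalID
    WHERE s.gid IN (50, 76, 104)          -- only the three schools
    GROUP BY o.OFID, s.gid, o.Offer_Text  -- one row per offer and school
    ORDER BY o.OFID, s.gid
"""
per_school = conn.execute(per_school_sql).fetchall()
print(per_school)
# [(1014, 50, 'Test1', 2), (1014, 76, 'Test1', 1), (1015, 104, 'Test2', 1)]
```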