Longest item in each group - sql

I am trying to find which activity took the longest (1) by facility (giving me 6 different activities) and (2) by facility and department (giving me 11 different activities).
This code only gives me one response:
SELECT NOC.FCILTY_ID, NAC.ACTIVITY_ID, NAC.ELAPSED_SECONDS
FROM NAC, NOC
WHERE NAC.OBS_ID=NOC.OBS_ID
AND NAC.ELAPSED_SECONDS IN (SELECT MAX(NAC.ELAPSED_SECONDS) FROM NAC, NOC
GROUP BY NOC.FCILTY_ID)
ORDER BY NOC.FCILTY_ID;
An example of some of the data and the code to retrieve some of the data is given below.
SELECT NAC.OBS_ID, NOC.FCILTY_ID, NOC.DEPT_NO, NAC.ACTIVITY_ID, NAC.ACTIVE_SECONDS, NAC.CAT
FROM NAC, NOC
WHERE NAC.OBS_ID = NOC.OBS_ID;
OBS_ID FCILTY_ID DEPT_NO ACTIVITY_ID ACTIVE_SECONDS CAT
1 A a 132 73.9999584 Motion
2 A a 133 92.000016 Operations
3 A a 134 198.0000288 Operations
4 A a 135 54.9999936 Error/Defect
5 A a 136 79.0000128 Error/Defect
6 A a 137 57.9999744 Operations

Use a CTE to add a ROW_NUMBER for each desired grouping: rnf for facility, and rnfd for facility and department.
WITH CTE AS
(SELECT NAC.OBS_ID, NOC.FCILTY_ID, NOC.DEPT_NO, NAC.ACTIVITY_ID, NAC.ACTIVE_SECONDS, NAC.CAT,
ROW_NUMBER() OVER(PARTITION BY NOC.FCILTY_ID ORDER BY ACTIVE_SECONDS DESC) as rnf,
ROW_NUMBER() OVER(PARTITION BY NOC.FCILTY_ID,NOC.DEPT_NO ORDER BY ACTIVE_SECONDS DESC) as rnfd
FROM NAC, NOC
WHERE NAC.OBS_ID = NOC.OBS_ID)
SELECT OBS_ID, FCILTY_ID, DEPT_NO, ACTIVITY_ID, ACTIVE_SECONDS, CAT FROM CTE
WHERE rnf = 1 OR rnfd = 1
EDIT
For two separate queries, filter on one row number at a time:
..WHERE rnf=1
..WHERE rnfd =1

You need to join to a subquery. Here is one way.
with maxInterval as
(select cat theCat, max(active_seconds) longestTime
from etc
group by cat
)
select whatever
from yourTables join maxInterval on cat = theCat
and active_seconds = longestTime
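
A minimal sketch of the join-to-subquery idea, run through Python's sqlite3 module so it can be tried without a database server. It groups by facility (as the question asks) rather than by cat. The rows for facility B, the rounded seconds values, and the obs helper CTE are made up for illustration; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE NOC (OBS_ID INTEGER PRIMARY KEY, FCILTY_ID TEXT, DEPT_NO TEXT);
    CREATE TABLE NAC (OBS_ID INTEGER, ACTIVITY_ID INTEGER, ACTIVE_SECONDS REAL);
    -- Illustrative rows only: facility B does not appear in the question's sample
    INSERT INTO NOC VALUES (1,'A','a'), (2,'A','a'), (3,'A','a'), (4,'B','b'), (5,'B','b');
    INSERT INTO NAC VALUES (1,132,74.0), (2,133,92.0), (3,134,198.0), (4,201,30.0), (5,202,45.5);
""")

longest_per_facility = conn.execute("""
    WITH obs AS (              -- one row per observation with its facility
        SELECT NOC.FCILTY_ID, NAC.ACTIVITY_ID, NAC.ACTIVE_SECONDS
        FROM NAC JOIN NOC ON NAC.OBS_ID = NOC.OBS_ID
    ),
    maxInterval AS (           -- longest time per facility
        SELECT FCILTY_ID AS theFacility, MAX(ACTIVE_SECONDS) AS longestTime
        FROM obs
        GROUP BY FCILTY_ID
    )
    SELECT obs.FCILTY_ID, obs.ACTIVITY_ID, obs.ACTIVE_SECONDS
    FROM obs JOIN maxInterval
      ON obs.FCILTY_ID = maxInterval.theFacility
     AND obs.ACTIVE_SECONDS = maxInterval.longestTime
    ORDER BY obs.FCILTY_ID
""").fetchall()

print(longest_per_facility)
```

Unlike the ROW_NUMBER approach, this returns every tied row when two activities in a facility share the same maximum time.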

Related

Last record per transaction

I am trying to select the last record per sales order.
My query in SQL Server Management Studio is simple:
SELECT *
FROM DOCSTATUS
The problem is that this database has tens of thousands of records, as it tracks all SO steps.
ID SO SL Status Reason Attach Name Name Systemdate
22 951581 3 Processed Customer NULL NULL BW 2016-12-05 13:33:27.857
23 951581 3 Submitted Customer NULL NULL BW 2016-17-05 13:33:27.997
24 947318 1 Hold Customer NULL NULL bw 2016-12-05 13:54:27.173
25 947318 1 Invoices Submit Customer NULL NULL bw 2016-13-05 13:54:27.300
26 947318 1 Ship Customer NULL NULL bw 2016-14-05 13:54:27.440
I would like to see the most recent record per SO:
ID SO SL Status Reason Attach Name Name Systemdate
23 951581 4 Submitted Customer NULL NULL BW 2016-17-05 13:33:27.997
26 947318 1 Ship Customer NULL NULL bw 2016-14-05 13:54:27.440
Well I'm not sure how that table has two Name columns, but one easy way to do this is with ROW_NUMBER():
;WITH cte AS
(
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY SO ORDER BY Systemdate DESC)
FROM dbo.DOCSTATUS
)
SELECT ID, SO, SL, Status, Reason, ..., Systemdate
FROM cte WHERE rn = 1;
Also please always reference the schema, even if today everything is under dbo.
I think you can keep it this simple:
SELECT *
FROM DOCSTATUS
WHERE ID IN (SELECT MAX(ID)
FROM DOCSTATUS
GROUP BY SO)
You want only the maximum ID from each SO.
An efficient method with the right index is a correlated subquery:
select t.*
from t
where t.systemdate = (select max(t2.systemdate) from t t2 where t2.so = t.so);
The index is on (so, systemdate).
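
As a rough check of the ROW_NUMBER approach, here is a sketch that runs it against a cut-down copy of the sample rows using Python's sqlite3 module (window functions need SQLite 3.25 or newer). Only four columns are kept, and the timestamps are rewritten in ISO order on the assumption that the sample's dates are year-day-month; both simplifications are mine, not the asker's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DOCSTATUS (ID INTEGER PRIMARY KEY, SO INTEGER, Status TEXT, Systemdate TEXT);
    -- Timestamps rewritten as ISO strings (assumed year-day-month in the question)
    INSERT INTO DOCSTATUS VALUES
        (22, 951581, 'Processed',       '2016-05-12 13:33:27.857'),
        (23, 951581, 'Submitted',       '2016-05-17 13:33:27.997'),
        (24, 947318, 'Hold',            '2016-05-12 13:54:27.173'),
        (25, 947318, 'Invoices Submit', '2016-05-13 13:54:27.300'),
        (26, 947318, 'Ship',            '2016-05-14 13:54:27.440');
""")

latest_per_so = conn.execute("""
    WITH cte AS (
        SELECT ID, SO, Status, Systemdate,
               -- rn = 1 marks the newest row within each sales order
               ROW_NUMBER() OVER (PARTITION BY SO ORDER BY Systemdate DESC) AS rn
        FROM DOCSTATUS
    )
    SELECT ID, SO, Status
    FROM cte
    WHERE rn = 1
    ORDER BY ID
""").fetchall()

print(latest_per_so)
```

The MAX(ID) variant gives the same rows here only because IDs happen to increase with Systemdate in the sample.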

Count specific duplicates in Oracle

Hi, I have a problem counting the number of employees (EmpID) whose phone number (PhoneNum) is also assigned to some other employee, but only for a specific organization (OrgID).
My Oracle tables look like this:
TABLE OrgEmployees (OrgID, EmpID, ...)
TABLE PhoneNums (ID, EmpID, PhoneNum, ...)
Sample data for the specific organization:
SELECT pn.EmpID, pn.PhoneNum FROM PhoneNums pn
WHERE EmpID IN (SELECT DISTINCT EmpID FROM OrgEmployees oe
WHERE oe.OrgID = 'XY');
EmpID PhoneNum
723 963264
731 963264
973 963276
729 963276
103 963450
725 963450
722 963460
731 963460
722 963462
731 963462
427 995487
295 995487
771 123151
503 123151
721 963265
104 963266
The correct result for the data above should be 14.
My attempts went like this:
SELECT pn.PhoneNum, count(pn.EmpID) FROM PhoneNums pn
WHERE pn.EmpID IN (SELECT oe.EmpID FROM OrgEmployees oe
WHERE oe.OrgID = 'XY')
GROUP BY pn.PhoneNum
HAVING count (*) > 1
ORDER BY pn.PhoneNum;
But how can I take into account whether the EmpIDs are the same or not?
Thank you in advance
I think you want count(distinct):
SELECT pn.PhoneNum, COUNT(DISTINCT pn.EmpID)
FROM PhoneNums pn
WHERE pn.EmpID IN (SELECT oe.EmpID
FROM OrgEmployees oe
WHERE oe.OrgID = 'XY'
)
GROUP BY pn.PhoneNum
HAVING COUNT(DISTINCT pn.EmpID) > 1
ORDER BY pn.PhoneNum;
I would be more inclined to write this using JOIN rather than IN:
SELECT pn.PhoneNum, COUNT(DISTINCT pn.EmpID)
FROM PhoneNums pn JOIN
OrgEmployees oe
ON oe.OrgID = 'XY' AND pn.EmpID = oe.EmpID
GROUP BY pn.PhoneNum
HAVING COUNT(DISTINCT pn.EmpID) > 1
ORDER BY pn.PhoneNum;
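
To see why COUNT(DISTINCT ...) matters, here is a small sketch of the JOIN version run through Python's sqlite3 module. The data is a reduced, partly invented set: the second row for employee 721 and the employee in organization 'ZZ' are hypothetical, added only to show a phone number that looks shared by row count but is not shared by distinct employees within 'XY'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OrgEmployees (OrgID TEXT, EmpID INTEGER);
    CREATE TABLE PhoneNums (ID INTEGER PRIMARY KEY, EmpID INTEGER, PhoneNum TEXT);
    INSERT INTO OrgEmployees VALUES ('XY',723), ('XY',731), ('XY',721), ('XY',722), ('ZZ',900);
    INSERT INTO PhoneNums VALUES
        (1, 723, '963264'), (2, 731, '963264'),
        (3, 721, '963265'), (4, 721, '963265'),  -- same employee twice (hypothetical)
        (5, 722, '963460'), (6, 731, '963460'),
        (7, 900, '963265');                      -- other organization (hypothetical)
""")

shared_numbers = conn.execute("""
    SELECT pn.PhoneNum, COUNT(DISTINCT pn.EmpID)
    FROM PhoneNums pn
    JOIN OrgEmployees oe
      ON oe.OrgID = 'XY' AND pn.EmpID = oe.EmpID
    GROUP BY pn.PhoneNum
    HAVING COUNT(DISTINCT pn.EmpID) > 1
    ORDER BY pn.PhoneNum
""").fetchall()

print(shared_numbers)
```

With HAVING COUNT(*) > 1 instead, number 963265 would also be reported, even though only one employee in 'XY' uses it.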

SQL: Finding duplicate records based on custom criteria

I need to find duplicates based on two tables and based on custom criteria. The following determines whether it's a duplicate, and if so, show only the most recent one:
If the Employee Name and all EmployeePolicy CoverageId(s) exactly match another record, then that's considered a duplicate.
--Employee Table
EmployeeId Name Salary
543 John 54000
785 Alex 63000
435 John 75000
123 Alex 88000
333 John 67000
--EmployeePolicy Table
EmployeePolicyId EmployeeId CoverageId
1 543 8888
2 543 7777
3 785 5555
4 435 8888
5 435 7777
6 123 4444
7 333 8888
8 333 7776
For example, the duplicates in the example above are the following:
EmployeeId Name Salary
543 John 54000
435 John 75000
This is because they are the only ones that have a matching name in the Employee table as well as both have the same exact CoverageIds in the EmployeePolicy table.
Note: EmployeeId 333 also with Name = John is not a match because both of his CoverageIDs are not the same as the other John's CoverageIds.
At first I tried to find duplicates the old-fashioned way, by grouping records and using HAVING COUNT(*) > 1, but I quickly realized that it would not work: while my criteria define a duplicate in English, in SQL the CoverageIds are different, so the rows are NOT considered duplicates.
By that same accord, I tried something like:
-- Create a TMP table
INSERT INTO #tmp
SELECT *
FROM Employee e join EmployeePolicy ep on e.EmployeeId = ep.EmployeeId
SELECT info.*
FROM
(
SELECT
tmp.*,
ROW_NUMBER() OVER(PARTITION BY tmp.Name, tmp.CoverageId ORDER BY tmp.EmployeeId DESC) AS RowNum
FROM #tmp tmp
) info
WHERE
info.RowNum = 1 AND
Again, this does not work because SQL does not see this as duplicates. Not sure how to translate my English definition of duplicate into SQL definition of duplicate.
Any help is most appreciated.
The easiest way is to concatenate the policies into a string. That, alas, is cumbersome in SQL Server. Here is a set-based approach:
with ep as (
select ep.*, count(*) over (partition by employeeid) as cnt
from employeepolicy ep
)
select ep.employeeid, ep2.employeeid
from ep join
ep ep2
on ep.employeeid < ep2.employeeid and
ep.CoverageId = ep2.CoverageId and
ep.cnt = ep2.cnt
group by ep.employeeid, ep2.employeeid, ep.cnt
having count(*) = ep.cnt -- all match
The idea is to match the coverages for different employees. A simple criterion is that the number of coverages must match. Then it checks that the number of matching coverages equals that count.
Note: This puts the employee id pairs in a single row. You can join back to the employees table to get the additional information.
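
Here is a sketch of that set-based comparison against the question's sample data, run through Python's sqlite3 module (SQLite 3.25+ for the window function). The pairs CTE and the final join back to Employee with a Name comparison are my additions to cover the "same name" part of the question. It assumes each (EmployeeId, CoverageId) combination appears only once in EmployeePolicy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER);
    CREATE TABLE EmployeePolicy (EmployeePolicyId INTEGER PRIMARY KEY,
                                 EmployeeId INTEGER, CoverageId INTEGER);
    INSERT INTO Employee VALUES
        (543,'John',54000), (785,'Alex',63000), (435,'John',75000),
        (123,'Alex',88000), (333,'John',67000);
    INSERT INTO EmployeePolicy VALUES
        (1,543,8888), (2,543,7777), (3,785,5555), (4,435,8888),
        (5,435,7777), (6,123,4444), (7,333,8888), (8,333,7776);
""")

duplicate_pairs = conn.execute("""
    WITH ep AS (               -- each policy row plus the employee's coverage count
        SELECT EmployeeId, CoverageId,
               COUNT(*) OVER (PARTITION BY EmployeeId) AS cnt
        FROM EmployeePolicy
    ),
    pairs AS (                 -- employee pairs whose coverage sets are identical
        SELECT ep.EmployeeId AS id1, ep2.EmployeeId AS id2
        FROM ep JOIN ep AS ep2
          ON ep.EmployeeId < ep2.EmployeeId
         AND ep.CoverageId = ep2.CoverageId
         AND ep.cnt = ep2.cnt
        GROUP BY ep.EmployeeId, ep2.EmployeeId, ep.cnt
        HAVING COUNT(*) = ep.cnt
    )
    SELECT p.id1, p.id2, e1.Name
    FROM pairs p
    JOIN Employee e1 ON e1.EmployeeId = p.id1
    JOIN Employee e2 ON e2.EmployeeId = p.id2
    WHERE e1.Name = e2.Name    -- names must match too
    ORDER BY p.id1
""").fetchall()

print(duplicate_pairs)
```

Employee 333 drops out because only one of his two coverages (8888) matches, so the matched count is 1 rather than 2.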
I have not tested the T-SQL but I believe the following should give you the output you are looking for.
;WITH CTE_Employee
AS
(
SELECT E.[Name]
,E.[EmployeeId]
,P.[CoverageId]
,E.[Salary]
FROM Employee E
INNER JOIN EmployeePolicy P ON E.EmployeeId = P.EmployeeId
)
, CTE_DuplicateCoverage
AS
(
SELECT E.[Name]
,E.[CoverageId]
FROM CTE_Employee E
GROUP BY E.[Name], E.[CoverageId]
HAVING COUNT(*) > 1
)
SELECT E.[EmployeeId]
,E.[Name]
,MAX(E.[Salary]) AS [Salary]
FROM CTE_Employee E
INNER JOIN CTE_DuplicateCoverage D ON E.[Name] = D.[Name] AND E.[CoverageId] = D.[CoverageId]
GROUP BY E.[EmployeeId], E.[Name]
HAVING COUNT(*) > 1
ORDER BY E.[EmployeeId]

Selecting n-th to last values

I have a table like so:
id device group
-----------------
1 a 1000
2 a 1000
3 b 1001
4 b 1001
5 b 1001
6 b 1002
8 a 1003
9 a 1003
10 a 1003
11 a 1003
12 b 1004
13 b 1004
All id's and groups are sequential. What I would like is to select id and device based on groups and devices. Think of it as a pagination type selection. Getting the last group is a simple inner selection, but how do I select the second last group, or the third last group - etc.
I tried the row number function like this:
SELECT * FROM
( SELECT *, ROW_NUMBER() OVER (PARTITION BY device ORDER BY group DESC) rn FROM data) tmp
WHERE rn = 1;
.. but changing rn is giving me the previous id, not the previous group.
I would like to end up with a selection that could accommodate these results:
device = a, group = latest:
id device group
10 a 1003
11 a 1003
device = a, group = latest - 1:
id device group
1 a 1000
2 a 1000
Does anyone know how to accomplish this?
Edit:
Use case is a GPS enabled device in a car, sending data every 30 seconds. Imagine going on a drive today. First you go to the shops, then you go home. the first trip is you driving to the shop. The second trip is you driving back. I want to show those trips on a map, but it means I need to identify your last trip, and then the trip before it - ad infinitum, until you run out of trips.
You can try this approach:
with x as (
select distinct page
from test_table),
y as (
select x.page
,row_number() over (order by page desc) as row_num
from x)
select test_table.* from test_table join y on y.page = test_table.page
where y.row_num = 2
Here is what this does.
The first block (x) returns the distinct groups (pages, in my case).
The second block (y) assigns a row number to each group, ordered by page in descending order.
Finally, the third block selects the rows for the desired page. For the penultimate page use row_num = 2, for the third from last use row_num = 3, and so on.
You can play around with the values here: http://sqlfiddle.com/#!15/190c06/26
Use dense_rank():
select d.*
from (select d.*, dense_rank() over (order by group_id desc) as seqnum
from data d
where device = 'a'
) d
where seqnum = 2;
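
A small sketch of the dense_rank() query wrapped in a helper, using Python's sqlite3 module (SQLite 3.25+) and the question's rows. The function name nth_last_group and its parameters are hypothetical, and the column is called group_id, as in the query above, because group is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (id INTEGER PRIMARY KEY, device TEXT, group_id INTEGER);
    INSERT INTO data VALUES
        (1,'a',1000), (2,'a',1000), (3,'b',1001), (4,'b',1001), (5,'b',1001),
        (6,'b',1002), (8,'a',1003), (9,'a',1003), (10,'a',1003), (11,'a',1003),
        (12,'b',1004), (13,'b',1004);
""")

def nth_last_group(conn, device, n):
    """Return the rows of the n-th most recent group for one device (n=1 is the latest)."""
    return conn.execute("""
        SELECT id, device, group_id
        FROM (SELECT d.*,
                     -- rows in the same group share a rank, and ranks have no gaps
                     DENSE_RANK() OVER (ORDER BY group_id DESC) AS seqnum
              FROM data d
              WHERE device = ?) AS ranked
        WHERE seqnum = ?
        ORDER BY id
    """, (device, n)).fetchall()

print(nth_last_group(conn, 'a', 1))  # latest trip for device a
print(nth_last_group(conn, 'a', 2))  # the trip before it
```

ROW_NUMBER() would number every row separately, which is why changing rn in the question stepped back one id instead of one group.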

dense rank duplicate values oracle

So I am really happy being able to rank results based on effective dates, but currently I'm having an issue where one data element repeats (POD) while another changes based on EFFDT (DEPT).
I only want to rank unique values for Pod, and later Dept. However Pod is based on Dept, which changes more frequently. The below code gives me:
EENBR PodRank POD DeptRank DeptNbr DeptEffdt
100 1 73 1 12420 4/11/2005
100 2 73 2 12560 5/22/2005
100 3 73 3 12501 6/24/2007
200 1 12 1 50768 3/14/2005
200 2 13 2 10949 9/9/2012
300 1 73 1 12450 3/21/2005
300 2 73 2 12471 12/25/2005
300 3 73 3 12581 12/21/2008
300 4 73 4 12585 6/6/2010
300 5 73 5 12432 5/19/2013
SELECT DISTINCT
AL4.FULL_NAME,
AL4.EMPLOYEE_NUMBER,
dense_rank() over (partition by AL4.EMPLOYEE_NUMBER
order by AL3.EFFECTIVE_START_DATE) as POD_RANKING,
AL7.POD_NBR as POD,
row_number() over (partition by AL4.EMPLOYEE_NUMBER
order by AL3.EFFECTIVE_START_DATE) as DEPT_RANKING,
AL3.RECORDVALUE AS DEPT_NUMBER,
AL3.EFFECTIVE_START_DATE AS "DEPT EFFECTIVE DATE"
FROM T1 AL3,
T2 AL4,
T3 AL7
WHERE AL4.PERSON_ID = AL3.PERSON_ID
AND AL4.EMPLOYEE_NUMBER = AL3.EMPLOYEE_NUMBER
AND AL3.RECORDTYPE = 'DEPARTMENT_NUMBER'
AND AL7.DEPT_NBR = AL3.RECORDVALUE
Order By AL4.Employee_Number;
Is there a function that only ranks unique values?
The function you are looking for is the analytic function dense_rank():
dense_rank() over (partition by eenbr order by pod) as ranking
This is the simplest way to get what you want. You can just add it in the select clause of your query.
There's no function for this, but you can get the result when you use nested window functions:
SELECT dt.*,
SUM(flag) OVER (PARTITION BY EMPLOYEE_NUMBER
ORDER BY "DEPT EFFECTIVE DATE") AS POD_RANKING
FROM
(
SELECT
AL4.FULL_NAME,
AL4.EMPLOYEE_NUMBER,
AL7.POD_NBR AS POD,
ROW_NUMBER() OVER (PARTITION BY AL4.EMPLOYEE_NUMBER
ORDER BY AL3.EFFECTIVE_START_DATE) AS DEPT_RANKING,
AL3.RECORDVALUE AS DEPT_NUMBER,
AL3.EFFECTIVE_START_DATE AS "DEPT EFFECTIVE DATE",
CASE WHEN ROW_NUMBER()
OVER (PARTITION BY AL4.EMPLOYEE_NUMBER,AL7.POD_NBR
ORDER BY AL3.EFFECTIVE_START_DATE) = 1 THEN 1 ELSE 0 END AS flag
FROM T1 AL3,
T2 AL4,
T3 AL7
WHERE AL4.PERSON_ID = AL3.PERSON_ID
AND AL4.EMPLOYEE_NUMBER = AL3.EMPLOYEE_NUMBER
AND AL3.RECORDTYPE = 'DEPARTMENT_NUMBER'
AND AL7.DEPT_NBR = AL3.RECORDVALUE
) dt
ORDER BY dt.EMPLOYEE_NUMBER;
Edit:
OK, I noticed this is an overly complex version of a simple DENSE_RANK with a different order, shortly before Gordon posted his answer :-)
dense_rank() over (partition by AL4.EMPLOYEE_NUMBER order by AL7.POD_NBR)
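
For comparison, here is a sketch of the DENSE_RANK-by-POD ranking next to the ROW_NUMBER department ranking, using Python's sqlite3 module (SQLite 3.25+). The three-table join is collapsed into a single hypothetical table emp_dept holding the rows for employees 100 and 200 from the question's output, with dates rewritten as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened table standing in for the T1/T2/T3 join
    CREATE TABLE emp_dept (eenbr INTEGER, pod INTEGER, dept_nbr INTEGER, dept_effdt TEXT);
    INSERT INTO emp_dept VALUES
        (100, 73, 12420, '2005-04-11'),
        (100, 73, 12560, '2005-05-22'),
        (100, 73, 12501, '2007-06-24'),
        (200, 12, 50768, '2005-03-14'),
        (200, 13, 10949, '2012-09-09');
""")

rows = conn.execute("""
    SELECT eenbr, pod, dept_nbr,
           -- repeated POD values share one rank
           DENSE_RANK() OVER (PARTITION BY eenbr ORDER BY pod) AS pod_rank,
           -- every department change gets its own number
           ROW_NUMBER() OVER (PARTITION BY eenbr ORDER BY dept_effdt) AS dept_rank
    FROM emp_dept
    ORDER BY eenbr, dept_effdt
""").fetchall()

for row in rows:
    print(row)
```

One caveat: ordering by pod ranks PODs by their value, not by when they first appear. If an employee moves to a lower-numbered POD later on, the running-sum version and this one will number the PODs differently.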