Return Data Based on Relevant Dates - sql

I have an occupancy table and a pay history table. For each pay date, I want to return the state the employee was in at that time, according to the occupancy table.
Occupancy Table
Emp#|Commence Date|State
-----|-------------|----
101 | 1/01/2016 | VIC
101 | 1/04/2016 | NSW
101 | 1/08/2016 | ACT
Pay History Table
Emp#|Pay Date
----|--------
101 |15/01/2016
101 |15/02/2016
101 |15/03/2016
101 |15/04/2016
101 |15/05/2016
101 |15/06/2016
101 |15/07/2016
101 |15/08/2016
101 |15/09/2016
I'm wanting to return the following
Emp#|Pay Date  |State
----|----------|-----
101 |15/01/2016|VIC
101 |15/02/2016|VIC
101 |15/03/2016|VIC
101 |15/04/2016|NSW
101 |15/05/2016|NSW
101 |15/06/2016|NSW
101 |15/07/2016|NSW
101 |15/08/2016|ACT
101 |15/09/2016|ACT
Can someone assist, please

You need to generate the end_date in the occupancy table in a subquery; the lead() function is perfect for this purpose. I use it with all three arguments - the third argument gives a "default" date, which I chose arbitrarily as 15 December 2099, for the "current" status. Then it's a simple join on empno and a range condition on dates.
I assume you have more than one empno in your data, so I accommodated that. Then: I don't know if # is legal in Oracle column names, but I didn't want to try; I changed to empno. And names definitely can't have spaces in them unless you quote the names, which has many disadvantages; I worked around that too.
with
occupancy ( empno, commence_date, state ) as (
select 101, to_date('1/01/2016', 'dd/mm/yyyy'), 'VIC' from dual union all
select 101, to_date('1/04/2016', 'dd/mm/yyyy'), 'NSW' from dual union all
select 101, to_date('1/08/2016', 'dd/mm/yyyy'), 'ACT' from dual
),
pay_history ( empno, pay_date ) as (
select 101, to_date('15/01/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/02/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/03/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/04/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/05/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/06/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/07/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/08/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/09/2016', 'dd/mm/yyyy') from dual
)
-- end of test data (not part of the SQL query); query begins below this line
select p.empno, p.pay_date, o.state
from pay_history p inner join (
select empno, commence_date,
lead(commence_date, 1, date '2099-12-15')
over (partition by empno order by commence_date) as end_date,
state
from occupancy ) o
on p.empno = o.empno
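and p.pay_date >= o.commence_date and p.pay_date < o.end_date -- half-open range: end_date is the next commence_date, so an inclusive BETWEEN would match two rows on a changeover day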
order by empno, pay_date -- if needed
;
Output:
EMPNO PAY_DATE STATE
----- ---------- -----
101 15/01/2016 VIC
101 15/02/2016 VIC
101 15/03/2016 VIC
101 15/04/2016 NSW
101 15/05/2016 NSW
101 15/06/2016 NSW
101 15/07/2016 NSW
101 15/08/2016 ACT
101 15/09/2016 ACT
9 rows selected.
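To see what the join is working with, it can help to run the inner subquery on its own. This is only a sketch against the same test data (written with date literals instead of to_date(), which is my choice and not part of the answer above):

```sql
with
  occupancy ( empno, commence_date, state ) as (
    select 101, date '2016-01-01', 'VIC' from dual union all
    select 101, date '2016-04-01', 'NSW' from dual union all
    select 101, date '2016-08-01', 'ACT' from dual
  )
-- each row borrows the NEXT row's commence_date as its end_date;
-- the last row per empno has no next row, so it gets the default
select empno, commence_date,
       lead(commence_date, 1, date '2099-12-15')
         over (partition by empno order by commence_date) as end_date,
       state
from   occupancy
order by empno, commence_date;
```

The VIC row ends on 1 April 2016, the NSW row on 1 August 2016, and the ACT row on the default date of 15 December 2099. Because each end_date is the same day as the next commence_date, a pay date falling exactly on a changeover day would satisfy an inclusive comparison for two rows, which is worth keeping in mind when writing the join condition.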

Negotiating against myself here :-) I am posting this as a separate Answer, rather than editing my earlier one, because this is indeed a different answer.
Please refer to my other Answer for the input data and sample output - they are the same. Only the query is different. Instead of a join, we can UNION ALL the two tables (with some necessary adjustments: add a null column for state to the pay_history table, and a flag of 0 for the occupancy table and 1 for the pay_history table); then use the last_value() analytic function on the resulting union, and filter out the rows from the occupancy table in the outermost query. This may be quite a bit faster than the join-based solution.
select empno, dt as pay_date, state
from (
select empno, dt, flag,
last_value(state ignore nulls)
over (partition by empno order by dt, flag) as state
from (
select empno, commence_date as dt, state, 0 as flag
from occupancy
union all
select empno, pay_date, null, 1
from pay_history
)
)
where flag = 1
order by empno, pay_date -- if needed
;

How do I find entries that contains ALL values of a particular attribute from another table using SQL queries in oracle

Sorry if the question was phrased awkwardly, I didn't know how to put it without an example.
Given I have 2 tables, Watch and WorkIn:
Watch: {emailaddress, videoID}
WorkIn: {videoID, castname}
How do I find for example, the email address of people who have watched ALL of 'David' products?
I was thinking of using GROUP BY and ALL but I'm rather new to sql querying and don't really know how to put it all together.
There are two tables: one with e-mails and the videos watched by each e-mail owner, and the other with videos and the cast names that worked in them. As I understand it, an e-mail owner can watch multiple videos and a cast name can appear in multiple videos, with overlaps possible. Something like these two tables (as sample data):
WITH
t_watch AS
(
Select 'aaa.aaa#aaa.aa' "E_MAIL", 101 "VIDEO_ID" From Dual Union All
Select 'ccc.ccc#ccc.cc' "E_MAIL", 101 "VIDEO_ID" From Dual Union All
Select 'aaa.aaa#aaa.aa' "E_MAIL", 102 "VIDEO_ID" From Dual Union All
Select 'bbb.bbb#bbb.bb' "E_MAIL", 102 "VIDEO_ID" From Dual Union All
Select 'ccc.ccc#ccc.cc' "E_MAIL", 201 "VIDEO_ID" From Dual Union All
Select 'ddd.ddd#ddd.dd' "E_MAIL", 101 "VIDEO_ID" From Dual Union All
Select 'ddd.ddd#ddd.dd' "E_MAIL", 201 "VIDEO_ID" From Dual Union All
Select 'aaa.aaa#aaa.aa' "E_MAIL", 301 "VIDEO_ID" From Dual
),
t_work_in AS
(
Select 101 "VIDEO_ID", 'Chriss' "CAST_NAME" From Dual Union All
Select 101 "VIDEO_ID", 'David' "CAST_NAME" From Dual Union All
Select 101 "VIDEO_ID", 'Annie' "CAST_NAME" From Dual Union All
Select 102 "VIDEO_ID", 'Chriss' "CAST_NAME" From Dual Union All
Select 201 "VIDEO_ID", 'Chriss' "CAST_NAME" From Dual Union All
Select 201 "VIDEO_ID", 'David' "CAST_NAME" From Dual Union All
Select 201 "VIDEO_ID", 'Annie' "CAST_NAME" From Dual Union All
Select 202 "VIDEO_ID", 'Annie' "CAST_NAME" From Dual Union All
Select 301 "VIDEO_ID", 'Robert' "CAST_NAME" From Dual
),
The question is about getting the e-mail addresses of people who have watched all the videos that a specific cast name worked in. To get the answer you need two lists:
the unique, ordered video ids watched by a particular e-mail owner
the unique, ordered video ids that a particular cast name worked in
To do that, create two CTEs, cte_watched_by and cte_worked_in:
cte_worked_in AS
(
SELECT DISTINCT
wk.CAST_NAME,
LISTAGG(wk.VIDEO_ID, ', ') WITHIN GROUP (Order By wk.VIDEO_ID) OVER(Partition By wk.CAST_NAME) "WORKED_LIST",
Count(DISTINCT wk.VIDEO_ID) OVER(Partition By wk.CAST_NAME) "COUNT_IDS_WORKED"
FROM
(Select DISTINCT wrk.CAST_NAME, wrk.VIDEO_ID From t_work_in wrk Left Join t_watch wtc ON(wtc.VIDEO_ID = wrk.VIDEO_ID) Order By wrk.VIDEO_ID) wk
),
cte_watched_by AS
(
SELECT DISTINCT
wb.E_MAIL,
LISTAGG(wb.VIDEO_ID, ', ') WITHIN GROUP (Order By wb.VIDEO_ID) OVER(Partition By wb.E_MAIL) "WATCHED_LIST",
Count(DISTINCT wb.VIDEO_ID) OVER(Partition By wb.E_MAIL) "COUNT_IDS_WATCHED"
FROM
(Select DISTINCT wtc.E_MAIL, wtc.VIDEO_ID From t_watch wtc Left Join t_work_in wrk ON(wrk.VIDEO_ID = wtc.VIDEO_ID) Order By wtc.VIDEO_ID) wb
)
This is what we have got so far:
cte_worked_in
CAST_NAME  WORKED_LIST    COUNT_IDS_WORKED
---------  -------------  ----------------
Chriss     101, 102, 201                 3
Robert     301                           1
Annie      101, 201, 202                 3
David      101, 201                      2
cte_watched_by
E_MAIL          WATCHED_LIST   COUNT_IDS_WATCHED
--------------  -------------  -----------------
ccc.ccc#ccc.cc  101, 201                       2
ddd.ddd#ddd.dd  101, 201                       2
bbb.bbb#bbb.bb  102                            1
aaa.aaa#aaa.aa  101, 102, 301                  3
Now we can join those ctes and get the answer - below is main SQL:
SELECT DISTINCT
wk.CAST_NAME "WORKED_IN",
wk.COUNT_IDS_WORKED "COUNT_IDS_WORKED",
wk.WORKED_LIST,
wb.E_MAIL "WATCHED_BY",
wb.COUNT_IDS_WATCHED "COUNT_IDS_WATCHED",
wb.WATCHED_LIST
FROM
cte_worked_in wk
INNER JOIN
cte_watched_by wb ON
(
(wk.COUNT_IDS_WORKED = wb.COUNT_IDS_WATCHED And wk.WORKED_LIST = wb.WATCHED_LIST)
OR
(wk.COUNT_IDS_WORKED = 1 And wb.COUNT_IDS_WATCHED > 1 And InStr(wb.WATCHED_LIST, wk.WORKED_LIST) > 0)
OR
(wk.COUNT_IDS_WORKED = 2 And wb.COUNT_IDS_WATCHED > 2 And InStr(wb.WATCHED_LIST, SubStr(wk.WORKED_LIST, 1, 3)) > 0 And InStr(wb.WATCHED_LIST, SubStr(wk.WORKED_LIST, 6, 3)) > 0)
)
... and here is resulting dataset:
WORKED_IN  COUNT_IDS_WORKED  WORKED_LIST  WATCHED_BY      COUNT_IDS_WATCHED  WATCHED_LIST
---------  ----------------  -----------  --------------  -----------------  -------------
Robert                    1  301          aaa.aaa#aaa.aa                  3  101, 102, 301
David                     2  101, 201     ddd.ddd#ddd.dd                  2  101, 201
David                     2  101, 201     ccc.ccc#ccc.cc                  2  101, 201
The ORs within the ON expression of the INNER JOIN cover situations where somebody has watched a lot of movies and the cast name worked in only one or two of them. They are the weakest part of this answer, because they handle only those two cases and rely on fixed positions within the list string.
The resulting dataset can be filtered (if needed) to show only the results for a particular cast name and/or e-mail.
Regards...
Wasn't able to test this but I think this should work:
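SELECT w.emailaddress, count(DISTINCT w.videoID)
FROM Watch w inner join WorkIn wi on w.videoID = wi.videoID
WHERE wi.castName = 'Chris Evans'
GROUP BY w.emailaddress
-- the subquery filters on its own castName column (not the outer alias wi);
-- DISTINCT guards against duplicate rows in either table
HAVING count(DISTINCT w.videoID) = (select count(DISTINCT videoID) from WorkIn where castName = 'Chris Evans')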
The group by will count for each emailaddress the number of videos seen with Chris Evans.
HAVING is like a WHERE clause on the aggregated result (so after the GROUP BY). Only when this count is equal to the total number of videos in which Chris Evans plays, the emailaddress is returned.
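As a rough, self-contained illustration of the GROUP BY / HAVING idea, here is a sketch that runs it for 'David' against the sample rows used in the other answer. The inline watch and workin CTEs simply stand in for the real tables, and counting distinct video ids is my own precaution against duplicate rows:

```sql
with
  watch ( emailaddress, videoid ) as (
    select 'aaa.aaa#aaa.aa', 101 from dual union all
    select 'ccc.ccc#ccc.cc', 101 from dual union all
    select 'aaa.aaa#aaa.aa', 102 from dual union all
    select 'bbb.bbb#bbb.bb', 102 from dual union all
    select 'ccc.ccc#ccc.cc', 201 from dual union all
    select 'ddd.ddd#ddd.dd', 101 from dual union all
    select 'ddd.ddd#ddd.dd', 201 from dual union all
    select 'aaa.aaa#aaa.aa', 301 from dual
  ),
  workin ( videoid, castname ) as (
    select 101, 'Chriss' from dual union all
    select 101, 'David'  from dual union all
    select 101, 'Annie'  from dual union all
    select 102, 'Chriss' from dual union all
    select 201, 'Chriss' from dual union all
    select 201, 'David'  from dual union all
    select 201, 'Annie'  from dual union all
    select 202, 'Annie'  from dual union all
    select 301, 'Robert' from dual
  )
-- keep only watched videos that David worked in, then require that the
-- number of such videos per e-mail equals David's total number of videos
select w.emailaddress
from   watch w
       inner join workin wi on wi.videoid = w.videoid
where  wi.castname = 'David'
group by w.emailaddress
having count(distinct w.videoid) =
       ( select count(distinct videoid) from workin where castname = 'David' )
order by w.emailaddress;
```

With this data David worked in videos 101 and 201, so the query returns ccc.ccc#ccc.cc and ddd.ddd#ddd.dd, which agrees with the David rows in the list-based answer.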

Retrieve single row from a query

I am creating a query to find the salary details of an employee with date_to as '31-dec-4712' (the latest).
However, if an employee has two rows with date_to 31-dec-4712, the one with status 'Approved' should be picked. In other cases, when only a single row is returned, that row should be returned as is.
I have created the query below for the salary details and need help with the above scenario.
select distinct PAPF.EMPLOYEE_NUMBER ,
TO_CHAR (EMP_DOJ (PAPF.PERSON_ID),'DD-MON-YYYY' ) DOJ ,
TO_CHAR(HR_EMPLOYEE_ORIGINAL_DOJ(PAPF.EMPLOYEE_NUMBER,42) ,'DD-MON-YYYY' ) ORIGINAL_DOJ,
PPP.CHANGE_DATE,
PPP.DATE_TO,
PPP.PROPOSED_SALARY_N TOTAL_REMUN,
HR_GENERAL.DECODE_LOOKUP('PER_SAL_PROPOSAL_STATUS',APPROVED) status
from PER_ALL_ASSIGNMENTS_F PAAF,
PER_ALL_PEOPLE_F PAPF,
PER_PAY_PROPOSALS PPP
where 1 = 1
and PAPF.PERSON_ID = PAAF.PERSON_ID
and PAPF.BUSINESS_GROUP_ID = 21
and PAPF.CURRENT_EMPLOYEE_FLAG = 'Y'
and papf.employee_number = '109575'
and :P_DATE1 between PAAF.EFFECTIVE_START_DATE
and PAAF.EFFECTIVE_END_DATE
and :P_DATE1 between PAPF.EFFECTIVE_START_DATE
and PAPF.EFFECTIVE_END_DATE
and :P_DATE1 between PPP.CHANGE_DATE(+)
and NVL(PPP.DATE_TO, HR_GENERAL.END_OF_TIME)
and PPP.ASSIGNMENT_ID(+) = PAAF.ASSIGNMENT_ID
order by TO_NUMBER(PAPF.EMPLOYEE_NUMBER);
Emp_num DOJ ORIGINAL_DOJ CHANGE_DATE DATE_TO TOTAL_REMUN STATUS
109575 01-DEC-2016 24-JUL-2014 01-MAY-19 31-DEC-12 250000 Proposed
109575 01-DEC-2016 24-JUL-2014 01-APR-19 31-DEC-12 100000 Approved
You can use conditional ordering for each employee separately, like here:
-- sample rows
with salaries (emp_id, name, salary, date_to, status) as (
select 1001, 'Orange', 1400, date '4712-12-31', 'Rejected' from dual union all
select 1001, 'Orange', 1200, date '4712-12-31', 'Approved' from dual union all
select 1002, 'Red', 2500, date '4712-12-31', 'Approved' from dual union all
select 1003, 'Blue', 2700, date '4712-12-31', 'Proposed' from dual union all
select 1004, 'Green', 2200, date '2012-07-31', 'Approved' from dual union all
select 1005, 'White', 1200, date '4712-12-31', 'Approved' from dual union all
select 1005, 'White', 1300, date '4712-12-31', 'Rejected' from dual )
-- end of sample data
select emp_id, name, salary, date_to, status
from (
select s.*,
row_number() over (partition by emp_id
order by case status when 'Approved' then 1 end) rn
from salaries s
where date_to = date '4712-12-31')
where rn = 1
Result:
EMP_ID NAME SALARY DATE_TO STATUS
---------- ------ ---------- ----------- --------
1001 Orange 1200 4712-12-31 Approved
1002 Red 2500 4712-12-31 Approved
1003 Blue 2700 4712-12-31 Proposed
1005 White 1200 4712-12-31 Approved
If STATUS takes only two values, "Approved" and "Proposed", you can order by STATUS and fetch the first row. If you have (or will have in the future) more statuses and want to define a priority, add a column in the select with a CASE expression that assigns each status its priority, then order by that column and fetch the first row.
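A minimal sketch of the priority idea, using made-up rows in the style of the previous answer. The priority order (Approved, then Proposed, then anything else) and the prioritised CTE name are assumptions for illustration only:

```sql
with
  salaries (emp_id, name, salary, date_to, status) as (
    select 1001, 'Orange', 1400, date '4712-12-31', 'Rejected' from dual union all
    select 1001, 'Orange', 1200, date '4712-12-31', 'Proposed' from dual union all
    select 1003, 'Blue',   2700, date '4712-12-31', 'Proposed' from dual union all
    select 1003, 'Blue',   2900, date '4712-12-31', 'Approved' from dual
  ),
  prioritised as (
    -- assumed priority: lower number wins
    select s.*,
           case status
             when 'Approved' then 1
             when 'Proposed' then 2
             else 3
           end as priority
    from   salaries s
    where  date_to = date '4712-12-31'
  )
select emp_id, name, salary, date_to, status
from (
  select p.*,
         row_number() over (partition by emp_id order by priority) as rn
  from   prioritised p
)
where rn = 1
order by emp_id;
```

For these rows the query keeps the Proposed row (salary 1200) for employee 1001 and the Approved row (salary 2900) for employee 1003. If two rows of one employee share the same priority, row_number() picks one arbitrarily, so a tie-breaker in the ORDER BY may be needed.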

Month counts between dates

I have the table below. I need to count how many ids were active in a given month. I think I'll need to generate a row for each id for every month it was active, so that the id can be counted in each month. A row should also be generated for the month in which term_dt falls.
active_dt  term_dt    id
1/1/2018              101
1/1/2018   5/15/2018  102
3/1/2018   6/1/2018   103
1/1/2018   4/25/2018  104
Apparently this is a "count number of overlapping intervals" problem. The algorithm goes like this:
Create a sorted list of all start and end points
Calculate a running sum over this list: add one when you encounter a start and subtract one when you encounter an end
If two points are the same, perform the subtractions first
You will end up with a list of all the points where the sum changed
Here is a rough outline of the query. It is for SQL Server but could be ported to any RDBMS that supports window functions:
WITH cte1(date, val) AS (
SELECT active_dt, 1 FROM #t AS t
UNION ALL
SELECT COALESCE(term_dt, '2099-01-01'), -1 FROM #t AS t
-- if end date is null then assume the row is valid indefinitely
), cte2 AS (
SELECT date, SUM(val) OVER(ORDER BY date, val) AS rs
FROM cte1
)
SELECT YEAR(date) AS YY, MONTH(date) AS MM, MAX(rs) AS MaxActiveThisYearMonth
FROM cte2
GROUP BY YEAR(date), MONTH(date)
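Since the other queries here are for Oracle, this is roughly how the same outline might look there. It is only a sketch: the inline t CTE stands in for the real table, the column is renamed to dt because DATE is a reserved word in Oracle, and the 2099-01-01 placeholder is carried over from the query above.

```sql
with
  t ( active_dt, term_dt, id ) as (
    select date '2018-01-01', cast(null as date), 101 from dual union all
    select date '2018-01-01', date '2018-05-15',  102 from dual union all
    select date '2018-03-01', date '2018-06-01',  103 from dual union all
    select date '2018-01-01', date '2018-04-25',  104 from dual
  ),
  events ( dt, val ) as (
    -- +1 at every start point, -1 at every end point
    select active_dt, 1 from t
    union all
    select coalesce(term_dt, date '2099-01-01'), -1 from t
  ),
  running ( dt, rs ) as (
    -- ordering by val as well puts the -1 rows before the +1 rows on the same day
    select dt, sum(val) over (order by dt, val) from events
  )
select extract(year from dt)  as yy,
       extract(month from dt) as mm,
       max(rs)                as max_active_this_year_month
from   running
group by extract(year from dt), extract(month from dt)
order by yy, mm;
```

Like the SQL Server outline, this only produces rows for months in which at least one interval starts or ends, so months with no change would still have to be filled in from a calendar of months.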
I was toying with a simpler query, that seemed to do the trick, for Oracle:
with candidates (month_start) as (
select to_date ('2018-' || column_value || '-01','YYYY-MM-DD')
from
table
(sys.odcivarchar2list('01','02','03','04','05',
'06','07','08','09','10','11','12'))
), sample_data (active_dt, term_dt, id) as (
select to_date('01/01/2018', 'MM/DD/YYYY'), null, 101 from dual
union select to_date('01/01/2018', 'MM/DD/YYYY'),
to_date('05/15/2018', 'MM/DD/YYYY'), 102 from dual
union select to_date('03/01/2018', 'MM/DD/YYYY'),
to_date('06/01/2018', 'MM/DD/YYYY'), 103 from dual
union select to_date('01/01/2018', 'MM/DD/YYYY'),
to_date('04/25/2018', 'MM/DD/YYYY'), 104 from dual
)
select c.month_start, count(1)
from candidates c
join sample_data d
on c.month_start between d.active_dt and nvl(d.term_dt,current_date)
group by c.month_start
order by c.month_start
An alternative solution would be to use a hierarchical query, e.g.:
WITH your_table AS (SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, NULL term_dt, 101 ID FROM dual UNION ALL
SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, to_date('15/05/2018', 'dd/mm/yyyy') term_dt, 102 ID FROM dual UNION ALL
SELECT to_date('01/03/2018', 'dd/mm/yyyy') active_dt, to_date('01/06/2018', 'dd/mm/yyyy') term_dt, 103 ID FROM dual UNION ALL
SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, to_date('25/04/2018', 'dd/mm/yyyy') term_dt, 104 ID FROM dual)
SELECT active_month,
COUNT(*) num_active_ids
FROM (SELECT add_months(TRUNC(active_dt, 'mm'), -1 + LEVEL) active_month,
ID
FROM your_table
CONNECT BY PRIOR ID = ID
AND PRIOR sys_guid() IS NOT NULL
AND LEVEL <= FLOOR(months_between(coalesce(term_dt, SYSDATE), active_dt)) + 1)
GROUP BY active_month
ORDER BY active_month;
ACTIVE_MONTH NUM_ACTIVE_IDS
------------ --------------
01/01/2018 3
01/02/2018 3
01/03/2018 4
01/04/2018 4
01/05/2018 3
01/06/2018 2
01/07/2018 1
01/08/2018 1
01/09/2018 1
01/10/2018 1
Whether this is more or less performant than the other answers is up to you to test.

Grouping data by name and date ranges

I have data in my Oracle table with names and date ranges, as follows:
Name From To
Lopes, Janine 07-Jun-17 16-Jul-17
Lopes, Janine 17-Jul-17 23-Jul-17
Lopes, Janine 24-Jul-17 31-Aug-17
Baptista, Maria 23-Dec-16 19-Feb-17
Deyak,Sr, Thomas 22-Jan-17 18-Apr-17
Deyak,Sr, Thomas 27-Apr-17 14-May-17
Deyak,Sr, Thomas 15-May-17 21-May-17
Deyak,Sr, Thomas 22-May-17 28-May-17
Deyak,Sr, Thomas 29-May-17 31-May-17
Serrentino, Joyce 18-Mar-17 30-Apr-17
More, Cathleen 30-Jul-17 13-Aug-17
More, Cathleen 14-Aug-17 20-Aug-17
More, Cathleen 21-Aug-17 27-Aug-17
More, Cathleen 28-Aug-17 03-Sep-17
More, Cathleen 04-Sep-17 10-Sep-17
More, Cathleen 11-Sep-17 24-Sep-17
Barrows, Michael 30-Jan-17 19-Mar-17
Barrows, Michael 20-Mar-17 26-Mar-17
Barrows, Michael 27-Mar-17 02-Apr-17
Barrows, Michael 03-Apr-17 07-Apr-17
For most users the ranges are continuous, with each From date one day after the previous To date, but in some cases there is a break in the data, so my output should look like this:
Name From To
Lopes, Janine 07-Jun-17 31-Aug-17
Baptista, Maria 23-Dec-16 19-Feb-17
Deyak,Sr, Thomas 22-Jan-17 18-Apr-17
Deyak,Sr, Thomas 27-Apr-17 31-May-17
Serrentino, Joyce 18-Mar-17 30-Apr-17
More, Cathleen 30-Jul-17 24-Sep-17
Barrows, Michael 30-Jan-17 07-Apr-17
If I do min(from) and max(to) I lose some records, as for Thomas.
How should I write the SQL to get the data as I require?
In Oracle 12.1 and above, the MATCH_RECOGNIZE clause does quick work of such requirements. I am using the same setup and simulated data (WITH clause) from my other answer, and the output is also the same.
select name, date_fr, date_to
from inputs
match_recognize(
partition by name
order by date_fr
measures a.date_fr as date_fr,
last(date_to) as date_to
pattern ( a b* )
define b as date_fr = prev(date_to) + 1
)
;
This can be solved nicely with the Tabibitosan method.
Preparation:
alter session set nls_date_format = 'dd-Mon-rr';
Session altered.
Query (including simulated inputs for convenience):
with
inputs ( name, date_fr, date_to ) as (
select 'Lopes, Janine' , to_date('07-Jun-17'), to_date('16-Jul-17') from dual union all
select 'Lopes, Janine' , to_date('17-Jul-17'), to_date('23-Jul-17') from dual union all
select 'Lopes, Janine' , to_date('24-Jul-17'), to_date('31-Aug-17') from dual union all
select 'Baptista, Maria' , to_date('23-Dec-16'), to_date('19-Feb-17') from dual union all
select 'Deyak,Sr, Thomas' , to_date('22-Jan-17'), to_date('18-Apr-17') from dual union all
select 'Deyak,Sr, Thomas' , to_date('27-Apr-17'), to_date('14-May-17') from dual union all
select 'Deyak,Sr, Thomas' , to_date('15-May-17'), to_date('21-May-17') from dual union all
select 'Deyak,Sr, Thomas' , to_date('22-May-17'), to_date('28-May-17') from dual union all
select 'Deyak,Sr, Thomas' , to_date('29-May-17'), to_date('31-May-17') from dual union all
select 'Serrentino, Joyce', to_date('18-Mar-17'), to_date('30-Apr-17') from dual union all
select 'More, Cathleen' , to_date('30-Jul-17'), to_date('13-Aug-17') from dual union all
select 'More, Cathleen' , to_date('14-Aug-17'), to_date('20-Aug-17') from dual union all
select 'More, Cathleen' , to_date('21-Aug-17'), to_date('27-Aug-17') from dual union all
select 'More, Cathleen' , to_date('28-Aug-17'), to_date('03-Sep-17') from dual union all
select 'More, Cathleen' , to_date('04-Sep-17'), to_date('10-Sep-17') from dual union all
select 'More, Cathleen' , to_date('11-Sep-17'), to_date('24-Sep-17') from dual union all
select 'Barrows, Michael' , to_date('30-Jan-17'), to_date('19-Mar-17') from dual union all
select 'Barrows, Michael' , to_date('20-Mar-17'), to_date('26-Mar-17') from dual union all
select 'Barrows, Michael' , to_date('27-Mar-17'), to_date('02-Apr-17') from dual union all
select 'Barrows, Michael' , to_date('03-Apr-17'), to_date('07-Apr-17') from dual
)
-- End of simulated inputs (for testing only, not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your actual table and column names.
select name, min(date_fr) as date_fr, max(date_to) as date_to
from ( select name, date_fr, date_to,
date_to - sum( date_to - date_fr + 1 ) over (partition by name
order by date_fr) as gr
from inputs
)
group by name, gr
order by name, date_fr
;
Output:
NAME DATE_FR DATE_TO
----------------- --------- ---------
Baptista, Maria 23-Dec-16 19-Feb-17
Barrows, Michael 30-Jan-17 07-Apr-17
Deyak,Sr, Thomas 22-Jan-17 18-Apr-17
Deyak,Sr, Thomas 27-Apr-17 31-May-17
Lopes, Janine 07-Jun-17 31-Aug-17
More, Cathleen 30-Jul-17 24-Sep-17
Serrentino, Joyce 18-Mar-17 30-Apr-17
7 rows selected
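To make the grouping key less mysterious, here is a sketch that prints the intermediate gr value for one person only. The days_so_far column is added purely for illustration, and the dates are written as literals so that no session date format is needed:

```sql
with
  inputs ( name, date_fr, date_to ) as (
    select 'Deyak,Sr, Thomas', date '2017-01-22', date '2017-04-18' from dual union all
    select 'Deyak,Sr, Thomas', date '2017-04-27', date '2017-05-14' from dual union all
    select 'Deyak,Sr, Thomas', date '2017-05-15', date '2017-05-21' from dual union all
    select 'Deyak,Sr, Thomas', date '2017-05-22', date '2017-05-28' from dual union all
    select 'Deyak,Sr, Thomas', date '2017-05-29', date '2017-05-31' from dual
  )
select name, date_fr, date_to,
       -- running total of days covered by this row and all earlier rows
       sum( date_to - date_fr + 1 )
         over (partition by name order by date_fr) as days_so_far,
       -- end date minus days covered: stays constant while ranges are contiguous
       date_to - sum( date_to - date_fr + 1 )
         over (partition by name order by date_fr) as gr
from   inputs
order by name, date_fr;
```

As long as each range starts the day after the previous one ends, the end date and the running total grow by the same amount, so gr stays the same. The eight missing days between 18 April and 27 April push gr forward for the second row, and the four May ranges then share that new value, which is why grouping by name and gr yields two rows for this person.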

Finding missing dates in a sequence

I have the following table with ID and DATE
ID DATE
123 7/1/2015
123 6/1/2015
123 5/1/2015
123 4/1/2015
123 9/1/2014
123 8/1/2014
123 7/1/2014
123 6/1/2014
456 11/1/2014
456 10/1/2014
456 9/1/2014
456 8/1/2014
456 5/1/2014
456 4/1/2014
456 3/1/2014
789 9/1/2014
789 8/1/2014
789 7/1/2014
789 6/1/2014
789 5/1/2014
789 4/1/2014
789 3/1/2014
This table has three customer ids (123, 456 and 789) and a date column showing the months in which they worked.
I want to find out which customers have a gap in their work.
Our customers' work records are kept per month, so the dates are monthly,
and each customer has different start and end dates.
Expected results:
ID First_Absent_date
123 10/01/2014
456 06/01/2014
To get a simple list of the IDs with gaps, with no further details, you need to look at each ID separately, and as @mikey suggested you can count the number of months and look at the first and last date to see how many months that spans.
If your table has a column called month (since date isn't allowed unless it's a quoted identifier) you could start with:
select id, count(month), min(month), max(month),
months_between(max(month), min(month)) + 1 as diff
from your_table
group by id
order by id;
ID COUNT(MONTH) MIN(MONTH) MAX(MONTH) DIFF
---------- ------------ ---------- ---------- ----------
123 8 01-JUN-14 01-JUL-15 14
456 7 01-MAR-14 01-NOV-14 9
789 7 01-MAR-14 01-SEP-14 7
Then compare the count with the month span, in a having clause:
select id
from your_table
group by id
having count(month) != months_between(max(month), min(month)) + 1
order by id;
ID
----------
123
456
If you can actually have multiple records in a month for an ID, and/or the date recorded might not be the start of the month, you can do a bit more work to normalise the dates:
select id,
count(distinct trunc(month, 'MM')),
min(trunc(month, 'MM')),
max(trunc(month, 'MM')),
months_between(max(trunc(month, 'MM')), min(trunc(month, 'MM'))) + 1 as diff
from your_table
group by id
order by id;
select id
from your_table
group by id
having count(distinct trunc(month, 'MM')) !=
months_between(max(trunc(month, 'MM')), min(trunc(month, 'MM'))) + 1
order by id;
Oracle Setup:
CREATE TABLE your_table ( ID, "DATE" ) AS
SELECT 123, DATE '2015-07-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-06-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-05-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-04-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-07-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-06-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-11-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-10-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-05-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-04-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-03-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-07-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-06-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-05-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-04-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-03-01' FROM DUAL;
Query:
SELECT ID,
MIN( missing_date )
FROM (
SELECT ID,
CASE WHEN LEAD( "DATE" ) OVER ( PARTITION BY ID ORDER BY "DATE" )
= ADD_MONTHS( "DATE", 1 ) THEN NULL
WHEN LEAD( "DATE" ) OVER ( PARTITION BY ID ORDER BY "DATE" )
IS NULL THEN NULL
ELSE ADD_MONTHS( "DATE", 1 )
END AS missing_date
FROM your_table
)
GROUP BY ID
HAVING COUNT( missing_date ) > 0;
Output:
ID MIN(MISSING_DATE)
---------- -------------------
123 2014-10-01 00:00:00
456 2014-06-01 00:00:00
You could use the Lag() function to see whether records have been skipped for a particular date. Lag() basically lets you compare the data in the current row with the previous row. So if we order by DATE, we can easily compare and find any gaps.
select * from
(
select ID,DATE_, case when DATE_DIFF>1 then 1 else 0 end comparison from
(
select ID, DATE_ ,DATE_-LAG(DATE_, 1) OVER (PARTITION BY ID ORDER BY DATE_) date_diff from trial
)
)
where comparison=1 order by ID,DATE_;
This groups all the entries by id, and then arranges the records by date. If a customer is always present, there will be no gap in their dates. So anyone who has a date difference greater than 1 had a gap. You could tweak this as per your requirement.
EDIT: Looking more closely at the answers above, I noticed that the dates are in mm/dd/yyyy format and that you store only the first day of every month. So the above query can be tweaked as follows:
select * from
(
select ID,DATE_,PREV_DATE,last_day(PREV_DATE)+1 ABSENT_DATE, case when DATE_DIFF>31 then 1 else 0 end comparison from
(
select ID, DATE_ ,LAG(DATE_,1) OVER (PARTITION BY ID ORDER BY DATE_) PREV_DATE,DATE_-LAG(DATE_, 1) OVER (PARTITION BY ID ORDER BY DATE_) date_diff from trial
)
)
where comparison=1 order by ID,DATE_;
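A variation on the same Lag() idea, sketched here with MONTHS_BETWEEN in place of the day-count threshold and reduced to the first missing month per ID. It assumes every date is the first day of a month, and the inline trial CTE is just a cut-down copy of the question's data:

```sql
with
  trial ( id, date_ ) as (
    select 123, date '2014-08-01' from dual union all
    select 123, date '2014-09-01' from dual union all
    select 123, date '2015-04-01' from dual union all
    select 123, date '2015-05-01' from dual union all
    select 456, date '2014-04-01' from dual union all
    select 456, date '2014-05-01' from dual union all
    select 456, date '2014-08-01' from dual union all
    select 789, date '2014-03-01' from dual union all
    select 789, date '2014-04-01' from dual union all
    select 789, date '2014-05-01' from dual
  )
select id,
       -- the month right after the last month worked before a gap
       min(add_months(prev_date, 1)) as first_absent_date
from (
  select id, date_,
         lag(date_) over (partition by id order by date_) as prev_date
  from   trial
)
-- the first row per id has a null prev_date and is filtered out here
where months_between(date_, prev_date) > 1
group by id
order by id;
```

For this reduced data the result is 2014-10-01 for ID 123 and 2014-06-01 for ID 456, while ID 789 has no gap and does not appear.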