Grouping data by name and date ranges - sql

I have data in my Oracle table with names and date ranges, as follows:
Name From To
Lopes, Janine 07-Jun-17 16-Jul-17
Lopes, Janine 17-Jul-17 23-Jul-17
Lopes, Janine 24-Jul-17 31-Aug-17
Baptista, Maria 23-Dec-16 19-Feb-17
Deyak,Sr, Thomas 22-Jan-17 18-Apr-17
Deyak,Sr, Thomas 27-Apr-17 14-May-17
Deyak,Sr, Thomas 15-May-17 21-May-17
Deyak,Sr, Thomas 22-May-17 28-May-17
Deyak,Sr, Thomas 29-May-17 31-May-17
Serrentino, Joyce 18-Mar-17 30-Apr-17
More, Cathleen 30-Jul-17 13-Aug-17
More, Cathleen 14-Aug-17 20-Aug-17
More, Cathleen 21-Aug-17 27-Aug-17
More, Cathleen 28-Aug-17 03-Sep-17
More, Cathleen 04-Sep-17 10-Sep-17
More, Cathleen 11-Sep-17 24-Sep-17
Barrows, Michael 30-Jan-17 19-Mar-17
Barrows, Michael 20-Mar-17 26-Mar-17
Barrows, Michael 27-Mar-17 02-Apr-17
Barrows, Michael 03-Apr-17 07-Apr-17
For most users, each From date is one day after the previous row's To date, so the ranges are continuous, but in some cases there is a break in the data. My output should look like this:
Name From To
Lopes, Janine 07-Jun-17 31-Aug-17
Baptista, Maria 23-Dec-16 19-Feb-17
Deyak,Sr, Thomas 22-Jan-17 18-Apr-17
Deyak,Sr, Thomas 27-Apr-17 31-May-17
Serrentino, Joyce 18-Mar-17 30-Apr-17
More, Cathleen 30-Jul-17 24-Sep-17
Barrows, Michael 30-Jan-17 07-Apr-17
If I use min(from) and max(to), I lose some records, like for Thomas.
How should I write the SQL to get the data I require?

In Oracle 12.1 and above, the MATCH_RECOGNIZE clause does quick work of such requirements. I am using the same setup and simulated data (WITH clause) from my other answer, and the output is also the same.
select name, date_fr, date_to
from inputs
match_recognize(
partition by name
order by date_fr
measures a.date_fr as date_fr,
last(date_to) as date_to
pattern ( a b* )
define b as date_fr = prev(date_to) + 1
)
;
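For readers without access to MATCH_RECOGNIZE, here is a minimal Python sketch of what the pattern ( a b* ) does, as I understand it: the first row of a run plays the role of a, and every following row whose date_fr is exactly one day after the previous date_to plays the role of b. The function name merge_contiguous and the sample rows are my own, for illustration only.

```python
from datetime import date, timedelta
from itertools import groupby

def merge_contiguous(rows):
    """rows: iterable of (name, date_fr, date_to); returns one tuple per contiguous run."""
    merged = []
    # sorted() orders by name, then date_fr (like PARTITION BY name ORDER BY date_fr)
    for name, grp in groupby(sorted(rows), key=lambda r: r[0]):
        start = end = None
        for _, fr, to in grp:
            if end is not None and fr == end + timedelta(days=1):
                end = to                      # "b": continues the current run
            else:
                if start is not None:
                    merged.append((name, start, end))
                start, end = fr, to           # "a": starts a new run
        merged.append((name, start, end))
    return merged

rows = [
    ("Deyak,Sr, Thomas", date(2017, 1, 22), date(2017, 4, 18)),
    ("Deyak,Sr, Thomas", date(2017, 4, 27), date(2017, 5, 14)),
    ("Deyak,Sr, Thomas", date(2017, 5, 15), date(2017, 5, 21)),
    ("Lopes, Janine", date(2017, 6, 7), date(2017, 7, 16)),
    ("Lopes, Janine", date(2017, 7, 17), date(2017, 7, 23)),
]
for row in merge_contiguous(rows):
    print(row)
```

The gap between 18-Apr-17 and 27-Apr-17 closes one run for Thomas and starts another, which is the behaviour the question asks for.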

This can be solved nicely with the Tabibitosan method.
Preparation:
alter session set nls_date_format = 'dd-Mon-rr';
Session altered.
Query (including simulated inputs for convenience):
with
inputs ( name, date_fr, date_to ) as (
select 'Lopes, Janine' , to_date('07-Jun-17'), to_date('16-Jul-17') from dual union all
select 'Lopes, Janine' , to_date('17-Jul-17'), to_date('23-Jul-17') from dual union all
select 'Lopes, Janine' , to_date('24-Jul-17'), to_date('31-Aug-17') from dual union all
select 'Baptista, Maria' , to_date('23-Dec-16'), to_date('19-Feb-17') from dual union all
select 'Deyak,Sr, Thomas' , to_date('22-Jan-17'), to_date('18-Apr-17') from dual union all
select 'Deyak,Sr, Thomas' , to_date('27-Apr-17'), to_date('14-May-17') from dual union all
select 'Deyak,Sr, Thomas' , to_date('15-May-17'), to_date('21-May-17') from dual union all
select 'Deyak,Sr, Thomas' , to_date('22-May-17'), to_date('28-May-17') from dual union all
select 'Deyak,Sr, Thomas' , to_date('29-May-17'), to_date('31-May-17') from dual union all
select 'Serrentino, Joyce', to_date('18-Mar-17'), to_date('30-Apr-17') from dual union all
select 'More, Cathleen' , to_date('30-Jul-17'), to_date('13-Aug-17') from dual union all
select 'More, Cathleen' , to_date('14-Aug-17'), to_date('20-Aug-17') from dual union all
select 'More, Cathleen' , to_date('21-Aug-17'), to_date('27-Aug-17') from dual union all
select 'More, Cathleen' , to_date('28-Aug-17'), to_date('03-Sep-17') from dual union all
select 'More, Cathleen' , to_date('04-Sep-17'), to_date('10-Sep-17') from dual union all
select 'More, Cathleen' , to_date('11-Sep-17'), to_date('24-Sep-17') from dual union all
select 'Barrows, Michael' , to_date('30-Jan-17'), to_date('19-Mar-17') from dual union all
select 'Barrows, Michael' , to_date('20-Mar-17'), to_date('26-Mar-17') from dual union all
select 'Barrows, Michael' , to_date('27-Mar-17'), to_date('02-Apr-17') from dual union all
select 'Barrows, Michael' , to_date('03-Apr-17'), to_date('07-Apr-17') from dual
)
-- End of simulated inputs (for testing only, not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your actual table and column names.
select name, min(date_fr) as date_fr, max(date_to) as date_to
from ( select name, date_fr, date_to,
date_to - sum( date_to - date_fr + 1 ) over (partition by name
order by date_fr) as gr
from inputs
)
group by name, gr
order by name, date_fr
;
Output:
NAME DATE_FR DATE_TO
----------------- --------- ---------
Baptista, Maria 23-Dec-16 19-Feb-17
Barrows, Michael 30-Jan-17 07-Apr-17
Deyak,Sr, Thomas 22-Jan-17 18-Apr-17
Deyak,Sr, Thomas 27-Apr-17 31-May-17
Lopes, Janine 07-Jun-17 31-Aug-17
More, Cathleen 30-Jul-17 24-Sep-17
Serrentino, Joyce 18-Mar-17 30-Apr-17
7 rows selected
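To see why the gr expression works, here is a small Python sketch of the same arithmetic for one person's rows (Thomas, from the question). It assumes, as the query does, that ranges do not overlap. Within a contiguous run, date_to minus the running total of range lengths always equals the day before the run's first date_fr, so it stays constant; a gap of n days shifts it by n days.

```python
from datetime import date, timedelta
from itertools import accumulate

ranges = [  # one person's rows, already ordered by date_fr
    (date(2017, 1, 22), date(2017, 4, 18)),
    (date(2017, 4, 27), date(2017, 5, 14)),
    (date(2017, 5, 15), date(2017, 5, 21)),
    (date(2017, 5, 22), date(2017, 5, 28)),
    (date(2017, 5, 29), date(2017, 5, 31)),
]

# date_to - date_fr + 1, then the running sum (the analytic SUM ... OVER)
lengths = [(to - fr).days + 1 for fr, to in ranges]
running = list(accumulate(lengths))

# gr = date_to - running sum; equal values mark rows of the same contiguous run
gr = [to - timedelta(days=days) for (_, to), days in zip(ranges, running)]

# group by gr, keeping min(date_fr) and max(date_to)
groups = {}
for (fr, to), g in zip(ranges, gr):
    lo, hi = groups.get(g, (fr, to))
    groups[g] = (min(lo, fr), max(hi, to))

for fr, to in sorted(groups.values()):
    print(fr, to)
```

Only two distinct gr values come out of the five rows, one per island.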

Related

Filtering out data based on effective and term dates

My query has both a start and an end date, and I would like to filter the data based on these two dates. I need only data from 2019 onward, based on either the start or the end date. In the example below, I need IDs 1, 2, 3, 6 and 7; IDs 4 and 5 are not required. This can be done by extracting the year from both the start and end dates, but I am looking for a better approach. Thanks!
CREATE TABLE TEMP
(
ID INT,
SDate DATE,
EDate DATE
);
-- DATE literals avoid relying on the session's NLS_DATE_FORMAT
INSERT INTO TEMP
SELECT 1, DATE '2014-01-01', DATE '2019-01-01' FROM DUAL
UNION ALL
SELECT 2, DATE '2015-01-01', DATE '2020-01-01' FROM DUAL
UNION ALL
SELECT 3, DATE '2019-01-01', DATE '2019-12-31' FROM DUAL
UNION ALL
SELECT 4, DATE '2012-01-01', DATE '2018-12-31' FROM DUAL
UNION ALL
SELECT 5, DATE '2010-01-01', DATE '2016-10-01' FROM DUAL
UNION ALL
SELECT 6, DATE '2020-06-01', DATE '2020-10-01' FROM DUAL
UNION ALL
SELECT 7, DATE '2021-01-01', DATE '2021-03-01' FROM DUAL;
Something like this? Sounds like what you said, but - you didn't post how you do it now. If that's not "it", what "better" approach do you need and why? I mean, what's wrong with this?
select *
from temp
where extract(year from sdate) >= 2019
or extract(year from edate) >= 2019;
ID SDATE EDATE
---------- ---------- ----------
1 01/01/2014 01/01/2019
2 01/01/2015 01/01/2020
3 01/01/2019 12/31/2019
6 06/01/2020 10/01/2020
7 01/01/2021 03/01/2021
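Assuming sdate is never later than edate, any row whose sdate falls in 2019 or later also has an edate in 2019 or later, so a single condition on edate is enough:
select * from temp
where edate >= date '2019-01-01';
Because the predicate compares the bare column, it can use an index on edate, if one exists.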
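As a quick sanity check, here is a small Python sketch that applies both predicates to the sample rows from the question. It relies on the assumption that the start date is never later than the end date, which holds for the sample data but is not stated in the question.

```python
from datetime import date

rows = [  # (id, sdate, edate) from the question's sample data
    (1, date(2014, 1, 1), date(2019, 1, 1)),
    (2, date(2015, 1, 1), date(2020, 1, 1)),
    (3, date(2019, 1, 1), date(2019, 12, 31)),
    (4, date(2012, 1, 1), date(2018, 12, 31)),
    (5, date(2010, 1, 1), date(2016, 10, 1)),
    (6, date(2020, 6, 1), date(2020, 10, 1)),
    (7, date(2021, 1, 1), date(2021, 3, 1)),
]
cutoff = date(2019, 1, 1)

# the year test on both columns, written as a date comparison
by_either = [i for i, s, e in rows if s >= cutoff or e >= cutoff]
# the single condition on the end date
by_end = [i for i, s, e in rows if e >= cutoff]

print(by_either)
print(by_end)
```

Both lists contain the same ids, and 4 and 5 are excluded.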

Month counts between dates

I have the table below. I need to count how many ids were active in a given month. I am thinking I will need to create a row for each id for every month it was active, so that the id can be counted in each month. A row should also be generated for the month in which term_dt falls.
active_dt   term_dt     id
1/1/2018    (null)      101
1/1/2018    5/15/2018   102
3/1/2018    6/1/2018    103
1/1/2018    4/25/2018   104
Apparently this is a "count number of overlapping intervals" problem. The algorithm goes like this:
Create a sorted list of all start and end points
Calculate a running sum over this list, add one when you encounter a start and subtract one when you encounter an end
If two points are same then perform subtractions first
You will end up with list of all points where the sum changed
Here is a rough outline of the query. It is for SQL Server but could be ported to any RDBMS that supports window functions:
WITH cte1(date, val) AS (
SELECT active_dt, 1 FROM #t AS t
UNION ALL
SELECT COALESCE(term_dt, '2099-01-01'), -1 FROM #t AS t
-- if end date is null then assume the row is valid indefinitely
), cte2 AS (
SELECT date, SUM(val) OVER(ORDER BY date, val) AS rs
FROM cte1
)
SELECT YEAR(date) AS YY, MONTH(date) AS MM, MAX(rs) AS MaxActiveThisYearMonth
FROM cte2
GROUP BY YEAR(date), MONTH(date)
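The running-sum steps listed above can be sketched in plain Python as follows, using the four rows from the question. FAR_FUTURE is a stand-in for the '2099-01-01' default used in the query; the variable names are mine.

```python
from datetime import date
from itertools import accumulate

FAR_FUTURE = date(2099, 1, 1)  # open-ended rows are treated as valid indefinitely

intervals = [  # (active_dt, term_dt) from the question
    (date(2018, 1, 1), None),
    (date(2018, 1, 1), date(2018, 5, 15)),
    (date(2018, 3, 1), date(2018, 6, 1)),
    (date(2018, 1, 1), date(2018, 4, 25)),
]

# one +1 event per start, one -1 event per end; sorting by (date, val)
# puts the -1 first when a start and an end share a date
events = sorted(
    [(a, 1) for a, _ in intervals] + [(t or FAR_FUTURE, -1) for _, t in intervals]
)
running = list(accumulate(val for _, val in events))

# highest running total seen among the events of each (year, month)
peak = {}
for (d, _), rs in zip(events, running):
    key = (d.year, d.month)
    peak[key] = max(peak.get(key, rs), rs)

for key in sorted(peak):
    print(key, peak[key])
```

Note that only months containing at least one start or end event appear in the result (February 2018 is absent here), and each value is the running total after an event, so this outline would still need a calendar of months joined in to answer the original question fully.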
I was toying with a simpler query, that seemed to do the trick, for Oracle:
with candidates (month_start) as (
select to_date ('2018-' || column_value || '-01','YYYY-MM-DD')
from
table
(sys.odcivarchar2list('01','02','03','04','05',
'06','07','08','09','10','11','12'))
), sample_data (active_dt, term_dt, id) as (
select to_date('01/01/2018', 'MM/DD/YYYY'), null, 101 from dual
union select to_date('01/01/2018', 'MM/DD/YYYY'),
to_date('05/15/2018', 'MM/DD/YYYY'), 102 from dual
union select to_date('03/01/2018', 'MM/DD/YYYY'),
to_date('06/01/2018', 'MM/DD/YYYY'), 103 from dual
union select to_date('01/01/2018', 'MM/DD/YYYY'),
to_date('04/25/2018', 'MM/DD/YYYY'), 104 from dual
)
select c.month_start, count(1)
from candidates c
join sample_data d
on c.month_start between d.active_dt and nvl(d.term_dt,current_date)
group by c.month_start
order by c.month_start
An alternative solution would be to use a hierarchical query, e.g.:
WITH your_table AS (SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, NULL term_dt, 101 ID FROM dual UNION ALL
SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, to_date('15/05/2018', 'dd/mm/yyyy') term_dt, 102 ID FROM dual UNION ALL
SELECT to_date('01/03/2018', 'dd/mm/yyyy') active_dt, to_date('01/06/2018', 'dd/mm/yyyy') term_dt, 103 ID FROM dual UNION ALL
SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, to_date('25/04/2018', 'dd/mm/yyyy') term_dt, 104 ID FROM dual)
SELECT active_month,
COUNT(*) num_active_ids
FROM (SELECT add_months(TRUNC(active_dt, 'mm'), -1 + LEVEL) active_month,
ID
FROM your_table
CONNECT BY PRIOR ID = ID
AND PRIOR sys_guid() IS NOT NULL
AND LEVEL <= FLOOR(months_between(coalesce(term_dt, SYSDATE), active_dt)) + 1)
GROUP BY active_month
ORDER BY active_month;
ACTIVE_MONTH NUM_ACTIVE_IDS
------------ --------------
01/01/2018 3
01/02/2018 3
01/03/2018 4
01/04/2018 4
01/05/2018 3
01/06/2018 2
01/07/2018 1
01/08/2018 1
01/09/2018 1
01/10/2018 1
Whether this is more or less performant than the other answers is up to you to test.
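For comparison, here is a minimal Python sketch of the same "one row per id per month" idea. AS_OF is a hypothetical fixed date standing in for SYSDATE so that the result is reproducible, and the sketch assumes an id counts for every calendar month it touches (which agrees with the query above when active_dt is the first of a month).

```python
from collections import Counter
from datetime import date

AS_OF = date(2018, 10, 15)  # hypothetical stand-in for SYSDATE

rows = [  # (active_dt, term_dt) for ids 101..104
    (date(2018, 1, 1), None),
    (date(2018, 1, 1), date(2018, 5, 15)),
    (date(2018, 3, 1), date(2018, 6, 1)),
    (date(2018, 1, 1), date(2018, 4, 25)),
]

def month_starts(first, last):
    """Yield the first day of every month from first's month to last's month."""
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month):
        yield date(y, m, 1)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)

# expand each id into its active months, then count ids per month
counts = Counter()
for active_dt, term_dt in rows:
    counts.update(month_starts(active_dt, term_dt or AS_OF))

for month in sorted(counts):
    print(month, counts[month])
```

With AS_OF in October 2018 this gives the same ten rows as the output shown above.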

Return Data based Relevant Dates

I have an occupancy table and a pay history table. I want to return the state that the employee is in from the occupancy at the time of the relevant pay.
Occupancy Table
Emp#|Commence Date|State
-----|-------------|----
101 | 1/01/2016 | VIC
101 | 1/04/2016 | NSW
101 | 1/08/2016 | ACT
Pay History Table
Emp#|Pay Date
----|--------
101 |15/01/2016
101 |15/02/2016
101 |15/03/2016
101 |15/04/2016
101 |15/05/2016
101 |15/06/2016
101 |15/07/2016
101 |15/08/2016
101 |15/09/2016
I'm wanting to return the following
Emp#|:Pay Date:|State
----|----------|-----
101 |15/01/2016|VIC
101 |15/02/2016|VIC
101 |15/03/2016|VIC
101 |15/04/2016|NSW
101 |15/05/2016|NSW
101 |15/06/2016|NSW
101 |15/07/2016|NSW
101 |15/08/2016|ACT
101 |15/09/2016|ACT
Can someone assist, please
You need to generate the end_date in the occupancy table in a subquery; the lead() function is perfect for this purpose. I use it with all three arguments - the third argument gives a "default" date, which I chose arbitrarily as 15 December 2099, for the "current" status. Then it's a simple join on empno and a date-range condition on the dates.
I assume you have more than one empno in your data, so I accommodated that. Then: I don't know if # is legal in Oracle column names, but I didn't want to try; I changed to empno. And names definitely can't have spaces in them unless you quote the names, which has many disadvantages; I worked around that too.
with
occupancy ( empno, commence_date, state ) as (
select 101, to_date('1/01/2016', 'dd/mm/yyyy'), 'VIC' from dual union all
select 101, to_date('1/04/2016', 'dd/mm/yyyy'), 'NSW' from dual union all
select 101, to_date('1/08/2016', 'dd/mm/yyyy'), 'ACT' from dual
),
pay_history ( empno, pay_date ) as (
select 101, to_date('15/01/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/02/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/03/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/04/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/05/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/06/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/07/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/08/2016', 'dd/mm/yyyy') from dual union all
select 101, to_date('15/09/2016', 'dd/mm/yyyy') from dual
)
-- end of test data (not part of the SQL query); query begins below this line
select p.empno, p.pay_date, o.state
from pay_history p inner join (
select empno, commence_date,
lead(commence_date, 1, date '2099-12-15')
over (partition by empno order by commence_date) as end_date,
state
from occupancy ) o
on p.empno = o.empno
and p.pay_date >= o.commence_date
and p.pay_date < o.end_date -- half-open range: a pay date equal to the next commence_date matches only the newer row
order by empno, pay_date -- if needed
;
Output:
EMPNO PAY_DATE STATE
----- ---------- -----
101 15/01/2016 VIC
101 15/02/2016 VIC
101 15/03/2016 VIC
101 15/04/2016 NSW
101 15/05/2016 NSW
101 15/06/2016 NSW
101 15/07/2016 NSW
101 15/08/2016 ACT
101 15/09/2016 ACT
9 rows selected.
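The idea behind lead() here is that each occupancy row is valid from its commence_date up to, but not including, the next row's commence_date. A minimal Python sketch of that lookup for a single employee, using bisect on the sorted commence dates (the function name state_on is my own):

```python
from bisect import bisect_right
from datetime import date

occupancy = [  # (commence_date, state) for one empno, sorted by date
    (date(2016, 1, 1), "VIC"),
    (date(2016, 4, 1), "NSW"),
    (date(2016, 8, 1), "ACT"),
]
starts = [d for d, _ in occupancy]

def state_on(pay_date):
    """Return the state whose commence_date is the latest one <= pay_date."""
    i = bisect_right(starts, pay_date) - 1
    return occupancy[i][1] if i >= 0 else None

for month in range(1, 10):
    pay_date = date(2016, month, 15)
    print(pay_date, state_on(pay_date))
```

A pay date that falls exactly on a commence date picks up the new state, and a pay date before the first commence date gets no state at all.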
Negotiating against myself here :-) I am posting this as a separate Answer, rather than editing my earlier one, because this is indeed a different answer.
Please refer to my other Answer for the input data and sample output - they are the same. Only the query is different. Instead of a join, we can UNION ALL the two tables (with some necessary adjustments: add a null column for state to the pay_history table, and a flag of 0 for the occupancy table and 1 for the pay_history table); then use the last_value() analytic function on the resulting union, and filter out the rows from the occupancy table in the outermost query. This may be quite a bit faster than the join-based solution.
select empno, dt as pay_date, state
from (
select empno, dt, flag,
last_value(state ignore nulls)
over (partition by empno order by dt, flag) as state
from (
select empno, commence_date as dt, state, 0 as flag
from occupancy
union all
select empno, pay_date, null, 1
from pay_history
)
)
where flag = 1
order by empno, pay_date -- if needed
;
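Here is a rough Python sketch of what the UNION ALL plus last_value(... ignore nulls) combination is doing, for a single employee: both tables are merged into one date-ordered stream, the most recent non-null state is carried forward, and only the pay rows are kept. The tuple layout (dt, flag, state) mirrors the columns of the inner query; everything else is illustrative.

```python
from datetime import date

occupancy = [
    (date(2016, 1, 1), "VIC"),
    (date(2016, 4, 1), "NSW"),
    (date(2016, 8, 1), "ACT"),
]
pay_dates = [date(2016, m, 15) for m in range(1, 10)]

# UNION ALL: flag 0 for occupancy rows, flag 1 for pay rows (state is null)
events = [(d, 0, s) for d, s in occupancy] + [(d, 1, None) for d in pay_dates]
# ORDER BY dt, flag: on equal dates the occupancy row comes first
events.sort(key=lambda e: (e[0], e[1]))

current = None
result = []
for dt, flag, state in events:
    if state is not None:
        current = state           # last non-null state seen so far
    if flag == 1:
        result.append((dt, current))  # WHERE flag = 1 keeps only pay rows

for row in result:
    print(row)
```

The whole thing is a single ordered pass over the combined rows, which is why this approach may do less work than a range join.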

Finding missing dates in a sequence

I have the following table with ID and DATE
ID DATE
123 7/1/2015
123 6/1/2015
123 5/1/2015
123 4/1/2015
123 9/1/2014
123 8/1/2014
123 7/1/2014
123 6/1/2014
456 11/1/2014
456 10/1/2014
456 9/1/2014
456 8/1/2014
456 5/1/2014
456 4/1/2014
456 3/1/2014
789 9/1/2014
789 8/1/2014
789 7/1/2014
789 6/1/2014
789 5/1/2014
789 4/1/2014
789 3/1/2014
In this table, I have three customer IDs (123, 456, 789) and a date column that shows which months they worked.
I want to find out which customers have a gap in their work.
Our customers' work records are kept per month, so the dates are monthly,
and each customer has different start and end dates.
Expected results:
ID First_Absent_date
123 10/01/2014
456 06/01/2014
To get a simple list of the IDs with gaps, with no further details, you need to look at each ID separately, and as @mikey suggested you can count the number of months and compare that with the number of months spanned by the first and last dates.
If your table has a column called month (since date isn't allowed unless it's a quoted identifier) you could start with:
select id, count(month), min(month), max(month),
months_between(max(month), min(month)) + 1 as diff
from your_table
group by id
order by id;
ID COUNT(MONTH) MIN(MONTH) MAX(MONTH) DIFF
---------- ------------ ---------- ---------- ----------
123 8 01-JUN-14 01-JUL-15 14
456 7 01-MAR-14 01-NOV-14 9
789 7 01-MAR-14 01-SEP-14 7
Then compare the count with the month span, in a having clause:
select id
from your_table
group by id
having count(month) != months_between(max(month), min(month)) + 1
order by id;
ID
----------
123
456
If you can actually have multiple records in a month for an ID, and/or the date recorded might not be the start of the month, you can do a bit more work to normalise the dates:
select id,
count(distinct trunc(month, 'MM')),
min(trunc(month, 'MM')),
max(trunc(month, 'MM')),
months_between(max(trunc(month, 'MM')), min(trunc(month, 'MM'))) + 1 as diff
from your_table
group by id
order by id;
select id
from your_table
group by id
having count(distinct trunc(month, 'MM')) !=
months_between(max(trunc(month, 'MM')), min(trunc(month, 'MM'))) + 1
order by id;
Oracle Setup:
CREATE TABLE your_table ( ID, "DATE" ) AS
SELECT 123, DATE '2015-07-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-06-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-05-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-04-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-07-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-06-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-11-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-10-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-05-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-04-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-03-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-07-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-06-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-05-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-04-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-03-01' FROM DUAL;
Query:
SELECT ID,
MIN( missing_date )
FROM (
SELECT ID,
CASE WHEN LEAD( "DATE" ) OVER ( PARTITION BY ID ORDER BY "DATE" )
= ADD_MONTHS( "DATE", 1 ) THEN NULL
WHEN LEAD( "DATE" ) OVER ( PARTITION BY ID ORDER BY "DATE" )
IS NULL THEN NULL
ELSE ADD_MONTHS( "DATE", 1 )
END AS missing_date
FROM your_table
)
GROUP BY ID
HAVING COUNT( missing_date ) > 0;
Output:
ID MIN(MISSING_DATE)
---------- -------------------
123 2014-10-01 00:00:00
456 2014-06-01 00:00:00
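The LEAD comparison can be sketched in Python like this: walk each ID's months in order and report the first place where the next month is not exactly one month later. The sketch assumes every date is the first day of a month, as in the question; add_month and first_missing are hypothetical helper names.

```python
from datetime import date

def add_month(d):
    """First day of the month after d (d is assumed to be a first-of-month date)."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)

def first_missing(months):
    """Return the first absent month, or None if the sequence has no gap."""
    ordered = sorted(months)
    for cur, nxt in zip(ordered, ordered[1:]):  # nxt plays the role of LEAD
        if nxt != add_month(cur):
            return add_month(cur)
    return None

data = {
    123: [date(2014, m, 1) for m in (6, 7, 8, 9)] + [date(2015, m, 1) for m in (4, 5, 6, 7)],
    456: [date(2014, m, 1) for m in (3, 4, 5, 8, 9, 10, 11)],
    789: [date(2014, m, 1) for m in (3, 4, 5, 6, 7, 8, 9)],
}

for cust_id, months in data.items():
    missing = first_missing(months)
    if missing is not None:
        print(cust_id, missing)
```

ID 789 has no gap and is left out, matching the HAVING clause in the query.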
You could use the LAG() function to see whether records have been skipped for a particular date. LAG() lets you compare the data in the current row with the previous row, so if we order by date, we can easily compare and find any gaps.
select * from
(
select ID,DATE_, case when DATE_DIFF>1 then 1 else 0 end comparison from
(
select ID, DATE_ ,DATE_-LAG(DATE_, 1) OVER (PARTITION BY ID ORDER BY DATE_) date_diff from trial
)
)
where comparison=1 order by ID,DATE_;
This partitions the entries by id and orders the records by date within each id. If a customer is always present, there would be no gap in the dates, so anyone with a date difference greater than 1 had a gap. You could tweak this as per your requirement.
EDIT: Looking more closely at the answers above, I noticed that you store dates in mm/dd/yyyy format and keep only the first day of each month. So the query above can be tweaked as follows:
select * from
(
select ID,DATE_,PREV_DATE,last_day(PREV_DATE)+1 ABSENT_DATE, case when DATE_DIFF>31 then 1 else 0 end comparison from
(
select ID, DATE_ ,LAG(DATE_,1) OVER (PARTITION BY ID ORDER BY DATE_) PREV_DATE,DATE_-LAG(DATE_, 1) OVER (PARTITION BY ID ORDER BY DATE_) date_diff from trial
)
)
where comparison=1 order by ID,DATE_;

Oracle SQL Row Number selection

The rows below all relate to the same record in the same file. Basically, it is labelled 'UNK' until someone assigns a product number to it. In this case the number 12345678 was assigned by Paul on 01Jan. Each row before or after that is logged when someone changes something on that record.
What I want is to capture the row where the record first goes from UNK to a number, and capture the user name, date, etc. from that line.
I have tried min, least, and I'm not sure about rownum or where to put the string if I did.
Car_Id Product # user name date
111 unk john 20Dec
111 unk alan 25Dec
111 unk pete 30Dec
111 12345678 paul 01Jan
111 12345678 jim 10Jan
222 unk alan 25Dec
222 unk pete 30Dec
222 87654321 paul 02Jan
222 87654321 steve 05Jan
But in logical terms I want it to do this... give me the 1st record after UNK.
Please can I have the full string if possible.
Correct me if I am wrong, but your data seems to be ordered by date, so logically you could just take the first record where the product number is not "unk". Note that this returns a single row overall, not one row per Car_Id.
Select *
From (SELECT * FROM YourTable ORDER BY dt) t -- dt stands for your date column (DATE itself is a reserved word); make sure data is ordered before selecting it
where t.ProductNr <> 'unk' and -- don't get data without a number
rownum = 1 -- take the first
Sounds like maybe the analytic function row_number() would be the best way to do this:
with sample_data as (select 111 car_id, 'unk' product#, 'john' user_name, to_date('20/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 111 car_id, 'unk' product#, 'alan' user_name, to_date('21/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 111 car_id, 'unk' product#, 'pete' user_name, to_date('22/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 111 car_id, '12345678' product#, 'paul' user_name, to_date('23/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 111 car_id, '12345678' product#, 'jim' user_name, to_date('24/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 222 car_id, 'unk' product#, 'alan' user_name, to_date('25/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 222 car_id, 'unk' product#, 'pete' user_name, to_date('26/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 222 car_id, '87654321' product#, 'paul' user_name, to_date('27/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual union all
select 222 car_id, '87654321' product#, 'steve' user_name, to_date('28/12/2014 10:12:24', 'dd/mm/yyyy hh24:mi:ss') dt from dual)
select car_id,
product#,
user_name,
dt
from (select sd.*,
row_number() over (partition by car_id order by dt) rn
from sample_data sd
where product# != 'unk')
where rn = 1;
CAR_ID PRODUCT# USER_NAME DT
---------- -------- --------- ---------------------
111 12345678 paul 23/12/2014 10:12:24
222 87654321 paul 27/12/2014 10:12:24
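In case it helps to see the logic outside SQL, here is a minimal Python sketch of "row_number() = 1 per car_id after dropping the unk rows", using the same sample data as the query above (times omitted for brevity):

```python
from datetime import date

rows = [  # (car_id, product, user_name, dt)
    (111, "unk", "john", date(2014, 12, 20)),
    (111, "unk", "alan", date(2014, 12, 21)),
    (111, "unk", "pete", date(2014, 12, 22)),
    (111, "12345678", "paul", date(2014, 12, 23)),
    (111, "12345678", "jim", date(2014, 12, 24)),
    (222, "unk", "alan", date(2014, 12, 25)),
    (222, "unk", "pete", date(2014, 12, 26)),
    (222, "87654321", "paul", date(2014, 12, 27)),
    (222, "87654321", "steve", date(2014, 12, 28)),
]

first = {}
# order by car_id, then date; keep the first non-unk row seen for each car
for car_id, product, user_name, dt in sorted(rows, key=lambda r: (r[0], r[3])):
    if product != "unk" and car_id not in first:
        first[car_id] = (product, user_name, dt)

for car_id, (product, user_name, dt) in first.items():
    print(car_id, product, user_name, dt)
```

Like the query, this assumes a record never goes back to unk once a number has been assigned; if it could, "first non-unk row" and "first row after the last unk" would differ.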