Finding missing dates in a sequence - sql

I have the following table with ID and DATE columns:
ID DATE
123 7/1/2015
123 6/1/2015
123 5/1/2015
123 4/1/2015
123 9/1/2014
123 8/1/2014
123 7/1/2014
123 6/1/2014
456 11/1/2014
456 10/1/2014
456 9/1/2014
456 8/1/2014
456 5/1/2014
456 4/1/2014
456 3/1/2014
789 9/1/2014
789 8/1/2014
789 7/1/2014
789 6/1/2014
789 5/1/2014
789 4/1/2014
789 3/1/2014
This table has three customer IDs (123, 456, 789) and a date column showing the months in which each customer worked.
I want to find out which customers have a gap in their work.
Our customers' work records are kept per month, so the dates are monthly,
and each customer has different start and end dates.
Expected results:
ID First_Absent_date
123 10/01/2014
456 06/01/2014

To get a simple list of the IDs with gaps, with no further details, you need to look at each ID separately. As @mikey suggested, you can count the number of months and compare that count with the number of months spanned by the first and last dates.
If your table has a column called month (since date isn't allowed unless it's a quoted identifier) you could start with:
select id, count(month), min(month), max(month),
months_between(max(month), min(month)) + 1 as diff
from your_table
group by id
order by id;
ID COUNT(MONTH) MIN(MONTH) MAX(MONTH) DIFF
---------- ------------ ---------- ---------- ----------
123 8 01-JUN-14 01-JUL-15 14
456 7 01-MAR-14 01-NOV-14 9
789 7 01-MAR-14 01-SEP-14 7
Then compare the count with the month span, in a having clause:
select id
from your_table
group by id
having count(month) != months_between(max(month), min(month)) + 1
order by id;
ID
----------
123
456
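The same count-versus-span check can be sketched outside the database. The following is a minimal Python illustration, not part of the original answer; the function name `ids_with_gaps` and the tuple layout `(id, date)` are assumptions made for the example.

```python
from datetime import date

def ids_with_gaps(rows):
    """Return the sorted IDs whose distinct month count differs from the month span."""
    months_by_id = {}
    for cust_id, day in rows:
        # Reduce each date to a month index so that duplicates within a month collapse.
        months_by_id.setdefault(cust_id, set()).add(day.year * 12 + day.month)
    return sorted(
        cust_id
        for cust_id, months in months_by_id.items()
        if len(months) != max(months) - min(months) + 1
    )

rows = [
    (123, date(2014, 6, 1)), (123, date(2014, 7, 1)), (123, date(2014, 9, 1)),
    (789, date(2014, 3, 1)), (789, date(2014, 4, 1)), (789, date(2014, 5, 1)),
]
print(ids_with_gaps(rows))  # [123]
```

Using a set of month indexes plays the same role as count(distinct trunc(month, 'MM')): several rows in one month are counted once.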
If you can actually have multiple records in a month for an ID, and/or the date recorded might not be the start of the month, you can do a bit more work to normalise the dates:
select id,
count(distinct trunc(month, 'MM')),
min(trunc(month, 'MM')),
max(trunc(month, 'MM')),
months_between(max(trunc(month, 'MM')), min(trunc(month, 'MM'))) + 1 as diff
from your_table
group by id
order by id;
select id
from your_table
group by id
having count(distinct trunc(month, 'MM')) !=
months_between(max(trunc(month, 'MM')), min(trunc(month, 'MM'))) + 1
order by id;

Oracle Setup:
CREATE TABLE your_table ( ID, "DATE" ) AS
SELECT 123, DATE '2015-07-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-06-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-05-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-04-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-07-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-06-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-11-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-10-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-05-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-04-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-03-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-07-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-06-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-05-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-04-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-03-01' FROM DUAL;
Query:
SELECT ID,
MIN( missing_date )
FROM (
SELECT ID,
CASE WHEN LEAD( "DATE" ) OVER ( PARTITION BY ID ORDER BY "DATE" )
= ADD_MONTHS( "DATE", 1 ) THEN NULL
WHEN LEAD( "DATE" ) OVER ( PARTITION BY ID ORDER BY "DATE" )
IS NULL THEN NULL
ELSE ADD_MONTHS( "DATE", 1 )
END AS missing_date
FROM your_table
)
GROUP BY ID
HAVING COUNT( missing_date ) > 0;
Output:
ID MIN(MISSING_DATE)
---------- -------------------
123 2014-10-01 00:00:00
456 2014-06-01 00:00:00
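For readers who want to trace the LEAD() logic step by step, here is a small Python sketch of the same idea: pair each month with its successor and report the first expected month that is absent. The helper names `add_month` and `first_missing_month` are made up for this illustration, and the sketch assumes every date is the first day of a month.

```python
from datetime import date

def add_month(day):
    """Return the first day of the month after `day` (assumes day-of-month is 1)."""
    return date(day.year + day.month // 12, day.month % 12 + 1, 1)

def first_missing_month(rows):
    """Map each ID that has a gap to its first missing month."""
    by_id = {}
    for cust_id, day in rows:
        by_id.setdefault(cust_id, set()).add(day)
    result = {}
    for cust_id, days in by_id.items():
        ordered = sorted(days)
        # zip(ordered, ordered[1:]) pairs each row with the next one, like LEAD().
        for current, following in zip(ordered, ordered[1:]):
            expected = add_month(current)
            if following != expected:
                result[cust_id] = expected
                break
    return result

rows = [
    (456, date(2014, 3, 1)), (456, date(2014, 4, 1)),
    (456, date(2014, 5, 1)), (456, date(2014, 8, 1)),
    (789, date(2014, 3, 1)), (789, date(2014, 4, 1)),
]
print(first_missing_month(rows))  # {456: datetime.date(2014, 6, 1)}
```

The last row of each ID has no successor and is never compared, which corresponds to the IS NULL branch of the CASE expression.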

You could use the LAG() function to see whether records have been skipped. LAG() lets you compare the data in the current row with the previous row, so if we order by DATE_ we can easily compare adjacent rows and find any gaps.
select * from
(
select ID,DATE_, case when DATE_DIFF>1 then 1 else 0 end comparison from
(
select ID, DATE_ ,DATE_-LAG(DATE_, 1) OVER (PARTITION BY ID ORDER BY DATE_) date_diff from trial
)
)
where comparison=1 order by ID,DATE_;
This partitions the entries by ID and orders the records within each ID by date. If a customer is always present, there is no gap between consecutive dates, so any row with a date difference greater than 1 follows a gap. You could tweak this as per your requirement.
EDIT: Looking more closely at the answers above, I noticed that you are storing dates in mm/dd/yyyy format, with only the first day of every month. Consecutive months are then at most 31 days apart, so the query can be tweaked as follows:
select * from
(
select ID,DATE_,PREV_DATE,last_day(PREV_DATE)+1 ABSENT_DATE, case when DATE_DIFF>31 then 1 else 0 end comparison from
(
select ID, DATE_ ,LAG(DATE_,1) OVER (PARTITION BY ID ORDER BY DATE_) PREV_DATE,DATE_-LAG(DATE_, 1) OVER (PARTITION BY ID ORDER BY DATE_) date_diff from trial
)
)
where comparison=1 order by ID,DATE_;

Related

compare the value with the previous line oracle sql ORACLE

I have a table like this.
Date Enddate
20012022 21012022
21012022 23012022
23012022 24012022
20012022 26012022
26012022 27012022
27012022 27012022
Each row's date should equal the previous row's enddate. How do I find rows that don't follow this rule? In the example, that is line 4 (previous enddate 24012022, next date 20012022).
I tried to use lag(), but I can't understand how it works. Thanks for helping.
Here's one option.
Sample data:
SQL> with test (datum, enddatum) as
2 (select date '2022-01-20', date '2022-01-21' from dual union all
3 select date '2022-01-21', date '2022-01-23' from dual union all
4 select date '2022-01-23', date '2022-01-24' from dual union all
5 select date '2022-01-20', date '2022-12-26' from dual union all
6 select date '2022-12-26', date '2022-12-27' from dual union all
7 select date '2022-12-27', date '2022-12-27' from dual
8 ),
Query begins here: find previous enddatum so that you could compare it to datum (line #17):
9 temp as
10 (select datum,
11 enddatum,
12 lag(enddatum) over (order by enddatum) previous_enddatum
13 from test
14 )
15 select datum, enddatum
16 from temp
17 where datum <> previous_enddatum;
DATUM ENDDATUM
---------- ----------
20.01.2022 26.12.2022
SQL>
The result of the LAG() function depends on the query's partition clause and order by clause. Here are two queries that give different results depending on whether the rows are ordered by start or end date:
Your sample data:
WITH
tbl (START_DATE, END_DATE) as
( Select DATE '2022-01-20', DATE '2022-01-21' From dual Union All
Select DATE '2022-01-21', DATE '2022-01-23' From dual Union All
Select DATE '2022-01-23', DATE '2022-01-24' From dual Union All
Select DATE '2022-01-20', DATE '2022-12-26' From dual Union All
Select DATE '2022-12-26', DATE '2022-12-27' From dual Union All
Select DATE '2022-12-27', DATE '2022-12-27' From dual
)
Using Order By END_DATE:
Select START_DATE, END_DATE,
CASE
WHEN START_DATE != LAG(END_DATE) OVER(ORDER BY END_DATE)
THEN 'Should be ' || LAG(END_DATE) OVER(ORDER BY END_DATE)
END "END_DATE_CHECK"
From tbl
START_DATE END_DATE END_DATE_CHECK
---------- --------- -------------------
20-JAN-22 21-JAN-22
21-JAN-22 23-JAN-22
23-JAN-22 24-JAN-22
20-JAN-22 26-DEC-22 Should be 24-JAN-22
26-DEC-22 27-DEC-22
27-DEC-22 27-DEC-22
Using Order By START_DATE
Select START_DATE, END_DATE,
CASE
WHEN START_DATE != LAG(END_DATE) OVER(ORDER BY START_DATE)
THEN 'Should be ' || LAG(END_DATE) OVER(ORDER BY START_DATE)
END "END_DATE_CHECK"
From tbl
START_DATE END_DATE END_DATE_CHECK
---------- --------- -------------------
20-JAN-22 21-JAN-22
20-JAN-22 26-DEC-22 Should be 21-JAN-22
21-JAN-22 23-JAN-22 Should be 26-DEC-22
23-JAN-22 24-JAN-22
26-DEC-22 27-DEC-22 Should be 24-JAN-22
27-DEC-22 27-DEC-22
It looks like something is missing from your sample data (an ID column, maybe). Suppose there is a column that the dates belong to, so that we can partition the dates by it as below. The check then reports no problems at all:
Using Partition By:
WITH
tbl (ID, START_DATE, END_DATE) as
( Select 1, DATE '2022-01-20', DATE '2022-01-21' From dual Union All
Select 1, DATE '2022-01-21', DATE '2022-01-23' From dual Union All
Select 1, DATE '2022-01-23', DATE '2022-01-24' From dual Union All
Select 2, DATE '2022-01-20', DATE '2022-12-26' From dual Union All
Select 2, DATE '2022-12-26', DATE '2022-12-27' From dual Union All
Select 2, DATE '2022-12-27', DATE '2022-12-27' From dual
)
Select ID, START_DATE, END_DATE,
CASE
WHEN START_DATE != LAG(END_DATE) OVER(Partition By ID ORDER BY START_DATE)
THEN 'Should be ' || LAG(END_DATE) OVER(Partition By ID ORDER BY START_DATE)
END "END_DATE_CHECK"
From tbl
ID START_DATE END_DATE END_DATE_CHECK
---------- ---------- --------- -------------------
1 20-JAN-22 21-JAN-22
1 21-JAN-22 23-JAN-22
1 23-JAN-22 24-JAN-22
2 20-JAN-22 26-DEC-22
2 26-DEC-22 27-DEC-22
2 27-DEC-22 27-DEC-22
In this case it makes no difference whether you order by start or end date.
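To see what the partitioned LAG() comparison does row by row, here is a minimal Python sketch. It is only an illustration under assumptions: the function name `chain_breaks` is invented, the rows are `(id, start, end)` tuples, and the third row of ID 1 was altered on purpose so that one break shows up.

```python
from datetime import date

def chain_breaks(rows):
    """Return (id, start, end, previous_end) for rows whose start differs
    from the end of the previous row with the same id."""
    previous_end = {}
    breaks = []
    # Sorting the tuples orders by id, then start: PARTITION BY id ORDER BY start.
    for row_id, start, end in sorted(rows):
        prev = previous_end.get(row_id)  # None for the first row of an id, like LAG()
        if prev is not None and start != prev:
            breaks.append((row_id, start, end, prev))
        previous_end[row_id] = end
    return breaks

rows = [
    (1, date(2022, 1, 20), date(2022, 1, 21)),
    (1, date(2022, 1, 21), date(2022, 1, 23)),
    (1, date(2022, 1, 25), date(2022, 1, 26)),   # deliberately broken chain
    (2, date(2022, 1, 20), date(2022, 12, 26)),
    (2, date(2022, 12, 26), date(2022, 12, 27)),
]
for row in chain_breaks(rows):
    print(row)
```

Keeping one "previous end" per id is what the partition clause achieves: rows of ID 2 are never compared with rows of ID 1.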

Grouping by Date inclusivity

Here is the data I'm working with:
Accountid Month
123 08/01/2021
123 09/01/2021
123 03/01/2022
123 04/01/2022
123 05/01/2022
123 06/01/2022
I'm trying to insert into a new table where the data is like this:
Accountid Start Month End Month
123 08/01/2021 09/01/2021
123 03/01/2022 06/01/2022
I'm not sure how to separate them with the gap, and group by the account id in this case.
Thanks in advance
In 12c+ you may also use match_recognize for gaps-and-islands problems to define grouping rules (islands) in a more readable and natural way.
select *
from input_
match_recognize(
partition by accountid
order by month asc
measures
first(month) as start_month,
last(month) as end_month
/*Any month followed by any number of subsequent month */
pattern(any_ next*)
define
/*Next is the month right after the previous one*/
next as months_between(month, prev(month)) = 1
)
ACCOUNTID START_MONTH END_MONTH
123 2021-08-01 2021-09-01
123 2022-03-01 2022-06-01
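That's a gaps and islands problem; here's one option. Note that the sample data below reads the values as consecutive days in January 2021, which is why subtracting ROW_NUMBER() from the Julian day number works; for month-granularity data you would instead subtract ROW_NUMBER() months, e.g. with ADD_MONTHS(month, -ROW_NUMBER() OVER (...)).
Sample data: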
with test (accountid, month) as
  (select 123, date '2021-01-08' from dual union all
   select 123, date '2021-01-09' from dual union all
   select 123, date '2021-01-03' from dual union all
   select 123, date '2021-01-04' from dual union all
   select 123, date '2021-01-05' from dual union all
   select 123, date '2021-01-06' from dual
  ),
Query begins here:
temp as
  (select accountid, month,
          to_char(month, 'J') - row_number() over
            (partition by accountid order by month) diff
   from test
  )
select accountid,
       min(month) as start_month,
       max(month) as end_month
from temp
group by accountid, diff
order by accountid, start_month;
ACCOUNTID START_MONT END_MONTH
---------- ---------- ----------
123 03/01/2021 06/01/2021
123 08/01/2021 09/01/2021
Although related to MS SQL Server, have a look at Introduction to Gaps and Islands Analysis; should be interesting reading for you, I presume.
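The core trick of the gaps-and-islands technique (a sequence value minus the row number stays constant within a run) can be sketched in Python for month-granularity data such as the question's. This is an illustration only; the function name `month_islands` and the month-index arithmetic are assumptions, and every date is assumed to be the first day of a month.

```python
from datetime import date
from itertools import groupby

def month_islands(rows):
    """Collapse (account, month) rows into (account, start_month, end_month) runs."""
    islands = []
    for account, group in groupby(sorted(set(rows)), key=lambda r: r[0]):
        months = [m for _, m in group]
        # Month index minus row position is constant inside a run of consecutive months.
        keyed = [(m.year * 12 + m.month - i, m) for i, m in enumerate(months)]
        for _, run in groupby(keyed, key=lambda k: k[0]):
            run_months = [m for _, m in run]
            islands.append((account, run_months[0], run_months[-1]))
    return islands

rows = [
    (123, date(2021, 8, 1)), (123, date(2021, 9, 1)),
    (123, date(2022, 3, 1)), (123, date(2022, 4, 1)),
    (123, date(2022, 5, 1)), (123, date(2022, 6, 1)),
]
for island in month_islands(rows):
    print(island)
# (123, datetime.date(2021, 8, 1), datetime.date(2021, 9, 1))
# (123, datetime.date(2022, 3, 1), datetime.date(2022, 6, 1))
```

Grouping by the constant difference corresponds to GROUP BY accountid, diff in the SQL version.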

How to get min and max from 2 tables in SQL

I am trying to get the start date from the min ID (ID=1) and the end date from the max ID (ID=3), but I am not sure how to retrieve them. Following is my data:
Table1 and Table2 are the source tables. I am trying to get output like the third table.
My requirement is to get the start date from the first record and the end date from the last record. The first and last records can be recognized with the help of the ID field: the min ID marks the first record and the max ID marks the last record.
Please help me!
Here's one option, presuming you use Oracle (since you use Oracle SQL Developer). The x inline view selects:
start_date, which belongs to the row with the lowest ID column value for that name (i.e. first_value partition by name order by id)
end_date, which belongs to the row with the highest ID column value for that name (i.e. first_value partition by name order by id DESC)
with
  -- sample data
  t1 (pid, name) as
    (select 123, 'xyz' from dual union all
     select 234, 'pqr' from dual
    ),
  t2 (id, name, start_date, end_date) as
    (select 1, 'xyz', date '2020-01-01', date '2020-07-20' from dual union all
     select 2, 'xyz', date '2020-02-01', date '2020-05-30' from dual union all
     select 3, 'xyz', date '2020-06-30', date '2020-07-30' from dual union all
     --
     select 1, 'pqr', date '2020-04-30', date '2020-09-30' from dual union all
     select 2, 'pqr', date '2020-05-30', date '2020-09-30' from dual union all
     select 3, 'pqr', date '2020-06-30', date '2020-07-01' from dual
    )
select a.pid,
       x.name,
       max(x.start_date) start_date,
       max(x.end_date) end_date
from t1 a join
  (
   -- start_date: always for the lowest T2.ID value row
   -- end_date : always for the highest T2.ID value row
   select b.name,
          first_value(b.start_date) over (partition by b.name order by b.id ) start_date,
          first_value(b.end_date) over (partition by b.name order by b.id desc) end_date
   from t2 b
  ) x
  on a.name = x.name
group by a.pid,
         x.name
order by a.pid;
PID NAME START_DATE END_DATE
---------- ---- ---------- ----------
123 xyz 01/01/2020 07/30/2020
234 pqr 04/30/2020 07/01/2020
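As a cross-check of the "start date from the lowest ID, end date from the highest ID" rule, here is a small Python sketch using the same sample rows as the t2 data above. The function name `first_start_last_end` and the tuple layout `(id, name, start_date, end_date)` are assumptions for this illustration.

```python
from datetime import date

def first_start_last_end(rows):
    """Map each name to (start_date of its lowest-id row, end_date of its highest-id row)."""
    result = {}
    for name in sorted({r[1] for r in rows}):
        group = [r for r in rows if r[1] == name]
        first = min(group, key=lambda r: r[0])  # row with the lowest id
        last = max(group, key=lambda r: r[0])   # row with the highest id
        result[name] = (first[2], last[3])
    return result

t2 = [
    (1, 'xyz', date(2020, 1, 1), date(2020, 7, 20)),
    (2, 'xyz', date(2020, 2, 1), date(2020, 5, 30)),
    (3, 'xyz', date(2020, 6, 30), date(2020, 7, 30)),
    (1, 'pqr', date(2020, 4, 30), date(2020, 9, 30)),
    (2, 'pqr', date(2020, 5, 30), date(2020, 9, 30)),
    (3, 'pqr', date(2020, 6, 30), date(2020, 7, 1)),
]
print(first_start_last_end(t2)['xyz'])  # (datetime.date(2020, 1, 1), datetime.date(2020, 7, 30))
```

Note that the end date is taken from the highest-id row even when an earlier row has a later end date, as with 'pqr'.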

Oracle SQL to find population by percentage range

I have a table with customers, purchase date and zip code. Key is (customer_id, purchase_dt and zip_cd)
I am trying to find zip codes where customers are doing business, ranges like 80% and above, 60 - 80%, 40-60%. Can someone help me out with a query to achieve this.
with tmp as
(
select 123 as cust_id, date '2017-01-01' purchase_dt, '10035' zip_cd from dual
union
select 1234 as cust_id, date '2019-06-01' purchase_dt, '11377' zip_cd from dual
union
select 12345 as cust_id, date '2019-07-01' purchase_dt, '11377' zip_cd from dual
union
select 234 as cust_id, date '2019-08-01' purchase_dt, '11377' zip_cd from dual
union
select 2345 as cust_id, date '2019-09-01' purchase_dt, '11417' zip_cd from dual
)
select * from tmp;
Expected output:
80% and above zip code: 11377 and so on..
You can combine the aggregate COUNT with the analytic COUNT as follows:
with tmp as
(
select 123 as cust_id, date '2017-01-01' purchase_dt, '10035' zip_cd from dual
union
select 1234 as cust_id, date '2019-06-01' purchase_dt, '11377' zip_cd from dual
union
select 12345 as cust_id, date '2019-07-01' purchase_dt, '11377' zip_cd from dual
union
select 234 as cust_id, date '2019-08-01' purchase_dt, '11377' zip_cd from dual
union
select 2345 as cust_id, date '2019-09-01' purchase_dt, '11417' zip_cd from dual
)
select zip_cd, 100*(count(1)/cnt) percntg from
(select zip_cd, count(1) over () cnt from tmp)
group by zip_cd, cnt
order by percntg desc;
This answer counts each customer only once per zip code, even if they made purchases on multiple days. It also adds the percentage ranges requested in the question:
with tmp as
(
select 123 as cust_id, date '2017-01-01' purchase_dt, '10035' zip_cd from dual
union
select 1234 as cust_id, date '2019-06-01' purchase_dt, '11377' zip_cd from dual
union
select 12345 as cust_id, date '2019-07-01' purchase_dt, '11377' zip_cd from dual
union
select 234 as cust_id, date '2019-08-01' purchase_dt, '11377' zip_cd from dual
union
select 2345 as cust_id, date '2019-09-01' purchase_dt, '11417' zip_cd from dual
)
SELECT sub2.pct_range, listagg(sub2.zip_cd||' ('||sub2.zip_pct||')', ', ') WITHIN GROUP (ORDER BY zip_pct DESC) AS ZIP_CODES
FROM (SELECT CASE
-- Lower bounds only, so fractional values such as 79.5 do not fall through to 'Below 20%'
WHEN sub.zip_pct >= 80 THEN '80% and above'
WHEN sub.zip_pct >= 60 THEN '60% to 79%'
WHEN sub.zip_pct >= 40 THEN '40% to 59%'
WHEN sub.zip_pct >= 20 THEN '20% to 39%'
ELSE 'Below 20%'
END AS PCT_RANGE,
sub.zip_cd,
sub.zip_pct
FROM (SELECT DISTINCT
zip_cd,
100*COUNT(DISTINCT cust_id) OVER (PARTITION BY zip_cd)/COUNT(DISTINCT cust_id) OVER () AS ZIP_PCT
FROM tmp) sub) sub2
GROUP BY pct_range
ORDER BY pct_range DESC;
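The arithmetic behind the percentage ranges can be checked with a short Python sketch over the same five sample rows. The function names `zip_percentages` and `bucket` and the bucket labels are assumptions for this illustration; the percentage is distinct customers in a zip code divided by distinct customers overall.

```python
from datetime import date

def zip_percentages(rows):
    """Map each zip code to the share (in percent) of distinct customers seen there."""
    customers_by_zip = {}
    for cust_id, _purchase_dt, zip_cd in rows:
        customers_by_zip.setdefault(zip_cd, set()).add(cust_id)
    total = len({cust_id for cust_id, _, _ in rows})
    return {z: 100 * len(c) / total for z, c in customers_by_zip.items()}

def bucket(pct):
    """Assign a percentage to a range using lower bounds only, so no value is skipped."""
    if pct >= 80:
        return '80% and above'
    if pct >= 60:
        return '60% to under 80%'
    if pct >= 40:
        return '40% to under 60%'
    if pct >= 20:
        return '20% to under 40%'
    return 'Below 20%'

tmp = [
    (123, date(2017, 1, 1), '10035'),
    (1234, date(2019, 6, 1), '11377'),
    (12345, date(2019, 7, 1), '11377'),
    (234, date(2019, 8, 1), '11377'),
    (2345, date(2019, 9, 1), '11417'),
]
for zip_cd, pct in sorted(zip_percentages(tmp).items()):
    print(zip_cd, pct, bucket(pct))
# 10035 20.0 20% to under 40%
# 11377 60.0 60% to under 80%
# 11417 20.0 20% to under 40%
```

With this sample, 11377 holds three of the five customers, so it lands at 60% rather than in the top range.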

Month counts between dates

I have the below table. I need to count how many ids were active in a given month. I'm thinking I'll need to generate a row per id for each month it was active, so that the id can be counted in each of those months. A row should also be generated for the month containing term_dt.
active_dt term_dt id
1/1/2018 101
1/1/2018 5/15/2018 102
3/1/2018 6/1/2018 103
1/1/2018 4/25/18 104
Apparently this is a "count number of overlapping intervals" problem. The algorithm goes like this:
Create a sorted list of all start and end points
Calculate a running sum over this list, add one when you encounter a start and subtract one when you encounter an end
If two points are the same, perform the subtractions first
You will end up with list of all points where the sum changed
Here is a rough outline of the query. It is for SQL Server but could be ported to any RDBMS that supports window functions:
WITH cte1(date, val) AS (
SELECT active_dt, 1 FROM #t AS t
UNION ALL
SELECT COALESCE(term_dt, '2099-01-01'), -1 FROM #t AS t
-- if end date is null then assume the row is valid indefinitely
), cte2 AS (
SELECT date, SUM(val) OVER(ORDER BY date, val) AS rs
FROM cte1
)
SELECT YEAR(date) AS YY, MONTH(date) AS MM, MAX(rs) AS MaxActiveThisYearMonth
FROM cte2
GROUP BY YEAR(date), MONTH(date)
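The running-sum algorithm described above can be traced in a few lines of Python. This is a sketch under assumptions, not a port of the query: the function name `max_active_per_month` is invented, and the far-future placeholder date for open-ended rows mirrors the COALESCE in the SQL.

```python
from datetime import date

def max_active_per_month(intervals, open_end=date(2099, 1, 1)):
    """Return {(year, month): peak running count} for months that contain an event."""
    events = []
    for start, end in intervals:
        events.append((start, 1))               # +1 at every start point
        events.append((end or open_end, -1))    # -1 at every end point
    events.sort()  # on equal dates, -1 sorts before +1, so subtractions come first
    running = 0
    peak = {}
    for day, delta in events:
        running += delta
        key = (day.year, day.month)
        peak[key] = max(peak.get(key, running), running)
    return peak

intervals = [
    (date(2018, 1, 1), None),
    (date(2018, 1, 1), date(2018, 5, 15)),
    (date(2018, 3, 1), date(2018, 6, 1)),
    (date(2018, 1, 1), date(2018, 4, 25)),
]
print(max_active_per_month(intervals))
# {(2018, 1): 3, (2018, 3): 4, (2018, 4): 3, (2018, 5): 2, (2018, 6): 1, (2099, 1): 0}
```

Only months in which a start or end occurs appear in the result (February 2018 is absent here), so a calendar of months would have to be joined in to report every month.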
I was toying with a simpler query, that seemed to do the trick, for Oracle:
with candidates (month_start) as (
select to_date ('2018-' || column_value || '-01','YYYY-MM-DD')
from
table
(sys.odcivarchar2list('01','02','03','04','05',
'06','07','08','09','10','11','12'))
), sample_data (active_dt, term_dt, id) as (
select to_date('01/01/2018', 'MM/DD/YYYY'), null, 101 from dual
union select to_date('01/01/2018', 'MM/DD/YYYY'),
to_date('05/15/2018', 'MM/DD/YYYY'), 102 from dual
union select to_date('03/01/2018', 'MM/DD/YYYY'),
to_date('06/01/2018', 'MM/DD/YYYY'), 103 from dual
union select to_date('01/01/2018', 'MM/DD/YYYY'),
to_date('04/25/2018', 'MM/DD/YYYY'), 104 from dual
)
select c.month_start, count(1)
from candidates c
join sample_data d
on c.month_start between d.active_dt and nvl(d.term_dt,current_date)
group by c.month_start
order by c.month_start
An alternative solution would be to use a hierarchical query, e.g.:
WITH your_table AS (SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, NULL term_dt, 101 ID FROM dual UNION ALL
SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, to_date('15/05/2018', 'dd/mm/yyyy') term_dt, 102 ID FROM dual UNION ALL
SELECT to_date('01/03/2018', 'dd/mm/yyyy') active_dt, to_date('01/06/2018', 'dd/mm/yyyy') term_dt, 103 ID FROM dual UNION ALL
SELECT to_date('01/01/2018', 'dd/mm/yyyy') active_dt, to_date('25/04/2018', 'dd/mm/yyyy') term_dt, 104 ID FROM dual)
SELECT active_month,
COUNT(*) num_active_ids
FROM (SELECT add_months(TRUNC(active_dt, 'mm'), -1 + LEVEL) active_month,
ID
FROM your_table
CONNECT BY PRIOR ID = ID
AND PRIOR sys_guid() IS NOT NULL
AND LEVEL <= FLOOR(months_between(coalesce(term_dt, SYSDATE), active_dt)) + 1)
GROUP BY active_month
ORDER BY active_month;
ACTIVE_MONTH NUM_ACTIVE_IDS
------------ --------------
01/01/2018 3
01/02/2018 3
01/03/2018 4
01/04/2018 4
01/05/2018 3
01/06/2018 2
01/07/2018 1
01/08/2018 1
01/09/2018 1
01/10/2018 1
Whether this is more or less performant than the other answers is up to you to test.
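The row-per-active-month idea behind the hierarchical query can also be sketched in Python. This is an illustration under assumptions: the function name `active_ids_per_month` and the explicit `as_of` parameter (standing in for SYSDATE) are invented, and the sketch counts every calendar month an interval touches, which can differ from the FLOOR(months_between(...)) + 1 rule when active_dt is not the first of a month.

```python
from collections import Counter
from datetime import date

def active_ids_per_month(rows, as_of):
    """Count, per month start, the ids whose active interval touches that month."""
    counts = Counter()
    for active_dt, term_dt, _id in rows:
        end = term_dt or as_of  # open-ended rows run up to the as_of date
        first = active_dt.year * 12 + active_dt.month - 1
        last = end.year * 12 + end.month - 1
        for m in range(first, last + 1):
            counts[date(m // 12, m % 12 + 1, 1)] += 1
    return dict(sorted(counts.items()))

rows = [
    (date(2018, 1, 1), None, 101),
    (date(2018, 1, 1), date(2018, 5, 15), 102),
    (date(2018, 3, 1), date(2018, 6, 1), 103),
    (date(2018, 1, 1), date(2018, 4, 25), 104),
]
result = active_ids_per_month(rows, as_of=date(2018, 10, 15))
for month, n in result.items():
    print(month, n)
```

With an as_of date in October 2018, the counts for this sample match the ACTIVE_MONTH output listed above.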