partition over period of time for a given column SQL BigQuery

partition over period of time for a given column SQL BigQuery - sql

My query outputs data - row by row - for each record I grab a value from last_modified_date column that is the latest date in that column BUT is not later than the date column's value. I save new value in column custom_last_modified_date. Code works fine, but it only checks for the latest date available for one date - instead of every row in a table. Data looks like this:
id date last_modified_date
A 02/28/22 2017-02-28 22:44
A 03/05/22 2017-02-28 05:14
A 03/05/22 2017-02-28 07:49
A 03/22/22 2017-02-28 06:09
A 03/22/22 2022-03-01 06:49
B 03/25/22 2022-03-20 07:49
B 03/25/22 2022-04-01 09:24
Code:
SELECT
id,
date,
MAX(
IF(
date(
string(
TIMESTAMP(
DATETIME(
parse_datetime('%Y-%m-%d %H:%M', last_modified_date)
)
)
)
) <= date,
date(
string(
TIMESTAMP(
DATETIME(
parse_datetime('%Y-%m-%d %H:%M', last_modified_date)
)
)
)
),
null
)
) OVER (PARTITION BY date, id) as custom_last_modified_date
FROM `my_table`
Output is this:
id date custom_last_modified_date
A 02/28/22 02/28/17
A 03/05/22 02/28/17
A 03/05/22 02/28/17
A 03/22/22 03/01/22
A 03/22/22 03/01/22
B 03/25/22 03/20/22
B 03/25/22 03/20/22
Desired output is:
id date custom_last_modified_date
A 02/28/22 02/28/22
A 03/05/22 03/01/22
A 03/05/22 03/01/22
A 03/22/22 03/01/22
A 03/22/22 03/01/22
B 03/25/22 03/20/22
B 03/25/22 03/20/22

I tried your query and your just have a mistake on partion by, your query will work if you removed date in your partition by clause. Your query should look like this:
NOTE: I just created a CTE to convert dates to make it readable.
with sample_data as (
select 'A' as id, '03/05/22' as date_field, '2017-02-28 22:44' as last_modified_date,
union all select 'A' as id, '03/05/22' as date_field, '2017-02-28 05:14' as last_modified_date,
union all select 'A' as id, '03/05/22' as date_field, '2017-02-28 07:49' as last_modified_date,
union all select 'A' as id, '03/22/22' as date_field, '2017-02-28 06:09' as last_modified_date,
union all select 'A' as id, '03/22/22' as date_field, '2022-03-01 06:49' as last_modified_date,
union all select 'B' as id, '03/25/22' as date_field, '2022-03-20 07:49' as last_modified_date,
union all select 'B' as id, '03/25/22' as date_field, '2022-04-01 09:24' as last_modified_date,
),
conv_data as (
select
id,
format_datetime('%m/%d/%y',parse_date('%m/%d/%y',date_field)) as date_field,
format_datetime('%m/%d/%y',parse_datetime('%Y-%m-%d %H:%M', last_modified_date)) as last_modified_date,
from sample_data
)
select
id,
date_field,
last_modified_date,
max(if(last_modified_date <= date_field, last_modified_date, null)) over (partition by id) as custom_last_modified_date,
from conv_data
Output:

Related

create time range with 2 columns date_time

The problem I am facing is how to find distinct time periods from multiple time periods with overlap in Teradata ANSI SQL.
For example, the attached tables contain multiple overlapping time periods, how can I combine those time periods into 3 unique time periods in Teradata SQL???
I think I can do it in python with the loop function, but not sure how to do it in SQL
ID
Start Date
End Date
001
2005-01-01
2006-01-01
001
2005-01-01
2007-01-01
001
2008-01-01
2008-06-01
001
2008-04-01
2008-12-01
001
2010-01-01
2010-05-01
001
2010-04-01
2010-12-01
001
2010-11-01
2012-01-01
My expected result is:
ID
start_Date
end_date
001
2005-01-01
2007-01-01
001
2008-01-01
2008-12-01
001
2010-01-01
2012-01-01

From Oracle 12, you can use MATCH_RECOGNIZE to perform a row-by-row comparison:
SELECT *
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY id
ORDER BY start_date
MEASURES
FIRST(start_date) AS start_date,
MAX(end_date) AS end_date
ONE ROW PER MATCH
PATTERN (overlapping_ranges* last_range)
DEFINE overlapping_ranges AS NEXT(start_date) <= MAX(end_date)
)
Which, for the sample data:
CREATE TABLE table_name (ID, Start_Date, End_Date) AS
SELECT '001', DATE '2005-01-01', DATE '2006-01-01' FROM DUAL UNION ALL
SELECT '001', DATE '2005-01-01', DATE '2007-01-01' FROM DUAL UNION ALL
SELECT '001', DATE '2008-01-01', DATE '2008-06-01' FROM DUAL UNION ALL
SELECT '001', DATE '2008-04-01', DATE '2008-12-01' FROM DUAL UNION ALL
SELECT '001', DATE '2010-01-01', DATE '2010-05-01' FROM DUAL UNION ALL
SELECT '001', DATE '2010-04-01', DATE '2010-12-01' FROM DUAL UNION ALL
SELECT '001', DATE '2010-11-01', DATE '2012-01-01' FROM DUAL;
Outputs:
ID
START_DATE
END_DATE
001
2005-01-01 00:00:00
2007-01-01 00:00:00
001
2008-01-01 00:00:00
2008-12-01 00:00:00
001
2010-01-01 00:00:00
2012-01-01 00:00:00
db<>fiddle here
Update: Alternative query
SELECT id,
start_date,
end_date
FROM (
SELECT id,
dt,
SUM(cnt) OVER (PARTITION BY id ORDER BY dt) AS grp,
cnt
FROM (
SELECT ID,
dt,
SUM(type) OVER (PARTITION BY id ORDER BY dt, ROWNUM) * type AS cnt
FROM table_name
UNPIVOT (dt FOR type IN (start_date AS 1, end_date AS -1))
)
WHERE cnt IN (1,0)
)
PIVOT (MAX(dt) FOR cnt IN (1 AS start_date, 0 AS end_date))
Or, an equivalent that does not use UNPIVOT, PIVOT or ROWNUM and works in both Oracle and PostgreSQL:
SELECT id,
MAX(CASE cnt WHEN 1 THEN dt END) AS start_date,
MAX(CASE cnt WHEN 0 THEN dt END) AS end_date
FROM (
SELECT id,
dt,
SUM(cnt) OVER (PARTITION BY id ORDER BY dt) AS grp,
cnt
FROM (
SELECT ID,
dt,
SUM(type) OVER (PARTITION BY id ORDER BY dt, rn) * type AS cnt
FROM (
SELECT r.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt ASC, type DESC) AS rn
FROM (
SELECT id, 1 AS type, start_date AS dt FROM table_name
UNION ALL
SELECT id, -1 AS type, end_date AS dt FROM table_name
) r
) p
) s
WHERE cnt IN (1,0)
) t
GROUP BY id, grp
Update 2: Another Alternative
SELECT id,
MIN(start_date) AS start_date,
MAX(end_Date) AS end_date
FROM (
SELECT t.*,
SUM(CASE WHEN start_date <= prev_max THEN 0 ELSE 1 END)
OVER (PARTITION BY id ORDER BY start_date) AS grp
FROM (
SELECT t.*,
MAX(end_date) OVER (
PARTITION BY id ORDER BY start_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS prev_max
FROM table_name t
) t
) t
GROUP BY id, grp
db<>fiddle Oracle PostgreSQL

This is a gaps and islands problem. Try this:
with u as
(select ID, start_date, end_date,
case
when start_date <= lag(end_date) over(partition by ID order by start_date, end_date) then 0
else 1 end as grp
from table_name),
v as
(select ID, start_date, end_date,
sum(grp) over(partition by ID order by start_date, end_date) as island
from u)
select ID, min(start_date) as start_Date, max(end_date) as end_date
from v
group by ID, island;
Fiddle
Basically you can identify "islands" by comparing start_date of current row to end_date of previous row (ordered by start_date, end_date), if it precedes it then it's the same island. Then you can do a rolling sum() to get the island numbers. Finally select min(start_date) and max(end_date) from each island to get the desired output.

This may work ,with little bit of change in function , I tried it in Dbeaver :
select ID,Start_Date,End_Date
from
(
select t.*,
dense_rank () over(partition by extract (year from Start_Date) order BY End_Date desc) drnk
from testing_123 t
) temp
where temp.drnk = 1
ORDER BY Start_Date;

Try this
WITH a as (
SELECT
ID,
LEFT(Start_Date, 4) as Year,
MIN(Start_Date) as New_Start_Date
FROM
TAB1
GROUP BY
ID,
LEFT(Start_Date, 4)
), b as (
SELECT
a.ID,
Year,
New_Start_Date,
End_Date
FROM
a
LEFT JOIN
TAB1
ON LEFT(a.New_Start_Date, 4) = LEFT(TAB1.Start_Date, 4)
)
select
ID,
New_Start_Date as Start_Date,
MAX(End_Date)
from
b
GROUP BY
ID,
New_Start_Date;
Example: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=97f91b68c635aebfb752538cdd752ace

Exclude duplicates and capture only changes

I have a scenario, where I have to exclude duplicates and capture only the changes. Also calculate the valid_from and valid_to on the fly. I have tried a query and it works but it is very slow in performance and it is failing with memory error .
Input : Only capture Entries where there is a change either in Amount/Check-In-Out.
Calculate Valid_from and Valid_to based on Date Changed.
Output:
SQL I tried.
select * from (select
lead(start_date, "window_offset" - rn + 1, '9999-12-31') over (order by "grp" ) as valid_to,
case when rn = max(rn) over (partition by "grp") then 1 else 0 end as "isLastUpdate",
start_date as valid_from,*
from (
select
min("DateChanged") over (partition by "grp") as start_date,
count(*) over (partition by "grp") as "window_offset",
row_number() over (partition by "grp" order by "DateChanged") as rn,
*
from (
select sum("isChanged") over (partition by OrderId order by "DateChanged") as "grp",*
from (
select
case when "Amount" = lag( "Amount" ) over (partition by OrderId order by "DateChanged") and
"Check-In" = lag( "Check-In" ) over (partition by OrderId order by "DateChanged") and
"Check-Out" = lag( "Check-Out" ) over (partition by OrderId order by "DateChanged")
then 0
else 1
end "isChanged",
*
FROM :in_table
)
))
where "isLastUpdate" = 1;

The logic of your expected answer is unclear as to why you get valid_from as 8-mar-21 for the first order_id and 9-apr-21 for the second order_id as both order_ids have overlapping ranges but you take the least of the previous check_out and the next check_in for the first order_id and the greatest of those two for the second and it is inconsistent.
If you want to get valid_from as the greatest of either the current check_in or the previous check_outs and valid_to as the greatest of either the current check_out or the next check_in or, if there are no more rows, 9999-12-31 then:
SELECT orderid,
amount,
check_in,
check_out,
GREATEST(
check_in,
COALESCE(
MAX(check_out) OVER (
PARTITION BY orderid
ORDER BY check_in, check_out
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
),
check_in
)
) AS valid_from,
GREATEST(
check_out,
LEAD(check_in, 1, DATE '9999-12-31') OVER (
PARTITION BY orderid ORDER BY check_in, check_out
)
) AS valid_to
FROM (
SELECT DISTINCT *
FROM table_name
)
Which, for the sample data:
CREATE TABLE table_name (orderid, datechanged, amount, check_in, check_out) AS
SELECT 1, DATE '2021-03-3', 12.12, DATE '2021-03-03', DATE '2021-03-10' FROM DUAL UNION ALL
SELECT 1, DATE '2021-03-3', 12.12, DATE '2021-03-03', DATE '2021-03-10' FROM DUAL UNION ALL
SELECT 1, DATE '2021-03-3', 12.12, DATE '2021-03-03', DATE '2021-03-10' FROM DUAL UNION ALL
SELECT 1, DATE '2021-03-8', 21.12, DATE '2021-03-08', DATE '2021-03-18' FROM DUAL UNION ALL
SELECT 1, DATE '2021-03-8', 21.12, DATE '2021-03-08', DATE '2021-03-18' FROM DUAL UNION ALL
SELECT 2, DATE '2021-04-4', 9.10, DATE '2021-04-04', DATE '2021-04-09' FROM DUAL UNION ALL
SELECT 2, DATE '2021-04-4', 9.10, DATE '2021-04-04', DATE '2021-04-09' FROM DUAL UNION ALL
SELECT 2, DATE '2021-04-4', 10.20, DATE '2021-04-04', DATE '2021-04-12' FROM DUAL;
Outputs:
ORDERID
AMOUNT
CHECK_IN
CHECK_OUT
VALID_FROM
VALID_TO
1
12.12
2021-03-03 00:00:00
2021-03-10 00:00:00
2021-03-03 00:00:00
2021-03-10 00:00:00
1
21.12
2021-03-08 00:00:00
2021-03-18 00:00:00
2021-03-10 00:00:00
9999-12-31 00:00:00
2
9.1
2021-04-04 00:00:00
2021-04-09 00:00:00
2021-04-04 00:00:00
2021-04-09 00:00:00
2
10.2
2021-04-04 00:00:00
2021-04-12 00:00:00
2021-04-09 00:00:00
9999-12-31 00:00:00
db<>fiddle here

SQL Select only missing months

Notice the 2017-04-01, 2018-02-01, 2018-07-01, and 2019-01-01 months are missing in the output. I want to show only those months which are missing. Does anyone know how to go about this?
Query:
SELECT TO_DATE("Month", 'mon''yy') as dates FROM sample_sheet
group by dates
order by dates asc;
Output:
2017-01-01
2017-02-01
2017-03-01
2017-05-01
2017-06-01
2017-07-01
2017-08-01
2017-09-01
2017-10-01
2017-11-01
2017-12-01
2018-01-01
2018-03-01
2018-04-01
2018-05-01
2018-06-01
2018-08-01
2018-09-01
2018-10-01
2018-11-01
2018-12-01
2019-02-01
2019-03-01
2019-04-01

I don't know Vertica, so I wrote a working proof of concept in Microsoft SQL Server and tried to convert it to Vertica syntax based on the online documentation.
It should look like this:
with
months as (
select 2017 as date_year, 1 as date_month, to_date('2017-01-01', 'YYYY-MM-DD') as first_date, to_date('2017-01-31', 'yyyy-mm-dd') as last_date
union all
select
year(add_months(first_date, 1)) as date_year,
month(add_months(first_date, 1)) as date_month,
add_months(first_date, 1) as first_date,
last_day(add_months(first_date, 1)) as last_date
from months
where first_date < current_date
),
sample_dates (a_date) as (
select to_date('2017-01-15', 'YYYY-MM-DD') union all
select to_date('2017-01-22', 'YYYY-MM-DD') union all
select to_date('2017-02-01', 'YYYY-MM-DD') union all
select to_date('2017-04-15', 'YYYY-MM-DD') union all
select to_date('2017-06-15', 'YYYY-MM-DD')
)
select *
from sample_dates right join months on sample_dates.a_date between first_date and last_date
where sample_dates.a_date is null
Months is a recursive dynamic table that holds all months since 2017-01, with first and last day of the month. sample_dates is just a list of dates to test the logic - you should replace it with your own table.
Once you build that monthly calendar table all you need to do is check your dates against it using an outer query to see what dates are not between any of those periods between first_date and last_date columns.

You can build a TIMESERIES of all dates between the first input date and the last input date (The highest granularity of a TIMESERIES is the day.), and filter out only the months' first days out of that; then left join that created sequence of firsts of month with your input to find out where the join would fail, checking for NULLS from the input branch of the join:
WITH
-- your input
input(mth1st) AS (
SELECT DATE '2017-01-01'
UNION ALL SELECT DATE '2017-02-01'
UNION ALL SELECT DATE '2017-03-01'
UNION ALL SELECT DATE '2017-05-01'
UNION ALL SELECT DATE '2017-06-01'
UNION ALL SELECT DATE '2017-07-01'
UNION ALL SELECT DATE '2017-08-01'
UNION ALL SELECT DATE '2017-09-01'
UNION ALL SELECT DATE '2017-10-01'
UNION ALL SELECT DATE '2017-11-01'
UNION ALL SELECT DATE '2017-12-01'
UNION ALL SELECT DATE '2018-01-01'
UNION ALL SELECT DATE '2018-03-01'
UNION ALL SELECT DATE '2018-04-01'
UNION ALL SELECT DATE '2018-05-01'
UNION ALL SELECT DATE '2018-06-01'
UNION ALL SELECT DATE '2018-08-01'
UNION ALL SELECT DATE '2018-09-01'
UNION ALL SELECT DATE '2018-10-01'
UNION ALL SELECT DATE '2018-11-01'
UNION ALL SELECT DATE '2018-12-01'
UNION ALL SELECT DATE '2019-02-01'
UNION ALL SELECT DATE '2019-03-01'
UNION ALL SELECT DATE '2019-04-01'
)
,
-- need a series of month's firsts
-- TIMESERIES works for INTERVAL DAY TO SECOND
-- so build that timeseries, and filter out
-- the month's firsts
limits(mth1st) AS (
SELECT MIN(mth1st) FROM input
UNION ALL SELECT MAX(mth1st) FROM input
)
,
alldates AS (
SELECT dt::DATE FROM limits
TIMESERIES dt AS '1 day' OVER(ORDER BY mth1st::TIMESTAMP)
)
,
allfirsts(mth1st) AS (
SELECT dt FROM alldates WHERE DAY(dt)=1
)
SELECT
allfirsts.mth1st
FROM allfirsts
LEFT JOIN input USING(mth1st)
WHERE input.mth1st IS NULL;
-- out mth1st
-- out ------------
-- out 2017-04-01
-- out 2018-02-01
-- out 2018-07-01
-- out 2019-01-01

Identify contiguous and discontinuous date ranges

I have a table named x . The data is as follows.
Acccount_num start_dt end_dt
A111326 02/01/2016 02/11/2016
A111326 02/12/2016 03/05/2016
A111326 03/02/2016 03/16/2016
A111331 02/28/2016 02/29/2016
A111331 02/29/2016 03/29/2016
A999999 08/25/2015 08/25/2015
A999999 12/19/2015 12/22/2015
A222222 11/06/2015 11/10/2015
A222222 05/16/2016 05/17/2016
Both A111326 and A111331 should be identified as contiguous data and A999999 and
A222222 should be identified as discontinuous data.In my code I currently use the following query to identify discontinuous data. The A111326 is also erroneously identified as discontinuous data. Please help to modify the below code so that A111326 is not identified as discontinuous data.Thanks in advance for your help.
(SELECT account_num
FROM (SELECT account_num,
(MAX (
END_DT)
OVER (PARTITION BY account_num
ORDER BY START_DT))
START_DT,
(LEAD (
START_DT)
OVER (PARTITION BY account_num
ORDER BY START_DT))
END_DT
FROM x
WHERE (START_DT + 1) <=
(END_DT - 1))
WHERE START_DT < END_DT);

Oracle Setup:
CREATE TABLE accounts ( Account_num, start_dt, end_dt ) AS
SELECT 'A', DATE '2016-02-01', DATE '2016-02-11' FROM DUAL UNION ALL
SELECT 'A', DATE '2016-02-12', DATE '2016-03-05' FROM DUAL UNION ALL
SELECT 'A', DATE '2016-03-02', DATE '2016-03-16' FROM DUAL UNION ALL
SELECT 'B', DATE '2016-02-28', DATE '2016-02-29' FROM DUAL UNION ALL
SELECT 'B', DATE '2016-02-29', DATE '2016-03-29' FROM DUAL UNION ALL
SELECT 'C', DATE '2015-08-25', DATE '2015-08-25' FROM DUAL UNION ALL
SELECT 'C', DATE '2015-12-19', DATE '2015-12-22' FROM DUAL UNION ALL
SELECT 'D', DATE '2015-11-06', DATE '2015-11-10' FROM DUAL UNION ALL
SELECT 'D', DATE '2016-05-16', DATE '2016-05-17' FROM DUAL UNION ALL
SELECT 'E', DATE '2016-01-01', DATE '2016-01-02' FROM DUAL UNION ALL
SELECT 'E', DATE '2016-01-05', DATE '2016-01-06' FROM DUAL UNION ALL
SELECT 'E', DATE '2016-01-03', DATE '2016-01-07' FROM DUAL;
Query:
WITH times ( account_num, dt, lvl ) AS (
SELECT Account_num, start_dt - 1, 1 FROM accounts
UNION ALL
SELECT Account_num, end_dt, -1 FROM accounts
)
, totals ( account_num, dt, total ) AS (
SELECT account_num,
dt,
SUM( lvl ) OVER ( PARTITION BY Account_num ORDER BY dt, lvl DESC )
FROM times
)
SELECT Account_num,
CASE WHEN COUNT( CASE total WHEN 0 THEN 1 END ) > 1
THEN 'N'
ELSE 'Y'
END AS is_contiguous
FROM totals
GROUP BY Account_Num
ORDER BY Account_Num;
Output:
ACCOUNT_NUM IS_CONTIGUOUS
----------- -------------
A Y
B Y
C N
D N
E Y
Alternative Query:
(It's exactly the same method just using UNPIVOT rather than UNION ALL.)
SELECT Account_num,
CASE WHEN COUNT( CASE total WHEN 0 THEN 1 END ) > 1
THEN 'N'
ELSE 'Y'
END AS is_contiguous
FROM (
SELECT Account_num,
SUM( lvl ) OVER ( PARTITION BY Account_Num
ORDER BY CASE lvl WHEN 1 THEN dt - 1 ELSE dt END,
lvl DESC
) AS total
FROM accounts
UNPIVOT ( dt FOR lvl IN ( start_dt AS 1, end_dt AS -1 ) )
)
GROUP BY Account_Num
ORDER BY Account_Num;

WITH cte AS (
SELECT
AccountNumber
,CASE
WHEN
LAG(End_Dt) OVER (PARTITION BY AccountNumber ORDER BY End_Dt) IS NULL THEN 0
WHEN
LAG(End_Dt) OVER (PARTITION BY AccountNumber ORDER BY End_Dt) >= Start_Dt - 1 THEN 0
ELSE 1
END as discontiguous
FROM
#Table
)
SELECT
AccountNumber
,CASE WHEN SUM(discontiguous) > 0 THEN 'discontiguous' ELSE 'contiguous' END
FROM
cte
GROUP BY
AccountNumber;
One of your problems is that your contiguous desired result also includes overlapping date ranges in your example data set. Example A111326 Starts on 3/2/2016 but ends the row before on 3/5/2015 meaning it overlaps by 3 days.

Finding missing dates in a sequence

I have following table with ID and DATE
ID DATE
123 7/1/2015
123 6/1/2015
123 5/1/2015
123 4/1/2015
123 9/1/2014
123 8/1/2014
123 7/1/2014
123 6/1/2014
456 11/1/2014
456 10/1/2014
456 9/1/2014
456 8/1/2014
456 5/1/2014
456 4/1/2014
456 3/1/2014
789 9/1/2014
789 8/1/2014
789 7/1/2014
789 6/1/2014
789 5/1/2014
789 4/1/2014
789 3/1/2014
In this table, I have three customer ids, 123, 456, 789 and date column which shows which month they worked.
I want to find out which of the customers have gap in their work.
Our customers work record is kept per month...so, dates are monthly..
and each customer have different start and end dates.
Expected results:
ID First_Absent_date
123 10/01/2014
456 06/01/2014

To get a simple list of the IDs with gaps, with no further details, you need to look at each ID separately, and as #mikey suggested you can count the number of months and look at the first and last date to see if how many months that spans.
If your table has a column called month (since date isn't allowed unless it's a quoted identifier) you could start with:
select id, count(month), min(month), max(month),
months_between(max(month), min(month)) + 1 as diff
from your_table
group by id
order by id;
ID COUNT(MONTH) MIN(MONTH) MAX(MONTH) DIFF
---------- ------------ ---------- ---------- ----------
123 8 01-JUN-14 01-JUL-15 14
456 7 01-MAR-14 01-NOV-14 9
789 7 01-MAR-14 01-SEP-14 7
Then compare the count with the month span, in a having clause:
select id
from your_table
group by id
having count(month) != months_between(max(month), min(month)) + 1
order by id;
ID
----------
123
456
If you can actually have multiple records in a month for an ID, and/or the date recorded might not be the start of the month, you can do a bit more work to normalise the dates:
select id,
count(distinct trunc(month, 'MM')),
min(trunc(month, 'MM')),
max(trunc(month, 'MM')),
months_between(max(trunc(month, 'MM')), min(trunc(month, 'MM'))) + 1 as diff
from your_table
group by id
order by id;
select id
from your_table
group by id
having count(distinct trunc(month, 'MM')) !=
months_between(max(trunc(month, 'MM')), min(trunc(month, 'MM'))) + 1
order by id;

Oracle Setup:
CREATE TABLE your_table ( ID, "DATE" ) AS
SELECT 123, DATE '2015-07-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-06-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-05-01' FROM DUAL UNION ALL
SELECT 123, DATE '2015-04-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-07-01' FROM DUAL UNION ALL
SELECT 123, DATE '2014-06-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-11-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-10-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-05-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-04-01' FROM DUAL UNION ALL
SELECT 456, DATE '2014-03-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-09-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-08-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-07-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-06-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-05-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-04-01' FROM DUAL UNION ALL
SELECT 789, DATE '2014-03-01' FROM DUAL;
Query:
SELECT ID,
MIN( missing_date )
FROM (
SELECT ID,
CASE WHEN LEAD( "DATE" ) OVER ( PARTITION BY ID ORDER BY "DATE" )
= ADD_MONTHS( "DATE", 1 ) THEN NULL
WHEN LEAD( "DATE" ) OVER ( PARTITION BY ID ORDER BY "DATE" )
IS NULL THEN NULL
ELSE ADD_MONTHS( "DATE", 1 )
END AS missing_date
FROM your_table
)
GROUP BY ID
HAVING COUNT( missing_date ) > 0;
Output:
ID MIN(MISSING_DATE)
---------- -------------------
123 2014-10-01 00:00:00
456 2014-06-01 00:00:00

You could use a Lag() function to see if records have been skipped for a particular date or not.Lag() basically helps in comparing the data in current row with previous row. So if we order by DATE, we could easily compare and find any gaps.
select * from
(
select ID,DATE_, case when DATE_DIFF>1 then 1 else 0 end comparison from
(
select ID, DATE_ ,DATE_-LAG(DATE_, 1) OVER (PARTITION BY ID ORDER BY DATE_) date_diff from trial
)
)
where comparison=1 order by ID,DATE_;
This groups all the entries by id, and then arranges the records by date. If a customer is always present, there would not be a gap in his date. So anyone who has a date difference greater than 1 had a gap. You could tweak this as per your requirement.
EDIT : Just observed that you are storing data in mm/dd/yyyy format, when I closely observed above answers.You are storing only first date of every month. So, the above query can be tweaked as :
select * from
(
select ID,DATE_,PREV_DATE,last_day(PREV_DATE)+1 ABSENT_DATE, case when DATE_DIFF>31 then 1 else 0 end comparison from
(
select ID, DATE_ ,LAG(DATE_,1) OVER (PARTITION BY ID ORDER BY DATE_) PREV_DATE,DATE_-LAG(DATE_, 1) OVER (PARTITION BY ID ORDER BY DATE_) date_diff from trial
)
)
where comparison=1 order by ID,DATE_;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

partition over period of time for a given column SQL BigQuery - sql

Related

create time range with 2 columns date_time

Exclude duplicates and capture only changes

SQL Select only missing months

Identify contiguous and discontinuous date ranges

Finding missing dates in a sequence

Categories

Resources