Date period with breaks: make 3 rows out of 2 - sql

Edit: added an id, to make it more graspable
I have stumbled over this problem a couple of times and always solved it with PL/SQL, but I am wondering whether there is a pure SQL solution.
There is a table with a from_date and a to_date. The data in there is seamless: for every to_date, there is a new row with a from_date on the next day.
create table test_date
(
id number,
from_date date,
to_date date
)
/
insert into test_date values(1, to_date('01022003', 'ddmmyyyy'), to_date('28022003', 'ddmmyyyy'))
/
insert into test_date values(2, to_date('01032003', 'ddmmyyyy'), to_date('31032003', 'ddmmyyyy'))
/
There is another table that breaks these time periods.
create table test_date2
(
id number,
from_date date,
to_date date
)
/
insert into test_date2 values(3, to_date('05022003', 'ddmmyyyy'), to_date('10022003', 'ddmmyyyy'))
/
So I want a view that shows these time periods and the "breaks" as separate rows, distinguished by the typ column. The result should also be seamless: after the "break" from test_date2, it should continue with the data from test_date. I can't get that working:
select typ, id, from_date, decode(typ, 1, decode(to_date+1, lead_from_date, to_date, lead_from_date-1), to_date) to_date
from(
select typ, id, from_date, to_date, lead(from_date) over (order by from_date, typ) lead_from_date
from
(select 1 typ, id, from_date, to_date
from test_date t
union all
select 2 typ, id, from_date, to_date
from test_date2 t2
) a
)
What I get here is
1 1 01/02/2003 04/02/2003
2 3 05/02/2003 10/02/2003
1 2 01/03/2003
The period between 11/02/2003 and 28/02/2003 (for the row in test_date with id=1) is missing.
So what I want is this:
1 1 01/02/2003 04/02/2003
2 3 05/02/2003 10/02/2003
1 1 11/02/2003 28/02/2003
1 2 01/03/2003

I think this is what you're after. You're not getting the right answer because you're not generating the full list of dates. If you normalise your data to get a single list of dates, you can then use LEAD() or LAG() to find the next/previous date and regenerate your list of periods.
I use UNPIVOT here to transform from_date and to_date into a single column, but a UNION ALL of four SELECTs would provide the same result:
with all_tables as (
select *
from test_date
union all
select *
from test_date2
)
, all_dates as (
select dt
from all_tables
unpivot ( dt for dates in ( from_date, to_date ))
)
select dt
, lead(dt) over (order by dt) as to_date
from all_dates;
DT TO_DATE
---------- ----------
01/02/2003 05/02/2003
05/02/2003 10/02/2003
10/02/2003 28/02/2003
28/02/2003 01/03/2003
01/03/2003 31/03/2003
31/03/2003
6 rows selected.
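The output above lists consecutive boundary dates, which is not yet the four rows asked for in the question. As a language-neutral illustration of the splitting logic itself, here is a minimal Python sketch. The function name split_periods and the tuple layout are assumptions made for this example and are not part of the original SQL. It assumes inclusive bounds and breaks that do not overlap each other. Unlike the sample output in the question, the last row keeps its own to_date.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def split_periods(periods, breaks):
    """Split base periods around breaks (hypothetical helper).

    periods, breaks: lists of (id, from_date, to_date), bounds inclusive.
    Returns rows (typ, id, from_date, to_date); typ 1 = period, 2 = break.
    """
    rows = []
    for pid, p_from, p_to in periods:
        cursor = p_from
        # Only breaks that intersect this period, in date order.
        inside = sorted(
            (b for b in breaks if b[1] <= p_to and b[2] >= p_from),
            key=lambda b: b[1],
        )
        for bid, b_from, b_to in inside:
            if b_from > cursor:
                # Part of the period before the break starts.
                rows.append((1, pid, cursor, b_from - ONE_DAY))
            rows.append((2, bid, max(b_from, p_from), min(b_to, p_to)))
            cursor = min(b_to, p_to) + ONE_DAY
        if cursor <= p_to:
            # Remainder of the period after the last break.
            rows.append((1, pid, cursor, p_to))
    return rows

periods = [
    (1, date(2003, 2, 1), date(2003, 2, 28)),
    (2, date(2003, 3, 1), date(2003, 3, 31)),
]
breaks = [(3, date(2003, 2, 5), date(2003, 2, 10))]
rows = split_periods(periods, breaks)
```

With the sample data from the question this yields four rows: period 1 up to 04/02/2003, the break, period 1 again from 11/02/2003, and period 2.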

ORACLE SQL I need output in three column generation_name , date, total

this a table
expected output as
If you only want three columns, use the following statement (DATE is a reserved word in Oracle, so the alias has to be quoted):
SELECT generator_name, from_date AS "date", total FROM <your tablename>;
If this is not what you are looking for, please give more details.
You are looking for a recursive query:
with cte(generator_name, from_date, to_date, total) as
(
select generator_name, from_date, to_date, total from mytable
union all
select generator_name, from_date + 1, to_date, total from cte where from_date < to_date
)
-- DATE is a reserved word in Oracle, so the alias is quoted
select generator_name, from_date as "date", total
from cte
order by generator_name, "date", total;
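To make the effect of the recursive query easier to see, here is a minimal Python sketch of the same expansion: one output row per day between from_date and to_date, inclusive. The function name expand_to_days and the sample row are hypothetical, since the original table contents are not shown.

```python
from datetime import date, timedelta

def expand_to_days(rows):
    """rows: list of (generator_name, from_date, to_date, total)."""
    out = []
    for name, from_date, to_date, total in rows:
        day = from_date
        out.append((name, day, total))      # anchor member of the CTE
        while day < to_date:                # recursive member: from_date + 1
            day += timedelta(days=1)
            out.append((name, day, total))
    # Mirrors: order by generator_name, date, total
    return sorted(out)

# Hypothetical sample row, for illustration only.
sample = [("gen_a", date(2020, 1, 1), date(2020, 1, 3), 10)]
daily = expand_to_days(sample)
```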

Reducing Series of Dates to minimal representation in BigQuery

If I have a table like:
start_date|end_date
1/1/2018|1/5/2018
1/4/2018|1/10/2018
1/9/2018|1/22/2018
2/1/2018|2/1/2018
1/31/2018|2/5/2018
And I want to get all the date ranges that are covered by these rows. So I would want something returned like:
1/1/2018|1/22/2018
1/31/2018|2/5/2018
Is there a function in BigQuery that can handle this?
There is no such function - but you can try something like below (BigQuery Standard SQL)
#standardSQL
WITH `project.dataset.table` AS (
SELECT '1/1/2018' start_date, '1/5/2018' end_date UNION ALL
SELECT '1/4/2018', '1/10/2018' UNION ALL
SELECT '1/9/2018', '1/22/2018' UNION ALL
SELECT '2/1/2018', '2/1/2018' UNION ALL
SELECT '1/31/2018', '2/5/2018'
), parsed_as_dates AS (
SELECT PARSE_DATE('%m/%d/%Y', start_date) start_date, PARSE_DATE('%m/%d/%Y', end_date) end_date
FROM `project.dataset.table`
), days AS (
SELECT day FROM
(SELECT MIN(start_date) min_date, MAX(end_date) max_date FROM parsed_as_dates),
UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
), temp AS (
SELECT day, SIGN(COUNTIF(day BETWEEN start_date AND end_date)) flag
FROM days CROSS JOIN parsed_as_dates GROUP BY day
)
SELECT MIN(day) start_date, MAX(day) end_date
FROM (
SELECT day, flag, SUM(start) OVER(ORDER BY day) grp
FROM (
SELECT day, flag, ABS(flag - IFNULL(LAG(flag) OVER(ORDER BY day), 0)) start
FROM temp
)
)
WHERE flag = 1
GROUP BY grp
-- ORDER BY start_date
with the result below:
Row start_date end_date
1 2018-01-01 2018-01-22
2 2018-01-31 2018-02-05
Just a "quick" idea. You might want to refactor it a little, as it looks a bit over-engineered to me :o) but at least it does its job.
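For comparison, the same reduction can be expressed as a sort-and-sweep over the ranges, without generating every day. This is a minimal Python sketch, not the BigQuery query above; the function name merge_ranges is an assumption. Like the query, it treats ranges as whole days, so ranges that touch on consecutive days are merged as well.

```python
from datetime import date, timedelta

def merge_ranges(ranges):
    """ranges: iterable of (start_date, end_date), bounds inclusive."""
    merged = []
    for start, end in sorted(ranges):
        # Overlapping or directly adjacent to the last merged range?
        if merged and start <= merged[-1][1] + timedelta(days=1):
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]

ranges = [
    (date(2018, 1, 1), date(2018, 1, 5)),
    (date(2018, 1, 4), date(2018, 1, 10)),
    (date(2018, 1, 9), date(2018, 1, 22)),
    (date(2018, 2, 1), date(2018, 2, 1)),
    (date(2018, 1, 31), date(2018, 2, 5)),
]
covered = merge_ranges(ranges)
```

On the sample data this returns the two ranges 2018-01-01 to 2018-01-22 and 2018-01-31 to 2018-02-05.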

SQL- how to retrieve by similar dates

Okay, so I have a table with a user_id column and a submitted_dtm column.
I want to find instances where users submitted multiple records within 1 day of each other, and count how many times that has happened.
I've tried something like
select * from table_t t where
(select count(*) from table_t t2 where
t.user_id = t2.user_id and
t.pk!=t2.pk and
t.submitted_dtm between t2.submitted_dtm-.5 and t2.submitted_dtm+.5)>0;
The problem is that this query returns a result for each record in a date group. Instead, I just want a result per date group. Ideally, I'd just get the count in that group.
That is, if I have 6 records:
user_id submitted_dtm
--------------------------
1 12/04/2017 1:15
1 12/04/2017 5:50
2 11/25/2017 2:00
2 11/25/2017 3:25
2 11/25/2017 6:05
2 10/06/2017 4:00
I want 2 results, a count of 2 and a count of 3.
Is it possible to do this in sql?
Following up on Dessma's answer.
select user_id, trunc(submitted_dtm), count(1)
from table_t
group by user_id, trunc(submitted_dtm)
having count(1) > 1;
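The grouping in this query can be sketched in Python as follows. The function name same_day_groups is an assumption. Note that, like trunc(), it groups by calendar day, which is not the same as a sliding 24-hour window: two submissions at 23:50 and 00:10 would land in different groups.

```python
from collections import Counter
from datetime import datetime

def same_day_groups(records):
    """records: iterable of (user_id, submitted_dtm)."""
    # Equivalent of: group by user_id, trunc(submitted_dtm)
    counts = Counter((user_id, ts.date()) for user_id, ts in records)
    # Equivalent of: having count(1) > 1
    return {key: n for key, n in counts.items() if n > 1}

records = [
    (1, datetime(2017, 12, 4, 1, 15)),
    (1, datetime(2017, 12, 4, 5, 50)),
    (2, datetime(2017, 11, 25, 2, 0)),
    (2, datetime(2017, 11, 25, 3, 25)),
    (2, datetime(2017, 11, 25, 6, 5)),
    (2, datetime(2017, 10, 6, 4, 0)),
]
groups = same_day_groups(records)
```

For the six sample records this gives two groups, with counts 2 and 3.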
In Oracle 12.1 and higher, you can solve such problems easily with the match_recognize clause. Link to documentation (with examples) below; my only note about the solution below is that I left the date in DATE data type, especially important if the output is used in further computations. If it isn't, you can wrap within TO_CHAR() with whatever format model is appropriate for your users.
https://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8956
with
inputs ( user_id, submitted_dtm ) as (
select 1, to_date('12/04/2017 1:15', 'mm/dd/yyyy hh24:mi') from dual union all
select 1, to_date('12/04/2017 5:50', 'mm/dd/yyyy hh24:mi') from dual union all
select 2, to_date('11/25/2017 2:00', 'mm/dd/yyyy hh24:mi') from dual union all
select 2, to_date('11/25/2017 3:25', 'mm/dd/yyyy hh24:mi') from dual union all
select 2, to_date('11/25/2017 6:05', 'mm/dd/yyyy hh24:mi') from dual union all
select 2, to_date('10/06/2017 4:00', 'mm/dd/yyyy hh24:mi') from dual
)
-- End of simulated inputs (for testing only, not part of the solution).
-- SQL query begins below this line. Use your actual table and column names.
select user_id, submitted_dtm, cnt
from inputs
match_recognize(
partition by user_id
order by submitted_dtm
measures trunc(a.submitted_dtm) as submitted_dtm,
count(*) as cnt
pattern ( a b+ )
define b as trunc(submitted_dtm) = trunc(a.submitted_dtm)
);
USER_ID SUBMITTED_DTM CNT
---------- ------------------- ----------
1 2017-12-04 00:00:00 2
2 2017-11-25 00:00:00 3
I don't have data to test it, but I suspect something like this would do the trick:
SELECT t.user_id, To_char(t.submitted_dtm, 'dd/mm/yyyy'), COUNT(*)
FROM table_t t
INNER JOIN table_t t2
ON t.user_id = t2.user_id
AND t.pk != t2.pk
AND t.submitted_dtm BETWEEN t2.submitted_dtm - .5 AND
t2.submitted_dtm + .5
-- user_id is qualified with t because both sides of the join have that column
GROUP BY t.user_id, To_char(t.submitted_dtm, 'dd/mm/yyyy')
HAVING COUNT(*) > 1
This is a general idea of how to get the instances.
select user_id, t1.submitted_dtm t1submitted, t2.submitted_dtm t2submitted
from table_t t1 join table_t t2 using (user_id)
where t2.submitted_dtm > t1.submitted_dtm
and t2.submitted_dtm - t1.submitted_dtm <= 1;
The last line could be modified somehow depending on what you mean by within a day.
To count the instances, create a derived table from the above and select count(*) from it.

Lowest continuous date without break

I have a table and each record has a date. We can assume that a date range is contiguous if there's not a 3 month break. How can I find the start of the most recent contiguous date range?
For example, imagine if I had this data:
1990-5-1
1990-6-4
1990-10-28
1990-11-14
1990-12-19
1991-1-20
1991-4-30
1991-5-13
I'd like for it to return 1991-4-30 because it's the start of the most recent contiguous range of dates.
I think this does what you're looking for. Using my own table and column names as test data. This is on Oracle.
select * from (
select * from sm_ss_tickets t1 where exists (
select * from sm_ss_tickets t2 where t2.created_date between t1.created_date and t1.created_date+90 and t1.rowid <> t2.rowid
) order by created_date asc
) where rownum = 1;
Maybe something like the following would work:
WITH d1 AS (
SELECT date'1990-05-01' AS dt FROM dual
UNION ALL
SELECT date'1990-06-04' AS dt FROM dual
UNION ALL
SELECT date'1990-10-28' AS dt FROM dual
UNION ALL
SELECT date'1990-11-14' AS dt FROM dual
UNION ALL
SELECT date'1990-12-19' AS dt FROM dual
UNION ALL
SELECT date'1991-01-20' AS dt FROM dual
UNION ALL
SELECT date'1991-04-30' AS dt FROM dual
UNION ALL
SELECT date'1991-05-13' AS dt FROM dual
)
SELECT MAX(dt) FROM (
SELECT dt, LAG(dt) OVER ( ORDER BY dt ) AS prev_dt, LEAD(dt) OVER ( ORDER BY dt ) AS next_dt
FROM d1
) WHERE ( dt > ADD_MONTHS(prev_dt, 3) OR prev_dt IS NULL )
AND dt > ADD_MONTHS(next_dt, -3)
In the above, a date can only be the start of a contiguous sequence if there is no prior date within 3 months (either it is more than three months ago or it doesn't exist at all) and there is also a subsequent date within 3 months.
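The same idea can be sketched procedurally in Python: sort the dates, start from the most recent one, and walk backwards until the gap to the previous date exceeds three months. The names add_months and latest_range_start are assumptions for this sketch. It expects a non-empty list, and it differs from the query in one respect: if the latest date has no neighbour within three months, it returns that date itself. The add_months helper clamps to the last day of the target month and does not reproduce every rule of Oracle's ADD_MONTHS.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Shift d by a number of months, clamping the day to the month's length."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def latest_range_start(dates):
    """Start of the most recent run of dates with no gap over 3 months."""
    ordered = sorted(dates)
    start = ordered[-1]
    for prev in reversed(ordered[:-1]):
        if start > add_months(prev, 3):
            break  # more than 3 months between prev and start
        start = prev
    return start

dates = [
    date(1990, 5, 1), date(1990, 6, 4), date(1990, 10, 28),
    date(1990, 11, 14), date(1990, 12, 19), date(1991, 1, 20),
    date(1991, 4, 30), date(1991, 5, 13),
]
result = latest_range_start(dates)
```

For the dates in the question, result is 1991-04-30.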
You can use LAG and LEAD. Find the query below. I think it works fine.
tmp_year is the table I have created. tdate is the column.
The records in the table are
28-JAN-15
27-JAN-15
26-JAN-15
25-JAN-15
12-JUL-14
11-JUL-14
10-JUL-14
09-JUL-14
24-DEC-13
23-DEC-13
22-DEC-13
21-DEC-13
15-SEP-13
07-JUN-13
27-FEB-13
19-NOV-12
11-AUG-12
Please find the query which returns 25th Jan 2015.
select max(d.tdate)
from (
select c.tdate, c.next_date, c.date_diff,
lag(c.date_diff) over (order by c.tdate) prev_diff
from (
select b.tdate, b.next_date, (b.next_date - b.tdate) date_diff
from (
select a.tdate, lead(a.tdate) over (order by a.tdate) next_date
from tmp_year a
) b
) c
) d
where d.date_diff < 90 and d.prev_diff >= 90;

Check date split periods are continuous

I have data in an Ingres table, something like this:
REF FROM_DATE TO_DATE
A 01.04.1997 01.04.1998
A 01.04.1998 27.05.1998
A 27.05.1998 01.04.1999
B 01.04.1997 01.04.1998
B 01.04.1998 26.07.1998
B 01.04.2012 01.04.2013
Some refs have continuous periods from the min(from_date) to the max(to_date), but some have gaps in the period.
I would like to know a way in Ingres SQL of identifying which refs have gaps in the date periods.
I am doing this as a Unix shell script calling the Ingres sql command.
Please advise.
I am not familiar with the date functions in Ingres. Let me assume that the - operator returns the difference between two dates in days.
If there are no overlaps in the data, then you can do what you want pretty easily. If there are no gaps, then the difference between the minimum and maximum date is the same as the sum of the differences on each line. If the first figure exceeds the second, that is, if total_gaps is greater than 0, then there are gaps.
So:
select ref,
((max(to_date) - min(from_date)) -
sum(to_date - from_date)
) as total_gaps
from t
group by ref;
I believe this will work in your case. In other cases, there might be an "off-by-1" problem, depending on whether or not the end date is included in the period.
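The arithmetic behind this check can be illustrated with a minimal Python sketch using the sample rows from the question (dates read as dd.mm.yyyy). The function name total_gap_days is an assumption, and the result is only meaningful under the same condition as the query: no overlapping periods per ref.

```python
from collections import defaultdict
from datetime import date

def total_gap_days(rows):
    """rows: iterable of (ref, from_date, to_date). Returns {ref: gap in days}."""
    by_ref = defaultdict(list)
    for ref, from_date, to_date in rows:
        by_ref[ref].append((from_date, to_date))
    gaps = {}
    for ref, periods in by_ref.items():
        # max(to_date) - min(from_date)
        span = max(t for _, t in periods) - min(f for f, _ in periods)
        # sum(to_date - from_date)
        covered = sum((t - f).days for f, t in periods)
        gaps[ref] = span.days - covered
    return gaps

rows = [
    ("A", date(1997, 4, 1), date(1998, 4, 1)),
    ("A", date(1998, 4, 1), date(1998, 5, 27)),
    ("A", date(1998, 5, 27), date(1999, 4, 1)),
    ("B", date(1997, 4, 1), date(1998, 4, 1)),
    ("B", date(1998, 4, 1), date(1998, 7, 26)),
    ("B", date(2012, 4, 1), date(2013, 4, 1)),
]
gaps = total_gap_days(rows)
```

Here ref A comes out with a gap of 0 days, while ref B has a positive gap because nothing covers the time between 26.07.1998 and 01.04.2012.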
This query works in SQL Server. PARTITION BY is part of ANSI SQL, but I don't know whether Ingres supports it. If it does, Ingres probably also has an equivalent of DENSE_RANK().
select *
INTO #TEMP
from (
select 'A' as Ref, Cast('1997-01-04' as DateTime) as From_date, Cast('1998-01-04' as DateTime) as to_date
union
select 'A' as Ref, Cast('1998-01-04' as DateTime) as From_date, Cast('1998-05-27' as DateTime) as to_date
union
select 'A' as Ref, Cast('1998-05-27' as DateTime) as From_date, Cast('1999-01-04' as DateTime) as to_date
union
select 'B' as Ref, Cast('1997-01-04' as DateTime) as From_date, Cast('1998-01-04' as DateTime) as to_date
union
select 'B' as Ref, Cast('1998-01-04' as DateTime) as From_date, Cast('1998-07-26' as DateTime) as to_date
union
select 'B' as Ref, Cast('2012-01-04' as DateTime) as From_date, Cast('2013-01-04' as DateTime) as to_date
) X
SELECT *
FROM
(
SELECT Ref, Min(NewStartDate) From_Date, MAX(To_Date) To_Date, COUNT(1) OVER (PARTITION BY Ref ) As [CountRanges]
FROM
(
SELECT Ref, From_Date, To_Date,
NewStartDate = Range_UNTIL_NULL.From_Date + NUMBERS.number,
NewStartDateGroup = DATEADD(d,
1 - DENSE_RANK() OVER (PARTITION BY Ref ORDER BY Range_UNTIL_NULL.From_Date + NUMBERS.number),
Range_UNTIL_NULL.From_Date + NUMBERS.number)
FROM
(
--This subquery is needed to extend To_date to the next day and to allow it to be null
SELECT
REF, From_date, DATEADD(d, 1, ISNULL(To_Date, From_Date)) AS to_date
FROM #Temp T1
WHERE
NOT EXISTS ( SELECT *
FROM #Temp t2
WHERE T1.Ref = T2.Ref and T1.From_Date > T2.From_Date AND T2.To_Date IS NULL
)
) AS Range_UNTIL_NULL
CROSS APPLY Enumerate ( ABS(DATEDIFF(d, From_Date, To_Date))) AS NUMBERS
) X
GROUP BY Ref, NewStartDateGroup
) OVERLAPED_RANGES_WITH_COUNT
-- WHERE OVERLAPED_RANGES_WITH_COUNT.CountRanges >= 2 --This filter is for identifying ranges that have at least one gap
ORDER BY Ref, From_Date
The result for the given example is:
Ref From_Date To_Date CountRanges
---- ----------------------- ----------------------- -----------
A 1997-01-04 00:00:00.000 1999-01-05 00:00:00.000 1
B 1997-01-04 00:00:00.000 1998-07-27 00:00:00.000 2
B 2012-01-04 00:00:00.000 2013-01-05 00:00:00.000 2
As you can see, the refs with "CountRanges" > 1 have at least one gap.
This answer goes far beyond the initial question, because:
Ranges can overlap; it is not clear from the initial question whether that can happen.
The question only asks which refs have gaps, but with this query you can also list the gaps.
This query allows To_date to be null, representing an open-ended range that extends to infinity.