DB2 SQL Pairing Dates - sql

I am trying to pair up dates that I am getting from my SQL. The output at the moment looks something like this:
start_date end_date
2015-02-02 2015-02-02
2015-02-02 2015-02-03
2015-02-03 2015-02-03
2015-04-12 2015-02-12
I would like the output to be paired up so that the smallest and the biggest date of each date group are chosen, so that the output would look like this:
start_date end_date
2015-02-02 2015-02-03
2015-04-12 2015-02-12
Using the first response I get something like the query below. I believe I have formatted it wrongly: it does run, but I get the same date pairs as before.
select min(date), max(date)
from (select date,
sum(case when sum(inc) = 0 then 1 else 0 end) over (order by date desc) as grp
from (select t1.datev as date, 1 as inc
from table2 t1,
table3 c,
table4 cr
where t1.datev between date(c.e_start_date) and date(c.e_end_date)
and t1.datev not in (select date(temp.datev) from mdmins11.temp temp where temp.number < 4000 and temp.organisation_id = 11111)
and c.tp_cd in (1,6)
and cr.from_id = c.id
and cr.organisation_id = 11111
union all
select t.datev as date, -1 as inc
from table1 t,
table3 c,
table4 cr
where t.datev between date(c.e_start_date) and date(c.e_end_date)
and t.datev not in (select date(temp.datev) from mdmins11.temp temp where temp.number < 4000 and temp.organisation_id = 11111)
and c.tp_cd in (1,6)
and cr.from_id = c.id
and cr.organisation_id = 11111
) t
group by date
) t
group by grp;

One method is to determine where groups of non-overlapping dates start. For this, you can use not exists. Then count up this flag over all records. This uses window functions. However, this poses problems because you have multiple starts on the same date.
Another method is to keep track of starts and stops and note where the sum is zero. These represent boundaries between groups. The following should work on your data:
select min(date), max(date)
from (select date,
sum(case when sum(inc) = 0 then 1 else 0 end) over (order by date desc) as grp
from (select start_date as date, 1 as inc
from table
union all
select end_date as date, -1 as inc
from table
) t
group by date
) t
group by grp;
This type of problem is made more complicated when duplicate values are allowed on a given date. Given only the dates, this is challenging. With a separate unique id for each row, there are more robust solutions.
EDIT:
A more robust solution:
select min(start_date), max(end_date)
from (select t.*, sum(StartGroupFlag) over (order by start_date) as grp
from (select t.*,
(case when not exists (select 1
from table t2
where t2.start_date < t.start_date and
t2.end_date >= t.start_date
)
then 1 else 0
end) as StartGroupFlag
from table t
) t
) t
group by grp;
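The flag-and-running-sum idea in the query above can also be sketched outside SQL. The Python below is only an illustration, not the answer's code: the name `merge_ranges` and the fourth sample row are assumptions, and it presumes every row is well formed (start on or before end). A row opens a new group when no earlier row reaches its start date, which mirrors the `not exists` test.

```python
from datetime import date

def merge_ranges(rows):
    """Collapse (start, end) pairs that touch or overlap into one pair per group."""
    merged = []
    for start, end in sorted(rows):
        # A row starts a new group when no earlier row reaches its start date.
        if not merged or start > merged[-1][1]:
            merged.append([start, end])
        else:
            # Otherwise extend the current group to the furthest end seen so far.
            merged[-1][1] = max(merged[-1][1], end)
    return [tuple(pair) for pair in merged]

# Sample rows; the last one is a hypothetical well-formed row.
sample = [
    (date(2015, 2, 2), date(2015, 2, 2)),
    (date(2015, 2, 2), date(2015, 2, 3)),
    (date(2015, 2, 3), date(2015, 2, 3)),
    (date(2015, 4, 12), date(2015, 4, 12)),
]
result = merge_ranges(sample)
```

Sorting first means duplicate start dates need no special handling, because each row is only compared with the furthest end date seen so far.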

Related

Oracle SQL LAG() function results in duplicate rows

I have a very simple query that results in two rows:
SELECT DISTINCT
id,
trunc(start_date) start_date
FROM example.table
WHERE ID = 1
This results in the following rows:
id start_date
1 7/1/2012
1 9/1/2016
I want to add a column that simply shows the previous date for each row. So I'm using the following:
SELECT DISTINCT id,
Trunc(start_date) start_date,
Lag(start_date, 1)
over (
ORDER BY start_date) pdate
FROM example.table
WHERE id = 1
However, when I do this, I get four rows instead of two:
id start_date pdate
1 7/1/2012 NULL
1 7/1/2012 7/1/2012
1 9/1/2016 7/1/2012
1 9/1/2016 9/1/2012
If I change the offset to 2 or 3 the results remain the same. If I change the offset to 0, I get two rows again but of course now the start_date == pdate.
I can't figure out what's going on.
Use an explicit GROUP BY instead:
SELECT id, trunc(start_date) as start_date,
LAG(trunc(start_date)) OVER (PARTITION BY id ORDER BY trunc(start_date))
FROM example.table
WHERE ID = 1
GROUP BY id, trunc(start_date)
The reason is the order of execution of an SQL statement: LAG runs before the DISTINCT.
You actually want to run the LAG after the DISTINCT, so the right query should be:
WITH t1 AS (
SELECT DISTINCT id, trunc(start_date) start_date
FROM example.table
WHERE ID = 1
)
SELECT t1.*, LAG(start_date, 1) OVER (ORDER BY start_date) pdate
FROM t1
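To see why the order matters, here is a rough Python sketch of "de-duplicate first, then look back one row". It is not Oracle code: the function name `dedupe_then_lag` and the sample timestamps (several times of day on the same date) are assumptions made for illustration.

```python
from datetime import datetime

def dedupe_then_lag(rows):
    """rows: (id, timestamp) pairs. Truncate to dates, de-duplicate, then attach the previous date."""
    # Step 1: the DISTINCT over (id, trunc(start_date)).
    distinct = sorted({(row_id, ts.date()) for row_id, ts in rows}, key=lambda r: r[1])
    # Step 2: the LAG, computed over the already de-duplicated rows.
    out = []
    previous = None
    for row_id, day in distinct:
        out.append((row_id, day, previous))
        previous = day
    return out

# Hypothetical raw rows: two timestamps on each of the two dates.
raw = [
    (1, datetime(2012, 7, 1, 8, 0)),
    (1, datetime(2012, 7, 1, 17, 30)),
    (1, datetime(2016, 9, 1, 9, 0)),
    (1, datetime(2016, 9, 1, 12, 0)),
]
result = dedupe_then_lag(raw)
```

If the look-back ran over the four raw rows instead, each row would carry its own previous value, so de-duplicating afterwards could no longer collapse them to two.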

SQL - get counts based on rolling window per unique id

I'm working with a table that has an id and date column. For each id, there's a 90-day window where multiple transactions can be made. The 90-day window starts when the first transaction is made and the clock resets once the 90 days are over. When the new 90-day window begins triggered by a new transaction I want to start the count from the beginning at one. I would like to generate something like this with the two additional columns (window and count) in SQL:
id date window count
name1 7/7/2019 first 1
name1 12/31/2019 second 1
name1 1/23/2020 second 2
name1 1/23/2020 second 3
name1 2/12/2020 second 4
name1 4/1/2020 third 1
name2 6/30/2019 first 1
name2 8/14/2019 first 2
I think getting the rank of the window can be done with a CASE statement and MIN(date) OVER (PARTITION BY id). This is what I have in mind for that:
CASE WHEN MIN(date) OVER (PARTITION BY id) THEN 'first'
WHEN DATEDIFF(day, date, MIN(date) OVER (PARTITION BY id)) <= 90 THEN 'first'
WHEN DATEDIFF(day, date, MIN(date) OVER (PARTITION BY id)) > 90 AND DATEDIFF(day, date, MIN(date) OVER (PARTITION BY id)) <= 180 THEN 'third'
WHEN DATEDIFF(day, date, MIN(date) OVER (PARTITION BY id)) > 180 AND DATEDIFF(day, date, MIN(date) OVER (PARTITION BY id)) <= 270 THEN 'fourth'
ELSE NULL END
And incrementing the counts within the windows would be ROW_NUMBER() OVER (PARTITION BY id, window)?
You cannot solve this problem with window functions only. You need to iterate through the dataset, which can be done with a recursive query:
with
tab as (
select t.*, row_number() over(partition by id order by date) rn
from mytable t
),
cte as (
select id, date, rn, date date0 from tab where rn = 1
union all
select t.id, t.date, t.rn, greatest(t.date, c.date + interval '90' day)
from cte c
inner join tab t on t.id = c.id and t.rn = c.rn + 1
)
select
id,
date,
dense_rank() over(partition by id order by date0) grp,
count(*) over(partition by id order by date0, date) cnt
from cte
The first query in the with clause ranks records having the same id by increasing date; then, the recursive query traverses the data set and computes the starting date of each group. The last step is numbering the groups and computing the window count.
GMB is totally correct that a recursive CTE is needed. I offer this as an alternative form for two reasons. First, because it uses SQL Server syntax, which appears to be the database being used in the question. Second, because it directly calculates window and count without window functions:
with t as (
select t.*, row_number() over (partition by id order by date) as seqnum
from tbl t
),
cte as (
select t.id, t.date, dateadd(day, 90, t.date) as window_end, 1 as window, 1 as count, seqnum
from t
where seqnum = 1
union all
select t.id, t.date,
(case when t.date > cte.window_end then dateadd(day, 90, t.date)
else cte.window_end
end) as window_end,
(case when t.date > cte.window_end then window + 1 else window end) as window,
(case when t.date > cte.window_end then 1 else cte.count + 1 end) as count,
t.seqnum
from cte join
t
on t.id = cte.id and
t.seqnum = cte.seqnum + 1
)
select id, date, window, count
from cte
order by 1, 2;
Here is a db<>fiddle.
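The row-by-row iteration that the recursive CTE performs can be sketched in plain Python. This is an illustration under assumptions, not either answer's code: the name `label_windows` is made up, and a window is taken to cover the 90 days after the transaction that opened it, as in the `window_end` logic above.

```python
from datetime import date, timedelta

def label_windows(rows, days=90):
    """rows: (id, date) pairs. Returns (id, date, window, count) tuples."""
    out = []
    state = {}  # id -> (window_end, window_number, count)
    for row_id, day in sorted(rows):
        if row_id not in state or day > state[row_id][0]:
            # First transaction for this id, or the previous window has expired.
            window = state[row_id][1] + 1 if row_id in state else 1
            state[row_id] = (day + timedelta(days=days), window, 1)
        else:
            # Still inside the current window: only the count goes up.
            window_end, window, count = state[row_id]
            state[row_id] = (window_end, window, count + 1)
        out.append((row_id, day, state[row_id][1], state[row_id][2]))
    return out

sample = [
    ("name1", date(2019, 7, 7)),
    ("name1", date(2019, 12, 31)),
    ("name1", date(2020, 1, 23)),
    ("name1", date(2020, 1, 23)),
    ("name1", date(2020, 2, 12)),
    ("name1", date(2020, 4, 1)),
    ("name2", date(2019, 6, 30)),
    ("name2", date(2019, 8, 14)),
]
result = label_windows(sample)
```

Each row depends on the state left by the previous row for the same id, which is why a window function alone cannot express it.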

How to select overlapping date ranges in SQL

I have a table with the following columns :
sID, start_date and end_date
Some of the values are as follows:
1 1995-07-28 2003-07-20
1 2003-07-21 2010-05-04
1 2010-05-03 2010-05-03
2 1960-01-01 2011-03-01
2 2011-03-02 2012-03-13
2 2012-03-12 2012-10-21
2 2012-10-22 2012-11-08
3 2003-07-23 2010-05-02
I only want the 2nd and 3rd rows in my result as they are the overlapping date ranges.
I tried this but it would not get rid of the first row. Not sure where I am going wrong?
select a.sID from table a
inner join table b
on a.sID = b.sID
and ((b.start_date between a.start_date and a.end_date)
and (b.end_date between a.start_date and b.end_date ))
order by end_date desc
I am trying to do this in SQL Server.
One way of doing this reasonably efficiently is
WITH T1
AS (SELECT *,
MAX(end_date) OVER (PARTITION BY sID ORDER BY start_date) AS max_end_date_so_far
FROM YourTable),
T2
AS (SELECT *,
range_start = IIF(start_date <= LAG(max_end_date_so_far) OVER (PARTITION BY sID ORDER BY start_date), 0, 1),
next_range_start = IIF(LEAD(start_date) OVER (PARTITION BY sID ORDER BY start_date) <= max_end_date_so_far, 0, 1)
FROM T1)
SELECT SId,
start_date,
end_date
FROM T2
WHERE 0 IN ( range_start, next_range_start );
If you have an index on (sID, start_date) INCLUDE (end_date), this can perform the work with a single ordered scan.
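As a rough cross-check of the running-maximum idea, the same two comparisons can be written in Python. This is a sketch, not a translation of the T-SQL: the name `overlapping_rows` is an assumption, and rows are taken to be (sID, start_date, end_date) tuples.

```python
from datetime import date
from itertools import groupby

def overlapping_rows(rows):
    """Return rows that overlap another row with the same sID."""
    kept = []
    ordered = sorted(rows)  # by sID, then start date
    for _, group in groupby(ordered, key=lambda r: r[0]):
        part = list(group)
        max_so_far = []  # running maximum of end dates, like max_end_date_so_far
        for _, _, end in part:
            max_so_far.append(end if not max_so_far else max(max_so_far[-1], end))
        for i, row in enumerate(part):
            # Row starts on or before the furthest end date among earlier rows.
            continues_previous = i > 0 and row[1] <= max_so_far[i - 1]
            # The next row starts on or before the furthest end date so far.
            reaches_next = i + 1 < len(part) and part[i + 1][1] <= max_so_far[i]
            if continues_previous or reaches_next:
                kept.append(row)
    return kept

D = date.fromisoformat
sample = [
    (1, D("1995-07-28"), D("2003-07-20")),
    (1, D("2003-07-21"), D("2010-05-04")),
    (1, D("2010-05-03"), D("2010-05-03")),
    (2, D("1960-01-01"), D("2011-03-01")),
    (2, D("2011-03-02"), D("2012-03-13")),
    (2, D("2012-03-12"), D("2012-10-21")),
    (2, D("2012-10-22"), D("2012-11-08")),
    (3, D("2003-07-23"), D("2010-05-02")),
]
result = overlapping_rows(sample)
```

On the question's sample rows this keeps the second and third row of sID 1 and of sID 2, and nothing for sID 3.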
Your logic is not totally correct, although it almost works on your sample data. The specific reason it fails is because between includes the end points, so any given row matches itself. That said, the logic still isn't correct because it doesn't catch this situation:
a-------------a
     b----b
Here is correct logic:
select a.*
from table a
where exists (select 1
from table b
where a.sid = b.sid and
a.start_date < b.end_date and
a.end_date > b.start_date and
(a.start_date <> b.start_date or -- filter out the record itself
a.end_date <> b.end_date
)
)
order by a.end_date;
The rule for overlapping time periods (or ranges of any sort) is that period 1 overlaps with period 2 when period 1 starts before period 2 ends and period 1 ends after period 2 starts. Happily, there is no need or use for between for this purpose. (I strongly discourage using between with date/time operands.)
I should note that this version does not consider two time periods to overlap when one ends on the same day another begins. That is easily adjusted by changing the < and > to <= and >=.
Here is a SQL Fiddle.
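The overlap rule itself is small enough to state as a predicate. The Python below is a minimal sketch; the function name `overlaps` and its `inclusive` switch are assumptions added to show the strict and the `<=`/`>=` variants side by side.

```python
from datetime import date

def overlaps(a_start, a_end, b_start, b_end, inclusive=False):
    """True when period a overlaps period b."""
    if inclusive:
        # Periods that merely touch on a shared day count as overlapping.
        return a_start <= b_end and a_end >= b_start
    # Strict form: a starts before b ends, and a ends after b starts.
    return a_start < b_end and a_end > b_start

# b lies entirely inside a, the case the original join missed.
contained = overlaps(date(2000, 1, 1), date(2000, 12, 31),
                     date(2000, 3, 1), date(2000, 4, 1))

# a ends on the day b begins: only the inclusive form reports an overlap.
touching_strict = overlaps(date(2000, 1, 1), date(2000, 6, 1),
                           date(2000, 6, 1), date(2000, 9, 1))
touching_inclusive = overlaps(date(2000, 1, 1), date(2000, 6, 1),
                              date(2000, 6, 1), date(2000, 9, 1), inclusive=True)
```

The test is symmetric, so swapping the two periods gives the same answer.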

Alternative to Datediff over()?

I have data of this form:
user_id event started ended date
1 started 1 0 3/1/2018
1 ended 0 1 3/2/2018
2 started 1 0 3/5/2018
2 ended 0 1 3/22/2018
3 started 1 0 3/25/2018
There are other events and columns for 0/1 but they are irrelevant.
I am trying to get how long it takes each user to get from started to ended.
I tried datediff(day, case when started=1 then date end, case when ended=1 then date end) but since they are on different rows it doesn't work. Something along the lines of datediff over() could work, but that is obviously not a valid function.
Thanks in advance!
Assuming that you can't end before you started, you simply need MIN & MAX as Windowed Aggregates:
select user_id,
datediff(day,
min(date) over (partition by user_id),
max(date) over (partition by user_id))
from myTable
where event in ('started', 'ended')
Using this you can add any additional columns, too.
If one result row is also ok, you can do simple aggregation:
select user_id,
min(date) as started,
max(date) as ended,
datediff(day,
min(date),
max(date)) as duration
from myTable
where event in ('started', 'ended')
group by user_id
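The aggregation can be pictured as "earliest and latest date per user, then subtract". Here is a small Python sketch of that idea; the function name `durations` is an assumption, and, like the query, it relies on a user never ending before starting.

```python
from datetime import date

def durations(events):
    """events: (user_id, event, date) tuples. Returns {user_id: days from first to last event}."""
    first, last = {}, {}
    for user_id, event, day in events:
        if event not in ("started", "ended"):
            continue  # other events are irrelevant here
        first[user_id] = min(first.get(user_id, day), day)
        last[user_id] = max(last.get(user_id, day), day)
    return {u: (last[u] - first[u]).days for u in first}

sample = [
    (1, "started", date(2018, 3, 1)),
    (1, "ended", date(2018, 3, 2)),
    (2, "started", date(2018, 3, 5)),
    (2, "ended", date(2018, 3, 22)),
    (3, "started", date(2018, 3, 25)),
]
result = durations(sample)
```

A user who has started but not yet ended comes out with a duration of 0, which is also what the min/max query returns for such a user.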
You could inner join the table on itself using the user_id column:
SELECT a.[user_id]
, a.[date] AS StartDate
, b.EndDate
, DATEDIFF(DAY, a.[date], b.EndDate) AS DateDifference
FROM dbo.TableNameHere AS a
INNER JOIN
(
SELECT [user_id]
, [date] AS EndDate
FROM dbo.TableNameHere
WHERE [ended] = 1
) AS b
ON a.[user_id] = b.[user_id]
WHERE a.[started] = 1
In my example above, you don't really need any of the columns in the first SELECT besides the DateDifference, I just had them for visibility in my testing.

Select rows where value is equal given value or lower and nearest to it

Sorry for the confusing title. Please tell me whether this is possible with a database query. Assume we have the following table:
ind_id name value date
----------- -------------------- ----------- ----------
1 a 10 2010-01-01
1 a 20 2010-01-02
1 a 30 2010-01-03
2 b 10 2010-01-01
2 b 20 2010-01-02
2 b 30 2010-01-03
2 b 40 2010-01-04
3 c 10 2010-01-01
3 c 20 2010-01-02
3 c 30 2010-01-03
3 c 40 2010-01-04
3 c 50 2010-01-05
4 d 10 2010-01-05
I need to query all rows so that each ind_id is included once for the given date. If there is no row for an ind_id on the given date, take the nearest lower date; if there are no lower dates, return ind_id + name (name/ind_id pairs are equal) with nulls.
For example, date is 2010-01-04, I expect following result:
ind_id name value date
----------- -------------------- ----------- ----------
1 a 30 2010-01-03
2 b 40 2010-01-04
3 c 40 2010-01-04
4 d NULL NULL
If this is possible, I would be very grateful for help building the query. I'm using SQL Server 2008.
Check this SQL FIDDLE DEMO
with CTE_test
as
(
select int_id,
max(date) MaxDate
from test
where date<='2010-01-04 00:00:00:000'
group by int_id
)
select A.int_id, A.[Value], A.[Date]
from test A
inner join CTE_test B
on a.int_id=b.int_id
and a.date = b.Maxdate
union all
select int_id, null, null
from test
where int_id not in (select int_id from CTE_test)
(Updated) Try:
with cte as
(select m.*,
max(date) over (partition by ind_id) max_date,
max(case when date <= @date then date end) over
(partition by ind_id) max_acc_date
from myTable m)
select ind_id,
name,
case when max_acc_date is null then null else value end value,
max_acc_date date
from cte c
where date = coalesce(max_acc_date, max_date)
(SQLFiddle here)
Here is a query that returns the result that you are looking for:
SELECT
t1.ind_id
, CASE WHEN t1.date <= '2010-01-04' THEN t1.value ELSE null END
FROM test t1
WHERE t1.date=COALESCE(
(SELECT MAX(DATE)
FROM test t2
WHERE t2.ind_id=t1.ind_id AND t2.date <= '2010-01-04')
, t1.date)
The idea is to pick a row in a correlated query such that its ID matches that of the current row, and the date is the highest one prior to your target date of '2010-01-04'.
When such row does not exist, the date for the current row is returned. This date needs to be replaced with a null; this is what the CASE statement at the top is doing.
Here is a demo on sqlfiddle.
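The selection rule (latest row on or before the target date, otherwise nulls) can be sketched in Python as follows. This is an illustration, not a translation of the query: the name `latest_on_or_before` is an assumption, and unlike the query it also carries the name column through.

```python
from datetime import date

def latest_on_or_before(rows, target):
    """rows: (ind_id, name, value, date) tuples. Returns one row per ind_id."""
    best = {}
    for ind_id, name, value, day in rows:
        # Every ind_id gets a placeholder row with nulls until a match is found.
        current = best.setdefault(ind_id, (ind_id, name, None, None))
        if day <= target and (current[3] is None or day > current[3]):
            best[ind_id] = (ind_id, name, value, day)
    return [best[k] for k in sorted(best)]

D = date.fromisoformat
table = [
    (1, "a", 10, D("2010-01-01")), (1, "a", 20, D("2010-01-02")),
    (1, "a", 30, D("2010-01-03")),
    (2, "b", 10, D("2010-01-01")), (2, "b", 20, D("2010-01-02")),
    (2, "b", 30, D("2010-01-03")), (2, "b", 40, D("2010-01-04")),
    (3, "c", 10, D("2010-01-01")), (3, "c", 20, D("2010-01-02")),
    (3, "c", 30, D("2010-01-03")), (3, "c", 40, D("2010-01-04")),
    (3, "c", 50, D("2010-01-05")),
    (4, "d", 10, D("2010-01-05")),
]
result = latest_on_or_before(table, D("2010-01-04"))
```

For the target date 2010-01-04 this reproduces the four rows the question expects, including the null row for ind_id 4.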
You can use something like:
declare @date date = '2010-01-04'
;with ids as
(
select distinct ind_id
from myTable
)
,ranks as
(
select *
, ranking = row_number() over (partition by ind_id order by date desc)
from myTable
where date <= @date
)
select ids.ind_id
, ranks.value
, ranks.date
from ids
left join ranks on ids.ind_id = ranks.ind_id and ranks.ranking = 1
SQL Fiddle with demo.
Ideally you wouldn't be using the DISTINCT statement to get the ind_id values to include, but I've used it in this case to get the results you needed.
Also, standard disclaimer for these sorts of queries; if you have duplicate data you should consider a tie-breaker column in the ORDER BY or using RANK instead of ROW_NUMBER.
Edited after OPs update
Just add the new column into the existing query:
with ids as
(
select distinct ind_id, name
from myTable
)
,ranks as
(
select *
, ranking = row_number() over (partition by ind_id order by date desc)
from myTable
where date <= @date
)
select ids.ind_id
, ids.name
, ranks.value
, ranks.date
from ids
left join ranks on ids.ind_id = ranks.ind_id and ranks.ranking = 1
SQL Fiddle with demo.
As with the previous one it would be best to get the ind_id/name information through joining to a standing data table if available.
Try
DECLARE @date DATETIME;
SET @date = '2010-01-04';
WITH temp1 AS
(
SELECT t.ind_id
, t.name
, CASE WHEN t.date <= @date THEN t.value ELSE NULL END AS value
, CASE WHEN t.date <= @date THEN t.date ELSE NULL END AS date
FROM test1 AS t
),
temp AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY ind_id ORDER BY t.date DESC) AS rn
FROM temp1 AS t
WHERE t.date <= @date OR t.date IS NULL
)
SELECT *
FROM temp AS t
WHERE rn = 1
Use the option with the EXISTS operator:
DECLARE @date date = '20100104'
SELECT ind_id,
CASE WHEN date <= @date THEN value END AS value,
CASE WHEN date <= @date THEN date END AS date
FROM dbo.test57 t
WHERE EXISTS (
SELECT 1
FROM dbo.test57 t2
WHERE t.ind_id = t2.ind_id AND t2.date <= @date
HAVING ISNULL(MAX(t2.date), t.date) = t.date
)
Demo on SQLFiddle
This is not the exact answer, but it gives you the concept; I wrote it down quickly without any testing.
use
go
if
(Select value from table where col=@col1) is not null
-- your code to get the matching value
else if
(Select LOWER(Date) from table ) is not null
-- your query to get the nearest date record
else
-- your query with null values
end