Need to get maximum date range which is overlapping in SQL - sql

I have a table with 3 columns id, start_date, end_date
Some of the values are as follows:
1 2018-01-01 2030-01-01
1 2017-10-01 2018-10-01
1 2019-01-01 2020-01-01
1 2015-01-01 2016-01-01
2 2010-01-01 2011-02-01
2 2010-10-01 2010-12-01
2 2008-01-01 2009-01-01
I have the above kind of data set where I have to filter out overlap date range by keeping maximum datarange and keep the other date range which is not overlapping for a particular id.
Hence desired output should be:
1 2018-01-01 2030-01-01
1 2015-01-01 2016-01-01
2 2010-01-01 2011-02-01
2 2008-01-01 2009-01-01
I am unable to find the right way to code in impala. Can someone please help me.
I have tried like,
with cte as(
select a*, row_number() over(partition by id order by datediff(end_date , start_date) desc) as flag from mytable a) select * from cte where flag=1
but this will remove other date range which is not overlapping. Please help.

use row number with countItem for each id
with cte as(
select *,
row_number() over(partition by id order by id) as seq,
count(*) over(partition by id order by id) as countItem
from mytable
)
select id,start_date,end_date
from cte
where seq = 1 or seq = countItem
or without cte
select id,start_date,end_date
from
(select *,
row_number() over(partition by id order by id) as seq,
count(*) over(partition by id order by id) as countItem
from mytable) t
where seq = 1 or seq = countItem
demo in db<>fiddle

You can use a cumulative max to see if there is any overlap with preceding rows. If there is not, then you have the first row of a new group (row in the result set).
A cumulative sum of the starts assigns each row in the source to a group. Then aggregate:
select id, min(start_date), max(end_date)
from (select t.*,
sum(case when prev_end_date >= start_date then 0 else 1 end) over
(partition by id
order by start_date
rows between unbounded preceding and current row
) as grp
from (select t.*,
max(end_date) over (partition by id
order by start_date
rows between unbounded preceding and 1 preceding
) as prev_end_date
from t
) t
) t
group by id, grp;

Related

How to cross join but using latest value in BIGQUERY

I have this table below
date
id
value
2021-01-01
1
3
2021-01-04
1
5
2021-01-05
1
10
And I expect output like this, where the date column is always increase daily and value column will generate the last value on an id
date
id
value
2021-01-01
1
3
2021-01-02
1
3
2021-01-03
1
3
2021-01-04
1
5
2021-01-05
1
10
2021-01-06
1
10
I think I can use cross join but I can't get my expected output and think that there are a special syntax/logic to solve this
Consider below approach
select * from `project.dataset.table`
union all
select missing_date, prev_row.id, prev_row.value
from (
select *, lag(t) over(partition by id order by date) prev_row
from `project.dataset.table` t
), unnest(generate_date_array(prev_row.date + 1, date - 1)) missing_date
I would write this using:
select dte, t.id, t.value
from (select t.*,
lead(date, 1, date '2021-01-06') over (partition by id order by date) as next_day
from `table` t
) t cross join
unnest(generate_date_array(
date,
ifnull(
date_add(next_date, interval -1 day), -- generate missing date rows
(select max(date) from `table`) -- add last row
)
)) dte;
Note that this requires neither union all nor window function to fill in the values.
alternative solution using last_value. You may explore the following query and customize your logic to generate days (if needed)
WITH
query AS (
SELECT
date,
id,
value
FROM
`mydataset.newtable`
ORDER BY
date ),
generated_days AS (
SELECT
day
FROM (
SELECT
MIN(date) min_dt,
MAX(date) max_dt
FROM
query),
UNNEST(GENERATE_DATE_ARRAY(min_dt, max_dt)) day )
SELECT
g.day,
LAST_VALUE(q.id IGNORE NULLS) OVER(ORDER BY g.day) id,
LAST_VALUE(q.value IGNORE NULLS) OVER(ORDER BY g.day) value,
FROM
generated_days g
LEFT OUTER JOIN
query q
ON
g.day = q.date
ORDER BY
g.day

How to use SQL to get column count for a previous date?

I have the following table,
id status price date
2 complete 10 2020-01-01 10:10:10
2 complete 20 2020-02-02 10:10:10
2 complete 10 2020-03-03 10:10:10
3 complete 10 2020-04-04 10:10:10
4 complete 10 2020-05-05 10:10:10
Required output,
id status_count price ratio
2 0 0 0
2 1 10 0
2 2 30 0.33
I am looking to add the price for previous row. Row 1 is 0 because it has no previous row value.
Find ratio ie 10/30=0.33
You can use analytical function ROW_NUMBER and SUM as follows:
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id ORDER BY date), 0) - price as price
FROM yourTable;
DB<>Fiddle demo
I think you want something like this:
SELECT
id,
COUNT(*) OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id
ORDER BY date ROWS BETWEEN
UNBOUNDED PRECEDING AND 1 PRECEDING), 0) price
FROM yourTable;
Demo
Please also check another method:
with cte
as(*,ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
SUM(price) OVER (PARTITION BY id ORDER BY date) ss from yourTable)
select id,status_count,isnull(ss,0)-price price
from cte

Get latest entry in each week over several years period

I have the following table to store history for entities:
Date Id State
-------------------------------------
2017-10-10 1 0
2017-10-12 1 4
2018-5-30 1 8
2019-4-1 2 0
2018-3-6 2 4
2018-3-7 2 0
I want to get last entry for each Id in one week period e.g.
Date Id State
-------------------------------------
2017-10-12 1 4
2018-5-30 1 8
2019-4-1 2 0
2018-3-7 2 0
I'd try to use Partition by:
select
ID
,Date
,State
,DatePart(week,Date) as weekNumber
from TableA
where Date = (
select max(Date) over (Partition by Id Order by DatePart(week, Date) Desc)
)
order by ID
but it still gives me more than one result per week.
You can use ROW_NUMBER():
SELECT a.*
FROM (SELECT a.*, ROW_NUMBER() OVER (PARTITION BY a.id, DATEPART(WK, a.Date) ORDER BY a.Date DESC) AS Seq
FROM tablea a
) a
WHERE seq = 1
ORDER BY id, Date;

Teradara SQL - Operation with max-min dates

suppose I have the following data frame in Reradata SQL.
How can I get the variation between the highest and lowest date, at user level? Regards
Initial table
user date price
1 1-1 10
1 2-1 20
1 3-1 30
2 1-1 12
2 2-1 22
2 3-1 32
3 1-1 13
3 2-1 23
3 3-1 33
Final table
user var_price
1 30/10-1
2 32/12-1
3 33/13-1
Try this-
SELECT B.[user],
CAST(SUM(B.max_price) AS VARCHAR)+'/'+CAST(SUM(B.min_price) AS VARCHAR)+ '-1' var_price,
SUM(B.max_price)/SUM(B.min_price) -1 calculated_var_price
FROM
(
SELECT * FROM
(
SELECT [user],0 max_price,price min_price,ROW_NUMBER() OVER (PARTITION BY [user] ORDER BY DATE) RN
FROM your_table
)A WHERE RN = 1
UNION ALL
SELECT * FROM
(
SELECT [user],price max_price,0 min_price, ROW_NUMBER() OVER (PARTITION BY [user] ORDER BY DATE DESC) RN
FROM your_table
)A WHERE RN = 1
)B
GROUP BY B.[user]
Output is-
user var_price calculated_var_price
1 30/10-1 2
2 32/12-1 1
3 33/13-1 1
Is this what you want?
select user, max(price) / min(price) - 1
from t
group by user;
Your values are monotonically increasing, so max() and min() seems like the simplest solution.
EDIT:
You can use window functions:
select user, max(last_price) / max(first_price) - 1
from (select t.*,
first_value(price) over (partition by user order by date rows between unbounded preceding and current_row) as first_price,
first_value(price) over (partition by user order by date desc rows between unbounded preceding and current_row) as last_price
from t
) t
group by user;
select user
,price as first_price
,last_value(price)
over (paritition by user
order by date
rows between unbounded preceding and unbounded following) as last_price
from mytab
qualify
row_number() -- lowest date only
over (paritition by user
order by date) = 1
This returns the row with the lowest date and adds the price of the latest date

SQL partition by on date range

Assume this is my table:
ID NUMBER DATE
------------------------
1 45 2018-01-01
2 45 2018-01-02
2 45 2018-01-27
I need to separate using partition by and row_number where the difference between one date and another is greater than 5 days. Something like this would be the result of the above example:
ROWNUMBER ID NUMBER DATE
-----------------------------
1 1 45 2018-01-01
2 2 45 2018-01-02
1 3 45 2018-01-27
My actual query is something like this:
SELECT ROW_NUMBER() OVER(PARTITION BY NUMBER ODER BY ID DESC) AS ROWNUMBER, ...
But as you can notice, it doesn't work for the dates. How can I achieve that?
You can use lag function :
select *, row_number() over (partition by number, grp order by id) as [ROWNUMBER]
from (select *, (case when datediff(day, lag(date,1,date) over (partition by number order by id), date) <= 1
then 1 else 2
end) as grp
from table
) t;
by using lag and datediff funtion
select * from
(
select t.*,
datediff(day,
lag(DATE) over (partition by NUMBER order by id),
DATE
) as diff
from t
) as TT where diff>5
http://sqlfiddle.com/#!18/130ae/11
I think you want to identify the groups, using lag() and datediff() and a cumulative sum. Then use row_number():
select t.*,
row_number() over (partition by number, grp order by date) as rownumber
from (select t.*,
sum(grp_start) over (partition by number order by date) as grp
from (select t.*,
(case when lag(date) over (partition by number order by date) < dateadd(day, 5, date)
then 1 else 0
end) as grp_start
from t
) t
) t;