suppose I have the following data frame in Reradata SQL.
How can I get the variation between the highest and lowest date, at user level? Regards
Initial table
user date price
1 1-1 10
1 2-1 20
1 3-1 30
2 1-1 12
2 2-1 22
2 3-1 32
3 1-1 13
3 2-1 23
3 3-1 33
Final table
user var_price
1 30/10-1
2 32/12-1
3 33/13-1
Try this-
SELECT B.[user],
CAST(SUM(B.max_price) AS VARCHAR)+'/'+CAST(SUM(B.min_price) AS VARCHAR)+ '-1' var_price,
SUM(B.max_price)/SUM(B.min_price) -1 calculated_var_price
FROM
(
SELECT * FROM
(
SELECT [user],0 max_price,price min_price,ROW_NUMBER() OVER (PARTITION BY [user] ORDER BY DATE) RN
FROM your_table
)A WHERE RN = 1
UNION ALL
SELECT * FROM
(
SELECT [user],price max_price,0 min_price, ROW_NUMBER() OVER (PARTITION BY [user] ORDER BY DATE DESC) RN
FROM your_table
)A WHERE RN = 1
)B
GROUP BY B.[user]
Output is-
user var_price calculated_var_price
1 30/10-1 2
2 32/12-1 1
3 33/13-1 1
Is this what you want?
select user, max(price) / min(price) - 1
from t
group by user;
Your values are monotonically increasing, so max() and min() seems like the simplest solution.
EDIT:
You can use window functions:
select user, max(last_price) / max(first_price) - 1
from (select t.*,
first_value(price) over (partition by user order by date rows between unbounded preceding and current_row) as first_price,
first_value(price) over (partition by user order by date desc rows between unbounded preceding and current_row) as last_price
from t
) t
group by user;
select user
,price as first_price
,last_value(price)
over (paritition by user
order by date
rows between unbounded preceding and unbounded following) as last_price
from mytab
qualify
row_number() -- lowest date only
over (paritition by user
order by date) = 1
This returns the row with the lowest date and adds the price of the latest date
Related
I have a table:
date
user_id
state
8/12/2021
1
visit
9/12/2021
1
registered
12/12/2021
1
order
In this table I only have updated of state of users, but I don't see the state by some particular date. How can I add rows with missing dates and fill them with previous value, so that the table will be:
date
user_id
state
8/12/2021
1
visit
9/12/2021
1
registered
10/12/2021
1
registered
11/12/2021
1
registered
12/12/2021
1
order
Here's one attempt. The cte user_dates gets min and max dates for each user that is then fed to generate_series. I.e. each user is associated with all dates between there first and last date.
In the inner select we create a group for each first_value and consecutive null states.
In the outer select we pick the first_value for each such grp.
with user_dates(f, t, user_id) as (
select min(T.dt), max(T.dt), user_id
from T
group by user_id
)
select user_id, dt, grp, first_value(state) over (partition by user_id, grp order by dt)
from (
select ud.user_id
, cal.dt::date
, state
, count(T.state) over (partition by user_id
order by cal.dt) as grp
from user_dates ud
cross join generate_series(ud.f::timestamp, ud.t::timestamp , interval '1 day') cal (dt)
left join T
using (dt, user_id)
) as tmp
order by user_id, dt
;
user_id dt grp first_value
1 2021-12-08 1 visit
1 2021-12-09 2 registered
1 2021-12-10 2 registered
1 2021-12-11 2 registered
1 2021-12-12 3 order
You can remove grp from the select, it's merely there for informative purposes.
Fiddle
I have a table with 3 columns id, start_date, end_date
Some of the values are as follows:
1 2018-01-01 2030-01-01
1 2017-10-01 2018-10-01
1 2019-01-01 2020-01-01
1 2015-01-01 2016-01-01
2 2010-01-01 2011-02-01
2 2010-10-01 2010-12-01
2 2008-01-01 2009-01-01
I have the above kind of data set where I have to filter out overlap date range by keeping maximum datarange and keep the other date range which is not overlapping for a particular id.
Hence desired output should be:
1 2018-01-01 2030-01-01
1 2015-01-01 2016-01-01
2 2010-01-01 2011-02-01
2 2008-01-01 2009-01-01
I am unable to find the right way to code in impala. Can someone please help me.
I have tried like,
with cte as(
select a*, row_number() over(partition by id order by datediff(end_date , start_date) desc) as flag from mytable a) select * from cte where flag=1
but this will remove other date range which is not overlapping. Please help.
use row number with countItem for each id
with cte as(
select *,
row_number() over(partition by id order by id) as seq,
count(*) over(partition by id order by id) as countItem
from mytable
)
select id,start_date,end_date
from cte
where seq = 1 or seq = countItem
or without cte
select id,start_date,end_date
from
(select *,
row_number() over(partition by id order by id) as seq,
count(*) over(partition by id order by id) as countItem
from mytable) t
where seq = 1 or seq = countItem
demo in db<>fiddle
You can use a cumulative max to see if there is any overlap with preceding rows. If there is not, then you have the first row of a new group (row in the result set).
A cumulative sum of the starts assigns each row in the source to a group. Then aggregate:
select id, min(start_date), max(end_date)
from (select t.*,
sum(case when prev_end_date >= start_date then 0 else 1 end) over
(partition by id
order by start_date
rows between unbounded preceding and current row
) as grp
from (select t.*,
max(end_date) over (partition by id
order by start_date
rows between unbounded preceding and 1 preceding
) as prev_end_date
from t
) t
) t
group by id, grp;
I have the following table,
id status price date
2 complete 10 2020-01-01 10:10:10
2 complete 20 2020-02-02 10:10:10
2 complete 10 2020-03-03 10:10:10
3 complete 10 2020-04-04 10:10:10
4 complete 10 2020-05-05 10:10:10
Required output,
id status_count price ratio
2 0 0 0
2 1 10 0
2 2 30 0.33
I am looking to add the price for previous row. Row 1 is 0 because it has no previous row value.
Find ratio ie 10/30=0.33
You can use analytical function ROW_NUMBER and SUM as follows:
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id ORDER BY date), 0) - price as price
FROM yourTable;
DB<>Fiddle demo
I think you want something like this:
SELECT
id,
COUNT(*) OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id
ORDER BY date ROWS BETWEEN
UNBOUNDED PRECEDING AND 1 PRECEDING), 0) price
FROM yourTable;
Demo
Please also check another method:
with cte
as(*,ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
SUM(price) OVER (PARTITION BY id ORDER BY date) ss from yourTable)
select id,status_count,isnull(ss,0)-price price
from cte
I have the following table to store history for entities:
Date Id State
-------------------------------------
2017-10-10 1 0
2017-10-12 1 4
2018-5-30 1 8
2019-4-1 2 0
2018-3-6 2 4
2018-3-7 2 0
I want to get last entry for each Id in one week period e.g.
Date Id State
-------------------------------------
2017-10-12 1 4
2018-5-30 1 8
2019-4-1 2 0
2018-3-7 2 0
I'd try to use Partition by:
select
ID
,Date
,State
,DatePart(week,Date) as weekNumber
from TableA
where Date = (
select max(Date) over (Partition by Id Order by DatePart(week, Date) Desc)
)
order by ID
but it still gives me more than one result per week.
You can use ROW_NUMBER():
SELECT a.*
FROM (SELECT a.*, ROW_NUMBER() OVER (PARTITION BY a.id, DATEPART(WK, a.Date) ORDER BY a.Date DESC) AS Seq
FROM tablea a
) a
WHERE seq = 1
ORDER BY id, Date;
I am trying to analyse a bunch of transaction data and have set up a series of different ranks to help me. The one I can't get right is the beneficiary rank. I want it to partition where there is a change in beneficiary chronologically rather than alphabetically.
Where the same beneficiary is paid from January to March and then again in June I would like the June to be classed a separate 'session'.
I am using Teradata SQL if that makes a difference.
I thought the solution was going to be a DENSE_RANK but if I PARTITION BY (CustomerID, Beneficiary) ORDER BY SystemDate it counts up the number of months. If I PARTITION BY (CustomerID) ORDER BY Beneficiary then it is not chronological, I need the highest rank to be the latest Beneficiary.
SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
,RANK() OVER(PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
,RANK() OVER(PARTITION BY CustomerID ORDER BY PaymentMonth ASC) AS MonthRank
,RANK() OVER(PARTITION BY CustomerID , Beneficiary ORDER BY SystemDate ASC) AS Beneficiary
,RANK() OVER(PARTITION BY CustomerID , Beneficiary, ROUND(TRNSCN_AMOUNT, 0) ORDER BY SYSTEM_DATE ASC) AS TransRank
FROM table ORDER BY CustomerID, PaymentRank
CustomerID Beneficiary Amount DateStamp Month PaymentRank MonthRank BeneficiaryRank TransactionRank
a aa 10 Jan 1 1 1 1
a aa 20 Feb 2 2 2 1
a aa 20 Mar 3 3 3 2
a aa 20 Apr 4 4 4 3
a bb 20 May 5 5 1 1
a bb 30 Jun 6 6 2 1
a aa 30 Jul 7 7 5 2
a aa 30 Aug 8 8 6 1
a cc 5 Sep 9 9 1 1
a cc 5 Oct 10 10 2 2
a cc 5 Nov 11 11 3 3
b cc 5 Dec 1 1 1 1
This is what I have so far, I want a column alongside this which will look like the below
CustomerID Beneficiary Amount DateStamp Month NewRank
a aa 10 Jan 1
a aa 20 Feb 1
a aa 20 Mar 1
a aa 20 Apr 1
a bb 20 May 2
a bb 30 Jun 2
a aa 30 Jul 3
a aa 30 Aug 3
a cc 5 Sep 4
a cc 5 Oct 4
a cc 5 Nov 4
b cc 5 Dec 1
This is a type of gaps-and-islands problem. I would recommend lag() and a cumulative sum:
select t.*,
sum(case when prev_systemdate > systemdate - interval '1' month then 0 else 1 end) over (partition by customerid, beneficiary order by systemdate)
from (select t.*,
lag(systemdate) over (partition by customerid, beneficiary order by systemdate) as prev_systemdate
from t
) t
Credits to #Gordon and #dnoeth for providing the ideas and code to get me on the right track.
The below is mostly ripped from dnoeth but needed to add ROWS unbounded preceding to get the aggregation correct. Without this it was just showing the total for the partition. I also changed the systemdate to paymentrank as I had to fiddle about a bit with duplicate entries on a day.
SELECT dt.*,
-- now do a Cumulative Sum over those 0/1
SUM(flag) OVER(PARTITION BY CustomerID ORDER BY PaymentRank ASC ROWS UNBOUNDED PRECEDING) AS NewRank
FROM
(
SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
-- assign a 0 if current & previous Beneficiary are the same, otherwise 1
,CASE WHEN Beneficiary = MIN(Beneficiary) OVER (PARTITION BY CustomerID ORDER BY PaymentRank ASC ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) THEN 0 ELSE 1 END AS Flag ) AS dt
ORDER BY CustomerID, PaymentRank
The inner query sets a flag whenever the beneficiary changes. The outer query then does a cumulative sum on those.
I was unsure what the unbounded preceding was doing and #dnoeth has a great explanation here Below is taken from that explanation.
•UNBOUNDED PRECEDING, all rows before the current row -> fixed
•UNBOUNDED FOLLOWING, all rows after the current row -> fixed
•x PRECEDING, x rows before the current row -> relative
•y FOLLOWING, y rows after the current row -> relative
SELECT dt.*,
-- now do a Cumulative Sum over those 0/1
SUM(flag)
OVER(PARTITION BY CustomerID
ORDER BY SystemDate ASC
,flag DESC -- needed if the order by columns are not unique
ROWS UNBOUNDED PRECEDING) AS NewRank
FROM
(
SELECT CustomerID, Beneficiary, Amount, SystemDate, Month
,RANK() OVER(PARTITION BY CustomerID ORDER BY SystemDate ASC) AS PaymentRank
,RANK() OVER(PARTITION BY CustomerID ORDER BY PaymentMonth ASC) AS MonthRank
,RANK() OVER(PARTITION BY CustomerID , Beneficiary ORDER BY SystemDate ASC) AS Beneficiary
,RANK() OVER(PARTITION BY CustomerID , Beneficiary, ROUND(TRNSCN_AMOUNT, 0) ORDER BY SYSTEM_DATE ASC) AS TransRank
-- assign a 0 if current & previous Beneficiary are the same, otherwise 1
,CASE WHEN Beneficiary = LAG(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate) THEN 0 ELSE 1 END AS flag
FROM table
) AS dt
ORDER BY CustomerID, PaymentRank
Your problem with Gordon's query is probably caused by your Teradata release, LAG is only supported in 16.10+. But there's a simple workaround:
LAG(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate)
--is equivalent to
MIN(Beneficiary) OVER(PARTITION BY CustomerID ORDER BY SystemDate
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING))