Current record with group by function - sql

Trying to get userid recent aggregate value for session_id.
(session_id 3 has two records, recent agg value is 80.00
session_id 4 has four records, recent agg value is 95.00
session_id 6 has three records, recent agg value is 72.00
Table:session_agg
id session_id userid agg date
-- ---------- ------ ----- -------
1 3 11 60.00 1573561586
4 3 11 80.00 1573561586
6 4 11 35.00 1573561749
7 4 11 50.00 1573561751
8 4 11 70.00 1573561912
10 4 11 95.00 1573561921
11 6 14 40.00 1573561945
12 6 14 67.00 1573561967
13 6 14 72.00 1573561978
select id, session_id, userid, agg, date from session_agg
WHERE date IN (select MAX(date) from session_agg GROUP BY session_id) AND
userid = 11

If you want to stick with your current approach, then you need to correlate the session_id in the subquery which checks for the max date for each session:
SELECT id, session_id, userid, add, date
FROM session_agg sa1
WHERE
date = (SELECT MAX(date) FROM session_agg sa2 WHERE sa2.session_id = sa1.session_id) AND
userid = 11;
But, if your version of SQL supports analytic functions, ROW_NUMBER is an easier way to do this:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY date DESC) rn
FROM session_agg
)
SELECT id, session_id, userid, add, date
FROM cte
WHERE rn = 1;

Related

To write a Oracle stored procedure to get data between the months

I have a procedure sp_data_between_months (p_from_date DATE, p_to_date DATE) // example p_from_date = '01-jan-2021' and 'p_to_date' = '31-mar-2021'.
I need to get the latest record for the ID for each month, add these values, and populate against p_to_date for each ID from the below table using PLSQL.
Table Name: ID_Value
ID
Date
value
1
1-jan-2021
10
1
10-jan-2021
20
2
15-jan-2021
15
2
16-jan-2021
20
2
02-feb-2021
10
2
06-feb-2021
15
1
17-feb-2021
10
1
5-mar-2021
15
1
17-mar-2021
10
2
10-mar-2021
10
the expected output is to get the latest value for each ID for each month-end and the sum of its value between those months between the ranges.
Output: p_to_date ID Sum of latest record of value for each month
DATE
ID
VALUE
31-Mar-2021
1
40 //(20+10+10) sum of value oflatest record foreach month
31-Mar-2021
2
45 //(20+15+10)
Here you are. Read comments within code.
SQL> with
2 temp as
3 -- analytic function will return 1 for the latest row for that ID in that month
4 (select id, datum, value,
5 row_number() over (partition by id, trunc(datum, 'mm') order by datum desc) rn
6 from id_value
7 )
8 -- finally, select last day in MAX month and sum all values for RN = 1
9 select
10 id,
11 last_day(max(datum)) datum,
12 sum(value)
13 from temp
14 where rn = 1
15 group by id;
ID DATUM SUM(VALUE)
---------- ----------- ----------
1 31-mar-2021 40
2 31-mar-2021 45
SQL>

SQL query to find continuous local max, min of date based on category column

I have the following data set
Customer_ID Category FROM_DATE TO_DATE
1 5 1/1/2000 12/31/2001
1 6 1/1/2002 12/31/2003
1 5 1/1/2004 12/31/2005
2 7 1/1/2010 12/31/2011
2 7 1/1/2012 12/31/2013
2 5 1/1/2014 12/31/2015
3 7 1/1/2010 12/31/2011
3 7 1/5/2012 12/31/2013
3 5 1/1/2014 12/31/2015
The result I want to achieve is to find continuous local min/max date for Customers with the same category and identify any gap in dates:
Customer_ID FROM_Date TO_Date Category
1 1/1/2000 12/31/2001 5
1 1/1/2002 12/31/2003 6
1 1/1/2004 12/31/2005 5
2 1/1/2010 12/31/2013 7
2 1/1/2014 12/31/2015 5
3 1/1/2010 12/31/2011 7
3 1/5/2012 12/31/2013 7
3 1/1/2014 12/31/2015 5
My code works fine for customer 1 (return all 3 rows) and customer 2(return 2 rows with min and max date for each category) but for customer 3, it cannot identify the gap between 12/31/2011 and 1/5/2012 for category 7.
Customer_ID FROM_Date TO_Date Category
3 1/1/2010 12/31/2013 7
3 1/1/2014 12/31/2015 5
Here is my code:
SELECT Customer_ID, Category, min(From_Date), max(To_Date) FROM
(
SELECT Customer_ID, Category, From_Date,To_Date
,row_number() over (order by member_id, To_Date) - row_number() over (partition by Customer_ID order by Category) as p
FROM FFS_SAMP
) X
group by Customer_ID,Category,p
order by Customer_ID,min(From_Date),Max(To_Date)
This is a type of gaps and islands problem. Probably the safest method is to use a cumulative max() to look for overlaps with previous records. Where there is no overlap, then an "island" of records starts. So:
select customer_id, min(from_date), max(to_date), category
from (select t.*,
sum(case when prev_to_date >= from_date then 0 else 1 end) over
(partition by customer_id, category
order by from_date
) as grp
from (select t.*,
max(to_date) over (partition by customer_id, category
order by from_date
rows between unbounded preceding and 1 preceding
) as prev_to_date
from t
) t
) t
group by customer_id, category, grp;
Your attempt is quite close. You just need to fix the over() clause of the window functions:
select customer_id, category, min(from_date), max(to_date)
from (
select
fs.*,
row_number() over (partition by customer_id order from_date)
- row_number() over (partition by customer_id, category order by from_date) as grp
from ffs_samp fs
) x
group by customer_id, category, grp
order by customer_id, min(from_date)
Note that this method assumes no gaps or overlalp in the periods of a given customer, as show in your sample data.

Estimation of Cumulative value every 3 months in SQL

I have a table like this:
ID Date Prod
1 1/1/2009 5
1 2/1/2009 5
1 3/1/2009 5
1 4/1/2009 5
1 5/1/2009 5
1 6/1/2009 5
1 7/1/2009 5
1 8/1/2009 5
1 9/1/2009 5
And I need to get the following result:
ID Date Prod CumProd
1 2009/03/01 5 15 ---Each 3 months
1 2009/06/01 5 30 ---Each 3 months
1 2009/09/01 5 45 ---Each 3 months
What could be the best approach to take in SQL?
You can try the below - using window function
DEMO Here
select * from
(
select *,sum(prod) over(order by DATEPART(qq,dateval)) as cum_sum,
row_number() over(partition by DATEPART(qq,dateval) order by dateval) as rn
from t
)A where rn=1
How about just filtering on the month number?
select t.*
from (select id, date, prod, sum(prod) over (partition by id order by date) as running_prod
from t
) t
where month(date) in (3, 6, 9, 12);

Select rows based on criteria from within group

I have the following table:
pk_positions ass_pos_id underlying entry_date
1 1 abc 2016-03-14
2 1 xyz 2016-03-17
3 tlt 2016-03-18
4 4 ujf 2016-03-21
5 4 dks 2016-03-23
6 4 dqp 2016-03-26
I need to select one row per ass_pos_id which has the earliest entry_date. Rows which do not have a value for ass_pos_id are not included.
In other words, for each non null ass_pos_id group, select the row which has the earliest entry_date
The following is the desired result:
pk_positions ass_pos_id underlying entry_date
1 1 abc 2016-03-14
4 4 ujf 2016-03-21
You could use the row_number window function:
SELECT pk_positions, ass_pos_id, underlying, entry_date
FROM (SELECT pk_positions, ass_pos_id, underlying, entry_date,
ROW_NUMBER() OVER (PARTITION BY ass_pos_id
ORDER BY entry_date ASC) rn
FROM mytable
WHERE ass_pos_id IS NOT NULL) t
WHERE rn = 1

window function in redshift

I have some data that looks like this:
CustID EventID TimeStamp
1 17 1/1/15 13:23
1 17 1/1/15 14:32
1 13 1/1/25 14:54
1 13 1/3/15 1:34
1 17 1/5/15 2:54
1 1 1/5/15 3:00
2 17 2/5/15 9:12
2 17 2/5/15 9:18
2 1 2/5/15 10:02
2 13 2/8/15 7:43
2 13 2/8/15 7:50
2 1 2/8/15 8:00
I'm trying to use the row_number function to get it to look like this:
CustID EventID TimeStamp SeqNum
1 17 1/1/15 13:23 1
1 17 1/1/15 14:32 1
1 13 1/1/25 14:54 2
1 13 1/3/15 1:34 2
1 17 1/5/15 2:54 3
1 1 1/5/15 3:00 4
2 17 2/5/15 9:12 1
2 17 2/5/15 9:18 1
2 1 2/5/15 10:02 2
2 13 2/8/15 7:43 3
2 13 2/8/15 7:50 3
2 1 2/8/15 8:00 4
I tried this:
row_number () over
(partition by custID, EventID
order by custID, TimeStamp asc) SeqNum]
but got this back:
CustID EventID TimeStamp SeqNum
1 17 1/1/15 13:23 1
1 17 1/1/15 14:32 2
1 13 1/1/25 14:54 3
1 13 1/3/15 1:34 4
1 17 1/5/15 2:54 5
1 1 1/5/15 3:00 6
2 17 2/5/15 9:12 1
2 17 2/5/15 9:18 2
2 1 2/5/15 10:02 3
2 13 2/8/15 7:43 4
2 13 2/8/15 7:50 5
2 1 2/8/15 8:00 6
how can I get it to sequence based on the change in the EventID?
This is tricky. You need a multi-step process. You need to identify the groups (a difference of row_number() works for this). Then, assign an increasing constant to each group. And then use dense_rank():
select sd.*, dense_rank() over (partition by custid order by mints) as seqnum
from (select sd.*,
min(timestamp) over (partition by custid, eventid, grp) as mints
from (select sd.*,
(row_number() over (partition by custid order by timestamp) -
row_number() over (partition by custid, eventid order by timestamp)
) as grp
from somedata sd
) sd
) sd;
Another method is to use lag() and a cumulative sum:
select sd.*,
sum(case when prev_eventid is null or prev_eventid <> eventid
then 1 else 0 end) over (partition by custid order by timestamp
) as seqnum
from (select sd.*,
lag(eventid) over (partition by custid order by timestamp) as prev_eventid
from somedata sd
) sd;
EDIT:
The last time I used Amazon Redshift it didn't have row_number(). You can do:
select sd.*, dense_rank() over (partition by custid order by mints) as seqnum
from (select sd.*,
min(timestamp) over (partition by custid, eventid, grp) as mints
from (select sd.*,
(row_number() over (partition by custid order by timestamp rows between unbounded preceding and current row) -
row_number() over (partition by custid, eventid order by timestamp rows between unbounded preceding and current row)
) as grp
from somedata sd
) sd
) sd;
Try this code block:
WITH by_day
AS (SELECT
*,
ts::date AS login_day
FROM table_name)
SELECT
*,
login_day,
FIRST_VALUE(login_day) OVER (PARTITION BY userid ORDER BY login_day , userid rows unbounded preceding) AS first_day
FROM by_day