SQL Window Function - Number of Rows since last Max

I am trying to write a SQL query that returns the number of rows since the last maximum value, using a window function over the last 5 rows. In the example below it would return 2 for row 8: the max value is 12, which is 2 rows back from row 8.
For row 6 it would return 5 because the max value of 7 is 5 rows away.
|ID | Date | Amount
| 1 | 1/1/2019 | 7
| 2 | 1/2/2019 | 3
| 3 | 1/3/2019 | 4
| 4 | 1/4/2019 | 1
| 5 | 1/5/2019 | 1
| 6 | 1/6/2019 | 12
| 7 | 1/7/2019 | 2
| 8 | 1/8/2019 | 4
I tried the following:
SELECT ID, date, MAX(amount)
OVER (ORDER BY date ASC ROWS 5 PRECEDING) mymax
FROM tbl
This gets me to the max values but I am unable to efficiently determine how many rows away it is. I was able to get close using multiple variables within the SELECT but this did not seem efficient or scalable.

You can calculate the rolling maximum over the window and then use row_number() on that.
So:
select t.*,
row_number() over (partition by running_max order by date) as rows_since_last_max
from (select t.*,
max(amount) over (order by date rows between 5 preceding and current row) as running_max
from tbl t
) t;
I think this works for your sample data. It might not work if you have duplicates.
In that case, you can use date arithmetic:
select t.*,
datediff(day,
max(date) over (partition by running_max order by date),
date
) as days_since_most_recent_max5
from (select t.*,
max(amount) over (order by date rows between 5 preceding and current row) as running_max
from tbl t
) t;
EDIT:
Here is an example using row number:
select t.*,
(seqnum - max(case when amount = running_max then seqnum end) over (partition by running_max order by date)) as rows_since_most_recent_max5
from (select t.*,
max(amount) over (order by date rows between 5 preceding and current row) as running_max,
row_number() over (order by date) as seqnum
from tbl t
) t;
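To check the expected numbers against the sample data, here is a minimal sketch that runs in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). It is not one of the queries above: it swaps in a correlated subquery to find the most recent row inside the window that holds the rolling max. It assumes ID is gapless and follows date order, so the Date column is left out and ID doubles as the row position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, amount INTEGER)")
amounts = [7, 3, 4, 1, 1, 12, 2, 4]  # sample data from the question, ids 1..8
conn.executemany("INSERT INTO tbl (id, amount) VALUES (?, ?)",
                 list(enumerate(amounts, start=1)))

query = """
WITH w AS (
    SELECT id,
           amount,
           MAX(amount) OVER (ORDER BY id
                             ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS running_max
    FROM tbl
)
SELECT w.id,
       w.running_max,
       -- distance back to the most recent row in the window holding the max
       -- (assumes ids are consecutive, so id arithmetic equals row distance)
       w.id - (SELECT MAX(p.id)
               FROM tbl p
               WHERE p.id BETWEEN w.id - 5 AND w.id
                 AND p.amount = w.running_max) AS rows_since_max
FROM w
ORDER BY w.id
"""
rows_since_max = conn.execute(query).fetchall()
for row in rows_since_max:
    print(row)  # last row printed is (8, 12, 2)
```

Because the subquery takes MAX(p.id), ties on the max amount resolve to the nearest occurrence, which is the duplicate case the row_number() version may get wrong.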

It would be:
select *,ID-
(
SELECT ID
FROM
(
SELECT
ID,amount,
Maxamount =q.mymax
FROM
Table_4
) AS derived
WHERE
amount = Maxamount
) as result
from (
SELECT ID, date,
MAX(amount)
OVER (ORDER BY date ASC ROWS 5 PRECEDING) mymax
FROM Table_4
)as q

Related

SQL getting top 2 rows by date per PolicyId but with distinct dates

ValId | PolicyId | Date | Value
------+----------+------------+-------
1 | 11 | 2020-06-01 | 2000
2 | 11 | 2020-06-03 | 3000
3 | 11 | 2020-06-03 | 4000
4 | 12 | 2020-06-02 | 8000
5 | 12 | 2020-06-03 | 8500
I wanted to get top 2 latest Val rows for each PolicyId but they cannot be from the same date.
Rows for PolicyId = 12 are returned correctly - ValId 4 and 5.
For PolicyId = 11, rows with ValId 2 and 3 are returned but as they are on the same date I wanted row of ValId 1 to be returned instead of ValId 2.
SELECT
V.ValId, V.PolicyId, V.Value, V.Date
FROM
(SELECT
ValId, PolicyId, Value, Date,
ROW_NUMBER() OVER (PARTITION BY PolicyId ORDER BY Date Desc, ValId DESC) AS RowNum
FROM
TVal) V
WHERE
RowNum <= 2
You can enumerate the rows by dates and within dates:
select t.*
from (select t.*,
dense_rank() over (partition by policyid order by date desc) as seqnum,
rank() over (partition by policyid, date order by valId desc) as seqnum_within_date
from tval t
) t
where seqnum <= 2 and seqnum_within_date = 1;
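A minimal sketch of the "rank the dates, then rank within each date" idea, run against the sample rows in SQLite via Python's sqlite3 (needs SQLite 3.25+ for window functions). The column names date_rank and rank_in_date are made up for this example, and row_number() is used for the within-date pick on the assumption that ValId is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tval (valid INTEGER, policyid INTEGER, date TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO tval VALUES (?, ?, ?, ?)",
    [
        (1, 11, "2020-06-01", 2000),
        (2, 11, "2020-06-03", 3000),
        (3, 11, "2020-06-03", 4000),
        (4, 12, "2020-06-02", 8000),
        (5, 12, "2020-06-03", 8500),
    ],
)

query = """
SELECT valid, policyid, date, value
FROM (SELECT t.*,
             -- 1 = latest date for the policy, 2 = second latest, ...
             DENSE_RANK() OVER (PARTITION BY policyid
                                ORDER BY date DESC) AS date_rank,
             -- 1 = highest ValId on that date
             ROW_NUMBER() OVER (PARTITION BY policyid, date
                                ORDER BY valid DESC) AS rank_in_date
      FROM tval t) AS ranked
WHERE date_rank <= 2 AND rank_in_date = 1
ORDER BY policyid, date DESC
"""
top_rows = conn.execute(query).fetchall()
for row in top_rows:
    print(row)
```

For PolicyId 11 this keeps ValId 3 and ValId 1, and for PolicyId 12 it keeps ValId 5 and ValId 4, which is what the question asks for.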
Using the suggestion from Gordon Linoff I was able to complete the SQL as below:
select v.* from
(
select t.*,
row_number() over (partition by policyid order by date desc, valId desc) as seqnum
from (select t.*,
dense_rank() over (partition by policyid, date order by valId desc) as seqnum_within_date
from tval t
) t where seqnum_within_date = 1
) v where seqnum <= 2

SQL The largest number of consecutive values for each value

I have a table MatchResults:
id | player_win_id
------------------
1 | 1
2 | 1
3 | 3
4 | 1
5 | 2
6 | 3
7 | 3
8 | 1
9 | 1
10 | 1
I need to find out for each player ID the highest number of consecutive victories. I use MS SQL Server.
Expected Result
PLAYER_ID | WIN_COUNT
------------------
1 | 3
2 | 1
3 | 2
This is a type of gaps-and-islands problem. One solution uses the difference of row numbers. So, to get all streaks:
select player_win_id, count(*)
from (select t.*,
row_number() over (order by id) as seqnum,
row_number() over (partition by player_win_id order by id) as seqnum_p
from MatchResults t
) t
group by player_win_id, (seqnum - seqnum_p);
Why this works is a little tricky to explain. But if you look at the results of the subquery, you'll probably see how the difference between the row number values captures adjacent rows with the same player win id.
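To make the subquery results visible, here is a small sketch that runs the same two row_number() calls on the sample table in SQLite through Python's sqlite3 (SQLite 3.25+ assumed) and prints the difference next to each row. The first_id and streak_len column names are added only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MatchResults (id INTEGER PRIMARY KEY, player_win_id INTEGER)")
winners = [1, 1, 3, 1, 2, 3, 3, 1, 1, 1]  # sample data, ids 1..10
conn.executemany("INSERT INTO MatchResults VALUES (?, ?)",
                 list(enumerate(winners, start=1)))

numbered = """
SELECT id, player_win_id,
       ROW_NUMBER() OVER (ORDER BY id) AS seqnum,
       ROW_NUMBER() OVER (PARTITION BY player_win_id ORDER BY id) AS seqnum_p
FROM MatchResults
"""
rows = conn.execute(numbered + " ORDER BY id").fetchall()
diffs = [seqnum - seqnum_p for _, _, seqnum, seqnum_p in rows]
for (id_, player, seqnum, seqnum_p), diff in zip(rows, diffs):
    print(id_, player, seqnum, seqnum_p, diff)

# Within one player the difference only changes when another player's win
# sits in between, so (player, difference) labels each streak.
streaks = conn.execute(f"""
SELECT player_win_id, MIN(id) AS first_id, COUNT(*) AS streak_len
FROM ({numbered}) AS n
GROUP BY player_win_id, seqnum - seqnum_p
ORDER BY first_id
""").fetchall()
print(streaks)
```

Note that the difference is 4 for ids 5 through 10 even though three different players are involved, which is why the grouping has to include player_win_id and not just the difference.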
For the maximum, the simplest is probably just an aggregation query:
select player_win_id, max(cnt)
from (select player_win_id, count(*) as cnt
from (select t.*,
row_number() over (order by id) as seqnum,
row_number() over (partition by player_win_id order by id) as seqnum_p
from MatchResults t
) t
group by player_win_id, (seqnum - seqnum_p)
) p
group by player_win_id;
Now I understand the previous comment. The code for my table is:
select player_win_id, max(cnt)
from (select player_win_id, count(*) as cnt
from (select *,
row_number() over (order by id) as seqnum,
row_number() over (partition by player_win_id order by id) as seqnum_p
from MatchResults ) t
group by player_win_id, (seqnum - seqnum_p)
) p
group by player_win_id;

sql, big query: aggregate all entries between two strings in a variable

I have to solve this problem within bigQuery. I have this column in my table:
event | time
_________________|____________________
start | 1
end | 2
random_event_X | 3
start | 4
error_X | 5
error_Y | 6
end | 7
start | 8
error_F | 9
start | 10
random_event_Y | 11
error_z | 12
end | 13
For each end event, I would like to collect everything back to the preceding start and then count the resulting sequences. Anything can happen between a start and an end, and outside of them. If there is an end, there is a start, but a start does not necessarily have an end.
The desired output would be like:
string_agg | count
"start, end" | 1
"start, error_X, error_Y, end" | 1
"start, random_event_Y, error_z, end" | 1
So I want everything between each start and its end, but only when the start has an end. That leaves out the random_event_X at time 3, the start at time 8, and the error_F at time 9.
I was not able to find a solution and am struggling to understand how to approach this problem. Any help or advice is welcome.
Below is for BigQuery Standard SQL
#standardSQL
SELECT agg_events, COUNT(1) cnt
FROM (
SELECT STRING_AGG(event ORDER BY time) agg_events, COUNTIF(event IN ('start', 'end')) flag
FROM (
SELECT *, COUNTIF(event = 'start') OVER(PARTITION BY grp1 ORDER BY time) grp2
FROM (
SELECT *, COUNTIF(event = 'end') OVER(ORDER BY time DESC) grp1
FROM `project.dataset.table`
)
)
GROUP BY grp1, grp2
)
WHERE flag = 2
GROUP BY agg_events
Applied to the sample data from your question, the result is:
Row agg_events cnt
1 start,random_event_Y,error_z,end 1
2 start,error_X,error_Y,end 1
3 start,end 1
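For cross-checking a result like this outside BigQuery, the grouping rule can also be sketched procedurally. This is plain Python, not BigQuery SQL, and the function name complete_sessions is made up for the example. It assumes the events are already sorted by time and that a new start discards any unfinished sequence, as in the sample data.

```python
from collections import Counter


def complete_sessions(events):
    """Return every start..end run; runs without an end are dropped."""
    sessions = []
    current = None  # events collected since the latest start, if any
    for name in events:
        if name == "start":
            current = ["start"]  # a new start discards an unfinished run
        elif current is not None:
            current.append(name)
            if name == "end":
                sessions.append(current)
                current = None  # events after an end are outside any run
    return sessions


# Sample events from the question, already ordered by time 1..13.
events = [
    "start", "end", "random_event_X", "start", "error_X", "error_Y", "end",
    "start", "error_F", "start", "random_event_Y", "error_z", "end",
]
sessions = complete_sessions(events)
counts = Counter(",".join(session) for session in sessions)
for agg_events, cnt in counts.items():
    print(agg_events, cnt)
```

On the sample data this yields the same three strings as above, each with a count of 1.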
SQL tables represent unordered sets -- this is particularly true in massively parallel, columnar databases such as BigQuery.
So, I have to assume that you have some other column that specifies the ordering. If so, you can use a cumulative sum to identify the groups and then aggregation:
select grp,
string_agg(event, ',' order by time)
from (select t.*,
countif(event = 'start') over (order by time) as grp
from t
) t
group by grp
order by min(time);
Note: I would also advise you to use array_agg() instead of string_agg(). Arrays are generally easier to work with than strings.
EDIT:
I see, you only want up to end. In that case, add another level of window functions:
select grp,
string_agg(event, ',' order by <ordering col>)
from (select t.*,
max(case when event = 'end' then time end) over (partition by grp) as max_end_time
from (select t.*,
countif(event = 'start') over (order by <ordering col>) as grp
from t
) t
) t
where max_end_time is null or time <= max_end_time
group by grp
order by min(<ordering col>);

Sql query to Count Total Consecutive Years from latest year

I have a table Temp:
CREATE TABLE Temp
(
[ID] [int],
[Year] [INT],
)
ID Year
1 2016
1 2016
1 2015
1 2012
1 2011
1 2010
2 2016
2 2015
2 2014
2 2012
2 2011
2 2010
2 2009
3 2016
3 2015
3 2004
3 1999
4 2016
4 2015
4 2014
4 2010
5 2016
5 2014
5 2013
I want to calculate the total consecutive years starting from the most recent Year.
Result should look like this:
ID Total Consecutive Yrs
1 2
2 3
3 2
4 3
5 1
select ID,
-- returns a sequence without gaps for consecutive years
first_value(year) over (partition by ID order by year desc) - year +1 as x,
-- returns a sequence without gaps
row_number() over (partition by ID order by year desc) as rn
from Temp
e.g. for ID=1:
1 2016 1 1
1 2015 2 2
1 2012 5 3
1 2011 6 4
1 2010 7 5
As long as there's no gap, both sequences increase the same.
Now check for equal sequences and count the rows:
with cte as
(
select ID,
-- returns a sequence without gaps for consecutive years
first_value(year) over (partition by ID order by year desc) - year + 1 as x,
-- returns a sequence without gaps
row_number() over (partition by ID order by year desc) as rn
from Temp
)
select ID, count(*)
from cte
where x = rn -- no gap
group by ID
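Here is a runnable sketch of the same comparison in SQLite via Python's sqlite3 (SQLite 3.25+ assumed). Two things in it are assumptions rather than part of the query above: the table is called temp_years for the example, and a DISTINCT step is added first, because the sample data lists 2016 twice for ID 1 and a duplicate year would push rn ahead of x after the first row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_years (id INTEGER, year INTEGER)")
sample = {
    1: [2016, 2016, 2015, 2012, 2011, 2010],
    2: [2016, 2015, 2014, 2012, 2011, 2010, 2009],
    3: [2016, 2015, 2004, 1999],
    4: [2016, 2015, 2014, 2010],
    5: [2016, 2014, 2013],
}
conn.executemany(
    "INSERT INTO temp_years VALUES (?, ?)",
    [(id_, year) for id_, years in sample.items() for year in years],
)

query = """
WITH dedup AS (
    SELECT DISTINCT id, year FROM temp_years  -- assumption: drop repeated years
),
cte AS (
    SELECT id,
           -- 1, 2, 3, ... only while the years are consecutive
           FIRST_VALUE(year) OVER (PARTITION BY id ORDER BY year DESC) - year + 1 AS x,
           -- 1, 2, 3, ... always
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY year DESC) AS rn
    FROM dedup
)
SELECT id, COUNT(*) AS consecutive_years
FROM cte
WHERE x = rn  -- rows before the first gap
GROUP BY id
ORDER BY id
"""
consecutive = conn.execute(query).fetchall()
print(consecutive)  # [(1, 2), (2, 3), (3, 2), (4, 3), (5, 1)]
```

Once a gap appears, x stays ahead of rn for the rest of the partition, so the filter keeps only the leading run from the most recent year.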
Edit:
Based on your year zero comment:
with cte as
(
select ID, year,
-- returns a sequence without gaps for consecutive years
first_value(year) over (partition by ID order by year desc) - year + 1 as x,
-- returns a sequence without gaps
row_number() over (partition by ID order by year desc) as rn
from Temp
)
select ID,
-- remove the year zero from counting
sum(case when year <> 0 then 1 else 0 end)
from cte
where x = rn
group by ID
You can use lead() and get these counts as below:
Select top (1) with ties Id, RowN as [Total Consecutive Years] from (
Select *, Num = case when ([year]- lead(year) over(partition by Id order by [Year] desc) > 1) then 0 else 1 end
, RowN = Row_Number() over (partition by Id order by [Year] desc)
from temp
) a
where a.Num = 0
order by row_number() over(partition by Id order by RowN)
Output as below:
+----+-------------------------+
| Id | Total Consecutive Years |
+----+-------------------------+
| 1 | 2 |
| 2 | 3 |
| 3 | 2 |
| 4 | 3 |
| 5 | 1 |
+----+-------------------------+
You can do this using window functions:
select id, count(distinct year)
from (select t.*,
dense_rank() over (partition by id order by year + seqnum desc) as grp
from (select t.*,
dense_rank() over (partition by id order by year desc) as seqnum
from temp t
) t
) t
where grp = 1
group by id;
This assumes that "most recent year" is per id.
Gordon Linoff,
Your code is awesome!
Your code pulls consecutive years from the most recent year.
I modified it to pull overall max consecutive years.
Posted here in case anyone else needs it:
--overall max consecutive years
select id,max(yr_cnt) max_consecutive_years
from (
select id, grp,count(seqnum) yr_cnt
from (select t.*,
dense_rank() over (partition by id order by year + seqnum desc) as grp
from (select t.*,
dense_rank() over (partition by id order by year desc) as seqnum
from temp t
) t
) t
group by id,grp) t2
group by id;

Count of duplicate values by two columns in SQL Server

From this table:
Number Value
1 a
2 b
3 a
2 c
2 b
3 a
2 b
I need to get count of all duplicate rows by Number and Value, i.e. 5.
Thanks.
I think this query is what you want:
SELECT SUM(t.cnt)
FROM
(
SELECT COUNT(*) cnt
FROM table_name
GROUP BY number, value
HAVING COUNT(*) > 1
)t;
Maybe something like this?
select value,number,max(cnt) as Count_distinct from (
select *,row_number () over (partition by value,number order by number) as cnt
from #sample
)t
group by value,number
Output
+---------------------------------+
| Value | Number | Count_Distinct |
| a | 1 | 1 |
| b | 2 | 3 |
| c | 2 | 1 |
| a | 3 | 2 |
+---------------------------------+
Select
count(distinct Number) as Distinct_Numbers,
count(distinct Value) as Distinct_Values
from
Table
This shows how many distinct values are in each column. Does this help?
Assign a row number partitioned by both columns and ordered by both columns. Then count the rows where the row number is greater than 1.
Query
;with cte as(
select [rn] = row_number() over(
partition by [Number], [Value]
order by [Number], [Value]
), *
from [your_table_name]
)
select count(*) from cte
where [rn] > 1;
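The two readings of "count of duplicates" give different numbers on the sample data, so here is a quick side-by-side sketch in SQLite via Python's sqlite3 (SQLite 3.25+ assumed for row_number()). The table name pairs is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (number INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (2, "c"), (2, "b"), (3, "a"), (2, "b")],
)

# Reading 1: every row that belongs to a repeated (number, value) pair.
all_copies = conn.execute("""
SELECT SUM(cnt)
FROM (SELECT COUNT(*) AS cnt
      FROM pairs
      GROUP BY number, value
      HAVING COUNT(*) > 1) AS g
""").fetchone()[0]

# Reading 2: only the extra copies beyond the first row of each pair.
extra_copies = conn.execute("""
SELECT COUNT(*)
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY number, value
                                ORDER BY number) AS rn
      FROM pairs) AS r
WHERE rn > 1
""").fetchone()[0]

print(all_copies, extra_copies)  # 5 3
```

Only the first reading matches the 5 asked for in the question: (2, b) appears three times and (3, a) twice. The row-number filter drops the first row of each pair and so returns 3.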
I think you mean number of unique number - value pairs, you can use:
SELECT count(*)
FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY number, value ORDER BY (select 1)) AS rnk FROM mytable) i
where i.rnk = 1
Maybe this query will help you:
select * from [dbo].[Sample_table1]
;WITH
DupContactRecords(number,value,DupsCount)
AS
(
SELECT number,value, COUNT(*) AS TotalCount FROM [Sample_table1] GROUP BY number,value HAVING COUNT(*) > 1
)
--to get the duplicates
/*select * from DupContactRecords*/
SELECT sum(DupsCount) FROM DupContactRecords