Count max number of consecutive occurrences of a value in SQL Server - sql

I have a table with players, results and ID:
Player | Result | ID
---------------
An | W | 1
An | W | 1
An | L | 0
An | W | 1
An | W | 1
An | W | 1
Ph | L | 0
Ph | W | 1
Ph | W | 1
Ph | L | 0
Ph | W | 1
A 'W' will always have an ID of 1.
I need to create a query that will count the maximum number of consecutive 'W's for each player:
Player | MaxWinStreak
---------------------
An | 3
Ph | 2
I tried to use Rows Unbounded Preceding, but I can only get it to count the total number of Ws, not the number of consecutive Ws:
Select
t2.player
,max(t2.cumulative_wins) As 'Max'
From
( Select
t.Player
,Sum(ID) Over (Partition By t.Result,t.player
Order By t.GameWeek Rows Unbounded Preceding) As cumulative_wins
From
t
) t2
Group By
t2.player
Is there a different approach I can take?

You need a column to specify the ordering, because SQL tables represent unordered sets. In the query below, the ? represents this column.
You can use the difference of row numbers to get each winning streak:
select player, count(*) as numwins
from (select t.*,
row_number() over (partition by player order by ?) as seqnum,
row_number() over (partition by player, result order by ?) as seqnum_r
from t
) t
where result = 'W'
group by player, (seqnum - seqnum_r);
You can then get the maximum:
select player, max(numwins)
from (select player, count(*) as numwins
from (select t.*,
row_number() over (partition by player order by ?) as seqnum,
row_number() over (partition by player, result order by ?) as seqnum_r
from t
) t
where result = 'W'
group by player, (seqnum - seqnum_r)
) pw
group by player;
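As a quick check of the difference-of-row-numbers approach, here is a minimal sketch that runs the same query against the sample data. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or newer), and game_no is a hypothetical ordering column that takes the place of the ? above.

```python
import sqlite3

# Assumption: game_no is a hypothetical ordering column; the question's
# sample table does not show one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (player TEXT, result TEXT, game_no INTEGER)")
results = {"An": "WWLWWW", "Ph": "LWWLW"}  # sample data from the question
rows = [(p, r, i) for p, seq in results.items() for i, r in enumerate(seq, start=1)]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

query = """
SELECT player, MAX(numwins) AS max_win_streak
FROM (SELECT player, COUNT(*) AS numwins
      FROM (SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY player ORDER BY game_no) AS seqnum,
                   ROW_NUMBER() OVER (PARTITION BY player, result ORDER BY game_no) AS seqnum_r
            FROM t) s
      WHERE result = 'W'
      GROUP BY player, seqnum - seqnum_r) pw
GROUP BY player
ORDER BY player
"""
streaks = conn.execute(query).fetchall()
print(streaks)  # [('An', 3), ('Ph', 2)]
```

Within one player, seqnum - seqnum_r stays constant for as long as the result does not change, so each distinct difference among the 'W' rows identifies one streak.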

Related

how to find which element appears the most in an sql table

I have a table set in the following manner:
band_id | song_name
1 | rolling
2 | stomp
1 | rage
3 | atmosphere
and so on, how can I find out which band appears the most?
You can use RANK() window function:
select t.band_id
from (
select band_id,
rank() over (order by count(*) desc) rn
from tablename
group by band_id
) t
where t.rn = 1;
or if you don't need ties in the results:
select band_id
from tablename
group by band_id
order by count(*) desc limit 1;
Results:
| band_id |
| ------- |
| 1 |
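To see how the two queries differ when there is a tie, here is a small sketch using SQLite through Python's sqlite3 module (an assumption, since the question does not name a database). The extra row for band 2 is hypothetical and is there only to create the tie. The count is computed in an inner subquery before ranking, which is a slight restructuring of the RANK() query, and band_id is added as a tie-breaker to the LIMIT query so that its result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (band_id INTEGER, song_name TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(1, "rolling"), (2, "stomp"), (1, "rage"), (3, "atmosphere"),
     (2, "extra")],  # hypothetical row, added only to create a tie
)

# RANK() keeps every band that shares the highest count.
with_ties = [r[0] for r in conn.execute("""
    SELECT band_id
    FROM (SELECT band_id, RANK() OVER (ORDER BY n DESC) AS rn
          FROM (SELECT band_id, COUNT(*) AS n
                FROM tablename
                GROUP BY band_id) c) t
    WHERE rn = 1
    ORDER BY band_id
""")]

# LIMIT 1 returns a single band even when several are tied.
single = [r[0] for r in conn.execute("""
    SELECT band_id
    FROM tablename
    GROUP BY band_id
    ORDER BY COUNT(*) DESC, band_id
    LIMIT 1
""")]
print(with_ties, single)  # [1, 2] [1]
```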

Group similar rows and count groups in PostgreSQL

I've got a table like this:
number | info | side
--------------------
1 | foo | a
2 | bar | a
3 | bar | a
4 | baz | a
5 | foo | a
6 | bar | b
7 | bar | b
8 | foo | a
9 | bar | a
10 | baz | a
I'd like to get how many times a bar group/package (e.g. rows 2,3 is a group, rows 6,7 is a group, row 9 is also a group) appears in the info column depending on side. I'm stuck because I don't really know what to google. Whenever I search for something like group rows or merge rows I always end up finding information about the group by feature.
However I think I need some kind of window function.
Here is what I'd like to achieve:
bar_a | bar_b
-------------
2 | 1
Use lag() to determine first rows of groups:
select
number, info, side,
lag(info || side, 1, '') over (order by number) <> info || side as start_of_group
from my_table
order by 1;
number | info | side | start_of_group
--------+------+------+----------------
1 | foo | a | t
2 | bar | a | t
3 | bar | a | f
4 | baz | a | t
5 | foo | a | t
6 | bar | b | t
7 | bar | b | f
8 | foo | a | t
9 | bar | a | t
10 | baz | a | t
(10 rows)
Aggregate and filter the above result to get the desired output:
select concat(info, '_', side) as info_side, count(*)
from (
select
info, side,
lag(info || side, 1, '') over (order by number) <> info || side as start_of_group
from my_table
) s
where info = 'bar' and start_of_group
group by 1
order by 1;
info_side | count
-----------+-------
bar_a | 2
bar_b | 1
(2 rows)
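The lag() approach above can be tried without a PostgreSQL server. This is a minimal sketch that uses SQLite through Python's sqlite3 module as a stand-in (an assumption; SQLite 3.25 or newer is needed for lag()). concat() is replaced by the || operator, and start_of_group comes back as 0 or 1 rather than a boolean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (number INTEGER, info TEXT, side TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, "foo", "a"), (2, "bar", "a"), (3, "bar", "a"), (4, "baz", "a"),
    (5, "foo", "a"), (6, "bar", "b"), (7, "bar", "b"), (8, "foo", "a"),
    (9, "bar", "a"), (10, "baz", "a"),
])

# A row starts a group when the previous row's info/side pair differs.
group_counts = conn.execute("""
    SELECT info || '_' || side AS info_side, COUNT(*) AS cnt
    FROM (SELECT info, side,
                 LAG(info || side, 1, '') OVER (ORDER BY number)
                     <> info || side AS start_of_group
          FROM my_table) s
    WHERE info = 'bar' AND start_of_group
    GROUP BY info_side
    ORDER BY info_side
""").fetchall()
print(group_counts)  # [('bar_a', 2), ('bar_b', 1)]
```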
This is a "gaps-and-islands" problem at its heart, if I understand correctly. For this version, the difference of row numbers should work well.
select sum( (side = 'a')::int) as num_a,
sum( (side = 'b')::int) as num_b
from (select info, side, count(*) as cnt
from (select t.*,
row_number() over (order by number) as seqnum,
row_number() over (partition by info, side order by number) as seqnum_bs
from t
) t
where info = 'bar'
group by info, side, (seqnum - seqnum_bs)
) si;
You can make do with a single window function, which should be the fastest option:
SELECT side, count(*) AS count
FROM (
SELECT side, grp
FROM (
SELECT side, number - row_number() OVER (PARTITION BY side ORDER BY number) AS grp
FROM tbl
WHERE info = 'bar'
) sub1
GROUP BY 1, 2
) sub2
GROUP BY 1
ORDER BY 1; -- optional
Or shorter, maybe not faster:
SELECT side, count(DISTINCT grp) AS count
FROM (
SELECT side, number - row_number() OVER (PARTITION BY side ORDER BY number) AS grp
FROM tbl
WHERE info = 'bar'
) sub
GROUP BY 1
ORDER BY 1; -- optional
The "trick" is that adjacent rows forming a group (grp) have consecutive numbers. When subtracting the running count over the partition on side from the running count over all rows (number), members of a "group" get the same grp number.
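To make the grp values concrete, the sketch below prints them for the bar rows of the sample table and then counts the distinct groups per side. It runs on SQLite through Python's sqlite3 module, which is an assumption made only so the example is self-contained; the queries follow the ones above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (number INTEGER, info TEXT, side TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    (1, "foo", "a"), (2, "bar", "a"), (3, "bar", "a"), (4, "baz", "a"),
    (5, "foo", "a"), (6, "bar", "b"), (7, "bar", "b"), (8, "foo", "a"),
    (9, "bar", "a"), (10, "baz", "a"),
])

# Adjacent bar rows on the same side share one grp value.
bar_rows = conn.execute("""
    SELECT number, side,
           number - ROW_NUMBER() OVER (PARTITION BY side ORDER BY number) AS grp
    FROM tbl
    WHERE info = 'bar'
    ORDER BY number
""").fetchall()
print(bar_rows)  # [(2, 'a', 1), (3, 'a', 1), (6, 'b', 5), (7, 'b', 5), (9, 'a', 6)]

counts = conn.execute("""
    SELECT side, COUNT(DISTINCT grp) AS count
    FROM (SELECT side,
                 number - ROW_NUMBER() OVER (PARTITION BY side ORDER BY number) AS grp
          FROM tbl
          WHERE info = 'bar') sub
    GROUP BY side
    ORDER BY side
""").fetchall()
print(counts)  # [('a', 2), ('b', 1)]
```

Rows 2 and 3 both get grp 1, while row 9 gets grp 6 because seven numbers were skipped on side a but only two bar rows came before it.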
If your serial column number has gaps (there are none in your demo, but they are typical in practice) and you want to ignore them, use row_number() OVER (ORDER BY number) in a subquery instead of number itself to close the gaps first:
SELECT side, count(DISTINCT grp) AS count
FROM (
SELECT side, number - row_number() OVER (PARTITION BY side ORDER BY number) AS grp
FROM (SELECT info, side, row_number() OVER (ORDER BY number) AS number FROM tbl) tbl1
WHERE info = 'bar'
) sub2
GROUP BY 1
ORDER BY 1; -- optional
Related:
Select longest continuous sequence

Output 2 columns with average from year t and year t+1

I have a table that looks like this:
| playerid | season | Stat |
|-----------|---------|---------|
| 1 | 2014 | 2.3 |
| 1 | 2015 | 1.4 |
| 1 | 2016 | 3.5 |
| 2 | 2011 | 1.5 |
| 2 | 2012 | 5.5 |
| 3 | 2010 | 6.7 |
| 3 | 2011 | 2.6 |
I want a table with 2 columns which average 'STAT' for year t in column 1 and year t+1 in column 2.
IE-Column 1 would have averages of 'Stat' for:
playerid=1 & season=2014,
playerid=1 & season=2015,
playerid=2 & season=2011,
playerid=3 & season=2010.
Column 2 would have averages of 'Stat' for:
playerid=1 & season=2015,
playerid=1 & season=2016,
playerid=2 & season=2012,
playerid=3 & season=2011.
You could look up next year with a left join:
select cur.playerid
, cur.season
, cur.stat as stat_this_season
, next.stat as stat_next_season
from YourTable cur
left join
YourTable next
on cur.playerid = next.playerid
and cur.season = next.season - 1
To filter out seasons that do not have a next season, change the left join to an inner join (which you can abbreviate as join.)
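Here is a minimal sketch of the self-join with the sample data, using the inner-join variant and then averaging both columns as the question asks. It runs on SQLite through Python's sqlite3 module (an assumption), joins on the season column from the sample table, and uses the alias nxt instead of next to stay clear of keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (playerid INTEGER, season INTEGER, stat REAL)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", [
    (1, 2014, 2.3), (1, 2015, 1.4), (1, 2016, 3.5),
    (2, 2011, 1.5), (2, 2012, 5.5),
    (3, 2010, 6.7), (3, 2011, 2.6),
])

join_sql = """
    FROM YourTable cur
    JOIN YourTable nxt
      ON cur.playerid = nxt.playerid
     AND cur.season = nxt.season - 1
"""

# One row per (season t, season t+1) pair of the same player.
pairs = conn.execute(
    "SELECT cur.playerid, cur.season, cur.stat, nxt.stat"
    + join_sql + "ORDER BY cur.playerid, cur.season"
).fetchall()

# Averages over all pairs: year t in the first column, year t+1 in the second.
avg_t, avg_t1 = conn.execute(
    "SELECT AVG(cur.stat), AVG(nxt.stat)" + join_sql
).fetchone()
print(pairs)
print(avg_t, avg_t1)  # approximately 2.975 and 3.25
```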
You could use the analytic function ROW_NUMBER() to number the rows by season for each player. You can then get the requested averages by excluding the last season (for Column 1) and the first season (for Column 2). If a player has only one season, Stat is included in Column 1 only.
SELECT t.playerid,
AVG(CASE WHEN t.ord1 != 1 OR
(t.ord1 = 1 AND t.ord2 = 1) THEN Stat END) AS Column_1,
AVG(CASE WHEN t.ord2 != 1 THEN Stat END) AS Column_2
FROM (SELECT s.*,
ROW_NUMBER() OVER (PARTITION BY playerid ORDER BY season DESC) AS ord1,
ROW_NUMBER() OVER (PARTITION BY playerid ORDER BY season ASC) AS ord2
FROM table_1 s) t
GROUP BY playerid
ORDER BY playerid
If you remove the GROUP BY from the previous query, you will get only one row with both averages over all players:
SELECT AVG(CASE WHEN t.ord1 != 1 OR
(t.ord1 = 1 AND t.ord2 = 1) THEN Stat END) AS Column_1,
AVG(CASE WHEN t.ord2 != 1 THEN Stat END) AS Column_2
FROM (SELECT s.*,
ROW_NUMBER() OVER (PARTITION BY playerid ORDER BY season DESC) AS ord1,
ROW_NUMBER() OVER (PARTITION BY playerid ORDER BY season ASC) AS ord2
FROM table_1 s) t
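The per-player version of the ROW_NUMBER() query can be checked against the sample data with a short sketch like this one, which runs on SQLite through Python's sqlite3 module (an assumption made for the sake of a runnable example). Note that this approach pairs each season with the next row of the same player, so it appears to rely on the seasons being consecutive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (playerid INTEGER, season INTEGER, Stat REAL)")
conn.executemany("INSERT INTO table_1 VALUES (?, ?, ?)", [
    (1, 2014, 2.3), (1, 2015, 1.4), (1, 2016, 3.5),
    (2, 2011, 1.5), (2, 2012, 5.5),
    (3, 2010, 6.7), (3, 2011, 2.6),
])

# ord1 = 1 marks a player's last season, ord2 = 1 marks the first one.
per_player = conn.execute("""
    SELECT t.playerid,
           AVG(CASE WHEN t.ord1 != 1 OR
                         (t.ord1 = 1 AND t.ord2 = 1) THEN Stat END) AS Column_1,
           AVG(CASE WHEN t.ord2 != 1 THEN Stat END) AS Column_2
    FROM (SELECT s.*,
                 ROW_NUMBER() OVER (PARTITION BY playerid ORDER BY season DESC) AS ord1,
                 ROW_NUMBER() OVER (PARTITION BY playerid ORDER BY season ASC) AS ord2
          FROM table_1 s) t
    GROUP BY t.playerid
    ORDER BY t.playerid
""").fetchall()
for row in per_player:
    print(row)
```

For player 1 this averages the 2014 and 2015 values in Column_1 and the 2015 and 2016 values in Column_2.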

Amazon Redshift mechanism for aggregating a column into a string [duplicate]

I have a data set in the form.
id | attribute
-----------------
1 | a
2 | b
2 | a
2 | a
3 | c
Desired output:
attribute| num
-------------------
a | 1
b,a | 1
c | 1
In MySQL, I would use:
select attribute, count(*) num
from
(select id, group_concat(distinct attribute) attribute from dataset group by id) as subquery
group by attribute;
I am not sure this can be done in Redshift because it does not support group_concat or any PostgreSQL aggregate functions like array_agg() or string_agg(). See this question.
An alternate solution that would work is if there was a way for me to pick a random attribute from each group instead of group_concat. How can this work in Redshift?
I found a way to pick up a random attribute for each id, but it's too tricky. Actually I don't think it's a good way, but it works.
SQL:
-- (1) uniq dataset
WITH uniq_dataset as (select * from dataset group by id, attr)
SELECT
uds.id, rds.attr
FROM
-- (2) generate random rank for each id
(select id, round((random() * ((select count(*) from uniq_dataset iuds where iuds.id = ouds.id) - 1))::numeric, 0) + 1 as random_rk from (select distinct id from uniq_dataset) ouds) uds,
-- (3) rank table
(select rank() over(partition by id order by attr) as rk, id ,attr from uniq_dataset) rds
WHERE
uds.id = rds.id
AND
uds.random_rk = rds.rk
ORDER BY
uds.id;
Result:
id | attr
----+------
1 | a
2 | a
3 | c
OR
id | attr
----+------
1 | a
2 | b
3 | c
Here are tables in this SQL.
-- dataset (original table)
id | attr
----+------
1 | a
2 | b
2 | a
2 | a
3 | c
-- (1) uniq dataset
id | attr
----+------
1 | a
2 | a
2 | b
3 | c
-- (2) generate random rank for each id
id | random_rk
----+----
1 | 1
2 | 1 <- 1 or 2
3 | 1
-- (3) rank table
rk | id | attr
----+----+------
1 | 1 | a
1 | 2 | a
2 | 2 | b
1 | 3 | c
This solution, inspired by Masashi, is simpler and accomplishes selecting a random element from a group in Redshift.
SELECT id, first_value as attribute
FROM(SELECT id, FIRST_VALUE(attribute)
OVER(PARTITION BY id ORDER BY random()
ROWS BETWEEN unbounded preceding AND unbounded following)
FROM dataset)
GROUP BY id, attribute ORDER BY id;
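The FIRST_VALUE() idea can be tried locally with the sketch below. It uses SQLite through Python's sqlite3 module as a stand-in for Redshift (an assumption; behaviour on Redshift itself is not verified here) and adds an alias to the subquery. Because the result is random, the example only checks that each id gets one of its own attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER, attribute TEXT)")
conn.executemany("INSERT INTO dataset VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (2, "a"), (2, "a"), (3, "c")])

# Shuffle each id's rows with random() and take the first attribute.
picks = conn.execute("""
    SELECT id, attribute
    FROM (SELECT id,
                 FIRST_VALUE(attribute) OVER (
                     PARTITION BY id ORDER BY random()
                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                 ) AS attribute
          FROM dataset) s
    GROUP BY id, attribute
    ORDER BY id
""").fetchall()
print(picks)  # id 2 comes out as either 'a' or 'b'

allowed = {1: {"a"}, 2: {"a", "b"}, 3: {"c"}}
assert all(attr in allowed[i] for i, attr in picks)
```

Since id 2 has 'a' twice in the raw data, 'a' is picked more often than 'b'; deduplicate first if every distinct attribute should be equally likely.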
This is an answer for the related question here. That question is closed, so I am posting the answer here.
Here is a method to aggregate a column into a string:
select * from temp;
attribute
-----------
a
c
b
1) Give a unique rank to each row
with sub_table as(select attribute, rank() over (order by attribute) rnk from temp)
select * from sub_table;
attribute | rnk
-----------+-----
a | 1
b | 2
c | 3
2) Use concat operator || to combine in one line
with sub_table as(select attribute, rank() over (order by attribute) rnk from temp)
select (select attribute from sub_table where rnk = 1)||
(select attribute from sub_table where rnk = 2)||
(select attribute from sub_table where rnk = 3) res_string;
res_string
------------
abc
This only works for a fixed number of rows (X) in that column. It can be the first X rows ordered by some attribute in the "order by" clause. I'm guessing this is expensive.
A CASE expression can be used to deal with the NULLs that occur when a certain rank does not exist.
with sub_table as(select attribute, rank() over (order by attribute) rnk from temp)
select (select attribute from sub_table where rnk = 1)||
(select attribute from sub_table where rnk = 2)||
(select attribute from sub_table where rnk = 3)||
(case when (select attribute from sub_table where rnk = 4) is NULL then ''
else (select attribute from sub_table where rnk = 4) end) as res_string;
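The rank-and-concatenate method can be sketched as follows, using SQLite through Python's sqlite3 module as a stand-in for Redshift (an assumption). The table is named attrs here rather than temp, max_rank is a hypothetical upper bound on the number of rows, and COALESCE replaces the CASE expression as a shorter way to turn a missing rank into an empty string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attrs (attribute TEXT)")  # stands in for "temp"
conn.executemany("INSERT INTO attrs VALUES (?)", [("a",), ("c",), ("b",)])

max_rank = 4  # hypothetical upper bound on the number of rows to concatenate

# One scalar subquery per rank; a rank that does not exist yields NULL,
# which COALESCE turns into an empty string.
parts = " || ".join(
    f"COALESCE((SELECT attribute FROM sub_table WHERE rnk = {k}), '')"
    for k in range(1, max_rank + 1)
)
query = f"""
WITH sub_table AS (
    SELECT attribute, RANK() OVER (ORDER BY attribute) AS rnk FROM attrs
)
SELECT {parts} AS res_string
"""
res_string = conn.execute(query).fetchone()[0]
print(res_string)  # abc
```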
I haven't tested this query, but these functions are supported in Redshift:
select id, array_to_string(array(select attribute from mydataset m where m.id=d.id),',')
from mydataset d

Counting number of medals in weekly tournaments

I have a table holding weekly scores of players:
# select * from pref_money limit 5;
id | money | yw
----------------+-------+---------
OK32378280203 | -27 | 2010-44
OK274037315447 | -56 | 2010-44
OK19644992852 | 8 | 2010-44
OK21807961329 | 114 | 2010-44
FB1845091917 | 774 | 2010-44
(5 rows)
This SQL statement gets me the weekly winners and how many times each player has won:
# select x.id, count(x.id) from (
select id,
row_number() over(partition by yw order by money desc) as ranking
from pref_money
) x
where x.ranking = 1 group by x.id;
id | count
------------------------+-------
OK9521784953 | 1
OK356310219480 | 1
MR797911753357391363 | 1
OK135366127143 | 1
OK314685454941 | 1
OK308121034308 | 1
OK4087658302 | 5
OK452217781481 | 6
....
I would like to save the latter number in the medals column of the players table:
# \d pref_users;
Table "public.pref_users"
Column | Type | Modifiers
------------+-----------------------------+--------------------
id | character varying(32) | not null
first_name | character varying(64) |
last_name | character varying(64) |
city | character varying(64) |
medals | integer | not null default 0
How to do this please? I can only think of using a temporary table, but there must be an easier way... Thank you
UPDATE:
The query suggested by Clodoaldo works, but now my cronjob occasionally fails with:
/* reset and then update medals count */
update pref_users set medals = 0;
psql:/home/afarber/bin/clean-database.sql:63: ERROR: deadlock detected
DETAIL: Process 31072 waits for ShareLock on transaction 124735679; blocked by process 30368.
Process 30368 waits for ShareLock on transaction 124735675; blocked by process 31072.
HINT: See server log for query details.
update pref_users u
set medals = s.medals
from (
select id, count(id) medals
from (
select id,
row_number() over(partition by yw order by money desc) as ranking
from pref_money where yw <> to_char(CURRENT_TIMESTAMP, 'IYYY-IW')
) x
where ranking = 1
group by id
) s
where u.id = s.id;
update pref_users u
set medals = s.medals
from (
select id, count(id) medals
from (
select id,
row_number() over(partition by yw order by money desc) as ranking
from pref_money
) x
where ranking = 1
group by id
) s
where u.id = s.id
You could create a view which uses your "medal-select" and joins it with the actual data:
CREATE VIEW pref_money_medals AS
SELECT pref_money.*, medals.medals
FROM pref_money
JOIN (SELECT x.id, count(x.id) AS medals
FROM (SELECT id, row_number()
OVER(PARTITION BY yw ORDER BY money DESC) AS ranking
FROM pref_money
) x
WHERE x.ranking = 1 GROUP BY x.id) medals
ON pref_money.id = medals.id;
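As a self-contained check of the medal-counting logic from the question, here is a minimal sketch on SQLite through Python's sqlite3 module (an assumption; the original runs on PostgreSQL). The players A, B and C and their scores are made-up sample data. It writes the count with a single correlated UPDATE, so players without a win get 0 in the same statement; whether that changes the deadlock behaviour on PostgreSQL is not something this sketch can show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pref_money (id TEXT, money INTEGER, yw TEXT)")
conn.execute("CREATE TABLE pref_users (id TEXT PRIMARY KEY, "
             "medals INTEGER NOT NULL DEFAULT 0)")

# Hypothetical sample data: three players over three weeks.
conn.executemany("INSERT INTO pref_users (id) VALUES (?)", [("A",), ("B",), ("C",)])
conn.executemany("INSERT INTO pref_money VALUES (?, ?, ?)", [
    ("A", 10, "2010-44"), ("B", 5, "2010-44"),
    ("A", 3, "2010-45"), ("B", 8, "2010-45"), ("C", 1, "2010-45"),
    ("A", 7, "2010-46"), ("C", 2, "2010-46"),
])

# Count, per user, the weeks in which that user ranked first.
conn.execute("""
    UPDATE pref_users
    SET medals = (SELECT COUNT(*)
                  FROM (SELECT id,
                               ROW_NUMBER() OVER (PARTITION BY yw
                                                  ORDER BY money DESC) AS ranking
                        FROM pref_money) x
                  WHERE x.ranking = 1 AND x.id = pref_users.id)
""")
medals = conn.execute("SELECT id, medals FROM pref_users ORDER BY id").fetchall()
print(medals)  # [('A', 2), ('B', 1), ('C', 0)]
```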