SQL - Row Sequencing

I have a table playday with the following columns: date_played, winner, loser,
with the following values:
(Jun-03-14, USA, China)
(Jun-05-14, USA, Russia)
(Jun-06-14, France, Germany)
.
.
.
.
(Jun-09-14, USA, Russia)
I need to obtain all instances where USA has won in exactly 3 consecutive rows.
I tried the following query.
Select
date, winner, loser,
RANK() OVER (PARTITION BY winner ORDER BY date rows 2 preceding) as rank
from playday;

You can use the following query.
select winner, loser, date, cnt
from (
    select winner, loser, date,
           date - lag(date, 3) over (order by date) as cnt
    from playday
) t
where cnt >= 3

First, you need to find out when the last time they lost was.
Second, count the number of wins with a date greater than (>) the date of that last loss.
Third, return all rows after the last loss if that count is 3.
Sorry, I don't have an SQL parser in front of me to put it in code properly.
Set #team_name = 'USA';
select date, winner, loser
from playday
where playday.winner = #team_name
  and playday.date > (select max(date) from playday where playday.loser = #team_name)
  and (select count(*) as wins_since_loss
       from playday
       where playday.winner = #team_name
         and playday.date > (select max(date) from playday where playday.loser = #team_name)) = 3

This query pulls the sequences of rows where USA won exactly 3 times in a row, no less and no more (I used date1 in place of date).
select date1, winner, loser
from
(
    select count(*) over (partition by change) as id,   -- size of each streak group
           date1, winner, loser
    from
    (
        select date1, winner, loser, lag_loser,
               -- the running sum increments whenever the loser flips to or from 'USA',
               -- i.e. whenever a USA winning or losing streak ends
               sum(case when loser <> lag_loser and (loser = 'USA' or lag_loser = 'USA')
                        then 1 else 0 end)
                 over (order by date1 rows unbounded preceding) as change
        from
        (
            select date1, winner, loser,
                   lag(loser) over (order by date1) as lag_loser
            from
            (
                select date1, winner, loser
                from playday
                where winner = 'USA' or loser = 'USA'
            ) games
        ) lagged
    ) grouped
) counted
where winner = 'USA' and id = 3

Related

Sum for a rolling total

I have the following query:
select b.month_date,total_signups,active_users from
(
SELECT date_trunc('month',confirmed_at) as month_date
, count(distinct id) as total_signups
FROM follower.users
WHERE confirmed_at::date >= dateadd(day,-90,getdate())::date
and (deleted_at is null or deleted_at > date_trunc('month',confirmed_at))
group by 1
) a ,
(
SELECT date_trunc('month', inv.created_at) AS month_date
,COUNT(DISTINCT em.user_id) AS active_users
FROM follower.invitees inv
INNER JOIN follower.events em
ON inv.event_id = em.event_id
where inv.created_at::date >= dateadd(day,-90,getdate())::date
GROUP BY 1
) b
where a.month_date=b.month_date
This returns three columns: month_date, total_signups, and active_users. What I need is a rolling total of signups for all users in a fourth column. I've tried OVER and PARTITION BY with no luck. Could someone help? I'd appreciate it very much.
Try adding this column definition to your first Select:
SUM(total_signups)
OVER (ORDER BY b.month_date ASC rows between unbounded preceding and current row)
AS running_total
Here's a mini-demo
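Put together, the full query would look roughly like this (a sketch combining the query from the question with the new column; the em alias on follower.events is carried over from the question's join):
select b.month_date,
       total_signups,
       active_users,
       sum(total_signups)
         over (order by b.month_date asc
               rows between unbounded preceding and current row) as running_total
from
(
  SELECT date_trunc('month', confirmed_at) as month_date,
         count(distinct id) as total_signups
  FROM follower.users
  WHERE confirmed_at::date >= dateadd(day, -90, getdate())::date
    and (deleted_at is null or deleted_at > date_trunc('month', confirmed_at))
  group by 1
) a,
(
  SELECT date_trunc('month', inv.created_at) AS month_date,
         COUNT(DISTINCT em.user_id) AS active_users
  FROM follower.invitees inv
  INNER JOIN follower.events em
    ON inv.event_id = em.event_id
  where inv.created_at::date >= dateadd(day, -90, getdate())::date
  GROUP BY 1
) b
where a.month_date = b.month_date
order by b.month_date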

Percentage of players who play this game each month (Redshift)

This query gets the number of all active players each month from the activity table:
SELECT
date_trunc('month', createdat) as month,
count(distinct playerid) as play_all
FROM
activity
group by month
order by 1
And this query gets the number of players who play the game "bee" each month:
SELECT
date_trunc('month', createdat) as month,
count(distinct playerid) as play_bee
FROM
activity
where gamename = 'bee'
group by month
order by 1
How can I get the percentage of players who play the game "bee" each month?
This might work:
SELECT
DATE_TRUNC('month', createdat) AS month,
COUNT(DISTINCT playerid) AS play_all,
COUNT(DISTINCT CASE WHEN gamename = 'bee' THEN playerid END) AS play_bee,
100. * COUNT(DISTINCT CASE WHEN gamename = 'bee' THEN playerid END)
/ COUNT(DISTINCT playerid) AS percent_play_bee
FROM
activity
group by month
order by 1
It uses the fact that CASE WHEN gamename = 'bee' THEN playerid END will return a playerid if gamename = 'bee', but will return NULL if it isn't. NULL values are not included in COUNT(DISTINCT ...), so only players of 'bee' are counted.
Basically, the CASE is evaluated for each row individually. Then the values are de-duplicated by DISTINCT and counted.
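As a tiny self-contained illustration (hypothetical rows, not from the original table), the conditional count behaves like this:
SELECT
    COUNT(DISTINCT playerid) AS play_all,                                      -- counts p1 and p2
    COUNT(DISTINCT CASE WHEN gamename = 'bee' THEN playerid END) AS play_bee   -- counts only p1
FROM (
    SELECT 'p1' AS playerid, 'bee' AS gamename UNION ALL
    SELECT 'p1', 'ant' UNION ALL
    SELECT 'p2', 'ant'
) s
-- play_all = 2, play_bee = 1, so percent_play_bee would be 50%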

How to return all the rows in the yellow census blocks?

Hey, the schema is like this: for the whole dataset, we should order by machine_id first, then by ss2k. After that, for each machine, we should find all the runs of at least 5 consecutive rows with flag = 'census'. In this dataset, the result should be all the yellow rows.
I cannot return the last 4 rows of the yellow blocks by using this:
drop table if exists qz_panel_census_228_rank;
create table qz_panel_census_228_rank as
select t.*
from (select t.*,
count(*) filter (where flag = 'census') over (partition by machine_id, date order by ss2k rows between current row and 4 following) as census_cnt5,
count(*) filter (where flag = 'census') over (partition by machine_id, date) as count_census,
row_number() over (partition by machine_id, date order by ss2k) as seqnum,
count(*) over (partition by machine_id, date) as cnt
from qz_panel_census_228 t
) t
where census_cnt5 = 5
group by 1,2,3,4,5,6,7,8,9,10,11
DISTRIBUTED BY (machine_id);
You were close, but you need to search in both directions:
select t.*
from (select t.*,
case when count(*) filter (where flag = 'census')
over (partition by machine_id, date
order by ss2k
rows between 4 preceding and current row) = 5
or count(*) filter (where flag = 'census')
over (partition by machine_id, date
order by ss2k
rows between current row and 4 following) = 5
then 1
else 0
end as flag
from qz_panel_census_228 t
) t
where flag = 1
Edit:
This approach will not work unless you add an extra count for each possible 5 row window, e.g. 3 preceding and 1 following, 2 preceding and 2 following, etc. This results in ugly code and is not very flexible.
The common way to solve this gaps & islands problem is to assign consecutive rows to a common group first:
select *
from
(
select t2.*,
count(*) over (partition by machine_id, date, grp) as cnt
from
(
select t1.*
from (select t.*,
-- keep the same number for 'census' rows
sum(case when flag = 'census' then 0 else 1 end)
over (partition by machine_id, date
order by ss2k
rows unbounded preceding) as grp
from qz_panel_census_228 t
) t1
where flag = 'census' -- only census rows
) as t2
) t3
where cnt >= 5 -- only groups of at least 5 census rows
Wow, there has to be a better way of doing this, but the only way I could figure out was to create blocks of consecutive 'census' values. This looks awful but might be a catalyst to a better idea.
with q1 as (
select
machine_id, recorded, ss2k, flag, date,
case
when flag = 'census' and
lag (flag) over (order by machine_id, ss2k) != 'census'
then 1
else 0
end as block
from foo
),
q2 as (
select
machine_id, recorded, ss2k, flag, date,
sum (block) over (order by machine_id, ss2k) as group_id,
case when flag = 'census' then 1 else 0 end as census
from q1
),
q3 as (
select
machine_id, recorded, ss2k, flag, date, group_id,
sum (census) over (partition by group_id order by ss2k) as max_count
from q2
),
groups as (
select group_id
from q3
group by group_id
having max (max_count) >= 5
)
select
q2.machine_id, q2.recorded, q2.ss2k, q2.flag, q2.date
from
q2
join groups g on q2.group_id = g.group_id
where
q2.flag = 'census'
If you run each query within the with clauses in isolation, I think you will see how this evolves.
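For example, the first CTE (q1) can be run on its own like this (a sketch against the same foo table); the block column marks where each run of 'census' rows starts:
with q1 as (
  select
    machine_id, recorded, ss2k, flag, date,
    case
      when flag = 'census' and
           lag (flag) over (order by machine_id, ss2k) != 'census'
      then 1
      else 0
    end as block
  from foo
)
select * from q1 order by machine_id, ss2k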

Group by in columns and rows, counts and percentages per day

I have a table that has data like following.
attr |time
----------------|--------------------------
abc |2018-08-06 10:17:25.282546
def |2018-08-06 10:17:25.325676
pqr |2018-08-05 10:17:25.366823
abc |2018-08-06 10:17:25.407941
def |2018-08-05 10:17:25.449249
I want to group them and count by the attr column row-wise, and also create additional columns to show their counts per day and percentages, as shown below.
attr |day1_count| day1_%| day2_count| day2_%
----------------|----------|-------|-----------|-------
abc |2 |66.6% | 0 | 0.0%
def |1 |33.3% | 1 | 50.0%
pqr |0 |0.0% | 1 | 50.0%
I'm able to display one count by using group by, but unable to figure out how to even separate them into multiple columns. I tried to generate the day1 percentage with
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
GROUP BY attr;
But this is not giving me the correct answer; I'm getting all zeroes for the percentage and 1 for the count. Any help is appreciated. I'm trying to do this in Redshift, which follows PostgreSQL syntax.
Let's nail the logic before presenting:
with CTE1 as
(
select attr, DATEPART(day, time) as theday, count(*) as thecount
from MyTable
group by attr, DATEPART(day, time)
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
select t1.attr, t1.theday, t1.thecount, 1.0 * t1.thecount / t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
From here you can pivot to create a day by day if you feel the need
I am trying to enhance @JohnHC's query. By the way, if you need 7 days, you have to add those days in the CASE WHEN expressions.
with CTE1 as
(
select attr, time::date as theday, count(*) as thecount
from t group by attr,time::date
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
,
CTE3 as
(
select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr, t1.theday, t1.thecount, 1.0 * t1.thecount / t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
)
select CTE3.attr,
max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr=0 then percentofday end) as day1,
max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
max( case when day_nmbr=1 then percentofday end) day2
from CTE3 group by CTE3.attr
http://sqlfiddle.com/#!17/54ace/20
In case that you have only 2 days:
http://sqlfiddle.com/#!17/3bdad/3 (days descending as in your example from left to right)
http://sqlfiddle.com/#!17/3bdad/5 (days ascending)
The main idea is already mentioned in the other answers. Instead of joining the CTEs to calculate the values, I am using window functions, which I think is a bit shorter and more readable. The pivot is done the same way.
SELECT
attr,
COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count, -- D
COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent,
COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
/*
Add more days here
*/
FROM(
SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent -- C
FROM (
SELECT DISTINCT
attr,
MAX(time::date) OVER () - time::date as day_number, -- B
count(*) OVER (partition by time::date, attr) as count, -- A
count(*) OVER (partition by time::date) as count_per_day
FROM test_table
)s
)s
GROUP BY attr
ORDER BY attr
A: counting the rows per day and the rows per day AND attr
B: for more readability I convert the dates into numbers. I take the difference between the maximum date available in the table and the row's date, which gives a counter from 0 (the most recent day) up to n - 1 (the oldest day)
C: calculating the percentage and rounding
D: pivot by filtering on the day numbers. The COALESCE avoids NULL values and turns them into 0. To add more days, replicate these columns.
Edit: Made the day counter more flexible for more days; new SQL Fiddle
Basically, I see this as conditional aggregation. But you need to get an enumerator for the date for the pivoting. So:
SELECT attr,
COUNT(*) FILTER (WHERE day_number = 1) as day1_count,
COUNT(*) FILTER (WHERE day_number = 1) / cnt as day1_percent,
COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
COUNT(*) FILTER (WHERE day_number = 2) / cnt as day2_percent
FROM (SELECT attr,
DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
1.0 * COUNT(*) OVER (PARTITION BY attr) as cnt
FROM test_table
) s
GROUP BY attr, cnt
ORDER BY attr;
Here is a SQL Fiddle.

Make MySQL show every user's rating and then sort the results using ratings?

Q: How do I make MySQL also show every user's rating and then sort the results by rating, descending?
This is all used for a gaming ladder. webl_games has the result of every reported game and has info about who won/lost, and what the winner's/loser's rating became (winner_elo & loser_elo).
Here is a partial screenshot of the table: http://www.ubuntu-pics.de/bild/21059/screenshot_87_RTDZBb.png
Using only this table, the current MySQL code (thanks to this place) displays every player's name and the number of games they played within the most recent x days.
I want to keep that info, but also be able to output every player's current Elo points (which equal the winner_elo or loser_elo from their most recently played game).
Here is the code I currently have; it displays every player and the number of games they played within the last x days:
SELECT userid, count(*) as cnt
FROM
(
SELECT winner as userid
from webl_games g
where g.reported_on > now() - interval 4 day
UNION ALL
SELECT loser as userid
from webl_games g
where g.reported_on > now() - interval 4 day
) t
GROUP BY userid
HAVING COUNT(*) >= 3
SELECT userid, COUNT(*) as cnt,
(
SELECT CASE t.userid WHEN winner THEN winner_elo ELSE loser_elo END
FROM webl_games l
WHERE t.userid IN (winner, loser)
ORDER BY
reported_on DESC
LIMIT 1
) AS last_elo
FROM (
SELECT winner as userid
FROM webl_games g
WHERE g.reported_on > now() - interval 4 day
UNION ALL
SELECT loser as userid
FROM webl_games g
WHERE g.reported_on > now() - interval 4 day
) t
GROUP BY
userid
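To also get the ladder order the question asks for (rating descending), you can append an ORDER BY on the last_elo alias, since MySQL allows ORDER BY to reference a select alias; the query would then end with:
GROUP BY
userid
ORDER BY
last_elo DESC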
The correlated subquery that fetches last_elo can be inefficient.
If it is, and your table has a PRIMARY KEY, rewrite it like this:
SELECT userid, cnt,
(
SELECT CASE q2.userid WHEN winner THEN winner_elo ELSE loser_elo END
FROM webl_games l
WHERE l.id IN (lwin, llose)
ORDER BY
reported_on DESC
LIMIT 1
)
FROM (
SELECT userid, COUNT(*) as cnt,
(
SELECT id
FROM webl_games l
WHERE t.userid = winner
ORDER BY
reported_on DESC
LIMIT 1
) AS lwin,
(
SELECT id
FROM webl_games l
WHERE t.userid = loser
ORDER BY
reported_on DESC
LIMIT 1
) AS llose
FROM (
SELECT winner as userid
FROM webl_games g
WHERE g.reported_on > now() - interval 4 day
UNION ALL
SELECT loser as userid
FROM webl_games g
WHERE g.reported_on > now() - interval 4 day
) t
GROUP BY
userid
) q2