Selecting sum of first and last rows in related table - sql

In my database I have these three tables:
Teams:
Id GUID UNIQUE PRIMARY
Name NVARCHAR
Participants:
Id GUID UNIQUE PRIMARY
FirstName NVARCHAR
[....]
TeamId GUID
ParticipantData:
Id GUID UNIQUE PRIMARY
Weight FLOAT
Date DATETIME
ParticipantId GUID
TeamId GUID
What I need is an SQL query that gives me all columns from Teams AND:
The sum of the first (ordered by Date) ParticipantData entry of each participant in the team (TeamId)
The sum of the last (ordered by Date) ParticipantData entry of each participant in the team (TeamId)
Explanation:
I have lots of participants (team members) reporting their weight with some interval (Weight + Date). What I'm trying to accomplish is to calculate the weight loss of all the team members.
On 2019-01-03 Participant 1 reports Weight 78
On 2019-01-06 Participant 1 reports Weight 75
On 2019-01-04 Participant 2 reports Weight 86
On 2019-01-07 Participant 2 reports Weight 83
I need the query to get SumOfFirstWeights (78 + 86) = 164
I need the query to get SumOfLastWeights (75 + 83) = 158
Which gives me a weight loss of 6.
I've tried many combinations of:
Select *,
(SELECT TOP (1) Sum(Weight)
FROM ParticipantData WHERE (TeamId = Teams.Id)
GROUP BY ParticipantId
)
ORDER BY Date As SumOfFirstWeights
From Teams

This is a greatest/lowest-per-group problem: you need the earliest and the latest row per participant, and then a sum of those values per team.
select t.id, t.name, sum(t1.weight) as SumOfFirstWeights, sum(t2.weight) as SumOfLastWeights
from teams t
left join
(
select pd.teamid, pd.participantid, pd.weight
from ParticipantData pd
join
(
select teamid, participantid, min(date) min_date
from ParticipantData
group by teamid, participantid
) f on pd.teamid = f.teamid and
pd.participantid = f.participantid and
pd.date = f.min_date
) t1 on t.id = t1.teamid
left join
(
select pd.teamid, pd.participantid, pd.weight
from ParticipantData pd
join
(
select teamid, participantid, max(date) max_date
from ParticipantData
group by teamid, participantid
) l on pd.teamid = l.teamid and
pd.participantid = l.participantid and
pd.date = l.max_date
) t2 on t1.teamid = t2.teamid and
t1.participantid = t2.participantid
group by t.id, t.name

You want to find the first and last row per participant and then sum them per team. You can use analytic functions for this. I'm using ROW_NUMBER, which gives me exactly one record per participant, even if a participant has two entries on the same day.
with pd as
(
select
participantid,
weight,
row_number() over (partition by participantid order by date) as one_is_first,
row_number() over (partition by participantid order by date desc) as one_is_last
from participantdata
)
, team_weights as
(
select
p.teamid,
sum(case when pd.one_is_first = 1 then weight end) as first_weight,
sum(case when pd.one_is_last = 1 then weight end) as last_weight,
sum(case when pd.one_is_last = 1 then weight end) -
sum(case when pd.one_is_first = 1 then weight end) as difference
from pd
join participants p on p.id = pd.participantid
where (pd.one_is_first = 1 or pd.one_is_last = 1)
group by p.teamid
)
select *
from teams
join team_weights on team_weights.teamid = teams.id
order by teams.id;
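The ROW_NUMBER approach above can be checked against the sample weights from the question. The following is only a sketch: it uses Python's built-in sqlite3 module (SQLite 3.25 or newer is needed for window functions), text keys instead of GUIDs, and made-up values for the ids, the team name and FirstName, all of which are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Teams (Id TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE Participants (Id TEXT PRIMARY KEY, FirstName TEXT, TeamId TEXT);
    CREATE TABLE ParticipantData (
        Id INTEGER PRIMARY KEY, Weight REAL, Date TEXT,
        ParticipantId TEXT, TeamId TEXT);
    -- Hypothetical ids and names; the weights and dates come from the question.
    INSERT INTO Teams VALUES ('t1', 'Team A');
    INSERT INTO Participants VALUES ('p1', 'One', 't1'), ('p2', 'Two', 't1');
    INSERT INTO ParticipantData (Weight, Date, ParticipantId, TeamId) VALUES
        (78, '2019-01-03', 'p1', 't1'),
        (75, '2019-01-06', 'p1', 't1'),
        (86, '2019-01-04', 'p2', 't1'),
        (83, '2019-01-07', 'p2', 't1');
""")

QUERY = """
WITH pd AS (
    -- Number each participant's rows from both ends of the date order.
    SELECT ParticipantId, Weight,
           ROW_NUMBER() OVER (PARTITION BY ParticipantId ORDER BY Date) AS one_is_first,
           ROW_NUMBER() OVER (PARTITION BY ParticipantId ORDER BY Date DESC) AS one_is_last
    FROM ParticipantData
),
team_weights AS (
    -- Sum the first and the last weight of every participant per team.
    SELECT p.TeamId,
           SUM(CASE WHEN pd.one_is_first = 1 THEN pd.Weight END) AS SumOfFirstWeights,
           SUM(CASE WHEN pd.one_is_last = 1 THEN pd.Weight END) AS SumOfLastWeights
    FROM pd
    JOIN Participants p ON p.Id = pd.ParticipantId
    WHERE pd.one_is_first = 1 OR pd.one_is_last = 1
    GROUP BY p.TeamId
)
SELECT t.Id, t.Name, tw.SumOfFirstWeights, tw.SumOfLastWeights
FROM Teams t
JOIN team_weights tw ON tw.TeamId = t.Id
ORDER BY t.Id
"""

rows = conn.execute(QUERY).fetchall()
for team_id, name, first_sum, last_sum in rows:
    # Weight loss is the first sum minus the last sum.
    print(team_id, name, first_sum, last_sum, first_sum - last_sum)
```

For the sample data this yields 164 and 158, a weight loss of 6. A team without any ParticipantData rows would be dropped by the inner join; a LEFT JOIN from Teams would keep it with NULL sums.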

Related

HackerRank Winners Chart SQL advanced-level question

This is a new question HackerRank has added for the advanced-level SQL certification. I was not able to solve it at all. Can anyone help?
There were a number of contests where participants each made a number of attempts. Only the attempt with the highest score is considered. Write a query to list the contestants ranked in the top 3 for each contest. If multiple contestants have the same score in a contest, they share the same rank.
Report event_id, rank 1 name(s), rank 2 name(s), rank 3 name(s). Names that share a rank should be ordered alphabetically and separated by a comma.
Order the report by event_id.
The following is some sample data for your scenario: a table of contestants and a table of the attempts made. I put each person's attempts on their own line so the separate attempts per person are easy to see.
create table contestants
( id int identity(1,1) not null,
personName nvarchar(10) )
insert into contestants ( personName )
values ( 'Bill' ), ('Mary'), ('Jane' ), ('Mark')
create table attempts
( id int identity(1,1) not null,
contestantid int not null,
score int not null )
insert into attempts ( contestantid, score )
values
( 1, 72 ), ( 1, 88 ), (1, 81 ),
( 2, 83 ), ( 2, 88 ), (2, 79), (2,86),
( 3, 94 ),
( 4, 79 ), (4, 87)
Now, the basic premise is each contestant's best score, which is a simple MAX() of the score per contestant.
select
contestantid,
max( score ) highestScore
from
attempts
group by
contestantid
The result of the above query is the BASIS of the final ranking. So I have put that query as the FROM source. So instead of a table, the from is the result of the above query which I have aliased "PreAgg" for the pre-aggregation per contestant.
select
ContestantID,
c.personName,
DENSE_RANK() OVER ( order by HighestScore DESC ) as FinalRank
from
(select
contestantid,
max( score ) highestScore
from
attempts
group by
contestantid ) preAgg
JOIN Contestants c
on preAgg.contestantid = c.id
The join to the contestants table is easy enough and pulls in the name, but now look at the DENSE_RANK() clause. The sample data has no grouping within which to rank, such as the Olympics, where each sport has its own top ranks, so we do not need the PARTITION BY clause.
The ORDER BY clause is what you want. In this case it takes the HighestScore column from the pre-aggregation query in DESCENDING order, so the HIGHEST score is at the top and the rest follow. The "as" gives it the final column name.
DENSE_RANK() OVER ( order by HighestScore DESC ) as FinalRank
Results
ContestantID personName FinalRank
3 Jane 1
1 Bill 2
2 Mary 2
4 Mark 3
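To reproduce the ranking above without a SQL Server instance, here is a small sketch using Python's sqlite3 module (SQLite 3.25+ for window functions). The identity columns are replaced by INTEGER PRIMARY KEY, which is an adaptation on my part; the data is the sample data from this answer. The second query shows the same result restricted to a maximum rank passed as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contestants (id INTEGER PRIMARY KEY, personName TEXT);
    INSERT INTO contestants (personName) VALUES ('Bill'), ('Mary'), ('Jane'), ('Mark');
    CREATE TABLE attempts (
        id INTEGER PRIMARY KEY,
        contestantid INTEGER NOT NULL,
        score INTEGER NOT NULL);
    INSERT INTO attempts (contestantid, score) VALUES
        (1, 72), (1, 88), (1, 81),
        (2, 83), (2, 88), (2, 79), (2, 86),
        (3, 94),
        (4, 79), (4, 87);
""")

# Best score per contestant, then a dense rank over those best scores.
RANKED = """
SELECT preAgg.contestantid AS ContestantID,
       c.personName,
       DENSE_RANK() OVER (ORDER BY preAgg.highestScore DESC) AS FinalRank
FROM (SELECT contestantid, MAX(score) AS highestScore
      FROM attempts
      GROUP BY contestantid) preAgg
JOIN contestants c ON preAgg.contestantid = c.id
"""

ranked = conn.execute(RANKED + " ORDER BY FinalRank, personName").fetchall()

# Wrap the ranked query once more to keep only ranks up to a given limit.
TOP_N = (
    f"SELECT * FROM ({RANKED}) AS dr "
    "WHERE dr.FinalRank <= ? "
    "ORDER BY dr.FinalRank, dr.personName"
)
top_two = conn.execute(TOP_N, (2,)).fetchall()

for row in ranked:
    print(row)
```

Bill and Mary both have a best score of 88, so they share rank 2 and Mark follows at rank 3 with no gap, which is the difference between DENSE_RANK and RANK.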
Now, if you only wanted a limit such as the top 3 ranks and you actually had 20+ competitors, just wrap this up one more time and add a WHERE clause:
select * from
(
select
ContestantID,
c.personName,
DENSE_RANK() OVER ( order by HighestScore DESC ) as FinalRank
from
(select
contestantid,
max( score ) highestScore
from
attempts
group by
contestantid ) preAgg
JOIN Contestants c
on preAgg.contestantid = c.id ) dr
where
dr.FinalRank <= 3
My reference table is Scoretable, with the columns event_id, p_Name and score that the query below uses.
MySQL solution using multiple common table expressions, DENSE_RANK, JOIN and GROUP_CONCAT:
WITH t1 AS
(SELECT *,DENSE_RANK() OVER(PARTITION BY event_id ORDER BY score DESC) AS 'rk' FROM Scoretable),
t2 AS
(SELECT * FROM t1 WHERE rk<=3),
t3 AS
(SELECT event_id , CASE WHEN rk=1 THEN p_Name ELSE NULL END AS 'first' FROM t2 WHERE rk=1 ),
t4 AS
(SELECT event_id , CASE WHEN rk=2 THEN p_Name ELSE NULL END AS 'second' FROM t2 WHERE rk=2 ),
t5 AS
(SELECT event_id , CASE WHEN rk=3 THEN p_Name ELSE NULL END AS 'third' FROM t2 WHERE rk=3 ),
t6 AS
(SELECT t3.event_id , t3.first , t4.second , t5.third FROM t3 JOIN t4 ON t3.event_id = t4.event_id JOIN t5 ON t4.event_id=t5.event_id ORDER BY 1,2,3,4)
SELECT event_id , GROUP_CONCAT(DISTINCT first) AS 'rank 1' , GROUP_CONCAT(DISTINCT second) AS 'rank 2' , GROUP_CONCAT(DISTINCT third) AS 'rank 3'
FROM t6 GROUP BY 1 ORDER BY 1;
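As a cross-check of the per-event report, here is a sketch of a slightly different route, not the query above: it first reduces each participant to their best score per event (as the task demands), ranks with DENSE_RANK partitioned by event_id, and then builds the comma-separated name lists in Python so the alphabetical order is explicit. It runs on Python's sqlite3 (SQLite 3.25+). The Scoretable columns are taken from the query above; the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Scoretable (event_id INTEGER, p_Name TEXT, score INTEGER);
    -- Invented sample rows: Ann has two attempts in event 1.
    INSERT INTO Scoretable VALUES
        (1, 'Ann', 90), (1, 'Ann', 95), (1, 'Bob', 95),
        (1, 'Cid', 80), (1, 'Dee', 70), (1, 'Eve', 60),
        (2, 'Ann', 50), (2, 'Bob', 60);
""")

QUERY = """
WITH best AS (
    -- Only the highest-scoring attempt of each participant counts.
    SELECT event_id, p_Name, MAX(score) AS best_score
    FROM Scoretable
    GROUP BY event_id, p_Name
),
ranked AS (
    SELECT event_id, p_Name,
           DENSE_RANK() OVER (PARTITION BY event_id ORDER BY best_score DESC) AS rk
    FROM best
)
SELECT event_id, rk, p_Name
FROM ranked
WHERE rk <= 3
ORDER BY event_id, rk, p_Name
"""

# Pivot the ranked rows into one line per event, names joined by commas.
by_event = {}
for event_id, rk, name in conn.execute(QUERY):
    by_event.setdefault(event_id, {1: [], 2: [], 3: []})[rk].append(name)

report = [
    (event_id, ",".join(r[1]), ",".join(r[2]), ",".join(r[3]))
    for event_id, r in sorted(by_event.items())
]

for line in report:
    print(line)
```

Event 2 has only two participants, so its rank 3 column stays empty instead of the event disappearing, which is what inner joins between the per-rank CTEs would cause.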

sql - select all rows that have all multiple same cols

I have a table with 4 columns.
date
store_id
product_id
label_id
and I need to find all store_ids that have every product_id with the same label_id (for example, 4) on one day.
For example:
store_id | label_id | product_id | date
4        | 4        | 5          | 9/2
5        | 4        | 7          | 9/2
4        | 3        | 12         | 9/2
4        | 4        | 7          | 9/2
So it should return 4, because it is the only store that contains all possible products with label 4 on one day.
I have tried something like this:
(select store_id, date
from table
where label_id = 4
group by store_id, date
order by date)
I don't know how to write the outer query. I tried:
select * from table
where product_id = all(Inner query)
but it didn't work.
Thanks
It is unclear from your question whether the labels are specific to a given day or through the entire period. But a variation of Tim's answer seems appropriate. For any label:
SELECT t.date, t.label_id, t.store_id
FROM t
GROUP BY t.date, t.label_id, t.store_id
HAVING COUNT(DISTINCT t.product_id) = (SELECT COUNT(DISTINCT t2.product_id)
FROM t t2
WHERE t2.label_id = t.label_id
);
For a particular label:
SELECT t.date, t.store_id
FROM t
WHERE t.label_id = 4
GROUP BY t.date, t.store_id
HAVING COUNT(DISTINCT t.product_id) = (SELECT COUNT(DISTINCT t2.product_id)
FROM t t2
WHERE t2.label_id = 4
);
If the labels are specific to the date, then the subqueries need a date comparison as well (for example, AND t2.date = t.date).
Here is one way:
SELECT t1.date, t1.store_id
FROM yourTable t1
GROUP BY t1.date, t1.store_id
HAVING COUNT(DISTINCT t1.product_id) = (SELECT COUNT(DISTINCT t2.product_id)
FROM yourTable t2
WHERE t2.date = t1.date)
ORDER BY t1.date, t1.store_id;
This query reads in a fairly straightforward way: find every store, on some date, whose distinct product count equals the distinct product count across all stores on that same day.
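The count-comparison idea can be tried on the four sample rows from the question. This is a sketch under a few assumptions: the table is called sales (the question does not name it), the label is passed as a parameter, and the comparison is restricted to that label and to the same date. It runs on Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "sales" is a hypothetical table name; the rows are the question's sample.
    CREATE TABLE sales (date TEXT, store_id INTEGER, product_id INTEGER, label_id INTEGER);
    INSERT INTO sales (store_id, label_id, product_id, date) VALUES
        (4, 4, 5, '9/2'),
        (5, 4, 7, '9/2'),
        (4, 3, 12, '9/2'),
        (4, 4, 7, '9/2');
""")

QUERY = """
SELECT t.date, t.store_id
FROM sales t
WHERE t.label_id = :label
GROUP BY t.date, t.store_id
-- Keep a store only if it carries as many distinct products with this label
-- as exist across all stores on the same day.
HAVING COUNT(DISTINCT t.product_id) = (
    SELECT COUNT(DISTINCT t2.product_id)
    FROM sales t2
    WHERE t2.label_id = :label
      AND t2.date = t.date)
ORDER BY t.date, t.store_id
"""

matches = conn.execute(QUERY, {"label": 4}).fetchall()
print(matches)
```

On 9/2 the products with label 4 are 5 and 7. Store 4 sells both, store 5 only sells 7, so only store 4 is returned.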
I'd probably aggregate to lists of products in a string or array:
with products_per_day_and_store as
(
select
store_id,
date,
string_agg(distinct product_id::text, ',' order by product_id::text) as products
from mytable
where label_id = 4
group by store_id, date
)
, products_per_day as
(
select
date,
string_agg(distinct product_id::text, ',' order by product_id::text) as products
from mytable
where label_id = 4
group by date
)
select distinct ppdas.store_id
from products_per_day_and_store ppdas
join products_per_day ppd using (date, products);

Modify query to group by client identifier

I have the following query.
Base query
WITH CTE (clientid, dayZero)
AS
-- Define the CTE query.
(
SELECT
clientid,
DATEDIFF(
DAY,
MIN(calendar),
MIN(CASE
WHEN total = 0
THEN calendar
END)
) as dayZero
FROM (
SELECT
clientid,
CONVERT(datetime, convert(varchar(10), calendar)) calendar,
TOTAL
FROM STATS s1
) a
GROUP BY clientid
),
cteb as
-- Define the outer query referencing the CTE name.
(SELECT cte.*, c.company, v.Name, m.id as memberid
FROM CTE
JOIN client c
on c.id = cte.CLIENTID
join Domain v
on v.Id = c.domainID
join subscriber m
on m.ClientId = c.id
join activity a
on m.id = a.memberid
where c.id != 023
),
ctec as
(
select count(distinct memberid) as Number from cteb
group by clientid
)
select clientid, dayzero, company, name, Number from cteb , ctec
The output of this query is -
clientid dayzero company name Number
21 35 School Boards Education 214
21 35 School Boards Education 214
I want it to return only 1 row per client. Any ideas on how to modify this query?
Sub Query
select count(distinct memberid) as Number from cteb
group by clientid
When I only run the query until the above subquery and select like so -
select * from ctec
where clientid = 21
I get
clientid Number
21 214
22 423
This is what I want. But when I run the following select to get all the other columns I need, I start getting duplicates. The output makes sense because I am not grouping by clientid. But if I group by clientid, how do I get the other columns I need?
select clientid, dayzero, company, name, Number from cteb , ctec
UPDATE
When I run the below select
select clientid, dayzero, company, name, Number from cteb , ctec
group by clientid, dayzero, company, name, Number
I still get
clientid dayzero company name Number
21 35 School Boards Education 214
21 35 School Boards Education 215
I don't understand why I am getting different numbers in the Number column (214 and 215 in this case). But when I run it with the group by as shown below, I get the correct numbers.
select count(distinct memberid) as Number from cteb
group by clientid
select * from ctec
where clientid = 21
I get
clientid Number
21 2190
Neither 214 nor 215 is correct. The correct number is 2190, which I get when I group by as shown above.
If you want to show unique rows based on a particular column, you can use ROW_NUMBER() as in the following query.
select * from
(
select clientid, dayzero, company, name, Number,
ROW_NUMBER() OVER(PARTITION BY clientid ORDER BY Number DESC) RN
from cteb , ctec
) t
where RN=1
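A likely cause of the duplicate rows and the changing Number values is that "from cteb, ctec" is a cross join: ctec as written selects no clientid, so every client row gets paired with every client's count. A possible alternative, sketched below, is to keep clientid in the counting CTE and join on it. The tables here are a heavily reduced, hypothetical version of the ones in the question (no STATS, Domain or dayZero), and the data is invented; it runs on Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced, hypothetical versions of the question's tables.
    CREATE TABLE client (id INTEGER PRIMARY KEY, company TEXT);
    CREATE TABLE subscriber (id INTEGER PRIMARY KEY, ClientId INTEGER);
    CREATE TABLE activity (memberid INTEGER);
    INSERT INTO client VALUES (21, 'School Boards'), (22, 'Acme');
    INSERT INTO subscriber VALUES (1, 21), (2, 21), (3, 22);
    INSERT INTO activity VALUES (1), (1), (2), (3);
""")

QUERY = """
WITH cteb AS (
    SELECT c.id AS clientid, c.company, m.id AS memberid
    FROM client c
    JOIN subscriber m ON m.ClientId = c.id
    JOIN activity a ON a.memberid = m.id
),
ctec AS (
    -- Keep clientid so the count can be joined back to its own client.
    SELECT clientid, COUNT(DISTINCT memberid) AS Number
    FROM cteb
    GROUP BY clientid
)
SELECT DISTINCT b.clientid, b.company, n.Number
FROM cteb b
JOIN ctec n ON n.clientid = b.clientid
ORDER BY b.clientid
"""

per_client = conn.execute(QUERY).fetchall()
for row in per_client:
    print(row)
```

With the join condition in place each client appears once with its own member count, so no ROW_NUMBER filter is needed to hide the extra rows.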

How to get the second best value

Assume we have two tables:
Players(player_id int)
GameScores(player_id int, session_id int, score int)
How can we query the second highest-score session for every player?
For example if
Players
1
2
GameScores
(player_id, session_id, score)
1 1 10
1 2 20
1 3 40
2 4 10
2 5 20
Then result would be
(player_id, session_id)
1, 2
2, 4
You can try this:
SELECT gs.player_id, gs.session_id
FROM (
SELECT g.player_id, MAX(g.score) AS SecondScore
FROM GameScores g
WHERE g.score < (SELECT MAX(g2.score) FROM GameScores g2 WHERE g2.player_id = g.player_id)
GROUP BY g.player_id
) x
INNER JOIN GameScores gs ON x.player_id = gs.player_id
AND x.SecondScore = gs.score
This is the query that selects the second-highest score for each player:
SELECT g.player_id, MAX(g.score) AS SecondScore
FROM GameScores g
WHERE g.score < (SELECT MAX(g2.score) FROM GameScores g2 WHERE g2.player_id = g.player_id)
GROUP BY g.player_id
You can't group by session in this query. That is why you need to put it in a subquery and join it to GameScores to get the session_id.
Here is a code snippet for Oracle SQL:
select tbl.player_id,tbl.session_id from
(select p.player_id,g.session_id,g.score,rank() over (partition by p.player_id order by score desc) rnk from players p,
gamescores g
where p.player_id = g.player_id) tbl
where tbl.rnk = 2;
-- min() is used instead of first(), which only exists in MS Access
select GameScores.player_id, min(GameScores.session_id) as session_id
from
GameScores inner join (
select GameScores.player_id, max(GameScores.score) as secondscore
from
GameScores left join (
select player_id, max(score) as firstscore
from GameScores
group by player_id
) as NotThisOnes on GameScores.player_id = NotThisOnes.player_id
and GameScores.score = NotThisOnes.firstscore
where NotThisOnes.player_id is null
group by GameScores.player_id
) as thisare on GameScores.player_id = thisare.player_id
and GameScores.score = thisare.secondscore
group by GameScores.player_id
I took a different approach... I am not sure if this is better than other answers, but i wanted to solve it this way:
SELECT
GameScores.player_id,
GameScores.session_id,
GameScores.score
FROM
GameScores
WHERE
GameScores.score=
(select max(score) from GameScores GameScores_2
where GameScores.player_id = GameScores_2.Player_ID
and GameScores_2.Score<
(select max(score) from GameScores GameScores_1
where GameScores_1.player_id = GameScores.player_id));
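The expected result from the question can be verified with a window function as well. The sketch below uses DENSE_RANK rather than RANK, which is my choice and not part of the answers above: if two sessions tie for the top score, DENSE_RANK still assigns rank 2 to the next lower score, whereas RANK would skip it. It runs on Python's sqlite3 (SQLite 3.25+) with the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Players (player_id INTEGER);
    CREATE TABLE GameScores (player_id INTEGER, session_id INTEGER, score INTEGER);
    INSERT INTO Players VALUES (1), (2);
    INSERT INTO GameScores VALUES
        (1, 1, 10), (1, 2, 20), (1, 3, 40),
        (2, 4, 10), (2, 5, 20);
""")

QUERY = """
SELECT player_id, session_id
FROM (
    -- Rank each player's sessions by score, highest first.
    SELECT player_id, session_id,
           DENSE_RANK() OVER (PARTITION BY player_id ORDER BY score DESC) AS rnk
    FROM GameScores
) ranked
WHERE rnk = 2
ORDER BY player_id, session_id
"""

second_best = conn.execute(QUERY).fetchall()
print(second_best)
```

A player with only one distinct score has no rank 2 and is simply absent from the result; if several sessions share the second-highest score, all of them are returned.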

How to query for rows that have highest column value among rows that have same value for one of the columns

I have UserScores Table with data like this:
Id userId Score
1 1 10
2 2 5
3 1 5
I would like to have a query or SQL block that can give me the following output
Id userId Score
3 1 5
2 2 5
That is, I would like to pick one row per 'userId': the row with the highest 'Id' value for that user.
Another solution that would work on SQL Server 2000 (same as INNER JOIN above, but slightly faster) is:
SELECT id, userId, Score
FROM UserScores
WHERE id in (SELECT MAX(id)
FROM UserScores
GROUP BY userId
)
ORDER BY userId
Use:
WITH summary AS (
SELECT t.id,
t.userid,
t.score,
ROW_NUMBER() OVER (PARTITION BY t.userid ORDER BY t.id DESC, t.score DESC) AS rank
FROM USERSCORES t)
SELECT s.id,
s.userid,
s.score
FROM summary s
WHERE s.rank = 1
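How about
-- Caution: Score is neither grouped nor aggregated, so many engines reject this
-- query, and those that accept it may return the Score of an arbitrary row per user.
SELECT MAX(Id), userId, Score
FROM UserScores
GROUP BY UserId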
SELECT U2.id, U2.userId, U2.score
FROM UserScores U2
INNER JOIN (
SELECT U1.userId, MAX(U1.Id) MaxId
FROM UserScores U1
GROUP BY U1.userId
) U3
ON U2.id = U3.MaxId and U2.userId = U3.userId
ORDER BY U2.userId
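To see the expected output for the sample rows, here is a sketch of the ROW_NUMBER variant run through Python's sqlite3 module (SQLite 3.25+). The column alias rn is my naming; the table and data are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserScores (Id INTEGER PRIMARY KEY, userId INTEGER, Score INTEGER);
    INSERT INTO UserScores VALUES (1, 1, 10), (2, 2, 5), (3, 1, 5);
""")

QUERY = """
WITH summary AS (
    -- Number each user's rows so that the highest Id gets 1.
    SELECT Id, userId, Score,
           ROW_NUMBER() OVER (PARTITION BY userId ORDER BY Id DESC) AS rn
    FROM UserScores
)
SELECT Id, userId, Score
FROM summary
WHERE rn = 1
ORDER BY userId
"""

latest = conn.execute(QUERY).fetchall()
for row in latest:
    print(row)
```

User 1 has rows with Id 1 and Id 3, so the row with Id 3 (Score 5) is kept even though the older row has the higher score.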