Generate columns from values returned by SELECT - sql

I've got a query that returns data like so:
student    course   grade
a-student  ENG-W05  100
a-student  MAT-W05  85
a-student  ENG-W06  100
b-student  MAT-W05  90
b-student  SCI-W05  75
The data is grouped by student and course. Ideally, I'd like to have the above data transformed into the below:
student    ENG-W05  MAT-W05  ENG-W06  SCI-W05
a-student  100      85       100      NULL
b-student  NULL     90       NULL     75
So, after the transformation, each student only has one record, with all of their grades (and any missing courses graded as null).
Does anyone have any ideas? Obviously, this is fairly simple to do if I take the data out and transform it in a language (like Python), but I'd love to get the data in the desired format with an SQL query.
Also, would it be possible to have the columns ordered alphabetically (ascending)? So the final output would be:
student    ENG-W05  ENG-W06  MAT-W05  SCI-W05
a-student  100      100      85       NULL
b-student  NULL     NULL     90       75
EDIT: To clarify, the values in course aren't known in advance. The ones I provided are just examples. So ideally, if more course values found their way into the first query's result (the first table), they would still be mapped to columns in the final result, without needing to change the query. In reality, I have more than 1,000 distinct values in the course column, so I can't write out each one manually.

You can use conditional aggregation for that:
SELECT
student,
SUM(grade) FILTER (WHERE course = 'ENG-W05') as eng_w05,
SUM(grade) FILTER (WHERE course = 'MAT-W05') as mat_w05,
SUM(grade) FILTER (WHERE course = 'ENG-W06') as eng_w06,
SUM(grade) FILTER (WHERE course = 'SCI-W05') as sci_w05
FROM mytable
GROUP BY student
The FILTER clause restricts an aggregate to the rows that satisfy its condition, so each column above aggregates only the records for one course.
Choosing the aggregate function can take some thought. Here SUM() does the job because there is only one grade per student and course; MAX() or MIN() would work as well. If there really is only one value per group, any aggregate will do; if there can be several, pick the one that matches your real requirement.
Instead of the FILTER clause, which is part of the SQL standard but supported by only a few databases (PostgreSQL among them), you could use a CASE expression inside the aggregate, which works almost everywhere:
SELECT
student,
SUM(
CASE
WHEN course = 'ENG-W05' THEN grade
END
) AS eng_w05,
...
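To check the point that SUM(), MAX() and MIN() agree when each student has at most one grade per course, here is a small sketch using Python's built-in sqlite3 module and the sample rows from the question. The table name mytable comes from the answer; the helper pivot_with is hypothetical, and the CASE form is used rather than FILTER so that it also runs on older SQLite builds without FILTER support.

```python
import sqlite3

# Sample rows from the question: (student, course, grade).
ROWS = [
    ("a-student", "ENG-W05", 100),
    ("a-student", "MAT-W05", 85),
    ("a-student", "ENG-W06", 100),
    ("b-student", "MAT-W05", 90),
    ("b-student", "SCI-W05", 75),
]

def pivot_with(agg):
    """Run the four-course pivot using the given aggregate name (SUM, MAX or MIN)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (student TEXT, course TEXT, grade INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    # CASE yields NULL for other courses; the aggregate ignores those NULLs.
    sql = f"""
        SELECT student,
               {agg}(CASE WHEN course = 'ENG-W05' THEN grade END) AS eng_w05,
               {agg}(CASE WHEN course = 'MAT-W05' THEN grade END) AS mat_w05,
               {agg}(CASE WHEN course = 'ENG-W06' THEN grade END) AS eng_w06,
               {agg}(CASE WHEN course = 'SCI-W05' THEN grade END) AS sci_w05
        FROM mytable
        GROUP BY student
        ORDER BY student
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

for agg in ("SUM", "MAX", "MIN"):
    print(agg, pivot_with(agg))
```

All three print the same rows, with None (NULL) where a student has no grade for a course.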

You can use conditional aggregation as follows:
select student,
max(case when course = 'ENG-W05' then grade end) as "ENG-W05",
max(case when course = 'MAT-W05' then grade end) as "MAT-W05",
max(case when course = 'ENG-W06' then grade end) as "ENG-W06",
max(case when course = 'SCI-W05' then grade end) as "SCI-W05"
from (your_query) t
group by student
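Neither query above covers the EDIT: a single static SQL query has a fixed column list, so with an unknown (and large) set of courses the column list usually has to be generated first, either in application code or with the database's dynamic SQL facilities. Below is a minimal sketch of the application-code route, assuming Python's sqlite3 module: it reads the distinct courses in alphabetical order and then builds the MAX(CASE ...) query from the answer above. The names dynamic_pivot, quote_ident and the table grades are hypothetical.

```python
import sqlite3

def quote_ident(name):
    """Quote a course value for use as a column alias (embedded quotes are doubled)."""
    return '"' + name.replace('"', '""') + '"'

def dynamic_pivot(con, source_sql):
    """Pivot (student, course, grade) rows into one column per distinct course.

    source_sql is the text of the query producing the rows. It is pasted into
    the generated SQL, so it must come from trusted code, not from user input.
    """
    # Pass 1: the distinct course values, sorted alphabetically (ascending).
    courses = [r[0] for r in con.execute(
        f"SELECT DISTINCT course FROM ({source_sql}) ORDER BY course")]
    # Pass 2: one MAX(CASE ...) column per course. Course values are bound as
    # parameters; aliases cannot be parameters, so they are quoted identifiers.
    select_list = ["student"] + [
        f"MAX(CASE WHEN course = ? THEN grade END) AS {quote_ident(c)}"
        for c in courses
    ]
    sql = (f"SELECT {', '.join(select_list)} FROM ({source_sql}) AS t "
           "GROUP BY student ORDER BY student")
    cur = con.execute(sql, courses)
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()

# Demo with the question's sample rows in the hypothetical table grades.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE grades (student TEXT, course TEXT, grade INTEGER)")
con.executemany("INSERT INTO grades VALUES (?, ?, ?)", [
    ("a-student", "ENG-W05", 100),
    ("a-student", "MAT-W05", 85),
    ("a-student", "ENG-W06", 100),
    ("b-student", "MAT-W05", 90),
    ("b-student", "SCI-W05", 75),
])
header, rows = dynamic_pivot(con, "SELECT student, course, grade FROM grades")
print(header)
for row in rows:
    print(row)
```

New courses in the source query show up as new columns without editing any SQL. With over 1,000 courses, check your database's limit on the number of result columns before relying on this.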

Related

SQL JOIN with CASE statement result

Is there any way of joining the result of a CASE statement with a reference table without creating a CTE, etc.?
Result after the CASE statement (Bonus Level is the result of the CASE statement):
ID  Name  Bonus Level
01  John  A
02  Jim   B
01  John  B
03  Jake  C
Reference table:
A  10%
B  20%
C  30%
I want to then get the percentage next to each employee, take the maximum percentage using MAX() grouped by ID, and then link it back to the reference table so that each employee has the single correct (highest) bonus level next to their name. (This is a totally fictitious scenario, but very similar to what I am looking for.)
Just need help with joining the result of the CASE statement with the reference table.
Thanks in advance.
Instead of returning a literal value from the CASE expression, you could return a scalar subquery against the reference table.
So if your CASE expression looks like:
case when variable1 = value then 'A' end as bonuslevel
then replacing it like this might help:
case when variable1 = value then (select percentage from ReferenceTable where variable2InReferenceTable = 'A') end
I don't know if I'm oversimplifying, but based on the results of your CASE query, why not just join them to the reference table and take the MAX grouped by ID/Name? Since the ID and name don't change for the same person, you just get the maximum percentage you want. To get the bonus level label, rejoin the reference table on the maximum percentage determined for each person.
select
    lvl1.ID,
    lvl1.Name,
    lvl1.FinalBonus,
    rt2.BonusLvl
from
    ( select
          PQ.ID,
          PQ.Name,
          max( rt.PcntBonus ) as FinalBonus
      from
          (however you got your data query) PQ
          JOIN RefTbl rt
              on PQ.BonusLvl = rt.BonusLvl
      group by
          PQ.ID,
          PQ.Name
    ) lvl1
    JOIN RefTbl rt2
        on lvl1.FinalBonus = rt2.PcntBonus
Since the bonus levels (A, B, C) are not guaranteed to sort in the same order as their percentages (10, 20, 30), I did it this way. OTHERWISE, you could have just used max() on both the bonus level and the percentage. But what if your bonus levels were listed as something like:
Limited 10%
Aggressive 20%
Ace 30%
Here, MAX() on the level names would return "Limited", but the maximum percentage (30%) belongs to an "Ace" sales rep. So get the 30% first, then look up the label that matches it.
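A runnable sketch of the join, MAX() and rejoin approach, assuming Python's sqlite3 module. The table case_result is a hypothetical stand-in for "however you got your data query", and the data uses the Limited/Aggressive/Ace labels so the difference from taking MAX() on the label is visible. The rejoin assumes each percentage appears only once in RefTbl.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-in for the rows produced by the CASE query.
con.execute("CREATE TABLE case_result (ID INTEGER, Name TEXT, BonusLvl TEXT)")
con.executemany("INSERT INTO case_result VALUES (?, ?, ?)", [
    (1, "John", "Limited"),
    (2, "Jim", "Aggressive"),
    (1, "John", "Aggressive"),
    (3, "Jake", "Ace"),
])
con.execute("CREATE TABLE RefTbl (BonusLvl TEXT, PcntBonus INTEGER)")
con.executemany("INSERT INTO RefTbl VALUES (?, ?)",
                [("Limited", 10), ("Aggressive", 20), ("Ace", 30)])

# Join, take the highest percentage per person, then rejoin to get its label.
best = con.execute("""
    SELECT lvl1.ID, lvl1.Name, lvl1.FinalBonus, rt2.BonusLvl
    FROM (SELECT PQ.ID, PQ.Name, MAX(rt.PcntBonus) AS FinalBonus
          FROM case_result PQ
          JOIN RefTbl rt ON PQ.BonusLvl = rt.BonusLvl
          GROUP BY PQ.ID, PQ.Name) lvl1
    JOIN RefTbl rt2 ON lvl1.FinalBonus = rt2.PcntBonus
    ORDER BY lvl1.ID
""").fetchall()

# The shortcut the answer warns about: MAX() on the label compares text.
naive = con.execute(
    "SELECT ID, MAX(BonusLvl) FROM case_result GROUP BY ID ORDER BY ID"
).fetchall()

print(best)
print(naive)
```

For John, the join picks Aggressive (20%), while MAX() on the label alone picks Limited because "L" sorts after "A".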

SQL - Group by an agregate function

I'd like to know whether it's possible to GROUP BY an aggregate function.
Scenario:
I have a table with the biomass (kg) and the number of individuals for every day, plus a description, so I can calculate the average weight and the total number of individuals between two dates as:
select
description,
sum(biomass)/sum(number_individuals) as av_weight,
sum(number_individuals) as individuals
from
Table
group by description
That works. Now I want to split those individuals into weight ranges, to get something like:
description  range (kg)  number  av. weight (g)
Foo          2-3         2400    2584.48
I have tried something like
SELECT
description,
case when sum(biomass)/sum(number_individuals) >= 2000.0
and sum(biomass)*1000/sum(number_individuals) < 3000 then '2-3'
else 'nothing'
end as desc_range
FROM Table
Group by
description,
sum(biomass)/sum(number_individuals)
But it doesn't work, and neither does grouping by the alias desc_range, of course.
I am using Informix 9.40 TC3
Any help will be appreciated.
Best regards
If you want to aggregate on an aggregation, you usually need a subquery. However, you mention individuals, so perhaps this is what you want:
select description,
       (case when biomass between 2 and 3 then '2-3'
             else 'nothing'
        end) as weight_range,
       sum(biomass)/sum(number_individuals) as av_weight,
       sum(number_individuals) as individuals
from Table
group by description,
         (case when biomass between 2 and 3 then '2-3'
               else 'nothing'
          end);
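The subquery route mentioned at the start of this answer can be sketched as follows, assuming Python's sqlite3 module and a hypothetical table stock with the question's columns (TABLE is a reserved word in SQL). The inner query labels each row with the range of its own average weight in kg, and the outer query groups by description and that label, so no aggregate ever appears in GROUP BY. Derived tables in FROM may not be available in an Informix version as old as 9.40, so treat this as an illustration of the idea rather than a drop-in query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stock (description TEXT, biomass REAL, number_individuals INTEGER)")
con.executemany("INSERT INTO stock VALUES (?, ?, ?)", [
    ("Foo", 2400.0, 1000),  # 2.4 kg per individual
    ("Foo", 2800.0, 1000),  # 2.8 kg per individual
    ("Foo", 3500.0, 1000),  # 3.5 kg per individual
])

# Inner query: per-row range label (kg). Outer query: aggregate per label.
SQL = """
SELECT description,
       weight_range,
       SUM(number_individuals) AS individuals,
       SUM(biomass) * 1000 / SUM(number_individuals) AS avg_weight_g
FROM (
    SELECT description, biomass, number_individuals,
           CASE
               WHEN biomass / number_individuals >= 2
                    AND biomass / number_individuals < 3 THEN '2-3'
               WHEN biomass / number_individuals >= 3
                    AND biomass / number_individuals < 4 THEN '3-4'
               ELSE 'other'
           END AS weight_range
    FROM stock
) AS t
GROUP BY description, weight_range
ORDER BY description, weight_range
"""
rows = con.execute(SQL).fetchall()
for row in rows:
    print(row)
```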

Complex SQL query on one table

I've forgotten most of my SQL, as I haven't used it for a long time.
I have the following requirement.
I have a table called match where I keep details of the matches my team has played against each competitor. Some important fields are:
match_id
competior_id
match_winner_id
ismatchtied
goals_scored_my_team
goals_scored_comp
From this table I want to get the head-to-head record against each of my competitors, like this:
Competitor  Matches  Wins  Losses  Draws
A           10       5     4       1
B           8        3     2       1
I can get the draw information from ismatchtied, which is set to 'Y' or 'N'.
I want to get all of this from one query. I could run separate queries and do the processing in my server code, but my performance would take a hit.
Any help will be hugely appreciated.
cheers,
Saurav
You could use conditional aggregation, involving CASE expressions inside aggregate functions, like this:
SELECT
competitor_id,
COUNT(*) AS Matches,
COUNT(CASE WHEN goals_scored_my_team > goals_scored_comp THEN 1 END) AS Wins,
COUNT(CASE WHEN goals_scored_my_team < goals_scored_comp THEN 1 END) AS Losses,
COUNT(CASE WHEN goals_scored_my_team = goals_scored_comp THEN 1 END) AS Draws
FROM matches
GROUP BY
competitor_id
;
Every CASE above will evaluate to NULL when the condition isn't satisfied. And since COUNT(expr) omits NULLs, every COUNT(CASE ...) in the above query will effectively only count rows that match the corresponding WHEN condition.
So the first COUNT(CASE ...) counts only rows where my team scored more goals than the competitor, i.e. where my team won. In the same way, the second and third CASEs give the numbers of losses and draws.
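A quick way to see the NULL-skipping behaviour of COUNT(expr) is to run the query above on a few hand-made rows. This sketch assumes Python's sqlite3 module and invented match data; the table and column names follow the answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE matches (
    competitor_id TEXT, goals_scored_my_team INTEGER, goals_scored_comp INTEGER)""")
con.executemany("INSERT INTO matches VALUES (?, ?, ?)", [
    ("A", 2, 1), ("A", 0, 3), ("A", 1, 1), ("A", 3, 0),  # win, loss, draw, win
    ("B", 2, 2), ("B", 0, 1),                            # draw, loss
])
# Each CASE returns 1 or NULL; COUNT(expr) counts only the non-NULL values.
rows = con.execute("""
    SELECT competitor_id,
           COUNT(*) AS Matches,
           COUNT(CASE WHEN goals_scored_my_team > goals_scored_comp THEN 1 END) AS Wins,
           COUNT(CASE WHEN goals_scored_my_team < goals_scored_comp THEN 1 END) AS Losses,
           COUNT(CASE WHEN goals_scored_my_team = goals_scored_comp THEN 1 END) AS Draws
    FROM matches
    GROUP BY competitor_id
    ORDER BY competitor_id
""").fetchall()
for row in rows:
    print(row)
```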
SELECT m4.competior_id, COUNT(*) as TotalMatches,
(select count(*) from match m1 where goals_scored_my_team>goals_scored_comp AND m1.competior_id=m4.competior_id) as WINS,
(select count(*) from match m2 where goals_scored_comp>goals_scored_my_team AND m2.competior_id=m4.competior_id) as LOSSES,
(select count(*) from match m3 where goals_scored_my_team=goals_scored_comp AND m3.competior_id=m4.competior_id) as DRAWS
FROM match m4 group by m4.competior_id;

Retrieve names by ratio of their occurrence

I'm somewhat new to SQL queries, and I'm struggling with this particular problem.
Let's say I have a query that returns the following 3 records (kept to one column for simplicity):
Tom
Jack
Tom
And I want to have those results grouped by the name and also include the fraction (ratio) of the occurrence of that name out of the total records returned.
So, the desired result would be (as two columns):
Tom | 2/3
Jack | 1/3
How would I go about it? Determining the numerator is pretty easy (I can just use COUNT() and GROUP BY name), but I'm having trouble translating that into a ratio out of the total rows returned.
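SELECT name, COUNT(name) * 1.0 / (SELECT COUNT(*) FROM names) AS ratio FROM names GROUP BY name;
Multiplying by 1.0 avoids integer division, which would otherwise truncate the result (to 0 here) in databases such as PostgreSQL, SQL Server and SQLite.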
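A small sketch, assuming Python's sqlite3 module and the three sample names, showing the difference between integer and decimal division and how to build the literal 2/3 text the question asks for. || is the standard string concatenation operator (MySQL uses CONCAT() by default).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE names (name TEXT)")
con.executemany("INSERT INTO names VALUES (?)", [("Tom",), ("Jack",), ("Tom",)])

rows = con.execute("""
    SELECT name,
           COUNT(*) / (SELECT COUNT(*) FROM names) AS int_ratio,        -- integer division
           COUNT(*) * 1.0 / (SELECT COUNT(*) FROM names) AS ratio,      -- decimal ratio
           COUNT(*) || '/' || (SELECT COUNT(*) FROM names) AS fraction  -- text like '2/3'
    FROM names
    GROUP BY name
    ORDER BY name
""").fetchall()
for row in rows:
    print(row)
```

The denominator comes from an ungrouped subquery over the same rows, which is the "non-grouped version of the same select" described below.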
Since the denominator is fixed, the "ratio" is directly proportional to the numerator. Unless you really need to show the denominator, it'll be a lot easier to just use something like:
select name, count(*) from your_table_name
group by name
order by count(*) desc
and you'll get the right data in the right order, but the number that's shown will be the count instead of the ratio.
If you really want that denominator, you'd do a count(*) on a non-grouped version of the same select -- but depending on how long the select takes, that could be pretty slow.

MySQL: Getting highest score for a user

I have the following table (highscores):
id gameid userid name score date
1 38 2345 A 100 2009-07-23 16:45:01
2 39 2345 A 500 2009-07-20 16:45:01
3 31 2345 A 100 2009-07-20 16:45:01
4 38 2345 A 200 2009-10-20 16:45:01
5 38 2345 A 50 2009-07-20 16:45:01
6 32 2345 A 120 2009-07-20 16:45:01
7 32 2345 A 100 2009-07-20 16:45:01
In the above structure, a user can play a game multiple times, but I want to display the "Games Played" by a specific user, and that section can't show the same game more than once. So if a user played a game 3 times, only the play with the highest score should be displayed.
I want result data like:
id gameid userid name score date
2 39 2345 A 500 2009-07-20 16:45:01
3 31 2345 A 100 2009-07-20 16:45:01
4 38 2345 A 200 2009-10-20 16:45:01
6 32 2345 A 120 2009-07-20 16:45:01
I tried the following query, but it's not giving me the correct result:
SELECT id,
gameid,
userid,
date,
MAX(score) AS score
FROM highscores
WHERE userid='2345'
GROUP BY gameid
What query should I use for this?
Thanks
The requirement is a bit vague/confusing, but would something like this satisfy the need?
(purposely added various aggregates that may be of interest).
SELECT gameid,
MIN(date) AS FirstTime,
MAX(date) AS LastTime,
MAX(score) AS TOPscore,
COUNT(*) AS NbOfTimesPlayed
FROM highscores
WHERE userid='2345'
GROUP BY gameid
-- ORDER BY COUNT(*) DESC -- for ex. to have games played most at top
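Running the aggregate query above against the question's sample rows shows what it returns. This is a sketch using Python's sqlite3 module in place of MySQL, with a comma after TOPscore; an ORDER BY gameid is added only to make the output order predictable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE highscores (
    id INTEGER, gameid INTEGER, userid INTEGER, name TEXT, score INTEGER, date TEXT)""")
con.executemany("INSERT INTO highscores VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 38, 2345, "A", 100, "2009-07-23 16:45:01"),
    (2, 39, 2345, "A", 500, "2009-07-20 16:45:01"),
    (3, 31, 2345, "A", 100, "2009-07-20 16:45:01"),
    (4, 38, 2345, "A", 200, "2009-10-20 16:45:01"),
    (5, 38, 2345, "A", 50, "2009-07-20 16:45:01"),
    (6, 32, 2345, "A", 120, "2009-07-20 16:45:01"),
    (7, 32, 2345, "A", 100, "2009-07-20 16:45:01"),
])
# One row per game: only grouped columns and aggregates appear in SELECT.
rows = con.execute("""
    SELECT gameid,
           MIN(date) AS FirstTime,
           MAX(date) AS LastTime,
           MAX(score) AS TOPscore,
           COUNT(*) AS NbOfTimesPlayed
    FROM highscores
    WHERE userid = '2345'
    GROUP BY gameid
    ORDER BY gameid
""").fetchall()
for row in rows:
    print(row)
```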
Edit: New question about adding the id column to the SELECT list
The short answer is: "No, id cannot be added, not within this particular construct". (Read further to see why) However, if the intent is to have the id of the game with the highest score, the query can be modified, using a sub-query, to achieve that.
As explained by Alex M on this page, every column referenced in the SELECT list that is not used inside an aggregate function (MAX, MIN, AVG, COUNT and the like) MUST be included in the GROUP BY clause. The reason for this rule is that, in gathering the info for the results list, SQL may encounter multiple values for such a column (listed in SELECT but not in GROUP BY) and would then not know how to deal with them; rather than doing something (possibly useful, but possibly silly as well) with these extra values, the SQL standard dictates an error message, so that the user can modify the query and state his/her goals explicitly.
In our specific case, we could add the id to the SELECT list and also to the GROUP BY list, but then the grouping on which the aggregation takes place would be different: the results list would include one row per id + gameid combination, and the aggregate values for each row would be based only on the records where id and gameid have the corresponding values. Assuming id is the table's primary key, each group would contain a single row, making MAX() and the other aggregates quite meaningless.
The way to include the id (and possibly other columns) of the game with the top score is with a sub-query. The idea is that the sub-query selects the top score (within a given GROUP BY), and the main query SELECTs any column of the matching rows, even when those fields weren't (couldn't be) in the sub-query's GROUP BY construct. BTW, do give credit on this page to rexem for showing this type of query first.
SELECT H.id,
H.gameid,
H.userid,
H.name,
H.score,
H.date
FROM highscores H
JOIN (
SELECT H2.gameid, H2.userid, MAX(H2.score) MaxScoreByGameUser
FROM highscores H2
GROUP BY H2.gameid, H2.userid
) AS M
ON M.gameid = H.gameid
AND M.userid = H.userid
AND M.MaxScoreByGameUser = H.score
WHERE H.userid='2345'
A few important remarks about the query above
Duplicates: if the user reached the same high score several times in a game, the query will return one row for each of those plays.
GROUP BY of the sub-query may need to change for different uses of the query. If rather than searching for the game's hi-score on a per user basis, we wanted the absolute hi-score, we would need to exclude userid from the GROUP BY (that's why I named the alias of the MAX with a long, explicit name)
The userid = '2345' condition may be added to the [now absent] WHERE clause of the sub-query, for efficiency (unless MySQL's optimizer is very smart, the query currently calculates the high scores for all game+user combinations, whereas we only need those for user '2345'). The downside is that the literal is then duplicated; a variable can avoid that.
There are several ways to deal with the issues mentioned above, but they seem to be out of scope for a [now rather lengthy] explanation of GROUP BY constructs.
Every field in your SELECT (when a GROUP BY clause is present) must be either one of the fields in the GROUP BY clause or a group function such as MAX, SUM, AVG, etc. In your code, userid technically violates that, but in a pretty harmless fashion (you could make your code SQL-standard compliant with GROUP BY gameid, userid); the fields id and date are in more serious violation: there will be many ids and dates within one GROUP BY set, and you're not saying how to make a single value out of that set. Older MySQL versions pick more-or-less arbitrary values; stricter SQL engines (and MySQL 5.7.5 or later with its default ONLY_FULL_GROUP_BY mode) more helpfully give you an error.
I know you want the id and date corresponding to the maximum score for a given grouping, but that's not explicit in your code. You'll need a subselect or a self-join to make it explicit!
Use:
SELECT t.id,
t.gameid,
t.userid,
t.name,
t.score,
t.date
FROM HIGHSCORES t
JOIN (SELECT hs.gameid,
hs.userid,
MAX(hs.score) 'max_score'
FROM HIGHSCORES hs
GROUP BY hs.gameid, hs.userid) mhs ON mhs.gameid = t.gameid
AND mhs.userid = t.userid
AND mhs.max_score = t.score
WHERE t.userid = '2345'
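To confirm that the self-join returns exactly the rows the question asks for (ids 2, 3, 4 and 6), here is the same query run against the sample data with Python's sqlite3 module standing in for MySQL. The MySQL-style quoted alias 'max_score' is written unquoted and an ORDER BY is added for a stable order; everything else follows the answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE highscores (
    id INTEGER, gameid INTEGER, userid INTEGER, name TEXT, score INTEGER, date TEXT)""")
con.executemany("INSERT INTO highscores VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 38, 2345, "A", 100, "2009-07-23 16:45:01"),
    (2, 39, 2345, "A", 500, "2009-07-20 16:45:01"),
    (3, 31, 2345, "A", 100, "2009-07-20 16:45:01"),
    (4, 38, 2345, "A", 200, "2009-10-20 16:45:01"),
    (5, 38, 2345, "A", 50, "2009-07-20 16:45:01"),
    (6, 32, 2345, "A", 120, "2009-07-20 16:45:01"),
    (7, 32, 2345, "A", 100, "2009-07-20 16:45:01"),
])
# The derived table finds each (game, user) top score; the join recovers the full row.
rows = con.execute("""
    SELECT t.id, t.gameid, t.userid, t.name, t.score, t.date
    FROM highscores t
    JOIN (SELECT hs.gameid, hs.userid, MAX(hs.score) AS max_score
          FROM highscores hs
          GROUP BY hs.gameid, hs.userid) mhs
      ON mhs.gameid = t.gameid
     AND mhs.userid = t.userid
     AND mhs.max_score = t.score
    WHERE t.userid = '2345'
    ORDER BY t.id
""").fetchall()
for row in rows:
    print(row)
```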