sum of two columns assigned to a condition - sql

Hi, I'm trying to sum two columns under an alias and then filter on that alias, but I get an error on the 'Rebounds' name on line 3.
The offreb and defreb columns are integers, and some values are stored as 0 (zero).
SELECT team, CONCAT(firstname,' ',lastname) AS name, SUM(offreb + defreb) AS Rebounds
FROM boxscore
WHERE round = 'Finals' AND game = 7 AND Rebounds > 0
ORDER BY team, Rebounds;

You are filtering in the WHERE clause on a column that has not been calculated yet when the WHERE clause is evaluated. You can use a subquery or HAVING instead.
It should be something like this:
SELECT team, CONCAT(firstname,' ',lastname) AS name, SUM(offreb + defreb) AS Rebounds
FROM boxscore
WHERE round = 'Finals' AND game = 7
GROUP BY team, CONCAT(firstname,' ',lastname)
HAVING SUM(offreb + defreb) > 0
ORDER BY team, Rebounds;
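To try the HAVING version end to end, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine. The sample rows are invented for illustration, and `||` replaces CONCAT because older SQLite builds lack that function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE boxscore (
           team TEXT, firstname TEXT, lastname TEXT,
           round TEXT, game INTEGER, offreb INTEGER, defreb INTEGER)"""
)
# Hypothetical sample data: one player with zero rebounds, one game-6 row.
conn.executemany(
    "INSERT INTO boxscore VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("A", "Ann", "Lee", "Finals", 7, 2, 3),
        ("A", "Bob", "Ray", "Finals", 7, 0, 0),
        ("B", "Cy", "Dow", "Finals", 7, 1, 0),
        ("B", "Cy", "Dow", "Finals", 6, 4, 4),
    ],
)

# WHERE filters rows before grouping; HAVING filters the grouped totals.
rows = conn.execute(
    """
    SELECT team, firstname || ' ' || lastname AS name,
           SUM(offreb + defreb) AS rebounds
    FROM boxscore
    WHERE round = 'Finals' AND game = 7
    GROUP BY team, firstname || ' ' || lastname
    HAVING SUM(offreb + defreb) > 0
    ORDER BY team, rebounds
    """
).fetchall()
print(rows)
```

The player with no rebounds is dropped by HAVING, while the game-6 row is dropped earlier by WHERE.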

Using a HAVING clause solves your issue. From the official PostgreSQL docs:
If a table has been grouped using GROUP BY, but only certain groups
are of interest, the HAVING clause can be used, much like a WHERE
clause, to eliminate groups from the result.
SELECT
team,
CONCAT(firstname,' ',lastname) AS name,
SUM(offreb + defreb) AS "Rebounds"
FROM
boxscore
WHERE
round = 'Finals' AND game = 7
GROUP BY
team,
CONCAT(firstname,' ',lastname)
HAVING
SUM(offreb + defreb) > 0
ORDER BY
team, "Rebounds";
Note that you cannot use a column alias in the WHERE or HAVING clause, but you can use it in ORDER BY (PostgreSQL also accepts an output-column name in GROUP BY). Wrap the alias in double quotes to preserve its case.

Related

How to find every combination of features shared across multiple rows?

I am pretty new to SQL (currently using Standard SQL via BigQuery) and unfortunately my Google-fu could not find me a solution to this issue.
I'm working with a dataset where each row is a different person and each column is an attribute (name, age, gender, weight, ethnicity, height, bmi, education level, GPA, etc.). I am trying to 'cluster' these people into all of the feature combinations that match 5 or more people.
Originally I did this manually with 3 feature columns where I would essentially concatenate a 'cluster name' column and then have 7 select queries for each grouping with a >5 where clause, which I then UNIONed together:
gender
age
ethnicity
gender + age
gender + ethnicity
age + ethnicity
gender + age + ethnicity
^ unfortunately doing it this way just balloons the number of combinations and with my anticipated ~15 total features doing it this way seems really unfeasible. I'd also like to do this through a less manual approach so that if a new feature is added in the future it does not require major edits to include it in my cluster identification.
Is there a function or existing process that could accomplish something like this? I'd ideally like to be able to identify ALL combinations that meet my minimum user count (so it's expected the same rows would match multiple different clusters here). Any advice or help here would be appreciated! Thanks.
If only BQ supported grouping sets or cube, this would be simple. One method that is pretty generalizable enumerates the 7 groups and then uses bits to figure out what to aggregate:
select (case when n & 1 > 0 then gender end) as gender,
(case when n & 2 > 0 then age end) as age,
(case when n & 4 > 0 then ethnicity end) as ethnicity,
count(*)
from t cross join
unnest(generate_array(1, 7)) n
group by n, 1, 2, 3;
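The bitmask idea extends to any number of features if the query text is generated from a list of column names. The following is only a sketch under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than BigQuery, uses a recursive CTE in place of unnest(generate_array(...)), and the helper name `combo_query`, the table `t`, and the sample rows are all hypothetical.

```python
import sqlite3


def combo_query(features, table, min_count):
    """Build one query that counts every non-empty subset of `features`.

    `features` and `table` are interpolated into the SQL text, so they must be
    trusted identifiers (an assumption of this sketch).
    """
    # Bit i of the mask n decides whether feature i takes part in the grouping.
    exprs = [
        f"CASE WHEN (n & {1 << i}) > 0 THEN {table}.{name} END"
        for i, name in enumerate(features)
    ]
    select_list = ", ".join(f"{e} AS {name}" for e, name in zip(exprs, features))
    group_list = ", ".join(exprs)
    last_mask = 2 ** len(features) - 1
    return f"""
        WITH RECURSIVE masks(n) AS (
            SELECT 1 UNION ALL SELECT n + 1 FROM masks WHERE n < {last_mask}
        )
        SELECT n, {select_list}, COUNT(*) AS people
        FROM {table} CROSS JOIN masks
        GROUP BY n, {group_list}
        HAVING COUNT(*) >= {min_count}
        ORDER BY n
    """


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (gender TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("F", 30), ("F", 30), ("F", 40), ("M", 30)],
)

# Threshold lowered to 2 so the tiny sample produces output.
rows = conn.execute(combo_query(["gender", "age"], "t", 2)).fetchall()
print(rows)
```

Adding a feature then only means appending a name to the list; the number of masks doubles with each one, so with around 15 features every row is expanded 32,767 times before aggregation.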
Another method which is trickier is to reconstruct the groups using rollup(). Something like this:
select gender, age, ethnicity, count(*)
from t
group by rollup(gender, age, ethnicity);
Produces three of the groups you want. So:
select gender, age, ethnicity, count(*)
from t
group by rollup(gender, age, ethnicity)
union all
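select gender, null, ethnicity, count(*)
from t
group by rollup(ethnicity, gender)
union all
select null, age, ethnicity, count(*)
from t
group by rollup(age, ethnicity);
The above reconstructs all seven groups using rollup(): the first part yields gender + age + ethnicity, gender + age, and gender; the second yields gender + ethnicity and ethnicity; the third yields age + ethnicity and age. Each rollup() also emits a grand-total row with every column NULL, which you can filter out.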

Using a conditional to display either column value or concatenated text, AND exclude from GROUP BY

I am currently working on a personal project where I take the Lahman Access Baseball Database, convert it to MSSQL and build out functionality against it. I've managed to convert the database and I've built out three statistical views for Batting, Pitching and Fielding.
I am currently working on a query against the batting view to return a list of players that qualified for the batting title. This is how far I've gotten:
SELECT Player, Season, SUM(AB) AS AB, SUM(BB)AS BB, SUM(HBP) AS HBP,
SUM(SH)AS SH, SUM(SF) AS SF, AVG(AVG)AS AVG, AVG(OBP) AS OBP,
AVG(SLG) AS SLG, AVG(OPS) AS OPS
FROM View_PlayerBatting
WHERE Season = '2016'
GROUP BY Player, Season
HAVING (SUM(AB) + SUM(BB) + SUM(HBP) + SUM(SH) + SUM(SF)) > 502
ORDER BY AVG(OPS) DESC
I am using the Order By clause to check my results against a verified master (MLB.com).
The problem I have is that I also wish to include Teams on this list, and some of the players were traded mid-season, so to figure out qualifiers I don't want to group by team (grouping by team removes seven otherwise qualified results). However, if I try to add to the SELECT clause, something like:
IIF(COUNT(Team) = 1, Team, CAST(COUNT(Team) AS varchar) + ' teams') AS Team
it forces me to include Team in GROUP BY. It also forces me to include Team in GROUP BY if I try to use a subquery to return the value from another table, such as:
IIF(COUNT(Team) = 1, (SELECT teamID FROM Teams where TeamID = View_PlayerBatting.Team), ...
It seems that if I need to know the value of Team, I have to include it in the GROUP BY clause; if I change the first result of the IIF to return something other than the Team column, like a static value or character string, the query executes without requiring Team in the GROUP BY. How do I return the Team column in the format I'm trying to present it in?
I am not familiar with that database, but you are basically looking to solve the top 1 per group problem. You could use max(Team) but that might not get you the team you want.
You can use cross apply() to get 1 team per player and season ordered by some column like TradeDate like so:
SELECT Player, Season, x.Team
, SUM(AB) AS AB, SUM(BB)AS BB, SUM(HBP) AS HBP,
SUM(SH)AS SH, SUM(SF) AS SF, AVG(AVG)AS AVG, AVG(OBP) AS OBP,
AVG(SLG) AS SLG, AVG(OPS) AS OPS
FROM View_PlayerBatting
cross apply (
select top 1
Team
from View_PlayerBatting as i
where i.Player = View_PlayerBatting.Player
and i.Season = View_PlayerBatting.Season
order by TradeDate desc
) as x
WHERE Season = '2016'
GROUP BY Player, Season, x.Team
HAVING (SUM(AB) + SUM(BB) + SUM(HBP) + SUM(SH) + SUM(SF)) > 502
ORDER BY AVG(OPS) DESC
If you want to get all the team values for each player concatenated, you could use the stuff() with select ... for xml path ('') method of string concatenation.
SELECT Player, Season
, Team = stuff((
select distinct '; '+i.Team
from View_PlayerBatting as i
where i.Player = View_PlayerBatting.Player
and i.Season = View_PlayerBatting.Season
for xml path (''), type).value('.','varchar(max)')
,1,2,'')
, SUM(AB) AS AB, SUM(BB)AS BB, SUM(HBP) AS HBP
, SUM(SH)AS SH, SUM(SF) AS SF, AVG(AVG)AS AVG, AVG(OBP) AS OBP
, AVG(SLG) AS SLG, AVG(OPS) AS OPS
FROM View_PlayerBatting
WHERE Season = '2016'
GROUP BY Player, Season
HAVING (SUM(AB) + SUM(BB) + SUM(HBP) + SUM(SH) + SUM(SF)) > 502
ORDER BY AVG(OPS) DESC;
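If the goal is exactly the format from the question (the team name when there is one, otherwise "N teams"), a different technique from the two above may also work: wrap Team in aggregates so it never has to appear in GROUP BY. This is a sketch only, run on SQLite via Python's sqlite3 module with an invented `batting` table and rows. On SQL Server the string part would be written with CAST(... AS varchar) + ' teams', as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batting (player TEXT, season INTEGER, team TEXT, ab INTEGER)"
)
# Hypothetical rows: one traded player (two teams) and one who stayed put.
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?, ?)",
    [
        ("Ruth", 2016, "NYA", 300),
        ("Ruth", 2016, "BOS", 250),
        ("Cobb", 2016, "DET", 500),
    ],
)

# MIN(team) is safe to show when the group contains exactly one distinct team.
rows = conn.execute(
    """
    SELECT player, season,
           CASE WHEN COUNT(DISTINCT team) = 1 THEN MIN(team)
                ELSE COUNT(DISTINCT team) || ' teams'
           END AS team,
           SUM(ab) AS ab
    FROM batting
    GROUP BY player, season
    ORDER BY player
    """
).fetchall()
print(rows)
```

Because every reference to the team column sits inside an aggregate, grouping stays on player and season only, so traded players keep their combined totals.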

Ratio or Percentage from group by SQL query from column with condition and without condition

I am having some trouble with a SQL query. From a table let's call it Reports:
I want to group all the reports by the name column.
Then for each of those name groups I want to go to the rating column and count the number of times the rating was 15 or less. Let's say this happened 10 times for one of the groups with the name BOBBO.
I also want to know the number of times ratings were submitted (same as total number of records for each name group). So using the name group BOBBO let's say he has 20 ratings.
So under the condition the group BOBBO 50% of the time has a rating 15 or less.
I've seen these posts -- I am still having some trouble cracking this.
using-count-and-return-percentage-against-sum-of-records
getting-two-counts-and-then-dividing-them
getting-a-percentage-from-mysql-with-a-group-by-condition-and-precision
divide-two-counts-from-one-select
After reading those I tried queries like these:
ActiveRecord::Base.connection.execute
("SELECT COUNT(*) Matched,
(select COUNT(rating) from reports group by name) Total,
CAST(COUNT(*) AS FLOAT)/CAST((SELECT COUNT(*) FROM reports group by name) AS FLOAT)*100 Percentage from reports
where rating <= 15 order by Percentage")
ActiveRecord::Base.connection.execute
("select name, sum(rating) / count(rating) as bad_rating
from reports group by name having bad_rating <= 15")
Any help would be very much appreciated!
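Consider a conditional aggregate for the bad ratings divided by the full count. Multiplying by 1.0 avoids integer division, which would truncate the ratio to 0 in engines such as SQLite or SQL Server:
SELECT [name],
SUM(CASE WHEN [rating] <= 15 THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS bad_rating
FROM Reports
GROUP BY [name]
Or, as @CL. points out, a shorter conditional aggregate where the logical expression itself is summed (this relies on the engine treating a comparison as 0 or 1, as SQLite and MySQL do):
SELECT [name],
SUM([rating] <= 15) * 1.0 / COUNT(*) AS bad_rating
FROM Reports
GROUP BY [name]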
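A quick way to check the conditional aggregate is to run it against a few rows. The sketch below assumes SQLite (through Python's sqlite3 module) and made-up ratings. It returns the matched count, the total, and a percentage, using 100.0 to force floating-point division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (name TEXT, rating INTEGER)")
# Hypothetical data: BOBBO has 2 of 4 ratings at 15 or below.
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [("BOBBO", 10), ("BOBBO", 15), ("BOBBO", 20), ("BOBBO", 90), ("ALICE", 50)],
)

rows = conn.execute(
    """
    SELECT name,
           SUM(CASE WHEN rating <= 15 THEN 1 ELSE 0 END) AS bad,
           COUNT(*) AS total,
           100.0 * SUM(CASE WHEN rating <= 15 THEN 1 ELSE 0 END) / COUNT(*) AS bad_pct
    FROM reports
    GROUP BY name
    ORDER BY name
    """
).fetchall()
print(rows)
```

Writing 100 instead of 100.0 would make the whole expression integer arithmetic in SQLite, so any percentage with a fractional part would be truncated.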

SQL add multiple "Count" together

I'm trying to add the counts together and output the one with the max counts.
The question is: Display the person with the most medals (gold as place = 1, silver as place = 2, bronze as place = 3)
Add all the medals together and display the person with the most medals
Below is the code I have thought about (obviously doesn't work)
Any ideas?
Select cm.Givenname, cm.Familyname, count(*)
FROM Competitors cm JOIN Results re ON cm.competitornum = re.competitornum
WHERE re.place between '1' and '3'
group by cm.Givenname, cm.Familyname
having max (count(re.place = 1) + count(re.place = 2) + count(re.place = 3))
Sorry, I forgot to add that we're not allowed to use ORDER BY.
Some data in the table
Competitors Table
Competitornum GivenName Familyname gender Dateofbirth Countrycode
219153 Imri Daniel Male 1988-02-02 Aus
Results Table
Eventid Competitornum Place Lane Elapsedtime
SWM111 219153 1 2 20 02
From what you've described it sounds like you just need to take the "Top" individual in the total medal count. In order to do that you would write something like this.
Select top 1 cm.Givenname, cm.Familyname, count(*)
FROM Competitors cm JOIN Results re ON cm.competitornum = re.competitornum
WHERE re.place between '1' and '3'
group by cm.Givenname, cm.Familyname
order by count(*) desc
Without using order by you have a couple of other options though I'm glossing over whatever syntax peculiarities sqlfire may use.
You could determine the max medal count of any user and then only select competitors that have that count. You could do this by saving it out to a variable or using a subquery.
Select cm.Givenname, cm.Familyname, count(*)
FROM Competitors cm JOIN Results re ON cm.competitornum = re.competitornum
WHERE re.place between '1' and '3'
group by cm.Givenname, cm.Familyname
having count(*) = (
Select max(medals)
FROM (
Select count(*) as medals
FROM Competitors cm JOIN Results re ON cm.competitornum = re.competitornum
WHERE re.place between '1' and '3'
group by cm.Givenname, cm.Familyname
) totals
)
Just a note here. This second method can be inefficient if the engine recalculates the max medal count for every group it checks. The subquery is uncorrelated, so many optimizers evaluate it only once, but if sqlfire supports it you may be better served by calculating this ahead of time, storing it in a variable and using that in the HAVING clause.
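The "compute the maximum once, then compare" approach can be sketched with a common table expression, so the per-competitor counts are written a single time. This assumes an engine with CTE support (SQLite via Python's sqlite3 here, not sqlfire), trimmed-down tables, and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE competitors (competitornum INTEGER, givenname TEXT, familyname TEXT)"
)
conn.execute(
    "CREATE TABLE results (eventid TEXT, competitornum INTEGER, place INTEGER)"
)
conn.executemany(
    "INSERT INTO competitors VALUES (?, ?, ?)",
    [(1, "Imri", "Daniel"), (2, "Ana", "Cruz"), (3, "Li", "Wei")],
)
# Hypothetical results: places above 3 do not count as medals.
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("E1", 1, 1), ("E2", 1, 3), ("E3", 1, 5), ("E1", 2, 2), ("E2", 3, 4)],
)

# No ORDER BY: keep only the competitors whose count equals the overall maximum.
rows = conn.execute(
    """
    WITH medals AS (
        SELECT cm.givenname, cm.familyname, COUNT(*) AS n
        FROM competitors cm
        JOIN results re ON cm.competitornum = re.competitornum
        WHERE re.place BETWEEN 1 AND 3
        GROUP BY cm.competitornum, cm.givenname, cm.familyname
    )
    SELECT givenname, familyname, n
    FROM medals
    WHERE n = (SELECT MAX(n) FROM medals)
    """
).fetchall()
print(rows)
```

Unlike a top-1 query, this returns every competitor tied for the highest medal count.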
You are grouping by re.place, is that what you want? You want the results per ... ? :)
[edit] Good, now that's fixed you're almost there :)
The having is not needed in this case, you simply need to add a count(re.EventID) to your select and make a subquery out of it with a max(that_count_column).

Switch case in aggregate query

I want a CASE expression in my SQL query so that when a group contains only a single row I don't aggregate, and otherwise I do. Is that possible?
My query is something like this:
select count(1),AVG(student_mark) ,case when Count(1)=1 then student_subjectid else null end from Students
group by student_id
I get this error: Column 'student_subjectid' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Thanks in advance.
SELECT
student_id,
COUNT(*) AS MarkCount,
AVG(student_mark) AS student_mark,
CASE COUNT(*) WHEN 1 THEN MIN(student_subjectid) END AS student_subjectid
FROM Students
GROUP BY student_id
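To see what the CASE COUNT(*) WHEN 1 THEN MIN(...) pattern returns, here is a small sketch on SQLite via Python's sqlite3 module. The lowercase table name and the rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (student_id INTEGER, student_subjectid INTEGER, student_mark INTEGER)"
)
# Hypothetical rows: student 1 has a single mark, student 2 has two.
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, 10, 80), (2, 10, 60), (2, 11, 70)],
)

# MIN() makes the column legal in a grouped query; for a one-row group it is
# simply that row's value. Groups with more rows get NULL from the CASE.
rows = conn.execute(
    """
    SELECT student_id,
           COUNT(*) AS mark_count,
           AVG(student_mark) AS student_mark,
           CASE COUNT(*) WHEN 1 THEN MIN(student_subjectid) END AS student_subjectid
    FROM students
    GROUP BY student_id
    ORDER BY student_id
    """
).fetchall()
print(rows)
```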
Why in the world would you complicate it?
select count(1), AVG(Student_mark) Student_mark
from Students
group by student_id
If there is only one student_mark, it is also the SUM, AVG, MIN and MAX - so just continue to use the aggregate!
EDIT
The dataset that would eventuate with your requirement will not normally make sense. The way to achieve that would be to merge (union) two different results
select
numRecords,
Student_mark,
case when numRecords = 1 then student_subjectid end -- else is implicitly NULL
from
(
select
count(1) AS numRecords,
AVG(Student_mark) Student_mark,
min(student_subjectid) as student_subjectid
from Students
group by student_id
) x