Take the value of the first row met in non-aggregate expressions - sql

I have a query like this:
SELECT PlayerID, COUNT(PlayerID) AS "MatchesPlayed", Name, Role, Team,
SUM(Goals) As "TotalGoals", SUM(Autogoals) As "TotalAutogoals",
SUM(...)-2*SUM(...)+2*SUM(...) AS Score, ...
FROM raw_ordered
GROUP BY PlayerID
ORDER BY Score DESC
where each row of raw_ordered describes one player's performance in one match, and the rows are in reverse chronological order.
Since I'm grouping by PlayerID, this query gives me a table in which each row holds the cumulative data for one player. The columns with aggregate functions are no problem; my problem is with the Team column.
A player may change teams during a season. What I'm interested in here is the last team he played for, so I'd like a way to tell SELECT to take the first value met in each group for the Team column (or, more generally, for non-aggregate columns).
Unfortunately, I can't find any (easy) way to do this in SQLite. The documentation of SELECT says:
If the expression is an aggregate expression, it is evaluated across all rows in the group. Otherwise, it is evaluated against a single arbitrarily chosen row from within the group.
It gives no suggestion about how to change this behavior, and I can't find anything among the aggregate functions that simply takes the first value it encounters.
Any idea?

SQLite does not have a 'first' aggregate function; you would have to implement it yourself.
However, the documentation quoted above is out of date. Since SQLite 3.7.11, if the query contains exactly one MIN() or MAX(), the non-aggregate columns are guaranteed to come from the row that holds that minimum/maximum value.
Therefore, just add MAX(MatchDate) to the SELECT column list.
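A minimal sketch of this with Python's built-in sqlite3 module, assuming the bundled SQLite library is 3.7.11 or newer. The table contents are hypothetical, and MatchDate is assumed to hold ISO-8601 date strings so that MAX() picks the latest match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of raw_ordered: one row per player per match.
conn.executescript("""
CREATE TABLE raw_ordered (PlayerID INTEGER, Team TEXT, Goals INTEGER, MatchDate TEXT);
INSERT INTO raw_ordered VALUES
  (1, 'Reds',  2, '2023-03-01'),
  (1, 'Blues', 1, '2023-04-01'),  -- player 1 moved to Blues
  (2, 'Reds',  0, '2023-03-01'),
  (2, 'Reds',  3, '2023-04-01');
""")

# With exactly one MAX() in the query, SQLite takes the bare column Team
# from the row holding each player's latest MatchDate.
rows = conn.execute("""
    SELECT PlayerID, Team, MAX(MatchDate) AS LastMatch, SUM(Goals) AS TotalGoals
    FROM raw_ordered
    GROUP BY PlayerID
    ORDER BY PlayerID
""").fetchall()
print(rows)  # [(1, 'Blues', '2023-04-01', 3), (2, 'Reds', '2023-04-01', 3)]
```

This relies on SQLite-specific behavior; most other databases reject the bare Team column because it is neither aggregated nor listed in the GROUP BY.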

SELECT ro.PlayerID, COUNT(ro.PlayerID) AS "MatchesPlayed", Name, Role,
(SELECT r.Team FROM raw_ordered r WHERE r.PlayerID = ro.PlayerID ORDER BY r.some_date DESC LIMIT 1) AS team,
SUM(Goals) As "TotalGoals", SUM(Autogoals) As "TotalAutogoals",
SUM(...)-2*SUM(...)+2*SUM(...) AS Score, ...
FROM raw_ordered ro
GROUP BY ro.PlayerID
ORDER BY Score DESC
This assumes your table has some column (some_date here) by which the rows can be ordered, so that the subquery can pick each player's most recent team.
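A runnable sketch of the correlated-subquery approach, using SQLite through Python's sqlite3 module. The rows are hypothetical, and some_date is the answer's placeholder ordering column, assumed here to hold ISO-8601 date strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: player 1 moves from Reds to Blues.
conn.executescript("""
CREATE TABLE raw_ordered (PlayerID INTEGER, Team TEXT, Goals INTEGER, some_date TEXT);
INSERT INTO raw_ordered VALUES
  (1, 'Reds',  2, '2023-03-01'),
  (1, 'Blues', 1, '2023-04-01'),
  (2, 'Reds',  0, '2023-03-01'),
  (2, 'Reds',  3, '2023-04-01');
""")

# The subquery is correlated on PlayerID, so for each group it returns
# the Team of that player's most recent row.
rows = conn.execute("""
    SELECT ro.PlayerID,
           COUNT(ro.PlayerID) AS MatchesPlayed,
           (SELECT r.Team
              FROM raw_ordered AS r
             WHERE r.PlayerID = ro.PlayerID
             ORDER BY r.some_date DESC
             LIMIT 1) AS Team,
           SUM(ro.Goals) AS TotalGoals
    FROM raw_ordered AS ro
    GROUP BY ro.PlayerID
    ORDER BY ro.PlayerID
""").fetchall()
print(rows)  # [(1, 2, 'Blues', 3), (2, 2, 'Reds', 3)]
```

Unlike the MAX() trick, this form does not depend on SQLite's bare-column rule, although LIMIT itself is not supported by every database.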

Related

Confused with the Group By function in SQL

Q1: After using GROUP BY, why does the query output at most one row per group? Does this mean that HAVING is supposed to filter the groups rather than the records within each group?
Q2: I want to find the records in each group whose age is greater than the average age of that group. I tried the following, but it returns nothing. How should I fix this?
SELECT *, avg(age) FROM Mytable Group By country Having age > avg(age)
Thanks!!!!
You can calculate the average age for each country in a subquery and join that to your table for filtering:
SELECT mt.*, MTAvg.AvgAge
FROM Mytable mt
INNER JOIN
(
    SELECT mtavgs.country
         , AVG(mtavgs.age) AS AvgAge
    FROM Mytable mtavgs
    GROUP BY mtavgs.country
) MTAvg
    ON MTAvg.country = mt.country
   AND mt.age > MTAvg.AvgAge
GROUP BY always returns one row per unique combination of values in the listed GROUP BY columns (unless a HAVING clause removes it). So yes, HAVING filters whole groups, whereas WHERE filters individual rows before they are grouped. The subquery in this example (alias: MTAvg) calculates a single row per country. Its results are used to filter the rows of the main table through the condition in the INNER JOIN clause, and the calculated average age is reported alongside each row.
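The same join can be tried in SQLite through Python's sqlite3 module. The name column and all the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: FR averages 30, US averages 40.
conn.executescript("""
CREATE TABLE Mytable (name TEXT, country TEXT, age INTEGER);
INSERT INTO Mytable VALUES
  ('Ann', 'FR', 20), ('Bob', 'FR', 40),
  ('Cid', 'US', 30), ('Dee', 'US', 50), ('Eve', 'US', 40);
""")

# The derived table MTAvg has one row per country; the join keeps only
# rows whose age exceeds their own country's average.
rows = conn.execute("""
    SELECT mt.name, mt.country, mt.age, MTAvg.AvgAge
    FROM Mytable AS mt
    INNER JOIN (
        SELECT country, AVG(age) AS AvgAge
        FROM Mytable
        GROUP BY country
    ) AS MTAvg
      ON MTAvg.country = mt.country
     AND mt.age > MTAvg.AvgAge
    ORDER BY mt.name
""").fetchall()
print(rows)  # [('Bob', 'FR', 40, 30.0), ('Dee', 'US', 50, 40.0)]
```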
GROUP BY is a clause that is used together with aggregate functions (such as COUNT, SUM and AVG). See an SQL GROUP BY tutorial for further reading.
It collapses all the rows that share the same values in the GROUP BY columns into one row. In your example, it would collapse all the rows with the same country into one.
I'm not quite sure exactly what your query needs to be to solve your problem, but I would look into what are called window functions in SQL. I believe you first need to write a window function that finds the average age in each group; then you can write a query that returns the rows you need.
Depending on your DBMS type and version, you may be able to use a window function that calculates the average per country; with this approach, the result is available on every row. Once that data is present in a derived table, you can simply use a WHERE clause to keep the ages that are greater than the calculated average for their country.
SELECT mt.*
FROM (
SELECT *
, avg(age) OVER(PARTITION BY country) AS AvgAge
FROM Mytable
) mt
WHERE mt.Age > mt.AvgAge
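A sketch of the window-function version in SQLite via Python's sqlite3 module, assuming the bundled SQLite is 3.25 or newer (the first release with window functions). The rows are the same hypothetical data as in the join example.

```python
import sqlite3

# Window functions need SQLite 3.25+; fail loudly on older builds.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("this SQLite build has no window functions")

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Mytable (name TEXT, country TEXT, age INTEGER);
INSERT INTO Mytable VALUES
  ('Ann', 'FR', 20), ('Bob', 'FR', 40),
  ('Cid', 'US', 30), ('Dee', 'US', 50), ('Eve', 'US', 40);
""")

# AVG(...) OVER (PARTITION BY country) attaches the country average to
# every row; the outer WHERE can then compare each row against it.
rows = conn.execute("""
    SELECT mt.name, mt.country, mt.age, mt.AvgAge
    FROM (
        SELECT *, AVG(age) OVER (PARTITION BY country) AS AvgAge
        FROM Mytable
    ) AS mt
    WHERE mt.age > mt.AvgAge
    ORDER BY mt.name
""").fetchall()
print(rows)  # [('Bob', 'FR', 40, 30.0), ('Dee', 'US', 50, 40.0)]
```

The filter has to sit in the outer query because a window function cannot be referenced in the WHERE clause of the same SELECT.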

Why does MAX statement require a Group By?

I understand why the first query needs a GROUP BY, as it doesn't know which date to apply the sum to, but I don't understand why the second query needs one too. The value that ends up as the maximum is already contained in the table; it is not calculated the way SUM is. Thank you.
-- First Query
select
sum(OrderSales),OrderDates
From Orders
-- Second Query
select
max(FilmOscarWins),FilmName
From tblFilm
It is not the SUM and MAX that require the GROUP BY, it is the unaggregated column.
If you just write this, you will get a single row, for the maximum value of the FilmOscarWins column across the whole table:
select
max(FilmOscarWins)
From
tblFilm
If the most Oscars any film won was 12, that one row will say 12. But there could be multiple films, all of which won 12 Oscars, so if we ask for the FilmName alongside that 12, there is no single answer.
By adding GROUP BY FilmName, we fundamentally change the query: instead of returning one number for the whole table, it returns one row for each group, which in this case means one row for each film.
If you do want to get a list of all those films which had the maximum 12 Oscars, you have to do something more complicated, such as using a sub-query to first find that single number (12) and then find all the rows matching it:
select
FilmOscarWins,
FilmName
From
tblFilm
Where FilmOscarWins = (
select
max(FilmOscarWins)
From
tblFilm
)
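A runnable sketch of this two-step query in SQLite via Python's sqlite3 module; the film names and Oscar counts are hypothetical, with two films tied at 12.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical films: 'Film A' and 'Film C' share the maximum.
conn.executescript("""
CREATE TABLE tblFilm (FilmName TEXT, FilmOscarWins INTEGER);
INSERT INTO tblFilm VALUES
  ('Film A', 12), ('Film B', 3), ('Film C', 12), ('Film D', 0);
""")

# The inner query yields the single number 12; the outer query then
# returns every film with that many wins, so ties are all reported.
rows = conn.execute("""
    SELECT FilmOscarWins, FilmName
    FROM tblFilm
    WHERE FilmOscarWins = (SELECT MAX(FilmOscarWins) FROM tblFilm)
    ORDER BY FilmName
""").fetchall()
print(rows)  # [(12, 'Film A'), (12, 'Film C')]
```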
If you only need one film with the most Oscar wins, use SELECT TOP (SQL Server syntax; if several films are tied, which one is returned is not defined):
select top (1) f.*
From tblFilm f
order by FilmOscarWins desc;
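TOP is SQL Server syntax; SQLite (like MySQL and PostgreSQL) uses LIMIT instead. A sketch of the same idea in SQLite via Python's sqlite3 module, with hypothetical data; because of the tie, either of the two 12-Oscar films may come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblFilm (FilmName TEXT, FilmOscarWins INTEGER);
INSERT INTO tblFilm VALUES
  ('Film A', 12), ('Film B', 3), ('Film C', 12), ('Film D', 0);
""")

# LIMIT 1 keeps only the first row of the sorted result; among tied rows
# the order is unspecified, so this returns one arbitrary top film.
top = conn.execute("""
    SELECT FilmName, FilmOscarWins
    FROM tblFilm
    ORDER BY FilmOscarWins DESC
    LIMIT 1
""").fetchone()
print(top)
```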
In an aggregation query, the SELECT columns need to be consistent with the GROUP BY columns: every unaggregated column in the SELECT list must appear in the GROUP BY.

When to use multiple GROUP BY in SQL?

I'm practicing SQL on SQLZOO, and I'm working on Joins. Question 11 of that section asks: "For every match involving 'POL', show the matchid, date and the number of goals scored."
So I tried the following code:
SELECT matchid, mdate, COUNT(player)
FROM goal JOIN game ON matchid = id
WHERE (team1 = 'POL' OR team2 = 'POL')
GROUP BY matchid
But it throws an error:
'gisq.game.mdate' isn't in GROUP BY
So the answer is:
SELECT matchid, mdate, COUNT(player)
FROM goal JOIN game ON matchid = id
WHERE (team1 = 'POL' OR team2 = 'POL')
GROUP BY matchid, mdate
My question is, why is it required to also include mdate in the GROUP BY clause if it's not part of the aggregate function? Thank you and sorry for the newbie question. Here is the table's format: https://sqlzoo.net/wiki/The_JOIN_operation
The simple reason is that SQL requires the GROUP BY columns and the SELECT columns to be compatible. Those are the rules of the language.
Slightly simplified, your query is:
SELECT matchid, mdate, COUNT(player)
FROM goal JOIN
game
ON matchid = id
WHERE 'POL' IN (team1, team2)
GROUP BY matchid;
The query says: return one row per matchid, because of the GROUP BY. But then which mdate should be returned? As far as the database can tell, the rows in one group could have several different dates.
SQL requires that you be explicit about what you want. You might intend the most recent date, in which case you would use MAX(mdate). Or you might want a separate row for each date, in which case you would include it in the GROUP BY. Or you might intend something else. The query needs to be clear.
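Both fixes can be compared in SQLite via Python's sqlite3 module. The tables below are a hypothetical, cut-down version of the SQLZOO schema (only the columns the query uses), with made-up rows; dates are stored as ISO-8601 strings so that MAX() means "latest".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical subset of the SQLZOO tables game and goal.
conn.executescript("""
CREATE TABLE game (id INTEGER PRIMARY KEY, mdate TEXT, team1 TEXT, team2 TEXT);
CREATE TABLE goal (matchid INTEGER, player TEXT);
INSERT INTO game VALUES
  (1, '2012-06-08', 'POL', 'GRE'),
  (2, '2012-06-08', 'RUS', 'CZE'),
  (3, '2012-06-12', 'POL', 'RUS');
INSERT INTO goal VALUES
  (1, 'Player A'), (1, 'Player B'),
  (2, 'Player C'),
  (3, 'Player D'), (3, 'Player E'), (3, 'Player A');
""")

# Option 1: say which date you want by aggregating it.
with_max = conn.execute("""
    SELECT matchid, MAX(mdate), COUNT(player)
    FROM goal JOIN game ON matchid = id
    WHERE 'POL' IN (team1, team2)
    GROUP BY matchid
    ORDER BY matchid
""").fetchall()

# Option 2: make mdate part of the grouping.
grouped = conn.execute("""
    SELECT matchid, mdate, COUNT(player)
    FROM goal JOIN game ON matchid = id
    WHERE 'POL' IN (team1, team2)
    GROUP BY matchid, mdate
    ORDER BY matchid
""").fetchall()

print(with_max)  # [(1, '2012-06-08', 2), (3, '2012-06-12', 3)]
print(grouped)   # [(1, '2012-06-08', 2), (3, '2012-06-12', 3)]
```

Here both give the same rows because each match has exactly one date; the choice only matters when a group really does contain several dates.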
When a query uses aggregate functions (COUNT, MAX, MIN, AVG, etc.) in the SELECT list together with plain (non-aggregated) columns, most databases require you to repeat all the non-aggregated columns from the SELECT list in the GROUP BY clause. As a result, every column is aggregated in some way: some by aggregate functions in the SELECT list, and the rest by the GROUP BY clause.
Group by a single column: all rows that have the same value in that one column are placed in one group.
Group by multiple columns, for example GROUP BY column1, column2: all rows that have the same values in both column1 and column2 are placed in one group.
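A small illustration of the difference in SQLite via Python's sqlite3 module; the sales table and its rows are hypothetical, standing in for column1 (region) and column2 (product).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: region plays the role of column1, product of column2.
conn.executescript("""
CREATE TABLE sales (region TEXT, product TEXT);
INSERT INTO sales VALUES
  ('East', 'pen'), ('East', 'pen'), ('East', 'ink'), ('West', 'pen');
""")

# One group per distinct region.
by_region = conn.execute("""
    SELECT region, COUNT(*) FROM sales
    GROUP BY region ORDER BY region
""").fetchall()

# One group per distinct (region, product) pair.
by_region_product = conn.execute("""
    SELECT region, product, COUNT(*) FROM sales
    GROUP BY region, product ORDER BY region, product
""").fetchall()

print(by_region)          # [('East', 3), ('West', 1)]
print(by_region_product)  # [('East', 'ink', 1), ('East', 'pen', 2), ('West', 'pen', 1)]
```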
Since the question asks you to select the date as well, you have to put it in the GROUP BY clause. Even if POL played several games on the same date, each game still gets its own group, because matchid is also in the GROUP BY.

ORDER BY an aggregated column in Report Builder 3.0

In Report Builder 3.0, I retrieved some items and counted them using a COUNT aggregate. Now I want to order them from highest to lowest. How do I use ORDER BY on the aggregated column? The picture below shows the column I want to order by; it is ticked.
[Picture: the report's columns, with the column to order by ticked]
The code is very simple, as shown below:
SELECT DISTINCT act_id, NameOfAct
FROM Acts
Your picture indicates you also want a Total row at the bottom:
SELECT
COALESCE(NameOfAct,'Total') NameOfAct,
COUNT(DISTINCT act_id) c
FROM Acts
GROUP BY ROLLUP(NameOfAct)
ORDER BY
CASE WHEN NameOfAct is null THEN 1 ELSE 0 END,
c DESC;
Result of example data:
NameOfAct count
-------------- -------
Act_B 3
Act_A 2
Act_Z 1
Total 6
Try it with example rows at: http://sqlfiddle.com/#!18/dbd6c/2
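Not every database supports GROUP BY ROLLUP; SQLite, for example, does not. A sketch of the same report in SQLite via Python's sqlite3 module, swapping ROLLUP for a UNION ALL of the per-name counts and a separate grand-total row. The Acts rows are hypothetical, and is_total is an added helper column used only for sorting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical acts; act_id is assigned automatically (1..6).
conn.executescript("""
CREATE TABLE Acts (act_id INTEGER PRIMARY KEY, NameOfAct TEXT);
INSERT INTO Acts (NameOfAct) VALUES
  ('Act_A'), ('Act_B'), ('Act_B'), ('Act_Z'), ('Act_A'), ('Act_B');
""")

# is_total = 0 for the per-name rows and 1 for the total row, so the
# total sorts last; the per-name rows are sorted by count, highest first.
rows = conn.execute("""
    SELECT NameOfAct, c
    FROM (
        SELECT NameOfAct, COUNT(DISTINCT act_id) AS c, 0 AS is_total
        FROM Acts
        GROUP BY NameOfAct
        UNION ALL
        SELECT 'Total', COUNT(DISTINCT act_id), 1
        FROM Acts
    )
    ORDER BY is_total, c DESC
""").fetchall()
print(rows)  # [('Act_B', 3), ('Act_A', 2), ('Act_Z', 1), ('Total', 6)]
```

The union is wrapped in a subquery because SQLite only lets the ORDER BY of a compound SELECT name result columns, not arbitrary expressions such as a CASE.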
I looked at the picture. It seems you have several acts with the same name, and you want to know how many acts share each name.
You might want to group the results by name:
GROUP BY NameOfAct
And include the act names and their counts in the query results:
SELECT NameOfAct, COUNT(*) AS ActCount
(Since the act_id column is not included in the groups, you need to omit it in the SELECT. The DISTINCT is also not necessary anymore, since all groups are unique already.)
Finally, you can sort the data (probably descending to get the acts with the largest count on top):
ORDER BY ActCount DESC
Your complete query would become something like this:
SELECT NameOfAct, COUNT(*) AS ActCount
FROM Acts
GROUP BY NameOfAct
ORDER BY ActCount DESC
Edit:
By the way, you use field "act_id" in your SELECT clause. That's somewhat confusing. If you want to know counts, you want to look at either the complete table data or group the table data into smaller groups (with the GROUP BY clause). Then you can use aggregate functions to get more information about those groups (or the whole table), like counts, average values, minima, maxima...
Per-record information, like an act's ID in your case, is typically not important when you apply statistical/aggregate methods to grouped data. Suppose your query returns an act name that is used 10 times: then your table has 10 records, each with a unique act_id but the same name.
If you need just one act_id that represents each group / act name (and assuming act_id is an autonumbering field), you might include the latest / largest act_id value in the query using the MAX aggregate function:
SELECT NameOfAct, COUNT(*) AS ActCount, MAX(act_id) AS LatestActId
(The rest of the query remains the same.)
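A runnable sketch of the complete query with the extra MAX(act_id) column, in SQLite via Python's sqlite3 module. The rows are hypothetical, and act_id is assumed to be an auto-numbered INTEGER PRIMARY KEY, as the answer supposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical acts; act_id is assigned automatically in insertion order.
conn.executescript("""
CREATE TABLE Acts (act_id INTEGER PRIMARY KEY, NameOfAct TEXT);
INSERT INTO Acts (NameOfAct) VALUES
  ('Act_A'), ('Act_B'), ('Act_B'), ('Act_Z'), ('Act_A'), ('Act_B');
""")

# One row per name: how many acts share it, and the largest act_id.
rows = conn.execute("""
    SELECT NameOfAct, COUNT(*) AS ActCount, MAX(act_id) AS LatestActId
    FROM Acts
    GROUP BY NameOfAct
    ORDER BY ActCount DESC
""").fetchall()
print(rows)  # [('Act_B', 3, 6), ('Act_A', 2, 5), ('Act_Z', 1, 4)]
```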

SQL: Getting a value multiple times

I keep getting the same value multiple times and I don't know what I'm doing wrong. It's probably something very simple, but nothing seems to work for me. I need this for a school project, and I have only been doing this for about a week.
This is my code:
select hobby
from preshobby
order by hobby asc
When I click Execute, I get the same value several times. For example:
Wrestling
Wlking
Walking
Walking
Walking
Walking
Touch Football
Tennis
I need the result to be in ascending order and each value should only appear once.
Use distinct:
select distinct hobby
from preshobby
order by hobby
Note that you don't need to specify ASC with ORDER BY, since ascending is the default sort order.
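A quick check of SELECT DISTINCT in SQLite via Python's sqlite3 module; the preshobby columns and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: several people share the hobby 'Walking'.
conn.executescript("""
CREATE TABLE preshobby (person TEXT, hobby TEXT);
INSERT INTO preshobby VALUES
  ('p1', 'Walking'), ('p2', 'Tennis'), ('p3', 'Walking'),
  ('p4', 'Wrestling'), ('p5', 'Walking'), ('p6', 'Touch Football');
""")

# DISTINCT removes the duplicates; ORDER BY without ASC sorts ascending.
hobbies = [row[0] for row in conn.execute(
    "SELECT DISTINCT hobby FROM preshobby ORDER BY hobby")]
print(hobbies)  # ['Tennis', 'Touch Football', 'Walking', 'Wrestling']
```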
Your table probably has many entries with the same hobby, so you need to group them like this:
select hobby
from preshobby
group by hobby order by hobby asc
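A sketch showing that, for a single column with no aggregates, GROUP BY returns the same list as DISTINCT; it uses SQLite via Python's sqlite3 module and hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE preshobby (person TEXT, hobby TEXT);
INSERT INTO preshobby VALUES
  ('p1', 'Walking'), ('p2', 'Tennis'), ('p3', 'Walking'),
  ('p4', 'Wrestling'), ('p5', 'Walking'), ('p6', 'Touch Football');
""")

# Each group holds all rows with one hobby value, so one row per hobby.
grouped = [row[0] for row in conn.execute(
    "SELECT hobby FROM preshobby GROUP BY hobby ORDER BY hobby ASC")]
distinct = [row[0] for row in conn.execute(
    "SELECT DISTINCT hobby FROM preshobby ORDER BY hobby")]
print(grouped == distinct)  # True
```

GROUP BY becomes the better fit once you also want per-hobby aggregates, such as COUNT(*) of the people who share each hobby.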
You are selecting every hobby value that has been entered in the column. Since many people share the same hobby, querying the column shows repeated values. Use DISTINCT like this:
select distinct hobby from preshobby order by hobby;
The default ORDER BY direction is ascending, so you don't need to specify ASC unless you want descending order.