How does GROUP BY work? - sql

Suppose I have a table Tab1 with attributes - a1, a2, ... etc. None of the attributes are unique.
What will be the nature of the following query? Will it return a single row always?
SELECT a1, a2, sum(a3) FROM Tab1 GROUP BY a1, a2

GROUP BY returns a single row for each unique combination of the GROUP BY fields. So in your example, every distinct combination of (a1, a2) occurring in rows of Tab1 results in a row in the query representing the group of rows with the given combination of group by field values . Aggregate functions like SUM() are computed over the members of each group.

GROUP BY returns one row for each unique combination of fields in the GROUP BY clause. To ensure only one row, you would have to use an aggregate function - COUNT, SUM, MAX - without a GROUP BY clause.

GROUP BY groups all the identical records.
SELECT COUNT(ItemID), City
FROM Orders
GROUP BY City;
----------------------------------------
13 Sacrmento
23 Dallas
87 Los Angeles
5 Phoenix
If you don't group by City it will just display the total count of ItemID.

Analogously, not technically, to keep in mind its logic, it can be thought each grouped field having some rows is put per different table, then the aggregate function carries on the tables individually.
Ben Forta conspicuously states the following saying.
The GROUP BY clause instructs the DBMS to group the data and then
perform the aggregate (function) on each group rather than on the entire result
set.
Aside from the aggregate calculation statements, every column in your SELECT statement must be present in the GROUP BY clause.
The GROUP BY clause must come after any WHERE clause and before any ORDER BY clause.
My understanding reminiscent of his saying is the following.
As is DISTINCT keyword, each field specified through GROUP BY is
thought as grouped and made unique at the end of the day. The
aggregate function is carried out over each group, as happened in
SuL's answer.

Related

Why ORDER BY works only when I gave an alias name for the column, but didn't work just as column name?

This code cannot be executed, showed an error like
column "o.total_amt_usd" must appear in the GROUP BY clause or be used in an aggregate function
SELECT a.name, MIN(o.total_amt_usd)
FROM accounts a
JOIN orders o ON a.id = o.account_id
GROUP BY a.name
ORDER BY o.total_amt_usd
LIMIT 3
But after I use an alias, it worked:
SELECT a.name, MIN(o.total_amt_usd) small
FROM accounts a
JOIN orders o ON a.id = o.account_id
GROUP BY a.name
ORDER BY small
LIMIT 3
Could anybody explain a little bit about this, please?
Both logically make sense to me. But one of them is not working.
Thanks a lot.
You can't order by o.total_amt_usd since it's not available in the result set (after grouping on name).
You need to order by a grouped field or using an aggregate function like MIN. In this case you'll want to order by MIN(o.total_amt_usd) which is essentially what you are doing after using the alias in the order-clause.
Think about what you're asking of the first query and perhaps try a visual example by hand.
Imagine a table with just 4 rows, John has two rows with amounts of 50 and 20, Bob has two rows with amounts of 30 and 60.
You are asking for each unique name and the corresponding minimum amount, so the results are naturally John:20, Bob:30.
By asking to order your results by referring specfically to every row's Total (and not the aggregated total) you are saying, order my two rows by looking at all four rows, which means John could go both before Bob and after Bob given 20 is less than 30 and 50 is greater than 30.
You might look at the data visually and see the "correct" order, however for the query engine this makes no sense, you can only prioritise the resulting two rows based on their aggregated values, therefore you must order by those aggregated values, either using that column's alias or using the same expression. You cannot order by non-aggregated columns.

Confused with the Group By function in SQL

Q1: After using the Group By function, why does it only output one row of each group at most? Does this mean that having is supposed to filter the group rather than filter the records in each group?
Q2: I want to find the records in each group whose ages are greater than the average age of that group. I tried the following, but it returns nothing. How should I fix this?
SELECT *, avg(age) FROM Mytable Group By country Having age > avg(age)
Thanks!!!!
You can calculate the average age for each country in a subquery and join that to your table for filtering:
SELECT mt.*, MtAvg.AvgAge
FROM Mytable mt
inner join
(
select mtavgs.country
, avg(mtavgs.age) as AvgAge
from Mytable mtavgs
group by mtavgs.country
) MTAvg
on mtavg.country=mt.country
and mt.Age > mtavg.AvgAge
GROUP BY returns always 1 row per unique combination of values in the GROUP BY columns listed (provided that they are not removed by a HAVING clause). The subquery in our example (alias: MTAvg) will calculate a single row per country. We will use its results for filtering the main table rows by applying the condition in the INNER JOIN clause; we will also report that average by including the calculated average age.
GROUP BY is a keyword that is called an aggregate function. Check this out here for further reading SQL Group By tutorial
What it does is it lumps all the results together into one row. In your example it would lump all the results with the same country together.
Not quite sure what exactly your query needs to be to solve your exact problem. I would however look into what are called window functions in SQL. I believe what you first need to do is write a window function to find the average age in each group. Then you can write a query to return the results you need
Depending on your dbms type and version, you may be able to use a "window function" that will calculate the average per country and with this approach it makes the calculation available on every row. Once that data is present as a "derived table" you can simply use a where clause to filter for the ages that are greater then the calculated average per country.
SELECT mt.*
FROM (
SELECT *
, avg(age) OVER(PARTITION BY country) AS AvgAge
FROM Mytable
) mt
WHERE mt.Age > mt.AvgAge

Why does MAX statement require a Group By?

I understand why the first query needs a GROUP BY, as it doesn't know which date to apply the sum to, but I don't understand why this is the case with the second query. The value that ultimately is the max amount is already contained in the table - it is not calculated like SUM is. thank you
-- First Query
select
sum(OrderSales),OrderDates
From Orders
-- Second Query
select
max(FilmOscarWins),FilmName
From tblFilm
It is not the SUM and MAX that require the GROUP BY, it is the unaggregated column.
If you just write this, you will get a single row, for the maximum value of the FilmOscarWins column across the whole table:
select
max(FilmOscarWins)
From
tblFilm
If the most Oscars any film won was 12, that one row will say 12. But there could be multiple films, all of which won 12 Oscars, so if we ask for the FilmName alongside that 12, there is no single answer.
By adding the Group By, we fundamentally change the query: instead of returning one number for the whole table, it will return one row for each group - which in this case, means one row for each film.
If you do want to get a list of all those films which had the maximum 12 Oscars, you have to do something more complicated, such as using a sub-query to first find that single number (12) and then find all the rows matching it:
select
FilmOscarWins,
FilmName
From
tblFilm
Where FilmOscarWins = (
select
max(FilmOscarWins)
From
tblFilm
)
If you want the film with the most Oscar wins, then use select top:
select top (1) f.*
From tblFilm f
order by FilmOscarWins desc;
In an aggregation query, the select columns need to be consistent with the group by columns -- the unaggregated columns in the select must match the group by.

ORDER BY an aggregated column in Report Builder 3.0

On a report builder 3.0, i retreived some items and counted them using a Count aggregate. Now i want to order them from highest to lowest. How do i use the ORDER BY function on the aggregated column? The picture below show the a column that i want to ORDER BY it, it is ticked.
Pic
The code is vers simple as shown bellow:
SELECT DISTINCT act_id,NameOfAct,
FROM Acts
Your picture indicates you also want a Total row at the bottom:
SELECT
COALESCE(NameOfAct,'Total') NameOfAct,
COUNT(DISTINCT act_id) c
FROM Acts
GROUP BY ROLLUP(NameOfAct)
ORDER BY
CASE WHEN NameOfAct is null THEN 1 ELSE 0 END,
c DESC;
Result of example data:
NameOfAct count
-------------- -------
Act_B 3
Act_A 2
Act_Z 1
Total 6
Try it with example rows at: http://sqlfiddle.com/#!18/dbd6c/2
I looked at the Pic. So you might have duplicate acts with the same name. And you want to know the number of acts that have the same unique name.
You might want to group the results by name:
GROUP BY NameOfAct
And include the act names and their counts in the query results:
SELECT NameOfAct, COUNT(*) AS ActCount
(Since the act_id column is not included in the groups, you need to omit it in the SELECT. The DISTINCT is also not necessary anymore, since all groups are unique already.)
Finally, you can sort the data (probably descending to get the acts with the largest count on top):
ORDER BY ActCount DESC
Your complete query would become something like this:
SELECT NameOfAct, COUNT(*) AS ActCount
FROM Acts
GROUP BY NameOfAct
ORDER BY ActCount DESC
Edit:
By the way, you use field "act_id" in your SELECT clause. That's somewhat confusing. If you want to know counts, you want to look at either the complete table data or group the table data into smaller groups (with the GROUP BY clause). Then you can use aggregate functions to get more information about those groups (or the whole table), like counts, average values, minima, maxima...
Single record information, like an act's ID in your case, is typically not important if you want to use statistic/aggregate methods on grouped data. Suppose your query returns an act name which is used 10 times. Then you have 10 records in your table, each with a unique act_id, but with the same name.
If you need just one act_id that represents each group / act name (and assuming act_id is an autonumbering field), you might include the latest / largest act_id value in the query using the MAX aggregate function:
SELECT NameOfAct, COUNT(*) AS ActCount, MAX(act_id) AS LatestActId
(The rest of the query remains the same.)

How to group rows after another grouping in oracle?

I have a table called correctObjects. In this tablet here a lot of grups which has different number records. One example is given below as grup 544 has 5 rows in table. So firstly, I should group all records by GRUP COLUMN then I must do inner matching by CAP COLUMN. So in grup#544 there is three different CAP values then I must give Inner Group number to these records. How can I do these two level grouping process. GRUP column is already done. Inner Grup Column is null in every records.
After Inner Group process, It must look like as belows:
I am using Oracle 11g R2 and PL/SQL Developer
Your question lacks certain details, so I'll just give you a starting point, and you can tweak it to suit your needs.
It's not entirely clear, but the way I understand it, you want to rank the different rows by cap. And I think the ranking is independent for every distinct grup value.
What's not clear to me is why 125 mm is ranked 1, and 62 mm is ranked 2. Is it based on the value? Is it based on which row is the first one, and if so, how are the rows ordered? Or maybe you don't really care which one is first or second, as long as they are grouped correctly. I'll have to assume the latter.
In any case, it sounds like you want to use the dense_rank() analytic function in some form:
select mip, startmi, cap, grup,
dense_rank() over (partition by grup order by cap) as inner_grup
from tbl