Why when using MIN function and selecting another column, we require GROUP BY clause? Doesn't MIN return single record?

Why when using MIN function and selecting another column, we require GROUP BY clause? Doesn't MIN return single record? - sql

I have two tables vehicles and dealership_vehicles. The dealership_vehicles table has a price column. The vehicles table has a column dealership_vehicle_id which relates to the dealership_vehicles id column in the dealership_vehicles table.
I wanted to return just the vehicle make of the cheapest car.
Why is it that the following query:
select
vehicles.make,
MIN(dealership_vehicles.price)
from
vehicles inner join dealership_vehicles
on vehicles.dealership_vehicle_id=dealership_vehicles.id;
Returns the error:
column "vehicles.make" must appear in the GROUP BY clause or be used in an aggregate function
Since MIN function returns a single value it is plausible that SQL query can be constructed that will return a single value without needing GROUP BY.

You say you want to know the make of the cheapest car. The easiest way to do this is
SELECT DISTINCT v.MAKE
FROM VEHICLE v
INNER JOIN DEALERSHIP_VEHICLES dv
ON v.DEALERSHIP_VEHICLE_ID = dv.ID
WHERE dv.PRICE = (SELECT MIN(PRICE) FROM DEALERSHIP_VEHICLES);
Note that because multiple vehicles might have the "cheapest" price it's entirely possible you'll get multiple returns from the above query.
Best of luck.
EDIT
Another way to do it is to take the minimum price, by make, then sort by the minimum price, and then just take the first row. Something like
SELECT *
FROM (SELECT v.MAKE, MIN(dv.PRICE)
FROM VEHICLE v
INNER JOIN DEALERSHIP_VEHICLES dv
ON v.DEALERSHIP_VEHICLE_ID = dv.ID
GROUP BY v.MAKE
ORDER BY MIN(dv.PRICE) ASC)
WHERE ROWNUM = 1;

Think of the term "GROUP BY" as "for each." It's saying "Give me the MIN of dealership_vehicles.price for each vehicles.make"
So you will need to change your query to:
select
vehicles.make,
MIN(dealership_vehicles.price)
from
vehicles inner join dealership_vehicles
on vehicles.dealership_vehicle_id=dealership_vehicles.id
Group by vehicles.make;

If you want the make of the cheapest car, then no aggregation is needed:
select v.make, dv.price
from vehicles v inner join
dealership_vehicles dv
on v.dealership_vehicle_id = dv.id
order by dv.price asc
fetch first one row only;
This gets a little more complicated if you want all rows in the case of ties:
select v.*
from (select v.make, dv.price, rank() over (order by price asc) as seqnum
from vehicles v inner join
dealership_vehicles dv
on v.dealership_vehicle_id = dv.id
) v
where seqnum = 1

So let's say after we join in the price we have the following table (i.e. stored in a #temp table):
#temp Vehicles table:
| Make | Model | Price |
|--------|-------|----------|
| Toyota | Yaris | 5000.00 |
| Toyota | Camry | 10000.00 |
| Ford | Focus | 7500.00 |
If you query it for minimum price without specifying what you're grouping by, then only one minimum function is applied across all of the rows. Example:
select min(Price) from #temp
will return you a single value of 5000.00
If you want to know the make of the cheapest car, you need to filter your results by the cheapest price - it's a two step process. First you find out the cheapest price using min, then in a separate query, you find out which cars are at that price. Once you construct your query correctly, you will notice that this reveals what you might not have though of - you can actually have more than one cheapest make.
Example table:
#temp Vehicles table v2:
| Make | Model | Price |
|--------|--------|----------|
| Toyota | Yaris | 5000.00 |
| Toyota | Camry | 10000.00 |
| Ford | Focus | 7500.00 |
| Ford | Escort | 5000.00 |
query:
select * from #temp
where Price = (select min(Price) from #temp)
result:
| Make | Model | Price |
|--------|--------|----------|
| Toyota | Yaris | 5000.00 |
| Ford | Escort | 5000.00 |

Related

SQL MIN() with GROUP BY select additional columns

I am trying to query a sql database table for the minimum price for products. I also want to grab an additional column with the value of the row with the minimum price. My data looks something like this.
ProductId | Price | Location
1 | 50 | florida
1 | 55 | texas
1 | 53 | california
2 | 65 | florida
2 | 64 | texas
2 | 60 | new york
I can query the minimum price for a product with this query
select ProductId, Min(Price)
from Table
group by ProductId
What I want to do is also include the Location where the Min price is being queried from in the above query. Is there a standard way to achieve this?

One method uses a correlated subquery:
select t.*
from t
where t.price = (select min(t2.price) from t t2 where t2.productid = t.productid);
In most databases, this has very good performance with an index on (productid, price).

Create multiple filtered result sets of a joined table for use in aggregate functions

I have a (heavily simplified) orders table, total being the dollar amount, containing:
| id | client_id | type | total |
|----|-----------|--------|-------|
| 1 | 1 | sale | 100 |
| 2 | 1 | refund | 100 |
| 3 | 1 | refund | 100 |
And clients table containing:
| id | name |
|----|------|
| 1 | test |
I am attempting to create a breakdown, by client, metrics about the total number of sales, refunds, sum of sales, sum of refunds etc.
To do this, I am querying the clients table and joining the orders table. The orders table contains both sales and refunds, specified by the type column.
My idea was to join the orders twice using subqueries and create aliases for those filtered tables. The aliases would then be used in aggregate functions to find the sum, average etc. I have tried many variations of joining the orders table twice to achieve this but it produces the same incorrect results. This query demonstrates this idea:
SELECT
clients.*,
SUM(sales.total) as total_sales,
SUM(refunds.total) as total_refunds,
AVG(sales.total) as avg_ticket,
COUNT(sales.*) as num_of_sales
FROM clients
LEFT JOIN (SELECT * FROM orders WHERE type = 'sale') as sales
ON sales.client_id = clients.id
LEFT JOIN (SELECT * FROM orders WHERE type = 'refund') as refunds
ON refunds.client_id = clients.id
GROUP BY clients.id
Result:
| id | name | total_sales | total_refunds | avg_ticket | num_of_sales |
|----|------|-------------|---------------|------------|--------------|
| 1 | test | 200 | 200 | 100 | 2 |
Expected result:
| id | name | total_sales | total_refunds | avg_ticket | num_of_sales |
|----|------|-------------|---------------|------------|--------------|
| 1 | test | 100 | 200 | 100 | 1 |
When the second join is included in the query, the rows returned from the first join are returned again with the second join. They are multiplied by the number of rows in the second join. It's clear my understanding of joining and/or subqueries is incomplete.
I understand that I can filter the orders table with each aggregate function. This produces correct results but seems inefficient:
SELECT
clients.*,
SUM(orders.total) FILTER (WHERE type = 'sale') as total_sales,
SUM(orders.total) FILTER (WHERE type = 'refund') as total_refunds,
AVG(orders.total) FILTER (WHERE type = 'sale') as avg_ticket,
COUNT(orders.*) FILTER (WHERE type = 'sale') as num_of_sales
FROM clients
LEFT JOIN orders
on orders.client_id = clients.id
GROUP BY clients.id
What is the appropriate way to created filtered and aliased versions of this joined table?
Also, what exactly is happening with my initial query where the two subqueries are joined. I would expect them to be treated as separate subsets even though they are operating on the same (orders) table.

You should do the (filtered) aggregation once for all aggregates you want, and then join to the result of that. As your aggregation doesn't need any columns from the clients table, this can be done in a derived table. This is also typically faster than grouping the result of the join.
SELECT clients.*,
o.total_sales,
o.total_refunds,
o.avg_ticket,
o.num_of_sales
FROM clients
LEFT JOIN (
select client_id,
SUM(total) FILTER (WHERE type = 'sale') as total_sales,
SUM(total) FILTER (WHERE type = 'refund') as total_refunds,
AVG(total) FILTER (WHERE type = 'sale') as avg_ticket,
COUNT(*) FILTER (WHERE type = 'sale') as num_of_sales
from orders
group by client_id
) o on o.client_id = clients.id

How do you use two SUM() aggregate functions in the same query for PostgreSQL?

I have a PostgreSQL query that yields the following results:
SELECT o.order || '-' || osh.ordinal_number AS order,
o.company,
o.order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
o.company,
o.order_total,
o.order_type;
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | null | 125.00 | new
123-2 | B corp. | null | 100.00 | new
I need to replace the o.order_total (it doesn't work properly) and sum up the sum of the order_shipment_total column so that, for the example above, each row winds up saying 225.00. I need the results above to look like this below:
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | 225.00 | 125.00 | new
123-2 | B corp. | 225.00 | 100.00 | new
What I've Tried
1.) To replace o.order_total, I've tried SUM(SUM(osh.items)) but get the error message that you cannot nest aggregate functions.
2.) I've tried to put the entire query as a subquery and sum the order_shipment_total column, but when I do, it just repeats the column itself. See below:
SELECT order,
company,
SUM(order_shipment_total) AS order_shipment_total,
order_shipment_total,
order_type
FROM (
SELECT o.order || '-' || osh.ordinal_number AS order,
o.company,
o.order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
o.company,
o.order_total,
o.order_type
) subquery
GROUP BY order,
company,
order_shipment_total,
order_type;
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | 125.00 | 125.00 | new
123-2 | B corp. | 100.00 | 100.00 | new
3.) I've tried to only include the rows I actually want to group by in my subquery/query example above, because I feel like I was able to do this in Oracle SQL. But when I do that, I get an error saying "column [name] must appear in the GROUP BY clause or be used in an aggregate function."
...
GROUP BY order,
company,
order_type;
ERROR: column "[a column name]" must appear in the GROUP BY clause or be used in an aggregate function.
How do I accomplish this? I was certain that a subquery would be the answer but I'm confused as to why this approach will not work.

The thing you're not quite grasping with your query / approach is that you're actually wanting two different levels of grouping in the same query row results. The subquery approach is half right, but when you do a subquery that groups, inside another query that groups you can only use the data you've already got (from the subquery) and you can only choose to keep it at the level of aggregate detail it already is, or you can choose to lose precision in favor of grouping more. You can't keep the detail AND lose the detail in order to sum up further. A query-of-subquery is hence (in practical terms) relatively senseless because you might as well group to the level you want in one hit:
SELECT groupkey1, sum(sumx) FROM
(SELECT groupkey1, groupkey2, sum(x) as sumx FROM table GROUP BY groupkey1, groupkey2)
GROUP BY groupkey1
Is the same as:
SELECT groupkey1, sum(x) FROM
table
GROUP BY groupkey1
Gordon's answer will probably work out (except for the same bug yours exhibits in that the grouping set is wrong/doesn't cover all the columns) but it probably doesn't help much in terms of your understanding because it's a code-only answer. Here's a breakdown of how you need to approach this problem but with simpler data and foregoing the window functions in favor of what you already know.
Suppose there are apples and melons, of different types, in stock. You want a query that gives a total of each specific kind of fruit, regardless of the date of purchase. You also want a column for the total for each fruit overall type:
Detail:
fruit | type | purchasedate | count
apple | golden delicious | 2017-01-01 | 3
apple | golden delicious | 2017-01-02 | 4
apple | granny smith | 2017-01-04 ! 2
melon | honeydew | 2017-01-01 | 1
melon | cantaloupe | 2017-01-05 | 4
melon | cantaloupe | 2017-01-06 | 2
So that's 7 golden delicious, 2 granny smith, 1 honeydew, 6 cantaloupe, and its also 9 apples and 7 melons
You can't do it as one query*, because you want two different levels of grouping. You have to do it as two queries and then (critical understanding point) you have to join the less-precise (apples/melons) results back to the more precise (granny smiths/golden delicious/honydew/cantaloupe):
SELECT * FROM
(
SELECT fruit, type, sum(count) as fruittypecount
FROM fruit
GROUP BY fruit, type
) fruittypesum
INNER JOIN
(
SELECT fruit, sum(count) as fruitcount
FROM fruit
GROUP BY fruit
) fruitsum
ON
fruittypesum.fruit = fruitsum.fruit
You'll get this:
fruit | type | fruittypecount | fruit | fruitcount
apple | golden delicious | 7 | apple | 9
apple | granny smith | 2 | apple | 9
melon | honeydew | 1 | melon | 7
melon | cantaloupe | 6 | melon | 7
Hence for your query, different groups, detail and summary:
SELECT
detail.order || '-' || detail.ordinal_number as order,
detail.company,
summary.order_total,
detail.order_shipment_total,
detail.order_type
FROM (
SELECT o.order,
osh.ordinal_number,
o.company,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
o.company,
o.order_type
) detail
INNER JOIN
(
SELECT o.order,
SUM(osh.items) AS order_total
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
--don't need the where clause; we'll join on order number
GROUP BY o.order,
o.company,
o.order_type
) summary
ON
summary.order = detail.order
Gordon's query uses a window function achieve the same effect; the window function runs after the grouping is done, and it establishes another level of grouping (PARTITION BY ordernumber) which is the effective equivalent of my GROUP BY ordernumber in the summary. The window function summary data is inherently connected to the detail data via ordernumber; it is implicit that a query saying:
SELECT
ordernumber,
lineitemnumber,
SUM(amount) linetotal
sum(SUM(amount)) over(PARTITION BY ordernumber) ordertotal
GROUP BY
ordernumber,
lineitemnumber
..will have an ordertotal that is the total of all the linetotal in the order: The GROUP BY prepares the data to the line level detail, and the window function prepares data to just the order level, and repeats the total as many times are necessary to fill in for every line item. I wrote the SUM that belongs to the GROUP BY operation in capitals.. the sum in lowercase belongs to the partition operation. it has to sum(SUM()) and cannot simply say sum(amount) because amount as a column is not allowed on its own - it's not in the group by. Because amount is not allowed on its own and has to be SUMmed for the group by to work, we have to sum(SUM()) for the partition to run (it runs after the group by is done)
It behaves exactly the same as grouping to two different levels and joining together, and indeed I chose that way to explain it because it makes it more clear how it's working in relation to what you already know about groups and joins
Remember: JOINS make datasets grow sideways, UNIONS make them grow downwards. When you have some detail data and you want to grow it sideways with some more data(a summary), JOIN it on. (If you'd wanted totals to go at the bottom of each column, it would be unioned on)
*you can do it as one query (without window functions), but it can get awfully confusing because it requires all sorts of trickery that ultimately isn't worth it because it's too hard to maintain

You should be able to use window functions:
SELECT o.order || '-' || osh.ordinal_number AS order, o.company,
SUM(SUM(osh.items)) OVER (PARTITION BY o.order) as order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o JOIN
order_shipments osh
ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order, o.company, o.order_type;

SQL select only highest date

For a project I want to generate a price list.
I want to get only the latest prices from each supplier for each article.
There are just those two tables.
Table articles
ARTNR | TXT | ACTIVE | SUPPLIER
------------------------------------------
10 | APPLE | Y | 10
20 | ORANGE | Y | 10
30 | KEYBOARD | N | 20
40 | ORANGE | Y | 20
50 | BANANA | Y | 10
60 | CHERRY | Y | 10
Table prices
ARTNR | PRCGRP | PRCDAT | PRICE
--------------------------------------
10 | 10 | 01-Aug-10 | 2.1
10 | 10 | 05-Aug-11 | 2.2
10 | 10 | 21-Aug-12 | 2.5
20 | 0 | 01-Aug-10 | 2.1
20 | 10 | 09-Aug-12 | 2.3
10 | 10 | 14-Aug-13 | 2.7
This is what I have so far:
SELECT
ARTICLES.[ARTNR], ARTICLES.[TXT], ARTICLES.[ACTIVE], ARTICLES.[SUPPLIER], PRICES.PRCGRP, PRICES.PRCDAT, PRICES.PRICE
FROM
ARTICLES INNER JOIN PRICES ON ARTICLES.ARTNR = PRICES.ARTNR
WHERE
(
(ARTICLES.[ACTIVE]="Y") AND
(ARTICLES.[SUPPLIER]=10) AND
(PRICES.PRCGRP=0) AND
(PRICES.PRCDAT=(SELECT MAX(PRCDAT) FROM PRICES as art WHERE art.ARTNR = PRICES.artnr) )
)
ORDER BY ARTICLES.ARTNR
;
It is okay to choose just one supplier each time, but I want the max price.
The problem is:
Lots of articles do not show up with the query above,
but I cannot figure out what is wrong.
I can see that they should be in the resultset when I leave out the subselect on max prcdat.
What is wrong?

Your subquery to get the latest price does not take the other conditions into account, that is when you're getting the latest price, you may get a price in another price group or that is not active. When you join that against the filtered list that has no inactive prices and only prices in a single price group, you get no hits that exist in both.
Either you need to duplicate or - better - move your conditions inside the subquery to get the best price under the conditions. I can't test against access, but something like this should be possible if the SQL is not too limited;
SELECT a.artnr, a.txt, a.active, a.supplier, p.prcgrp, p.prcdat, p.price
FROM articles a INNER JOIN prices p ON a.ARTNR = p.ARTNR
JOIN (
SELECT a.artnr, MAX(p.prcdat) prcdat
FROM articles a JOIN prices p ON a.artnr = p.artnr
WHERE a.active='Y' AND a.supplier=10 AND p.prcgrp=10
GROUP BY a.artnr) z
ON a.artnr = z.artnr AND p.prcdat = z.prcdat
ORDER BY a.ARTNR
If the SQL support in access won't allow a join with a subquery, you can just move the conditions inside your existing subquery, something like;
SELECT a.artnr, a.txt, a.active, a.supplier, p.prcgrp, p.prcdat, p.price
FROM articles a INNER JOIN prices p ON a.ARTNR = p.ARTNR
WHERE p.prcdat = (
SELECT MAX(p2.prcdat)
FROM articles a2 JOIN prices p2 ON a2.artnr = p2.artnr
WHERE a.artnr = a2.artnr AND a2.active='Y' AND a2.supplier=10 AND p2.prcgrp=10
)
ORDER BY a.ARTNR;
Note that due to limitations in identifying a unique price (no primary key in prices), the queries may give duplicates if several prices for the same article have the same prcdat. If that's a problem, you'll probably need to duplicate your conditions outside the subquery too.

MIN() Function in SQL

Need help with Min Function in SQL
I have a table as shown below.
+------------+-------+-------+
| Date_ | Name | Score |
+------------+-------+-------+
| 2012/07/05 | Jack | 1 |
| 2012/07/05 | Jones | 1 |
| 2012/07/06 | Jill | 2 |
| 2012/07/06 | James | 3 |
| 2012/07/07 | Hugo | 1 |
| 2012/07/07 | Jack | 1 |
| 2012/07/07 | Jim | 2 |
+------------+-------+-------+
I would like to get the output like below
+------------+------+-------+
| Date_ | Name | Score |
+------------+------+-------+
| 2012/07/05 | Jack | 1 |
| 2012/07/06 | Jill | 2 |
| 2012/07/07 | Hugo | 1 |
+------------+------+-------+
When I use the MIN() function with just the date and Score column I get the lowest score for each date, which is what I want. I don't care which row is returned if there is a tie in the score for the same date. Trouble starts when I also want name column in the output. I tried a few variation of SQL (i.e min with correlated sub query) but I have no luck getting the output as shown above. Can anyone help please:)
Query is as follows
SELECT DISTINCT
A.USername, A.Date_, A.Score
FROM TestTable AS A
INNER JOIN (SELECT Date_,MIN(Score) AS MinScore
FROM TestTable
GROUP BY Date_) AS B
ON (A.Score = B.MinScore) AND (A.Date_ = B.Date_);

Use this solution:
SELECT a.date_, MIN(name) AS name, a.score
FROM tbl a
INNER JOIN
(
SELECT date_, MIN(score) AS minscore
FROM tbl
GROUP BY date_
) b ON a.date_ = b.date_ AND a.score = b.minscore
GROUP BY a.date_, a.score
SQL-Fiddle Demo
This will get the minimum score per date in the INNER JOIN subselect, which we use to join to the main table. Once we join the subselect, we will only have dates with names having the minimum score (with ties being displayed).
Since we only want one name per date, we then group by date and score, selecting whichever name: MIN(name).
If we want to display the name column, we must use an aggregate function on name to facilitate the GROUP BY on date and score columns, or else it will not work (We could also use MAX() on that column as well).

Please learn about the GROUP BY functionality of RDBMS.
SELECT Date_,Name,MIN(Score)
FROM T
GROUP BY Name
This makes the assumption that EACH NAME and EACH date appears only once, and this will only work for MySQL.
To make it work on other RDBMSs, you need to apply another group function on the Date column, like MAX. MIN. etc

SELECT T.Name, T.Date_, MIN(T.Score) as Score FROM T
GROUP BY T.Date_

Edit: This answer is not corrected as pointed out by JNK in comments
SELECT Date_,MAX(Name),MIN(Score)
FROM T
GROUP BY Date_
Here I am using MAX(NAME), it will pick one name if two names were found with the same goal numbers.
This will find Min score for each day (no duplicates), scored by any player. The name that starts with Z will be picked first than the name that starts with A.
Edit: Fixed by removing group by name

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Why when using MIN function and selecting another column, we require GROUP BY clause? Doesn't MIN return single record? - sql

Related

SQL MIN() with GROUP BY select additional columns

Create multiple filtered result sets of a joined table for use in aggregate functions

How do you use two SUM() aggregate functions in the same query for PostgreSQL?

SQL select only highest date

MIN() Function in SQL

Categories

Resources