Select row with smallest number on multiple groups of same ids - sql

I have the following table as an output from a sql statement
user | product | price
…
123 | 12 | 451.29
373 | 12 | 637.28
623 | 12 | 650.84
672 | 16 | 356.87
123 | 16 | 263.90
…
Now I want to get only the row with the smallest price for each product_id.
The SQL is fairly simple:
SELECT user, product, price
FROM t
WHERE product IN (
    SELECT product_id
    FROM p
    WHERE typ LIKE 'producttyp1'
)
but adding MIN(price) does not work the way it usually does. I think it's because there are several groups of the same product ids in the same table. Is there an easy-to-use solution, or do I have to rewrite the whole query?
Edit: when I delete user from the query I can get the product and the smallest price:
12 | 451.29
16 | 263.90
But now I would have to join the user, which I am trying to avoid.

You can use row_number():
select t.*
from (select t.*,
             row_number() over (partition by product order by price asc) as seqnum
      from t
      where product in (select product_id from p where typ like 'producttyp1')
     ) t
where seqnum = 1;
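As a quick check, here is a minimal sketch that runs the row_number() approach against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in database (window functions need SQLite 3.25 or newer). The contents of table p are an assumption, since the question does not show them.

```python
import sqlite3

# In-memory stand-in for the question's tables. The rows of t come from the
# question; the rows of p are assumed, because the question does not show them.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (user INTEGER, product INTEGER, price REAL);
    CREATE TABLE p (product_id INTEGER, typ TEXT);
    INSERT INTO t VALUES (123, 12, 451.29), (373, 12, 637.28), (623, 12, 650.84),
                         (672, 16, 356.87), (123, 16, 263.90);
    INSERT INTO p VALUES (12, 'producttyp1'), (16, 'producttyp1');
""")

# Number the rows of each product by ascending price and keep the first one.
cheapest = con.execute("""
    SELECT user, product, price
    FROM (SELECT t.*,
                 row_number() OVER (PARTITION BY product ORDER BY price) AS seqnum
          FROM t
          WHERE product IN (SELECT product_id FROM p WHERE typ LIKE 'producttyp1')
         ) ranked
    WHERE seqnum = 1
    ORDER BY product
""").fetchall()

for row in cheapest:
    print(row)
```

The user column comes along with the cheapest row, so no extra join is needed.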

Related

Postgres query - get all records of lowest price per ID [duplicate]

This question already has answers here:
Select first row in each GROUP BY group?
(20 answers)
I have items table where I store information about items and their prices.
It looks like this:
id | title | item_code | price | site_id | store_id
I want to select all item rows with the lowest price per item_code. It means the query should return ONE row per item_code in my table, which contains the lowest price.
I'm using PostgreSQL.
Not sure where to start. Example DB data:
id | title | item_code | price | site_id | store_id
1 | Shampoo | TEST1 | 10 | 1 | 1
2 | Shampoo | TEST1 | 5 | 2 | 1
3 | Shampoo | TEST1 | 12 | 2 | 1
Use DISTINCT ON:
SELECT DISTINCT ON (item_code) *
FROM items
ORDER BY item_code, price;
Group your result set and use the MIN aggregate function:
SELECT item_code
, MIN(price) min_price
FROM items
GROUP BY item_code
;
Join the result of this query with the original table if you need the complete item record:
SELECT it.*
FROM items it
JOIN (
SELECT item_code
, MIN(price) min_price
FROM items
GROUP BY item_code
) gi ON ( gi.item_code = it.item_code )
WHERE it.price = gi.min_price
;
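One thing to be aware of with the join approach: it returns every row that ties for the lowest price, while row_number() returns exactly one row per item_code. The sketch below illustrates this with Python's sqlite3 module as a stand-in for PostgreSQL. Row 4 is a hypothetical tie added for the illustration and is not part of the question's data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE items (id INTEGER, title TEXT, item_code TEXT,
                        price INTEGER, site_id INTEGER, store_id INTEGER);
    INSERT INTO items VALUES
        (1, 'Shampoo', 'TEST1', 10, 1, 1),
        (2, 'Shampoo', 'TEST1',  5, 2, 1),
        (3, 'Shampoo', 'TEST1', 12, 2, 1),
        (4, 'Shampoo', 'TEST1',  5, 3, 1);  -- hypothetical tie, not in the question
""")

# Join against the per-group minimum: every row that matches the minimum is kept.
join_ids = [row[0] for row in con.execute("""
    SELECT it.id
    FROM items it
    JOIN (SELECT item_code, MIN(price) AS min_price
          FROM items
          GROUP BY item_code) gi ON gi.item_code = it.item_code
    WHERE it.price = gi.min_price
    ORDER BY it.id
""")]

# row_number() with id as a tie-breaker: exactly one row per item_code.
rn_ids = [row[0] for row in con.execute("""
    SELECT id
    FROM (SELECT id,
                 row_number() OVER (PARTITION BY item_code ORDER BY price, id) AS rn
          FROM items) a
    WHERE rn = 1
""")]

print(join_ids, rn_ids)
```

If you need exactly one row per item_code, add a tie-breaker column to the ORDER BY as shown in the second query.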
You can also use ROW_NUMBER().
SELECT a.id,
a.title,
a.item_code,
a.price,
a.site_id,
a.store_id
FROM
(
SELECT *, row_number() over(partition by item_code order by price) rn
FROM items
) a WHERE a.rn=1;

Grouping in SQL Table [closed]

Suppose I have a Table such that:
| ID | product | orderid | brand | number of product cust ord |
|----|---------|---------|-------|----------------------------|
| 1  | 123     | 111     | br    | 1                          |
| 1  | 234     | 111     | br    | 1                          |
| 1  | 345     | 333     | br    | 1                          |
| 2  | 123     | 211     | br    | 1                          |
| 2  | 456     | 212     | br    | 2                          |
| 3  | 567     | 213     | br    | 1                          |
What I'd like to do is group them as:
| ID | brand | number of product cust ord |
|----|-------|----------------------------|
| 1  | br    | 3                          |
| 2  | br    | 4                          |
Further to that, I'd like to classify them. I tried a CASE ... WHEN but can't seem to get it right.
If an ID purchases more than 3 unique products and orders more than twice, I'd like to call them a 'frequent buyer' (in the above example, ID '1' would be a 'frequent buyer'). If the average number of products they purchase is higher than the average number of that product sold, I'd like to call them a 'merchant'. Otherwise they are just a 'purchaser'.
I've renamed the last field to qty for brevity and called the table test1.
To get frequent buyers, use the query below. Note that I used >= instead of >. I changed this based on your example, where ID 1 is a "frequent buyer" even though he only bought 3 products, not more than 3.
SELECT ID, count(distinct product) as DistinctProducts, count(distinct orderid) DistinctOrders
FROM test1
GROUP BY ID
HAVING count(distinct product) >= 3 and count(distinct orderid) >= 2
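To see what this returns on the sample data, here is a small sketch using Python's sqlite3 module as a stand-in database. The table name test1 and the column name qty follow the renaming described above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE test1 (id INTEGER, product INTEGER, orderid INTEGER,
                        brand TEXT, qty INTEGER);
    INSERT INTO test1 VALUES
        (1, 123, 111, 'br', 1), (1, 234, 111, 'br', 1), (1, 345, 333, 'br', 1),
        (2, 123, 211, 'br', 1), (2, 456, 212, 'br', 2),
        (3, 567, 213, 'br', 1);
""")

# HAVING filters whole groups after aggregation, so only IDs with at least
# 3 distinct products and at least 2 distinct orders survive.
frequent = con.execute("""
    SELECT id,
           COUNT(DISTINCT product) AS distinct_products,
           COUNT(DISTINCT orderid) AS distinct_orders
    FROM test1
    GROUP BY id
    HAVING COUNT(DISTINCT product) >= 3 AND COUNT(DISTINCT orderid) >= 2
""").fetchall()

print(frequent)
```

Only ID 1 qualifies: it has 3 distinct products spread over 2 distinct orders.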
Not sure if I understood the merchant logic correctly. Below is a query that returns the customers whose average purchased quantity of a product is higher than the overall average quantity for that product. There are none in the sample data.
SELECT DISTINCT c.ID
FROM
(select ID, product, avg(qty) as AvgQty
FROM test1
GROUP BY ID, product) as c
FULL OUTER JOIN
(select product, avg(qty) as AvgQty
FROM test1
GROUP BY product) p ON p.product = c.product
WHERE c.AvgQty > p.AvgQty;
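Here is a sketch of the same comparison run with Python's sqlite3 module. Two things are assumptions on my side: I swapped the FULL OUTER JOIN for a plain JOIN (the WHERE clause discards unmatched rows anyway, and older SQLite versions lack FULL OUTER JOIN), and the extra row for ID 3 is hypothetical, added only to show a case where a merchant appears.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE test1 (id INTEGER, product INTEGER, orderid INTEGER,
                        brand TEXT, qty INTEGER);
    INSERT INTO test1 VALUES
        (1, 123, 111, 'br', 1), (1, 234, 111, 'br', 1), (1, 345, 333, 'br', 1),
        (2, 123, 211, 'br', 1), (2, 456, 212, 'br', 2),
        (3, 567, 213, 'br', 1);
""")

def find_merchants():
    # Compare each customer's average quantity per product (c) with the
    # overall average quantity of that product (p).
    return con.execute("""
        SELECT DISTINCT c.id
        FROM (SELECT id, product, AVG(qty) AS avg_qty
              FROM test1 GROUP BY id, product) c
        JOIN (SELECT product, AVG(qty) AS avg_qty
              FROM test1 GROUP BY product) p ON p.product = c.product
        WHERE c.avg_qty > p.avg_qty
    """).fetchall()

before = find_merchants()

# Hypothetical order, not in the question: ID 3 buys 5 units of product 123.
con.execute("INSERT INTO test1 VALUES (3, 123, 214, 'br', 5)")
after = find_merchants()

print(before, after)
```

With the original rows nobody exceeds a product's average. After the hypothetical order, ID 3 averages 5 units of product 123 against an overall average of about 2.33.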
To get "purchasers", do an EXCEPT between all customers and the UNION of merchants and frequent buyers:
select distinct ID from test1
EXCEPT
(SELECT ID FROM (
select ID, count(distinct product) as DistinctProducts, count(distinct orderid) DistinctOrders
FROM test1
GROUP BY ID
HAVING count(distinct product) >= 3 and count(distinct orderid) >= 2) t
UNION
SELECT DISTINCT c.ID
FROM
(select ID, product, avg(qty) as AvgQty
FROM test1
GROUP BY ID, product) as c
FULL OUTER JOIN
(select product, avg(qty) as AvgQty
FROM test1
GROUP BY product) p ON p.product = c.product
WHERE c.AvgQty > p.AvgQty
);
This is one way that you could do it. Note that according to the description you gave, buyers could be constantly reclassified between 'Merchant' and 'Purchaser' as the average goes up and down. That might not be what you want.
With cte As (
Select ID,
Brand,
DistinctOrders = Count(Distinct OrderID), -- How many separate orders by this customer for the brand?
DistinctProducts = Count(Distinct Product), -- How many different products by this customer for the brand?
[number of product cust ord] = Sum(CountOfProduct), -- Total number of items by this customer for the brand.
AverageCountOfProductPerBuyer =
Sum(Sum(CountOfProduct)) Over () * 1.0 / (Select Count(*) From (Select Distinct ID, Brand From #table) As tbl)
-- Average number of items per customer (for all customers) for this brand
From #table
Group By ID, Brand)
Select ID, Brand, DistinctOrders, DistinctProducts, [number of product cust ord],
IsFrequentBuyer = iif(DistinctOrders > 1 And DistinctProducts > 2, 'Frequent Buyer', NULL),
IsMerchant = iif(AverageCountOfProductPerBuyer < [number of product cust ord], 'Merchant', 'Purchaser')
From cte;
This query could be written without the common-table expression, but was written this way to avoid defining expressions multiple times.
Note that I have the first ID as a 'Frequent Buyer' based on your description, so I'm assuming that when you say 'more than 3 unique products' you mean 3 or more. Likewise with two or more distinct orders.
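Since the question mentions CASE ... WHEN, here is a hedged sketch of the same classification written with CASE expressions, run through Python's sqlite3 module. It assumes the test1/qty naming from the other answer, the "3 or more products, 2 or more orders" reading, and the interpretation used in the query above (a buyer's total quantity compared with the average total per buyer). The names per_buyer, is_frequent_buyer and buyer_type are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE test1 (id INTEGER, product INTEGER, orderid INTEGER,
                        brand TEXT, qty INTEGER);
    INSERT INTO test1 VALUES
        (1, 123, 111, 'br', 1), (1, 234, 111, 'br', 1), (1, 345, 333, 'br', 1),
        (2, 123, 211, 'br', 1), (2, 456, 212, 'br', 2),
        (3, 567, 213, 'br', 1);
""")

classified = con.execute("""
    WITH per_buyer AS (
        SELECT id, brand,
               COUNT(DISTINCT orderid) AS distinct_orders,
               COUNT(DISTINCT product) AS distinct_products,
               SUM(qty)                AS total_qty
        FROM test1
        GROUP BY id, brand
    )
    SELECT id, brand,
           -- NULL when the buyer does not meet both thresholds
           CASE WHEN distinct_orders >= 2 AND distinct_products >= 3
                THEN 'Frequent Buyer' END AS is_frequent_buyer,
           -- compare this buyer's total with the average total per buyer
           CASE WHEN total_qty > (SELECT AVG(total_qty) FROM per_buyer)
                THEN 'Merchant' ELSE 'Purchaser' END AS buyer_type
    FROM per_buyer
    ORDER BY id
""").fetchall()

for row in classified:
    print(row)
```

The totals per buyer are 3, 3 and 1, so the average is about 2.33 and IDs 1 and 2 come out as 'Merchant' under this interpretation.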

GROUP BY PostgreSQL query where I need a column that is not in the GROUP BY clause [duplicate]

This question already has answers here:
Select first row in each GROUP BY group?
(20 answers)
I have a database that parallels the 'widget' database below.
widget_id | vendor_id | price
------------------------------
1 | 101 | 10.00
2 | 101 | 9.00
3 | 102 | 6.00
4 | 102 | 7.00
I want to find the cheapest widget by vendor, so something like the below output:
widget_id | vendor_id | price
------------------------------
2 | 101 | 9.00
3 | 102 | 6.00
In MySQL or SQLite, I could query
SELECT widget_id, vendor_id, min( price ) AS price FROM widgets GROUP BY( vendor_id )
However, it seems that this is contrary to the SQL spec. In PostgreSQL, I'm unable to run the above query. The error message is "widget_id must appear in the GROUP BY clause or be used in an aggregate function". I can kind of see PostgreSQL's point, but it seems like a perfectly reasonable thing to want the widget_id of the widget that has the minimum price.
What am I doing wrong?
You can use DISTINCT ON:
SELECT DISTINCT ON (vendor_id) *
FROM widget
ORDER BY vendor_id, price;
You can also use the row_number window function in a subquery:
SELECT widget_id, vendor_id, price
FROM (
SELECT *, row_number() OVER (PARTITION BY vendor_id ORDER BY price) AS rn
FROM widget
) t
WHERE rn=1;
Finally, you can also do it with a LATERAL join:
SELECT t2.*
FROM
(SELECT DISTINCT vendor_id FROM widget) t1,
LATERAL (SELECT * FROM widget WHERE vendor_id=t1.vendor_id ORDER BY price LIMIT 1) t2
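If your database has neither DISTINCT ON nor LATERAL, a correlated subquery gives the same "top 1 per group" effect. The sketch below is only an analogue of the LATERAL query, not the same syntax, and it runs on Python's sqlite3 module with the sample rows from the question. The widget_id tie-breaker in the ORDER BY is my addition.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE widget (widget_id INTEGER, vendor_id INTEGER, price REAL);
    INSERT INTO widget VALUES (1, 101, 10.00), (2, 101, 9.00),
                              (3, 102, 6.00),  (4, 102, 7.00);
""")

# For each row, look up the cheapest widget_id of the same vendor and keep
# the row only if it is that widget.
cheapest = con.execute("""
    SELECT w.widget_id, w.vendor_id, w.price
    FROM widget w
    WHERE w.widget_id = (SELECT w2.widget_id
                         FROM widget w2
                         WHERE w2.vendor_id = w.vendor_id
                         ORDER BY w2.price, w2.widget_id
                         LIMIT 1)
    ORDER BY w.vendor_id
""").fetchall()

print(cheapest)
```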

How to retrieve specific columns in a GROUP BY sql statement?

Table airports:
id | from | to | price | photo | notes
_______________________________________
1 | LON | JFK| 1000 | | test
2 | LON | JFK| 2000 | | test2
I want to retrieve the best-price entry for every from-to combination in the database.
I want to fetch the whole record in which the minimum price is found, or at least specific columns.
The following works, BUT only gives me the 3 columns from, to, price. Not the whole entity.
SELECT from, to, min(price) FROM airports GROUP BY from, to
How would I have to adapt this?
This is typically done using window functions:
select id, "from", "to", price, photo, notes
from (
  select id, "from", "to", price, photo, notes,
         min(price) over (partition by "from", "to") as min_price
  from the_table
) t
where price = min_price
order by id;
from is a reserved word, so it's a bad idea to use it as a column name (to is a reserved word in PostgreSQL as well), which is why both names are quoted.
To deal with "ties" (same values in from, to and price), you can use the dense_rank() function instead:
select id, "from", "to", price, photo, notes
from (
  select id, "from", "to", price, photo, notes,
         dense_rank() over (partition by "from", "to" order by price) as price_rank
  from the_table
) t
where price_rank = 1
order by id;
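To make the tie behaviour concrete, here is a small sketch using Python's sqlite3 module. It loads the five-row example with tied prices that appears in another answer to this question. dense_rank() keeps every row that shares the lowest price. row_number() with an extra tie-breaker (id, my choice for the example) keeps exactly one row per route.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE airports (id INTEGER, "from" TEXT, "to" TEXT,
                           price INTEGER, photo TEXT, notes TEXT);
    INSERT INTO airports VALUES
        (1, 'LON', 'JFK', 1000, NULL, 'test'),
        (2, 'LON', 'JFK', 2000, NULL, 'test2'),
        (3, 'LON', 'JFK', 5000, NULL, 'test3'),
        (4, 'LON', 'JFK', 2000, NULL, 'test4'),
        (5, 'LON', 'JFK', 1000, NULL, 'test5');
""")

def ids_ranked_first(window_fn, order_by):
    # Rank rows within each from/to pair and return the ids ranked first.
    sql = f"""
        SELECT id
        FROM (SELECT id,
                     {window_fn}() OVER (PARTITION BY "from", "to"
                                         ORDER BY {order_by}) AS rnk
              FROM airports) t
        WHERE rnk = 1
        ORDER BY id"""
    return [row[0] for row in con.execute(sql)]

tied = ids_ranked_first("dense_rank", "price")         # all rows at the lowest price
single = ids_ranked_first("row_number", "price, id")   # one row per route

print(tied, single)
```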
You can order the results and use DISTINCT ON to take the first row from each group. Because from and to are reserved words, they have to be quoted:
select distinct on ("from", "to") * from airports order by "from", "to", price asc;
The above query should work in PostgreSQL.
A very simple solution would be this. It uses IN rather than =, because the subquery returns one row per from/to pair:
SELECT *
FROM airports
WHERE (from_place, to_place, price) IN
      (SELECT from_place, to_place, min(price)
       FROM airports
       GROUP BY from_place, to_place);
Use SELECT * FROM ... since you want the whole entity.
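Here is a hedged sketch of the row-value comparison, run with Python's sqlite3 module (row values need SQLite 3.15 or newer). It uses IN rather than =, because the grouped subquery returns one row per from_place/to_place pair. The LON to CDG rows are hypothetical and only there to give the subquery a second group.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE airports (id INTEGER, from_place TEXT, to_place TEXT,
                           price INTEGER, photo TEXT, notes TEXT);
    INSERT INTO airports VALUES
        (1, 'LON', 'JFK', 1000, NULL, 'test'),
        (2, 'LON', 'JFK', 2000, NULL, 'test2'),
        (3, 'LON', 'JFK', 5000, NULL, 'test3'),
        (4, 'LON', 'JFK', 2000, NULL, 'test4'),
        (5, 'LON', 'JFK', 1000, NULL, 'test5'),
        (6, 'LON', 'CDG',  300, NULL, 'hypothetical'),
        (7, 'LON', 'CDG',  150, NULL, 'hypothetical');
""")

# Keep every row whose (from_place, to_place, price) triple matches the
# minimum-price triple of its group.
best = [row[0] for row in con.execute("""
    SELECT id
    FROM airports
    WHERE (from_place, to_place, price) IN
          (SELECT from_place, to_place, MIN(price)
           FROM airports
           GROUP BY from_place, to_place)
    ORDER BY id
""")]

print(best)
```

Rows 1 and 5 tie for LON to JFK, so both are returned, along with row 7 for the second route.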
There isn't necessarily a single "whole entity" to return: many rows in your table could match the same from + to + min price.
For example if your table contains
id | from | to | price | photo | notes
_______________________________________
1 | LON | JFK| 1000 | | test
2 | LON | JFK| 2000 | | test2
3 | LON | JFK| 5000 | | test3
4 | LON | JFK| 2000 | | test4
5 | LON | JFK| 1000 | | test5
Then both rows 1 and 5 meet your criteria of from + to + min price.
You could write the query
SELECT a.id, a."from", a."to", a.price, a.photo, a.notes
FROM airports a
INNER JOIN (
    SELECT "from", "to", min(price) AS price
    FROM airports
    GROUP BY "from", "to") sub
ON sub."from" = a."from"
AND sub."to" = a."to"
AND sub.price = a.price
Which would get you the matching records.
If you want to get the entire row, here is a query that solves your problem:
SELECT A.*
FROM airports A
INNER JOIN (SELECT A2.fromhere
,A2.tohere
,MIN(A2.price) AS minprice
FROM airports A2
GROUP BY A2.fromhere, A2.tohere) T ON T.fromhere = A.fromhere
AND T.tohere = A.tohere
AND T.minprice = A.price
The join keeps only the rows with the best price for each fromhere/tohere pair.
Hope this will help you.

Aggregate highest prices per client of salesmen

I have a table like this:
SELECT * FROM orders;
client_id | order_id | salesman_id | price
-----------+----------+-------------+-------
1 | 167 | 1 | 65
1 | 367 | 1 | 27
2 | 401 | 1 | 29
2 | 490 | 2 | 48
3 | 199 | 1 | 68
3 | 336 | 2 | 22
3 | 443 | 1 | 84
3 | 460 | 2 | 92
I want to find an array of order_ids containing the highest-priced sale for each unique salesman and client pair. In this case I want the resulting table:
salesman_id | order_id
-------------+----------------
1 | {167, 401, 443}
2 | {490, 460}
So far I have an outline for a query:
SELECT salesman_id, max_client_salesman(order_id)
FROM orders
GROUP BY salesman_id;
However I'm having trouble writing the aggregate function max_client_salesman.
The documentation online for aggregate functions and arrays in postgres is very minimal. Any help is appreciated.
Standard SQL
I would combine the window function last_value() or first_value() with DISTINCT to get the orders with the highest price per (salesman_id, client_id) efficiently, and then aggregate these into the array you are looking for with the simple aggregate function array_agg().
SELECT salesman_id
,array_agg(max_order_id) AS most_expensive_orders_per_client
FROM (
SELECT DISTINCT
salesman_id, client_id
,last_value(order_id) OVER (PARTITION BY salesman_id, client_id
ORDER BY price
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS max_order_id
FROM orders
) x
GROUP BY salesman_id
ORDER BY salesman_id;
Returns:
salesman_id | most_expensive_orders_per_client
-------------+------------------------------------
1 | {167, 401, 443}
2 | {490, 460}
If there are multiple highest prices per (salesman_id, client_id), this query picks only one order_id arbitrarily, for lack of definition.
For this solution it is essential to understand that window functions are applied before DISTINCT. How to combine DISTINCT with a window function:
PostgreSQL: running count of rows for a query 'by minute'
For an explanation on ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING refer to this closely related answer on dba.SE.
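To illustrate why the explicit frame matters, here is a sketch run with Python's sqlite3 module on the sample orders. SQLite has no array_agg(), so the order_ids are collected into Python sets instead, which is an adaptation on my side. With the default frame, last_value() only sees rows up to the current one, so every order shows up as a "maximum". With the full frame, only the highest-priced order per (salesman_id, client_id) remains.

```python
import sqlite3
from collections import defaultdict

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (client_id INTEGER, order_id INTEGER,
                         salesman_id INTEGER, price INTEGER);
    INSERT INTO orders VALUES
        (1, 167, 1, 65), (1, 367, 1, 27), (2, 401, 1, 29), (2, 490, 2, 48),
        (3, 199, 1, 68), (3, 336, 2, 22), (3, 443, 1, 84), (3, 460, 2, 92);
""")

def top_orders(frame):
    # frame is the window frame clause; an empty string means the default frame.
    sql = f"""
        SELECT DISTINCT salesman_id, client_id,
               last_value(order_id) OVER (PARTITION BY salesman_id, client_id
                                          ORDER BY price {frame}) AS max_order_id
        FROM orders"""
    per_salesman = defaultdict(set)
    for salesman_id, _client_id, order_id in con.execute(sql):
        per_salesman[salesman_id].add(order_id)
    return dict(per_salesman)

full_frame = top_orders("ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING")
default_frame = top_orders("")

print(full_frame)
print(default_frame)
```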
Simpler with non-standard DISTINCT ON
PostgreSQL implements DISTINCT ON as an extension to the SQL standard. With it you can very effectively select rows that are unique according to a defined set of columns.
It won't get simpler or faster than this:
SELECT salesman_id
,array_agg(order_id) AS most_expensive_orders_per_client
FROM (
SELECT DISTINCT ON (1, client_id)
salesman_id, order_id
FROM orders
ORDER BY salesman_id, client_id, price DESC
) x
GROUP BY 1
ORDER BY 1;
I also use positional references (ordinal numbers of the output columns) for shorter syntax. Details:
Select first row in each GROUP BY group?
I think you want the Postgres function array_agg in combination with row_number(). However, your description of the query does not make sense to me.
The following gets clients and salesmen and the list of orders for the highest priced order by salesman:
select client_id, salesman_id, array_agg(order_id)
from (select o.*,
row_number() over (partition by salesman_id order by price desc) as sseqnum,
row_number() over (partition by client_id order by price desc) as cseqnum
from orders o
) o
where sseqnum = 1
group by salesman_id, client_id
I don't know what you mean by "highest priced sales for each salesman and client". Perhaps you want:
where sseqnum = 1 or cseqnum = 1