How to use SUM and MAX on the same column? - sql

I have an SQL table and need to find the product that has been purchased the most, which means I need a SUM of the quantity grouped by product:
SELECT PRODUCT_ID, SUM(QUANTITY) FROM PURCHASE GROUP BY PRODUCT_ID
However when I try to find the product with the maximum amount of purchases it gives me an error:
SELECT MAX(QUANTITY) FROM(SELECT PRODUCT_ID, SUM(QUANTITY) FROM PURCHASE GROUP BY PRODUCT_ID)
Any ideas?

Just order by and keep the top record only:
SELECT PRODUCT_ID, SUM(QUANTITY) SUM_QUANTITY
FROM PURCHASE
GROUP BY PRODUCT_ID
ORDER BY SUM_QUANTITY DESC
LIMIT 1
The actual syntax might vary across RDBMS. The above would work in MySQL and Postgres.
In SQL Server, you would use SELECT TOP (1) ... ORDER BY SUM_QUANTITY DESC.
In Oracle >= 12c, you would use SELECT ... ORDER BY SUM_QUANTITY DESC FETCH FIRST ROW ONLY.
You also have to consider the possibility of ties in the first position, for which there are different strategies depending on your requirement and RDBMS.
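To make the tie handling concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database (assumptions: SQLite 3.25 or later for window functions, and made-up PURCHASE rows). It also shows that the MAX-over-subquery from the question runs once the summed column has a name the outer query can refer to; several RDBMS additionally require an alias on the derived table.

```python
import sqlite3


def make_purchase_db():
    """Build an in-memory PURCHASE table with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PURCHASE (PRODUCT_ID INTEGER, QUANTITY INTEGER)")
    conn.executemany(
        "INSERT INTO PURCHASE VALUES (?, ?)",
        [(1, 5), (1, 5), (2, 10), (3, 4)],  # products 1 and 2 both total 10
    )
    return conn


def max_total(conn):
    """Highest summed quantity; the sum is aliased so the outer MAX can see it."""
    sql = """
        SELECT MAX(SUM_QUANTITY)
        FROM (SELECT PRODUCT_ID, SUM(QUANTITY) AS SUM_QUANTITY
              FROM PURCHASE
              GROUP BY PRODUCT_ID) AS TOTALS
    """
    return conn.execute(sql).fetchone()[0]


def top_products_with_ties(conn):
    """Every product tied for the highest total (RANK gives all of them rank 1)."""
    sql = """
        SELECT PRODUCT_ID, SUM_QUANTITY
        FROM (SELECT PRODUCT_ID, SUM_QUANTITY,
                     RANK() OVER (ORDER BY SUM_QUANTITY DESC) AS RNK
              FROM (SELECT PRODUCT_ID, SUM(QUANTITY) AS SUM_QUANTITY
                    FROM PURCHASE
                    GROUP BY PRODUCT_ID) AS TOTALS) AS RANKED
        WHERE RNK = 1
        ORDER BY PRODUCT_ID
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    db = make_purchase_db()
    print(max_total(db))
    print(top_products_with_ties(db))
```

With ORDER BY ... LIMIT 1, only one of the tied products would come back, and which one is not guaranteed.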

Related

Find the nth greatest value per group in SQL

I'm trying to find the nth greatest value in each group in a table; is there an efficient way to do this in SQL? (specifically Google BigQuery, if that's relevant)
For example, suppose we had a table sales with two fields, customer_id and amount, where each record corresponds to the sale of an item to a customer for a given amount. If I wanted the top sale to each customer, I could do
SELECT customer_id, MAX(amount) top_amount
FROM sales
GROUP BY customer_id;
If I instead wanted the 5th greatest value for each customer, is there an efficient/idiomatic way to do that in SQL?
Consider the approach below:
SELECT customer_id, array_agg(amount order by amount desc limit 5)[safe_offset(4)] top_5th_amount
FROM sales
GROUP BY customer_id;
Another option uses the nth_value() function:
SELECT distinct customer_id,
nth_value(amount, 5) over win top_5th_amount
FROM sales
window win as (partition by customer_id order by amount desc rows between unbounded preceding and unbounded following )
You can use qualify:
select s.*
from sales s
where 1=1
qualify row_number() over (partition by customer_id order by amount desc) = 5;
Note: Your question is unclear on how to handle tied amounts. This treats them as separate amounts (so the 5th could be the same as the 1st). If you want the 5th largest distinct value, use dense_rank() instead.
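The difference between row_number() and dense_rank() on tied amounts can be checked with a small sketch. It uses Python's sqlite3 module rather than BigQuery (an assumption made so the example runs anywhere; SQLite 3.25+ is needed for window functions). SQLite has no QUALIFY, so the filter moves to an outer query. The function name and the sample rows are hypothetical.

```python
import sqlite3


def nth_amounts(rows, n, distinct=False):
    """Return {customer_id: nth greatest amount}.

    Customers with fewer than n qualifying amounts are omitted.
    distinct=True ranks with DENSE_RANK (nth distinct value),
    otherwise ROW_NUMBER is used (ties count as separate amounts).
    """
    rank_fn = "DENSE_RANK" if distinct else "ROW_NUMBER"
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer_id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    sql = f"""
        SELECT DISTINCT customer_id, amount
        FROM (SELECT customer_id, amount,
                     {rank_fn}() OVER (PARTITION BY customer_id
                                       ORDER BY amount DESC) AS rn
              FROM sales) AS ranked
        WHERE rn = ?
    """
    result = dict(conn.execute(sql, (n,)).fetchall())
    conn.close()
    return result


# Made-up data: customer 1 has a tie at the top, customer 2 has one sale.
sample_rows = [(1, 50), (1, 50), (1, 40), (1, 30), (2, 20)]

if __name__ == "__main__":
    print(nth_amounts(sample_rows, 2))                 # ties counted separately
    print(nth_amounts(sample_rows, 2, distinct=True))  # 2nd distinct amount
```

The DISTINCT in the outer query matters only for the dense_rank() variant, where several rows can share the requested rank.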

Combining two queries using JOIN, GROUP BY and SUM?

I am stuck trying to write the correct query for this problem. I have two tables, orders and products, where orders.user_id = products.buyer_id.
I want to query from both tables and find out how much each person owes for their purchase, how much they actually paid, and finally the difference between the two (owes-paid).
The individual queries that work are
SELECT buyer_id, SUM(price) AS owes FROM products GROUP BY buyer_id ORDER BY buyer_id ASC;
and
SELECT user_id, SUM(amount_paid) AS paid FROM orders GROUP BY user_id ORDER BY user_id ASC;
I am able to do the right query but only on each table individually. However, when trying to combine both queries (Outer Join?), I get bad results.
Any help/guidance is appreciated.
I would suggest a subquery where you take the union of both, and dedicate a column to the paid amount and another to the due amount. Then apply the aggregation on that subquery:
SELECT user_id,
SUM(amount_due) AS owes,
SUM(amount_paid) AS paid,
SUM(amount_due) - SUM(amount_paid) AS diff
FROM (
SELECT user_id, amount_paid, 0 amount_due
FROM orders
UNION ALL
SELECT buyer_id, 0, price
FROM products
) AS transactions
GROUP BY user_id
ORDER BY user_id ASC;
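One likely cause of the "bad results" from a direct join (an assumption, since the failing query is not shown) is row multiplication: every order row of a user is paired with every product row of that user, so both sums are inflated. The sketch below reproduces this with Python's sqlite3 module and made-up rows, and contrasts it with the UNION ALL approach.

```python
import sqlite3


def build_db():
    """In-memory orders/products tables with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (buyer_id INTEGER, price INTEGER);
        CREATE TABLE orders (user_id INTEGER, amount_paid INTEGER);
        INSERT INTO products VALUES (1, 10), (1, 20), (2, 5);
        INSERT INTO orders VALUES (1, 15), (1, 10), (2, 5);
    """)
    return conn


def naive_join(conn):
    """Join first, aggregate second: each user's rows multiply (2 x 2 for user 1)."""
    sql = """
        SELECT o.user_id, SUM(p.price) AS owes, SUM(o.amount_paid) AS paid
        FROM orders o
        JOIN products p ON p.buyer_id = o.user_id
        GROUP BY o.user_id
        ORDER BY o.user_id
    """
    return conn.execute(sql).fetchall()


def union_all_totals(conn):
    """Stack both tables into one list of transactions, then aggregate once."""
    sql = """
        SELECT user_id,
               SUM(amount_due) AS owes,
               SUM(amount_paid) AS paid,
               SUM(amount_due) - SUM(amount_paid) AS diff
        FROM (SELECT user_id, amount_paid, 0 AS amount_due FROM orders
              UNION ALL
              SELECT buyer_id, 0, price FROM products) AS transactions
        GROUP BY user_id
        ORDER BY user_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    db = build_db()
    print(naive_join(db))        # user 1 is double counted
    print(union_all_totals(db))  # user 1 owes 30, paid 25
```

Because UNION ALL only stacks rows, no row is ever paired with another, so each price and each payment is counted exactly once.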

How do I find out what items are less purchased in the store?

We have two tables; the first is
products (pro_id,pro_name,supleir_id,quantity,unit,price,enter_date)
the second table is
customers(cus_id,cus_name,purchased_item,pro_id,quantity,total_price,date,invoice)
We want to create a procedure using PL/SQL to find out which products are less purchased by customers.
I think "less purchased" means the product with the lowest number of purchases. However, it is still ambiguous whether that means the lowest number of times purchased (count of pro_id) or the lowest number of items sold (sum of quantity) in customers.
Either way, one method is to rank the target value and then select only the rows with the lowest ranking. The query below uses subquery factoring (more commonly known as a CTE) to rank the products in the customers table by quantity, then joins the result with the products table.
with less_purchased(pro_id, number_purchased) as
( select pro_id, cnt
from (-- rank each product by quantity
select pro_id, cnt, dense_rank() over (order by cnt asc) rnk
from (-- get sum of quantity for each product
select pro_id, sum(quantity) cnt
from customers
group by pro_id
) sum_query
) rank_query
-- discard all but rank 1
where rnk = 1
)
-- get product information and quantity purchased
select p.*, lp.number_purchased
from products p
join less_purchased lp on (lp.pro_id = p.pro_id);
The above may produce multiple product rows, since products with the same quantity share the same rank.
To rank by the number of times purchased instead, replace sum(quantity) with count(*). In either case the ranking must be in ascending order (no desc) so that rank 1 is the least purchased product.
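The two readings of "less purchased" can give different answers. The sketch below illustrates this with Python's sqlite3 module instead of Oracle PL/SQL (an assumption made for portability; SQLite 3.25+ is needed for dense_rank). The customers table is reduced to the two columns the ranking needs, and the rows are made up.

```python
import sqlite3


def build_customers():
    """In-memory customers table, reduced to pro_id and quantity (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (pro_id INTEGER, quantity INTEGER)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, 3), (1, 4),          # product 1: 2 purchases, 7 items
         (2, 1), (2, 1), (2, 1),  # product 2: 3 purchases, 3 items
         (3, 3)],                 # product 3: 1 purchase, 3 items
    )
    return conn


def least_purchased(conn, by="quantity"):
    """Products ranked first when ordering the chosen measure ascending.

    by="quantity" uses SUM(quantity); by="count" uses COUNT(*).
    """
    measure = {"quantity": "SUM(quantity)", "count": "COUNT(*)"}[by]
    sql = f"""
        SELECT pro_id, total
        FROM (SELECT pro_id, total,
                     DENSE_RANK() OVER (ORDER BY total ASC) AS rnk
              FROM (SELECT pro_id, {measure} AS total
                    FROM customers
                    GROUP BY pro_id) AS totals) AS ranked
        WHERE rnk = 1
        ORDER BY pro_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    db = build_customers()
    print(least_purchased(db, by="quantity"))  # products 2 and 3 tie on items sold
    print(least_purchased(db, by="count"))     # product 3 alone has fewest purchases
```

Products that were never purchased have no rows in customers, so a query built only on that table cannot report them.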

how to match a value with SQL max(count) function?

I have an orderLine table that looks like this:
I would like to know which pizza is the best seller, and the quantity of pizza sold.
I've tried query:
select sum(quantity), pizza_name from order_line group by pizza_name;
it returns
This is almost what I want, but when I add the MAX function, the query can no longer match the pizza name with the total quantity of pizza sold.
For example:
select MAX(sum(quantity)), pizza_name from order_line group by pizza_name;
it returns the following error:
"not a single-group group function"
I guess I could achieve this by using a sub-query, but I have no idea how to do this.
You don't need max for this. If you only want one pizza, then you can use order by and fetch first 1 row only (or something similar such as limit or top):
select sum(quantity), pizza_name
from order_line
group by pizza_name
order by sum(quantity) desc
fetch first 1 row only;
Or, if you want all such pizzas, use rank():
select p.*
from (select sum(quantity) as quantity, pizza_name,
rank() over (order by sum(quantity) desc) as seqnum
from order_line
group by pizza_name
) p
where seqnum = 1;
Both of the queries give the same desired result
SELECT PIZZA_NAME,
SUM(QUANTITY) "Total Quant"
FROM Order_line
GROUP BY PIZZA_NAME
ORDER BY "Total Quant" DESC
FETCH FIRST 1 row only;
SELECT PIZZA_NAME, "Total Quantity" FROM (
SELECT PIZZA_NAME,SUM(QUANTITY) "Total Quantity", RANK() OVER (ORDER BY SUM(QUANTITY) DESC) T FROM Order_line GROUP BY PIZZA_NAME
) query1 where query1.T=1 ;
You group by pizza_name to get sum(quantity) per pizza_name.
Then you aggregate again by using MAX on the quantity sum, but you don't specify which of the three pizza names to have in the result. You need an aggregate function on pizza_name as well, which you don't have. Hence the error.
If you want to use your query, you must apply the appropriate aggregation function on pizza_name, which is KEEP DENSE_RANK FIRST/LAST.
select
max(sum(quantity)),
max(pizza_name) keep (dense_rank last order by sum(quantity))
from order_line
group by pizza_name;
That said, Gordon's queries are more readable in my opinion, and this double aggregation is Oracle-specific rather than standard SQL. Inexperienced readers may be confused that the query produces a single result row in spite of the GROUP BY clause.
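For completeness, the subquery route the question asks about can be sketched without any vendor-specific feature: compute the totals per pizza, take their maximum in a subquery, and keep the groups whose total equals it. The example below runs it through Python's sqlite3 module (an assumption for the sake of a runnable demo; the pizza names and quantities are made up).

```python
import sqlite3


def best_sellers(rows):
    """Return every (pizza_name, total) whose total equals the highest total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE order_line (pizza_name TEXT, quantity INTEGER)")
    conn.executemany("INSERT INTO order_line VALUES (?, ?)", rows)
    sql = """
        SELECT pizza_name, SUM(quantity) AS total
        FROM order_line
        GROUP BY pizza_name
        HAVING SUM(quantity) = (SELECT MAX(total)
                                FROM (SELECT SUM(quantity) AS total
                                      FROM order_line
                                      GROUP BY pizza_name) AS totals)
        ORDER BY pizza_name
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Made-up rows: Margherita and Pepperoni both total 5.
sample_rows = [
    ("Margherita", 2), ("Margherita", 3),
    ("Hawaiian", 1),
    ("Pepperoni", 5),
]

if __name__ == "__main__":
    print(best_sellers(sample_rows))
```

Like the rank() variant, this returns all tied best sellers, at the cost of aggregating the table twice.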

Join to replace sub-query

I am almost a novice in database queries.
However, I do understand why and how correlated subqueries are expensive and best avoided.
Given the following simple example, could someone help replace it with a join, to help me understand how that scores better:
select
book_key,
store_key,
quantity
from
sales s
where
quantity < (select max(quantity)
from sales
where book_key = s.book_key);
Apart from a join, what other options do we have to avoid the subquery?
In this case, it ought to be better to use a window function, which needs only a single pass over the table:
with s as
(select book_key,
store_key,
quantity,
max(quantity) over (partition by book_key) mq
from sales)
select book_key, store_key, quantity
from s
where quantity < s.mq
A common table expression (CTE) lets you name an intermediate result and reference it from the main query, so the table is read once and no join is needed. This solution uses RANK() with an OVER clause to rank the rows of each BOOK_KEY by quantity in descending order, then keeps only the rows ranked below first place, that is, the rows whose quantity is less than the maximum for their BOOK_KEY. RANK() is used rather than ROW_NUMBER() because ROW_NUMBER() gives tied maximum rows different numbers and so would return all but one of them, which the original query does not.
with CTE as
(
select
book_key,
store_key,
quantity,
rank() over(partition by book_key order by quantity desc) rnk
from sales
)
select
book_key,
store_key,
quantity
from CTE where rnk > 1;
Working Demo: http://sqlfiddle.com/#!3/f0051/1
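How the correlated subquery, the windowed MAX, ROW_NUMBER() and RANK() behave when two stores tie for a book's maximum quantity can be checked with a short sketch. It uses Python's sqlite3 module (an assumption, chosen so the example is self-contained; SQLite 3.25+ is required for window functions) and made-up sales rows.

```python
import sqlite3

# Four ways to ask for "rows below the per-book maximum".
QUERIES = {
    "correlated": """
        SELECT book_key, store_key, quantity
        FROM sales s
        WHERE quantity < (SELECT MAX(quantity)
                          FROM sales
                          WHERE book_key = s.book_key)
    """,
    "window_max": """
        SELECT book_key, store_key, quantity
        FROM (SELECT book_key, store_key, quantity,
                     MAX(quantity) OVER (PARTITION BY book_key) AS mq
              FROM sales) AS w
        WHERE quantity < mq
    """,
    "row_number": """
        SELECT book_key, store_key, quantity
        FROM (SELECT book_key, store_key, quantity,
                     ROW_NUMBER() OVER (PARTITION BY book_key
                                        ORDER BY quantity DESC) AS rn
              FROM sales) AS w
        WHERE rn > 1
    """,
    "rank": """
        SELECT book_key, store_key, quantity
        FROM (SELECT book_key, store_key, quantity,
                     RANK() OVER (PARTITION BY book_key
                                  ORDER BY quantity DESC) AS rk
              FROM sales) AS w
        WHERE rk > 1
    """,
}


def compare_variants(rows):
    """Run every query on the same data and return {name: sorted result rows}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (book_key INTEGER, store_key INTEGER, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    results = {name: sorted(conn.execute(sql).fetchall())
               for name, sql in QUERIES.items()}
    conn.close()
    return results


# Made-up rows: stores 1 and 2 tie for the maximum of book 1.
sample_sales = [(1, 1, 10), (1, 2, 10), (1, 3, 5), (2, 1, 7)]

if __name__ == "__main__":
    for name, rows in compare_variants(sample_sales).items():
        print(name, rows)
```

On this data the ROW_NUMBER() variant also returns one of the two tied maximum rows, while the other three agree with each other.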
Apart from a join, what other options do we have to avoid the subquery?
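You could also compute the maximum per book into a temporary table first and then filter against it. In SQL Server syntax, something like this (the second step still needs a join, because the maximum differs per book_key):
select book_key, max(quantity) as max_quantity
into #max_per_book
from sales
group by book_key;
select s.book_key, s.store_key, s.quantity
from sales s
join #max_per_book m on m.book_key = s.book_key
where s.quantity < m.max_quantity;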