Query to order data while maintaining grouping? - sql

I have a requirement that I can meet in code, but I am wondering whether it is possible to do in SQL alone. I have a products table that has a Category column and a Price column. What I want is all of the products grouped together by Category, with the categories ordered by their cheapest product and the products within each category ordered from cheapest to most expensive. For example:
Category | Price
--------------|---------------------
Basin | 500
Basin | 700
Basin | 750
Accessories | 550
Accessories | 700
Accessories | 1000
Bath | 700
As you can see, the cheapest item is a basin for 500, then an accessory for 550, then a bath for 700. So I need the categories to be sorted by their cheapest item, and each category in turn to be sorted from cheapest to most expensive.
I have tried partitioning and grouping sets (which I know nothing about) but had no luck, so I eventually resorted to my strength (C#). I would prefer to do it directly in SQL if possible. One last side note: this query is hit quite often, so performance is key; if possible I would like to avoid temp tables, cursors, etc.

I think using MIN() with a window (OVER) makes it clearest what the intent is:
declare @t table (Category varchar(19) not null,Price int not null)
insert into @t (Category,Price) values
('Basin',500),
('Basin',700),
('Basin',750),
('Accessories',550),
('Accessories',700),
('Accessories',1000),
('Bath',700)
;With FindLowest as (
select *,
MIN(Price) OVER (PARTITION BY Category) as Lowest
from
@t
)
select * from FindLowest
order by Lowest,Category,Price
If two categories share the same lowest price, this will still keep the two categories separate and sort them alphabetically.
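The same idea can be tried outside SQL Server. The following is a minimal sketch, assuming Python's bundled SQLite (3.25 or later, which is when window functions were added) and a table named products, which is my assumption rather than something from the question:

```python
import sqlite3

# Sample rows copied from the question; the table name "products" is an assumption.
ROWS = [("Basin", 500), ("Basin", 700), ("Basin", 750),
        ("Accessories", 550), ("Accessories", 700), ("Accessories", 1000),
        ("Bath", 700)]

def ordered_products(rows):
    """Sort categories by their cheapest item, then rows by price within each category."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (Category TEXT NOT NULL, Price INTEGER NOT NULL)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
    result = conn.execute("""
        SELECT Category, Price
        FROM (SELECT Category, Price,
                     MIN(Price) OVER (PARTITION BY Category) AS Lowest
              FROM products)
        ORDER BY Lowest, Category, Price
    """).fetchall()
    conn.close()
    return result

result = ordered_products(ROWS)
for category, price in result:
    print(category, price)
```

With the sample data this lists all Basin rows first, then Accessories, then Bath, each from cheapest to most expensive.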

Select...
Order by category, price desc

SELECT p.category, p.price
FROM products p
JOIN (SELECT category, MIN(price) AS mn FROM products GROUP BY category) tab1
ON p.category = tab1.category
ORDER BY tab1.mn, p.category, p.price;
Is this what you want?

I think you do not need a GROUP BY clause in your query. If I understood your goal correctly, you can replace the actual categories in your ORDER BY clause with the minimum price per category, computed in a subquery. That keeps all rows of a category together, i.e. not Basin - 500; Accessories - 550, but everything for Basin first. After that, you can order by the ordinary Price inside each category.
SELECT *
FROM products p
ORDER BY
(SELECT MIN(Price) FROM products p2 WHERE p2.Category=p.Category
),
Price;
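One caveat with ordering only by the per-category minimum and then Price: if two categories share the same cheapest price, their rows can interleave. The sketch below, assuming SQLite via Python and hypothetical categories A and B that both start at 500, compares the ordering with and without Category as an extra tie-breaker:

```python
import sqlite3

# Hypothetical rows: both categories share the same cheapest price (500).
ROWS = [("A", 500), ("A", 900), ("B", 500), ("B", 600)]

BASE_QUERY = """
    SELECT Category, Price
    FROM products p
    ORDER BY (SELECT MIN(Price) FROM products p2 WHERE p2.Category = p.Category),
             {tie_breaker}
             Price
"""

def run(tie_breaker):
    """Run the query with an optional extra ORDER BY term such as 'Category,'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (Category TEXT, Price INTEGER)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", ROWS)
    rows = conn.execute(BASE_QUERY.format(tie_breaker=tie_breaker)).fetchall()
    conn.close()
    return rows

def is_grouped(rows):
    """True if each category's rows sit next to each other."""
    seen, previous = set(), None
    for category, _ in rows:
        if category != previous:
            if category in seen:
                return False
            seen.add(category)
        previous = category
    return True

without_tie_breaker = run("")
with_tie_breaker = run("Category,")
print(is_grouped(without_tie_breaker), is_grouped(with_tie_breaker))
```

Adding Category between the minimum price and Price keeps tied categories apart, at the cost of ordering them alphabetically among themselves.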

Related

Aggregate functions on a single column only?

My learning goal: is to find how to find an ingredient and see which recipe uses any given ingredient the most.
E.g.
+------------+--------------+--------+
| Pizza | Ingredient | Amount |
+------------+--------------+--------+
| Anchovy | Anchovy | 200 |
+------------+--------------+--------+
| Meatlovers | Pepparoni | 150 |
+------------+--------------+--------+
| X pizza | X ingredient | 50 |
+------------+--------------+--------+
Through:
(a) SELECT INGREDIENT,MAX(AMOUNT) FROM RECIPE GROUP BY INGREDIENT;
Works wonderfully, however I wish to know the pizza name of the recipe.
(b) SELECT NAME,INGREDIENT,MAX(AMOUNT) FROM RECIPE GROUP BY INGREDIENT,NAME;
Doesn't work as expected - I want the name to be appended to the result set of (a). Instead, what I get is every pizza, ingredient and max amount. I'm assuming the max function is being applied per pizza as well, which I do not want. Is there a way to specify that an aggregate function be applied to only the two desired columns, leaving the third for viewing purposes only?
PostgreSQL supports window functions, so the easy way is this:
SELECT Pizza,
Ingredient,
MAX(Amount) OVER(PARTITION BY Ingredient) As MaxAmount
FROM Recipe
Reading the question again, following Damien's comment, I think that what you are asking will not get you the results you want.
In the beginning of the question, you wrote:
My learning goal: is to find how to find an ingredient and see which recipe uses any given ingredient the most.
Later you wrote:
I want the name to be appended to result set of (a)
These statements conflict.
To know which pizza uses the most of a specific ingredient, as in your first statement, use the (b) query from your question. You can order its results by ingredient and then by the MAX(AMOUNT) column in descending order, which makes it easy to see which pizza uses the most of each ingredient.
SELECT Name, Ingredient, MAX(Amount) AS MaxAmount
FROM Recipe
GROUP BY Ingredient,Name
ORDER BY Ingredient, MaxAmount DESC;
The query in my answer, however, will get you what you are asking for in your second statement: the maximum value for each ingredient, grouped only by ingredient, with the pizza name added to the result set. (In other words, it appends the pizza name to the result set of (a).)
A standard modern approach to this would be to use a window function to assign row numbers:
SELECT
*
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Ingredient ORDER BY Amount DESC) as rn
FROM
Recipe) r
where
r.rn = 1
This will arbitrarily select one row as the top row if there are multiple rows with the same highest Amount for a particular ingredient. To take more control over the query to break ties, add another ORDER BY expression within the OVER clause. In the alternative, if you wish to see all tying rows, use RANK() instead of ROW_NUMBER().
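To make the difference between ROW_NUMBER() and RANK() on ties concrete, here is a small sketch, assuming SQLite (3.25+) through Python and hypothetical rows in which two pizzas use the same top amount of pepperoni:

```python
import sqlite3

# Hypothetical rows; two pizzas tie for the most pepperoni.
ROWS = [("Anchovy", "Anchovy", 200),
        ("Meatlovers", "Pepperoni", 150),
        ("Supreme", "Pepperoni", 150),
        ("Hawaiian", "Pepperoni", 50)]

def top_rows(ranking_function):
    """Return the rows ranked first per ingredient using ROW_NUMBER or RANK."""
    if ranking_function not in ("ROW_NUMBER", "RANK"):
        raise ValueError("unsupported ranking function")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Recipe (Pizza TEXT, Ingredient TEXT, Amount INTEGER)")
    conn.executemany("INSERT INTO Recipe VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(f"""
        SELECT Pizza, Ingredient, Amount
        FROM (SELECT *,
                     {ranking_function}() OVER (PARTITION BY Ingredient
                                                ORDER BY Amount DESC) AS rn
              FROM Recipe) r
        WHERE r.rn = 1
        ORDER BY Ingredient, Pizza
    """).fetchall()
    conn.close()
    return rows

one_per_ingredient = top_rows("ROW_NUMBER")  # exactly one row per ingredient
all_ties = top_rows("RANK")                  # every row that ties for the top amount
print(one_per_ingredient)
print(all_ties)
```

ROW_NUMBER() yields one pepperoni row (which one is not defined without a further ORDER BY term), while RANK() yields both tied pizzas.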
Use a correlated subquery:
SELECT r.*
FROM RECIPE AS r
where r.AMOUNT =
( select MAX(AMOUNT)
FROM RECIPE r1 where
r1.INGREDIENT=r.INGREDIENT
GROUP BY r1.INGREDIENT
)

POSTGRESQL - Finding specific product when

I've attempted to write a query but I've not managed to get it working correctly.
I'm attempting to retrieve where a specific product has been bought but where it also has been bought with other products. In the case below, I want to find where product A01 has been bought but also when it was bought with other products.
Data (extracted from tables for illustration):
Order | Product
123456 | A01
123457 | A01
123457 | B02
123458 | C03
123459 | A01
123459 | C03
Query which will return all orders with product A01 without showing other products:
SELECT
O.NUMBER,
O.DATE,
P.NUMBER
FROM
ORDERS O
JOIN PRODUCTS P on P.ID = O.ID
WHERE
P.NUMBER = 'A01'
I've tried to create a sub query which brings back just orders of product A01 but I don't know how to place it in the query for it to return all orders containing product A01 as well as any other product ordered with it.
Any help on this would be greatly appreciated.
Thanks in advance.
You can use a conditional SUM to detect whether each "Order" group contains one or more 'A01' rows:
CREATE TABLE orders
("Order" int, "Product" varchar(3))
;
INSERT INTO orders
("Order", "Product")
VALUES
(123456, 'A01'),
(123457, 'A01'),
(123457, 'B02'),
(123458, 'C03'),
(123459, 'A01'),
(123459, 'C03')
;
SELECT "Order"
FROM orders
GROUP BY "Order"
HAVING SUM(CASE WHEN "Product" = 'A01' THEN 1 ELSE 0 END) > 0
I appreciated Juan's including the DDL to create the database on my system. By the time I saw it, I'd already done all the same work, except that I got around the reserved word problem by naming that field Order1.
Sadly, I didn't consider that either of the offered queries worked on my system. I used MySQL.
The first one returned the A01 lines of the two orders on which other products were ordered too. I took Alex's purpose to include seeing all items of all orders that included A01. (Perhaps he wants to tell future customers what other products other customers have ordered with A01, and generate sales that way.)
The second one returned the three A01 lines.
Maybe Alex wants:
select *
from orders
where Order1 in (select Order1
from orders
where Product = 'A01')
It outputs all lines of all orders that include A01. The subquery makes a list of all orders with A01. The outer query returns all lines of those orders.
In a big database, you might not want to run two queries, but this is the only way I see to get the result I understood Alex wanted. If that is what he wanted, he would have to run a second query once armed with output from the queries offered, so there's no real gain.
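The IN-subquery approach can be checked against the sample data from the question. This is a sketch only, assuming SQLite via Python; the table name order_lines and the column order_no are hypothetical names chosen to avoid the reserved word ORDER:

```python
import sqlite3

# Rows copied from the question's illustration.
ROWS = [(123456, "A01"), (123457, "A01"), (123457, "B02"),
        (123458, "C03"), (123459, "A01"), (123459, "C03")]

def lines_of_orders_containing(product):
    """Return every line of every order that contains the given product."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE order_lines (order_no INTEGER, product TEXT)")
    conn.executemany("INSERT INTO order_lines VALUES (?, ?)", ROWS)
    rows = conn.execute("""
        SELECT order_no, product FROM order_lines
        WHERE order_no IN (SELECT order_no FROM order_lines WHERE product = ?)
        ORDER BY order_no, product
    """, (product,)).fetchall()
    conn.close()
    return rows

result = lines_of_orders_containing("A01")
for order_no, product in result:
    print(order_no, product)
```

Order 123458 is left out because it contains no A01 line; the other three orders are returned with all of their lines.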
Good discussion. Thanks to all!
Use a GROUP BY clause along with HAVING, like:
select "order"
from data
group by "order"
having count(distinct product) > 1;

SQL SUM with Repeating Sub Entries - Best Practice?

I hit this issue regularly but here is an example....
I have Order and Delivery tables. Each order can have one to many deliveries.
I need to report totals based on the Order Table but also show deliveries line by line.
I can write the SQL and associated Access Report for this with ease ....
SELECT xxx
FROM
Order
LEFT OUTER JOIN
Delivery on Delivery.OrderNO = Order.OrderNo
until I get to the summing element. I obviously only want to sum each Order once, not the 1-many times there are deliveries for that order.
e.g. The SQL might return the following based on 2 Orders (ignore the banalness of the report, this is very much simplified)
Region OrderNo Value Delivery Date
North 1 £100 12-04-2012
North 1 £100 14-04-2012
North 2 £73 01-05-2012
North 2 £73 03-05-2012
North 2 £73 07-05-2012
South 3 £50 23-04-2012
I would want to report:
Total Sales North - £173
Delivery 12-04-2012
Delivery 14-04-2012
Delivery 01-05-2012
Delivery 03-05-2012
Delivery 07-05-2012
Total Sales South - £50
Delivery 23-04-2012
The bit I'm referring to is the calculation of the £173 and £50, the first of which obviously shouldn't be £419!
In the past I've used things like MAX (for a given Order) but that seems like a fudge.
Surely there must be a regular answer to this seemingly common problem but I can't find one.
I don't necessarily need the code - just a helpful point in the right direction.
Many thanks,
Chris.
A ROLLUP operator may not look pretty. However, it would compute the regular aggregates that you see now, and it would also show the subtotals of the orders. This is what you're looking for.
SELECT xxx
FROM
Order
LEFT OUTER JOIN
Delivery on Delivery.OrderNO = Order.OrderNo
GROUP BY xxx
WITH ROLLUP;
I'm not exactly sure how the rest of your query is set up, but it would look something like this:
Region OrderNo Value Delivery Date
North 1 £100 12-04-2012
North 1 £100 14-04-2012
North 2 £73 01-05-2012
North 2 £73 03-05-2012
North 2 £73 07-05-2012
NULL NULL £419 NULL
I believe what you want is called a windowing function for your aggregate operation. It looks like the following:
SELECT xxx, SUM(Value) OVER (PARTITION BY Order.Region) as OrderTotal
FROM
Order
LEFT OUTER JOIN
Delivery on Delivery.OrderNO = Order.OrderNo
Here's the MSDN article. The PARTITION BY tells the SUM to be done separately for each distinct Order.Region.
Edit: I just noticed that I missed what you said about orders being counted multiple times. One thing you could do is SUM() the values before joining, as a CTE (guessing at your schema a bit):
WITH RegionOrders AS (
SELECT Region, OrderNo, Value, SUM(Value) OVER (PARTITION BY Region) AS RegionTotal
FROM Order
)
SELECT Region, OrderNo, Value, DeliveryDate, RegionTotal
FROM RegionOrders RO
INNER JOIN Delivery D on D.OrderNo = RO.OrderNo
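The double counting, and the effect of totalling before the join, can be reproduced with the figures from the question. This is a sketch under assumptions: it uses SQLite via Python, the tables are named orders and delivery (Order is a reserved word), and the column names are guesses at the schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (Region TEXT, OrderNo INTEGER, Value INTEGER);
    CREATE TABLE delivery (OrderNo INTEGER, DeliveryDate TEXT);
    INSERT INTO orders VALUES ('North', 1, 100), ('North', 2, 73), ('South', 3, 50);
    INSERT INTO delivery VALUES
        (1, '2012-04-12'), (1, '2012-04-14'),
        (2, '2012-05-01'), (2, '2012-05-03'), (2, '2012-05-07'),
        (3, '2012-04-23');
""")

# Summing after the join counts each order once per delivery.
naive = dict(conn.execute("""
    SELECT o.Region, SUM(o.Value)
    FROM orders o LEFT JOIN delivery d ON d.OrderNo = o.OrderNo
    GROUP BY o.Region
""").fetchall())

# Totalling per region before the join keeps each order's value counted once.
detail = conn.execute("""
    WITH region_totals AS (
        SELECT Region, SUM(Value) AS RegionTotal FROM orders GROUP BY Region
    )
    SELECT o.Region, rt.RegionTotal, o.OrderNo, d.DeliveryDate
    FROM orders o
    JOIN region_totals rt ON rt.Region = o.Region
    LEFT JOIN delivery d ON d.OrderNo = o.OrderNo
    ORDER BY o.Region, o.OrderNo, d.DeliveryDate
""").fetchall()
conn.close()

print(naive)
for row in detail:
    print(row)
```

The naive query reports 419 for North, while the pre-aggregated total carried on every delivery row is 173 for North and 50 for South.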

JavaDB: get ordered records in the subquery

I have the following "COMPANY_BY_NEWS_REPUTATION" table in my JavaDB database (this is some random data just to represent the structure):
COMPANY | NEWS_HASH | REPUTATION | DATE
-------------------------------------------------------------------
Company A | 14676757 | 0.12345 | 2011-05-19 15:43:28.0
Company B | 454564556 | 0.78956 | 2011-05-24 18:44:28.0
Company C | 454564556 | 0.78956 | 2011-05-24 18:44:28.0
Company A | -7874564 | 0.12345 | 2011-05-19 15:43:28.0
One news_hash may relate to several companies while a company can relate to several news_hashes as well. Reputation and date are bound to the news_hash.
What I need to do is calculate the average reputation of the last 5 news items for every company. To do that, I feel that I need to use 'order by' and 'offset' in a subquery, as shown in the code below.
select COMPANY, avg(REPUTATION) from
(select * from COMPANY_BY_NEWS_REPUTATION order by "DATE" desc
offset 0 rows fetch next 5 row only) as TR group by COMPANY;
However, JavaDB allows neither ORDER BY, nor OFFSET in a subquery. Could anyone suggest a working solution for my problem please?
Which version of JavaDB are you using? According to the chapter TableSubquery in the JavaDB documentation, table subqueries do support order by and fetch next, at least in version 10.6.2.1.
Given that subqueries can be ordered and the size of the result set can be limited, the following (untested) query might do what you want:
select COMPANY, (select avg(REPUTATION)
from (select REPUTATION
from COMPANY_BY_NEWS_REPUTATION
where COMPANY = TR.COMPANY
order by DATE desc
fetch first 5 rows only))
from (select distinct COMPANY
from COMPANY_BY_NEWS_REPUTATION) as TR
This query retrieves all distinct company names from COMPANY_BY_NEWS_REPUTATION, then retrieves the average of the last five reputation rows for each company. I have no idea whether it will perform well enough; that will likely depend on the size of your data set and which indexes you have in place.
If you have a list of unique company names in another table, you can use that instead of the select distinct ... subquery to retrieve the companies for which to calculate averages.
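For comparison, the same "average of the latest five rows per group" result can be expressed with a window function where the database supports one. The sketch below is not JavaDB code: it assumes SQLite (3.25+) via Python, and the table news, its columns, and the sample rows are all hypothetical. Whether a given JavaDB version accepts ROW_NUMBER() with PARTITION BY is not something I am assuming here.

```python
import sqlite3

# Hypothetical rows: Company A has six news items, Company B has two.
ROWS = [("Company A", f"2011-05-0{day}", float(day)) for day in range(1, 7)]
ROWS += [("Company B", "2011-05-01", 0.5), ("Company B", "2011-05-02", 1.5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (company TEXT, news_date TEXT, reputation REAL)")
conn.executemany("INSERT INTO news VALUES (?, ?, ?)", ROWS)

# Number each company's rows from newest to oldest, then average the first five.
averages = dict(conn.execute("""
    SELECT company, AVG(reputation)
    FROM (SELECT company, reputation,
                 ROW_NUMBER() OVER (PARTITION BY company
                                    ORDER BY news_date DESC) AS rn
          FROM news)
    WHERE rn <= 5
    GROUP BY company
""").fetchall())
conn.close()

print(averages)
```

Company A's oldest item is excluded from its average, and Company B, which has fewer than five items, is averaged over the rows it has.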

SQL Select: picking the right distinct record based on another field

How does one filter a list of records to remove those that have some identical fields, based on selecting the one with the minimum value in another field? Note that it's not sufficient to just get the minimum value... I need to have other fields from the same record.
I have a table of "products", and I am trying to add the ability to apply a coupon code. Because of how the invoices are generated, selling a product at a different cost is considered a different product. In the database you might see this:
Product ID, Product Cost, Product Name, Coupon Code
1, 20, Product1, null
2, 10, Product1, COUPON1
3, 40, Product2, null
I have a query that selects a list of all products available now (based on other criteria; I'm simplifying this a lot). The problem is that, for the above case, my query returns:
1 - Product1 for $20
2 - Product1 for $10
3 - Product2 for $40
This gets shown to the customer (assuming they've entered the coupon code), and it's obviously bad form to show a customer the same product for two prices. What I want is:
2 - Product1 for $10
3 - Product2 for $40
i.e., showing the lowest-costing version of each product.
I need a solution that will work for MySQL, but the preferred solution would be standard SQL.
Try this:
SELECT T2.*
FROM
(
SELECT `Product Name` AS name, MIN(`Product Cost`) AS cost
FROM products
GROUP BY `Product Name`
) T1
JOIN products T2
ON T1.name = T2.`Product Name`
AND T1.cost = T2.`Product Cost`
To get the output as a string exactly as you described, replace the first line with:
SELECT CONCAT(`Product ID`, ' - ', T1.name, ' for $', T1.cost)
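The join against the per-name minimum can be checked with the three sample rows from the question. This is a minimal sketch, assuming SQLite via Python; the column names product_id, cost, name and coupon_code are my stand-ins for the original columns with spaces, and the string formatting is done in Python rather than with CONCAT:

```python
import sqlite3

# Rows copied from the question; column names are simplified stand-ins.
ROWS = [(1, 20, "Product1", None),
        (2, 10, "Product1", "COUPON1"),
        (3, 40, "Product2", None)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_id INTEGER, cost INTEGER, name TEXT, coupon_code TEXT)"
)
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", ROWS)

# Find the minimum cost per product name, then join back to get the full record.
cheapest = conn.execute("""
    SELECT t2.product_id, t2.name, t2.cost
    FROM (SELECT name, MIN(cost) AS cost FROM products GROUP BY name) t1
    JOIN products t2 ON t1.name = t2.name AND t1.cost = t2.cost
    ORDER BY t2.product_id
""").fetchall()
conn.close()

labels = [f"{pid} - {name} for ${cost}" for pid, name, cost in cheapest]
print(labels)
```

Note that if two records of the same product share the same minimum cost, this join returns both of them.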