How do you use two SUM() aggregate functions in the same query for PostgreSQL?

I have a PostgreSQL query that yields the following results:
SELECT o.order || '-' || osh.ordinal_number AS order,
o.company,
o.order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
o.company,
o.order_total,
o.order_type;
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | null | 125.00 | new
123-2 | B corp. | null | 100.00 | new
I need to replace the o.order_total (it doesn't work properly) and sum up the sum of the order_shipment_total column so that, for the example above, each row winds up saying 225.00. I need the results above to look like this below:
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | 225.00 | 125.00 | new
123-2 | B corp. | 225.00 | 100.00 | new
What I've Tried
1.) To replace o.order_total, I've tried SUM(SUM(osh.items)) but get the error message that you cannot nest aggregate functions.
2.) I've tried to put the entire query as a subquery and sum the order_shipment_total column, but when I do, it just repeats the column itself. See below:
SELECT order,
company,
SUM(order_shipment_total) AS order_shipment_total,
order_shipment_total,
order_type
FROM (
SELECT o.order || '-' || osh.ordinal_number AS order,
o.company,
o.order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
o.company,
o.order_total,
o.order_type
) subquery
GROUP BY order,
company,
order_shipment_total,
order_type;
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | 125.00 | 125.00 | new
123-2 | B corp. | 100.00 | 100.00 | new
3.) I've tried to only include the columns I actually want to group by in my subquery/query example above, because I feel like I was able to do this in Oracle SQL. But when I do that, I get an error saying "column [name] must appear in the GROUP BY clause or be used in an aggregate function."
...
GROUP BY order,
company,
order_type;
ERROR: column "[a column name]" must appear in the GROUP BY clause or be used in an aggregate function.
How do I accomplish this? I was certain that a subquery would be the answer but I'm confused as to why this approach will not work.

What you're not quite grasping is that you want two different levels of grouping in the same result rows. The subquery approach is half right, but when a grouping query wraps a grouping subquery, the outer query can only use the data the subquery has already produced: it can keep that data at its current level of aggregation, or it can give up detail by grouping further. It can't keep the detail AND lose the detail in order to sum up further. A grouped query over a grouped subquery is therefore, in practical terms, fairly pointless, because you might as well group to the level you want in one hit:
SELECT groupkey1, sum(sumx) FROM
(SELECT groupkey1, groupkey2, sum(x) as sumx FROM table GROUP BY groupkey1, groupkey2) AS sub
GROUP BY groupkey1
Is the same as:
SELECT groupkey1, sum(x) FROM
table
GROUP BY groupkey1
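A quick way to convince yourself of this equivalence is to run both forms against a tiny table. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the table t and its rows are made up purely for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (groupkey1 TEXT, groupkey2 TEXT, x INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", "p", 1), ("a", "p", 2), ("a", "q", 3), ("b", "p", 4)],
)

# Group to the fine level in a subquery, then group again to the coarse level.
two_step = conn.execute(
    """
    SELECT groupkey1, SUM(sumx)
    FROM (SELECT groupkey1, groupkey2, SUM(x) AS sumx
          FROM t
          GROUP BY groupkey1, groupkey2) AS sub
    GROUP BY groupkey1
    ORDER BY groupkey1
    """
).fetchall()

# Group straight to the coarse level.
one_step = conn.execute(
    "SELECT groupkey1, SUM(x) FROM t GROUP BY groupkey1 ORDER BY groupkey1"
).fetchall()

print(two_step)
print(one_step)
```

Both queries give the same per-groupkey1 totals, which is why the extra grouping layer buys nothing on its own.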
Gordon's answer will probably work out (except for the same bug yours exhibits in that the grouping set is wrong/doesn't cover all the columns) but it probably doesn't help much in terms of your understanding because it's a code-only answer. Here's a breakdown of how you need to approach this problem but with simpler data and foregoing the window functions in favor of what you already know.
Suppose there are apples and melons, of different types, in stock. You want a query that gives a total of each specific kind of fruit, regardless of the date of purchase. You also want a column with the overall total for each fruit:
Detail:
fruit | type | purchasedate | count
apple | golden delicious | 2017-01-01 | 3
apple | golden delicious | 2017-01-02 | 4
apple | granny smith | 2017-01-04 | 2
melon | honeydew | 2017-01-01 | 1
melon | cantaloupe | 2017-01-05 | 4
melon | cantaloupe | 2017-01-06 | 2
So that's 7 golden delicious, 2 granny smith, 1 honeydew, 6 cantaloupe, and it's also 9 apples and 7 melons.
You can't do it as one query*, because you want two different levels of grouping. You have to do it as two queries and then (critical understanding point) you have to join the less-precise (apples/melons) results back to the more precise (granny smith/golden delicious/honeydew/cantaloupe) ones:
SELECT * FROM
(
SELECT fruit, type, sum(count) as fruittypecount
FROM fruit
GROUP BY fruit, type
) fruittypesum
INNER JOIN
(
SELECT fruit, sum(count) as fruitcount
FROM fruit
GROUP BY fruit
) fruitsum
ON
fruittypesum.fruit = fruitsum.fruit
You'll get this:
fruit | type | fruittypecount | fruit | fruitcount
apple | golden delicious | 7 | apple | 9
apple | granny smith | 2 | apple | 9
melon | honeydew | 1 | melon | 7
melon | cantaloupe | 6 | melon | 7
Hence for your query, different groups, detail and summary:
SELECT
detail.order || '-' || detail.ordinal_number as order,
detail.company,
summary.order_total,
detail.order_shipment_total,
detail.order_type
FROM (
SELECT o.order,
osh.ordinal_number,
o.company,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
osh.ordinal_number,
o.company,
o.order_type
) detail
INNER JOIN
(
SELECT o.order,
SUM(osh.items) AS order_total
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
--don't need the where clause; we'll join on order number
GROUP BY o.order
) summary
ON
summary.order = detail.order
Gordon's query uses a window function to achieve the same effect; the window function runs after the grouping is done, and it establishes another level of grouping (PARTITION BY ordernumber) which is the effective equivalent of my GROUP BY ordernumber in the summary. The window function summary data is inherently connected to the detail data via ordernumber; it is implicit that a query saying:
SELECT
ordernumber,
lineitemnumber,
SUM(amount) linetotal,
sum(SUM(amount)) over(PARTITION BY ordernumber) ordertotal
GROUP BY
ordernumber,
lineitemnumber
..will have an ordertotal that is the total of all the linetotal values in the order. The GROUP BY prepares the data to the line-level detail, and the window function prepares data to just the order level, repeating the total as many times as necessary to fill in for every line item. I wrote the SUM that belongs to the GROUP BY operation in capitals; the sum in lowercase belongs to the partition operation. It has to be sum(SUM(amount)) and cannot simply be sum(amount), because amount as a column is not allowed on its own: it's not in the GROUP BY. Because amount has to be SUMmed for the GROUP BY to work, we have to sum(SUM(amount)) for the partition to run (it runs after the GROUP BY is done).
It behaves exactly the same as grouping to two different levels and joining together, and indeed I chose that way to explain it because it makes it clearer how it works in relation to what you already know about groups and joins.
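To see that the two approaches really do line up, here is a small sketch that loads the fruit rows from above into an in-memory SQLite database (standing in for PostgreSQL, and assuming SQLite 3.25 or newer for window function support) and runs both the join-two-groupings form and the sum(SUM()) OVER form. The aliases d and s are just illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fruit (fruit TEXT, type TEXT, purchasedate TEXT, count INTEGER)"
)
conn.executemany(
    "INSERT INTO fruit VALUES (?, ?, ?, ?)",
    [
        ("apple", "golden delicious", "2017-01-01", 3),
        ("apple", "golden delicious", "2017-01-02", 4),
        ("apple", "granny smith", "2017-01-04", 2),
        ("melon", "honeydew", "2017-01-01", 1),
        ("melon", "cantaloupe", "2017-01-05", 4),
        ("melon", "cantaloupe", "2017-01-06", 2),
    ],
)

# Detail grouping joined back to the coarser summary grouping.
join_rows = conn.execute(
    """
    SELECT d.fruit, d.type, d.fruittypecount, s.fruitcount
    FROM (SELECT fruit, type, SUM(count) AS fruittypecount
          FROM fruit GROUP BY fruit, type) AS d
    JOIN (SELECT fruit, SUM(count) AS fruitcount
          FROM fruit GROUP BY fruit) AS s
      ON d.fruit = s.fruit
    ORDER BY d.fruit, d.type
    """
).fetchall()

# Same result from one grouped query plus a window function over the groups.
window_rows = conn.execute(
    """
    SELECT fruit, type,
           SUM(count) AS fruittypecount,
           SUM(SUM(count)) OVER (PARTITION BY fruit) AS fruitcount
    FROM fruit
    GROUP BY fruit, type
    ORDER BY fruit, type
    """
).fetchall()

for row in window_rows:
    print(row)
```

Each detail row carries its own subtotal plus the repeated per-fruit total, whichever form you pick.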
Remember: JOINs make datasets grow sideways, UNIONs make them grow downwards. When you have some detail data and you want to grow it sideways with some more data (a summary), JOIN it on. (If you'd wanted totals to go at the bottom of each column, it would be unioned on.)
*you can do it as one query (without window functions), but it can get awfully confusing because it requires all sorts of trickery that ultimately isn't worth it because it's too hard to maintain

You should be able to use window functions:
SELECT o.order || '-' || osh.ordinal_number AS order, o.company,
SUM(SUM(osh.items)) OVER (PARTITION BY o.order) as order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o JOIN
order_shipments osh
ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order, o.company, o.order_type;

Related

Sql query group by question, creating monthly averages

I am trying to compile a table which shows three columns:
product name, average monthly sales volume and average monthly sales price for said product.
I am using adventureworks 2019.
I have written the below query based upon some help I previously received. I have summed the order quantity and unit price for each product and then in the outer query averaged them.
SELECT
Product_Name,
AVG(Sales_Volume) AS Avg_Sales_Volume,
AVG(Price) AS Avg_Price
FROM
(SELECT
PP.[Name] AS Product_Name,
SUM(SSOD.[OrderQty]) AS Sales_Volume,
SUM(SSOD.[UnitPrice]) AS Price,
FORMAT(SSOH.[OrderDate], 'MM-yyyy') AS Month_Year
FROM
[Sales].[SalesOrderHeader] AS SSOH
LEFT JOIN Sales.SalesOrderDetail AS SSOD
ON SSOH.SalesOrderID = SSOD.SalesOrderID
LEFT JOIN production.product AS PP
ON SSOD.ProductID = PP.ProductID
GROUP BY PP.[Name], SSOH.[OrderDate]) AS T
--WHERE Product_Name = 'Road-150 Red, 44' (doing this for reference)
GROUP BY T.Product_Name, Month-Year
If I group by Month-Year I get around 3600 rows, whereas if I don't, I get 266. I am confused about this. Basically, I am not sure what it is actually showing.
To some of you this may seem really basic, but it feels like I cannot get my head around it.
Could anyone take a bit of time to explain this to me?
Thanks
Product_Name Avg_Sales_Volume Avg_Price
LL Mountain Frame - Silver, 48 14 844.96
LL Touring Frame - Blue, 50 26 2100.546
Women's Mountain Shorts, L 13 161.494
Road-550-W Yellow, 44 3 1890.7332
HL Road Frame - Red, 48 18 6025.3137
Mountain-500 Silver, 42 8 1395.0116
(266 rows)
Query with group by Month-Year also:
Product_Name Avg_Sales_Volume Avg_Price
Road-150 Red, 44 1 6758.9544
HL Mountain Frame - Silver, 46 15 4465.6362
AWC Logo Cap 14 76.4672
Long-Sleeve Logo Jersey, L 2 102.611
Road-150 Red, 56 2 6817.546
Mountain-500 Silver, 52 13 2118.7125
LL Touring Frame - Yellow, 62 62 5001.30
ML Mountain Frame-W - Silver, 40 115 6546.3382
(3862 rows)
A few notes first, then an explanation of what you're seeing.
First, your outer query has GROUP BY Month-Year with a dash, but I believe this is intended to be the column Month_Year (with an underscore) from the subquery.
Next, I like your use of AS to explicitly define aliases on columns, as well as your use of square brackets [ and ] around object names rather than quoted identifiers.
Finally, start getting in the practice of using schema- and alias-identifiers throughout your code - especially when working with subqueries. They'll make it much more clear which objects should be returned, and will prevent "ambiguous column" errors.
Explanation:
The difference in row-counts returned for each of your two queries is due to how those queries instruct the grouping to occur.
GROUP BY T.Product_Name; returns 266 rows because it is aggregating all of the sales volume and pricing information for the given product across all time. Changing up your subquery a bit to read:
SELECT COUNT(DISTINCT pp.Name)
FROM [Sales].[SalesOrderHeader] AS SSOH
LEFT JOIN Sales.SalesOrderDetail AS SSOD
ON SSOH.SalesOrderID = SSOD.SalesOrderID
LEFT JOIN production.product AS PP
ON SSOD.ProductID = PP.ProductID
Results in a count of 266 being returned (as a single row), indicating that there are 266 distinct product names included in that set.
When you add more grouping conditions (such as T.Month_Year), you are telling the engine to make "subgroups" in the aggregation structure.
GROUP BY T.Product_Name, T.Month_Year runs your aggregates for each of those distinct 266 product names as well as aggregating the data for each distinct T.Month_Year value that appears in each of your 266 product name groups.
Examining those row-counts a bit closer, the two-condition grouping returns 3,862 rows while the single-condition grouping returns 266 rows. Across those 3,862 rows there are 266 distinct product names, each represented an average of 14.52 times (3862 / 266 = 14.518...). If you assumed that every product had at least one sale per month, then you might conclude that we are looking at slightly more than one year's worth of sales data here. More likely, this is a set of several years of sales data with a lot of variation in sales volume and frequency between products.
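The effect of an extra grouping column on the row count is easy to reproduce on a toy table. The sketch below uses Python's sqlite3 module rather than SQL Server, and the sales table, its columns and its rows are invented for illustration; only the grouping behaviour is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, month_year TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("cap", "01-2013", 2),
        ("cap", "01-2013", 3),
        ("cap", "02-2013", 1),
        ("stand", "01-2013", 4),
        ("stand", "03-2013", 5),
        ("stand", "03-2013", 1),
    ],
)

# One output row per distinct product.
by_product = conn.execute(
    "SELECT product, SUM(qty) FROM sales GROUP BY product ORDER BY product"
).fetchall()

# One output row per distinct (product, month_year) pair.
by_product_month = conn.execute(
    """
    SELECT product, month_year, SUM(qty)
    FROM sales
    GROUP BY product, month_year
    ORDER BY product, month_year
    """
).fetchall()

print(len(by_product), by_product)
print(len(by_product_month), by_product_month)
```

Two products appear in two months each, so the second query returns four rows where the first returns two.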
ADDENDUM: Adding your GROUP BY columns to the SELECT illustrates the difference in result sets:
SELECT Product_Name,
T.Month_Year,
AVG(Sales_Volume) AS Avg_Sales_Volume,
AVG(Price) AS Avg_Price
FROM (SELECT PP.[Name] AS Product_Name,
SUM(SSOD.[OrderQty]) AS Sales_Volume,
SUM(SSOD.[UnitPrice]) AS Price,
FORMAT(SSOH.[OrderDate], 'MM-yyyy') AS Month_Year
FROM [Sales].[SalesOrderHeader] AS SSOH
LEFT JOIN Sales.SalesOrderDetail AS SSOD
ON SSOH.SalesOrderID = SSOD.SalesOrderID
LEFT JOIN production.product AS PP
ON SSOD.ProductID = PP.ProductID
GROUP BY PP.[Name], SSOH.[OrderDate]) AS T
--WHERE Product_Name = 'Road-150 Red, 44' (doing this for reference)
GROUP BY T.Product_Name,
T.Month_Year
ORDER BY Product_Name;
Examining the results shows that each product name also has records for any month in which that product sold:
+--------------------------+-----------+-------+-----------+
| ProductName | Month_Year|Avg_Vol| Avg_Price |
+--------------------------+-----------+-------+-----------+
| All-Purpose Bike Stand | 12-2013 | 1 | 193.0714 |
| All-Purpose Bike Stand | 06-2014 | 1 | 218.625 |
| All-Purpose Bike Stand | 05-2014 | 1 | 187.909 |
| All-Purpose Bike Stand | 10-2013 | 1 | 212.00 |
| AWC Logo Cap | 02-2014 | 6 | 57.7928 |
| AWC Logo Cap | 02-2012 | 48 | 93.357 |
| AWC Logo Cap | 08-2011 | 68 | 103.73 |
| AWC Logo Cap | 01-2013 | 124 | 129.4896 |
| AWC Logo Cap | 03-2014 | 21 | 71.1747 |
+--------------------------+-----------+-------+-----------+

Why when using MIN function and selecting another column, we require GROUP BY clause? Doesn't MIN return single record?

I have two tables vehicles and dealership_vehicles. The dealership_vehicles table has a price column. The vehicles table has a column dealership_vehicle_id which relates to the dealership_vehicles id column in the dealership_vehicles table.
I wanted to return just the vehicle make of the cheapest car.
Why is it that the following query:
select
vehicles.make,
MIN(dealership_vehicles.price)
from
vehicles inner join dealership_vehicles
on vehicles.dealership_vehicle_id=dealership_vehicles.id;
Returns the error:
column "vehicles.make" must appear in the GROUP BY clause or be used in an aggregate function
Since MIN function returns a single value it is plausible that SQL query can be constructed that will return a single value without needing GROUP BY.
You say you want to know the make of the cheapest car. The easiest way to do this is
SELECT DISTINCT v.MAKE
FROM VEHICLES v
INNER JOIN DEALERSHIP_VEHICLES dv
ON v.DEALERSHIP_VEHICLE_ID = dv.ID
WHERE dv.PRICE = (SELECT MIN(PRICE) FROM DEALERSHIP_VEHICLES);
Note that because multiple vehicles might have the "cheapest" price it's entirely possible you'll get multiple returns from the above query.
Best of luck.
EDIT
Another way to do it is to take the minimum price by make, sort by the minimum price, and then take just the first row. (ROWNUM is Oracle syntax; in PostgreSQL you would give the subquery an alias and use LIMIT 1 instead.) Something like
SELECT *
FROM (SELECT v.MAKE, MIN(dv.PRICE)
FROM VEHICLES v
INNER JOIN DEALERSHIP_VEHICLES dv
ON v.DEALERSHIP_VEHICLE_ID = dv.ID
GROUP BY v.MAKE
ORDER BY MIN(dv.PRICE) ASC)
WHERE ROWNUM = 1;
Think of the term "GROUP BY" as "for each." It's saying "Give me the MIN of dealership_vehicles.price for each vehicles.make"
So you will need to change your query to:
select
vehicles.make,
MIN(dealership_vehicles.price)
from
vehicles inner join dealership_vehicles
on vehicles.dealership_vehicle_id=dealership_vehicles.id
Group by vehicles.make;
If you want the make of the cheapest car, then no aggregation is needed:
select v.make, dv.price
from vehicles v inner join
dealership_vehicles dv
on v.dealership_vehicle_id = dv.id
order by dv.price asc
fetch first 1 row only;
This gets a little more complicated if you want all rows in the case of ties:
select v.*
from (select v.make, dv.price, rank() over (order by price asc) as seqnum
from vehicles v inner join
dealership_vehicles dv
on v.dealership_vehicle_id = dv.id
) v
where seqnum = 1
So let's say after we join in the price we have the following table (i.e. stored in a #temp table):
#temp Vehicles table:
| Make | Model | Price |
|--------|-------|----------|
| Toyota | Yaris | 5000.00 |
| Toyota | Camry | 10000.00 |
| Ford | Focus | 7500.00 |
If you query it for minimum price without specifying what you're grouping by, then only one minimum function is applied across all of the rows. Example:
select min(Price) from #temp
will return you a single value of 5000.00
If you want to know the make of the cheapest car, you need to filter your results by the cheapest price; it's a two-step process. First you find the cheapest price using min, then, in a separate query, you find out which cars are at that price. Once you construct your query correctly, you will notice that this reveals something you might not have thought of: you can actually have more than one cheapest make.
Example table:
#temp Vehicles table v2:
| Make | Model | Price |
|--------|--------|----------|
| Toyota | Yaris | 5000.00 |
| Toyota | Camry | 10000.00 |
| Ford | Focus | 7500.00 |
| Ford | Escort | 5000.00 |
query:
select * from #temp
where Price = (select min(Price) from #temp)
result:
| Make | Model | Price |
|--------|--------|----------|
| Toyota | Yaris | 5000.00 |
| Ford | Escort | 5000.00 |
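For anyone who wants to run this, here is a sketch of the same four-row table in an in-memory SQLite database (an assumption; the thread itself targets other engines, and RANK() needs SQLite 3.25 or newer). It compares the MIN-subquery filter with the rank() approach from the earlier answer. The flattened vehicles table and the alias ranked are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicles (make TEXT, model TEXT, price REAL)")
conn.executemany(
    "INSERT INTO vehicles VALUES (?, ?, ?)",
    [
        ("Toyota", "Yaris", 5000.00),
        ("Toyota", "Camry", 10000.00),
        ("Ford", "Focus", 7500.00),
        ("Ford", "Escort", 5000.00),
    ],
)

# Step 1 (inner query): the single cheapest price. Step 2: every row at that price.
by_subquery = conn.execute(
    """
    SELECT make, model, price
    FROM vehicles
    WHERE price = (SELECT MIN(price) FROM vehicles)
    ORDER BY make
    """
).fetchall()

# Same idea with a window function: tied rows all receive rank 1.
by_rank = conn.execute(
    """
    SELECT make, model, price
    FROM (SELECT make, model, price,
                 RANK() OVER (ORDER BY price ASC) AS seqnum
          FROM vehicles) AS ranked
    WHERE seqnum = 1
    ORDER BY make
    """
).fetchall()

print(by_subquery)
```

Both forms keep the tie, so two makes come back rather than one.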

Multi-Table Invoice SUM Comparison

Say I have 3 tables in a rails app:
invoices
id | customer_id | employee_id | notes
---------------------------------------------------------------
1 | 1 | 5 | An order with 2 items.
2 | 12 | 5 | An order with 1 item.
3 | 17 | 12 | An empty order.
4 | 17 | 12 | A brand new order.
invoice_items
id | invoice_id | price | name
---------------------------------------------------------
1 | 1 | 5.35 | widget
2 | 1 | 7.25 | thingy
3 | 2 | 1.25 | smaller thingy
4 | 2 | 1.25 | another smaller thingy
invoice_payments
id | invoice_id | amount | method | notes
---------------------------------------------------------
1 | 1 | 4.85 | credit card | Not enough
2 | 1 | 1.25 | credit card | Still not enough
3 | 2 | 1.25 | check | Paid in full
This represents 4 orders:
The first has 2 items, for a total of 12.60. It has two payments, for a total paid amount of 6.10. This order is partially paid.
The second has only one item, and one payment, both totaling 1.25. This order is paid in full.
The third order has no items or payments. This is important to us, sometimes we use this case. It is considered paid in full as well.
The final order has one item again, for a total of 1.25, but no payments as of yet.
Now I need a query:
Show me all orders that are not paid in full yet; that is, all orders such that the total of the items is greater than the total of the payments.
I can do it in pure sql:
SELECT invoices.*,
invoice_payment_amounts.amount_paid AS amount_paid,
invoice_item_amounts.total_amount AS total_amount
FROM invoices
LEFT JOIN (
SELECT invoices.id AS invoice_id,
COALESCE(SUM(invoice_payments.amount), 0) AS amount_paid
FROM invoices
LEFT JOIN invoice_payments
ON invoices.id = invoice_payments.invoice_id
GROUP BY invoices.id
) AS invoice_payment_amounts
ON invoices.id = invoice_payment_amounts.invoice_id
LEFT JOIN (
SELECT invoices.id AS invoice_id,
COALESCE(SUM(invoice_items.price), 0) AS total_amount
FROM invoices
LEFT JOIN invoice_items
ON invoices.id = invoice_items.invoice_id
GROUP BY invoices.id
) AS invoice_item_amounts
ON invoices.id = invoice_item_amounts.invoice_id
WHERE amount_paid < total_amount
But...now I need to get that into rails (probably as a scope). I can use find_by_sql, but that then returns an array, rather than an ActiveRecord::Relation, which is not what I need, since I want to chain it with other scopes (there is, for example, an overdue scope, which uses this), etc.
So raw SQL probably isn't the right way to go here.....but what is? I've not been able to do this in activerecord's query language.
The closest I've gotten so far was this:
Invoice.select('invoices.*, SUM(invoice_items.price) AS total, SUM(invoice_payments.amount) AS amount_paid').
joins(:invoice_payments, :invoice_items).
group('invoices.id').
where('amount_paid < total')
But that fails, since on orders like #1, with multiple payments, it incorrectly doubles the price of the order (due to multiple joins), showing it as still unpaid. I had the same problem in SQL, which is why I structured it in the way I did.
Any thoughts here?
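The doubling described above is a join fan-out, and it can be reproduced in isolation. This sketch uses Python's sqlite3 module with only invoice 1 from the sample data; the *_cents integer columns are a hypothetical simplification to avoid floating-point noise, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE invoices (id INTEGER PRIMARY KEY);
    CREATE TABLE invoice_items (invoice_id INTEGER, price_cents INTEGER);
    CREATE TABLE invoice_payments (invoice_id INTEGER, amount_cents INTEGER);
    INSERT INTO invoices VALUES (1);
    INSERT INTO invoice_items VALUES (1, 535), (1, 725);
    INSERT INTO invoice_payments VALUES (1, 485), (1, 125);
    """
)

# Joining both child tables at once pairs every item with every payment
# (2 x 2 = 4 rows), so each SUM counts its values twice.
naive = conn.execute(
    """
    SELECT i.id, SUM(it.price_cents), SUM(p.amount_cents)
    FROM invoices i
    JOIN invoice_items it ON it.invoice_id = i.id
    JOIN invoice_payments p ON p.invoice_id = i.id
    GROUP BY i.id
    """
).fetchall()

# Aggregating each child table first yields one row per invoice before joining.
pre_aggregated = conn.execute(
    """
    SELECT i.id, it.total, p.paid
    FROM invoices i
    LEFT JOIN (SELECT invoice_id, SUM(price_cents) AS total
               FROM invoice_items GROUP BY invoice_id) it
           ON it.invoice_id = i.id
    LEFT JOIN (SELECT invoice_id, SUM(amount_cents) AS paid
               FROM invoice_payments GROUP BY invoice_id) p
           ON p.invoice_id = i.id
    """
).fetchall()

print(naive)
print(pre_aggregated)
```

The naive form reports twice the real totals, while the pre-aggregated form matches the 12.60 and 6.10 figures from the question.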
You can get your results using the GROUP BY and HAVING clauses of MySQL:
Pure MySQL Query:
SELECT `invoices`.* FROM `invoices`
INNER JOIN `invoice_items` ON
`invoice_items`.`invoice_id` = `invoices`.`id`
INNER JOIN `invoice_payments` ON
`invoice_payments`.`invoice_id` = `invoices`.`id`
GROUP BY invoices.id
HAVING sum(invoice_items.price) > sum(invoice_payments.amount)
ActiveRecord Query:
Invoice.joins(:invoice_items, :invoice_payments).group("invoices.id").having("sum(invoice_items.price) > sum(invoice_payments.amount)")
When building more complex queries in Rails, Arel ("Arel Really Exasperates Logicians") usually comes in handy. From its README:
Arel is a SQL AST manager for Ruby. It simplifies the generation of complex SQL queries, and adapts to various RDBMSes.
Here is a sample of what an Arel implementation could look like, based on the requirements:
invoice_table = Invoice.arel_table
# Define invoice_payment_amounts
payment_arel_table = InvoicePayment.arel_table
invoice_payment_amounts = Arel::Table.new(:invoice_payment_amounts)
payment_cte = Arel::Nodes::As.new(
invoice_payment_amounts,
payment_arel_table
.project(payment_arel_table[:invoice_id],
payment_arel_table[:amount].sum.as("amount_paid"))
.group(payment_arel_table[:invoice_id])
)
# Define invoice_item_amounts
item_arel_table = InvoiceItem.arel_table
invoice_item_amounts = Arel::Table.new(:invoice_item_amounts)
item_cte = Arel::Nodes::As.new(
invoice_item_amounts,
item_arel_table
.project(item_arel_table[:invoice_id],
item_arel_table[:price].sum.as("total"))
.group(item_arel_table[:invoice_id])
)
# Define main query
query = invoice_table
.project(
invoice_table[Arel.sql('*')],
invoice_payment_amounts[:amount_paid],
invoice_item_amounts[:total]
)
.join(invoice_payment_amounts).on(
invoice_table[:id].eq(invoice_payment_amounts[:invoice_id])
)
.join(invoice_item_amounts).on(
invoice_table[:id].eq(invoice_item_amounts[:invoice_id])
)
.where(invoice_item_amounts[:total].gt(invoice_payment_amounts[:amount_paid]))
.with(payment_cte, item_cte)
res = Invoice.find_by_sql(query.to_sql)
for r in res do
puts "---- Invoice #{r.id} -----"
p r
puts "total: #{r[:total]}"
puts "amount_paid: #{r[:amount_paid]}"
puts "----"
end
This will return the same output as your SQL query using the sample data you have provided to the question.
Output:
---- Invoice 2 -----
<Invoice id: 2, notes: "An order with 1 items.", created_at: "2017-12-18 21:15:47", updated_at: "2017-12-18 21:15:47">
total: 2.5
amount_paid: 1.25
----
---- Invoice 1 -----
<Invoice id: 1, notes: "An order with 2 items.", created_at: "2017-12-18 21:15:47", updated_at: "2017-12-18 21:15:47">
total: 12.6
amount_paid: 6.1
----
Arel is quite flexible so you can use this as a base and refine the query conditions based on more specific requirements you might have.
I would strongly recommend that you consider creating cache columns (total, amount_paid) in the invoices table and maintaining them, so you can avoid this complex query. At least the total column would be quite simple to create and populate.

SQL to find max of sum of data in one table, with extra columns

Apologies if this has been asked elsewhere. I have been looking on Stackoverflow all day and haven't found an answer yet. I am struggling to write the query to find the highest month's sales for each state from this example data.
The data looks like this:
| order_id | month | cust_id | state | prod_id | order_total |
+-----------+--------+----------+--------+----------+--------------+
| 67212 | June | 10001 | ca | 909 | 13 |
| 69090 | June | 10011 | fl | 44 | 76 |
... etc ...
My query
SELECT `month`, `state`, SUM(order_total) AS sales
FROM orders GROUP BY `month`, `state`
ORDER BY sales;
| month | state | sales |
+------------+--------+--------+
| September | wy | 435 |
| January | wy | 631 |
... etc ...
returns a few hundred rows: the sum of sales for each month for each state. I want it to only return the month with the highest sum of sales, but for each state. It might be a different month for different states.
This query
SELECT `state`, MAX(order_sum) as topmonth
FROM (SELECT `state`, SUM(order_total) order_sum FROM orders GROUP BY `month`,`state`)
GROUP BY `state`;
| state | topmonth |
+--------+-----------+
| ca | 119586 |
| ga | 30140 |
returns the correct number of rows with the correct data. BUT I would also like the query to give me the month column. Whatever I try with GROUP BY, I cannot find a way to limit the results to one record per state. I have tried PartitionBy without success, and have also tried unsuccessfully to do a join.
TL;DR: one query gives me the correct columns but too many rows; the other query gives me the correct number of rows (and the correct data) but insufficient columns.
Any suggestions to make this work would be most gratefully received.
I am using Apache Drill, which is apparently ANSI-SQL compliant. Hopefully that doesn't make much difference - I am assuming that the solution would be similar across all SQL engines.
This one should do the trick
SELECT t1.`month`, t1.`state`, t1.`sales`
FROM (
/* this one selects month, state and sales*/
SELECT `month`, `state`, SUM(order_total) AS sales
FROM orders
GROUP BY `month`, `state`
) AS t1
JOIN (
/* this one selects the best value for each state */
SELECT `state`, MAX(sales) AS best_month
FROM (
SELECT `month`, `state`, SUM(order_total) AS sales
FROM orders
GROUP BY `month`, `state`
)
GROUP BY `state`
) AS t2
ON t1.`state` = t2.`state` AND
t1.`sales` = t2.`best_month`
It's basically the combination of the two queries you wrote.
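Here is a runnable sketch of that combination, using Python's sqlite3 module instead of Apache Drill. The order rows are invented, and the innermost subquery gets an alias (m) because some engines require one; treat both as assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (month TEXT, state TEXT, order_total INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("June", "ca", 13),
        ("June", "ca", 20),
        ("July", "ca", 10),
        ("June", "fl", 76),
        ("July", "fl", 80),
    ],
)

top_months = conn.execute(
    """
    SELECT t1.month, t1.state, t1.sales
    FROM (
        -- sales per month per state
        SELECT month, state, SUM(order_total) AS sales
        FROM orders
        GROUP BY month, state
    ) AS t1
    JOIN (
        -- best monthly figure per state
        SELECT state, MAX(sales) AS best_month
        FROM (
            SELECT month, state, SUM(order_total) AS sales
            FROM orders
            GROUP BY month, state
        ) AS m
        GROUP BY state
    ) AS t2
      ON t1.state = t2.state AND t1.sales = t2.best_month
    ORDER BY t1.state
    """
).fetchall()

print(top_months)
```

Note that if two months tie for the best figure in a state, both rows come back, since the join matches on the sales value.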
Try this:
SELECT `month`, `state`, SUM(order_total) FROM orders WHERE `month` IN
( SELECT TOP 1 t.month FROM ( SELECT `month` AS month, SUM(order_total) order_sum FROM orders GROUP BY `month`
ORDER BY order_sum DESC) t)
GROUP BY `month`, state ;

UNION or JOIN for SELECT from multiple tables

My Issue
I am trying to select one row from multiple tables based on parameters, but my limited knowledge of SQL joining is holding me back. Could somebody possibly point me in the right direction?
Consider these table structures:
+-----------------------+ +---------------------+
| Customers | | Sellers |
+-------------+---------+ +-----------+---------+
| Customer_ID | Warning | | Seller_ID | Warning |
+-------------+---------+ +-----------+---------+
| 00001 | Test 1 | | 00008 | Testing |
| 00002 | Test 2 | | 00010 | Testing |
+-------------+---------+ +-----------+---------+
What I would like to do is one SELECT to retrieve only one row, and in this row will be the 'Warning' field for each of the tables based on the X_ID field.
Desired Results
So, if I submitted the following information, I would receive the following results:
Example 1:
Customer_ID = 00001
Seller_ID = 00008
Results:
+-----------------------------------+
| Customer_Warning | Seller_Warning |
+------------------+----------------+
| Test 1 | Testing |
+------------------+----------------+
Example 2:
Customer_ID = 00001
Seller_ID = 00200
Results:
+-----------------------------------+
| Customer_Warning | Seller_Warning |
+------------------+----------------+
| Test 1 | NULL |
+------------------+----------------+
What I Have Tried
This is my current code (I am receiving loads of rows):
SELECT c.Warning 'Customer_Warning', s.Warning AS 'Seller_Warning'
FROM Customers c,Sellers s
WHERE c.Customer_ID = #Customer_ID
OR s.Seller_ID = #Seller_ID
But I have also played around with UNION, UNION ALL and JOIN. Which method should I go for?
Since you're not really joining tables together, just selecting a single row from each, you could do this:
SELECT
(SELECT Warning
FROM Customers
WHERE Customer_ID = #Customer_ID) AS Customer_Warning,
(SELECT Warning
FROM Sellers
WHERE Seller_ID = #Seller_ID) AS Seller_Warning
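As a quick check of how the scalar-subquery form behaves when one ID is missing, here is a sketch against an in-memory SQLite database using the sample rows from the question. The lookup helper and the ? placeholders are assumptions of this example, standing in for however your application binds #Customer_ID and #Seller_ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (Customer_ID TEXT, Warning TEXT);
    CREATE TABLE Sellers (Seller_ID TEXT, Warning TEXT);
    INSERT INTO Customers VALUES ('00001', 'Test 1'), ('00002', 'Test 2');
    INSERT INTO Sellers VALUES ('00008', 'Testing'), ('00010', 'Testing');
    """
)


def lookup(customer_id, seller_id):
    """Return exactly one (Customer_Warning, Seller_Warning) row."""
    return conn.execute(
        """
        SELECT
            (SELECT Warning FROM Customers WHERE Customer_ID = ?) AS Customer_Warning,
            (SELECT Warning FROM Sellers WHERE Seller_ID = ?) AS Seller_Warning
        """,
        (customer_id, seller_id),
    ).fetchall()


print(lookup("00001", "00008"))
print(lookup("00001", "00200"))
```

A scalar subquery that finds nothing evaluates to NULL, so the query still returns a single row, which matches both desired results in the question.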
The problem is you're getting a cartesian product of rows in each table where either column has the value you're looking for.
I think you just want AND instead of OR:
SELECT c.Warning 'Customer_Warning', s.Warning AS 'Seller_Warning'
FROM Customers c
JOIN Sellers s
ON c.Customer_ID = #Customer_ID
AND s.Seller_ID = #Seller_ID
If performance isn't good enough you could join two filtered subqueries:
SELECT c.Warning 'Customer_Warning', s.Warning AS 'Seller_Warning'
FROM (SELECT Warning FROM Customers c WHERE c.Customer_ID = #Customer_ID) c,
(SELECT Warning FROM Sellers s WHERE s.Seller_ID = #Seller_ID) s
But I suspect SQL will be able to optimize the filtered join just fine.
It won't return a row if one of the IDs doesn't exist.
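Then you want a FULL OUTER JOIN. Filter each table first, though; putting the ID filters in the ON clause of a full outer join would also return every non-matching row from both tables:
SELECT c.Warning 'Customer_Warning', s.Warning AS 'Seller_Warning'
FROM (SELECT Warning FROM Customers WHERE Customer_ID = #Customer_ID) c
FULL OUTER JOIN (SELECT Warning FROM Sellers WHERE Seller_ID = #Seller_ID) s
ON 1 = 1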
The problem that you are facing is that when one of the tables has no matching row, you are going to get no rows out.
I would suggest solving this with a full outer join between the two filtered tables:
SELECT c.Warning as Customer_Warning, s.Warning AS Seller_Warning
FROM (SELECT Warning FROM Customers WHERE Customer_ID = #Customer_ID) c FULL OUTER JOIN
(SELECT Warning FROM Sellers WHERE Seller_ID = #Seller_ID) s
ON 1 = 1;
Also, I strongly discourage you from using single quotes for column aliases. Use single quotes only for string and date constants. Using them for column names can lead to confusion. In this case, you don't need delimiters on the names at all.
What I have seen so far here are working examples for your scenario. However, there is no real sense behind putting unrelated data together in one row. I would propose using a UNION and separate the values in your code:
SELECT 'C' AS Type, c.Warning
FROM Customers c
WHERE c.Customer_ID = #Customer_ID
UNION
SELECT 'S' AS Type, s.Warning
FROM Sellers s
WHERE s.Seller_ID = #Seller_ID
You can use the flag to distinguish the warnings in your code. This will be more efficient than joining or subqueries and will be easy to understand later on (when refactoring). I know this is not 100% what you asked for in your question, but that's why I challenge the question :)
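A rough sketch of consuming that UNION result in application code follows, again with Python's sqlite3 module and the question's sample rows. The warnings_for helper, the ? placeholders and the ORDER BY (added only to make the row order predictable) are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (Customer_ID TEXT, Warning TEXT);
    CREATE TABLE Sellers (Seller_ID TEXT, Warning TEXT);
    INSERT INTO Customers VALUES ('00001', 'Test 1'), ('00002', 'Test 2');
    INSERT INTO Sellers VALUES ('00008', 'Testing'), ('00010', 'Testing');
    """
)


def warnings_for(customer_id, seller_id):
    """Return a dict keyed by the Type flag: 'C' for customer, 'S' for seller."""
    rows = conn.execute(
        """
        SELECT 'C' AS Type, Warning FROM Customers WHERE Customer_ID = ?
        UNION
        SELECT 'S' AS Type, Warning FROM Sellers WHERE Seller_ID = ?
        ORDER BY Type
        """,
        (customer_id, seller_id),
    ).fetchall()
    return dict(rows)


print(warnings_for("00001", "00008"))
print(warnings_for("00001", "00200"))
```

A missing ID simply contributes no row, so the calling code checks for the presence of a key instead of looking for a NULL column.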