Calculate how many days the stock was at a site with SQL - sql

I'm trying to calculate how many days the stock for an item has been sitting at a site.
There are two tables: the Stock table shows the items and the stock currently on hand, and the Receipts table shows the dates when the site received stock and the quantity received.
I want to do a left outer join to see all the items in the Stock table, together with only the Receipts row for the oldest receipt date that still has stock remaining from it.
Stock
| Item |Current Stock| Value |
|-------|-------------|-------|
|Blade |8 |$40 |
|Table |15 |$100 |
|Screen |3 |$30 |
Receipts
| Item |Receipt Date| Quantity|
|-------|------------|---------|
|Blade |1/3/2020 | 20 |
|Blade |12/10/2021 | 10 |
|Blade |1/5/2022 | 5 |
|Table |3/4/2020 | 10 |
|Table |5/1/2021 | 7 |
|Table |7/10/2021 | 5 |
|Table |8/1/2021 | 5 |
Dates are in mm/dd/yyyy format. Assume the current date is 2/1/2022.
Desired Results
| Item |Current Stock| Value |Receipt Date|Age in Days|
|-------|-------------|-------|------------|-----------|
|Blade |8 |$40 |12/10/2021 |53 |
|Table |15 |$100 |5/1/2021 |276 |
|Screen |3 |$30 | | |
Logic:
| Item |Receipt Date | Quantity|Running Sum|Running Sum-Current Stock|
|-------|--------------|---------|-----------|-------------------------|
|Blade |1/3/2020 | 20 |35 |27 |
|Blade |**12/10/2021**| 10 |15 |7 |
|Blade |1/5/2022 | 5 |5 |0 |
For example:
Currently there are 8 units of Blades in stock. The latest receipt (on 1/5/2022) was 5 units, so the other 3 units must remain from the 12/10/2021 receipt. I want to see the first receipt date, counting back from the latest receipt, where (Running Sum - Current Stock) is greater than 0. This is based on FIFO (First In, First Out).
Thanks in advance.

You didn't mention the name of the DBMS. My answer is for SQL Server. For other DBMSs you need to replace the datediff() function.
Schema and insert statements:
create table Stock(Item varchar(50), Current_Stock int, Value int);
insert into Stock values('Blade' ,8 ,40);
insert into Stock values('Table' ,15 ,100);
insert into Stock values('Screen' ,3 ,30);
create table Receipts(Item varchar(50), Receipt_Date date, Quantity int);
insert into Receipts values('Blade','1/3/2020', 20);
insert into Receipts values('Blade','12/10/2021', 10);
insert into Receipts values('Blade','1/5/2022', 5);
insert into Receipts values('Table','3/4/2020', 10);
insert into Receipts values('Table','5/1/2021', 7);
insert into Receipts values('Table','7/10/2021', 5);
insert into Receipts values('Table','8/1/2021', 5);
Query:
with Recepts_with_runningtotal_qty as
(
select *, sum(Quantity)over(partition by Item order by Receipt_Date desc) running_total_qty from Receipts
),
current_stock as
(
select *, (select max(Receipt_Date) from Recepts_with_runningtotal_qty r where r.running_total_qty>s.current_stock and s.Item=r.Item)Receipt_Date
from Stock s
)
select *,datediff(day, Receipt_Date,'2/1/2022')Age_in_Days from current_stock
Output:
| Item |Current_Stock| Value |Receipt_Date|Age_in_Days|
|-------|-------------|-------|------------|-----------|
|Blade |8 |40 |2021-12-10 |53 |
|Table |15 |100 |2021-05-01 |276 |
|Screen |3 |30 |null |null |
db<>fiddle here
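To check the running-total logic without a SQL Server or Oracle instance, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. It is a port under assumptions, not the exact query above: dates are stored as ISO text, julianday() stands in for datediff(), and the names running, oldest and AS_OF are made up for the example. It needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

AS_OF = "2022-02-01"  # the "current date" assumed in the question

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Stock(Item text, Current_Stock int, Value int);
insert into Stock values ('Blade', 8, 40), ('Table', 15, 100), ('Screen', 3, 30);
create table Receipts(Item text, Receipt_Date text, Quantity int);
insert into Receipts values
  ('Blade', '2020-01-03', 20), ('Blade', '2021-12-10', 10), ('Blade', '2022-01-05', 5),
  ('Table', '2020-03-04', 10), ('Table', '2021-05-01', 7),
  ('Table', '2021-07-10', 5),  ('Table', '2021-08-01', 5);
""")

query = """
with running as (
  -- newest receipt first, so the running total counts stock back in time
  select Item, Receipt_Date,
         sum(Quantity) over (partition by Item order by Receipt_Date desc) as running_total_qty
  from Receipts
),
oldest as (
  -- latest receipt whose running total exceeds the stock on hand
  select s.Item, s.Current_Stock,
         (select max(r.Receipt_Date) from running r
           where r.Item = s.Item and r.running_total_qty > s.Current_Stock) as Receipt_Date
  from Stock s
)
select Item, Current_Stock, Receipt_Date,
       cast(julianday(?) - julianday(Receipt_Date) as integer) as Age_in_Days
from oldest
order by Item
"""
rows = conn.execute(query, (AS_OF,)).fetchall()
for row in rows:
    print(row)
```

One boundary case to decide on: when the stock on hand exactly equals a running total, > picks the next older receipt, while >= picks the receipt that is exactly used up. The sample data has no such tie, so both give the same result here.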
For Oracle, you can use the query below:
with Recepts_with_runningtotal_qty as
(
select Item , Receipt_Date, Quantity, sum(Quantity)over(partition by Item order by Receipt_Date desc) running_total_qty from Receipts
),
current_stock as
(
select Item , Current_Stock, Value, (select max(Receipt_Date) from Recepts_with_runningtotal_qty r where r.running_total_qty>s.current_stock and s.Item=r.Item)Receipt_Date
from Stock s
)
select Item, Current_Stock, Value, Receipt_Date, (date '2022-02-01' - Receipt_Date) Age_in_Days from current_stock
Output:
| ITEM |CURRENT_STOCK| VALUE |RECEIPT_DATE|AGE_IN_DAYS|
|-------|-------------|-------|------------|-----------|
|Blade |8 |40 |10-DEC-21 |53 |
|Table |15 |100 |01-MAY-21 |276 |
|Screen |3 |30 |null |null |
Query 2: without using common table expression
select Item, Current_Stock, Value, Receipt_Date, (date '2022-02-01' - Receipt_Date) Age_in_Days
from
(
select Item, Current_Stock, Value, Receipt_Date,Quantity, Running_Total_Qty,
row_number()over(partition by Item order by Receipt_Date desc)rn
from
(
Select S.Item, S.Current_Stock, S.Value, R.Receipt_date, R.Quantity,
sum(Quantity)over(partition by R.Item order by Receipt_Date desc) running_total_qty
From Stock S
Left outer join Receipts R On (S.Item = R.Item)
)
where Running_Total_Qty>= Current_Stock or Running_Total_Qty is null
)
where rn=1
Output:
| ITEM |CURRENT_STOCK| VALUE |RECEIPT_DATE|AGE_IN_DAYS|
|-------|-------------|-------|------------|-----------|
|Blade |8 |40 |10-DEC-21 |53 |
|Screen |3 |30 |null |null |
|Table |15 |100 |01-MAY-21 |276 |
db<>fiddle here

You could declare a variable holding your current date, using GETDATE() or any other date you need:
DECLARE @Today AS DATE; SET @Today = GETDATE();
Then you can use DATEDIFF, passing the earlier date first so that the age comes out positive:
SELECT DATEDIFF(day, Receipt_Date, @Today) AS date_diff_days
After that, just perform the left outer join and it should work fine. Have fun :)
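Whichever date function you end up with, the age is simply the later date minus the earlier one. As a quick sanity check of the expected values outside the database, here is a small sketch in Python; the helper name age_in_days is made up for the example, and the "today" value is the one assumed in the question.

```python
from datetime import date

today = date(2022, 2, 1)  # the "current date" assumed in the question


def age_in_days(receipt_date, as_of=today):
    """Days elapsed from receipt_date to as_of (positive when as_of is later)."""
    return (as_of - receipt_date).days


blade_age = age_in_days(date(2021, 12, 10))
table_age = age_in_days(date(2021, 5, 1))
print(blade_age, table_age)
```

Swapping the two dates flips the sign, which is the usual symptom of getting the DATEDIFF argument order wrong.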

Related

How can i compare 2 items in sql that are in the same column in the same table?

I have to produce a list with the following content:
For example, we have an order with 3 positions of the same product, the same quantity, etc.
The only difference is the customer's desired shipment date: the first position should be delivered on the first of January, the second on the first of April, and the third on the first of July.
Furthermore, we can set a checkmark in our system so that the customer can't split their orders, for various reasons.
So I need to find out which orders have the checkmark set to "NO SPLIT ORDER-SHIPMENTS" and still have different shipment dates for their positions.
At the moment I'm completely clueless how to tackle that problem.
For Example Table A contains:
ordernumber|desired-date|orderposition|productid|quantitiy
123456789 | 01-01-2022 | 10 | 0815 | 100
123456798 | 01-04-2022 | 20 | 0815 | 100
123456789 | 01-07-2022 | 30 | 0815 | 100
123456789 | 04-02-2022 | 10 | 5152 | 66
In our system we have set an option so that the customer of this order cannot get split shipments. So we have an issue here: the order contains three different shipment dates, but the system won't allow that.
How can I find exactly those rows in that table that have this problem? I don't want to see row number 4 of Table A, only the first 3.
The following query will find the orders with different delivery dates for the same product with the same order id for the same customer.
The column names will need to be replaced with the column names in your database, and you may have to join 2 or more tables in the query to get all the information.
create table tableA (ordernumber int, desired_date date, orderposition int, productid int, quantitiy int);
insert into tableA values
(123456789 , '2022-01-01' , 10 , 0815 , 100),
(123456798 , '2022-04-01' , 20 , 0815 , 100),
(123456789 , '2022-07-01' , 30 , 0815 , 100),
(123456789 , '2022-02-04' , 10 , 5152 , 66 );
select
count(distinct desired_date) number_lines,
ordernumber,
productid
from tableA
group by
ordernumber,
productid
having
count(distinct desired_date) > 1
/*and
checkmark = 'NO SPLIT ORDER-SHIPMENTS';*/
number_lines | ordernumber | productid
-----------: | ----------: | --------:
2 | 123456789 | 815
db<>fiddle here
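The grouped query returns one line per order and product. If you want the individual rows back, as the question asks, one option is to join that grouped result to the table again. Below is a minimal sketch of that idea using Python's built-in sqlite3 module with the same sample rows; the alias names a and d are made up, and the checkmark filter is left out because the sample table has no such column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table tableA (ordernumber int, desired_date text, orderposition int,
                     productid int, quantitiy int);
insert into tableA values
  (123456789, '2022-01-01', 10,  815, 100),
  (123456798, '2022-04-01', 20,  815, 100),
  (123456789, '2022-07-01', 30,  815, 100),
  (123456789, '2022-02-04', 10, 5152,  66);
""")

query = """
select a.ordernumber, a.desired_date, a.orderposition, a.productid
from tableA a
join (
  -- order/product pairs that carry more than one desired date
  select ordernumber, productid
  from tableA
  group by ordernumber, productid
  having count(distinct desired_date) > 1
) d on d.ordernumber = a.ordernumber and d.productid = a.productid
order by a.orderposition
"""
split_rows = conn.execute(query).fetchall()
for row in split_rows:
    print(row)
```

With the sample data as posted, only two rows come back, because the second sample row carries a different order number (123456798) than the other two.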

How can I write a Postgres (SQL) query for FIFO 'closing stock' inventory valuation?

Background
I need to implement inventory valuation / costing using the FIFO (first-in, first-out) method.
I'm running Postgres 11 on CentOS 7.
I've looked at, and tried, a fair number of hypotheses from SO and the wider internet (as well as searching my own print library which includes SQL Queries for Mere Mortals, PostgreSQL Up & Running, The SQL Cookbook, Practical Issues In Database Management, and other quality reference works), and to date, I can't find a solution that works for closing inventory valuation.
(I've also tried reasoning it out on my own, but have failed to come up with a plausible approach.)
NOTE In my case, I have permission to change the table structure, etc, of the setup, so I can add / remove / change anything in the setup as needed (such as, e.g., adding a direction column to the movements table, as some approaches I've tried have indicated, changing queries, etc etc)
Current setup
I have a table mockup_inv_movements:
CREATE TABLE the_schema.mockup_inv_movements (
id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
created_at TIMESTAMP WITH TIME ZONE DEFAULT now(),
sku TEXT,
adjustment_quantity NUMERIC,
unit_cost NUMERIC(19,2),
po_num INTEGER
);
and this view mockup_inv_movements_with_fifo_cost adds FIFO cost for sale / 'out' rows, calculated from a query (shown later below):
CREATE VIEW the_schema.mockup_inv_movements_with_fifo_cost AS (
select
i.id,
i.created_at,
i.po_num,
i.sku,
i.adjustment_quantity,
i.unit_cost,
m.fifo_unit_cost
FROM
the_schema.mockup_inv_movements i
LEFT OUTER JOIN
the_schema.fifo_hypothesis_2_mockup m
ON
i.id = m.id
ORDER BY i.id
);
Adding some test inventory movement data:
-- insert receipt / 'in' records
INSERT INTO the_schema.mockup_inv_movements (sku, adjustment_quantity, unit_cost, po_num, created_at )
VALUES ('foo_product',100,4,123, now()+'1 hour'), ('foo_product',10,3,987, now()+'2 hour'), ('foo_product',20,7,223, now()+'3 hours')
;
INSERT INTO the_schema.mockup_inv_movements (sku, adjustment_quantity, unit_cost, po_num, created_at )
VALUES ('bar_product',100,5,123, now()+'4 hours'),('bar_product',30,6,963, now()+'5 hours'),('bar_product',50,8,223, now()+'6 hours'),('bar_product',5,5,456, now()+'7 hours')
;
--insert sale / 'out' records
INSERT INTO the_schema.mockup_inv_movements (sku, adjustment_quantity, unit_cost, po_num, created_at )
VALUES ('bar_product',-50,null,null, now()+'8 hours'),('bar_product',-30, null,null, now()+'9 hours'),
('bar_product',-20,null,null, now()+'10 hours'),('bar_product',-10,null,null, now()+'11 hours')
;
INSERT INTO the_schema.mockup_inv_movements (sku, adjustment_quantity, unit_cost, po_num, created_at )
VALUES ('foo_product',-70,null,null, now()+'12 hours'), ('foo_product',-5,null,null, now()+'13 hours'),
('foo_product',-20,null,null, now()+'14 hours'),('foo_product',-10,null,null, now()+'15 hours')
;
OK, now here's the query that calculates the 'sale/out' price for each, taken from this question, which seems to work; note that I'm only pulling in the column fifo_unit_cost from this query at the moment:
CREATE VIEW the_schema.fifo_hypothesis_2_mockup AS (
SELECT
id,
sku,
created_at AT TIME ZONE 'mst',
qty_sold,
-- 5
round((cumulative_sold_cost - coalesce(lag(cumulative_sold_cost) over w, 0))/qty_sold, 2) as fifo_unit_cost,
qty_bought,
prev_bought,
total_cost,
prev_total_cost,
cumulative_sold_cost,
coalesce(lag(cumulative_sold_cost) over w, 0) as prev_cumulative_sold_cost
FROM (
SELECT id,
tneg.sku,
created_at,
qty_sold,
tpos.qty_bought,
prev_bought,
total_cost,
prev_total_cost,
-- 4
round(prev_total_cost + ((tneg.cumulative_sold - tpos.prev_bought)/(tpos.qty_bought - tpos.prev_bought))*(total_cost-prev_total_cost), 2) as cumulative_sold_cost
FROM (
SELECT
id,
sku,
created_at,
-(adjustment_quantity) as qty_sold,
sum(-(adjustment_quantity)) over w as cumulative_sold
FROM the_schema.mockup_inv_movements
WHERE adjustment_quantity < 0
WINDOW w AS (PARTITION BY sku ORDER BY created_at)
-- 1
) tneg
LEFT JOIN (
SELECT
sku,
sum(adjustment_quantity) over w as qty_bought,
coalesce(sum(adjustment_quantity) over prevw, 0) as prev_bought,
adjustment_quantity * unit_cost as cost,
sum(adjustment_quantity * unit_cost) over w as total_cost,
coalesce(sum(adjustment_quantity * unit_cost) over prevw, 0) as prev_total_cost
FROM the_schema.mockup_inv_movements
WHERE adjustment_quantity > 0
WINDOW w AS (PARTITION BY sku ORDER BY created_at),
prevw AS (PARTITION BY sku ORDER BY created_at ROWS BETWEEN unbounded preceding AND 1 preceding)
-- 2
) tpos
-- 3
ON
((tneg.cumulative_sold > tpos.prev_bought )
AND ( tneg.cumulative_sold <= tpos.qty_bought ))
AND tneg.sku = tpos.sku
) t
WINDOW w AS (PARTITION BY sku ORDER BY created_at)
ORDER BY id
)
;
Now here's the part where I'm having trouble.
I need to calculate the value of remaining stock / inventory on hand, also known as "closing stock" or "closing inventory." I've tried a number of approaches including this question and this 'set-based speed phreakery' method, the latter of which I readily admit that I don't fully comprehend.
The approach that has come closest to working for me is this older hypothesis from Ranjeet Rana, BUT although it does seem to assign the FIFO costs according to the correct breakdown, the sum of closing stock for each SKU does not seem to match the raw difference between 'in' and 'out' quantities.
Here's the closing stock query adapted from Rana (comments mine; I left them in just in case they might indicate where my error is).
CREATE VIEW the_schema.closing_inv_hyp_3 AS (
select *,
case
when cumulative>0 and adjustment_quantity>=cumulative -- note that sale/out adjustment_quantity / cumulative is always less than zero
then cumulative*cost -- in this case, some amount of this row's receipt has been sold, and the remainder qty is shown in 'cumulative'
when cumulative>0 and adjustment_quantity<cumulative
then adjustment_quantity*cost -- in this case, none of this row's receipt has been sold, and so the entire adjustment amount is multiplied by unit cost
else 0 -- sale rows are assigned zero for this column
end as closing_stock
from (
select
*, -- all rows from subquery
sum(adjustment_quantity) over (order by srl) as cumulative -- THIS is the problematic column
from (
select
0 as srl, -- this ensures that all 'sale / out' rows float to the top
id,
sku,
adjustment_quantity,
COALESCE(fifo_unit_cost,unit_cost) AS cost,
created_at
from
the_schema.mockup_inv_movements_with_fifo_cost
where adjustment_quantity < 0 -- SALE / OUT only
UNION -- gets all from both queries (less any dupes)
select
row_number() over(order by created_at) as srl, -- this assigns a synthetic sequential row number to the 'PO / in' rows and ensures they are pushed to the bottom
id,
sku,
adjustment_quantity,
COALESCE(fifo_unit_cost,unit_cost) AS cost,
created_at
from
the_schema.mockup_inv_movements_with_fifo_cost
where
adjustment_quantity > 0 -- PO / IN only
ORDER BY srl
)as tab
) as maintab
);
With this in place, we should be able to get the sum of closing stock value per SKU with:
SELECT
sku,
sum(closing_stock) as closing_stock_sum_value
FROM
the_schema.closing_inv_hyp_3
WHERE closing_stock > 0
GROUP BY sku
ORDER by sku
;
However, as I mentioned, the totals do not match up with the basic inventory difference calculation (specifically in this test example, I would expect 75 units of bar_product to be represented in closing stock, whereas this query shows 100):
srl | id | sku | adjustment_quantity | cost | created_at | cumulative | closing_stock
-----+-----+-------------+---------------------+------+-------------------------------+------------+---------------
0 | 102 | foo_product | -70 | 4.00 | 2022-03-10 07:27:05.447572+00 | -215 | 0
0 | 100 | bar_product | -20 | 5.00 | 2022-03-10 05:27:05.447572+00 | -215 | 0
0 | 101 | bar_product | -10 | 6.00 | 2022-03-10 06:27:05.447572+00 | -215 | 0
0 | 103 | foo_product | -5 | 4.00 | 2022-03-10 08:27:05.447572+00 | -215 | 0
0 | 105 | foo_product | -10 | 3.50 | 2022-03-10 10:27:05.447572+00 | -215 | 0
0 | 99 | bar_product | -30 | 5.00 | 2022-03-10 04:27:05.447572+00 | -215 | 0
0 | 98 | bar_product | -50 | 5.00 | 2022-03-10 03:27:05.447572+00 | -215 | 0
0 | 104 | foo_product | -20 | 4.00 | 2022-03-10 09:27:05.447572+00 | -215 | 0
1 | 91 | foo_product | 100 | 4.00 | 2022-03-09 20:27:05.447572+00 | -115 | 0
2 | 92 | foo_product | 10 | 3.00 | 2022-03-09 21:27:05.447572+00 | -105 | 0
3 | 93 | foo_product | 20 | 7.00 | 2022-03-09 22:27:05.447572+00 | -85 | 0
4 | 94 | bar_product | 100 | 5.00 | 2022-03-09 23:27:05.447572+00 | 15 | 75.00
5 | 95 | bar_product | 30 | 6.00 | 2022-03-10 00:27:05.447572+00 | 45 | 180.00
6 | 96 | bar_product | 50 | 8.00 | 2022-03-10 01:27:05.447572+00 | 95 | 400.00
7 | 97 | bar_product | 5 | 5.00 | 2022-03-10 02:27:05.447572+00 | 100 | 25.00
(15 rows)
It seems like this would be the kind of thing that has a more-or-less standardized solution, but so far none of the resources I've found / tried has guided me to a working approach.
How can I accurately do FIFO closing stock / inventory valuation in Postgres?
All guidance much appreciated!
Using "Set-based Speed Phreakery: The FIFO Stock Inventory SQL Problem" as an example, re-working that approach for Postgres and the change of table/columns produces this query:
/* Sum up the ins and outs to calculate the remaining stock level */
WITH cteStockSum
AS ( SELECT sku ,
SUM(adjustment_quantity) AS TotalStock
FROM mockup_inv_movements
GROUP BY sku
)
, cteReverseInSum
AS ( SELECT s.sku ,
s.created_at ,
( SELECT SUM(i.adjustment_quantity)
FROM mockup_inv_movements AS i
WHERE i.sku = s.sku
AND i.adjustment_quantity > 0
AND i.created_at >= s.created_at
) AS RollingStock ,
s.adjustment_quantity AS ThisStock
FROM mockup_inv_movements AS s
WHERE s.adjustment_quantity > 0
)
/* Using the rolling balance above find the first stock movement in that meets
(or exceeds) our required stock level */
/* and calculate how much stock is required from the earliest stock in */
, cteWithLastTranDate
AS ( SELECT w.sku ,
w.TotalStock ,
LastPartialStock.created_at ,
LastPartialStock.StockToUse ,
LastPartialStock.RunningTotal ,
w.TotalStock - LastPartialStock.RunningTotal
+ LastPartialStock.StockToUse AS UseThisStock
FROM cteStockSum AS w
CROSS JOIN LATERAL ( SELECT
z.created_at ,
z.ThisStock AS StockToUse ,
z.RollingStock AS RunningTotal
FROM cteReverseInSum AS z
WHERE z.sku = w.sku
AND z.RollingStock >= w.TotalStock
ORDER BY z.created_at DESC
LIMIT 1
) AS LastPartialStock
)
/* Sum up the cost of 100% of the stock movements in after the returned stockid; for that stockid itself we need only 'UseThisStock' items */
SELECT y.sku ,
y.TotalStock AS CurrentItems ,
SUM(CASE WHEN e.created_at = y.created_at THEN y.UseThisStock
ELSE e.adjustment_quantity
END * Price.unit_cost) AS CurrentValue
FROM cteWithLastTranDate AS y
INNER JOIN mockup_inv_movements AS e
ON e.SKU = y.SKU
AND e.created_at >= y.created_at
AND e.adjustment_quantity > 0
CROSS JOIN LATERAL (
/* Find the Price of the item in */ SELECT
p.unit_cost
FROM mockup_inv_movements AS p
WHERE p.SKU = e.SKU
AND p.created_at <= e.created_at
AND p.adjustment_quantity > 0
ORDER BY p.created_at DESC
LIMIT 1
) AS Price
GROUP BY y.sku ,y.TotalStock
ORDER BY y.sku
and from your sample data the result produced is this:
+-------------+--------------+--------------+
| sku | currentitems | currentvalue |
+-------------+--------------+--------------+
| bar_product | 75 | 545.00 |
| foo_product | 25 | 155.00 |
+-------------+--------------+--------------+
also see: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=f564a6cfda3374c2057b437f845a4bdf
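If you want an independent cross-check of those two numbers, the FIFO rule is easy to replay procedurally. The following is only a sketch in Python, not part of the SQL solution: it assumes the movements are already sorted by created_at, and the function name fifo_closing_stock is made up for the example.

```python
from collections import defaultdict, deque

# (sku, adjustment_quantity, unit_cost) in created_at order, copied from the question's inserts
movements = [
    ("foo_product", 100, 4), ("foo_product", 10, 3), ("foo_product", 20, 7),
    ("bar_product", 100, 5), ("bar_product", 30, 6), ("bar_product", 50, 8), ("bar_product", 5, 5),
    ("bar_product", -50, None), ("bar_product", -30, None),
    ("bar_product", -20, None), ("bar_product", -10, None),
    ("foo_product", -70, None), ("foo_product", -5, None),
    ("foo_product", -20, None), ("foo_product", -10, None),
]


def fifo_closing_stock(rows):
    """Return {sku: (units_on_hand, value_on_hand)} after applying rows in order."""
    layers = defaultdict(deque)  # per sku: receipt layers [remaining_qty, unit_cost], oldest first
    for sku, qty, unit_cost in rows:
        if qty > 0:
            layers[sku].append([qty, unit_cost])
            continue
        to_consume = -qty
        while to_consume > 0:
            layer = layers[sku][0]  # raises IndexError if sales exceed receipts
            used = min(layer[0], to_consume)
            layer[0] -= used
            to_consume -= used
            if layer[0] == 0:
                layers[sku].popleft()  # oldest layer fully sold
    return {
        sku: (sum(q for q, _ in ls), sum(q * c for q, c in ls))
        for sku, ls in layers.items()
    }


closing = fifo_closing_stock(movements)
print(closing)
```

For the sample data this leaves 20 units at 6, 50 at 8 and 5 at 5 for bar_product, and 5 units at 3 plus 20 at 7 for foo_product, which agrees with the query result above.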

Postgresql: how to select from map of multiple values

I have a SOME_DELTA table which records all party related transactions with amount change
Ex.:
PARTY_ID | SOME_DATE | AMOUNT
--------------------------------
party_id_1 | 2019-01-01 | 100
party_id_1 | 2019-01-15 | 30
party_id_1 | 2019-01-15 | -60
party_id_1 | 2019-01-21 | 80
party_id_2 | 2019-01-02 | 50
party_id_2 | 2019-02-01 | 100
I have a case where an MVC controller accepts a map someMap(party_id, some_date), and I need to get the list of party_id values with the amount summed up to each party's specific some_date.
For example, if I send mapOf("party_id_1" to Date(2019 - 1 - 15), "party_id_2" to Date(2019 - 1 - 2)),
I should get each party_id with its amount summed up to its some_date.
Output should look like:
party_id_1 | 70
party_id_2 | 50
Currently code is:
select sum(amount) from SOME_DELTA where party_id=:partyId and some_date <= :someDate
But in this case I need to iterate through the map and make a separate DB call for each party_id's summed amount up to its some_date, which feels wrong.
Is there a more elegant way to get this in one select query (to avoid 100+ DB calls)?
You can use a lateral join for this:
select map.party_id,
c.amount
from (
values
('party_id_1', date '2019-01-15'),
('party_id_2', date '2019-01-02')
) map (party_id, cutoff_date)
join lateral (
select sum(amount) amount
from some_delta sd
where sd.party_id = map.party_id
and sd.some_date <= map.cutoff_date
) c on true
order by map.party_id;
Online example
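On the application side, the map has to be turned into that VALUES list with bound parameters rather than string concatenation. Here is a rough sketch of the idea in Python with the built-in sqlite3 module. SQLite has no lateral join, so this version swaps in a plain left join plus GROUP BY, which gives the same totals for this case; the names cutoffs, placeholders and params are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table some_delta(party_id text, some_date text, amount int);
insert into some_delta values
  ('party_id_1', '2019-01-01', 100), ('party_id_1', '2019-01-15', 30),
  ('party_id_1', '2019-01-15', -60), ('party_id_1', '2019-01-21', 80),
  ('party_id_2', '2019-01-02', 50),  ('party_id_2', '2019-02-01', 100);
""")

# the map received by the controller: party_id -> cutoff date (ISO text)
cutoffs = {"party_id_1": "2019-01-15", "party_id_2": "2019-01-02"}

# one "(?, ?)" pair per map entry, with the values flattened in the same order
placeholders = ", ".join(["(?, ?)"] * len(cutoffs))
params = [value for pair in cutoffs.items() for value in pair]

query = f"""
with map(party_id, cutoff_date) as (values {placeholders})
select m.party_id, sum(sd.amount) as amount
from map m
left join some_delta sd
  on sd.party_id = m.party_id and sd.some_date <= m.cutoff_date
group by m.party_id
order by m.party_id
"""
totals = conn.execute(query, params).fetchall()
print(totals)
```

Only the placeholder list is interpolated into the SQL text, never the values themselves. An empty map would produce an invalid VALUES clause, so that case needs handling before the query is built.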

Running total of values from a table until it matches value from another table

I have 2 tables.
Table 1 is a table variable:
declare @Temp as table ( proj_num varchar(10), sum_dom decimal(23,8))
My temp table is populated with a list of project numbers, and a month end accounting dollar amount.
For example:
proj_num | sum_dom
11522 | 2477.15
11524 | 26474.20
41865 | 9012.10
Table 2 is a Project Transactions table.
We're concerned with just the following columns:
proj_num
amount
cost_code
tran_date
Individual values will look something like this:
proj_num | cost_code | amount | tran_date
11522 | LBR | 112.10 | 10/1/2018
11522 | LBR | 1765.90 | 10/2/2018
11522 | MAT | 599.15 | 10/3/2018
11522 | FRT | 57.50 | 10/4/2018
So for this project, since the grand total of $2477.15 is met on 10/3, example output would be:
proj_num | cost_code | amount
11522 | LBR | 1878.00
11522 | MAT | 599.15
I want to sum the amounts (grouped by cost_code, and ordered by tran_date) under the project transaction table until the total sum of values for that project value matches the value in the sum_dom column of the temp table, at which point I will output that data.
Can you help me figure out how to write the query to do that?
I know I should avoid cursors, but I haven't had much luck with my attempts so far. I can't seem to get it to keep a running total.
A running sum is computed with SUM(...) OVER (ORDER BY ...). You just need to tell it where to stop:
SELECT sq.*
FROM projects
INNER JOIN (
SELECT
proj_num,
cost_code,
amount,
SUM(amount) OVER (PARTITION BY proj_num ORDER BY tran_date) AS running_sum
FROM project_transactions
) AS sq ON projects.proj_num = sq.proj_num
WHERE running_sum <= projects.sum_dom
DB Fiddle
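The query above returns the individual transactions up to the cutoff. To get the per-cost_code totals shown in the question, you can group the filtered rows once more. Here is a minimal sketch of that step, run through Python's built-in sqlite3 module. It makes a few assumptions of its own: amounts are stored as integer cents (the column names sum_dom_cents and amount_cents are made up) so that the comparison is not affected by floating-point rounding in SQLite, and only project 11522 is loaded because it is the only one with sample transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table projects(proj_num text, sum_dom_cents int);
insert into projects values ('11522', 247715);
create table project_transactions(proj_num text, cost_code text,
                                  amount_cents int, tran_date text);
insert into project_transactions values
  ('11522', 'LBR',  11210, '2018-10-01'),
  ('11522', 'LBR', 176590, '2018-10-02'),
  ('11522', 'MAT',  59915, '2018-10-03'),
  ('11522', 'FRT',   5750, '2018-10-04');
""")

query = """
select sq.proj_num, sq.cost_code, sum(sq.amount_cents) as amount_cents
from projects p
join (
  -- running total per project, in transaction-date order
  select proj_num, cost_code, amount_cents,
         sum(amount_cents) over (partition by proj_num order by tran_date) as running_sum
  from project_transactions
) sq on sq.proj_num = p.proj_num
where sq.running_sum <= p.sum_dom_cents   -- stop once the month-end total is reached
group by sq.proj_num, sq.cost_code
order by sq.cost_code
"""
totals = conn.execute(query).fetchall()
print(totals)
```

Note that rows sharing the same tran_date are peers under the default window frame, so they enter the running sum together; add a tie-breaker column to the ORDER BY if that matters for your data.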

SQL to find max of sum of data in one table, with extra columns

Apologies if this has been asked elsewhere. I have been looking on Stack Overflow all day and haven't found an answer yet. I am struggling to write the query to find the highest month's sales for each state from this example data.
The data looks like this:
| order_id | month | cust_id | state | prod_id | order_total |
+-----------+--------+----------+--------+----------+--------------+
| 67212 | June | 10001 | ca | 909 | 13 |
| 69090 | June | 10011 | fl | 44 | 76 |
... etc ...
My query
SELECT `month`, `state`, SUM(order_total) AS sales
FROM orders GROUP BY `month`, `state`
ORDER BY sales;
| month | state | sales |
+------------+--------+--------+
| September | wy | 435 |
| January | wy | 631 |
... etc ...
returns a few hundred rows: the sum of sales for each month for each state. I want it to only return the month with the highest sum of sales, but for each state. It might be a different month for different states.
This query
SELECT `state`, MAX(order_sum) as topmonth
FROM (SELECT `state`, SUM(order_total) order_sum FROM orders GROUP BY `month`,`state`)
GROUP BY `state`;
| state | topmonth |
+--------+-----------+
| ca | 119586 |
| ga | 30140 |
returns the correct number of rows with the correct data. BUT I would also like the query to give me the month column. Whatever I try with GROUP BY, I cannot find a way to limit the results to one record per state. I have tried PartitionBy without success, and have also tried unsuccessfully to do a join.
TL;DR: one query gives me the correct columns but too many rows; the other query gives me the correct number of rows (and the correct data) but insufficient columns.
Any suggestions to make this work would be most gratefully received.
I am using Apache Drill, which is apparently ANSI-SQL compliant. Hopefully that doesn't make much difference - I am assuming that the solution would be similar across all SQL engines.
This one should do the trick
SELECT t1.`month`, t1.`state`, t1.`sales`
FROM (
/* this one selects month, state and sales*/
SELECT `month`, `state`, SUM(order_total) AS sales
FROM orders
GROUP BY `month`, `state`
) AS t1
JOIN (
/* this one selects the best value for each state */
SELECT `state`, MAX(sales) AS best_month
FROM (
SELECT `month`, `state`, SUM(order_total) AS sales
FROM orders
GROUP BY `month`, `state`
)
GROUP BY `state`
) AS t2
ON t1.`state` = t2.`state` AND
t1.`sales` = t2.`best_month`
It's basically the combination of the two queries you wrote.
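If your engine supports window functions, ranking the monthly totals within each state is another way to keep the month column. Below is a rough sketch of that alternative, run through Python's built-in sqlite3 module rather than Drill, so treat the syntax as an assumption to verify on your side. The sample rows are invented for the illustration, and the names monthly, ranked and rn are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table orders(month text, state text, order_total int);
-- hypothetical sample rows, not the asker's data
insert into orders values
  ('June', 'ca', 13), ('June', 'ca', 20), ('July', 'ca', 40),
  ('June', 'fl', 76), ('July', 'fl', 10);
""")

query = """
with monthly as (
  -- total sales per month and state
  select month, state, sum(order_total) as sales
  from orders
  group by month, state
),
ranked as (
  -- rank the months inside each state, best first
  select month, state, sales,
         row_number() over (partition by state order by sales desc) as rn
  from monthly
)
select month, state, sales
from ranked
where rn = 1
order by state
"""
best = conn.execute(query).fetchall()
print(best)
```

row_number() returns exactly one row per state even when two months tie; the join on the maximum shown above returns every tied month instead, so pick whichever behaviour you need.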
Try this:
SELECT `month`, `state`, SUM(order_total) FROM orders WHERE `month` IN
( SELECT TOP 1 t.month FROM ( SELECT `month` AS month, SUM(order_total) order_sum FROM orders GROUP BY `month`
ORDER BY order_sum DESC) t)
GROUP BY `month`, state ;