SQL Aggregate Sum to Only Net Out Negative Rows - sql

I'm trying to roll up product values based on dates. The example below starts out with 20,000, adds 5,000, and then subtracts 7,000. The subtraction should eat through the entire 5,000 and then into the prior positive row, which removes the 5,000 row from the result.
I think this would be as simple as doing a sum window function ordered by date descending. However, as you can see below, I want to stop summing at any row that remains positive and then move to the next.
I cannot figure out the logic in SQL to make this work. In my head, it should be:
SUM(Value) OVER (PARTITION BY Product, (positive valued rows) ORDER BY Date DESC)
But there could be multiple positive valued rows in a row where a negative valued row could eat through all of them, or there could be multiple negative values in a row.
This post seemed promising, but I don't think the logic would work when a negative value is larger than the positive value.
HAVE:
+------------+----------------+-------+
| Date | Product | Value |
+------------+----------------+-------+
| 01/13/2015 | Prod1 | 20000 |
| 08/13/2015 | Prod1Addition1 | 5000 |
| 12/13/2015 | Prod1Removal | -7000 |
| 02/13/2016 | Prod1Addition2 | 2000 |
| 03/13/2016 | Prod1Addition3 | 1000 |
| 04/13/2016 | Prod1Removal | -1500 |
+------------+----------------+-------+
WANT:
+------------+----------------+-------+
| Date | Product | Value |
+------------+----------------+-------+
| 01/13/2015 | Prod1 | 18000 |
| 02/13/2016 | Prod1Addition2 | 1500 |
+------------+----------------+-------+
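To make the netting rule above concrete, here is a minimal procedural sketch in Python rather than SQL. It assumes the rows are already sorted by date, that a positive row consumed down to exactly zero is removed, and that a removal larger than everything before it is simply dropped; none of these edge cases is specified in the question, and the function name is made up for illustration.

```python
from typing import List, Tuple

Row = Tuple[str, str, int]  # (date, product, value)


def net_out_negatives(rows: List[Row]) -> List[Row]:
    """Consume each negative value from the most recent surviving positive rows."""
    stack: List[Row] = []  # surviving positive rows, oldest first
    for date, product, value in rows:  # rows must already be sorted by date
        if value >= 0:
            stack.append((date, product, value))
            continue
        remaining = -value  # amount that still has to be removed
        while remaining > 0 and stack:
            d, p, v = stack.pop()
            if v > remaining:
                # The row is only partially consumed and survives.
                stack.append((d, p, v - remaining))
                remaining = 0
            else:
                # The row is fully consumed and disappears.
                remaining -= v
        # Assumption: any amount left over here is dropped.
    return stack


have = [
    ("01/13/2015", "Prod1", 20000),
    ("08/13/2015", "Prod1Addition1", 5000),
    ("12/13/2015", "Prod1Removal", -7000),
    ("02/13/2016", "Prod1Addition2", 2000),
    ("03/13/2016", "Prod1Addition3", 1000),
    ("04/13/2016", "Prod1Removal", -1500),
]
want = net_out_negatives(have)
```

On the HAVE rows this yields the two WANT rows (18000 for Prod1 and 1500 for Prod1Addition2).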

I can only think of a recursive CTE solution:
; with
cte as
(
select Date, Product, Value, rn = row_number() over (order by Date)
from yourtable
),
rcte as
(
select Date, Product, Value, rn, grp = 1
from cte
where rn = 1
union all
select Date = case when r.Value < 0 then c.Date else r.Date end,
Product = case when r.Value < 0 then c.Product else r.Product end,
c.Value,
c.rn,
grp = case when r.Value < 0 then r.grp + 1 else r.grp end
from rcte r
inner join cte c on r.rn = c.rn - 1
)
select Date, Product, Value = sum(Value)
from rcte
group by Date, Product, grp
order by Date
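For readers who want to see what the recursive CTE computes, the same grouping can be sketched in Python (not the answer's code, just an illustration with a made-up function name). A new group starts right after every negative row, and each group is reported under the date and product of its first row.

```python
from typing import List, Tuple

Row = Tuple[str, str, int]  # (date, product, value)


def sum_by_removal_groups(rows: List[Row]) -> List[Row]:
    """Sum consecutive rows into groups; a group ends with a negative row."""
    groups = []  # each entry: [date, product, running sum]
    start_new = True  # the first row always opens a group
    for date, product, value in rows:  # rows must already be sorted by date
        if start_new:
            groups.append([date, product, value])
        else:
            groups[-1][2] += value
        # The row after a negative row opens the next group.
        start_new = value < 0
    return [(d, p, v) for d, p, v in groups]
```

On the sample data this returns 18000 for Prod1 and 1500 for Prod1Addition2. Note that it sums whole groups, so a removal larger than the rest of its group leaves a negative group total instead of eating into earlier groups.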

I think that you want this:
select Date,
Product,
Sum(Value) As Value
From TABLE_NAME
Group By Date, Product
Order by Date, Product;
Is that correct?

Related

How to write this BigQuery query on a retail dataset

I have a table of user retail transactions. It includes sales and cancellations: if Qty is positive the row is a sale, if negative it is a cancellation. I want to attach each cancellation to the most appropriate sale. The table looks like this:
| CustomerId | StockId | Qty | Date |
|--------------+-----------+-------+------------|
| 1 | 100 | 50 | 2020-01-01 |
| 1 | 100 | -10 | 2020-01-10 |
| 1 | 100 | 60 | 2020-02-10 |
| 1 | 100 | -20 | 2020-02-10 |
| 1 | 100 | 200 | 2020-03-01 |
| 1 | 100 | 10 | 2020-03-05 |
| 1 | 100 | -90 | 2020-03-10 |
The user with ID 1 performs the following actions: buy 50 -> return 10 -> buy 60 -> return 20 -> buy 200 -> buy 10 -> return 90. For each cancel row (negative Qty), I look for the previous row (by Date) with a positive Qty that is greater than the cancelled Qty.
So I need a BigQuery query that produces a table like this:
| CustomerId | StockId | Qty | Date | CancelQty |
|--------------+-----------+-------+------------+-------------|
| 1 | 100 | 50 | 2020-01-01 | -10 |
| 1 | 100 | 60 | 2020-02-10 | -20 |
| 1 | 100 | 200 | 2020-03-01 | -90 |
| 1 | 100 | 10 | 2020-03-05 | 0 |
Can anybody help me with this query? I have written one candidate query (split cancels and sales, join them, and do some stuff to remove rows), but it works incorrectly in the above case.
I use BigQuery, so any BQ SQL features could be applied.
Any ideas will be helpful.
You can use the following query.
;WITH result AS (
select t1.*,t2.Qty as cQty,t2.Date as Date_t2 from
(select *,ROW_NUMBER() OVER (ORDER BY qty DESC) AS [ROW NUMBER] from Test) t1
join
(select *,ROW_NUMBER() OVER (ORDER BY qty) AS [ROW NUMBER] from Test) t2
on t1.[ROW NUMBER] = t2.[ROW NUMBER]
)
select CustomerId,StockId,Qty,Date,ISNULL(cQty, 0) As CancelQty,Date_t2
from (select CustomerId,StockId,Qty,Date,case
when cQty < 0 then cQty
else NULL
end AS cQty,
case
when cQty < 0 then Date_t2
else NULL
end AS Date_t2 from result) t
where qty > 0
order by cQty desc
result: https://dbfiddle.uk
You can do this as a gaps-and-islands problem. Basically, add a grouping column to the rows based on a cumulative reverse count of negative values. Then within each group, choose the first row where the sum is positive. So:
select t.* except (cancelqty, grp),
       (case when min(case when cancelqty + qty >= 0 then date end) over (partition by customerid, grp) = date
             then cancelqty
             else 0
        end) as cancelqty
from (select t.*,
             -- the group's cancel qty; assumes each group holds one negative row
             min(qty) over (partition by customerid, grp) as cancelqty
      from (select t.*,
                   countif(qty < 0) over (partition by customerid order by date desc) as grp
            from transactions t
           ) t
     ) t;
Note: This works for the data you have provided. However, there may be complicated scenarios where this does not work. In fact, I don't think there is a simple optimal solution assuming that the returns are not connected to the original sales. I would suggest that you fix the data model so you record where the returns come from.
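The grouping idea can be checked outside BigQuery with a small Python sketch. This is only an illustration under assumptions: the function name is made up, dates are ISO strings, rows sharing a date land in the same group (as the default window frame would do), and only sale rows are returned.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Txn = Tuple[int, int, int, str]  # (customer_id, stock_id, qty, ISO date)


def attach_cancels(txns: List[Txn]) -> List[Tuple[int, int, int, str, int]]:
    by_customer: Dict[int, List[Txn]] = defaultdict(list)
    for t in txns:
        by_customer[t[0]].append(t)

    result = []
    for rows in by_customer.values():
        cancel_dates = [r[3] for r in rows if r[2] < 0]
        groups: Dict[int, List[Txn]] = defaultdict(list)
        for r in rows:
            # Reverse cumulative count: cancels dated on or after this row.
            grp = sum(1 for d in cancel_dates if d >= r[3])
            groups[grp].append(r)
        for members in groups.values():
            cancels = [r[2] for r in members if r[2] < 0]
            cancel = min(cancels) if cancels else 0
            sales = sorted((r for r in members if r[2] > 0), key=lambda r: r[3])
            absorbed = False
            for s in sales:
                # The earliest sale large enough to absorb the cancel gets it.
                if not absorbed and cancel < 0 and s[2] + cancel >= 0:
                    result.append(s + (cancel,))
                    absorbed = True
                else:
                    result.append(s + (0,))
    return sorted(result, key=lambda r: (r[0], r[3]))
```

Applied to the seven sample rows, this gives CancelQty values of -10, -20, -90 and 0 for the sales of 50, 60, 200 and 10.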
The query below seems to satisfy the conditions and produce the output mentioned. The solution maps each row of the base table (t) to its corresponding cancelled-quantity row from the same table (t1).
First, a self join on CustomerId and StockId is done, since the rows need to belong to the same customer and product.
Additionally, we bring in only the cancelled transactions from t1 that happened on or after the base row in t (t.Dt<=t1.Dt), and the clause t1.Qty<0 ensures that t1 holds a negative quantity.
Further, we cannot attribute a cancelled quantity that is larger than the original quantity, so I check that the positive quantity is at least as large as the cancelled one. Negating the cancelled quantity makes the comparison easy: -(t1.Qty)<=t.Qty
After the join, we are interested only in the positive quantities, so a WHERE clause (t.Qty>0) filters the cancel rows out of the base table t.
Now each base row is joined to every qualifying cancelled row dated on or after it. For example, the Qty 50 row can have several cancelled quantities mapped to it, but we are interested only in the first one that came after it. So we group by the base columns and keep the earliest cancel date in the HAVING clause: HAVING IFNULL(t1.dt, '0')=MIN(IFNULL(t1.dt, '0'))
Finally, we get the rows we need; the last column can be excluded with an outer select if required.
SELECT t.CustomerId,t.StockId,t.Qty,t.Dt,IFNULL(t1.Qty, 0) CancelQty
,t1.dt dt_t1
FROM tbl t
LEFT JOIN tbl t1 ON t.CustomerId=t1.CustomerId AND
t.StockId=t1.StockId
AND t.Dt<=t1.Dt AND t1.Qty<0 AND -(t1.Qty)<=t.Qty
WHERE t.Qty>0
GROUP BY 1,2,3,4
HAVING IFNULL(t1.dt, '0')=MIN(IFNULL(t1.dt, '0'))
ORDER BY 1,2,4,3
fiddle
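The join conditions described above can be restated as a short Python sketch (an illustration only, with a made-up function name and ISO date strings assumed). For each sale it picks the earliest cancel for the same customer and stock that is dated on or after the sale and is no larger than the sale.

```python
from typing import List, Tuple

Txn = Tuple[int, int, int, str]  # (customer_id, stock_id, qty, ISO date)


def first_fitting_cancel(txns: List[Txn]) -> List[Tuple[int, int, int, str, int]]:
    sales = [t for t in txns if t[2] > 0]
    cancels = sorted((t for t in txns if t[2] < 0), key=lambda t: t[3])
    out = []
    for cust, stock, qty, date in sales:
        match = next(
            (
                c
                for c in cancels  # earliest cancel first
                if c[0] == cust
                and c[1] == stock
                and c[3] >= date  # cancel happened on or after the sale
                and -c[2] <= qty  # cancel is not larger than the sale
            ),
            None,
        )
        out.append((cust, stock, qty, date, match[2] if match else 0))
    return out
```

One thing this makes visible: nothing stops the same cancel row from being attached to more than one sale, so the approach relies on the data not producing such overlaps.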
Consider below approach
with sales as (
select * from `project.dataset.table` where Qty > 0
), cancels as (
select * from `project.dataset.table` where Qty < 0
)
select any_value(s).*,
ifnull(array_agg(c.Qty order by c.Date limit 1)[offset(0)], 0) as CancelQty
from sales s
left join cancels c
on s.CustomerId = c.CustomerId
and s.StockId = c.StockId
and s.Date <= c.Date
and s.Qty > abs(c.Qty)
group by format('%t', s)
if applied to sample data in your question - output is

Counting current items by month

I'm trying to build a monthly tally of active equipment, grouped by service area from a database log table. I think I'm 90% of the way there; I have a list of months, along with the total number of items that existed, and grouped by region.
However, I also need to know the state of each item as they were on the first of each month, and this is the part I'm stuck on. For instance, Item 1 is in region A in January, but moves to Region B in February. Item 2 is marked as 'inactive' in February, so shouldn't be counted. My existing query will always count item 1 in region A, and item 2 as 'active'.
I can correctly show that Item 3 is deleted in March, and Item 4 doesn't show up until the April count. I realize that I'm getting the first values because my query is specifying the min date, I'm just not sure how I need to change it to get what I want.
I think I'm looking for a way to group by Max(OperationDate) for each Month.
The Table looks like this:
| EQUIPID | EQUIPNAME | EQUIPACTIVE | DISTRICT | REGION | OPERATIONDATE | OPERATION |
|---------|-----------|-------------|----------|--------|----------------------|-----------|
| 1 | Item 1 | 1 | 1 | A | 2015-01-01T00:00:00Z | INS |
| 2 | Item 2 | 1 | 1 | A | 2015-01-01T00:00:00Z | INS |
| 3 | Item 3 | 1 | 1 | A | 2015-01-01T00:00:00Z | INS |
| 2 | Item 2 | 0 | 1 | A | 2015-02-10T00:00:00Z | UPD |
| 1 | Item 1 | 1 | 1 | B | 2015-02-15T00:00:00Z | UPD |
| 3 | (null) | (null) | (null) | (null) | 2015-02-21T00:00:00Z | DEL |
| 1 | Item 1 | 1 | 1 | A | 2015-03-01T00:00:00Z | UPD |
| 4 | Item 4 | 1 | 1 | B | 2015-03-10T00:00:00Z | INS |
There is also a subtable that holds attributes that I care about. Its structure is similar. Unfortunately, due to previous design decisions, there is no correlation to operations between the two tables. Any joins will need to be done using the EquipmentID, and have the overlapping states matched up for each date.
Current query:
--cte to build date list
WITH calendar (dt) AS
(SELECT &fromdate from dual
UNION ALL
SELECT Add_Months(dt,1)
FROM calendar
WHERE dt < &todate)
SELECT dt, a.district, a.region, count(*)
FROM
(SELECT EQUIPID, DISTRICT, REGION, OPERATION, MIN(OPERATIONDATE ) AS FirstOp, deleted.deldate
FROM Equipment_Log
LEFT JOIN
(SELECT EQUIPID,MAX(OPERATIONDATE) as DelDate
FROM Equipment_Log
WHERE OPERATION = 'DEL'
GROUP BY EQUIPID
) Deleted
ON Equipment_Log.EQUIPID = Deleted.EQUIPID
WHERE OPERATION <> 'DEL' --AND additional unimportant filters
GROUP BY EQUIPID,DISTRICT, REGION , OPERATION, deldate
) a
INNER JOIN calendar
ON (calendar.dt >= FirstOp AND calendar.dt < deldate)
OR (calendar.dt >= FirstOp AND deldate is null)
LEFT JOIN
( SELECT EQUIPID, MAX(OPERATIONDATE) as latestop
FROM SpecialEquip_Table_Log
--where SpecialEquip filters
group by EQUIPID
) SpecialEquip
ON a.EQUIPID = SpecialEquip.EQUIPID and calendar.dt >= SpecialEquip.latestop
GROUP BY dt, district, region
ORDER BY dt, district, region
Take only the last operation for each id in each month. This is what row_number() and where rn = 1 do.
We have the calendar and the data. Join them with a partitioned outer join.
I assumed that you need to fill values for months where entries for id are missing. So nvl(lag() ignore nulls) are needed, because if something appeared in January it still exists in Feb, March and we need district, region values from last not empty row.
Now you have everything to make count. That part where you mentioned SpecialEquip_Table_Log is up to you, because you left-joined this table and not used it later, so what is it for? Join if you need it, you have id.
db<>fiddle
with
calendar(mth) as (
select date '2015-01-01' from dual union all
select add_months(mth, 1) from calendar where mth < date '2015-05-01'),
data as (
select id, dis, reg, dt, op, act
from (
select equipid id, district dis, region reg,
to_char(operationdate, 'yyyy-mm') dt,
row_number()
over (partition by equipid, trunc(operationdate, 'month')
order by operationdate desc) rn,
operation op, nvl(equipactive, 0) act
from t)
where rn = 1 )
select mth, dis, reg, sum(act) cnt
from (
select id, mth,
nvl(dis, lag(dis) ignore nulls over (partition by id order by mth)) dis,
nvl(reg, lag(reg) ignore nulls over (partition by id order by mth)) reg,
nvl(act, lag(act) ignore nulls over (partition by id order by mth)) act
from calendar
left join data partition by (id) on dt = to_char(mth, 'yyyy-mm') )
group by mth, dis, reg
having sum(act) > 0
order by mth, dis, reg
It may seem complicated, so please run subqueries separately at first to see what is going on. And test :) Hope this helps.
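If it helps to see the steps outside Oracle, here is a rough Python sketch of the same idea: keep the last operation per item and month, carry the last known state forward into months without entries, then count active items per month, district and region. The function name and tuple layout are assumptions made for the illustration. Like the query, it uses the state at the end of each month.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# (equip_id, active, district, region, ISO timestamp, operation)
LogRow = Tuple[int, Optional[int], Optional[int], Optional[str], str, str]


def monthly_active_counts(log: List[LogRow], months: List[str]) -> Dict[tuple, int]:
    # Step 1: last operation per (item, month); later rows overwrite earlier ones.
    last = {}
    for equip_id, active, district, region, op_date, _op in sorted(log, key=lambda r: r[4]):
        last[(equip_id, op_date[:7])] = (active or 0, district, region)

    counts: Counter = Counter()
    for equip_id in sorted({r[0] for r in log}):
        state = None  # (active, district, region) carried from month to month
        for month in months:  # months in ascending 'YYYY-MM' order
            new = last.get((equip_id, month))
            if new is not None:
                act, dis, reg = new
                if state is not None:
                    # A DEL row has no district/region: keep the previous ones.
                    dis = dis if dis is not None else state[1]
                    reg = reg if reg is not None else state[2]
                state = (act, dis, reg)
            # Step 2: count the item only while its carried state is active.
            if state is not None and state[0]:
                counts[(month, state[1], state[2])] += 1
    return dict(counts)
```

With the sample log and months 2015-01 to 2015-04, this counts 3 items in region A for January, 1 in B for February, and 1 each in A and B for March and April.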

How to pass the value of previous row to current row?

How can I pass the result of previous row to the computation of the current row
Given the unit and the cost, I need to get the average cost of each transaction:
The formula:
Average cost is the sum of transaction cost
If Type is Sub then Trx cost is equal to cost
If Type is Red then Trx cost is Unit * (sum of previous trx cost/sum of previous units)
| Row | Type | Unit | Cost  | TrxCost  | Ave_cost |
|-----|------|------|-------|----------|----------|
| 1   | Sub  | 0.2  | 1000  | 1000     | 1000     |
| 2   | Sub  | 0.3  | 2500  | 2500     | 3500     |
| 3   | Sub  | 0.1  | 600   | 600      | 4100     |
| 4   | Red  | -0.2 | -1100 | -1366.67 | 2733.33  |
| 5   | Sub  | 0.3  | 1000  | 1000     | 3733.33  |
| 6   | Red  | -0.6 | -600  | -3200    | 533.33   |
Update:
Order is based on row number.
Thanks.
You may use Recursive CTE
WITH cte (row_num,
type,
unit,
sum_of_unit,
cost,
trxcost,
ave_cost
) AS (
SELECT row_num,
type,
unit,
unit AS sum_of_unit,
cost,
cost AS trxcost,
cost AS ave_cost
FROM t
WHERE row_num IN (
SELECT MIN(row_num)
FROM t
)
UNION ALL
SELECT t.row_num,
t.type,
t.unit,
c.sum_of_unit + t.unit AS sum_of_unit,
t.cost,
CASE t.type
WHEN 'Sub' THEN t.cost
WHEN 'Red' THEN t.unit * ( c.ave_cost / c.sum_of_unit )
END
AS trxcost,
c.ave_cost + CASE t.type
WHEN 'Sub' THEN t.cost
WHEN 'Red' THEN t.unit * ( c.ave_cost / c.sum_of_unit )
END AS ave_cost
FROM t
JOIN cte c ON t.row_num = c.row_num + 1
)
SELECT * FROM cte
Dbfiddle Demo
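The recurrence that the recursive CTE walks through can be written as a plain loop. This is a Python sketch for checking the numbers, not part of the answer; the function name is made up, and it assumes the first row is never a Red (otherwise the division by the running unit total would fail).

```python
from typing import List, Tuple


def running_cost(rows: List[Tuple[str, float, float]]) -> List[Tuple[float, float]]:
    """rows: (type, unit, cost) in row order; returns (trx_cost, ave_cost) per row."""
    out = []
    total_cost = 0.0   # sum of previous TrxCost values
    total_units = 0.0  # sum of previous units
    for kind, unit, cost in rows:
        if kind == "Sub":
            trx = cost
        else:  # "Red": price the units at the running cost per unit
            trx = unit * (total_cost / total_units)
        total_cost += trx
        total_units += unit
        out.append((round(trx, 2), round(total_cost, 2)))
    return out


sample = [
    ("Sub", 0.2, 1000),
    ("Sub", 0.3, 2500),
    ("Sub", 0.1, 600),
    ("Red", -0.2, -1100),
    ("Sub", 0.3, 1000),
    ("Red", -0.6, -600),
]
results = running_cost(sample)
```

For the sample rows this reproduces the TrxCost and Ave_cost columns from the question, including -3200 and 533.33 on the last row. The key point is that each Red row depends on the sum of previous TrxCost values, not on the sum of the raw Cost column.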
You can do this in two passes: one to get the TrxCost, then one to get the Ave_cost.
What you are calling "average" is a running total, by the way; you are merely adding up values.
You need window functions with ROWS BETWEEN clauses. (For SUM(...) OVER (ORDER BY ...), however, the default frame is already RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.)
select
id, type, unit, cost, round(trxcost, 2) as trxcost,
round(sum(trxcost) over (order by id), 2) as ave_cost
from
(
select
id, type, unit, cost,
case
when type = 'Sub' then cost
else
unit *
sum(cost) over (order by id rows between unbounded preceding and 1 preceding) /
sum(unit) over (order by id rows between unbounded preceding and 1 preceding)
end as trxcost
from mytable
)
order by id;
I renamed your row column id, because ROW is a reserved word.
The last row's results differ from yours. I used your formula, but get different figures.
Rextester demo: https://rextester.com/ASXFY4323
See SQL window functions, which allow you to access values from other rows in the result set. In your case, you will need to tell us some more criteria, such as when to stop looking:
select
lag(unit,1) over (partition by type order by whatever)
* lag(cost,1) over (partition by type order by whatever)
from Trx
But I'm still missing how you want to correlate the transactions and the reductions to each other. There must be some column you're not telling us about. If that column (PartNumber?) is known, you can simply group by and sum by that.

Select only 1 payment from a table with customers with multiple payments

I have a table called "payments" where I store all the payments of my customers, and I need to write a select to calculate the non-payment rate in a given month.
A customer can have multiple payments in that month, but I should count each customer only once: 1 if any of the payments was made and 0 if none of the payments was made.
Example:
+----+------------+--------+
| ID | DATEDUE | AMOUNT |
+----+------------+--------+
| 1 | 2016-11-01 | 0 |
| 1 | 2016-11-15 | 20.00 |
| 2 | 2016-11-10 | 0 |
+----+------------+--------+
The result I expect is from the rate of november:
+----+------------+--------+
| ID | DATEDUE | AMOUNT |
+----+------------+--------+
| 1 | 2016-11-15 | 20.00 |
| 2 | 2016-11-10 | 0 |
+----+------------+--------+
So the rate will be 50%.
But if the select is:
SELECT * FROM payment WHERE DATEDUE BETWEEN '2016-11-01' AND '2016-11-30'
It will return 3 rows and the rate will be 66%, which is wrong. Ideas?
PS: This is a simpler example of the real table. The real query has a lot of columns, subselects, etc.
It sounds like you need to partition your results per customer.
SELECT TOP 1 WITH TIES
ID,
DATEDUE,
AMOUNT
FROM payment
WHERE DATEDUE BETWEEN '2016-11-01' AND '2016-11-30'
ORDER BY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AMOUNT DESC)
PS: The BETWEEN operator is frowned upon by some people. For clarity it might be better to avoid it:
What do BETWEEN and the devil have in common?
Try this
SELECT
id
, SUM(AMOUNT) AS AMOUNT
FROM
Payment
GROUP BY
id;
This might help if you want other columns.
WITH cte AS (
SELECT
id
, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AMOUNT DESC ) AS RowNum
-- other columns
FROM
Payment
)
SELECT *
FROM
cte
WHERE
RowNum = 1;
To calculate the rate, you can use explicit division:
select 1 - count(distinct case when amount > 0 then id end) * 1.0 / count(distinct id)
from payment
where . . .;
Or, in a way that is perhaps easier to follow:
select avg(flag * 1.0)
from (select id, (case when max(amount) > 0 then 0 else 1 end) as flag
from payment
where . . .
group by id
) i
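The per-customer flag logic is easy to sanity-check in a few lines of Python. This is a sketch with a made-up function name, assuming the rows have already been filtered to the month of interest and that an amount greater than zero means the payment was made.

```python
from typing import Dict, List, Tuple


def non_payment_rate(payments: List[Tuple[int, str, float]]) -> float:
    """payments: (customer_id, date_due, amount) rows for a single month."""
    paid: Dict[int, bool] = {}
    for customer_id, _date_due, amount in payments:
        # A customer counts as paid if any of their payments was made.
        paid[customer_id] = paid.get(customer_id, False) or amount > 0
    if not paid:
        return 0.0  # assumption: no customers means a rate of zero
    return sum(1 for p in paid.values() if not p) / len(paid)


november = [
    (1, "2016-11-01", 0),
    (1, "2016-11-15", 20.00),
    (2, "2016-11-10", 0),
]
rate = non_payment_rate(november)  # 0.5
```

Each customer is counted once, so the three sample rows give 50% rather than 66%.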

Stock calculation in Postgres

I have a table p1 with transactions in Postgres like this:
| id | product_id | transaction_date | quantity |
|----|------------|------------------|----------|
| 1 | 1 | 2015-01-01 | 1 |
| 2 | 1 | 2015-01-02 | 2 |
| 3 | 1 | 2015-01-03 | 3 |
and p2 table with products like this:
| id | product | stock |
|----|--------------|-------|
| 1 | Product A | 15 |
The stock in p2 has been reduced for every new record in p1.
How to reconstruct previous states to get this result?
| product | first_stock | quantity | last_stock |
|-----------|-------------|----------|------------|
| Product A | 21 | 1 | 20 |
| Product A | 20 | 2 | 18 |
| Product A | 18 | 3 | 15 |
I have tried using lead() to get the quantity after the current row.
SELECT p2.product, p1.quantity, lead(p1.quantity) OVER(ORDER BY p1.id DESC)
FROM p1 INNER JOIN p2 ON p1.product_id = p2.id;
But how to calculate leading rows from the current stock?
You don't just need lead(); you need the running sum over all rows in between to reconstruct previous states from transaction data:
SELECT p2.product
, p2.stock + px.sum_quantity AS first_stock
, px.quantity
, p2.stock + px.sum_quantity - quantity AS last_stock
FROM p2
JOIN (
SELECT product_id, quantity, transaction_date
, sum(quantity) OVER (PARTITION BY product_id
ORDER BY transaction_date DESC) AS sum_quantity
FROM p1
) px ON px.product_id = p2.id
ORDER BY px.transaction_date;
This assumes the course of events is actually indicated by transaction_date.
Use the aggregate function sum() as window-aggregate function to get the running sum. Use a subquery, since we use the running sum of quantities (sum_quantity) multiple times.
For last_stock subtract quantity of the current row (after adding it redundantly).
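The same reconstruction can be sketched in Python to check the arithmetic: walk the transactions from newest to oldest and add each quantity back onto the current stock. The function name is made up, and the sketch assumes one product and that date order reflects the real order of events.

```python
from typing import List, Tuple


def reconstruct_stock(
    current_stock: int, transactions: List[Tuple[str, int]]
) -> List[Tuple[str, int, int, int]]:
    """transactions: (transaction_date, quantity) for one product.

    Returns (transaction_date, first_stock, quantity, last_stock), oldest first.
    """
    rows = []
    stock = current_stock  # stock after the newest transaction
    for date, quantity in sorted(transactions, reverse=True):  # newest first
        # Before this transaction the stock was higher by its quantity.
        rows.append((date, stock + quantity, quantity, stock))
        stock += quantity
    rows.reverse()
    return rows


history = reconstruct_stock(15, [("2015-01-01", 1), ("2015-01-02", 2), ("2015-01-03", 3)])
```

For Product A with a current stock of 15, this yields 21 -> 20, 20 -> 18 and 18 -> 15, matching the desired result.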
Nitpick
Theoretically, it would be cheaper to use a custom frame definition for the window frame to only sum quantities up to preceding row, so we don't add and subtract the quantity of the current row redundantly. But that's more complex and hardly faster in reality:
SELECT p2.id, p2.product, px.transaction_date -- plus id and date for context
, p2.stock + COALESCE(px.pre_sum_q + px.quantity, 0) AS first_stock
, px.quantity
, p2.stock + COALESCE(px.pre_sum_q, 0) AS last_stock
FROM p2
LEFT JOIN (
SELECT id, product_id, transaction_date
, quantity
, sum(quantity) OVER (PARTITION BY product_id
ORDER BY transaction_date DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS pre_sum_q
FROM p1
) px ON px.product_id = p2.id
ORDER BY px.transaction_date, px.id;
Explanation for the frame definition in this related answer:
Grouping based on sequence of rows
While at it, this also demonstrates how to prevent missing rows and NULL values with LEFT JOIN and COALESCE for products that don't have any related rows in p1, and how to get a stable sort order if there are multiple transactions for the same product on the same day.
This still assumes all columns are defined NOT NULL; otherwise you need to do some more for corner cases with NULL values.