SUM a specific column in next rows until a condition is true - sql

Here is a table of articles. I want to store the sum of the mass column from the following rows in the sumNext column, based on a condition.
If the next row has the same floor (floorNo column) as the current row, add up the mass of the following rows until the floor changes.
E.g.: row three has sumNext = 2. That is computed by adding the mass from rows four and five, because both rows have the same floor number as row three.
id       mass  symbol  floorNo  sumNext
2891176  1     D       1        0
2891177  1     L       8        0
2891178  1     L       1        2
2891179  1     L       1        1
2891180  1             1        0
2891181  1             5        2
2891182  1             5        1
2891183  1             5        0
Here is the query that generates this table; I just want to add the sumNext column with the right values.
WITH items AS (SELECT
SP.id,
SP.mass,
SP.symbol,
SP.floorNo
FROM articles SP
ORDER BY
DECODE(SP.symbol,
'P',1,
'D',2,
'L',3,
4 ) asc)
SELECT CLS.*
FROM items CLS;

You could use the solution below, which uses a recursive common table expression (CTE) to put all consecutive rows with the same FLOORNO value in the same group (new grp column).
It then uses the analytic version of the SUM function to sum the MASS of all following rows within each grp, as required.
with Items_RowsNumbered (id, mass, symbol, floorNo, rnb) as (
select ID, MASS, SYMBOL, FLOORNO
, row_number()over(
order by DECODE(symbol, 'P',1, 'D',2, 'L',3, 4 ) asc, ID )
/*
You need to add the ID column (or any other columns that identify each row uniquely)
to the "order by" clause to make the result deterministic
*/
from (Your source query)Items
)
, cte(id, mass, symbol, floorNo, rnb, grp) as (
select id, mass, symbol, floorNo, rnb, 1 grp
from Items_RowsNumbered
where rnb = 1
union all
select t.id, t.mass, t.symbol, t.floorNo, t.rnb
, case when t.floorNo = c.floorNo then c.grp else c.grp + 1 end grp
from Items_RowsNumbered t
join cte c on (c.rnb + 1 = t.rnb)
)
select
ID, MASS, SYMBOL, FLOORNO
/*, RNB, GRP*/
, nvl(
sum(MASS)over(
partition by grp
order by rnb
ROWS BETWEEN 1 FOLLOWING and UNBOUNDED FOLLOWING)
, 0
) sumNext
from cte
;
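To sanity-check the expected sumNext values outside the database, the same grouping logic can be sketched in Python. This is only an illustration, assuming the rows are already in the final display order; the function name `sum_next` is made up for this example and is not part of the answer above.

```python
from itertools import groupby

def sum_next(rows):
    """rows: list of (id, mass, floorNo) tuples, already in display order.

    Returns a list of (id, sumNext), where sumNext is the total mass of the
    rows that follow within the same run of consecutive equal floorNo values.
    """
    result = []
    # groupby only merges *adjacent* rows with the same key, i.e. one island.
    for _, island in groupby(rows, key=lambda r: r[2]):
        island = list(island)
        remaining = sum(mass for _, mass, _ in island)
        for row_id, mass, _ in island:
            remaining -= mass  # drop the current row, keep only later rows
            result.append((row_id, remaining))
    return result

# Sample rows from the question: (id, mass, floorNo)
rows = [
    (2891176, 1, 1), (2891177, 1, 8), (2891178, 1, 1), (2891179, 1, 1),
    (2891180, 1, 1), (2891181, 1, 5), (2891182, 1, 5), (2891183, 1, 5),
]
for row_id, total in sum_next(rows):
    print(row_id, total)
```

For the sample rows this reproduces the sumNext column from the question (0, 0, 2, 1, 0, 2, 1, 0).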

This is a typical gaps-and-islands problem. You can use the difference of two ROW_NUMBER() values to determine the exact partitions, and then the SUM() analytic function, such as
WITH ii AS
(
SELECT i.*,
ROW_NUMBER() OVER (ORDER BY id DESC) AS rn2,
ROW_NUMBER() OVER (PARTITION BY floorNo ORDER BY id DESC) AS rn1
FROM items i
)
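SELECT id, mass, symbol, floorNo,
-- running total from the end of the island back to this row, minus this row's own mass
SUM(mass) OVER (PARTITION BY floorNo, rn2 - rn1 ORDER BY id DESC) - mass AS sumNext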
FROM ii
ORDER BY id
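The row-number difference trick can be traced by hand with a small Python sketch. It is a rough illustration, assuming rows are ordered by id as in the query; `island_keys` is a hypothetical helper name. Pairing the difference with floorNo keeps islands from different floors apart.

```python
from collections import defaultdict

def island_keys(rows):
    """rows: list of (id, floorNo). Returns {id: (floorNo, rn2 - rn1)}."""
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)  # ORDER BY id DESC
    per_floor = defaultdict(int)
    keys = {}
    for rn2, (row_id, floor) in enumerate(ordered, start=1):
        per_floor[floor] += 1      # rn1: row number within this floorNo
        rn1 = per_floor[floor]
        keys[row_id] = (floor, rn2 - rn1)
    return keys

# Sample rows from the question: (id, floorNo)
rows = [
    (2891176, 1), (2891177, 8), (2891178, 1), (2891179, 1),
    (2891180, 1), (2891181, 5), (2891182, 5), (2891183, 5),
]
keys = island_keys(rows)
for row_id in sorted(keys):
    print(row_id, keys[row_id])
```

Rows 2891178 to 2891180 share one key and rows 2891181 to 2891183 share another, while 2891176 gets its own key even though it is also on floor 1, because it is not adjacent to the other floor-1 rows.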

Related

Finding unique combination of columns associated with 1 non-unique column

Here's my table:
ItemID  ItemName  ItemBatch  TrackingNumber
a       bag       1          498239
a       bag       1          498239
a       bag       1          958103
b       paper     2          123444
b       paper     2          123444
I'm trying to find occurrences of ItemID + ItemName + ItemBatch that have a non-unique TrackingNumber. So in the example above, there are 3 occurrences of a bag 1 and at least 1 of those rows has a different TrackingNumber from any of the other rows. In this case 958103 is different from 498239 so it should be a hit.
For b paper 2 the TrackingNumber is unique for all the respective rows so we ignore this. Is there a query that can pull this combination of columns with 3 identical fields and 1 non-unique field?
Yet another option:
SELECT *
FROM tab
WHERE ItemBatch IN (SELECT ItemBatch
FROM tab
GROUP BY ItemBatch, TrackingNumber
HAVING COUNT(TrackingNumber) = 1)
This query finds the combination of (ItemBatch, TrackingNumber) that occur only once, then gets all rows corresponding to their ItemBatch values.
You can use GROUP BY and HAVING
SELECT
t.ItemID,
t.ItemName,
t.ItemBatch,
COUNT(*)
FROM YourTable t
GROUP BY
t.ItemID,
t.ItemName,
t.ItemBatch
HAVING COUNT(DISTINCT TrackingNumber) > 1;
Or, if you want each individual row, you can use a window function. You cannot use COUNT(DISTINCT ...) in a window function, but you can simulate it with DENSE_RANK and MAX:
SELECT
t.*
FROM (
SELECT *,
Count = MAX(dr) OVER (PARTITION BY t.ItemID, t.ItemName, t.ItemBatch)
FROM (
SELECT *,
dr = DENSE_RANK() OVER (PARTITION BY t.ItemID, t.ItemName, t.ItemBatch ORDER BY t.TrackingNumber)
FROM YourTable t
) t
) t
WHERE t.Count > 1;
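Why the highest dense rank equals the distinct count can be checked with a short Python sketch over the sample rows. The helper names `distinct_counts` and `max_dense_rank` are made up for this illustration, and the sketch assumes no NULL tracking numbers (DENSE_RANK would rank a NULL, while COUNT(DISTINCT ...) ignores it).

```python
from collections import defaultdict

rows = [
    ("a", "bag", 1, 498239),
    ("a", "bag", 1, 498239),
    ("a", "bag", 1, 958103),
    ("b", "paper", 2, 123444),
    ("b", "paper", 2, 123444),
]

def distinct_counts(rows):
    """COUNT(DISTINCT TrackingNumber) per (ItemID, ItemName, ItemBatch)."""
    seen = defaultdict(set)
    for item_id, name, batch, tracking in rows:
        seen[(item_id, name, batch)].add(tracking)
    return {key: len(values) for key, values in seen.items()}

def max_dense_rank(rows):
    """MAX(DENSE_RANK() OVER (... ORDER BY TrackingNumber)) per group."""
    groups = defaultdict(list)
    for item_id, name, batch, tracking in rows:
        groups[(item_id, name, batch)].append(tracking)
    result = {}
    for key, values in groups.items():
        # Dense rank: position of a value among the sorted distinct values.
        ranks = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
        result[key] = max(ranks[v] for v in values)
    return result

hits = [key for key, n in distinct_counts(rows).items() if n > 1]
print(hits)
```

Only the (a, bag, 1) combination is reported, since it is the only one with more than one distinct tracking number.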

Running Total Minus

I am trying to calculate a "minus" running total in SQL with this code, but it is not giving me the expected result. Starting from the 01/2021 value, I would like to subtract the sales of each following month.
select Name, Date, Sales, MinusRunningTotal = B.Sales - SUM(A.sales)
OVER (PARTITION BY Name ORDER BY Date)
From TableA A
Join TableB B on A.ID = B.ID
Where Date > '01/2021'
This is how the data is displayed
Name Date Sales
A 01/2021 10
A 02/2021 1
A 03/2021 2
A 04/2021 3
This is what I want to achieve
Name Date Sales MinusRunningTotal
A 01/2021 10 10
A 02/2021 1 9
A 03/2021 2 7
A 04/2021 3 4
If that data already exists in a table with name, date, and sales columns, try:
SELECT [name],
[date],
[sales],
FIRST_VALUE([sales]) OVER (PARTITION BY [name]
ORDER BY [date]) * 2
- SUM([sales]) OVER (PARTITION BY [name]
ORDER BY [date]
ROWS UNBOUNDED PRECEDING) AS minus_running_total
FROM my_table
This computes the running sum of sales for the current name (all preceding rows plus the current one):
SUM([sales]) OVER (PARTITION BY [name]
ORDER BY [date]
ROWS UNBOUNDED PRECEDING)
This takes the chronologically first value for the current name and multiplies it by 2:
FIRST_VALUE([sales]) OVER (PARTITION BY [name]
ORDER BY [date]) * 2
So the first row computes as (10 x 2) - 10 = 10
Second row is (10 x 2) - (10 + 1) = 9
Third row is (10 x 2) - (10 + 1 + 2) = 7
etc.
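The arithmetic can be replayed in a few lines of Python as a quick check. This is just a sketch using the sample values for name A from the question; it is not tied to any particular database.

```python
from itertools import accumulate

sales = [10, 1, 2, 3]               # sales for name A, ordered by date
running = list(accumulate(sales))   # running sum, like SUM(...) ROWS UNBOUNDED PRECEDING
# Twice the first value minus the running sum (which already includes the first value)
minus_running_total = [2 * sales[0] - r for r in running]
print(minus_running_total)  # [10, 9, 7, 4]
```

Doubling the first value compensates for the fact that the running sum already contains it once.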
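-- Number the rows per name and stage them in a temp table
-- (replace "table" below with your source table)
select Name, Date, Sales,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Date) as RN into
#temp2 From table
-- First month's sales minus the running sum of all later months' sales
select t.Name, t.Date, t.Sales,
case when t.RN != 1
then (select f.Sales from #temp2 f where f.Name = t.Name and f.RN = 1)
- sum(case when t.RN = 1 then 0 else t.Sales end)
OVER (PARTITION BY t.Name ORDER BY t.Date
rows between UNBOUNDED PRECEDING and current row)
else t.Sales end as MinusRunningTotal
From #temp2 t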
I made a table sss that contains the data you show in your second code section. This query will give the results you want. If you need to join in other data, do it in the CTE called xs below.
with
xs as (
select name, dt, sales, sales tot,
ROW_NUMBER() over (partition by name order by dt) n
from sss
),
rec as (
select * from xs where n = 1
union all
select xs.name, xs.dt, xs.sales, rec.tot - xs.sales, xs.n
from rec
join xs on rec.name = xs.name
and xs.n = rec.n + 1
)
select *
from rec
How it works:
In the CTE (common table expression) xs, we number the rows associated with a given name in ascending dt order.
CTE rec is a recursive query. It begins by fetching the first row of each name's group by filtering on "n = 1"; that becomes the first row of the output. The second part of rec fetches each succeeding row, where n of the new row equals n + 1 of the previous row. The desired running total, kept in column tot, is obtained by subtracting the new row's sales from the previous row's tot.
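The recurrence that rec implements can be written as a plain loop. The Python sketch below is only an illustration for a single name, with a made-up function name; the anchor and recursive steps are marked in the comments.

```python
def minus_running_total(sales):
    """sales: values for one name in ascending date order."""
    totals = []
    for n, value in enumerate(sales, start=1):
        if n == 1:
            tot = value                 # anchor member: n = 1, tot = sales
        else:
            tot = totals[-1] - value    # recursive member: rec.tot - xs.sales
        totals.append(tot)
    return totals

print(minus_running_total([10, 1, 2, 3]))  # [10, 9, 7, 4]
```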

BigQuery Standard SQL - Cumulative Count of (almost) Duplicated Rows

With the following data:
id  field  eventTime
1   A      1
1   A      2
1   B      3
1   A      4
1   B      5
1   B      6
1   B      7
For visualisation purposes, I would like to turn it into the below. Consecutive occurrences of the same field value essentially get aggregated into one.
id  field  eventTime
1   Ax2    1
1   B      3
1   A      4
1   Bx3    5
I will then use STRING_AGG() to turn it into "Ax2 > B > A > Bx3".
I've tried using ROW_NUMBER() to count the repeated instances, with the plan being to utilise the highest row number to modify the string in field, but if I partition on eventTime, there are no consecutive "duplicates", and if I don't partition on it then all rows with the same field value are counted - not just consecutive ones.
I thought about bringing in the previous field with LAG() for a comparison to reset the row count, but that only works for transitions from one field value to the other and is a problem if the same field is repeated consecutively.
I've been struggling with this to the point where I'm considering writing a script that just CASE WHENs up to a reasonable number of consecutive hits, but I've seen it get as high as 17 on a given day and really don't want to be doing that!
My other alternative will just be to enforce a maximum number of field values to help control this, but now I've started this problem I'd quite like to solve it without that, if at all possible.
Thanks!
Consider below
select id,
any_value(field) || if(count(1) = 1, '', 'x' || count(1)) field,
min(eventTime) eventTime
from (
select id, field, eventTime,
countif(ifnull(flag, true)) over(partition by id order by eventTime) grp
from (
select id, field, eventTime,
field != lag(field) over(partition by id order by eventTime) flag
from `project.dataset.table`
)
)
group by id, grp
# order by eventTime
If applied to the sample data in your question, the output matches the desired table above (Ax2, B, A, Bx3).
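The flag-and-running-count idea from the query can be followed step by step in Python. This is a minimal sketch for a single id, assuming the events are already sorted by eventTime; `collapse_runs` is a hypothetical name used only here.

```python
def collapse_runs(events):
    """events: list of (field, eventTime) for one id, sorted by eventTime."""
    grp = 0
    previous = None
    groups = {}  # grp -> [field, count, first eventTime]
    for field, event_time in events:
        # Same role as the flag: true on the first row or when the field changes.
        if grp == 0 or field != previous:
            grp += 1  # running count of flags gives the group number
            groups[grp] = [field, 0, event_time]
        groups[grp][1] += 1
        previous = field
    return [
        (field if count == 1 else f"{field}x{count}", first_time)
        for field, count, first_time in groups.values()
    ]

events = [("A", 1), ("A", 2), ("B", 3), ("A", 4), ("B", 5), ("B", 6), ("B", 7)]
labels = collapse_runs(events)
print(" > ".join(label for label, _ in labels))  # Ax2 > B > A > Bx3
```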
Just use lag() to detect when the value of field changes. You can now do that with qualify:
select t.*
from t
where 1=1
qualify lag(field, 1, '') over (partition by id order by eventtime) <> field;
For your final step, you can use a subquery:
select id, string_agg(field, '->' order by eventtime)
from (select t.*
from t
where 1=1
qualify lag(field, 1, '') over (partition by id order by eventtime) <> field
) t
group by id;

How to retrieve MAX Turntime of Top Two earliest date?

How would I construct a query to get the MAX TurnTime per ID over the first 2 rounds? A round is defined as the minimum Beginning_Date to the minimum End_Date of an ID, without reusing either of those dates for the second round's Turn Time calculation.
You can use row_number() . . . twice:
select d.*
from (select d.*,
row_number() over (partition by id order by turn_time desc) as seqnum_turntime
from (select d.*,
row_number() over (partition by id order by beginning_end desc) as seqnum_round
from data d
) d
where seqnum_round <= 2
) d
where seqnum_turntime = 1;
The innermost subquery gets the first two rounds. The outer subquery gets the maximum.
You could express this without window functions as well:
select top (1) with ties d.*
from data d
where d.beginning_date <= (select d2.beginning_date
from data d2
where d2.id = d.id
order by d2.beginning_date
offset 1 row fetch first 1 row only
)
order by row_number() over (partition by d.id order by d.turn_time desc);
SELECT
ID
,turn_time
,beginning_date
,end_date
FROM
(
SELECT
ID
,MAX(turn_time) OVER (PARTITION BY Id ORDER BY BeginningDate ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS turn_time --Maximum turn time of the current row and preceding row
,MIN(BeginningDate) OVER (PARTITION BY Id ORDER BY BeginningDate ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS beginning_date --Minimum begin date over current row and preceding row (could also use LAG)
,end_date
,ROW_NUMBER() OVER (PARTITION BY Id ORDER BY BeginningDate) AS Turn_Number
FROM
<whatever your table is>
) turn_summary
WHERE
Turn_Number = 2

SQL Ranking N records by one criteria and N records by another and repeat

In my table I have 4 columns: Id, Type, InitialRanking & FinalRanking. Based on certain criteria I've managed to apply InitialRanking to the records (1-20). I now need to apply FinalRanking by identifying the top 7 of Type 1 followed by the top 3 of Type 2. Then I need to repeat the above until all records have a FinalRanking. My goal would be to achieve the output in the final column of the attached image.
The 7 & 3 will vary over time but for the purposes of this example let’s say they are fixed.
you can try like this
SELECT * FROM(
( SELECT ID,DISTINCT TYPE,
CASE WHEN TYPE=1 THEN
( SELECT TOP 7 INITIALRANK, FINALRANK
from table where type=1)
ELSE
( SELECT TOP 3 INITIALRANK, FINALRANK
from table where type=2)
END CASE
FROM TABLE WHERE TYPE IN (1,2)
)
UNION
( SELECT ID,TYPE,
INITIALRANK, FINALRANK
from table where type not in (1,2))
)
)
A simple (or simplistic) approach to your Final Rank would be the following:
row_number() over (partition by type order by initrank) +
case type
when 1 then (ceil((row_number() over (partition by type order by initrank))/7)-1)*(10-7)
when 2 then (ceil((row_number() over (partition by type order by initrank))/3)-1)*(10-3)+7
end FinalRank
This can be generalized to more than 2 groups. For example, with three groups of size 7, 3 and 2, the pattern size is 7+3+2=12. The general form is PartitionedRowNum+(Ceil(PartitionedRowNum/GroupSize)-1)*(PatternSize-GroupSize)+Offset, where the offset is the sum of the preceding group sizes:
row_number() over (partition by type order by initrank) +
case type
when 1 then (ceil((row_number() over (partition by type order by initrank))/7)-1)*(12-7)
when 2 then (ceil((row_number() over (partition by type order by initrank))/3)-1)*(12-3)+7
when 3 then (ceil((row_number() over (partition by type order by initrank))/2)-1)*(12-2)+7+3
end FinalRank
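The general form can be checked numerically with a small Python sketch for the 7 and 3 case from the question. The function name `final_rank` and the row counts (14 rows of type 1, 6 of type 2) are assumptions made for this illustration only.

```python
def final_rank(row_num, group_size, pattern_size, offset):
    """row_num: 1-based rank within the type; offset: sum of preceding group sizes."""
    # Integer form of ceil(row_num / group_size), avoiding floating point.
    block = (row_num + group_size - 1) // group_size
    return row_num + (block - 1) * (pattern_size - group_size) + offset

sizes = [7, 3]           # group sizes for type 1 and type 2
pattern = sum(sizes)     # pattern size: 10
type1 = [final_rank(n, 7, pattern, 0) for n in range(1, 15)]
type2 = [final_rank(n, 3, pattern, 7) for n in range(1, 7)]
print(type1)
print(type2)
```

Type 1 lands on ranks 1 to 7 and 11 to 17, type 2 on 8 to 10 and 18 to 20, so together they cover 1 to 20 with no gaps or collisions. In engines where `/` between integers performs integer division (SQL Server and PostgreSQL, for example), the SQL expression would need a decimal cast before the ceiling is taken.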