Rolling sum on sequentially linked data - sql

I am working on a large legacy dataset with sequentially related data. I lack the words to explain it, so I made a beautiful Paint image. This of course isn't the real dataset, but it is close. In the example there are three sequences.
Each record has an ID and a value. It also has a pointer to the next related ID. The sequence length is random and the sequence stops when the next related ID is 0. Each record is used only once, in one sequence, meaning sequences can't merge or split. A sequence can consist of only one record.
What I need to accomplish is to get the rolling sum on each record of a sequence using a SQL query (SQL Server 2014). I know how to do this if there is a common identifier in the sequence, but in this case there is not.
I have been able to accomplish it in Excel (for what it's worth) by finding the previous sum (if it exists) and adding the current value. But I'm unable to translate it to SQL. Does anyone know where to start to get to the end goal of the 'rolling sum result' column in SQL?
[previous sum] formula: =IFNA(INDEX([rolling sum formula],MATCH([#id],[next_pointer],0),0),0)
[rolling sum result] formula: =[#[previous sum]]+[#value]
*The data sequences aren't sorted like in the Excel example. This just makes it easier to read in the example.

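You need a recursive query, which you can write as a recursive CTE.
The recursion starts at the last record of each chain (nextid = 0) and walks backwards, so the sum accumulates from the end of the chain towards its start. This is a test with your data (the column you seek is "cumul"; the others are there to help understand what's going on):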
WITH sequenza AS (
SELECT
id,
value,
nextid,
id AS lastid,
value as cumul
FROM
items
WHERE nextid = 0
UNION ALL
SELECT
curr.id,
curr.value,
curr.nextid,
prev.lastid,
prev.cumul + curr.value AS cumul
FROM
items AS curr
INNER JOIN sequenza AS prev
ON prev.id = curr.nextid
)
SELECT * FROM sequenza
WHERE id = 31;
To do this in the other direction, so that the sum accumulates from the first record of each chain, there is probably more than one way. Off the top of my head I'd get, for each chain (identified by its lastid), the chain total, which is the maximum cumul value as long as no value is negative. The sum of everything up to and including a record is then the total minus whatever comes after that record: TOTAL - ROLLING + VALUE.
So, something like
WITH sequenza AS (
SELECT
id,
value,
nextid,
id AS lastid,
value as cumul
FROM
items
WHERE nextid = 0
UNION ALL
SELECT
curr.id,
curr.value,
curr.nextid,
prev.lastid,
prev.cumul + curr.value AS cumul
FROM
items AS curr
INNER JOIN sequenza AS prev
ON prev.id = curr.nextid
)
SELECT sequenza.*, total - cumul + value AS cumulasc FROM sequenza
JOIN (
SELECT lastid, MAX(cumul) AS total
FROM sequenza
GROUP BY lastid
) AS cirpo ON (sequenza.lastid = cirpo.lastid)
ORDER BY sequenza.lastid, cumul DESC
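As an alternative sketch, the recursion can also start at the head of each chain (a record that no other record points to) and follow nextid forwards, which yields the forward rolling sum directly. The snippet below runs the idea on SQLite through Python, purely so it can be tried without a server; the sample rows and the column names headid and rolling are made up for illustration. On SQL Server the RECURSIVE keyword would be dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, value INTEGER, nextid INTEGER);
-- Hypothetical sample data: three chains, stored in no particular order.
INSERT INTO items VALUES
  (3, 7, 0), (1, 10, 2), (2, 5, 3),   -- chain 1 -> 2 -> 3
  (5, 6, 0), (4, 4, 5),               -- chain 4 -> 5
  (6, 9, 0);                          -- single-record chain
""")

rows = conn.execute("""
WITH RECURSIVE chain(id, value, nextid, headid, rolling) AS (
    -- Anchor: heads of chains, i.e. records nobody points to.
    SELECT i.id, i.value, i.nextid, i.id, i.value
    FROM items AS i
    WHERE NOT EXISTS (SELECT 1 FROM items AS p WHERE p.nextid = i.id)
    UNION ALL
    -- Step: follow the pointer and add the next record's value.
    SELECT nxt.id, nxt.value, nxt.nextid, c.headid, c.rolling + nxt.value
    FROM chain AS c
    JOIN items AS nxt ON nxt.id = c.nextid
)
SELECT id, headid, rolling FROM chain ORDER BY id
""").fetchall()

for row in rows:
    print(row)  # (id, head of its chain, rolling sum up to this record)
```

Here headid plays the role of the missing common identifier: once every record carries it, any per-sequence aggregate becomes an ordinary GROUP BY.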

Related

Ranking players in SQL database will return always 1 when where id_user is used

I basically have a table "race" with columns for "id_race", "id_user" and columns for user predictions "pole_position", "1st", "2nd", "3rd" and "fastest_lap". In addition to those columns, each prediction column also has a control column such as "PPC", "1eC", "2eC", "3eC" and "srC". Those control columns are then compared by a query against a "result" table. Then the control columns in race are awarded points for a correct prediction.
table race
I want to add up those results per user and then rank them per user. I want to show that rank on the player's user page. I have a query for my SQL which works fine in itself and gives me a list with rank column.
SELECT
@rownum := @rownum +1 AS rank,
total,
id_user
FROM
(SELECT
SUM(PPC + 1eC + 2eC + 3eC + srC ) AS total,
id_user
FROM
race
GROUP BY
id_user
ORDER BY
total DESC) T,
(SELECT @rownum := 0) a;
Output of rank query:
However, when I add a WHERE id_user filter, the user always gets the first rank. Does anyone have an idea whether this can be solved, and how I could add a WHERE to my rank query?
I've already tried filtering. In addition, I have tried to use the Row_number function. It also always gives a result of 1 because only 1 user remains after filtering. I am unable to filter out the correct position. So please help!
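You have to compute the rank over all users first (in a view, a derived table or a CTE) and only then filter. If the WHERE clause is applied before ranking, only one row is left to rank, so it always gets rank 1. If it is applied to the already ranked result, you get the rank within the whole population rather than within the subset.
Please find an indicative answer on fiddle where a CTE and the ROW_NUMBER function are used. The indicative code is: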
WITH sum_cte AS (
SELECT ROW_NUMBER() OVER(ORDER BY SUM(PPC + 1eC + 2eC + 3eC + srC) DESC) AS row_num,
id_user,
SUM(PPC + 1eC + 2eC + 3eC + srC) AS total_sum
FROM race
GROUP BY id_user)
SELECT row_num, id_user, total_sum
FROM sum_cte
WHERE id_user = 1
User 1, who has the second-highest total, will then appear with row number 2.
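To see why the position of the filter matters, here is a small sketch that runs both variants on SQLite through Python. The table is a simplified stand-in for race: a single hypothetical points column replaces the five control columns, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE race (id_race INTEGER, id_user INTEGER, points INTEGER);
-- Hypothetical data: user 2 leads, user 1 is second, user 3 is last.
INSERT INTO race VALUES
  (1, 1, 3), (2, 1, 2),
  (1, 2, 4), (2, 2, 4),
  (1, 3, 1), (2, 3, 0);
""")

# Filter first, rank afterwards: only one user is left to be ranked.
filter_first = conn.execute("""
WITH totals AS (
    SELECT id_user, SUM(points) AS total
    FROM race
    WHERE id_user = 1
    GROUP BY id_user
), ranked AS (
    SELECT id_user, total, ROW_NUMBER() OVER (ORDER BY total DESC) AS rank_pos
    FROM totals
)
SELECT id_user, total, rank_pos FROM ranked
""").fetchall()

# Rank everyone first, filter afterwards: the rank refers to all users.
rank_first = conn.execute("""
WITH totals AS (
    SELECT id_user, SUM(points) AS total
    FROM race
    GROUP BY id_user
), ranked AS (
    SELECT id_user, total, ROW_NUMBER() OVER (ORDER BY total DESC) AS rank_pos
    FROM totals
)
SELECT id_user, total, rank_pos FROM ranked WHERE id_user = 1
""").fetchall()

print(filter_first)  # [(1, 5, 1)]
print(rank_first)    # [(1, 5, 2)]
```

Note that ROW_NUMBER gives tied users different positions in an arbitrary order; if ties should share a position, RANK or DENSE_RANK would be the window function to look at.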

Select all rows where the sum of column X is greater than or equal to Y

I need to find a group of lots to satisfy X demand for items. I can't do it with aggregate functions, it seems to me that I need something more than a window function, do you know anything that can help me solve this problem?
For example, if I have a demand for 1 Item, the query should return any lot with a quantity greater than or equal to 1. But if I have a demand for 15, there are no lots with that availability, so it should return a lot of 10 and another with 5 or one of 10 and two of 3, etc.
With a programming language like Java this is simple, but is it possible with SQL? I am trying to achieve it with window functions, but I cannot find a way to keep adding the available quantity of the current row until the required quantity is reached.
SELECT id,VC_NUMERO_LOTE,SF_FECHA_CREACION,SI_ID_M_ARTICULO,VI_CANTIDAD,NEXT, VI_CANTIDAD + NEXT AS TOT FROM (
SELECT row_number() over (ORDER BY SF_FECHA_CREACION desc) id ,VC_NUMERO_LOTE,SF_FECHA_CREACION,SI_ID_M_ARTICULO,
VI_CANTIDAD,LEAD(VI_CANTIDAD,1) OVER (ORDER BY SF_FECHA_CREACION desc) as NEXT FROM PUBLIC.M_LOTE WHERE SI_ID_M_ARTICULO = 44974
AND VI_CANTIDAD > 0 ) AS T
WHERE MOD(id, 2) != 0
I tried using LEAD and then summing only the odd rows, but I saw that this is not the way. Any suggestions?
You need a recursive query like this:
demo:db<>fiddle
WITH RECURSIVE lots_with_rowcount AS ( -- 1
SELECT
*,
row_number() OVER (ORDER BY avail_qty DESC) as rowcnt
FROM mytable
), lots AS ( -- 2
SELECT -- 3
lot_nr,
avail_qty,
rowcnt,
avail_qty as total_qty
FROM lots_with_rowcount
WHERE rowcnt = 1
UNION
SELECT
t.lot_nr,
t.avail_qty,
t.rowcnt,
l.total_qty + t.avail_qty -- 4
FROM lots_with_rowcount t
JOIN lots l ON t.rowcnt = l.rowcnt + 1
AND l.total_qty < 15 -- your demand here
)
SELECT * FROM lots -- 5
1. This CTE only provides a row count for each record, which is used within the recursion to join the next record.
2. This is the recursive CTE. A recursive CTE contains two parts: the initial SELECT statement and the recursion.
3. Initial part: queries the lot record with the highest avail_qty value. Naturally, you can order the lots in any way you like. Taking the largest quantities first yields the fewest rows.
4. After the UNION comes the recursion part: here the current row is joined to the previous output, with an additional condition: join only if the previous output does not yet cover your demand. In that case, the next total_qty value is calculated from the previous total and the current qty value.
5. The recursion ends when there is no record left that fits the join condition. Then you can SELECT the entire recursion output.
Notice: If your demand is higher than all your available quantities in total, this returns the entire table, because the recursion runs until the demand is reached or the table ends. You should run a query beforehand that checks whether the total covers the demand:
SELECT SUM(avail_qty) >= demand FROM mytable
I gratefully fiddled around with S-Man's fiddle and found a query that is at least simpler to understand:
select lot_nr, avail_qty, tot_amount from
(select lot_nr, avail_qty,
sum(avail_qty) over (order by avail_qty desc rows between unbounded preceding and current row) as tot_amount,
sum(avail_qty) over (order by avail_qty desc rows between unbounded preceding and current row) - avail_qty as last_amount
from mytable) amounts
where last_amount < 15 -- your amount here
So this lists every row for which the running total of its predecessors (in descending order of avail_qty) has not yet reached the limit.
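The running-total filter can be tried on a few rows as follows. This is only a sketch on SQLite through Python: the four lots are invented, pick_lots is a hypothetical helper name, and lot_nr is added as a tie-breaker in the ORDER BY (an assumption, so that equal quantities sort deterministically).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (lot_nr TEXT, avail_qty INTEGER);
-- Hypothetical lots.
INSERT INTO mytable VALUES ('A', 3), ('B', 10), ('C', 5), ('D', 2);
""")

def pick_lots(demand):
    """Return lots, largest first, until the running total covers the demand."""
    return conn.execute("""
        SELECT lot_nr, avail_qty, tot_amount
        FROM (
            SELECT lot_nr, avail_qty,
                   SUM(avail_qty) OVER (
                       ORDER BY avail_qty DESC, lot_nr
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                   ) AS tot_amount
            FROM mytable
        ) AS amounts
        -- tot_amount - avail_qty is the total of all preceding lots.
        WHERE tot_amount - avail_qty < ?
        ORDER BY tot_amount
    """, (demand,)).fetchall()

print(pick_lots(1))    # [('B', 10, 10)]
print(pick_lots(15))   # [('B', 10, 10), ('C', 5, 15)]
print(pick_lots(100))  # all four lots: the demand exceeds the total of 20
```

The last call shows the same caveat as the recursive version: when the demand cannot be met, every lot comes back, so the final tot_amount still has to be compared with the demand.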
Here is a simple old-school PL/pgSQL version that uses a (slow) loop. It returns only the lot numbers as an illustration. Basically, it walks through the lots for a particular item_id in a certain order (one that reflects the required business rules) and allocates the available quantities until the allocated quantity equals or exceeds the required quantity.
create function get_lots(required_item integer, required_qty integer) returns setof text as
$$
declare
r record;
allocated_qty integer := 0;
begin
for r in select * from lots where item_id = required_item order by <your biz-rule> loop
return next r.lot_number;
allocated_qty := allocated_qty + r.available_qty;
exit when allocated_qty >= required_qty;
end loop;
end;
$$ language plpgsql;
-- Use
select lot_id from get_lots(1, 17) lot_id;

Oracle - return only the first row for each product

In the first SELECT statement, the report grabs inventory details for all products with shipment activity. There is then a UNION that connects another SELECT statement to grab products without activity from the last calendar year.
However, the records that are returned in the second SELECT statement have multiple header_id’s and therefore multiple lines… instead of single lines like the first SELECT statement. Do you know how to pull only the first header_id of each record in the second SELECT statement?
Code and example result set below. In data, product #7 should only list the row for header_id 1372288 which is the last ID entered into the DB.
select 3 sort_key, header_Id,location_id,nlasinv.product,
start_inv,produced produced_inv,stored,from_stock,shipped,
(start_inv + produced + stored) - (from_stock + shipped) end_inv,nlas_ops_mtd_prodsize(111,nlasinv.product,'31-DEC-19'), nlas_ops_mtd_shipsize(111,nlasinv.product,'31-DEC-19'),nlas_ops_ytd_prodsize(111,nlasinv.product,'31-DEC-19'), nlas_ops_ytd_shipsize(111,nlasinv.product,'31-DEC-19')
from nlas_header inv,
nlas_inventory nlasinv
where nlasinv.header_id = 1372168
and inv.id = nlasinv.header_id
union
select distinct
3 sort_key,header_Id,location_id,nlasinv.product,
start_inv,produced produced_inv,stored,from_stock,shipped,
(start_inv + produced + stored) - (from_stock + shipped) end_inv,nlas_ops_mtd_prodsize(111,nlasinv.product,'31-DEC-19'),
nlas_ops_mtd_shipsize(111,nlasinv.product,'31-DEC-19'),nlas_ops_ytd_prodsize(111,nlasinv.product,'31-DEC-19'),
nlas_ops_ytd_shipsize(111,nlasinv.product,'31-DEC-19')
from
nlas_inventory nlasinv,
nlas_header hdr
where
nlasinv.header_id = hdr.id
and hdr.location_id = 409
and hdr.observation_date >= trunc(to_date('31-DEC-19','dd-mon-rr'),'year')
and nlasinv.product not in
(select distinct product from
nlas_header h,
nlas_inventory i
where i.header_id = 1372168)
order by product, header_id desc
I don't know what your query has to do with the "table" data that you show. But you seem to want row_number():
select t.*
from (select t.*, row_number() over (partition by product order by header_id desc) as seqnum
from t
) t
where seqnum = 1;
If that query is being used to generate the data, then just wrap it in a CTE.
This can also be accomplished by adding a correlated subquery to your where clause that keeps only the latest header per location and product:
and nlasinv.header_id = (
select max(i2.header_id)
from nlas_inventory i2, nlas_header h2
where h2.id = i2.header_id
and h2.location_id = hdr.location_id
and i2.product = nlasinv.product)
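For anyone who wants to try the row_number() idea in isolation, here is a minimal sketch on SQLite through Python. The table t and its rows are invented; only the header ids 1372168 and 1372288 are taken from the question, and end_inv holds arbitrary numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (header_id INTEGER, product INTEGER, end_inv INTEGER);
-- Hypothetical result of the UNION: product 7 appears under three headers.
INSERT INTO t VALUES
  (1372168, 5, 40),
  (1372200, 7, 10), (1372250, 7, 12), (1372288, 7, 15),
  (1372210, 9, 0),  (1372260, 9, 3);
""")

rows = conn.execute("""
SELECT header_id, product, end_inv
FROM (
    -- Number the rows of each product, newest header first.
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY product ORDER BY header_id DESC) AS seqnum
    FROM t
) AS ranked
WHERE seqnum = 1
ORDER BY product
""").fetchall()

print(rows)  # [(1372168, 5, 40), (1372288, 7, 15), (1372260, 9, 3)]
```

Product 7 keeps only the row for header 1372288, which is what the question asks for.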

FIFO Implementation in Inventory using SQL

This is basically an inventory project which tracks the "Stock In" and "Stock Out" of items through Purchase and sales respectively.
The inventory system follows FIFO Method (the items which are first purchased are always sold first). For example:
If we purchased Item A in months January, February and March
When a customer comes we give away items purchased during January
only when the January items are over we starts giving away February items and so on
So I have to show here the total stock in my hand and the split up so that I can see the total cost incurred.
Actual table data:
The result set I need to obtain:
My client insists that I should not use Cursor, so is there any other way of doing so?
As some comments already said, a CTE can solve this:
with cte as (
select item, wh, stock_in, stock_out, price, value
, row_number() over (partition by item, wh order by item, wh) as rank
from myTable)
select a.item, a.wh
, a.stock_in - coalesce(b.stock_out, 0) stock
, a.price
, a.value - coalesce(b.value, 0) value
from cte a
left join cte b on a.item = b.item and a.wh = b.wh and a.rank = b.rank - 1
where a.stock_in - coalesce(b.stock_out, 0) > 0
Note that the second "Item B" seems to have the wrong price (the IN price is 25, the OUT price is 35).
SQL 2008 fiddle
Just for fun: with SQL Server 2012 and the introduction of the LEAD and LAG functions, the same thing is possible in a somewhat easier way:
with cte as (
select item, wh, stock_in
, coalesce(LEAD(stock_out)
OVER (partition by item, wh order by item, wh), 0) stock_out
, price, value
, coalesce(LEAD(value)
OVER (partition by item, wh order by item, wh), 0) value_out
from myTable)
select item
, wh
, (stock_in - stock_out) stock
, price
, (value - value_out) value
from cte
where (stock_in - stock_out) > 0
SQL2012 fiddle
Update
ATTENTION: to use the two queries before this point, the data needs to be in the correct order.
To handle more than one row per day you need something reliable to order the rows with the same date, like a date column with a time part, an auto-incrementing ID or something along the same lines. The queries already written cannot be used for that, because they are based on the position of the data.
A better idea is to split the data into IN and OUT rows, order both by item, wh and date, and apply a rank to each, like this:
SELECT d_in.item
, d_in.wh
, d_in.stock_in - coalesce(d_out.stock_out, 0) stock
, d_in.price
, d_in.value - coalesce(d_out.value, 0) value
FROM (SELECT item, wh, stock_in, price, value
, rank = row_number() OVER
(PARTITION BY item, wh ORDER BY item, wh, date)
FROM myTable
WHERE stock_out = 0) d_in
LEFT JOIN
(SELECT item, wh, stock_out, price, value
, rank = row_number() OVER
(PARTITION BY item, wh ORDER BY item, wh, date)
FROM myTable
WHERE stock_in = 0) d_out
ON d_in.item = d_out.item AND d_in.wh = d_out.wh
AND d_in.rank = d_out.rank
WHERE d_in.stock_in - coalesce(d_out.stock_out, 0) > 0
SQLFiddle
But this query is NOT completely reliable: the order of the rows within the same ordering group is not stable.
I haven't changed the query to recalculate the price if the IN price is different from the OUT price.
If cursors aren't an option, a SQLCLR stored procedure might be. This way you could load the raw data into .NET objects, manipulate and sort it using C# or VB.NET, and set the resulting data as the procedure's output. Not only will this give you what you want, it may even turn out to be much easier than trying to do the same in pure T-SQL, depending on your programming background.
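A different set-based technique, not taken from the answers above, is to compare a running total of purchases with the total sold: under FIFO a purchase row still has stock only once the cumulative quantity bought exceeds everything sold, and what is left on it is the smaller of its own quantity and that excess. The sketch below runs on SQLite through Python with invented rows. The txn_date column is an assumption (the real table is only shown as an image), and the two-argument MIN is SQLite-specific, so on SQL Server it would have to be written as a CASE expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myTable (item TEXT, wh TEXT, txn_date TEXT,
                      stock_in INTEGER, stock_out INTEGER, price REAL);
-- Hypothetical movements: item A bought 10 + 20 + 15 and sold 12 + 13.
INSERT INTO myTable VALUES
  ('A', 'W1', '2014-01-10', 10, 0, 5.0),
  ('A', 'W1', '2014-02-10', 20, 0, 6.0),
  ('A', 'W1', '2014-02-20', 0, 12, 9.0),
  ('A', 'W1', '2014-03-10', 15, 0, 7.0),
  ('A', 'W1', '2014-03-20', 0, 13, 9.0),
  ('B', 'W1', '2014-01-15', 8, 0, 25.0),
  ('B', 'W1', '2014-02-01', 0, 3, 35.0);
""")

rows = conn.execute("""
WITH sold AS (
    -- Everything that has left the warehouse, per item and warehouse.
    SELECT item, wh, SUM(stock_out) AS total_out
    FROM myTable
    GROUP BY item, wh
), running AS (
    -- Running total of purchases in purchase order (oldest first).
    SELECT m.item, m.wh, m.txn_date, m.stock_in, m.price,
           SUM(m.stock_in) OVER (
               PARTITION BY m.item, m.wh ORDER BY m.txn_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS cum_in,
           s.total_out
    FROM myTable AS m
    JOIN sold AS s ON s.item = m.item AND s.wh = m.wh
    WHERE m.stock_in > 0
)
SELECT item, wh, txn_date,
       MIN(stock_in, cum_in - total_out) AS stock,
       price
FROM running
WHERE cum_in > total_out
ORDER BY item, wh, txn_date
""").fetchall()

for row in rows:
    print(row)
```

For item A the January purchase is fully consumed, 5 units remain from February and all 15 from March. Multiplying stock by price then gives the value of each remaining layer.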

DB2 SQL Query to Identify what event occurred prior to a particular event in a sequence

I have a table from our IVR that contains a unique call id, sequence number, event code, and event description. I would like to write a query that lets me know what the event prior to a particular event was.
Assuming all you have is the particular event's "unique call id":
SELECT *
FROM tbl
WHERE sequence_number = (
SELECT MAX(sequence_number)
FROM tbl
WHERE sequence_number < (
SELECT sequence_number FROM tbl WHERE unique_id = PARTICULAR_EVENT_UNIQUE_ID
)
);
If the sequence number of the particular event is known (instead of or in addition to the unique call id), then the innermost select can be replaced in its entirety by that value.
Depending on what indexes exist on the table, a straightforward inner join may receive a better-performing access plan from the query optimizer.
SELECT n.call_id,
n.event_dt,
n.sequence_number,
p.call_id as prior_call_id,
p.event_id as prior_event_id,
p.event_dt as prior_event_dt,
p.sequence_number as prior_sequence_number
FROM daily_events n
INNER JOIN daily_events p
ON p.sequence_number = n.sequence_number - 1
WHERE n.event_id = '5047'
AND n.event_dt >= DATE( '01/06/2012' )
AND n.event_dt <= DATE( '01/07/2012' );
The query assumes that any event with a sequence number that differs by one is an appropriate match, and that the call_id doesn't also need to match. If that assumption is incorrect, then add AND n.call_id = p.call_id to the ON clause of the join.
Assuming that the sequence number is sequential (i.e. the next record always has a sequence number 1 greater than the current record), try:
select i.*
from ivr_table i
where exists
(select 1
from ivr_table ni
where i.sequence + 1 = ni.sequence and ni.event_code = '5047')
EDIT: select null in subquery replaced with select 1
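To check the self-join idea on a few rows, including the call id match, here is a small sketch on SQLite through Python. The column names follow the question's description, and all rows and event codes other than 5047 are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ivr_table (call_id TEXT, sequence INTEGER,
                        event_code TEXT, event_desc TEXT);
-- Hypothetical calls: 5047 is reached after different events.
INSERT INTO ivr_table VALUES
  ('c1', 1, '1000', 'greeting'),
  ('c1', 2, '2000', 'menu'),
  ('c1', 3, '5047', 'transfer'),
  ('c2', 1, '1000', 'greeting'),
  ('c2', 2, '5047', 'transfer'),
  ('c2', 3, '9000', 'hangup');
""")

rows = conn.execute("""
SELECT p.call_id, p.sequence, p.event_code
FROM ivr_table AS n              -- n: the event of interest
JOIN ivr_table AS p              -- p: the event right before it
  ON p.call_id = n.call_id
 AND p.sequence = n.sequence - 1
WHERE n.event_code = ?
ORDER BY p.call_id
""", ("5047",)).fetchall()

print(rows)  # [('c1', 2, '2000'), ('c2', 1, '1000')]
```

If sequence numbers can have gaps, the sequence - 1 match no longer works, and picking the greatest smaller sequence number per call (or using LAG where available) would be the safer variant.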