Running Total Using LAG Function - sql

I wonder if someone could help me calculate a running total.
I am converting this from an existing Excel solution, so I know what I am aiming for.
I am trying to use LAG to get the values from the previous row, but the calculation is not matching my target. I think I need to use the result from the previous row in the LAG column, but that doesn't look possible.
Any help appreciated.
use tempdb;
--Create Temp Table
IF OBJECT_ID('tempdb..#WareHouseData') IS NOT NULL DROP TABLE #WareHouseData
CREATE TABLE #WareHouseData
(
ItemId INT,
DateID INT,
OpenningWareHouseUnits INT,
FcastSales INT,
GoodsIncoming INT,
TargetRunningStock INT
);
--Fill it with example data
--OpenningWareHouseUnits only exists in the first week
--FcastSales can be in any week, though normally all weeks
--GoodsIncoming can be in any week
INSERT INTO #WareHouseData
([ItemId],[DateID],[OpenningWareHouseUnits],[FcastSales],[GoodsIncoming],[TargetRunningStock])
VALUES
(987654,201450,200,10,NULL,190),
(987654,201451,NULL,20,NULL,170),
(987654,201452,NULL,30,NULL,140),
(987654,201501,NULL,20,NULL,120),
(987654,201502,NULL,10,NULL,110),
(987654,201503,NULL,50,NULL,60),
(987654,201504,NULL,60,NULL,0),
(987654,201505,NULL,70,100,30),
(987654,201506,NULL,70,80,40),
(987654,201507,NULL,80,100,60),
(987654,201508,NULL,30,NULL,30),
(987654,201509,NULL,20,NULL,10),
(987654,201510,NULL,20,NULL,0),
(123456,201450,300,50,NULL,250),
(123456,201451,NULL,60,NULL,190),
(123456,201452,NULL,70,100,220),
(123456,201501,NULL,80,NULL,140),
(123456,201502,NULL,100,100,140),
(123456,201503,NULL,105,NULL,35),
(123456,201504,NULL,100,100,35),
(123456,201505,NULL,95,NULL,0),
(123456,201506,NULL,30,100,70),
(123456,201507,NULL,20,NULL,50),
(123456,201508,NULL,5,NULL,45),
(123456,201509,NULL,5,NULL,40),
(123456,201510,NULL,5,NULL,35),
(369258,201450,1000,100,NULL,900),
(369258,201451,NULL,100,NULL,800),
(369258,201452,NULL,100,NULL,700),
(369258,201501,NULL,100,NULL,600),
(369258,201502,NULL,100,NULL,500),
(369258,201503,NULL,100,NULL,400),
(369258,201504,NULL,100,NULL,300),
(369258,201505,NULL,100,NULL,200),
(369258,201506,NULL,100,NULL,100),
(369258,201507,NULL,100,500,500),
(369258,201508,NULL,100,NULL,400),
(369258,201509,NULL,100,NULL,300),
(369258,201510,NULL,100,NULL,200);
--Match the Target Running Stock Total
--I need to match the TargetRunningStock Totals
--This can be recreated in excel by pasting the columns
--{ItemId DateID OpenningWareHouseUnits FcastSales GoodsIncoming}
--Into cell A1 with headers, and pasting this formula
-- =IF(C2="",IF((F1-D2+E2)<0,0,(F1-D2+E2)),(C2-D2+E2)) into cell F2
SELECT w.ItemId
, w.DateID
, w.OpenningWareHouseUnits
, w.FcastSales
, w.GoodsIncoming
, w.TargetRunningStock
, CASE WHEN w.OpenningWareHouseUnits IS NOT NULL
THEN (ISNULL(w.OpenningWareHouseUnits,0) - ISNULL(w.FcastSales,0) + ISNULL(w.GoodsIncoming,0))
ELSE CASE WHEN ((((LAG(ISNULL(w.OpenningWareHouseUnits,0),1) OVER (PARTITION BY w.ItemId ORDER BY w.ItemId,w.DateID))-
(LAG(ISNULL(w.FcastSales,0),1) OVER (PARTITION BY w.ItemId ORDER BY w.ItemId,w.DateID)) +
(LAG(ISNULL(w.GoodsIncoming,0),1) OVER (PARTITION BY w.ItemId ORDER BY w.ItemId,w.DateID)))) -
ISNULL(w.FcastSales,0) + ISNULL(w.GoodsIncoming,0)) < 0
THEN 0
ELSE ((((LAG(ISNULL(w.OpenningWareHouseUnits,0),1) OVER (PARTITION BY w.ItemId ORDER BY w.ItemId,w.DateID))-
(LAG(ISNULL(w.FcastSales,0),1) OVER (PARTITION BY w.ItemId ORDER BY w.ItemId,w.DateID)) +
(LAG(ISNULL(w.GoodsIncoming,0),1) OVER (PARTITION BY w.ItemId ORDER BY w.ItemId,w.DateID)))) -
ISNULL(w.FcastSales,0) + ISNULL(w.GoodsIncoming,0))
END
END CalculatedRunningStock
FROM #WareHouseData w
ORDER BY w.ItemId
, w.DateID

Ignoring most of the calculation logic for simplicity (and time), you almost certainly need to sum() over (partition by ... order by...).
select ItemId, DateId, TargetRunningStock,
sum(TargetRunningStock) over (partition by itemid order by dateid)
from #WareHouseData
order by ItemId, DateId;
ItemId DateId TargetRunningStock Sum
--
123456 201450 250 250
123456 201451 190 440
123456 201452 220 660
...
987654 201507 60 920
987654 201508 30 950
987654 201509 10 960
987654 201510 0 960
Since you're trying to reproduce the results from a spreadsheet, you might need to wrap something like this around some calculated columns that use lag(). I didn't look that deeply into your spreadsheet logic.
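For example, a plain running sum of the net weekly movement gets most of the way there. This is only a sketch against the #WareHouseData table above; it deliberately ignores the floor-at-zero rule in the Excel formula, so weeks where the target clamps to 0 (and every week after them) will diverge from TargetRunningStock:
SELECT w.ItemId
, w.DateID
, SUM(ISNULL(w.OpenningWareHouseUnits, 0)
      - ISNULL(w.FcastSales, 0)
      + ISNULL(w.GoodsIncoming, 0))
  OVER (PARTITION BY w.ItemId
        ORDER BY w.DateID
        ROWS UNBOUNDED PRECEDING) AS RunningStockNoFloor
FROM #WareHouseData w
ORDER BY w.ItemId, w.DateID;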

The basic syntax for a running sum is to use an ORDER BY in the OVER clause of the SUM() window function:
SELECT w.ItemId, w.DateID, w.OpenningWareHouseUnits, w.FcastSales,
w.GoodsIncoming, w.TargetRunningStock,
SUM(w.OpenningWareHouseUnits) OVER (PARTITION BY w.ItemId ORDER BY w.DateID)
FROM #WareHouseData w
ORDER BY w.ItemId, w.DateID ;
I am a little unclear how to apply this to your formula. Sample data and desired results would be a big help.

Related

How can you use rowid in BigQuery to get the first value of your dataset by date and set all other values to 0 for a given day

I have a dataset of 3 columns: date, sales, and new_sales.
What I am trying to do in BigQuery is, for a given date, grab the first sales value and populate it into a new column called new_sales, while leaving the rest of the rows for that date with a value of 0.
How would I go about creating this query in BigQuery?
You can use row_number(), but you need a column that defines the ordering of rows having the same date; I assumed id:
select t.*,
case when row_number() over(partition by date order by id) = 1 then sales end as new_sales
from mytable t
Here is an example I made earlier; it should work for you:
http://sqlfiddle.com/#!17/5c48e/8/0
This answer assumes that your sales values stay consistent within a date and do not change; if they do change (e.g. 12/10/2020 has two different sales values), then you would need an ORDER BY that picks the right one.
My code is below:
CREATE TABLE links (
date_item varchar(255),
sales INT
);
INSERT INTO links (date_item, sales)
VALUES('12/10/2020',5),
('12/10/2020',5),
('12/10/2020',5),
('13/10/2020',7),
('13/10/2020',7),
('13/10/2020',7),
('13/10/2020',7),
('13/10/2020',7),
('13/10/2020',7),
('13/10/2020',7),
('14/10/2020',3),
('14/10/2020',3),
('14/10/2020',3);
select t.*,
case when ROW_NUMBER () OVER (partition BY date_item) =1 then sales else 0 end as new_sales
from links as t
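A hedged variant of the same idea: without an ORDER BY inside the window, which row counts as "first" for each date is arbitrary, so if you have (or add) a column that fixes the ordering - here a hypothetical id column, which the links table above does not actually have - you would write:
select t.*,
       case when row_number() over (partition by date_item order by id) = 1
            then sales else 0 end as new_sales
from links as t;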

Group Duplicates in different results depending on location in data set

We would like to see how long a call has been with a department in our ticket system. We cannot use the min and max date from the call, as the call can go to one department more than once:
A call can be with Support, go to Branches, and then come back to Support, so we cannot use min and max by group, as it would show that the call has been with Support for the entire life cycle of the call.
I have a result that brings back the same information but for different times; I would like to group these into their own results.
I have tried to use ranking, but this didn't resolve the problem, as the same rank applies to the value even when it is further down in the result set.
select
min(update_time), max(update_time) ,assigned_group,version,update_time,
datediff(HOUR,min(update_time), max(Update_time)) as difference ,
dense_rank() Over (partition by assigned_group order by version ) as
pDenserank,
rank() Over (partition by assigned_group order by version) as prank,
dense_rank() Over (order by assigned_group) as denserank,
rank() Over (order by assigned_group) as rank,
assign_counter
from service_req_history
where id = 405012
group by version, assign_counter, update_time, assigned_group
order by assign_counter
Current Result Set
I would like to see the following Results: Please see attached file
Min Update Time Max Update Time assigned_group Days with Department
2019/07/19 16:28 2019/07/22 09:01 Support 3
2019/07/22 11:32 2019/08/26 13:25 Branches 4
2019/08/26 15:44 2019/08/28 11:22 Support 2
2019/08/28 11:47 2019/08/28 15:32 Technical 0
Expected result Set
Your input would be highly appreciated, thanking you in advance.
Regards Charl
To start with, you might want to do something like the below:
SELECT MIN(update_time) 'Min Update Time'
,MAX(update_time) 'Max Update Time'
,assigned_group
,SUM(Call_Duration)/60.0/24 AS 'Days with Department'
FROM (
SELECT LAG(update_time) OVER(ORDER BY update_time ASC) prev_update_time
,update_time
,DATEDIFF(MINUTE, LAG(update_time) OVER(ORDER BY update_time ASC), Update_time) AS 'Call_Duration'
,assigned_group
FROM service_req_history
WHERE id = 405012
) AS CallDurationSet
GROUP BY assigned_group
To get other ids, you may want to remove the WHERE clause and add the "id" column to the GROUP BY.
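A minimal sketch of that multi-id version, with the added assumption that the LAG should also be partitioned by id so one call's durations never bleed into another call:
SELECT id
      ,MIN(update_time) 'Min Update Time'
      ,MAX(update_time) 'Max Update Time'
      ,assigned_group
      ,SUM(Call_Duration)/60.0/24 AS 'Days with Department'
FROM (
      SELECT id
            ,update_time
            ,DATEDIFF(MINUTE, LAG(update_time) OVER(PARTITION BY id ORDER BY update_time ASC), update_time) AS Call_Duration
            ,assigned_group
      FROM service_req_history
     ) AS CallDurationSet
GROUP BY id, assigned_group;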

Can we modify the previous row and use it in the current row in a SQL query for a list?

I've looked around and found a few posts with LAG() and running total type queries, but none seem to fit what I'm looking for. Maybe I'm not using the correct terms in my search, or maybe I'm overcomplicating the situation. I hope someone can help me out.
What I'm looking to do is take the previous result and multiply it by the current row for a range of dates. The starting value is always some base number; let's use 10 to keep it simple. The values will be floats, but I kept them to round numbers here to better explain my inquiry.
The first table shows the calculation and the second table below shows what the result should look like in the end.
date val1 calc_result
20120930 null 10
20121031 2 10*2=20
20121130 3 20*3=60
20121231 1 60*1=60
20130131 2 60*2=120
20130228 1 120*1=120
The query would return
20120930 10
20121031 20
20121130 60
20121231 60
20130131 120
20130228 120
I'm trying to see if this can be done in a query-type solution, or whether a PL/SQL table/cursor would need to be used?
Any help would be appreciated.
You can do this with a recursive CTE:
with dates as (
select t.*, row_number() over (order by date) as seqnum
from t
),
cte as (
select t.date, t.val1, 10 as calc_result
from dates t
where t.seqnum = 1
union all
select t.date, t.val1, cte.calc_result * t.val1
from cte join
dates t
on t.seqnum = cte.seqnum + 1
)
select cte.date, cte.calc_result
from cte
order by cte.date;
This is calculating a cumulative product. You can do it with some exponential arithmetic. Replace 10 in the query with the desired start value.
select date,val1
,case when row_number() over(order by date) = 1 then 10 --set start value for first row
else 10*exp(sum(ln(val1)) over(order by date)) end as res
from tbl
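A hedged simplification of the same trick, assuming val1 is positive whenever it is present: coalescing NULL to 1 makes the first row fall out of the same expression (exp(0) = 1), so the CASE is no longer needed:
select date, val1,
       10 * exp(sum(ln(coalesce(val1, 1))) over (order by date)) as res
from tbl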

Calculate Sum From Moving 4 Rows in SQL

I have the following data.
WM_Week POS_Store_Count POS_Qty POS_Sales POS_Cost
------ --------------- ------ -------- --------
201541 3965 77722 153904.67 102593.04
201542 3952 77866 154219.66 102783.12
201543 3951 70690 139967.06 94724.60
201544 3958 70773 140131.41 95543.55
201545 3958 76623 151739.31 103441.05
201546 3956 73236 145016.54 98868.60
201547 3939 64317 127368.62 86827.95
201548 3927 60762 120309.32 82028.70
I need to write a SQL query to get the last four weeks of data, with the last four weeks summed for each of the following columns: POS_Store_Count, POS_Qty, POS_Sales, and POS_Cost.
For example, if I wanted 201548's data it would contain 201548, 201547, 201546, and 201545's.
The sum for 201547 would contain 201547, 201546, 201545, and 201544.
The query should return 4 rows when run successfully.
How would I formulate a recursive query to do this? Is there something easier than recursion to do this?
Edit: The version is Azure SQL DW with version number 12.0.2000.
Edit2: The four rows that should be returned would each have the sum of the columns from itself and its three earlier weeks.
For example, if I wanted the figures for 201548 it would return the following:
WM_Week POS_Store_Count POS_Qty POS_Sales POS_Cost
------ --------------- ------- -------- --------
201548 15780 274938 544433.79 371166.3
Which is the sum of the four (non-identity) columns from 201548, 201547, 201546, and 201545.
Pretty sure this will get you what you want. I'm using CROSS APPLY after ordering the data to apply the SUMs:
Create Table #WeeklyData (WM_Week Int, POS_Store_Count Int, POS_Qty Int, POS_Sales Money, POS_Cost Money)
Insert #WeeklyData Values
(201541,3965,77722,153904.67,102593.04),
(201542,3952,77866,154219.66,102783.12),
(201543,3951,70690,139967.06,94724.6),
(201544,3958,70773,140131.41,95543.55),
(201545,3958,76623,151739.31,103441.05),
(201546,3956,73236,145016.54,98868.6),
(201547,3939,64317,127368.62,86827.95),
(201548,3927,60762,120309.32,82028.7)
DECLARE #StartWeek INT = 201548;
WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (ORDER BY [WM_Week] DESC) rn
FROM #WeeklyData
WHERE WM_Week BETWEEN @StartWeek - 9 AND @StartWeek
)
SELECT *
FROM cte c1
CROSS APPLY (SELECT SUM(POS_Store_Count) POS_Store_Count_SUM,
SUM(POS_Qty) POS_Qty_SUM,
SUM(POS_Sales) POS_Sales_SUM,
SUM(POS_Cost) POS_Cost_SUM
FROM cte c2
WHERE c2.rn BETWEEN c1.rn AND (c1.rn + 3)
) ca
WHERE c1.rn <= 4
You can use SUM() in combination with the OVER clause.
Something like:
SELECT WM_Week
, SUM(POS_Store_Count) OVER (ORDER BY WM_Week ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
FROM Table
You should be able to use a SQL window function for this.
Add a column to your query like the following:
SUM(POS_Sales) OVER(
ORDER BY WM_Week
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
) AS POS_Sales_4_Weeks
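One rough way to combine this with the requirement of returning four rows, run against the #WeeklyData table created in the first answer (the _4wk aliases are just illustrative): the window sums are computed over all eight weeks before TOP keeps the four most recent.
SELECT TOP (4)
       WM_Week,
       SUM(POS_Store_Count) OVER (ORDER BY WM_Week ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS POS_Store_Count_4wk,
       SUM(POS_Qty)         OVER (ORDER BY WM_Week ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS POS_Qty_4wk,
       SUM(POS_Sales)       OVER (ORDER BY WM_Week ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS POS_Sales_4wk,
       SUM(POS_Cost)        OVER (ORDER BY WM_Week ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS POS_Cost_4wk
FROM #WeeklyData
ORDER BY WM_Week DESC;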
If I understand correctly, you don't want to return 4 rows, but rather 4 summed columns for each group? If so, here's one option:
select max(WM_Week) as WM_Week,
sum(POS_Store_Count),
sum(POS_Qty),
sum(POS_Sales),
sum(POS_Cost)
from (select top 4 *
from yourtable
where wm_week <= 201548
order by wm_week desc) t
This uses a subquery with top to get the 4 rows you want to aggregate based on the where criteria and order by clause.
Here is a condensed fiddle demonstrating the example (sorry fiddle isn't supporting sql server right now, so the syntax is slightly off):

Datediff between two tables

I have these two tables:
1-Add to queue table
TransID , ADD date
10 , 10/10/2012
11 , 14/10/2012
11 , 18/11/2012
11 , 25/12/2012
12 , 1/1/2013
2-Removed from queue table
TransID , Removed Date
10 , 15/1/2013
11 , 12/12/2012
11 , 13/1/2013
11 , 20/1/2013
The TransID is the key between the two tables, and I can't modify those tables. What I want is to query the amount of time each transaction spent in the queue.
It's easy when there is one item in each table, but when an item gets queued more than once, how do I calculate that?
Assuming the order TransIDs are entered into the Add table is the same order they are removed, you can use the following:
WITH OrderedAdds AS
( SELECT TransID,
AddDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY AddDate)
FROM AddTable
), OrderedRemoves AS
( SELECT TransID,
RemovedDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY RemovedDate)
FROM RemoveTable
)
SELECT OrderedAdds.TransID,
OrderedAdds.AddDate,
OrderedRemoves.RemovedDate,
[DaysInQueue] = DATEDIFF(DAY, OrderedAdds.AddDate, ISNULL(OrderedRemoves.RemovedDate, CURRENT_TIMESTAMP))
FROM OrderedAdds
LEFT JOIN OrderedRemoves
ON OrderedAdds.TransID = OrderedRemoves.TransID
AND OrderedAdds.RowNumber = OrderedRemoves.RowNumber;
The key part is that each record gets a rownumber based on the transaction id and the date it was entered; you can then join on both rownumber and transID to stop any cross joining.
Example on SQL Fiddle
DISCLAIMER: There is probably a problem with this, but I hope to send you in one possible direction. Make sure to expect problems.
You can try the following direction (which might work in some way depending on your system, version, etc.):
SELECT transId, (sum(remove_date_sum) - sum(add_date_sum)) / (60*60*24)
FROM
(
SELECT transId, SUM(UNIX_TIMESTAMP(add_date)) as add_date_sum, 0 as remove_date_sum
FROM add_to_queue
GROUP BY transId
UNION ALL
SELECT transId, 0 as add_date_sum, SUM(UNIX_TIMESTAMP(remove_date)) as remove_date_sum
FROM remove_from_queue
GROUP BY transId
) AS sums
GROUP BY transId;
A bit of explanation: as far as I know, you cannot sum dates, but you can convert them to some sort of timestamps. Check if UNIX_TIMESTAMP works for you, or figure out something else. Then you can sum in each table, build a UNION by conveniently leaving the other column as zero, and subtract the sums in the outer query.
As for that division at the end of the first SELECT: UNIX_TIMESTAMP gives seconds, so you divide by 60*60*24 to get days - or whatever unit it is that you want.
This all said - I would probably solve this using a stored procedure or some client script. SQL is not a weapon for every battle. Making two separate queries can be much simpler.
Answer 2: after your comments. (As a side note, some of your dates, 15/1/2013 and 13/1/2013, do not represent a proper date format.)
select transId, sum(numberOfDays) totalQueueTime
from (
select a.transId,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
) X
group by transId
Answer 1: before your comments.
Assuming that there won't be a new record added unless the previous one has been removed. Also note the following query will bring back numberOfDays as zero for unremoved records:
select a.transId, a.addDate, r.removeDate,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
order by a.transId, a.addDate, r.removeDate