t-sql Summing differences between timestamps - sql

I'm tracking a machine state, which can be 0, 1 or 2,
and storing that data in a SQL table together with a time_stamp.
I have a table in SQL Server with the following fields:
id(int)
time_stamp(datetime)
machine_state(int)
The machine state corresponds to the machine condition:
machine_state = 0 - machine stopped
machine_state = 1 - machine with alarm
machine_state = 2 - machine running
Now I want to calculate how long the machine was in each state in each shift.
Shifts are
8:00-17:00
17:00-01:00
01:00-08:00.
My problem is how to calculate the time spent in each machine state (sum_time_0, sum_time_1, sum_time_2) and group those times by shift. I want to calculate the time in seconds and then convert it to minutes.
To give a better picture, I exported part of the table:
EXPORT_TABLE
id time_stamp machine_state
1623 6.10.2009 17:09:00 1
1624 6.10.2009 17:17:00 2
1625 6.10.2009 17:17:00 1
1626 6.10.2009 17:17:00 2
1627 6.10.2009 17:18:00 1
1628 6.10.2009 17:18:00 2
1629 6.10.2009 18:04:00 1
1630 6.10.2009 18:06:00 2
1631 6.10.2009 18:07:00 1
1632 6.10.2009 18:12:00 2
1633 6.10.2009 18:28:00 1
1634 6.10.2009 18:28:00 2
1635 6.10.2009 19:16:00 1
1636 6.10.2009 19:21:00 2
1637 6.10.2009 19:49:00 1
1638 6.10.2009 20:23:00 2
Any advice will help.
Thanks in advance.

You can join the next machine state for each row then group by the state and sum the difference in time...
create table #t(id int identity(1,1), ts datetime, ms tinyint);
insert into #t
select '6.10.2009 17:09:00', 1
union select '6.10.2009 17:17:00', 2
union select '6.10.2009 17:17:00', 1
union select '6.10.2009 17:17:00', 2
union select '6.10.2009 17:18:00', 1
union select '6.10.2009 17:18:00', 2
union select '6.10.2009 18:04:00', 1
union select '6.10.2009 18:06:00', 2
union select '6.10.2009 18:07:00', 1
union select '6.10.2009 18:12:00', 2
union select '6.10.2009 18:28:00', 1
union select '6.10.2009 18:28:00', 2
union select '6.10.2009 19:16:00', 1
union select '6.10.2009 19:21:00', 2
union select '6.10.2009 19:49:00', 1
union select '6.10.2009 20:23:00', 2
select
    t.ms,
    sum(datediff(mi, t.ts, tn.ts)) as total_minutes
from
    #t t
    inner join #t tn on
        tn.id = (select top 1 t2.id
                 from #t t2
                 where t2.id > t.id and t2.ms <> t.ms
                 order by t2.id)
group by
    t.ms
/*
ms   total_minutes
1    54
2    140
*/
drop table #t
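On SQL Server 2012 or later, the same "next row" lookup could be sketched with the LEAD window function instead of a self-join. This is only a sketch under assumptions: it reuses the #t table from above (so it has to run before the final drop table), it assumes id reflects chronological order, and it pairs each row with the immediately following row rather than with the next row that has a different state. For strictly alternating data like the sample the two are equivalent.

```sql
-- Sketch only: requires SQL Server 2012+ (LEAD) and the #t table defined above.
select
    x.ms,
    sum(datediff(ss, x.ts, x.next_ts))        as total_seconds,
    sum(datediff(ss, x.ts, x.next_ts)) / 60.0 as total_minutes
from (
    select ms,
           ts,
           lead(ts) over (order by id) as next_ts  -- timestamp of the following row
    from #t
) x
where x.next_ts is not null   -- the last row has no successor, so it has no duration
group by x.ms;
```

Summing seconds and dividing once at the end avoids losing partial minutes on every interval, which datediff(mi, ...) would do.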

Here's an outline of how I'd do it. I am making some assumptions which may be invalid or not apply to your situation, so I'm not coding everything out.
First, I'd break the problem into chunks: calculate the data for one shift at a time. (I'm guessing you run this once a day, or maybe once a week.)
I would implement this as a stored procedure with two parameters:
@ShiftDate, specifying the date to be calculated (use the date portion only, ignore any time value)
@Shift, specifying which shift to analyze (1, 2, 3, as you defined)
Build two "full" datetimes, one for the start of the shift, one for the end. For example, if @ShiftDate = 'Oct 22, 2009' and @Shift = 2, you'd get
@ShiftStart = 'Oct 22, 2009 17:00:00'
@ShiftStop = 'Oct 23, 2009 1:00:00'
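A minimal sketch of building those two datetimes, assuming the shift hours from the question (8:00, 17:00 and 1:00 starts, lasting 9, 8 and 7 hours). Treating shift 3 as starting at 1:00 on the same calendar date as @ShiftDate is my assumption; adjust it if your shift 3 belongs to the previous day.

```sql
-- Hypothetical sketch: derive the shift window from a date and a shift number.
declare @ShiftDate datetime, @Shift int;
declare @ShiftStart datetime, @ShiftStop datetime;

set @ShiftDate = '20091022';   -- unambiguous yyyymmdd literal
set @Shift = 2;

-- strip any time portion from @ShiftDate
set @ShiftDate = dateadd(dd, datediff(dd, 0, @ShiftDate), 0);

-- start hour per shift, then shift length in hours (both taken from the question)
set @ShiftStart = dateadd(hh, case @Shift when 1 then 8 when 2 then 17 when 3 then 1 end, @ShiftDate);
set @ShiftStop  = dateadd(hh, case @Shift when 1 then 9 when 2 then 8  when 3 then 7 end, @ShiftStart);

-- for shift 2 on Oct 22, 2009 this gives Oct 22 17:00 and Oct 23 01:00
select @ShiftStart as ShiftStart, @ShiftStop as ShiftStop;
```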
Create a temp table to hold the subset of the data that we'll be analyzing. Populate it like so:
Copy over all the data between @ShiftStart and @ShiftStop
Do NOT include any data where consecutive (by time) entries have the same state. If any such data exists, discard all but the earliest entry.
(It looks like your data is generated this way--but do you want to assume the data will always be good?)
Add a column for a uniformly incrementing counter (1, 2, 3, etc.). It looks like you've already got this too, but again, you want to be sure here.
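The filtering and renumbering described above could be sketched like this. It is an illustration only: #shift_rows and seq are hypothetical names, EXPORT_TABLE is the table from the question, and the shift window is hard-coded for shift 2 on 6 October 2009 (assuming the sample dates are day.month.year).

```sql
-- Sketch: keep only rows whose state differs from the previous row (by time, then id)
-- and give the survivors a gap-free counter.
declare @ShiftStart datetime, @ShiftStop datetime;
set @ShiftStart = '2009-10-06T17:00:00';
set @ShiftStop  = '2009-10-07T01:00:00';

;with ordered as (
    select id, time_stamp, machine_state,
           row_number() over (order by time_stamp, id) as rn
    from EXPORT_TABLE
    where time_stamp between @ShiftStart and @ShiftStop
)
select cur.time_stamp,
       cur.machine_state,
       row_number() over (order by cur.rn) as seq   -- uniformly incrementing counter
into #shift_rows
from ordered cur
left join ordered prev on prev.rn = cur.rn - 1
where prev.rn is null                          -- first row in the window
   or prev.machine_state <> cur.machine_state; -- state actually changed
```

Ordering by time_stamp and then id is a choice made here to break ties between rows that share a timestamp, as several rows in the sample do.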
Next, check if entries are present for both @ShiftStart and @ShiftStop. If there are no such entries:
For @ShiftStart, create the entry and set machine_state to the value from the most recent entry before @ShiftStart
For @ShiftStop, create the entry and set machine_state to, well, anything, as we won't reference that value
In both cases, make sure you correctly configure the counter column (@ShiftStart's counter is one less than the earliest value, @ShiftStop's counter is one greater than the last value)
(The above is why you make it a temp table. If you can't load these dummy rows, you'll have to use procedural code to walk through the tables, which is the kind of procedural code that bogs down database servers.)
You need these entries to get the data for the time between the start of the shift and the first recorded entry within that shift, and ditto for the end of the shift.
At this point, items are ordered in time, with a uniformly incrementing counter column (1, 2, 3). Assuming all the above, the following query
should return the data you're looking for:
SELECT
et.machine_state
,sum(datediff(ss, et.time_stamp, thru.time_stamp)) TotalSeconds
,sum(datediff(ss, et.time_stamp, thru.time_stamp)) / 60 TotalMinutes
from #EXPORT_TABLE et
inner join #EXPORT_TABLE thru
on thru.id = et.id + 1
group by et.machine_state
order by et.machine_state
Notes:
This is written for MS SQL Server. Your language syntax may differ.
I have not tested this code. Any typos were intentionally included so that your final version will be superior to mine.
EXPORT_TABLE is the temporary table described above.
In MS SQL, dividing the sum of an integer by an integer will produce a truncated integer, meaning 59 seconds will turn into 0 minutes.
If you need better accuracy, dividing by 60.0 would produce a decimal value.
This is just a framework. I think you'd be able to expand this to whatever conditions you have to deal with.
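To make the note about integer division concrete, here is a small standalone illustration (the column aliases are made up for the example):

```sql
-- Integer vs. decimal division when converting seconds to minutes.
select 59 / 60                          as int_division,      -- 0: integer / integer truncates
       59 / 60.0                        as decimal_division,  -- roughly 0.98
       cast(round(59 / 60.0, 0) as int) as rounded_minutes;   -- 1
```

Whether to truncate, round or keep decimals depends on how the shift report is supposed to treat partial minutes.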

You can use an exclusive join to find the previous row:
select
State = prev.ms,
MinutesInState = sum(datediff(mi, prev.ts, cur.ts))
from #t cur
inner join #t prev
on prev.id < cur.id
left join #t inbetween
on prev.id < inbetween.id
and inbetween.id < cur.id
where inbetween.id is null
group by prev.ms
The query then groups by machine state. The result differs from other answers here, I'm curious which one is right!
State MinutesInState
1 54
2 140
Here's the sample data I used:
create table #t (id int identity(1,1), ts datetime, ms tinyint);
insert into #t
select '6.10.2009 17:09:00', 1
union select '6.10.2009 17:17:00', 2
union select '6.10.2009 17:17:00', 1
union select '6.10.2009 17:17:00', 2
union select '6.10.2009 17:18:00', 1
union select '6.10.2009 17:18:00', 2
union select '6.10.2009 18:04:00', 1
union select '6.10.2009 18:06:00', 2
union select '6.10.2009 18:07:00', 1
union select '6.10.2009 18:12:00', 2
union select '6.10.2009 18:28:00', 1
union select '6.10.2009 18:28:00', 2
union select '6.10.2009 19:16:00', 1
union select '6.10.2009 19:21:00', 2
union select '6.10.2009 19:49:00', 1
union select '6.10.2009 20:23:00', 2

If you just want quick and dirty, this will do:
select curr.*, prev.*
from EXPORT_TABLE curr
outer apply (
select top 1 * from EXPORT_TABLE prev
where curr.time_stamp > prev.time_stamp
order by time_stamp desc, id desc
) prev
And go from there.
But this method, and some of the similar methods on this page involving a non-equijoin, will not scale well with volume. To handle a high volume of data, we must use different techniques.
Your id appears sequential. Is it? This can be useful. If not, we should create one.
if object_id('tempdb..#pass1') is not null drop table #pass1
create table #pass1 (
id int
, time_stamp smalldatetime
, machine_state tinyint
, seqno int primary key -- this is important
)
insert #pass1
select
id
, time_stamp
, machine_state
, seqno = row_number() over (order by time_stamp, id)
from EXPORT_TABLE
Once we have a sequential id, we can equi-join on it:
if object_id('tempdb..#pass2') is not null drop table #pass2
create table #pass2 (
id int
, time_stamp smalldatetime
, machine_state tinyint
, seqno int primary key
, time_stamp_prev smalldatetime
)
insert #pass2
select
a.id
, a.time_stamp
, a.machine_state
, a.seqno
, time_stamp_prev = b.time_stamp
from #pass1 a
left join #pass1 b on a.seqno = b.seqno + 1
From here, your query should just about write itself. Look out for machine states that overlap a shift, though.
This method, though it looks expensive, will scale well with volume. You order the data once, and join once. If the id is sequential, you can skip the first step, make sure there is a clustered primary key on id, and join on id rather than seqno.
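As a rough sketch of the query that "writes itself" from #pass2: each row closes the interval that began at time_stamp_prev, and the machine was in the previous row's state during that interval, so the sketch joins back to the previous row to pick up that state. It does not split intervals at shift boundaries.

```sql
-- Sketch: total seconds per state from #pass2 (built above).
select
    prev.machine_state,
    sum(datediff(ss, cur.time_stamp_prev, cur.time_stamp)) as total_seconds
from #pass2 cur
inner join #pass2 prev on prev.seqno = cur.seqno - 1   -- row that started the interval
group by prev.machine_state
order by prev.machine_state;
```

Because the temp tables use smalldatetime, which is accurate only to the minute, the totals will always be multiples of 60 seconds; switch the columns to datetime if you need finer resolution.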
If you have a really high volume of data, you do this instead:
if object_id('tempdb..#export_table') is not null drop table #export_table
create table #export_table (
id int
, time_stamp smalldatetime
, machine_state tinyint
, seqno int primary key -- ensures proper ordering for the UPDATE
, time_stamp_prev smalldatetime
)
insert #export_table (
id
, time_stamp
, machine_state
, seqno
)
select
id
, time_stamp
, machine_state
, seqno = row_number() over (order by time_stamp, id)
from EXPORT_TABLE
-- do some magic
declare @time_stamp smalldatetime
update #export_table set
time_stamp_prev = @time_stamp
, @time_stamp = time_stamp
This will out-perform all other methods. And if your id is in the right order (it does not have to be sequential, just in the right order), you can skip the first step and define a clustered index on id instead, if it's not already there.

You can do something like this:
select t1.time_stamp time_start, t2.time_stamp time_finish, t1.machine_state
from EXPORT_TABLE t1, EXPORT_TABLE t2
where t2.time_stamp = (select min(time_stamp) from EXPORT_TABLE where time_stamp > t1.time_stamp)
This returns each interval in one row; after that it's easy to calculate the cumulative time for each state.
You can also look at this question. It seems to be very similar to yours.

Thanks for the help.
I'm surprised at how detailed the answers are.
I will test your solutions and let you know the results.
Again, I'm very impressed by the detailed answers.
I tested the first part (summing the time in machine states 0, 1 and 2) and it works.
Now I will test the rest of the answer.
The biggest problem for me was splitting time at a shift transition.
Example:
'6.10.2009 16:30:00', 1
'6.10.2009 17:30:00', 2
'6.10.2009 19:16:00', 1
Between 16:30 and 17:00 the machine was in state 1, and that time has to be added to shift 1; between 17:00 and 17:30 the machine was still in state 1, and that time has to be added to shift 2.
But first I will go through your answers to see whether you have already solved this.
Thanks again.
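One way to handle the shift transition, sketched here under assumptions, is to clamp every interval to the shift window before measuring it. The derived table i hard-codes the two intervals implied by the three example rows above, the dates are read as day.month.year, and the column names ts_from, ts_to and minutes_in_shift are made up for the illustration.

```sql
-- Sketch: count only the part of each interval that falls inside the shift.
declare @ShiftStart datetime, @ShiftStop datetime;
set @ShiftStart = '2009-10-06T17:00:00';   -- shift 2 from the question
set @ShiftStop  = '2009-10-07T01:00:00';

select i.machine_state,
       sum(datediff(ss,
               case when i.ts_from < @ShiftStart then @ShiftStart else i.ts_from end,
               case when i.ts_to   > @ShiftStop  then @ShiftStop  else i.ts_to   end)) / 60
           as minutes_in_shift
from (
    select 1 as machine_state,
           cast('2009-10-06T16:30:00' as datetime) as ts_from,
           cast('2009-10-06T17:30:00' as datetime) as ts_to
    union all
    select 2,
           cast('2009-10-06T17:30:00' as datetime),
           cast('2009-10-06T19:16:00' as datetime)
) i
where i.ts_from < @ShiftStop and i.ts_to > @ShiftStart   -- interval overlaps the shift
group by i.machine_state;
-- state 1: 30 minutes (17:00-17:30), state 2: 106 minutes (17:30-19:16)
```

Running the same query with the shift 1 window would pick up the remaining 30 minutes of state 1 (16:30-17:00).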

CREATE PROCEDURE dbo.final @shiftdate datetime, @shift int
AS
BEGIN
DECLARE
@shiftstart as datetime,
@shiftstop as datetime,
@date_m as varchar(33),
@timestart as char(8),
@smjena as int,   -- shift length in hours
@ms_prev as int,
@t_rad as int,    -- minutes running (state 2)
@t_stop as int,   -- minutes stopped (state 0)
@t_alarm as int   -- minutes in alarm (state 1)
if @shift = 1
begin
set @timestart = '08:00:00'
set @smjena=9
end
if @shift = 2
begin
set @timestart = '17:00:00'
set @smjena=8
end
if @shift = 3
begin
set @timestart = '01:00:00'
set @smjena=7
end
SELECT @date_m = convert(varchar, @shiftdate, 104) + ' ' + convert(varchar, @timestart, 114)
set @shiftstart = convert(datetime,@date_m,104)
select @shiftstop = dateadd(hh,@smjena,@shiftstart)
create table #t(id int identity(1,1), ts datetime, ms tinyint);
insert #t select time_stamp, stanje_stroja from perini where perini.time_stamp between @shiftstart and @shiftstop order by perini.time_stamp
-- add a dummy row at the start of the shift if there is none
if (select count(#t.id) from #t where #t.ts=@shiftstart)= 0
BEGIN
if (select count(perini.id) from perini where time_stamp < @shiftstart) > 0
begin
-- most recent state before the shift starts, hence descending order
set @ms_prev = (select top 1 stanje_stroja from perini where time_stamp<@shiftstart order by time_stamp desc)
insert #t values (@shiftstart,@ms_prev)
end
end
-- add a dummy row at the end of the shift if there is none (its state is never used)
if (select count(#t.id) from #t where #t.ts=@shiftstop)= 0
BEGIN
if (select count(perini.id) from perini where time_stamp > @shiftstop) > 0
begin
set @ms_prev = (select top 1 stanje_stroja from perini where time_stamp>@shiftstop order by time_stamp asc)
insert #t values (@shiftstop,@ms_prev)
end
end
-- renumber the rows in time order so consecutive ids are consecutive in time
select * into #t1 from #t where 1=2
insert into #t1 select ts, ms from #t order by ts
create table #t3(stanje int, trajanje int)
-- the interval from b.ts to a.ts was spent in the state recorded at b, so group by b.ms
insert into #t3 select b.ms as stanje, convert(int,sum(datediff(ss,b.ts, a.ts))/60) as trajanje from
#t1 a inner join #t1 b on a.id = b.id + 1
group by b.ms
set @t_rad = (select trajanje from #t3 where stanje = 2)
set @t_alarm = (select trajanje from #t3 where stanje = 1)
set @t_stop = (select trajanje from #t3 where stanje = 0)
insert into perini_smjene_new (smjena,t_rad, t_stop, t_alarm, time_stamp) values (@shift,@t_rad,@t_stop, @t_alarm, convert(datetime, @shiftdate, 103))
select * from #t3
END

Related

How do I make a Running Total? [duplicate]

Imagine the following table (called TestTable):
id somedate somevalue
-- -------- ---------
45 01/Jan/09 3
23 08/Jan/09 5
12 02/Feb/09 0
77 14/Feb/09 7
39 20/Feb/09 34
33 02/Mar/09 6
I would like a query that returns a running total in date order, like:
id somedate somevalue runningtotal
-- -------- --------- ------------
45 01/Jan/09 3 3
23 08/Jan/09 5 8
12 02/Feb/09 0 8
77 14/Feb/09 7 15
39 20/Feb/09 34 49
33 02/Mar/09 6 55
I know there are various ways of doing this in SQL Server 2000 / 2005 / 2008.
I am particularly interested in this sort of method that uses the aggregating-set-statement trick:
INSERT INTO #AnotherTbl(id, somedate, somevalue, runningtotal)
SELECT id, somedate, somevalue, null
FROM TestTable
ORDER BY somedate
DECLARE @RunningTotal int
SET @RunningTotal = 0
UPDATE #AnotherTbl
SET @RunningTotal = runningtotal = @RunningTotal + somevalue
FROM #AnotherTbl
... this is very efficient but I have heard there are issues around this because you can't necessarily guarantee that the UPDATE statement will process the rows in the correct order. Maybe we can get some definitive answers about that issue.
But maybe there are other ways that people can suggest?
edit: Now with a SqlFiddle with the setup and the 'update trick' example above
Update, if you are running SQL Server 2012 see: https://stackoverflow.com/a/10309947
The problem is that the SQL Server implementation of the Over clause is somewhat limited.
Oracle (and ANSI-SQL) allow you to do things like:
SELECT somedate, somevalue,
SUM(somevalue) OVER(ORDER BY somedate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RunningTotal
FROM Table
SQL Server gives you no clean solution to this problem. My gut is telling me that this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big results.
The update trick is handy but I feel it's fairly fragile. It seems that if you are updating a full table then it will proceed in the order of the primary key. So if you set your date as a primary key ascending you will probably be safe. But you are relying on an undocumented SQL Server implementation detail (also, if the query ends up being performed by two processors I wonder what will happen, see: MAXDOP):
Full working sample:
drop table #t
create table #t ( ord int primary key, total int, running_total int)
insert #t(ord,total) values (2,20)
-- notice the malicious re-ordering
insert #t(ord,total) values (1,10)
insert #t(ord,total) values (3,10)
insert #t(ord,total) values (4,1)
declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total
select * from #t
order by ord
ord total running_total
----------- ----------- -------------
1 10 10
2 20 30
3 10 40
4 1 41
You asked for a benchmark; this is the lowdown.
The fastest SAFE way of doing this would be the cursor; it is an order of magnitude faster than the correlated sub-query or cross join.
The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain that under all circumstances the update will proceed in a linear way. There is nothing in the query that explicitly says so.
Bottom line, for production code I would go with the cursor.
Test data:
create table #t ( ord int primary key, total int, running_total int)
set nocount on
declare @i int
set @i = 0
begin tran
while @i < 10000
begin
insert #t (ord, total) values (@i, rand() * 100)
set @i = @i +1
end
commit
Test 1:
SELECT ord,total,
(SELECT SUM(total)
FROM #t b
WHERE b.ord <= a.ord) AS b
FROM #t a
-- CPU 11731, Reads 154934, Duration 11135
Test 2:
SELECT a.ord, a.total, SUM(b.total) AS RunningTotal
FROM #t a CROSS JOIN #t b
WHERE (b.ord <= a.ord)
GROUP BY a.ord,a.total
ORDER BY a.ord
-- CPU 16053, Reads 154935, Duration 4647
Test 3:
DECLARE @TotalTable table(ord int primary key, total int, running_total int)
DECLARE forward_cursor CURSOR FAST_FORWARD
FOR
SELECT ord, total
FROM #t
ORDER BY ord
OPEN forward_cursor
DECLARE @running_total int,
@ord int,
@total int
SET @running_total = 0
FETCH NEXT FROM forward_cursor INTO @ord, @total
WHILE (@@FETCH_STATUS = 0)
BEGIN
SET @running_total = @running_total + @total
INSERT @TotalTable VALUES(@ord, @total, @running_total)
FETCH NEXT FROM forward_cursor INTO @ord, @total
END
CLOSE forward_cursor
DEALLOCATE forward_cursor
SELECT * FROM @TotalTable
-- CPU 359, Reads 30392, Duration 496
Test 4:
declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total
select * from #t
-- CPU 0, Reads 58, Duration 139
In SQL Server 2012 you can use SUM() with the OVER() clause.
select id,
somedate,
somevalue,
sum(somevalue) over(order by somedate rows unbounded preceding) as runningtotal
from TestTable
SQL Fiddle
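One detail worth illustrating for this form is what happens when several rows share the same somedate. The sketch below uses a made-up three-row data set (the derived table d and the column aliases are purely illustrative) and requires SQL Server 2012 or later. When ORDER BY is given without a frame, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows with equal somedate as peers.

```sql
-- Sketch: rows 2 and 3 share the same somedate.
select d.id, d.somedate, d.somevalue,
       -- ROWS frame: one step per row, but the order among tied rows is not guaranteed
       sum(d.somevalue) over (order by d.somedate rows unbounded preceding)       as total_rows,
       -- default RANGE frame: tied rows are peers and both show the same total (15)
       sum(d.somevalue) over (order by d.somedate)                                as total_range_default,
       -- adding id as a tie-breaker makes the result deterministic: 3, 8, 15
       sum(d.somevalue) over (order by d.somedate, d.id rows unbounded preceding) as total_tiebreak
from (values (1, cast('20090101' as date), 3),
             (2, cast('20090108' as date), 5),
             (3, cast('20090108' as date), 7)) as d(id, somedate, somevalue);
```

If duplicate dates are possible in TestTable, ordering by somedate plus a unique column such as id is probably the safer choice.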
While Sam Saffron did great work on it, he didn't provide recursive common table expression code for this problem. For those of us working with SQL Server 2008 R2 rather than Denali (SQL Server 2012), it's still the fastest way to get a running total: about 10 times faster than a cursor on my work computer for 100000 rows, and it's also an inline query.
So, here it is (I'm assuming there's an ord column in the table holding sequential numbers without gaps; for fast processing there should also be a unique constraint on this column):
;with
CTE_RunningTotal
as
(
select T.ord, T.total, T.total as running_total
from #t as T
where T.ord = 0
union all
select T.ord, T.total, T.total + C.running_total as running_total
from CTE_RunningTotal as C
inner join #t as T on T.ord = C.ord + 1
)
select C.ord, C.total, C.running_total
from CTE_RunningTotal as C
option (maxrecursion 0)
-- CPU 140, Reads 110014, Duration 132
sql fiddle demo
update
I was also curious about this update with variable, or quirky update. It usually works OK, but how can we be sure that it works every time? Well, here's a little trick (found it here - http://www.sqlservercentral.com/Forums/Topic802558-203-21.aspx#bm981258): you check the current and previous ord and use a 1/0 assignment in case they differ from what you expect:
declare @total int, @ord int
select @total = 0, @ord = -1
update #t set
@total = @total + total,
@ord = case when ord <> @ord + 1 then 1/0 else ord end,
------------------------
running_total = @total
select * from #t
-- CPU 0, Reads 58, Duration 139
From what I've seen if you have proper clustered index/primary key on your table (in our case it would be index by ord_id) update will proceed in a linear way all the time (never encountered divide by zero). That said, it's up to you to decide if you want to use it in production code :)
update 2 I'm linking this answer, cause it includes some useful info about unreliability of the quirky update - nvarchar concatenation / index / nvarchar(max) inexplicable behavior.
The APPLY operator in SQL 2005 and higher works for this:
select
t.id ,
t.somedate ,
t.somevalue ,
rt.runningTotal
from TestTable t
cross apply (select sum(somevalue) as runningTotal
from TestTable
where somedate <= t.somedate
) as rt
order by t.somedate
SELECT TOP 25 amount,
(SELECT SUM(amount)
FROM time_detail b
WHERE b.time_detail_id <= a.time_detail_id) AS Total FROM time_detail a
You can also use the ROW_NUMBER() function and a temp table to create an arbitrary column to use in the comparison on the inner SELECT statement.
Use a correlated sub-query. Very simple, here you go:
SELECT
somedate,
(SELECT SUM(somevalue) FROM TestTable t2 WHERE t2.somedate<=t1.somedate) AS running_total
FROM TestTable t1
GROUP BY somedate
ORDER BY somedate
The code might not be exactly correct, but I'm sure that the idea is.
The GROUP BY is in case a date appears more than once, you would only want to see it once in the result set.
If you don't mind seeing repeating dates, or you want to see the original value and id, then the following is what you want:
SELECT
id,
somedate,
somevalue,
(SELECT SUM(somevalue) FROM TestTable t2 WHERE t2.somedate<=t1.somedate) AS running_total
FROM TestTable t1
ORDER BY somedate
You can also denormalize - store running totals in the same table:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
Selects work much faster than any other solutions, but modifications may be slower
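If you are using SQL Server 2012 or later (LAG is not available in 2008 R2), this would be the shortest way to do it:
Select id
,somedate
,somevalue,
LAG(runningtotal) OVER (ORDER BY somedate) + somevalue AS runningtotal
From TestTable
LAG returns a value from the previous row. Note that it can only read a column that already exists, so this query assumes TestTable already stores a runningtotal column; LAG cannot build the running total on its own.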
Assuming that windowing works on SQL Server 2008 like it does elsewhere (that I've tried), give this a go:
select testtable.*, sum(somevalue) over(order by somedate)
from testtable
order by somedate;
MSDN says it's available in SQL Server 2008 (and maybe 2005 as well?) but I don't have an instance to hand to try it.
EDIT: well, apparently SQL Server 2005/2008 doesn't accept this window specification: for aggregate functions, "OVER(...)" only supports "PARTITION BY" (dividing the result up into groups but not aggregating in quite the way GROUP BY does), and ORDER BY inside it is only accepted from SQL Server 2012 on. Annoying; the MSDN syntax reference suggests that it's optional, but I only have SQL Server 2000 instances around at the moment.
The query I gave works in both Oracle 10.2.0.3.0 and PostgreSQL 8.4-beta. So tell MS to catch up ;)
Though best way is to get it done will be using a window function, it can also be done using a simple correlated sub-query.
Select id, somedate, somevalue, (select sum(somevalue)
from testtable as t2
where t2.id = t1.id
and t2.somedate <= t1.somedate) as runningtotal
from testtable as t1
order by id, somedate;
Here are 2 simple ways to calculate running total:
Approach 1: It can be written this way if your DBMS supports Analytical Functions
SELECT id
,somedate
,somevalue
,runningtotal = SUM(somevalue) OVER (ORDER BY somedate ASC)
FROM TestTable
Approach 2: You can make use of OUTER APPLY if your database version / DBMS itself does not support Analytical Functions
SELECT T.id
,T.somedate
,T.somevalue
,runningtotal = OA.runningtotal
FROM TestTable T
OUTER APPLY (
SELECT runningtotal = SUM(TI.somevalue)
FROM TestTable TI
WHERE TI.somedate <= T.somedate
) OA;
Note:- If you have to calculate the running total for different partitions separately, it can be done as posted here: Calculating Running totals across rows and grouping by ID
I believe a running total can be achieved using the simple INNER JOIN operation below.
SELECT
ROW_NUMBER() OVER (ORDER BY SomeDate) AS OrderID
,rt.*
INTO
#tmp
FROM
(
SELECT 45 AS ID, CAST('01-01-2009' AS DATETIME) AS SomeDate, 3 AS SomeValue
UNION ALL
SELECT 23, CAST('01-08-2009' AS DATETIME), 5
UNION ALL
SELECT 12, CAST('02-02-2009' AS DATETIME), 0
UNION ALL
SELECT 77, CAST('02-14-2009' AS DATETIME), 7
UNION ALL
SELECT 39, CAST('02-20-2009' AS DATETIME), 34
UNION ALL
SELECT 33, CAST('03-02-2009' AS DATETIME), 6
) rt
SELECT
t1.ID
,t1.SomeDate
,t1.SomeValue
,SUM(t2.SomeValue) AS RunningTotal
FROM
#tmp t1
JOIN #tmp t2
ON t2.OrderID <= t1.OrderID
GROUP BY
t1.OrderID
,t1.ID
,t1.SomeDate
,t1.SomeValue
ORDER BY
t1.OrderID
DROP TABLE #tmp
The following will produce the required results.
SELECT a.SomeDate,
a.SomeValue,
SUM(b.SomeValue) AS RunningTotal
FROM TestTable a
CROSS JOIN TestTable b
WHERE (b.SomeDate <= a.SomeDate)
GROUP BY a.SomeDate,a.SomeValue
ORDER BY a.SomeDate,a.SomeValue
Having a clustered index on SomeDate will greatly improve the performance.
Using join
Another variation is to use join. Now the query could look like:
SELECT a.id, a.value, SUM(b.Value)FROM RunTotalTestData a,
RunTotalTestData b
WHERE b.id <= a.id
GROUP BY a.id, a.value
ORDER BY a.id;
For more, you can visit this link:
http://askme.indianyouth.info/details/calculating-simple-running-totals-in-sql-server-12
BEGIN TRAN
CREATE TABLE #Table (_Id INT IDENTITY(1,1) ,id INT , somedate VARCHAR(100) , somevalue INT)
INSERT INTO #Table ( id , somedate , somevalue )
SELECT 45 , '01/Jan/09', 3 UNION ALL
SELECT 23 , '08/Jan/09', 5 UNION ALL
SELECT 12 , '02/Feb/09', 0 UNION ALL
SELECT 77 , '14/Feb/09', 7 UNION ALL
SELECT 39 , '20/Feb/09', 34 UNION ALL
SELECT 33 , '02/Mar/09', 6
;WITH CTE ( _Id, id , _somedate , _somevalue ,_totvalue ) AS
(
SELECT _Id , id , somedate , somevalue ,somevalue
FROM #Table WHERE _id = 1
UNION ALL
SELECT #Table._Id , #Table.id , somedate , somevalue , somevalue + _totvalue
FROM #Table,CTE
WHERE #Table._id > 1 AND CTE._Id = ( #Table._id-1 )
)
SELECT * FROM CTE
ROLLBACK TRAN

sql running total over duplicate values [duplicate]

Imagine the following table (called TestTable):
id somedate somevalue
-- -------- ---------
45 01/Jan/09 3
23 08/Jan/09 5
12 02/Feb/09 0
77 14/Feb/09 7
39 20/Feb/09 34
33 02/Mar/09 6
I would like a query that returns a running total in date order, like:
id somedate somevalue runningtotal
-- -------- --------- ------------
45 01/Jan/09 3 3
23 08/Jan/09 5 8
12 02/Feb/09 0 8
77 14/Feb/09 7 15
39 20/Feb/09 34 49
33 02/Mar/09 6 55
I know there are various ways of doing this in SQL Server 2000 / 2005 / 2008.
I am particularly interested in this sort of method that uses the aggregating-set-statement trick:
INSERT INTO #AnotherTbl(id, somedate, somevalue, runningtotal)
SELECT id, somedate, somevalue, null
FROM TestTable
ORDER BY somedate
DECLARE #RunningTotal int
SET #RunningTotal = 0
UPDATE #AnotherTbl
SET #RunningTotal = runningtotal = #RunningTotal + somevalue
FROM #AnotherTbl
... this is very efficient but I have heard there are issues around this because you can't necessarily guarantee that the UPDATE statement will process the rows in the correct order. Maybe we can get some definitive answers about that issue.
But maybe there are other ways that people can suggest?
edit: Now with a SqlFiddle with the setup and the 'update trick' example above
Update, if you are running SQL Server 2012 see: https://stackoverflow.com/a/10309947
The problem is that the SQL Server implementation of the Over clause is somewhat limited.
Oracle (and ANSI-SQL) allow you to do things like:
SELECT somedate, somevalue,
SUM(somevalue) OVER(ORDER BY somedate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RunningTotal
FROM Table
SQL Server gives you no clean solution to this problem. My gut is telling me that this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big results.
The update trick is handy but I feel its fairly fragile. It seems that if you are updating a full table then it will proceed in the order of the primary key. So if you set your date as a primary key ascending you will probably be safe. But you are relying on an undocumented SQL Server implementation detail (also if the query ends up being performed by two procs I wonder what will happen, see: MAXDOP):
Full working sample:
drop table #t
create table #t ( ord int primary key, total int, running_total int)
insert #t(ord,total) values (2,20)
-- notice the malicious re-ordering
insert #t(ord,total) values (1,10)
insert #t(ord,total) values (3,10)
insert #t(ord,total) values (4,1)
declare #total int
set #total = 0
update #t set running_total = #total, #total = #total + total
select * from #t
order by ord
ord total running_total
----------- ----------- -------------
1 10 10
2 20 30
3 10 40
4 1 41
You asked for a benchmark this is the lowdown.
The fastest SAFE way of doing this would be the Cursor, it is an order of magnitude faster than the correlated sub-query of cross-join.
The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain that under all circumstances the update will proceed in a linear way. There is nothing in the query that explicitly says so.
Bottom line, for production code I would go with the cursor.
Test data:
create table #t ( ord int primary key, total int, running_total int)
set nocount on
declare #i int
set #i = 0
begin tran
while #i < 10000
begin
insert #t (ord, total) values (#i, rand() * 100)
set #i = #i +1
end
commit
Test 1:
SELECT ord,total,
(SELECT SUM(total)
FROM #t b
WHERE b.ord <= a.ord) AS b
FROM #t a
-- CPU 11731, Reads 154934, Duration 11135
Test 2:
SELECT a.ord, a.total, SUM(b.total) AS RunningTotal
FROM #t a CROSS JOIN #t b
WHERE (b.ord <= a.ord)
GROUP BY a.ord,a.total
ORDER BY a.ord
-- CPU 16053, Reads 154935, Duration 4647
Test 3:
DECLARE #TotalTable table(ord int primary key, total int, running_total int)
DECLARE forward_cursor CURSOR FAST_FORWARD
FOR
SELECT ord, total
FROM #t
ORDER BY ord
OPEN forward_cursor
DECLARE #running_total int,
#ord int,
#total int
SET #running_total = 0
FETCH NEXT FROM forward_cursor INTO #ord, #total
WHILE (##FETCH_STATUS = 0)
BEGIN
SET #running_total = #running_total + #total
INSERT #TotalTable VALUES(#ord, #total, #running_total)
FETCH NEXT FROM forward_cursor INTO #ord, #total
END
CLOSE forward_cursor
DEALLOCATE forward_cursor
SELECT * FROM #TotalTable
-- CPU 359, Reads 30392, Duration 496
Test 4:
declare #total int
set #total = 0
update #t set running_total = #total, #total = #total + total
select * from #t
-- CPU 0, Reads 58, Duration 139
In SQL Server 2012 you can use SUM() with the OVER() clause.
select id,
somedate,
somevalue,
sum(somevalue) over(order by somedate rows unbounded preceding) as runningtotal
from TestTable
SQL Fiddle
While Sam Saffron did great work on it, he still didn't provide recursive common table expression code for this problem. And for us who working with SQL Server 2008 R2 and not Denali, it's still fastest way to get running total, it's about 10 times faster than cursor on my work computer for 100000 rows, and it's also inline query.
So, here it is (I'm supposing that there's an ord column in the table and it's sequential number without gaps, for fast processing there also should be unique constraint on this number):
;with
CTE_RunningTotal
as
(
select T.ord, T.total, T.total as running_total
from #t as T
where T.ord = 0
union all
select T.ord, T.total, T.total + C.running_total as running_total
from CTE_RunningTotal as C
inner join #t as T on T.ord = C.ord + 1
)
select C.ord, C.total, C.running_total
from CTE_RunningTotal as C
option (maxrecursion 0)
-- CPU 140, Reads 110014, Duration 132
sql fiddle demo
update
I also was curious about this update with variable or quirky update. So usually it works ok, but how we can be sure that it works every time? well, here's a little trick (found it here - http://www.sqlservercentral.com/Forums/Topic802558-203-21.aspx#bm981258) - you just check current and previous ord and use 1/0 assignment in case they are different from what you expecting:
declare #total int, #ord int
select #total = 0, #ord = -1
update #t set
#total = #total + total,
#ord = case when ord <> #ord + 1 then 1/0 else ord end,
------------------------
running_total = #total
select * from #t
-- CPU 0, Reads 58, Duration 139
From what I've seen if you have proper clustered index/primary key on your table (in our case it would be index by ord_id) update will proceed in a linear way all the time (never encountered divide by zero). That said, it's up to you to decide if you want to use it in production code :)
update 2 I'm linking this answer, cause it includes some useful info about unreliability of the quirky update - nvarchar concatenation / index / nvarchar(max) inexplicable behavior.
The APPLY operator in SQL 2005 and higher works for this:
select
t.id ,
t.somedate ,
t.somevalue ,
rt.runningTotal
from TestTable t
cross apply (select sum(somevalue) as runningTotal
from TestTable
where somedate <= t.somedate
) as rt
order by t.somedate
SELECT TOP 25 amount,
(SELECT SUM(amount)
FROM time_detail b
WHERE b.time_detail_id <= a.time_detail_id) AS Total FROM time_detail a
You can also use the ROW_NUMBER() function and a temp table to create an arbitrary column to use in the comparison on the inner SELECT statement.
Use a correlated sub-query. Very simple, here you go:
SELECT
somedate,
(SELECT SUM(somevalue) FROM TestTable t2 WHERE t2.somedate<=t1.somedate) AS running_total
FROM TestTable t1
GROUP BY somedate
ORDER BY somedate
The code might not be exactly correct, but I'm sure that the idea is.
The GROUP BY is in case a date appears more than once, you would only want to see it once in the result set.
If you don't mind seeing repeating dates, or you want to see the original value and id, then the following is what you want:
SELECT
id,
somedate,
somevalue,
(SELECT SUM(somevalue) FROM TestTable t2 WHERE t2.somedate<=t1.somedate) AS running_total
FROM TestTable t1
ORDER BY somedate
You can also denormalize - store running totals in the same table:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
Selects work much faster than any other solutions, but modifications may be slower
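If you are using SQL Server 2012 or above (LAG is not available in 2008 R2), this would be the shortest way to do it, assuming the table already has a runningtotal column to read from: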
Select id
,somedate
,somevalue,
LAG(runningtotal) OVER (ORDER BY somedate) + somevalue AS runningtotal
From TestTable
LAG is used to get the previous row's value. You can search for more info.
Assuming that windowing works on SQL Server 2008 like it does elsewhere (that I've tried), give this a go:
select testtable.*, sum(somevalue) over(order by somedate)
from testtable
order by somedate;
MSDN says it's available in SQL Server 2008 (and maybe 2005 as well?) but I don't have an instance to hand to try it.
EDIT: well, apparently SQL Server doesn't allow a window specification ("OVER(...)") without specifying "PARTITION BY" (dividing the result up into groups but not aggregating in quite the way GROUP BY does). Annoying-- the MSDN syntax reference suggests that its optional, but I only have SqlServer 2000 instances around at the moment.
The query I gave works in both Oracle 10.2.0.3.0 and PostgreSQL 8.4-beta. So tell MS to catch up ;)
Though the best way to get this done is with a window function, it can also be done using a simple correlated sub-query.
Select id, someday, somevalue, (select sum(somevalue)
from testtable as t2
where t2.id = t1.id
and t2.someday <= t1.someday) as runningtotal
from testtable as t1
order by id,someday;
Here are 2 simple ways to calculate running total:
Approach 1: It can be written this way if your DBMS supports Analytical Functions
SELECT id
,somedate
,somevalue
,runningtotal = SUM(somevalue) OVER (ORDER BY somedate ASC)
FROM TestTable
Approach 2: You can make use of OUTER APPLY if your database version / DBMS itself does not support Analytical Functions
SELECT T.id
,T.somedate
,T.somevalue
,runningtotal = OA.runningtotal
FROM TestTable T
OUTER APPLY (
SELECT runningtotal = SUM(TI.somevalue)
FROM TestTable TI
WHERE TI.somedate <= T.somedate
) OA;
Note:- If you have to calculate the running total for different partitions separately, it can be done as posted here: Calculating Running totals across rows and grouping by ID
I believe a running total can be achieved using the simple INNER JOIN operation below.
SELECT
ROW_NUMBER() OVER (ORDER BY SomeDate) AS OrderID
,rt.*
INTO
#tmp
FROM
(
SELECT 45 AS ID, CAST('01-01-2009' AS DATETIME) AS SomeDate, 3 AS SomeValue
UNION ALL
SELECT 23, CAST('01-08-2009' AS DATETIME), 5
UNION ALL
SELECT 12, CAST('02-02-2009' AS DATETIME), 0
UNION ALL
SELECT 77, CAST('02-14-2009' AS DATETIME), 7
UNION ALL
SELECT 39, CAST('02-20-2009' AS DATETIME), 34
UNION ALL
SELECT 33, CAST('03-02-2009' AS DATETIME), 6
) rt
SELECT
t1.ID
,t1.SomeDate
,t1.SomeValue
,SUM(t2.SomeValue) AS RunningTotal
FROM
#tmp t1
JOIN #tmp t2
ON t2.OrderID <= t1.OrderID
GROUP BY
t1.OrderID
,t1.ID
,t1.SomeDate
,t1.SomeValue
ORDER BY
t1.OrderID
DROP TABLE #tmp
The following will produce the required results.
SELECT a.SomeDate,
a.SomeValue,
SUM(b.SomeValue) AS RunningTotal
FROM TestTable a
CROSS JOIN TestTable b
WHERE (b.SomeDate <= a.SomeDate)
GROUP BY a.SomeDate,a.SomeValue
ORDER BY a.SomeDate,a.SomeValue
Having a clustered index on SomeDate will greatly improve the performance.
Using join
Another variation is to use join. Now the query could look like:
SELECT a.id, a.value, SUM(b.Value) FROM RunTotalTestData a,
RunTotalTestData b
WHERE b.id <= a.id
GROUP BY a.id, a.value
ORDER BY a.id;
For more, you can visit this link:
http://askme.indianyouth.info/details/calculating-simple-running-totals-in-sql-server-12
BEGIN TRAN
CREATE TABLE #Table (_Id INT IDENTITY(1,1) ,id INT , somedate VARCHAR(100) , somevalue INT)
INSERT INTO #Table ( id , somedate , somevalue )
SELECT 45 , '01/Jan/09', 3 UNION ALL
SELECT 23 , '08/Jan/09', 5 UNION ALL
SELECT 12 , '02/Feb/09', 0 UNION ALL
SELECT 77 , '14/Feb/09', 7 UNION ALL
SELECT 39 , '20/Feb/09', 34 UNION ALL
SELECT 33 , '02/Mar/09', 6
;WITH CTE ( _Id, id , _somedate , _somevalue ,_totvalue ) AS
(
SELECT _Id , id , somedate , somevalue ,somevalue
FROM #Table WHERE _id = 1
UNION ALL
SELECT #Table._Id , #Table.id , somedate , somevalue , somevalue + _totvalue
FROM #Table,CTE
WHERE #Table._id > 1 AND CTE._Id = ( #Table._id-1 )
)
SELECT * FROM CTE
ROLLBACK TRAN

SQL for running count with start date, end date [duplicate]

Imagine the following table (called TestTable):
id somedate somevalue
-- -------- ---------
45 01/Jan/09 3
23 08/Jan/09 5
12 02/Feb/09 0
77 14/Feb/09 7
39 20/Feb/09 34
33 02/Mar/09 6
I would like a query that returns a running total in date order, like:
id somedate somevalue runningtotal
-- -------- --------- ------------
45 01/Jan/09 3 3
23 08/Jan/09 5 8
12 02/Feb/09 0 8
77 14/Feb/09 7 15
39 20/Feb/09 34 49
33 02/Mar/09 6 55
I know there are various ways of doing this in SQL Server 2000 / 2005 / 2008.
I am particularly interested in this sort of method that uses the aggregating-set-statement trick:
INSERT INTO #AnotherTbl(id, somedate, somevalue, runningtotal)
SELECT id, somedate, somevalue, null
FROM TestTable
ORDER BY somedate
DECLARE @RunningTotal int
SET @RunningTotal = 0
UPDATE #AnotherTbl
SET @RunningTotal = runningtotal = @RunningTotal + somevalue
FROM #AnotherTbl
... this is very efficient but I have heard there are issues around this because you can't necessarily guarantee that the UPDATE statement will process the rows in the correct order. Maybe we can get some definitive answers about that issue.
But maybe there are other ways that people can suggest?
edit: Now with a SqlFiddle with the setup and the 'update trick' example above
Update, if you are running SQL Server 2012 see: https://stackoverflow.com/a/10309947
The problem is that the SQL Server implementation of the Over clause is somewhat limited.
Oracle (and ANSI-SQL) allow you to do things like:
SELECT somedate, somevalue,
SUM(somevalue) OVER(ORDER BY somedate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RunningTotal
FROM Table
SQL Server gives you no clean solution to this problem. My gut is telling me that this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big results.
The update trick is handy but I feel it's fairly fragile. It seems that if you are updating a full table then it will proceed in the order of the primary key. So if you set your date as a primary key ascending you will probably be safe. But you are relying on an undocumented SQL Server implementation detail (also, if the query ends up being performed by two processors I wonder what will happen, see: MAXDOP):
Full working sample:
drop table #t
create table #t ( ord int primary key, total int, running_total int)
insert #t(ord,total) values (2,20)
-- notice the malicious re-ordering
insert #t(ord,total) values (1,10)
insert #t(ord,total) values (3,10)
insert #t(ord,total) values (4,1)
declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total
select * from #t
order by ord
ord total running_total
----------- ----------- -------------
1 10 10
2 20 30
3 10 40
4 1 41
You asked for a benchmark; this is the lowdown.
The fastest SAFE way of doing this would be the cursor; it is an order of magnitude faster than the correlated sub-query or the cross join.
The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain that under all circumstances the update will proceed in a linear way. There is nothing in the query that explicitly says so.
Bottom line, for production code I would go with the cursor.
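For readers who still want to try the UPDATE trick, the MAXDOP concern mentioned earlier can at least be addressed explicitly. The following is a minimal sketch, assuming the #t table from the sample above with its clustered primary key on ord; the table lock and MAXDOP hints are commonly suggested precautions, not something that turns the update order into documented, guaranteed behavior.

```sql
-- Sketch only: assumes #t (ord int primary key, total int, running_total int).
-- The hints narrow the conditions under which rows could be visited out of
-- order; they do not make the processing order a documented guarantee.
DECLARE @total int;
SET @total = 0;

UPDATE #t WITH (TABLOCKX)            -- exclusive table lock while updating
SET @total = running_total = @total + total
OPTION (MAXDOP 1);                   -- rule out a parallel plan

SELECT ord, total, running_total
FROM #t
ORDER BY ord;
```

The three-part assignment (variable = column = expression) sets the variable and the column to the same value in one step, so it does not depend on the order in which separate SET items are evaluated.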
Test data:
create table #t ( ord int primary key, total int, running_total int)
set nocount on
declare @i int
set @i = 0
begin tran
while @i < 10000
begin
insert #t (ord, total) values (@i, rand() * 100)
set @i = @i +1
end
commit
Test 1:
SELECT ord,total,
(SELECT SUM(total)
FROM #t b
WHERE b.ord <= a.ord) AS b
FROM #t a
-- CPU 11731, Reads 154934, Duration 11135
Test 2:
SELECT a.ord, a.total, SUM(b.total) AS RunningTotal
FROM #t a CROSS JOIN #t b
WHERE (b.ord <= a.ord)
GROUP BY a.ord,a.total
ORDER BY a.ord
-- CPU 16053, Reads 154935, Duration 4647
Test 3:
DECLARE @TotalTable table(ord int primary key, total int, running_total int)
DECLARE forward_cursor CURSOR FAST_FORWARD
FOR
SELECT ord, total
FROM #t
ORDER BY ord
OPEN forward_cursor
DECLARE @running_total int,
@ord int,
@total int
SET @running_total = 0
FETCH NEXT FROM forward_cursor INTO @ord, @total
WHILE (@@FETCH_STATUS = 0)
BEGIN
SET @running_total = @running_total + @total
INSERT @TotalTable VALUES(@ord, @total, @running_total)
FETCH NEXT FROM forward_cursor INTO @ord, @total
END
CLOSE forward_cursor
DEALLOCATE forward_cursor
SELECT * FROM @TotalTable
-- CPU 359, Reads 30392, Duration 496
Test 4:
declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total
select * from #t
-- CPU 0, Reads 58, Duration 139
In SQL Server 2012 you can use SUM() with the OVER() clause.
select id,
somedate,
somevalue,
sum(somevalue) over(order by somedate rows unbounded preceding) as runningtotal
from TestTable
SQL Fiddle
Sam Saffron did great work on this, but he didn't provide recursive common table expression code for the problem. For those of us working with SQL Server 2008 R2 rather than Denali, it's still the fastest way to get a running total: about 10 times faster than the cursor on my work computer for 100000 rows, and it's an inline query as well.
So, here it is (I'm assuming that there's an ord column in the table holding a sequential number without gaps; for fast processing there should also be a unique constraint on this number):
;with
CTE_RunningTotal
as
(
select T.ord, T.total, T.total as running_total
from #t as T
where T.ord = 0
union all
select T.ord, T.total, T.total + C.running_total as running_total
from CTE_RunningTotal as C
inner join #t as T on T.ord = C.ord + 1
)
select C.ord, C.total, C.running_total
from CTE_RunningTotal as C
option (maxrecursion 0)
-- CPU 140, Reads 110014, Duration 132
sql fiddle demo
update
I was also curious about this update with a variable, the so-called quirky update. It usually works fine, but how can we be sure that it works every time? Here's a little trick (found here - http://www.sqlservercentral.com/Forums/Topic802558-203-21.aspx#bm981258): you check the current and previous ord and force a 1/0 (divide by zero) assignment if they differ from what you expect:
declare @total int, @ord int
select @total = 0, @ord = -1
update #t set
@total = @total + total,
@ord = case when ord <> @ord + 1 then 1/0 else ord end,
------------------------
running_total = @total
select * from #t
-- CPU 0, Reads 58, Duration 139
From what I've seen if you have proper clustered index/primary key on your table (in our case it would be index by ord_id) update will proceed in a linear way all the time (never encountered divide by zero). That said, it's up to you to decide if you want to use it in production code :)
update 2 I'm linking this answer, cause it includes some useful info about unreliability of the quirky update - nvarchar concatenation / index / nvarchar(max) inexplicable behavior.
The APPLY operator in SQL 2005 and higher works for this:
select
t.id ,
t.somedate ,
t.somevalue ,
rt.runningTotal
from TestTable t
cross apply (select sum(somevalue) as runningTotal
from TestTable
where somedate <= t.somedate
) as rt
order by t.somedate
SELECT TOP 25 amount,
(SELECT SUM(amount)
FROM time_detail b
WHERE b.time_detail_id <= a.time_detail_id) AS Total FROM time_detail a
You can also use the ROW_NUMBER() function and a temp table to create an arbitrary column to use in the comparison on the inner SELECT statement.
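The ROW_NUMBER() idea mentioned above could look roughly like the following sketch. It assumes the TestTable from the question with somedate stored as a date type (so it sorts chronologically), and uses a hypothetical temp table name, #ordered; id is added as a tie-breaker so that rows sharing a date still get distinct positions.

```sql
-- Sketch only: #ordered and rn are illustrative names, not from the question.
SELECT ROW_NUMBER() OVER (ORDER BY somedate, id) AS rn,
       id, somedate, somevalue
INTO #ordered
FROM TestTable;

-- The arbitrary rn column gives the inner SELECT an unambiguous comparison.
SELECT a.id, a.somedate, a.somevalue,
       (SELECT SUM(b.somevalue)
        FROM #ordered AS b
        WHERE b.rn <= a.rn) AS runningtotal
FROM #ordered AS a
ORDER BY a.rn;

DROP TABLE #ordered;
```

This is still a correlated sub-query, so it is likely to scale like Test 1 in the benchmark rather than like the cursor.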
Use a correlated sub-query. Very simple, here you go:
SELECT
somedate,
(SELECT SUM(somevalue) FROM TestTable t2 WHERE t2.somedate<=t1.somedate) AS running_total
FROM TestTable t1
GROUP BY somedate
ORDER BY somedate
The code might not be exactly correct, but I'm sure that the idea is.
The GROUP BY is in case a date appears more than once, you would only want to see it once in the result set.
If you don't mind seeing repeating dates, or you want to see the original value and id, then the following is what you want:
SELECT
id,
somedate,
somevalue,
(SELECT SUM(somevalue) FROM TestTable t2 WHERE t2.somedate<=t1.somedate) AS running_total
FROM TestTable t1
ORDER BY somedate
You can also denormalize - store running totals in the same table:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
Selects work much faster than any other solutions, but modifications may be slower
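If you are using SQL Server 2012 or above (LAG is not available in 2008 R2), this would be the shortest way to do it, assuming the table already has a runningtotal column to read from: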
Select id
,somedate
,somevalue,
LAG(runningtotal) OVER (ORDER BY somedate) + somevalue AS runningtotal
From TestTable
LAG is used to get the previous row's value. You can search for more info.
Assuming that windowing works on SQL Server 2008 like it does elsewhere (that I've tried), give this a go:
select testtable.*, sum(somevalue) over(order by somedate)
from testtable
order by somedate;
MSDN says it's available in SQL Server 2008 (and maybe 2005 as well?) but I don't have an instance to hand to try it.
EDIT: well, apparently SQL Server doesn't allow a window specification ("OVER(...)") without specifying "PARTITION BY" (dividing the result up into groups but not aggregating in quite the way GROUP BY does). Annoying-- the MSDN syntax reference suggests that its optional, but I only have SqlServer 2000 instances around at the moment.
The query I gave works in both Oracle 10.2.0.3.0 and PostgreSQL 8.4-beta. So tell MS to catch up ;)
Though the best way to get this done is with a window function, it can also be done using a simple correlated sub-query.
Select id, someday, somevalue, (select sum(somevalue)
from testtable as t2
where t2.id = t1.id
and t2.someday <= t1.someday) as runningtotal
from testtable as t1
order by id,someday;
Here are 2 simple ways to calculate running total:
Approach 1: It can be written this way if your DBMS supports Analytical Functions
SELECT id
,somedate
,somevalue
,runningtotal = SUM(somevalue) OVER (ORDER BY somedate ASC)
FROM TestTable
Approach 2: You can make use of OUTER APPLY if your database version / DBMS itself does not support Analytical Functions
SELECT T.id
,T.somedate
,T.somevalue
,runningtotal = OA.runningtotal
FROM TestTable T
OUTER APPLY (
SELECT runningtotal = SUM(TI.somevalue)
FROM TestTable TI
WHERE TI.somedate <= T.somedate
) OA;
Note:- If you have to calculate the running total for different partitions separately, it can be done as posted here: Calculating Running totals across rows and grouping by ID
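As a rough illustration of the per-partition case, a window function can restart the total for each group. This sketch assumes SQL Server 2012 or later and a hypothetical grouping column named groupid on TestTable, which is not part of the original table.

```sql
-- Sketch only: groupid is a hypothetical column used for illustration.
SELECT id,
       groupid,
       somedate,
       somevalue,
       SUM(somevalue) OVER (PARTITION BY groupid
                            ORDER BY somedate
                            ROWS UNBOUNDED PRECEDING) AS runningtotal
FROM TestTable
ORDER BY groupid, somedate;
```

The explicit ROWS frame keeps rows with identical dates from being summed together as peers, which is what the default RANGE frame would do.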
I believe a running total can be achieved using the simple INNER JOIN operation below.
SELECT
ROW_NUMBER() OVER (ORDER BY SomeDate) AS OrderID
,rt.*
INTO
#tmp
FROM
(
SELECT 45 AS ID, CAST('01-01-2009' AS DATETIME) AS SomeDate, 3 AS SomeValue
UNION ALL
SELECT 23, CAST('01-08-2009' AS DATETIME), 5
UNION ALL
SELECT 12, CAST('02-02-2009' AS DATETIME), 0
UNION ALL
SELECT 77, CAST('02-14-2009' AS DATETIME), 7
UNION ALL
SELECT 39, CAST('02-20-2009' AS DATETIME), 34
UNION ALL
SELECT 33, CAST('03-02-2009' AS DATETIME), 6
) rt
SELECT
t1.ID
,t1.SomeDate
,t1.SomeValue
,SUM(t2.SomeValue) AS RunningTotal
FROM
#tmp t1
JOIN #tmp t2
ON t2.OrderID <= t1.OrderID
GROUP BY
t1.OrderID
,t1.ID
,t1.SomeDate
,t1.SomeValue
ORDER BY
t1.OrderID
DROP TABLE #tmp
The following will produce the required results.
SELECT a.SomeDate,
a.SomeValue,
SUM(b.SomeValue) AS RunningTotal
FROM TestTable a
CROSS JOIN TestTable b
WHERE (b.SomeDate <= a.SomeDate)
GROUP BY a.SomeDate,a.SomeValue
ORDER BY a.SomeDate,a.SomeValue
Having a clustered index on SomeDate will greatly improve the performance.
Using join
Another variation is to use join. Now the query could look like:
SELECT a.id, a.value, SUM(b.Value) FROM RunTotalTestData a,
RunTotalTestData b
WHERE b.id <= a.id
GROUP BY a.id, a.value
ORDER BY a.id;
For more, you can visit this link:
http://askme.indianyouth.info/details/calculating-simple-running-totals-in-sql-server-12
BEGIN TRAN
CREATE TABLE #Table (_Id INT IDENTITY(1,1) ,id INT , somedate VARCHAR(100) , somevalue INT)
INSERT INTO #Table ( id , somedate , somevalue )
SELECT 45 , '01/Jan/09', 3 UNION ALL
SELECT 23 , '08/Jan/09', 5 UNION ALL
SELECT 12 , '02/Feb/09', 0 UNION ALL
SELECT 77 , '14/Feb/09', 7 UNION ALL
SELECT 39 , '20/Feb/09', 34 UNION ALL
SELECT 33 , '02/Mar/09', 6
;WITH CTE ( _Id, id , _somedate , _somevalue ,_totvalue ) AS
(
SELECT _Id , id , somedate , somevalue ,somevalue
FROM #Table WHERE _id = 1
UNION ALL
SELECT #Table._Id , #Table.id , somedate , somevalue , somevalue + _totvalue
FROM #Table,CTE
WHERE #Table._id > 1 AND CTE._Id = ( #Table._id-1 )
)
SELECT * FROM CTE
ROLLBACK TRAN

SQL Server 2008 filling gaps with dimension

I have a data table as below
#data
---------------
Account AccountType
---------------
1 2
2 0
3 5
4 2
5 1
6 5
AccountType 2 is a header and 5 is a total. Accounts of type 2 have to look ahead to the next account of type 1 or 0 to determine whether their Dim value is 1 or 0. Totals of type 5 have to look back to the nearest 1 or 0 to determine their Dim value. Accounts of type 1 or 0 have their own type as Dim.
Accounts of type 2 appear as islands, so it's not enough to just check RowNumber + 1, and the same goes for accounts of type 5.
I have arrived at the following table using CTEs, but I can't find a quick way to go from here to my final result of Account, AccountType, Dim for all accounts.
T3
-------------------
StartRow EndRow AccountType Dim
-------------------
1 1 2 0
2 2 0 0
3 3 5 0
4 4 2 1
5 5 0 1
6 6 5 1
The code below is T-SQL; copy and paste it all and see it run. The final join on the CTE select statement is extremely slow: even for 500 rows it takes 30 seconds. I have 100,000 rows I need to handle. I have written a cursor-based solution that does it in 10-20 seconds, which is workable, and a fast recursive CTE solution that does it in 5 seconds for 100,000 rows, but that one depends on the fragmentation of the #data table. I should add that this is simplified; the real problem has a lot more dimensions that need to be taken into account, but it works the same way as this simple problem.
Anyway, is there a fast way to do this using joins or another set-based solution?
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#data') IS NOT NULL
DROP TABLE #data
CREATE TABLE #data
(
Account INTEGER IDENTITY(1,1),
AccountType INTEGER,
)
BEGIN -- TEST DATA
DECLARE @Counter INTEGER = 0
DECLARE @MaxDataRows INTEGER = 50 -- Change here to check performance
DECLARE @Type INTEGER
WHILE(@Counter < @MaxDataRows)
BEGIN
SET @Type = CASE
WHEN @Counter % 10 < 3 THEN 2
WHEN @Counter % 10 >= 8 THEN 5
WHEN @Counter % 10 >= 3 THEN (CASE WHEN @Counter < @MaxDataRows / 2.0 THEN 0 ELSE 1 END )
ELSE 0
END
INSERT INTO #data VALUES(@Type)
SET @Counter = @Counter + 1
END
END -- TEST DATA END
;WITH groupIds_cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY AccountType ORDER BY Account) - Account AS GroupId
FROM #data
),
islandRanges_cte AS
(
SELECT
MIN(Account) AS StartRow,
MAX(Account) AS EndRow,
AccountType
FROM groupIds_cte
GROUP BY GroupId,AccountType
),
T3 AS
(
SELECT I.*, J.AccountType AS Dim
FROM islandRanges_cte I
INNER JOIN islandRanges_cte J
ON (I.EndRow + 1 = J.StartRow AND I.AccountType = 2)
UNION ALL
SELECT I.*, J.AccountType AS Dim
FROM islandRanges_cte I
INNER JOIN islandRanges_cte J
ON (I.StartRow - 1 = J.EndRow AND I.AccountType = 5)
UNION ALL
SELECT *, AccountType AS Dim
FROM islandRanges_cte
WHERE AccountType = 0 OR AccountType = 1
),
T4 AS
(
SELECT Account, Dim
FROM (
SELECT FlattenRow AS Account, StartRow, EndRow, Dim
FROM T3 I
CROSS APPLY (VALUES(StartRow),(EndRow)) newValues (FlattenRow)
) T
)
--SELECT * FROM T3 ORDER BY StartRow
--SELECT * FROM T4 ORDER BY Account
-- Final correct result but very very slow
SELECT D.Account, D.AccountType, I.Dim FROM T3 I
INNER JOIN #data D
ON D.Account BETWEEN I.StartRow AND I.EndRow
ORDER BY Account
EDIT with some time testing
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#times') IS NULL
CREATE TABLE #times
(
RecId INTEGER IDENTITY(1,1),
Batch INTEGER,
Method NVARCHAR(255),
MethodDescription NVARCHAR(255),
RunTime INTEGER
)
IF OBJECT_ID('tempdb..#batch') IS NULL
CREATE TABLE #batch
(
Batch INTEGER IDENTITY(1,1),
Bit BIT
)
INSERT INTO #batch VALUES(0)
IF OBJECT_ID('tempdb..#data') IS NOT NULL
DROP TABLE #data
CREATE TABLE #data
(
Account INTEGER
)
CREATE NONCLUSTERED INDEX data_account_index ON #data (Account)
IF OBJECT_ID('tempdb..#islands') IS NOT NULL
DROP TABLE #islands
CREATE TABLE #islands
(
AccountFrom INTEGER ,
AccountTo INTEGER,
Dim INTEGER,
)
CREATE NONCLUSTERED INDEX islands_from_index ON #islands (AccountFrom, AccountTo, Dim)
BEGIN -- TEST DATA
INSERT INTO #data
SELECT TOP 100000 ROW_NUMBER() OVER(ORDER BY t1.number) AS N
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
INSERT INTO #islands
SELECT MIN(Account) AS Start, MAX(Account), Grp
FROM (SELECT *, NTILE(10) OVER (ORDER BY Account) AS Grp FROM #data) T
GROUP BY Grp ORDER BY Start
END -- TEST DATA END
--SELECT * FROM #data
--SELECT * FROM #islands
--PRINT CONVERT(varchar(20),DATEDIFF(MS,@RunDate,GETDATE()))+' ms Sub Query'
DECLARE @RunDate datetime
SET @RunDate=GETDATE()
SELECT Account, (SELECT Dim From #islands WHERE Account BETWEEN AccountFrom AND AccountTo) AS Dim
FROM #data
INSERT INTO #times VALUES ((SELECT MAX(Batch) FROM #batch) ,'subquery','',DATEDIFF(MS,@RunDate,GETDATE()))
SET @RunDate=GETDATE()
SELECT D.Account, V.Dim
FROM #data D
CROSS APPLY
(
SELECT Dim From #islands V
WHERE D.Account BETWEEN V.AccountFrom AND V.AccountTo
) V
INSERT INTO #times VALUES ((SELECT MAX(Batch) FROM #batch) ,'crossapply','',DATEDIFF(MS,@RunDate,GETDATE()))
SET @RunDate=GETDATE()
SELECT D.Account, I.Dim
FROM #data D
JOIN #islands I
ON D.Account BETWEEN I.AccountFrom AND I.AccountTo
INSERT INTO #times VALUES ((SELECT MAX(Batch) FROM #batch) ,'join','',DATEDIFF(MS,@RunDate,GETDATE()))
SET @RunDate=GETDATE()
;WITH cte AS
(
SELECT Account, AccountFrom, AccountTo, Dim, 1 AS Counting
FROM #islands
CROSS APPLY (VALUES(AccountFrom),(AccountTo)) V (Account)
UNION ALL
SELECT Account + 1 ,AccountFrom, AccountTo, Dim, Counting + 1
FROM cte
WHERE (Account + 1) > AccountFrom AND (Account + 1) < AccountTo
)
SELECT Account, Dim, Counting FROM cte OPTION(MAXRECURSION 32767)
INSERT INTO #times VALUES ((SELECT MAX(Batch) FROM #batch) ,'recursivecte','',DATEDIFF(MS,@RunDate,GETDATE()))
You can select from the #times table to see the run times :)
I think you want a join, but using an inequality rather than an equality:
select tt.id, tt.dim1, it.dim2
from TallyTable tt join
IslandsTable it
on tt.id between it."from" and it."to"
This works for the data that you provide in the question.
Here is another idea that might work. Here is the query:
select d.*,
(select top 1 AccountType from #data d2 where d2.Account > d.Account and d2.AccountType not in (2, 5)
) nextAccountType
from #data d
order by d.account;
I just ran this on 50,000 rows and this version took 17 seconds on my system. Changing the table to:
CREATE TABLE #data (
Account INTEGER IDENTITY(1,1) primary key,
AccountType INTEGER,
);
Has actually slowed it down to about 1:33 -- quite to my surprise. Perhaps one of these will help you.
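The look-ahead idea can be extended to cover both directions described in the question (headers look forward, totals look backward). This is only a sketch against the question's #data table; the aliases nxt and prv are illustrative, and its performance on 100,000 rows has not been measured here.

```sql
-- Sketch only: nxt and prv are illustrative aliases.
SELECT d.Account,
       d.AccountType,
       CASE d.AccountType
            WHEN 2 THEN nxt.AccountType   -- headers take the next 0/1
            WHEN 5 THEN prv.AccountType   -- totals take the previous 0/1
            ELSE d.AccountType            -- 0 and 1 are their own Dim
       END AS Dim
FROM #data AS d
OUTER APPLY (SELECT TOP 1 d2.AccountType
             FROM #data AS d2
             WHERE d2.Account > d.Account
               AND d2.AccountType IN (0, 1)
             ORDER BY d2.Account) AS nxt
OUTER APPLY (SELECT TOP 1 d3.AccountType
             FROM #data AS d3
             WHERE d3.Account < d.Account
               AND d3.AccountType IN (0, 1)
             ORDER BY d3.Account DESC) AS prv
ORDER BY d.Account;
```

The explicit ORDER BY inside each APPLY matters: TOP 1 without it returns an arbitrary qualifying row rather than the nearest one.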

Clearing prioritized overlapping ranges in SQL Server

This one is nasty complicated to solve.
I have a table containing date ranges, each date range has a priority. Highest priority means this date range is the most important.
Or in SQL
create table #ranges (Start int, Finish int, Priority int)
insert #ranges values (1 , 10, 0)
insert #ranges values (2 , 5 , 1)
insert #ranges values (3 , 4 , 2)
insert #ranges values (1 , 5 , 0)
insert #ranges values (200028, 308731, 0)
Start Finish Priority
----------- ----------- -----------
1 10 0
2 5 1
3 4 2
1 5 0
200028 308731 0
I would like to run a series of SQL queries on this table that will result in the table having no overlapping ranges, it is to take the highest priority ranges over the lower ones. Split off ranges as required, and get rid of duplicate ranges. It allows for gaps.
So the result should be:
Start Finish Priority
----------- ----------- -----------
1 2 0
2 3 1
3 4 2
4 5 1
5 10 0
200028 308731 0
Anyone care to give a shot at the SQL? I would also like it to be as efficient as possible.
This is most of the way there, possible improvement would be joining up adjacent ranges of the same priority. It's full of cool trickery.
select Start, cast(null as int) as Finish, cast(null as int) as Priority
into #processed
from #ranges
union
select Finish, NULL, NULL
from #ranges
update p
set Finish = (
select min(p1.Start)
from #processed p1
where p1.Start > p.Start
)
from #processed p
create clustered index idxStart on #processed(Start, Finish, Priority)
create index idxFinish on #processed(Finish, Start, Priority)
update p
set Priority =
(
select max(r.Priority)
from #ranges r
where
(
(r.Start <= p.Start and r.Finish > p.Start) or
(r.Start >= p.Start and r.Start < p.Finish)
)
)
from #processed p
delete from #processed
where Priority is null
select * from #processed
Here is something to get you started. It is helpful if you use a calendar table:
CREATE TABLE dbo.Calendar
(
dt SMALLDATETIME NOT NULL
PRIMARY KEY CLUSTERED
)
GO
SET NOCOUNT ON
DECLARE @dt SMALLDATETIME
SET @dt = '20000101'
WHILE @dt < '20200101'
BEGIN
INSERT dbo.Calendar(dt) SELECT @dt
SET @dt = @dt + 1
END
GO
Code to setup the problem:
create table #ranges (Start DateTime NOT NULL, Finish DateTime NOT NULL, Priority int NOT NULL)
create table #processed (dt DateTime NOT NULL, Priority int NOT NULL)
ALTER TABLE #ranges ADD PRIMARY KEY (Start,Finish, Priority)
ALTER TABLE #processed ADD PRIMARY KEY (dt)
declare @day0 datetime,
@day1 datetime,
@day2 datetime,
@day3 datetime,
@day4 datetime,
@day5 datetime
select @day0 = '2000-01-01',
@day1 = @day0 + 1,
@day2 = @day1 + 1,
@day3 = @day2 + 1,
@day4 = @day3 + 1,
@day5 = @day4 + 1
insert #ranges values (@day0, @day5, 0)
insert #ranges values (@day1, @day4, 1)
insert #ranges values (@day2, @day3, 2)
insert #ranges values (@day1, @day4, 0)
Actual solution:
DECLARE @start datetime, @finish datetime, @priority int
WHILE 1=1 BEGIN
SELECT TOP 1 @start = start, @finish = finish, @priority = priority
FROM #ranges
ORDER BY priority DESC, start, finish
IF @@ROWCOUNT = 0
BREAK
INSERT INTO #processed (dt, priority)
SELECT dt, @priority FROM calendar
WHERE dt BETWEEN @start and @finish
AND NOT EXISTS (SELECT * FROM #processed WHERE dt = calendar.dt)
DELETE FROM #ranges WHERE @start=start AND @finish=finish AND @priority=priority
END
Results: SELECT * FROM #processed
dt Priority
----------------------- -----------
2000-01-01 00:00:00.000 0
2000-01-02 00:00:00.000 1
2000-01-03 00:00:00.000 2
2000-01-04 00:00:00.000 2
2000-01-05 00:00:00.000 1
2000-01-06 00:00:00.000 0
The solution is not in the exact same format, but the idea is there.
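To get from one row per day back to ranges, consecutive days with the same priority can be collapsed. The sketch below assumes the #processed (dt, Priority) table populated above and uses the row-number-difference trick; grp is an illustrative name.

```sql
-- Sketch only: days that are consecutive within a priority share the same
-- (day number - row number) value, so they fall into one group.
;WITH g AS (
    SELECT dt,
           Priority,
           DATEDIFF(day, 0, dt)
             - ROW_NUMBER() OVER (PARTITION BY Priority ORDER BY dt) AS grp
    FROM #processed
)
SELECT MIN(dt) AS Start,
       MAX(dt) AS Finish,
       Priority
FROM g
GROUP BY Priority, grp
ORDER BY Start;
```

For the six rows shown above this should merge 3 and 4 January into a single priority-2 range and leave the other days as single-day ranges.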
I'm a little confused about what you want to end up with. Is this the same as simply having a set of dates where one range continues until the next one starts (in which case you don't really need the Finish date, do you?)
Or can a range Finish and there's a gap until the next one starts sometimes?
If the range Start and Finish are explicitly set, then I'd be inclined to leave both, but have the logic to apply the higher priority during the overlap. I'd suspect that if dates start getting adjusted, you'll eventually need to roll back a range that got shaved, and the original setting will be gone.
And you'll never be able to explain "how it got that way".
Do you want simply a table with a row for each date, including its priority value? Then when you have a new rule, you can bump the dates that would be trumped by the new rule?
I did a medical office scheduling app once that started with work/vacation/etc. requests with range-type data (plus a default work-week template.) Once I figured out to store the active schedule info as user/date/timerange records, things fell into place a lot more easily. YMMV.
This can be done in one SQL statement. (I first wrote the query in Oracle using lag and lead, but since MSSQL doesn't support those functions I rewrote it using row_number. I'm not sure whether the result is MSSQL compliant, but it should be very close.)
with x as (
select rdate rdate
, row_number() over (order by rdate) rn
from (
select start rdate
from ranges
union
select finish rdate
from ranges
)
)
select d.begin
, d.end
, max(r.priority)
from (
select begin.rdate begin
, end.rdate end
from x begin
, x end
where begin.rn = end.rn - 1
) d
, ranges r
where r.start <= d.begin
and r.finish >= d.end
and d.begin <> d.end
group by d.begin
, d.end
order by 1, 2
I first made a table (x) with all dates. Then I turned this into buckets by joining x with itself and taking 2 following rows. After this I linked all the possible priorities with the result. By taking the max(priority) I get the requested result.
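The same steps can be written in T-SQL against the question's #ranges table. This is a sketch rather than a tested port of the Oracle query: the CTE names (points, numbered, buckets) are illustrative, and it assumes SQL Server 2005 or later for ROW_NUMBER and CTEs.

```sql
-- Sketch only: assumes #ranges (Start int, Finish int, Priority int).
;WITH points AS (
    SELECT Start AS pt FROM #ranges
    UNION                      -- UNION removes duplicate boundary points
    SELECT Finish FROM #ranges
),
numbered AS (
    SELECT pt, ROW_NUMBER() OVER (ORDER BY pt) AS rn
    FROM points
),
buckets AS (
    -- pair each boundary with the next one to form elementary intervals
    SELECT b.pt AS BucketStart, e.pt AS BucketEnd
    FROM numbered AS b
    JOIN numbered AS e ON e.rn = b.rn + 1
)
SELECT k.BucketStart AS Start,
       k.BucketEnd   AS Finish,
       MAX(r.Priority) AS Priority
FROM buckets AS k
JOIN #ranges AS r
  ON r.Start <= k.BucketStart
 AND r.Finish >= k.BucketEnd   -- range fully covers the interval
GROUP BY k.BucketStart, k.BucketEnd
ORDER BY k.BucketStart;
```

On the sample data from the question this should return the six expected rows; the interval between 10 and 200028 is covered by no range, so the inner join drops it and the gap is preserved.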