SQL: lowest running balance in a group

I've been trying for days to solve this problem, without success.
I want to get the lowest running balance in a group.
Here is some sample data.
The running balance is not stored in the table; it is computed dynamically.
The problem is that I want the lowest running balance within a specific month (January),
so the output should be 150 for memberid 10001 and 175 for memberid 10002, as highlighted in the image.
My desired output is:
memberid | balance
10001 | 150
10002 | 175
Is that possible using only a SQL query?
P.S. Computing the lowest running balance in C# is very slow, since I have more than 600,000 records in my table.
I've updated the question.

The answer provided by Mihir Shah gave me the idea for how to solve my problem.
His answer takes too much time to process, making it as slow as the computation in my C# program, because his code loops over every record.
Here is my answer for getting the lowest running total within a specific group (a specific month) without sacrificing much performance.
with IniMonth1 as
(
select a.memberid, a.iniDeposit, a.iniWithdrawal,
(cast(a.iniDeposit as decimal(10,2)) - cast(a.iniWithdrawal as decimal(10,2))) as RunningTotal
from
(
select b.memberid, sum(b.depositamt) as iniDeposit, sum(b.withdrawalamt) as iniWithdrawal
from savings b
where trdate < '01/01/2016'
group by b.memberid
) a /*gets all the memberid, sum of deposit amount and withdrawal amt from the beginning of the savings before the specific month */
where cast(a.iniDeposit as decimal(10,2)) - cast(a.iniWithdrawal as decimal(10,2)) > 0 /*filters zero savings */
)
,DetailMonth1 as
(
select a.memberid, a.depositamt,a.withdrawalamt,
(cast(a.depositamt as decimal(10,2)) - cast(a.withdrawalamt as decimal(10,2))) as totalBal,
Row_Number() Over(Partition By a.memberid Order By a.trdate Asc) RowID
from savings a
where
a.trdate >= '01/01/2016'
and
a.trdate <= '01/31/2016'
and (a.depositamt<>0 or a.withdrawalamt<>0)
) /* gets all record within the specific month and gives a no of row as an id for the running value in the next procedure*/
,ComputedDetailMonth1 as
(
select a.memberid, min(a.runningbalance) as MinRunningBal
from
(
select a.rowid, a.memberid, a.totalbal,
(
sum(b.totalbal) +
(case
when c.runningtotal is null then 0
else c.runningtotal
end)
)as runningbalance , c.runningtotal as oldbalance
from DetailMonth1 a
inner join DetailMonth1 b
on b.rowid<=a.rowid
and a.memberid=b.memberid
left join IniMonth1 c
on a.memberid=c.memberid
group by a.rowid,a.memberid,a.totalbal,c.runningtotal
) a
group by a.memberid
) /* the loop is only for the records of the specific month only making it much faster */
/* this gets the running balance of specific month ONLY and ADD the sum total of IniMonth1 using join to get the running balance from the beginning of savings to the specific month */
/* I then get the minimum of the output using the min function*/
, OldBalanceWithNoNewSavingsMonth1 as
(
select a.memberid,a.RunningTotal
from
IniMonth1 a
left join
DetailMonth1 b
on a.memberid = b.memberid
where b.totalbal is null
)/* gets every member with a non-zero balance and no transaction in the specific month; that balance becomes the default lowest value for the member */
,finalComputedMonth1 as
(
select a.memberid,a.runningTotal as MinRunTotal from OldBalanceWithNoNewSavingsMonth1 a
union
select b.memberid,b.MinRunningBal from ComputedDetailMonth1 b
)/*the record with minimum running total with clients that has a transaction in the specific month Unions with the members with no current transaction in the specific month*/
select * from finalComputedMonth1 order by memberid /* display the final output */
I have more than 600k records in my savings table.
Surprisingly, this code performs very well.
My C# program takes almost 2 hours to compute every record for all the members manually.
This code takes only 2 seconds, and at most 9 seconds, to compute everything.
Displaying the result in C# takes another 2 seconds.
The output of this code was tested against the results computed by my C# program.

Maybe the following will help you:
Set Nocount On;
Declare @CashFlow Table
(
savingsid Varchar(50)
,memberid Int
,trdate Date
,deposit Decimal(18,2)
,withdrawal Decimal(18,2)
)
Insert Into @CashFlow(savingsid,memberid,trdate,deposit,withdrawal) Values
('10001-0002',10001,'01/01/2015',1000,0)
,('10001-0003',10001,'01/07/2015',25,0)
,('10001-0004',10001,'01/13/2015',25,0)
,('10001-0005',10001,'01/19/2015',0,900)
,('10001-0006',10001,'01/25/2015',25,0)
,('10001-0007',10001,'01/31/2015',25,0)
,('10001-0008',10001,'02/06/2015',25,0)
,('10001-0009',10001,'02/12/2015',25,0)
,('10001-0010',10001,'02/18/2015',0,200)
,('10002-0001',10002,'01/01/2015',500,0)
,('10002-0002',10002,'01/07/2015',25,0)
,('10002-0003',10002,'01/13/2015',0,200)
,('10002-0004',10002,'01/19/2015',25,0)
,('10002-0005',10002,'01/25/2015',25,0)
,('10002-0006',10002,'01/31/2015',0,200)
,('10002-0007',10002,'02/06/2015',25,0)
,('10002-0008',10002,'02/12/2015',25,0)
,('10002-0009',10002,'02/12/2015',0,200)
;With TrialBalance As
(
Select Row_Number() Over(Partition By cf.memberid Order By cf.trdate Asc) RowNum
,cf.memberid
,cf.deposit
,cf.withdrawal
,cf.trdate
From @CashFlow As cf
)
,RunningBalance As
(
Select tb.RowNum
,tb.memberid
,tb.deposit
,tb.withdrawal
,tb.trdate
From TrialBalance As tb
Where tb.RowNum = 1
Union All
Select tb.RowNum
,rb.memberid
,Cast((rb.deposit + tb.deposit - tb.withdrawal) As Decimal(18,2))
,rb.withdrawal
,tb.trdate
From TrialBalance As tb
Join RunningBalance As rb On tb.RowNum = (rb.Rownum + 1) And tb.memberid = rb.memberid
)
Select rb.memberid
,Min(rb.deposit) As runningBalance
From RunningBalance As rb
Where Year(rb.trdate) = 2015
And Month(rb.trdate) = 1
Group By rb.memberid
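On SQL Server 2012 or later, a windowed SUM with ORDER BY can probably replace both the recursive CTE and the self-join. The following is a minimal sketch under that assumption, not a tested drop-in: it reuses the sample data above (with dates rewritten as yyyymmdd literals), and the savingsid tie-breaker in the ORDER BY is my own choice for transactions that share a date.

```sql
-- Sketch only: assumes SQL Server 2012+ and the sample columns used above.
DECLARE @CashFlow TABLE
(
    savingsid  VARCHAR(50),
    memberid   INT,
    trdate     DATE,
    deposit    DECIMAL(18,2),
    withdrawal DECIMAL(18,2)
);

INSERT INTO @CashFlow (savingsid, memberid, trdate, deposit, withdrawal) VALUES
 ('10001-0002', 10001, '20150101', 1000,   0)
,('10001-0003', 10001, '20150107',   25,   0)
,('10001-0004', 10001, '20150113',   25,   0)
,('10001-0005', 10001, '20150119',    0, 900)
,('10001-0006', 10001, '20150125',   25,   0)
,('10001-0007', 10001, '20150131',   25,   0)
,('10001-0008', 10001, '20150206',   25,   0)
,('10001-0009', 10001, '20150212',   25,   0)
,('10001-0010', 10001, '20150218',    0, 200)
,('10002-0001', 10002, '20150101',  500,   0)
,('10002-0002', 10002, '20150107',   25,   0)
,('10002-0003', 10002, '20150113',    0, 200)
,('10002-0004', 10002, '20150119',   25,   0)
,('10002-0005', 10002, '20150125',   25,   0)
,('10002-0006', 10002, '20150131',    0, 200)
,('10002-0007', 10002, '20150206',   25,   0)
,('10002-0008', 10002, '20150212',   25,   0)
,('10002-0009', 10002, '20150212',    0, 200);

WITH Running AS
(
    SELECT memberid,
           trdate,
           -- running balance from the member's first transaction onward
           SUM(deposit - withdrawal) OVER (PARTITION BY memberid
                                           ORDER BY trdate, savingsid
                                           ROWS UNBOUNDED PRECEDING) AS runningBalance
    FROM @CashFlow
)
SELECT memberid, MIN(runningBalance) AS balance
FROM Running
-- the month filter is applied after the running sum has been computed,
-- so earlier transactions still count toward the balance
WHERE trdate >= '20150101' AND trdate < '20150201'
GROUP BY memberid
ORDER BY memberid;
-- For this sample: 10001 -> 150.00, 10002 -> 175.00
```

Unlike the asker's own answer above, this sketch returns no row for a member who has a balance but no transaction in the month; that case would still need to be unioned in.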

Related

Multiple sum subqueries for percentage

I need help with the following problem: I want to write a query that computes multiple sums and then uses those sums to get a percentage: percentage = s1/(s1+s2).
I have as input the following data:
Orders shipping date, Nb of orders that have arrived late, Nb of orders that have arrived on time
What I want as output: The percentage of orders that have arrived late and orders that have arrived on time.
I want another column in the table that will have the percentage using SQL.
Concrete example:
* On 2022/01/04 at **10:00 AM** I have 3 orders late and 4 orders on time => 7 orders in total. Percentage = 3/7 late, 4/7 on time.
* On 2022/01/04 at **11:00 AM** I have 5 orders late and 6 orders on time => 11 orders in total, but this entry is summed with the previous one, so: 5+3 orders late, 4+6 orders on time, 18 orders in total => percentage = 8/18 late, 10/18 on time.
In order to sum previous entries order numbers with status "LATE" to current on time order number I wrote the following sql:
(sum1=s1)
SELECT s1.EventDate, (
SELECT SUM(s2.NbOfOrders)
FROM OrderShipmentStats s2
WHERE s2.EventDate <= s1.EventDate AND s2.Status='LATE'
) AS cnt
FROM OrderShipmentStats s1
GROUP BY s1.EventDate, s1.Status
The same kind of SQL was written for "On Time" and it works. What I need now is to combine the values from the two queries and, depending on whether the status is late or on time, compute s1/(s1+s2) or s2/(s1+s2).
My problem is that I do not know how to do this formula in a single query using those 2 subqueries, any help would be great.
Picture with Table
Above there is the link with the picture containing how the table looks(I am new so I am not allowed to embed a photo).
The percentage column is the one I will add and there are lines pointing towards how that is calculated.
I created the table based on your image and added a few rows to it.
In the query you could see total orders count per hour, per status and the grand total as you mentioned in the image.
The query looks like:
create table OrderShipmentsStats
(
EventDate datetime not null,
Status varchar(10) not null,
OrdersCount int not null
)
insert into OrderShipmentsStats
values
('2022-01-04T10:00:00','Late',3),
('2022-01-04T10:00:00','On Time',4),
('2022-01-04T11:00:00','Late',5),
('2022-01-04T11:00:00','On Time',6),
('2022-01-04T12:00:00','Late',1),
('2022-01-04T12:00:00','On Time',2)
SELECT
EventDate,
Status,
OrdersCount,
TotalPerHour,
StatusTotal,
GrandStatusTotal,
-- in the line below, multiplying by 1.0 forces decimal division, giving a fraction such as 0.45 or 0.123
-- we want the actual percentage, such as 15% or 50%, so the result is multiplied by 100
cast(1.0 * o.StatusTotal / o.GrandStatusTotal as decimal(5,3)) * 100 as Percentage
from
(
select
EventDate,
Status,
OrdersCount,
TotalPerHour,
StatusTotal,
SUM(TotalPerHour) over (partition by Status order by EventDate asc) as GrandStatusTotal
from
(
select
EventDate,
Status,
OrdersCount,
Sum(OrdersCount) over (partition by EventDate order by EventDate asc) as TotalPerHour,
SUM(OrdersCount) over (partition by Status order by EventDate asc) as StatusTotal
from OrderShipmentsStats
) as t
) as o
order by EventDate, Status
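If only the cumulative percentage is needed, the nested subqueries can probably be collapsed into two window sums. This is a sketch assuming SQL Server 2012+ and the same OrderShipmentsStats columns; the inline VALUES list stands in for the table so that the snippet runs on its own.

```sql
-- Sketch only: the CTE below replaces the real table with the sample rows.
WITH OrderShipmentsStats (EventDate, Status, OrdersCount) AS
(
    SELECT CAST(v.EventDate AS datetime), v.Status, v.OrdersCount
    FROM (VALUES
        ('2022-01-04T10:00:00', 'Late',    3),
        ('2022-01-04T10:00:00', 'On Time', 4),
        ('2022-01-04T11:00:00', 'Late',    5),
        ('2022-01-04T11:00:00', 'On Time', 6),
        ('2022-01-04T12:00:00', 'Late',    1),
        ('2022-01-04T12:00:00', 'On Time', 2)
    ) AS v (EventDate, Status, OrdersCount)
)
SELECT EventDate,
       Status,
       OrdersCount,
       CAST(100.0
            -- running total for this status up to and including this hour
            * SUM(OrdersCount) OVER (PARTITION BY Status ORDER BY EventDate)
            -- running total across both statuses; with the default RANGE frame,
            -- rows sharing the same EventDate are all included
            / SUM(OrdersCount) OVER (ORDER BY EventDate)
            AS decimal(5,1)) AS Percentage
FROM OrderShipmentsStats
ORDER BY EventDate, Status;
-- At 11:00 this gives 8/18 -> 44.4 for Late and 10/18 -> 55.6 for On Time.
```

The second window deliberately has no PARTITION BY, so it plays the role of GrandStatusTotal in the query above.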

Can this cursor be replaced

I am currently using a cursor in my SQL Server procedure and wanted to know if there is a better approach that could replace it. The process is:
1. A customer pays some money, and I create an entry for it in the payment table.
2. I start a cursor that selects all payments of that customer that have an available balance, from the PAYMENT table.
3. Then I start an inner cursor that fetches all the bills of that customer which are still unpaid, from the BILL table.
4. I pay off each bill until the current payment is exhausted, and then repeat the process.
How can I replace the cursors in steps 2 and 3 with something more efficient? Also, does using cursors mean that the PAYMENT and BILL tables remain locked while the procedure runs?
Thanks
Here's one way it can be done, with made up tables and data since we don't know what yours look like. I'm putting some narrative in in places but all of the code should be run as one single script.
Data setup:
declare @bills table (billid int, balance decimal(38,4))
declare @payments table (paymentid int, balance decimal(38,4))
insert into @bills (billid, balance) values
(1,0), (2,22.50), (3,12.75), (4,19.20)
insert into @payments (paymentid,balance) values
(1,20.19),(2,5.50),(3,20)
declare @newpayments table (billid int, paymentid int,
paymentamount decimal(38,4))
I've assumed that the bills and payments tables have a column called balance, which shows any amounts not dealt with as yet. Alternatively, you may have to calculate this from a couple of columns. But no sample data in your question means I get to make up an easy structure :-)
Query to populate @newpayments with which bills should be paid from which (partial) payments [1]:
; With unpaidbills as (
select billid,balance,
ROW_NUMBER() OVER (ORDER BY billid) as rn,
SUM(balance) OVER (ORDER BY billid
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) as endbalance,
SUM(balance) OVER (ORDER BY billid
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) - balance as startbalance
from @bills
where balance > 0
), unusedpayments as (
select paymentid,balance,
ROW_NUMBER() OVER (ORDER BY paymentid) as rn,
SUM(balance) OVER (ORDER BY paymentid
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) as endbalance,
SUM(balance) OVER (ORDER BY paymentid
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) - balance as startbalance
from @payments
where balance > 0
), overlaps as (
select
billid,paymentid,
CASE WHEN ub.startbalance < up.startbalance
THEN up.startbalance ELSE ub.startbalance END as overlapstart,
CASE WHEN ub.endbalance > up.endbalance
THEN up.endbalance ELSE ub.endbalance END as overlapend
from
unpaidbills ub
inner join
unusedpayments up
on
ub.startbalance < up.endbalance and
up.startbalance < ub.endbalance
)
insert into @newpayments(billid,paymentid,paymentamount)
select billid,paymentid,overlapend - overlapstart as paymentamount
from overlaps
At this point, @newpayments can be used to generate transaction history, etc.
And then, finally, we update the original tables to mark the amounts used:
;With totalpaid as (
select billid,SUM(paymentamount) as payment from @newpayments
group by billid
)
update b
set b.balance = b.balance - tp.payment
from @bills b
inner join
totalpaid tp
on b.billid = tp.billid
;With totalused as (
select paymentid,SUM(paymentamount) as payment from @newpayments
group by paymentid
)
update p
set p.balance = p.balance - tu.payment
from @payments p
inner join
totalused tu
on p.paymentid = tu.paymentid
The key part was to use SUM() with window functions to calculate the running totals of the amounts owed (bills) or amounts available (payments), in both cases using a column (billid or paymentid) to determine in what order each of these items should be dealt with. E.g. the unpaidbills CTE produces a result set like this:
billid balance rn endbalance startbalance
----------- --------- -------------------- ------------- -------------
2 22.5000 1 22.5000 0.0000
3 12.7500 2 35.2500 22.5000
4 19.2000 3 54.4500 35.2500
and unusedpayments looks like this:
paymentid balance rn endbalance startbalance
----------- ---------- -------------------- ------------ -------------
1 20.1900 1 20.1900 0.0000
2 5.5000 2 25.6900 20.1900
3 20.0000 3 45.6900 25.6900
We then create the overlaps CTE, which finds overlaps [2] between the bills and payments where (part of) a payment can be used to satisfy (part of) a bill. The region of the overlap is the actual amount to pay for that bill.
[1] The ROW_NUMBER() calls aren't really needed. In an early part of writing this query, I thought I was going to use these, but it turned out to be unnecessary. But removing them doesn't shorten things enough to allow SO to stop scrolling that query anyway, and so I may as well leave them in (and not have to edit the result sets shown lower down also).
[2] Many people trying to find overlaps make things absurdly complicated and deal with many special cases to find all overlaps. This can usually be done far more simply in the way that I show in the overlaps CTE: two ranges overlap if the first range starts before the second range ends, and the second range starts before the first range ends.
The only tricky thing to do is to decide whether you want to deal with two ranges that abut (the first one's end value is exactly equal to the second one's start or vice versa) but that just leads to a decision on whether to use < or <= in the comparisons.
In this instance, we don't care if a payment exactly paid off the previous bill so we use < to avoid treating such situations as an overlap.
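To make the overlap step concrete, the ranges from the two result sets above can be fed through the same comparison on their own. This is only an illustrative sketch: the VALUES lists are copied by hand from those result sets, and the short aliases ub and up mirror the ones used in the overlaps CTE.

```sql
-- Sketch only: hard-coded ranges taken from the unpaidbills / unusedpayments output above.
WITH ub (billid, startbalance, endbalance) AS
(
    SELECT * FROM (VALUES
        (2,  0.00, 22.50),
        (3, 22.50, 35.25),
        (4, 35.25, 54.45)
    ) AS v (billid, startbalance, endbalance)
), up (paymentid, startbalance, endbalance) AS
(
    SELECT * FROM (VALUES
        (1,  0.00, 20.19),
        (2, 20.19, 25.69),
        (3, 25.69, 45.69)
    ) AS v (paymentid, startbalance, endbalance)
)
SELECT ub.billid,
       up.paymentid,
       -- overlap = smaller of the two ends minus larger of the two starts
       CASE WHEN ub.endbalance < up.endbalance
            THEN ub.endbalance ELSE up.endbalance END
     - CASE WHEN ub.startbalance > up.startbalance
            THEN ub.startbalance ELSE up.startbalance END AS paymentamount
FROM ub
INNER JOIN up
   ON ub.startbalance < up.endbalance
  AND up.startbalance < ub.endbalance
ORDER BY ub.billid, up.paymentid;
-- bill 2 / payment 1: 20.19
-- bill 2 / payment 2:  2.31
-- bill 3 / payment 2:  3.19
-- bill 3 / payment 3:  9.56
-- bill 4 / payment 3: 10.44
```

The five amounts add up to 45.69, the total of the three payments, so bill 4 is left with 19.20 - 10.44 = 8.76 still owing.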

Datediff between two tables

I have those two tables
1-Add to queue table
TransID , ADD date
10 , 10/10/2012
11 , 14/10/2012
11 , 18/11/2012
11 , 25/12/2012
12 , 1/1/2013
2-Removed from queue table
TransID , Removed Date
10 , 15/1/2013
11 , 12/12/2012
11 , 13/1/2013
11 , 20/1/2013
The TransID is the key between the two tables, and I can't modify those tables. What I want is to query the amount of time each transaction spent in the queue.
It's easy when there is one item in each table, but when an item gets queued more than once, how do I calculate that?
Assuming the order TransIDs are entered into the Add table is the same order they are removed, you can use the following:
WITH OrderedAdds AS
( SELECT TransID,
AddDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY AddDate)
FROM AddTable
), OrderedRemoves AS
( SELECT TransID,
RemovedDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY RemovedDate)
FROM RemoveTable
)
SELECT OrderedAdds.TransID,
OrderedAdds.AddDate,
OrderedRemoves.RemovedDate,
[DaysInQueue] = DATEDIFF(DAY, OrderedAdds.AddDate, ISNULL(OrderedRemoves.RemovedDate, CURRENT_TIMESTAMP))
FROM OrderedAdds
LEFT JOIN OrderedRemoves
ON OrderedAdds.TransID = OrderedRemoves.TransID
AND OrderedAdds.RowNumber = OrderedRemoves.RowNumber;
The key part is that each record gets a rownumber based on the transaction id and the date it was entered, you can then join on both rownumber and transID to stop any cross joining.
Example on SQL Fiddle
DISCLAIMER: There are probably problems with this, but I hope it points you in one possible direction. Expect to have to adapt it.
You can try something along the following lines (which might work in some form, depending on your system, version, etc.):
SELECT transId, (SUM(remove_date_sum) - SUM(add_date_sum)) / (60*60*24) AS days_in_queue
FROM
(
SELECT transId, SUM(UNIX_TIMESTAMP(add_date)) AS add_date_sum, 0 AS remove_date_sum
FROM add_to_queue
GROUP BY transId
UNION ALL
SELECT transId, 0 AS add_date_sum, SUM(UNIX_TIMESTAMP(remove_date)) AS remove_date_sum
FROM remove_from_queue
GROUP BY transId
) AS t
GROUP BY transId;
A bit of explanation: as far as I know, you cannot sum dates, but you can convert them to some sort of timestamp. Check whether UNIX_TIMESTAMP works for you (it is a MySQL function), or figure out something else. Then you can sum within each table, build the union by conveniently leaving the other column as zero, and subtract the two sums in the outer query.
As for the division at the end of the first SELECT: UNIX_TIMESTAMP returns seconds, so you divide by 60*60*24 to get days, or by whatever unit you want. Note that the subtraction only makes sense when every add has a matching remove.
This all said - I would probably solve this using a stored procedure or some client script. SQL is not a weapon for every battle. Making two separate queries can be much simpler.
Answer 2: after your comments. (As a side note, some of your dates, such as 15/1/2013 and 13/1/2013, are not in a proper date format.)
select transId, sum(numberOfDays) totalQueueTime
from (
select a.transId,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
) X
group by transId
Answer 1: before your comments
This assumes that a new record won't be added unless the previous one has been removed. Also note that the following query returns numberOfDays as zero for unremoved records:
select a.transId, a.addDate, r.removeDate,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
order by a.transId, a.addDate, r.removeDate

Calculating information by using values from previous line

I have the current balance for each account and I need to subtract the netamount for transactions to create the previous month's end balance for the past 24 months. Below is a sample dataset;
create table txn_by_month (
memberid varchar(15)
,accountid varchar(15)
,effective_year varchar(4)
,effective_month varchar(2)
,balance money
,netamt money
,prev_mnthendbal money)
insert into txn_by_month values
(10001,111222333,2012,12,634.15,-500,1134.15)
,(10001,111222333,2012,11,NULL,-1436,NULL)
,(10001,111222333,2012,10,NULL,600,NULL)
,(10002,111333444,2012,12,1544.20,1650,-105.80)
,(10002,111333444,2012,11,NULL,1210,NULL)
,(10002,111333444,2012,10,NULL,-622,NULL)
,(10003,111456456,2012,01,125000,1200,123800)
,(10003,111456456,2011,12,NULL,1350,NULL)
,(10003,111456456,2011,11,NULL,-102,NULL)
As you can see I already have a table of all the transactions for each month totaled up. I just need to calculate the previous month end balance on the first line and bring it down to the second, third line etc. I have been trying to use CTEs, but am not overly familiar with them and seem to be stuck at the moment. This is what I have;
;
WITH CTEtest AS
(SELECT ROW_NUMBER() OVER (PARTITION BY memberid order by(accountid)) AS Sequence
,memberid
,accountid
,prev_mnthendbal
,netamt
FROM txn_by_month)
select c1.memberid
,c1.accountid
,c1.sequence
,c2.prev_mnthendbal as prev_mnthendbal
,c1.netamt,
COALESCE(c2.prev_mnthendbal, 0) - COALESCE(c1.netamt, 0) AS cur_mnthendbal
FROM CTEtest AS c1
LEFT OUTER JOIN CTEtest AS c2
ON c1.memberid = c2.memberid
and c1.accountid = c2.accountid
and c1.Sequence = c2.Sequence + 1
This is working only for the sequence = 2. I know that my issue is that I need to bring my cur_mnthendbal value down into the next line, but I can't seem to wrap my head around how. Do I need another CTE?
Any help would be greatly appreciated!
EDIT: Maybe I need to explain it better.... If I have this;
The balance for line 2 would be the prev_mnthendbal from line 1 ($1,134.15). Then the prev_mnthendbal from line 2 would be the balance - netamt ($1,134.15 - (-$1,436) = $2,570.15). I have been trying to use CTEs, but I can't seem to figure out how to populate the balance field with the prev_mnthendbal from the previous line (since it isn't calculated until the balance is available). Maybe I can't use CTE? Do I need to use cursor?
Turns out that I needed to combine a running total with the sequential CTE I was using to begin with.
;
with CTEtest AS
(SELECT ROW_NUMBER() OVER (PARTITION BY memberid order by effective_year desc, effective_month desc) AS Sequence, *
FROM txn_by_month)
,test
as (select * , balance - netamt as running_sum from CTEtest where sequence = 1
union all
select t.*, t1.running_sum - t.netamt from CTEtest t inner join test t1
on t.memberid = t1.memberid and t.sequence = t1.Sequence+1 where t.sequence > 1)
select * from test
order by memberid, Sequence
Hopefully this will help someone else in the future.
See LEAD/LAG analytic functions.
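LAG on its own only reaches a stored value from the previous row, whereas here each row depends on a value computed for the row before it. One way around that, assuming SQL Server 2012+, is to notice that every prev_mnthendbal equals the single known balance minus the running sum of netamt, newest month first. The sketch below uses only the 10001 rows from the question, supplied through an inline VALUES list with numeric year and month columns (an assumption; the question stores them as varchar).

```sql
-- Sketch only: txn stands in for txn_by_month, limited to member 10001.
WITH txn (memberid, accountid, effective_year, effective_month, balance, netamt) AS
(
    SELECT * FROM (VALUES
        ('10001', '111222333', 2012, 12, 634.15,  -500),
        ('10001', '111222333', 2012, 11, NULL,   -1436),
        ('10001', '111222333', 2012, 10, NULL,     600)
    ) AS v (memberid, accountid, effective_year, effective_month, balance, netamt)
)
SELECT memberid,
       accountid,
       effective_year,
       effective_month,
       netamt,
       -- the only known balance sits on the newest row of each account
       FIRST_VALUE(balance) OVER (PARTITION BY memberid, accountid
                                  ORDER BY effective_year DESC, effective_month DESC)
       -- subtract every netamt from the newest month back to this row
     - SUM(netamt) OVER (PARTITION BY memberid, accountid
                         ORDER BY effective_year DESC, effective_month DESC
                         ROWS UNBOUNDED PRECEDING) AS prev_mnthendbal
FROM txn
ORDER BY memberid, accountid, effective_year DESC, effective_month DESC;
-- 2012-12: 1134.15, 2012-11: 2570.15, 2012-10: 1970.15
```

The 2012-11 value matches the $2,570.15 worked out by hand in the question's edit.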

Multiple Running Totals with Group By

I am struggling to find a good way to run running totals with a group by in it, or the equivalent. The below cursor based running total works on a complete table, but I would like to expand this to add a "Client" dimension. So I would get running totals as the below creates but for each company (ie Company A, Company B, Company C, etc.) in one table
CREATE TABLE test (tag int, Checks float, AVG_COST float, Check_total float, Check_amount float, Amount_total float, RunningTotal_Check float,
RunningTotal_Amount float)
DECLARE @tag int,
@Checks float,
@AVG_COST float,
@check_total float,
@Check_amount float,
@amount_total float,
@RunningTotal_Check float ,
@RunningTotal_Check_PCT float,
@RunningTotal_Amount float
SET @RunningTotal_Check = 0
SET @RunningTotal_Check_PCT = 0
SET @RunningTotal_Amount = 0
DECLARE aa_cursor CURSOR fast_forward
FOR
SELECT tag, Checks, AVG_COST, check_total, check_amount, amount_total
FROM test_3
OPEN aa_cursor
FETCH NEXT FROM aa_cursor INTO @tag, @Checks, @AVG_COST, @check_total, @Check_amount, @amount_total
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RunningTotal_Check = @RunningTotal_Check + @Checks
SET @RunningTotal_Amount = @RunningTotal_Amount + @Check_amount
INSERT test VALUES (@tag, @Checks, @AVG_COST, @check_total, @Check_amount, @amount_total, @RunningTotal_Check, @RunningTotal_Amount )
FETCH NEXT FROM aa_cursor INTO @tag, @Checks, @AVG_COST, @check_total, @Check_amount, @amount_total
END
CLOSE aa_cursor
DEALLOCATE aa_cursor
SELECT *, RunningTotal_Check/Check_total as CHECK_RUN_PCT, round((RunningTotal_Check/Check_total *100),0) as CHECK_PCT_BIN, RunningTotal_Amount/Amount_total as Amount_RUN_PCT, round((RunningTotal_Amount/Amount_total * 100),0) as Amount_PCT_BIN
into test_4
FROM test ORDER BY tag
create clustered index IX_TESTsdsdds3 on test_4(tag)
DROP TABLE test
----------------------------------
I can get the running total for any one company, but I would like to do it for several, producing something like the results below.
CLIENT COUNT Running Total
Company A 1 6.7%
Company A 2 20.0%
Company A 3 40.0%
Company A 4 66.7%
Company A 5 100.0%
Company B 1 3.6%
Company B 2 10.7%
Company B 3 21.4%
Company B 4 35.7%
Company B 5 53.6%
Company B 6 75.0%
Company B 7 100.0%
Company C 1 3.6%
Company C 2 10.7%
Company C 3 21.4%
Company C 4 35.7%
Company C 5 53.6%
Company C 6 75.0%
Company C 7 100.0%
This is finally simple to do in SQL Server 2012, where SUM and COUNT support OVER clauses that contain ORDER BY. Using Cris's #Checks table definition:
SELECT
CompanyID,
count(*) over (
partition by CompanyID
order by Cleared, ID
) as cnt,
str(100.0*sum(Amount) over (
partition by CompanyID
order by Cleared, ID
)/
sum(Amount) over (
partition by CompanyID
),5,1)+'%' as RunningTotalForThisCompany
FROM #Checks;
SQL Fiddle here.
I originally started posting the SQL Server 2012 equivalent (since you didn't mention what version you were using). Steve has done a great job of showing the simplicity of this calculation in the newest version of SQL Server, so I'll focus on a few methods that work on earlier versions of SQL Server (back to 2005).
I'm going to take some liberties with your schema, since I can't figure out what all these #test and #test_3 and #test_4 temporary tables are supposed to represent. How about:
USE tempdb;
GO
CREATE TABLE dbo.Checks
(
Client VARCHAR(32),
CheckDate DATETIME,
Amount DECIMAL(12,2)
);
INSERT dbo.Checks(Client, CheckDate, Amount)
SELECT 'Company A', '20120101', 50
UNION ALL SELECT 'Company A', '20120102', 75
UNION ALL SELECT 'Company A', '20120103', 120
UNION ALL SELECT 'Company A', '20120104', 40
UNION ALL SELECT 'Company B', '20120101', 75
UNION ALL SELECT 'Company B', '20120105', 200
UNION ALL SELECT 'Company B', '20120107', 90;
Expected output in this case:
Client Count Running Total
--------- ----- -------------
Company A 1 17.54
Company A 2 43.86
Company A 3 85.96
Company A 4 100.00
Company B 1 20.55
Company B 2 75.34
Company B 3 100.00
One way:
;WITH gt(Client, Totals) AS
(
SELECT Client, SUM(Amount)
FROM dbo.Checks AS c
GROUP BY Client
), n (Client, Amount, rn) AS
(
SELECT c.Client, c.Amount,
ROW_NUMBER() OVER (PARTITION BY c.Client ORDER BY c.CheckDate)
FROM dbo.Checks AS c
)
SELECT n.Client, [Count] = n.rn,
[Running Total] = CONVERT(DECIMAL(5,2), 100.0*(
SELECT SUM(Amount) FROM n AS n2
WHERE Client = n.Client AND rn <= n.rn)/gt.Totals
)
FROM n INNER JOIN gt ON n.Client = gt.Client
ORDER BY n.Client, n.rn;
A slightly faster alternative - more reads but shorter duration and simpler plan:
;WITH x(Client, CheckDate, rn, rt, gt) AS
(
SELECT Client, CheckDate, rn = ROW_NUMBER() OVER
(PARTITION BY Client ORDER BY CheckDate),
(SELECT SUM(Amount) FROM dbo.Checks WHERE Client = c.Client
AND CheckDate <= c.CheckDate),
(SELECT SUM(Amount) FROM dbo.Checks WHERE Client = c.Client)
FROM dbo.Checks AS c
)
SELECT Client, [Count] = rn,
[Running Total] = CONVERT(DECIMAL(5,2), rt * 100.0/gt)
FROM x
ORDER BY Client, [Count];
While I've offered set-based alternatives here, in my experience a cursor is often the fastest supported way to perform running totals. There are other methods, such as the quirky update, which perform marginally faster, but the result is not guaranteed. The set-based approach where you perform a self-join becomes more and more expensive as the source row count goes up, so what seems to perform okay in testing with a small table degrades as the table gets larger.
I have a blog post almost fully prepared that goes through a slightly simpler performance comparison of various running totals approaches. It is simpler because it is not grouped and it only shows the totals, not the running total percentage. I hope to publish this post soon and will try to remember to update this space.
There is also another alternative to consider that doesn't require reading previous rows multiple times. It's a concept Hugo Kornelis describes as "set-based iteration." I don't recall where I first learned this technique, but it makes a lot of sense in some scenarios.
DECLARE @c TABLE
(
Client VARCHAR(32),
CheckDate DATETIME,
Amount DECIMAL(12,2),
rn INT,
rt DECIMAL(15,2)
);
INSERT @c SELECT Client, CheckDate, Amount,
ROW_NUMBER() OVER (PARTITION BY Client
ORDER BY CheckDate), 0
FROM dbo.Checks;
DECLARE @i INT, @m INT;
SELECT @i = 2, @m = MAX(rn) FROM @c;
UPDATE @c SET rt = Amount WHERE rn = 1;
WHILE @i <= @m
BEGIN
UPDATE c SET c.rt = c2.rt + c.Amount
FROM @c AS c
INNER JOIN @c AS c2
ON c.rn = c2.rn + 1
AND c.Client = c2.Client
WHERE c.rn = @i;
SET @i = @i + 1;
END
SELECT Client, [Count] = rn, [Running Total] = CONVERT(
DECIMAL(5,2), rt*100.0 / (SELECT TOP 1 rt FROM @c
WHERE Client = c.Client ORDER BY rn DESC)) FROM @c AS c;
While this does perform a loop, and everyone tells you that loops and cursors are bad, one gain with this method is that once the previous row's running total has been calculated, we only have to look at the previous row instead of summing all prior rows. The other gain is that in most cursor-based solutions you have to go through each client and then each check. In this case, you go through all clients' 1st checks once, then all clients' 2nd checks once. So instead of (client count * avg check count) iterations, we only do (max check count) iterations. This solution doesn't make much sense for the simple running totals example, but for the grouped running totals example it should be tested against the set-based solutions above. Not a chance it will beat Steve's approach, though, if you are on SQL Server 2012.
UPDATE
I've blogged about various running totals approaches here:
http://www.sqlperformance.com/2012/07/t-sql-queries/running-totals
I didn't exactly understand the schema you were pulling from, but here is a quick query using a temp table that shows how to do a running total in a set based operation.
CREATE TABLE #Checks
(
ID int IDENTITY(1,1) PRIMARY KEY
,CompanyID int NOT NULL
,Amount float NOT NULL
,Cleared datetime NOT NULL
)
INSERT INTO #Checks
VALUES
(1,5,'4/1/12')
,(1,5,'4/2/12')
,(1,7,'4/5/12')
,(2,10,'4/3/12')
SELECT Info.ID, Info.CompanyID, Info.Amount, RunningTotal.Total, Info.Cleared
FROM
(
SELECT main.ID, SUM(other.Amount) as Total
FROM
#Checks main
JOIN
#Checks other
ON
main.CompanyID = other.CompanyID
AND
main.Cleared >= other.Cleared
GROUP BY
main.ID) RunningTotal
JOIN
#Checks Info
ON
RunningTotal.ID = Info.ID
DROP TABLE #Checks