Calculating information by using values from previous line - sql

I have the current balance for each account and I need to subtract the netamount for transactions to create the previous month's end balance for the past 24 months. Below is a sample dataset;
create table txn_by_month (
memberid varchar(15)
,accountid varchar(15)
,effective_year varchar(4)
,effective_month varchar(2)
,balance money
,netamt money
,prev_mnthendbal money)
insert into txn_by_month values
(10001,111222333,2012,12,634.15,-500,1134.15)
,(10001,111222333,2012,11,NULL,-1436,NULL)
,(10001,111222333,2012,10,NULL,600,NULL)
,(10002,111333444,2012,12,1544.20,1650,-105.80)
,(10002,111333444,2012,11,NULL,1210,NULL)
,(10002,111333444,2012,10,NULL,-622,NULL)
,(10003,111456456,2012,01,125000,1200,123800)
,(10003,111456456,2011,12,NULL,1350,NULL)
,(10003,111456456,2011,11,NULL,-102,NULL)
As you can see I already have a table of all the transactions for each month totaled up. I just need to calculate the previous month end balance on the first line and bring it down to the second, third line etc. I have been trying to use CTEs, but am not overly familiar with them and seem to be stuck at the moment. This is what I have;
;
WITH CTEtest AS
(SELECT ROW_NUMBER() OVER (PARTITION BY memberid order by(accountid)) AS Sequence
,memberid
,accountid
,prev_mnthendbal
,netamt
FROM txn_by_month)
select c1.memberid
,c1.accountid
,c1.sequence
,c2.prev_mnthendbal as prev_mnthendbal
,c1.netamt,
COALESCE(c2.prev_mnthendbal, 0) - COALESCE(c1.netamt, 0) AS cur_mnthendbal
FROM CTEtest AS c1
LEFT OUTER JOIN CTEtest AS c2
ON c1.memberid = c2.memberid
and c1.accountid = c2.accountid
and c1.Sequence = c2.Sequence + 1
This is working only for the sequence = 2. I know that my issue is that I need to bring my cur_mnthendbal value down into the next line, but I can't seem to wrap my head around how. Do I need another CTE?
Any help would be greatly appreciated!
EDIT: Maybe I need to explain it better.... If I have this;
The balance for line 2 would be the prev_mnthendbal from line 1 ($1,134.15). Then the prev_mnthendbal from line 2 would be the balance - netamt ($1,134.15 - (-$1,436) = $2,570.15). I have been trying to use CTEs, but I can't seem to figure out how to populate the balance field with the prev_mnthendbal from the previous line (since it isn't calculated until the balance is available). Maybe I can't use CTE? Do I need to use cursor?

Turns out that I needed to combine a running total with the sequential CTE I was using to begin with.
;
with CTEtest AS
(SELECT ROW_NUMBER() OVER (PARTITION BY memberid order by effective year, effective month desc) AS Sequence, *
FROM txn_by_month)
,test
as (select * , balance - netamt as running_sum from CTEtest where sequence = 1
union all
select t.*, t1.running_sum - t.netamt from CTEtest t inner join test t1
on t.memberid = t1.memberid and t.sequence = t1.Sequence+1 where t.sequence > 1)
select * from test
order by memberid, Sequence
Hopefully this will help someone else in the future.

See LEAD/LAG analytic functions.

Related

Percentage difference between numbers in two columns

My SQL experience is fairly minimal so please go easy on me here. I have a table tblForEx and I'm trying to create a query that looks at one particular column LastSalesRateChangeDate and also ForExRate.
Basically what I want to do is for the query to check that LastSalesRateChangeDate and then pull the ForExRate that is on the same line (obviously in the ForExRate column), then I need to check to see if there is a +/- 5% change since the last time the LastSalesRateChangeDate changed. I hope this makes sense, I tried to explain it as clearly as possible.
I believe I would need to create a 'subquery' to look at the LastSalesRateChangeDate and pull the ForEx rate from that date, but I just don't know how to go about this.
I should add this is being done in Access (SQL)
Sample data, here is what the table looks like:
| BaseCur | ForCur | ForExRate | LastSalesRateChangeDate
| USD | BRL | 1.718 | 12/9/2008
| USD | BRL | 1.65 | 11/8/2008
So I would need a query to look at the LastSalesRateChangeDate column, check to see if the date has changed, if so take the ForExRate value and then give a percentage difference of that ForExRate value since the last record.
So the final result would likely look like
"BaseCur" "ForCur" "Percentage Change since Last Sales Rate Change"
USD BRL X%
Gordon's answer pointed in the right direction:
SELECT t2.*, (SELECT top 1 t.ForExRate
FROM tblForEx t
where t.BaseCur=t2.BaseCur AND t.ForCur=t2.ForCur and t.LastSalesRateChangeDate<t2.LastSalesRateChangeDate
order by t.LastSalesRateChangeDate DESC, t.ForExRate DESC
) AS PreviousRate, [ForExRate]/[PreviousRate]-1 AS ChangeRatio
FROM tblForEx AS t2;
Access gives errors where the TOP 1 in the subquery causes "ties". We broke the ties and therefore removed the error by adding an extra item to the ORDER BY clause. To get the ratio to display as a percentage, switch to the design view and change the properties of that column accordingly.
If I understand correctly, you want the previous value. In MS Access, you can use a correlated subquery:
select t.*,
(select top (1) t2.LastSalesRateChangeDate
from tblForEx as t2
where t2.BaseCur = t.BaseCur and t2.ForCur = t.ForCur
t2.LastSalesRateChangeDate < t.LastSalesRateChangeDate
order by t2.LastSalesRateChangeDate desc
) as prev_LastSalesRateChangeDate
from t;
Now, with this as a subquery, you can get the previous exchange rate using a join:
select t.*, ( (t.ForExRate / tprev.ForExRate) - 1) as change_ratio
from (select t.*,
(select top (1) t2.LastSalesRateChangeDate
from tblForEx as t2
where t2.BaseCur = t.BaseCur and t2.ForCur = t.ForCur
t2.LastSalesRateChangeDate < t.LastSalesRateChangeDate
order by t2.LastSalesRateChangeDate desc
) as prev_LastSalesRateChangeDate
from t
) as t inner join
tblForEx as tprev
on tprev.BaseCur = t.BaseCur and tprev.ForCur = t.ForCur
tprev.LastSalesRateChangeDate = t.prev_LastSalesRateChangeDate;
As per my understanding, you can use LEAD function to get last changed date Rate in a new column by using below query:
WITH CTE AS (
SELECT *, LEAD(ForExRate, 1) OVER(PARTITION BY BaseCur, ForCur ORDER BY LastChangeDate DESC) LastValue
FROM #TT
)
SELECT BaseCur, ForCur, ForExRate, LastChangeDate , CAST( ((ForExRate - ISNULL(LastValue, 0))/LastValue)*100 AS float)
FROM CTE
Problem here is:
for every last row in group by you will have new calculalted column which we have made using LEAD function.
If there is only a single row for a particular BaseCur and ForCur, then also you will have NULL in column.
Resolution:
If you are sure that there will be at least two rows for each BaseCur and ForCur, then you can use WHERE clause to remove NULL values in final result.
WITH CTE AS (
SELECT *, LEAD(ForExRate, 1) OVER(PARTITION BY BaseCur, ForCur ORDER BY LastChangeDate DESC) LastValue
FROM #TT
)
SELECT BaseCur, ForCur, ForExRate, LastChangeDate , CAST( ((ForExRate - ISNULL(LastValue, 0))/LastValue)*100 AS float) Percentage
FROM CTE
WHERE LastValue IS NOT NULL
SELECT basetbl.BaseCur, basetbl.ForCur, basetbl.NewDate, basetbl.OldDate, num2.ForExRate/num1.ForExRate*100 AS PercentChange FROM
(((SELECT t.BaseCur, t.ForCur, MAX(t.LastSalesRateChangeDate) AS NewDate, summary.Last_Date AS OldDate
FROM (tblForEx AS t
LEFT JOIN (SELECT TOP 2 BaseCur, ForCur, MAX(LastSalesRateChangeDate) AS Last_Date FROM tblForEx AS t1
WHERE LastSalesRateChangeDate <>
(SELECT MAX(LastSalesRateChangeDate) FROM tblForEx t2 WHERE t2.BaseCur = t1.BaseCur AND t2.ForCur = t1.ForCur)
GROUP BY BaseCur, ForCur) AS summary
ON summary.ForCur = t.ForCur AND summary.BaseCur = t.BaseCur)
GROUP BY t.BaseCur, t.ForCur, summary.Last_Date) basetbl
LEFT JOIN tblForEx num1 ON num1.BaseCur=basetbl.BaseCur AND num1.ForCur = basetbl.ForCur AND num1.LastSalesRateChangeDate = basetbl.OldDate))
LEFT JOIN tblForEx num2 ON num2.BaseCur=basetbl.BaseCur AND num2.ForCur = basetbl.ForCur AND num2.LastSalesRateChangeDate = basetbl.NewDate;
This uses a series of subqueries. First, you are selecting the most recent date for the BaseCur and ForCur. Then, you are joining onto that the previous date. I do that by using another subquery to select the top two dates, and exclude the one that is equal to the previously established most recent date. This is the "summary" subquery.
Then, you get the BaseCur, ForCur, NewDate, and OldDate in the "basetbl" subquery. After that, it is two simple joins of the original table back onto those dates to get the rate that was applicable then.
Finally, you are selecting your BaseCur, ForCur, and whatever formula you want to use to calculate the rate change. I used a simple ratio in that one, but it is easy to change. You can remove the dates in the first line if you want, they are there solely as a reference point.
It doesn't look pretty, but complicated Access SQL queries never do.

sql lowest running balance in a group

I've been trying for days to solve this problem to no solution.
I want to get the lowest running balance in a group.
Here is a sample data
The running balance is imaginary and is not part of the table.
the running balance is also computed dynamically.
the problem is I want to get the lowest running balance in a Specific month (January)
so the output should be 150 for memberid 10001 and 175 for memberid 10002 as highlighted in the image.
my desired out put should be
memberid | balance
10001 | 150
10002 | 175
Is that possible using sql query only?
PS. Using c# to compute lowest running balance is very slow since I have more than 600,000 records in my table.
I've updated the question.
The answer provided by Mihir Shah gave me the idea how solve my problem.
His answer takes to much time to process making it as slow as my computation on my c# program because his code loops on every record.
Here is my answer to get the minimum lowest value in a specific group (specific month) with a running value or running total without sacrificing a lot of performance.
with IniMonth1 as
(
select a.memberid, a.iniDeposit, a.iniWithdrawal,
(cast(a.iniDeposit as decimal(10,2)) - cast(a.iniWithdrawal as decimal(10,2))) as RunningTotal
from
(
select b.memberid, sum(b.depositamt) as iniDeposit, sum(b.withdrawalamt) as iniWithdrawal
from savings b
where trdate < '01/01/2016'
group by b.memberid
) a /*gets all the memberid, sum of deposit amount and withdrawal amt from the beginning of the savings before the specific month */
where cast(a.iniDeposit as decimal(10,2)) - cast(a.iniWithdrawal as decimal(10,2)) > 0 /*filters zero savings */
)
,DetailMonth1 as
(
select a.memberid, a.depositamt,a.withdrawalamt,
(cast(a.depositamt as decimal(10,2)) - cast(a.withdrawalamt as decimal(10,2))) as totalBal,
Row_Number() Over(Partition By a.memberid Order By a.trdate Asc) RowID
from savings a
where
a.trdate >= '01/01/2016'
and
a.trdate <= '01/31/2016'
and (a.depositamt<>0 or a.withdrawalamt<>0)
) /* gets all record within the specific month and gives a no of row as an id for the running value in the next procedure*/
,ComputedDetailMonth1 as
(
select a.memberid, min(a.runningbalance) as MinRunningBal
from
(
select a.rowid, a.memberid, a.totalbal,
(
sum(b.totalbal) +
(case
when c.runningtotal is null then 0
else c.runningtotal
end)
)as runningbalance , c.runningtotal as oldbalance
from DetailMonth1 a
inner join DetailMonth1 b
on b.rowid<=a.rowid
and a.memberid=b.memberid
left join IniMonth1 c
on a.memberid=c.memberid
group by a.rowid,a.memberid,a.totalbal,c.runningtotal
) a
group by a.memberid
) /* the loop is only for the records of the specific month only making it much faster */
/* this gets the running balance of specific month ONLY and ADD the sum total of IniMonth1 using join to get the running balance from the beginning of savings to the specific month */
/* I then get the minimum of the output using the min function*/
, OldBalanceWithNoNewSavingsMonth1 as
(
select a.memberid,a.RunningTotal
from
IniMonth1 a
left join
DetailMonth1 b
on a.memberid = b.memberid
where b.totalbal is null
)/*this gets all the savings that is not zero and has no transaction in the specific month making and this will become the default value as the lowest value if the member has no transaction in the specific month. */
,finalComputedMonth1 as
(
select a.memberid,a.runningTotal as MinRunTotal from OldBalanceWithNoNewSavingsMonth1 a
union
select b.memberid,b.MinRunningBal from ComputedDetailMonth1 b
)/*the record with minimum running total with clients that has a transaction in the specific month Unions with the members with no current transaction in the specific month*/
select * from finalComputedMonth1 order by memberid /* display the final output */
I have more than 600k savings record on my savings table
Surprisingly the performance of this code is very efficient.
It takes almost 2hr using my c# program to manually compute every record of all the members.
This code makes only 2 secs and at most 9 secs just to compute everything.
i Just display to c# for another 2secs.
The output of this code was tested and compared with my computation using my c# program.
May be below one is help you
Set Nocount On;
Declare #CashFlow Table
(
savingsid Varchar(50)
,memberid Int
,trdate Date
,deposit Decimal(18,2)
,withdrawal Decimal(18,2)
)
Insert Into #CashFlow(savingsid,memberid,trdate,deposit,withdrawal) Values
('10001-0002',10001,'01/01/2015',1000,0)
,('10001-0003',10001,'01/07/2015',25,0)
,('10001-0004',10001,'01/13/2015',25,0)
,('10001-0005',10001,'01/19/2015',0,900)
,('10001-0006',10001,'01/25/2015',25,0)
,('10001-0007',10001,'01/31/2015',25,0)
,('10001-0008',10001,'02/06/2015',25,0)
,('10001-0009',10001,'02/12/2015',25,0)
,('10001-0010',10001,'02/18/2015',0,200)
,('10002-0001',10002,'01/01/2015',500,0)
,('10002-0002',10002,'01/07/2015',25,0)
,('10002-0003',10002,'01/13/2015',0,200)
,('10002-0004',10002,'01/19/2015',25,0)
,('10002-0005',10002,'01/25/2015',25,0)
,('10002-0006',10002,'01/31/2015',0,200)
,('10002-0007',10002,'02/06/2015',25,0)
,('10002-0008',10002,'02/12/2015',25,0)
,('10002-0009',10002,'02/12/2015',0,200)
;With TrialBalance As
(
Select Row_Number() Over(Partition By cf.memberid Order By cf.trdate Asc) RowNum
,cf.memberid
,cf.deposit
,cf.withdrawal
,cf.trdate
From #CashFlow As cf
)
,RunningBalance As
(
Select tb.RowNum
,tb.memberid
,tb.deposit
,tb.withdrawal
,tb.trdate
From TrialBalance As tb
Where tb.RowNum = 1
Union All
Select tb.RowNum
,rb.memberid
,Cast((rb.deposit + tb.deposit - tb.withdrawal) As Decimal(18,2))
,rb.withdrawal
,tb.trdate
From TrialBalance As tb
Join RunningBalance As rb On tb.RowNum = (rb.Rownum + 1) And tb.memberid = rb.memberid
)
Select rb.memberid
,Min(rb.deposit) As runningBalance
From RunningBalance As rb
Where Year(rb.trdate) = 2015
And Month(rb.trdate) = 1
Group By rb.memberid

Calculating current consecutive days from a table

I have what seems to be a common business request but I can't find no clear solution. I have a daily report (amongst many) that gets generated based on failed criteria and gets saved to a table. Each report has a type id tied to it to signify which report it is, and there is an import event id that signifies the day the imports came in (a date column is added for extra clarification). I've added a sqlfiddle to see the basic schema of the table (renamed for privacy issues).
http://www.sqlfiddle.com/#!3/81945/8
All reports currently generated are working fine, so nothing needs to be modified on the table. However, for one report (type 11), not only I need pull the invoices that showed up today, I also need to add one column that totals the amount of consecutive days from date of run for that invoice (including current day). The result should look like the following, based on the schema provided:
INVOICE MESSAGE EVENT_DATE CONSECUTIVE_DAYS_ON_REPORT
12345 Yes July, 30 2013 6
54355 Yes July, 30 2013 2
644644 Yes July, 30 2013 4
I only need the latest consecutive days, not any other set that may show up. I've tried to run self joins to no avail, and my last attempt is also listed as part of the sqlfiddle file, to no avail. Any suggestions or ideas? I'm quite stuck at the moment.
FYI: I am working in SQL Server 2000! I have seen a lot of neat tricks that have come out in 2005 and 2008, but I can't access them.
Your help is greatly appreciated!
Something like this? http://www.sqlfiddle.com/#!3/81945/14
SELECT
[final].*,
[last].total_rows
FROM
tblEventInfo AS [final]
INNER JOIN
(
SELECT
[first_of_last].type_id,
[first_of_last].invoice,
MAX([all_of_last].event_date) AS event_date,
COUNT(*) AS total_rows
FROM
(
SELECT
[current].type_id,
[current].invoice,
MAX([current].event_date) AS event_date
FROM
tblEventInfo AS [current]
LEFT JOIN
tblEventInfo AS [previous]
ON [previous].type_id = [current].type_id
AND [previous].invoice = [current].invoice
AND [previous].event_date = [current].event_date-1
WHERE
[current].type_id = 11
AND [previous].type_id IS NULL
GROUP BY
[current].type_id,
[current].invoice
)
AS [first_of_last]
INNER JOIN
tblEventInfo AS [all_of_last]
ON [all_of_last].type_id = [first_of_last].type_id
AND [all_of_last].invoice = [first_of_last].invoice
AND [all_of_last].event_date >= [first_of_last].event_date
GROUP BY
[first_of_last].type_id,
[first_of_last].invoice
)
AS [last]
ON [last].type_id = [final].type_id
AND [last].invoice = [final].invoice
AND [last].event_date = [final].event_date
The inner most query looks up the starting record of the last block of consecutive records.
Then that joins on to all the records in that block of consecutive records, giving the final date and the count of rows (consecutive days).
Then that joins on to the row for the last day to get the message, etc.
Make sure that in reality you have an index on (type_id, invoice, event_date).
You have multiple problems. Tackle them separately and build up.
Problems:
1) Identifying consecutive ranges: subtract the row_number from the range column and group by the result
2) No ROW_NUMBER() functions in SQL 2000: Fake it with a correlated subquery.
3) You actually want DENSE_RANK() instead of ROW_NUMBER: Make a list of unique dates first.
Solutions:
3)
SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date
2)
SELECT t2.invoice,t2.event_date,t2.id,
DATEDIFF(day,(SELECT COUNT(DISTINCT event_date) FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t1 WHERE t2.invoice = t1.invoice AND t2.event_date > t1.event_date),t2.event_date) grp
FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t2
ORDER BY invoice,grp,event_date
1)
SELECT
t3.invoice AS INVOICE,
MAX(t3.event_date) AS EVENT_DATE,
COUNT(t3.event_date) AS CONSECUTIVE_DAYS_ON_REPORT
FROM (
SELECT t2.invoice,t2.event_date,t2.id,
DATEDIFF(day,(SELECT COUNT(DISTINCT event_date) FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t1 WHERE t2.invoice = t1.invoice AND t2.id > t1.id),t2.event_date) grp
FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t2
) t3
GROUP BY t3.invoice,t3.grp
The rest of your question is a little ambiguous. If two ranges are of equal length, do you want both or just the most recent? Should the output MESSAGE be 'Yes' if any message = 'Yes' or only if the most recent message = 'Yes'?
This should give you enough of a breadcrumb though
I had a similar requirement not long ago getting a "Top 5" ranking with a consecutive number of periods in Top 5. The only solution I found was to do it in a cursor. The cursor has a date = #daybefore and inside the cursor if your data does not match quit the loop, otherwise set #daybefore = datediff(dd, -1, #daybefore).
Let me know if you want an example. There just seem to be a large number of enthusiasts, who hit downvote when they see the word "cursor" even if they don't have a better solution...
Here, try a scalar function like this:
CREATE FUNCTION ConsequtiveDays
(
#invoice bigint, #date datetime
)
RETURNS int
AS
BEGIN
DECLARE #ct int = 0, #Count_Date datetime, #Last_Date datetime
SELECT #Last_Date = #date
DECLARE counter CURSOR LOCAL FAST_FORWARD
FOR
SELECT event_date FROM tblEventInfo
WHERE invoice = #invoice
ORDER BY event_date DESC
FETCH NEXT FROM counter
INTO #Count_Date
WHILE ##FETCH_STATUS = 0 AND DATEDIFF(dd,#Last_Date,#Count_Date) < 2
BEGIN
#ct = #ct + 1
END
CLOSE counter
DEALLOCATE counter
RETURN #ct
END
GO

Datediff between two tables

I have those two tables
1-Add to queue table
TransID , ADD date
10 , 10/10/2012
11 , 14/10/2012
11 , 18/11/2012
11 , 25/12/2012
12 , 1/1/2013
2-Removed from queue table
TransID , Removed Date
10 , 15/1/2013
11 , 12/12/2012
11 , 13/1/2013
11 , 20/1/2013
The TansID is the key between the two tables , and I can't modify those tables, what I want is to query the amount of time each transaction spent in the queue
It's easy when there is one item in each table , but when the item get queued more than once how do I calculate that?
Assuming the order TransIDs are entered into the Add table is the same order they are removed, you can use the following:
WITH OrderedAdds AS
( SELECT TransID,
AddDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY AddDate)
FROM AddTable
), OrderedRemoves AS
( SELECT TransID,
RemovedDate,
[RowNumber] = ROW_NUMBER() OVER(PARTITION BY TransID ORDER BY RemovedDate)
FROM RemoveTable
)
SELECT OrderedAdds.TransID,
OrderedAdds.AddDate,
OrderedRemoves.RemovedDate,
[DaysInQueue] = DATEDIFF(DAY, OrderedAdds.AddDate, ISNULL(OrderedRemoves.RemovedDate, CURRENT_TIMESTAMP))
FROM OrderedAdds
LEFT JOIN OrderedRemoves
ON OrderedAdds.TransID = OrderedRemoves.TransID
AND OrderedAdds.RowNumber = OrderedRemoves.RowNumber;
The key part is that each record gets a rownumber based on the transaction id and the date it was entered, you can then join on both rownumber and transID to stop any cross joining.
Example on SQL Fiddle
DISCLAIMER: There is probably problem with this, but i hope to send you in one possible direction. Make sure to expect problems.
You can try in the following direction (which might work in some way depending on your system, version, etc) :
SELECT transId, (sum(add_date_sum) - sum(remove_date_sum)) / (1000*60*60*24)
FROM
(
SELECT transId, (SUM(UNIX_TIMESTAMP(add_date)) as add_date_sum, 0 as remove_date_sum
FROM add_to_queue
GROUP BY transId
UNION ALL
SELECT transId, 0 as add_date_sum, (SUM(UNIX_TIMESTAMP(remove_date)) as remove_date_sum
FROM remove_from_queue
GROUP BY transId
)
GROUP BY transId;
A bit of explanation: as far as I know, you cannot sum dates, but you can convert them to some sort of timestamps. Check if UNIX_TIMESTAMPS works for you, or figure out something else. Then you can sum in each table, create union by conveniently leaving the other one as zeto and then subtracting the union query.
As for that devision in the end of first SELECT, UNIT_TIMESTAMP throws out miliseconds, you devide to get days - or whatever it is that you want.
This all said - I would probably solve this using a stored procedure or some client script. SQL is not a weapon for every battle. Making two separate queries can be much simpler.
Answer 2: after your comments. (As a side note, some of your dates 15/1/2013,13/1/2013 do not represent proper date formats )
select transId, sum(numberOfDays) totalQueueTime
from (
select a.transId,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
order by a.transId, a.addDate, r.removeDate
) X
group by transId
Answer 1: before your comments
Assuming that there won't be a new record added unless it is being removed. Also note following query will bring numberOfDays as zero for unremoved records;
select a.transId, a.addDate, r.removeDate,
datediff(day,a.addDate,isnull(r.removeDate,a.addDate)) numberOfDays
from AddTable a left join RemoveTable r on a.transId = r.transId
order by a.transId, a.addDate, r.removeDate

SQL if breaking number pattern, mark record?

I have the following query:
SELECT AccountNumber, RptPeriod
FROM dbo.Report
ORDER BY AccountNumber, RptPeriod.
I get the following results:
123 200801
123 200802
123 200803
234 200801
344 200801
344 200803
I need to mark the record where the rptperiod doesnt flow concurrently for the account. For example 344 200803 would have an X next to it since it goes from 200801 to 200803.
This is for about 19321 rows and I want it on a company basis so between different companies I dont care what the numbers are, I just want the same company to show where there is breaks in the number pattern.
Any Ideas??
Thanks!
OK, this is kind of ugly (double join + anti-join) but it gets the work done, AND is pure portable SQL:
SELECT *
FROM dbo.Report R1
, dbo.Report R2
WHERE R1.AccountNumber = R2.AccountNumber
AND R2.RptPeriod - R1.RptPeriod > 1
-- subsequent NOT EXISTS ensures that R1,R2 rows found are "next to each other",
-- e.g. no row exists between them in the ordering above
AND NOT EXISTS
(SELECT 1 FROM dbo.Report R3
WHERE R1.AccountNumber = R3.AccountNumber
AND R2.AccountNumber = R3.AccountNumber
AND R1.RptPeriod < R3.RptPeriod
AND R3.RptPeriod < R2.RptPeriod
)
Something like this should do it:
-- cte lists all items by AccountNumber and RptPeriod, assigning an ascending integer
-- to each RptPeriod and restarting at 1 for each new AccountNumber
;WITH cte (AccountNumber, RptPeriod, Ranking)
as (select
AccountNumber
,RptPeriod
,row_number() over (partition by AccountNumber order by AccountNumber, RptPeriod) Ranking
from dbo.Report)
-- and then we join each row with each preceding row based on that "Ranking" number
select
This.AccountNumber
,This.RptPeriod
,case
when Prior.RptPeriod is null then '' -- Catches the first row in a set
when Prior.RptPeriod = This.RptPeriod - 1 then '' -- Preceding row's RptPeriod is one less that This row's RptPeriod
else 'x' -- -- Preceding row's RptPeriod is not less that This row's RptPeriod
end UhOh
from cte This
left outer join cte Prior
on Prior.AccountNumber = This.AccountNumber
and Prior.Ranking = This.Ranking - 1
(Edited to add comments)
WITH T
AS (SELECT *,
/*Each island of contiguous data will have
a unique AccountNumber,Grp combination*/
RptPeriod - ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) Grp,
/*RowNumber will be used to identify first record
per company, this should not be given an 'X'. */
ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) AS RN
FROM Report)
SELECT AccountNumber,
RptPeriod,
/*Check whether first in group but not first over all*/
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY AccountNumber, Grp
ORDER BY RptPeriod) = 1
AND RN > 1 THEN 'X'
END AS Flag
FROM T
SELECT *
FROM report r
LEFT JOIN report r2
ON r.accountnumber = r.accountnumber
AND {r2.rptperiod is one day after r.rptPeriod}
JOIN report r3
ON r3.accountNumber = r.accountNumber
AND r3.rptperiod > r1.rptPeriod
WHERE r2.rptPeriod IS NULL
AND r3 IS NOT NULL
I'm not sure of sql servers date logic syntax, but hopefully you get the idea. r will be all the records where the next rptPeriod is NULL (r2) and there exists at least one greater rptPeriod (r3). The query isn't super straight forward I guess, but if you have an index on the two columns, it'll probably be the most efficent way to get your data.
Basically, you number rows within every account, then, using the row numbers, compare the RptPeriod values for the neighbouring rows.
It is assumed here that RptPeriod is the year and month encoded, for which case the year transition check has been added.
;WITH Report_sorted AS (
SELECT
AccountNumber,
RptPeriod,
rownum = ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY RptPeriod)
FROM dbo.Report
)
SELECT
AccountNumber,
RptPeriod,
CASE ISNULL(CASE WHEN r1.RptPeriod / 100 < r2.RptPeriod / 100 THEN 12 ELSE 0 END
+ r1.RptPeriod - r2.RptPeriod, 1) AS Chk
WHEN 1 THEN ''
ELSE 'X'
END
FROM Report_sorted r1
LEFT JOIN Report_sorted r2
ON r1.AccountNumber = r2.AccountNumber AND r1.rownum = r2.rownum + 1
It could be complicated further with an additional check for gaps spanning a year and more, if you need that.