SQL Server 2008: Need to do math on the previous row

Working in SQL Server 2008, so the LAG/LEAD analytic functions (added in SQL Server 2012) are not an option.
Basically, I have the amount financed and the payments made. I have calculated the interest for the first row, but for the next row I need the balance from the previous row.

Without any schema context, I can only provide a general structure, but in SQL Server 2008 you should be able to do something like this:
-- This is called a CTE (Common Table Expression).
-- Think of it as a named sub-query.
;WITH computed_table AS (
    -- ROW_NUMBER() produces a computed column numbered 1..n,
    -- ordered by the column(s) specified in the OVER clause
    SELECT ROW_NUMBER() OVER(ORDER BY Id) AS row_num
          ,*
    FROM my_table
)
SELECT *
    -- perform calculations on t1 and prev
    ,(t1.amount - prev.amount) AS CalculatedAmt -- example calculation
FROM computed_table t1
OUTER APPLY (
    SELECT *
    FROM computed_table t2
    WHERE t2.row_num = t1.row_num - 1
) AS prev
The CTE and the ROW_NUMBER() function are needed to guarantee a contiguous sequence (1, 2, 3, ...) in the desired order with no gaps, something a primary key cannot guarantee, since rows can be deleted. The OUTER APPLY evaluates the subquery once for each row of the left-hand table; for the first row there is no previous row, so the columns from prev are NULL.
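To illustrate the gap problem, here is a small sketch with made-up data; the table variable @my_table, its Id and amount columns, and the deleted Id 3 are all hypothetical. It compares a join on Id - 1 with a join on ROW_NUMBER() - 1, using a LEFT JOIN in both cases so that unmatched rows stay visible as NULLs.

```sql
-- Hypothetical data: Id 3 was deleted, leaving a gap
DECLARE @my_table TABLE (Id INT PRIMARY KEY, amount DECIMAL(10,2));
INSERT INTO @my_table VALUES (1, 100.00), (2, 250.00), (4, 175.00);

DECLARE @by_id TABLE (Id INT, prev_amount DECIMAL(10,2));
DECLARE @by_row_num TABLE (Id INT, prev_amount DECIMAL(10,2));

-- Joining on Id - 1 finds nothing for Id 4, because Id 3 does not exist
INSERT INTO @by_id (Id, prev_amount)
SELECT t1.Id, t2.amount
FROM @my_table AS t1
LEFT JOIN @my_table AS t2 ON t2.Id = t1.Id - 1;

-- Joining on row_num - 1 treats Id 2 as the row before Id 4
WITH computed_table AS (
    SELECT ROW_NUMBER() OVER (ORDER BY Id) AS row_num, Id, amount
    FROM @my_table
)
INSERT INTO @by_row_num (Id, prev_amount)
SELECT t1.Id, t2.amount
FROM computed_table AS t1
LEFT JOIN computed_table AS t2 ON t2.row_num = t1.row_num - 1;

-- Side by side: prev_by_id is NULL for Id 4, prev_by_row_num is 250.00
SELECT b.Id, b.prev_amount AS prev_by_id, r.prev_amount AS prev_by_row_num
FROM @by_id AS b
JOIN @by_row_num AS r ON r.Id = b.Id
ORDER BY b.Id;
```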
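EDIT: To store the results in a table rather than just selecting them, you can add an INTO clause after the SELECT list. SELECT ... INTO creates a new table, so the target must not already exist (to append to an existing table, put INSERT INTO target_table (column list) before the SELECT instead):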
...(CTE HERE)...
SELECT t1.*
-- perform calculations on t1 and prev
,(t1.amount - prev.amount) AS CalculatedAmt -- example calculation
-- This INTO clause creates my_new_table (a placeholder name) from the
-- result set. Its columns are named after the column aliases in the
-- SELECT list, so they must be unique, and the table must not exist yet.
INTO my_new_table
FROM computed_table t1
...(REST OF QUERY HERE)...
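A self-contained sketch of the INTO variant, assuming hypothetical names (@my_table as the source and #results as the new table). It selects named columns rather than SELECT *, because SELECT ... INTO needs unique column names and prev would contribute a second amount column.

```sql
-- Hypothetical source data
DECLARE @my_table TABLE (Id INT, amount DECIMAL(10,2));
INSERT INTO @my_table VALUES (1, 100.00), (2, 250.00), (3, 175.00);

WITH computed_table AS (
    SELECT ROW_NUMBER() OVER (ORDER BY Id) AS row_num, Id, amount
    FROM @my_table
)
SELECT t1.Id,
       t1.amount,
       t1.amount - prev.amount AS CalculatedAmt   -- NULL for the first row
INTO #results                                     -- creates the table; fails if it exists
FROM computed_table AS t1
OUTER APPLY (
    SELECT t2.amount
    FROM computed_table AS t2
    WHERE t2.row_num = t1.row_num - 1
) AS prev;

SELECT * FROM #results ORDER BY Id;
```

Running it a second time in the same session fails because #results already exists; drop it first, or use INSERT INTO #results (Id, amount, CalculatedAmt) SELECT ... to append to the existing table.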

Try this example
DECLARE @tbl TABLE(ID INT, Test VARCHAR(100), SortKey INT);
INSERT INTO @tbl VALUES(1,'Test 1 3',3),(2,'Test 2 4',4),(3,'Test 3 1',1),(4,'Test 4 2',2);
WITH Sorted AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY SortKey) AS Nr
          ,*
    FROM @tbl
)
SELECT s.Test
      ,(SELECT prev.Test FROM Sorted AS prev WHERE s.Nr=prev.Nr+1) AS PreviousRow
      ,(SELECT nxt.Test FROM Sorted AS nxt WHERE s.Nr=nxt.Nr-1) AS NextRow
FROM Sorted AS s
ORDER BY s.Nr;
Attention
ROW_NUMBER() OVER() only gives a reliable order if the values you sort by are unique! With ties, the numbering among the tied rows is arbitrary.
The result
Test      PreviousRow  NextRow
Test 3 1  NULL         Test 4 2
Test 4 2  Test 3 1     Test 1 3
Test 1 3  Test 4 2     Test 2 4
Test 2 4  Test 1 3     NULL
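When the sort column can contain ties, appending a unique column (here ID) to the ORDER BY makes the numbering, and therefore the "previous row", deterministic. A minimal sketch with made-up rows in which SortKey 2 occurs twice:

```sql
DECLARE @tbl TABLE (ID INT, Test VARCHAR(100), SortKey INT);
-- Hypothetical rows: SortKey 2 appears twice (IDs 2 and 3)
INSERT INTO @tbl VALUES (1, 'A', 1), (2, 'B', 2), (3, 'C', 2), (4, 'D', 3);

DECLARE @result TABLE (Nr INT, ID INT, Test VARCHAR(100), PreviousRow VARCHAR(100));

-- ORDER BY SortKey alone could number IDs 2 and 3 either way;
-- adding ID as a tie-breaker fixes their order
WITH Sorted AS (
    SELECT ROW_NUMBER() OVER (ORDER BY SortKey, ID) AS Nr, ID, Test
    FROM @tbl
)
INSERT INTO @result (Nr, ID, Test, PreviousRow)
SELECT s.Nr, s.ID, s.Test,
       (SELECT prev.Test FROM Sorted AS prev WHERE prev.Nr = s.Nr - 1)
FROM Sorted AS s;

SELECT * FROM @result ORDER BY Nr;
```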

Related

How to return column changes in a column [duplicate]

I need to calculate the difference of a column between two lines of a table. Is there any way I can do this directly in SQL? I'm using Microsoft SQL Server 2008.
I'm looking for something like this:
SELECT value - (previous.value) FROM table
Imagine that "previous" references the previously selected row. Of course, with a select like that I will end up with n-1 rows selected from a table with n rows; that's not a problem, it's actually exactly what I need.
Is that possible in some way?
Use the LAG function (available from SQL Server 2012; SQL Server 2008 does not have it):
SELECT value - LAG(value) OVER (ORDER BY Id) FROM my_table
Sequences used for Ids can skip values, so Id-1 does not always work.
SQL has no built-in notion of order, so you need to order by some column for this to be meaningful. Something like this (it assumes primaryKey has no gaps):
select t1.value - t2.value from my_table t1, my_table t2
where t2.primaryKey = t1.primaryKey - 1
If you know how to order things but not how to get the previous value given the current one (e.g., you want to order alphabetically), then I don't know of a way to do that in standard SQL, but most SQL implementations have extensions to do it.
Here is a way for SQL server that works if you can order rows such that each one is distinct:
select rank() OVER (ORDER BY id) as 'Rank', value into temp1 from t
select t1.value - t2.value from temp1 t1, temp1 t2
where t2.Rank = t1.Rank - 1
drop table temp1
If you need to break ties, you can add as many columns as necessary to the ORDER BY.
WITH CTE AS (
    SELECT
        rownum = ROW_NUMBER() OVER (ORDER BY columns_to_order_by),
        value
    FROM my_table
)
SELECT
    cur.value - prev.value
FROM CTE cur
INNER JOIN CTE prev ON prev.rownum = cur.rownum - 1
Oracle, PostgreSQL, SQL Server and many more RDBMS engines have analytic functions called LAG and LEAD that do this very thing.
In SQL Server prior to 2012 you'd need to do the following:
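SELECT value - (
    SELECT TOP 1 value
    FROM mytable m2
    WHERE m2.col1 < m1.col1 OR (m2.col1 = m1.col1 AND m2.pk < m1.pk)
    -- descending, so TOP 1 picks the row immediately before m1
    ORDER BY
        col1 DESC, pk DESC
)
FROM mytable m1
ORDER BY
    col1, pk
where col1 is the column you are ordering by and pk is the primary key, used to break ties. The first row gets NULL because it has no previous row.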
Having an index on (COL1, PK) will greatly improve this query.
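A runnable sketch of this pre-2012 correlated-subquery approach, with hypothetical data in which two rows share col1 = 20 so that pk has to break the tie. The subquery sorts descending because the previous row is the largest (col1, pk) pair that is still smaller than the current one.

```sql
DECLARE @mytable TABLE (pk INT PRIMARY KEY, col1 INT, value INT);
-- Hypothetical rows: col1 is the ordering column, pk breaks the tie at col1 = 20
INSERT INTO @mytable VALUES (1, 10, 5), (2, 20, 9), (3, 20, 12), (4, 30, 20);

DECLARE @result TABLE (pk INT, diff INT);

INSERT INTO @result (pk, diff)
SELECT m1.pk,
       m1.value - (
           -- Largest (col1, pk) below the current row = the previous row
           SELECT TOP 1 m2.value
           FROM @mytable AS m2
           WHERE m2.col1 < m1.col1
              OR (m2.col1 = m1.col1 AND m2.pk < m1.pk)
           ORDER BY m2.col1 DESC, m2.pk DESC
       )
FROM @mytable AS m1;

-- pk 1 has no previous row, so its diff is NULL
SELECT * FROM @result ORDER BY pk;
```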
LEFT JOIN the table to itself, with the join condition worked out so the row matched in the joined version of the table is one row previous, for your particular definition of "previous".
Update: At first I was thinking you would want to keep all rows, with NULLs where there was no previous row. Reading it again, you just want those rows culled, so you should use an inner join rather than a left join.
Update:
SQL Server 2012 and later also have the LAG and LEAD window functions, which can be used for this too.
select t2.col from (
select col,MAX(ID) id from
(
select ROW_NUMBER() over(PARTITION by col order by col) id ,col from testtab t1) as t1
group by col) as t2
The selected answer will only work if there are no gaps in the sequence. However, if you are using an autogenerated id, there are likely to be gaps in the sequence, for example due to inserts that were rolled back.
This method should work if you have gaps
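declare @temp table (value int, primaryKey int, tempid int identity)
-- identity values are assigned in ORDER BY order, so tempid has no gaps
insert into @temp (value, primaryKey) select value, primaryKey from mytable order by primaryKey
select t1.value - t2.value from @temp t1
join @temp t2
on t2.tempid = t1.tempid - 1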
Another way to refer to the previous row in an SQL query is to use a recursive common table expression (CTE):
CREATE TABLE t (counter INTEGER);
INSERT INTO t VALUES (1),(2),(3),(4),(5);
WITH cte(counter, previous, difference) AS (
-- Anchor query
SELECT MIN(counter), 0, MIN(counter)
FROM t
UNION ALL
-- Recursive query
SELECT t.counter, cte.counter, t.counter - cte.counter
FROM t JOIN cte ON cte.counter = t.counter - 1
)
SELECT counter, previous, difference
FROM cte
ORDER BY counter;
Result:
counter  previous  difference
1        0         1
2        1         1
3        2         1
4        3         1
5        4         1
The anchor query generates the first row of the common table expression cte: it sets cte.counter to the smallest t.counter, cte.previous to 0, and cte.difference to that same smallest value.
The recursive query joins each row already in cte to the next row of table t (the one whose counter is one greater). In the recursive query, t.counter is the current row's value, cte.counter is the value from the previous row, and t.counter - cte.counter is the difference between the two. Like the self-join answers, this join assumes the counter values have no gaps.
Note that a recursive CTE is more flexible than the LAG and LEAD functions because a row can refer to any arbitrary result of a previous row. (A recursive function or process is one where the input of the process is the output of the previous iteration of that process, except the first input which is a constant.)
I tested this query at SQLite Online.
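Because a row can use a value computed for the previous row, a recursive CTE also fits the loan question at the top of this page, where each period's interest depends on the previous balance. The sketch below is written for SQL Server and assumes made-up figures: 1,000.00 financed, a 1% rate per period, three payments numbered 1..n without gaps, and interest rounded to cents. The formula new balance = previous balance + interest - payment is an assumption about how the asker's calculation works.

```sql
-- Hypothetical inputs
DECLARE @amount_financed DECIMAL(12,2) = 1000.00;
DECLARE @rate DECIMAL(9,6) = 0.01;

-- Hypothetical payments, numbered 1..n with no gaps
DECLARE @payments TABLE (payment_no INT PRIMARY KEY, payment DECIMAL(12,2));
INSERT INTO @payments VALUES (1, 300.00), (2, 300.00), (3, 300.00);

DECLARE @schedule TABLE (payment_no INT, interest DECIMAL(12,2), balance DECIMAL(12,2));

WITH amortization (payment_no, interest, balance) AS (
    -- Anchor: first period, interest charged on the amount financed
    SELECT p.payment_no,
           CAST(ROUND(@amount_financed * @rate, 2) AS DECIMAL(12,2)),
           CAST(@amount_financed + ROUND(@amount_financed * @rate, 2) - p.payment AS DECIMAL(12,2))
    FROM @payments AS p
    WHERE p.payment_no = 1
    UNION ALL
    -- Recursive step: interest on the balance computed for the previous row
    SELECT p.payment_no,
           CAST(ROUND(a.balance * @rate, 2) AS DECIMAL(12,2)),
           CAST(a.balance + ROUND(a.balance * @rate, 2) - p.payment AS DECIMAL(12,2))
    FROM amortization AS a
    JOIN @payments AS p ON p.payment_no = a.payment_no + 1
)
INSERT INTO @schedule (payment_no, interest, balance)
SELECT payment_no, interest, balance
FROM amortization
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels

SELECT * FROM @schedule ORDER BY payment_no;
```

SQL Server requires the anchor and recursive members to produce identical column types, which is why every computed column is cast to DECIMAL(12,2).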
You can use the following window function to get the current row value and the previous row value (the ROWS frame clause requires SQL Server 2012 or later):
SELECT value,
min(value) over (order by id rows between 1 preceding and 1 preceding) as value_prev
FROM my_table
Then you can just select value - value_prev from that result to get your answer.

NTH row query in Oracle not behaving as expected

Given:
CREATE TABLE T1 (ID INTEGER, DESCRIPTION VARCHAR2(20));
INSERT INTO T1 VALUES (1,'ONE');
INSERT INTO T1 VALUES (2,'TWO');
INSERT INTO T1 VALUES (3,'THREE');
INSERT INTO T1 VALUES (4,'FOUR');
INSERT INTO T1 VALUES (5,'FIVE');
COMMIT;
Why does
SELECT * FROM
( SELECT ROWNUM, ID, DESCRIPTION
FROM T1)
WHERE MOD(ROWNUM,1)=0;
return
ROWNUM ID DESCRIPTION
------ -------------------------------------- --------------------
1 1 ONE
2 2 TWO
3 3 THREE
4 4 FOUR
5 5 FIVE
whereas
SELECT * FROM
( SELECT ROWNUM, ID, DESCRIPTION
FROM T1)
WHERE MOD(ROWNUM,2)=0;
return zero rows?
I'm confused; I expected the rows with ROWNUM = 2 and 4 to be returned...
SELECT B.* FROM
( SELECT ROWNUM a, ID, DESCRIPTION
FROM T1) B
WHERE MOD(A,2)=0;
Reason: your approach evaluates ROWNUM twice. You don't need to, nor do you really want to. In the outer WHERE clause, ROWNUM is the outer query's own pseudocolumn, not the column returned by the inline view, and based on the order of operations the WHERE clause executes before the outer SELECT has assigned values to its rows, so the number of rows is not known yet.
Additional:
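I would recommend ordering the rows in a nested inline view before ROWNUM is applied (ROWNUM is assigned before the ORDER BY of the same query block), so the row numbers follow a specific, expected order rather than whatever order the engine produces.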
You have 2 operations of ROWNUM.
The 1st ROWNUM generates the numbers 1 through 5.
The 2nd ROWNUM doesn't generate anything: the first candidate row gets ROWNUM 1, and since MOD(1,2)=0 is false, the row is not returned. ROWNUM is therefore not incremented, so the next row is also tested as ROWNUM 1 and fails the condition again, and so on for every row.
This query, using alias, returns exactly what you have expected:
SELECT * FROM
( SELECT ROWNUM as rn, ID, DESCRIPTION
FROM T1)
WHERE MOD(rn,2)=0;
Some facts about the ROWNUM pseudo column in Oracle:
The ROWNUM assigned to each row is determined by the order in which Oracle retrieves the row from the DB.
The order in which rows are returned is non-deterministic: running the query once may return rows in one order, and a second run may return a different order if the base tables have been reorganized or Oracle uses a different query plan.
The order in which ROWNUMs are assigned to rows is not necessarily correlated with that of an ORDER BY clause (the ORDER BY clause may affect the ROWNUM order since it may cause a different query plan to be used, but the row numbers are unlikely to match the sort order).
ROWNUMbers are assigned after the records are filtered by the WHERE clause, so if you filter out ROWNUM 1 you will never return any records.
Filtering a subquery that returns an aliased ROWNUM column works because the entire subquery is returned to the outer query before the outer query filters the rows, but the ROWNUMs will still be in a non-deterministic order.
To return a top-N or Nth-row result deterministically, you need to assign row numbers deterministically. One way is to use the ROW_NUMBER analytic function in a subquery:
select * from
(select ROW_NUMBER() over (order by ID) rn
, ID
, DESCRIPTION
from T1)
where rn <= 4 -- Top N
or
where rn = 4 -- the Nth row
or even
WHERE MOD(rn,2)=0 -- every 2nd row (MOD(rn,N)=0 for every Nth row)
In every case, the ORDER BY clause in the ROW_NUMBER analytic function needs to match the granularity of the data; otherwise ties in the ordering will again be non-deterministic, most likely matching the current ROWNUM ordering.

Update behaviour

When I made a mistake in the text of an UPDATE query, I got an unpredictable result. Here is the update query:
DECLARE @T TABLE (Id int,[Name] nvarchar(100),RNA int)
INSERT INTO @T(Id,[Name])
SELECT [Id],[Name]
FROM (VALUES (1,N'D'),
(2,N'B'),
(3,N'S'),
(4,N'A'),
(5,N'F')
) AS vtable([Id],[Name])
UPDATE @T
SET RNA = T.RN
FROM (
select PP.Name,ROW_NUMBER() OVER(ORDER BY PP.Name) RN,PP.RNA from @T PP
) T
select * from @T
I know where the mistake was:
UPDATE @T
should be
UPDATE T
But why does the result (with the "bad" query) look like this:
Id Name RNA
---- ----- -------
1 D 1
2 B 5
3 S 1
4 A 5
5 F 1
I suspect that the values 1 and 5 are MIN(Id) and MAX(Id).
The execution plan looks like this: [execution plan image not available]
Will the result be the same every time this kind of mistake is made?
If so, does this behaviour have any practical value?
The result will not be the same for every mistake of this kind. You have a non-deterministic UPDATE statement; that is to say, theoretically any of the values for RN in your subquery T could be applied to any of the rows in @T. You are essentially running the UPDATE version of this:
SELECT *
FROM @T a
CROSS JOIN
( SELECT TOP 1
PP.Name,
ROW_NUMBER() OVER(ORDER BY PP.Name) RN,
PP.RNA
FROM @T PP
ORDER BY NEWID()
) T
OPTION (FORCE ORDER);
The online manual states:
The results of an UPDATE statement are undefined if the statement
includes a FROM clause that is not specified in such a way that only
one value is available for each column occurrence that is updated,
that is if the UPDATE statement is not deterministic.
What is slightly interesting is that if you run the above you will get a different result each time (barring the 1/25 chance of getting the same result twice in a row). If you remove the random sorting using NEWID(), you will get the same value of RN for each row, yet the update consistently returns the same results, with 2 different RNs. I am not surprised the result remains consistent without random ordering: with no changes to the data and no random factor introduced, I would expect the optimiser to come up with the same execution plan no matter how many times it is run.
Since no explicit ordering is specified in your update query, the order is determined by the order of the records at the leaf level; if that order changes, the result changes. This can be shown by inserting the records of @T into a new table in a different, random order:
DECLARE @T2 TABLE (Id int,[Name] nvarchar(100),RNA int);
INSERT @T2
SELECT id, Name, NULL
FROM @T
ORDER BY ROW_NUMBER() OVER(ORDER BY NEWID())
OPTION (FORCE ORDER);
UPDATE @T2
SET RNA = T.RN
FROM (
select PP.Name,ROW_NUMBER() OVER(ORDER BY PP.Name) RN,PP.RNA from @T2 PP
) T
SELECT *
FROM @T2;
I can see no reason why this is always the min or max value of RN, though; I expect you would have to delve deep into the optimiser to find out, which is probably a new question better suited to the DBA Stack Exchange.
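For reference, a deterministic version of the intended update can be written as an updatable CTE. This is a sketch that assumes the intent was to number the rows by Name, as the corrected UPDATE T suggests; each row of @T receives the ROW_NUMBER() computed for that same row, so there is only one candidate value per row.

```sql
DECLARE @T TABLE (Id INT, [Name] NVARCHAR(100), RNA INT);
INSERT INTO @T (Id, [Name]) VALUES (1, N'D'), (2, N'B'), (3, N'S'), (4, N'A'), (5, N'F');

-- The CTE exposes the base column RNA next to its row number,
-- so updating the CTE writes RN back into the same row of @T
WITH T AS (
    SELECT RNA, ROW_NUMBER() OVER (ORDER BY [Name]) AS RN
    FROM @T
)
UPDATE T
SET RNA = RN;

-- A=1, B=2, D=3, F=4, S=5 (Name values are unique here, so no ties)
SELECT * FROM @T ORDER BY Id;
```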

Get row count including column values in sql server

I need to get the row count of a query, and also get the query's columns in one single query. The count should be a part of the result's columns (it should be the same for all rows, since it's the total).
For example, if I do this:
select count(1) from table
I can have the total number of rows.
If I do this:
select a,b,c from table
I'll get the column values for the query.
What I need is to get the count and the column values in one query, in a very efficient way.
For example:
select Count(1), a,b,c from table
with no group by, since I want the total.
The only way I've found is to use a temp table (a table variable), insert the query's result, count it, and then return the join of both. But if the result has thousands of records, that wouldn't be very efficient.
Any ideas?
@Jim H is almost right, but chooses the wrong window function:
create table #T (ID int)
insert into #T (ID)
select 1 union all
select 2 union all
select 3
select ID,COUNT(*) OVER (PARTITION BY 1) as RowCnt from #T
drop table #T
Results:
ID RowCnt
1 3
2 3
3 3
Partitioning by a constant makes it count over the whole result set.
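An empty OVER () clause has the same effect as partitioning by a constant. Window aggregates are computed after the WHERE clause, so the count reflects the rows actually returned; the sketch below uses hypothetical data and filters out one row to show this.

```sql
DECLARE @T TABLE (ID INT);
INSERT INTO @T (ID) VALUES (1), (2), (3);

DECLARE @result TABLE (ID INT, RowCnt INT);

-- OVER () treats the whole filtered result set as one window.
-- WHERE runs first, so RowCnt counts only the rows returned (2 here).
INSERT INTO @result (ID, RowCnt)
SELECT ID, COUNT(*) OVER () AS RowCnt
FROM @T
WHERE ID > 1;

SELECT * FROM @result ORDER BY ID;
```

By contrast, a separate COUNT(*) subquery joined to the rows counts the whole table unless the same WHERE clause is repeated inside it.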
Using CROSS JOIN:
SELECT a.*, b.numRows
FROM YOUR_TABLE a
CROSS JOIN (SELECT COUNT(*) AS numRows
FROM YOUR_TABLE) b
Look at the Ranking functions of SQL Server.
SELECT ROW_NUMBER() OVER (ORDER BY a) AS 'RowNumber', a, b, c
FROM my_table;
You could do it like this:
SELECT x.total, a, b, c
FROM
my_table
JOIN (SELECT total = COUNT(*) FROM my_table) AS x ON 1=1
which will return the total number of records in the first column, followed by the fields a, b and c.
