SQL Server - Sum of difference between rows in a table

I have a table in the format:
SomeID SomeData
1 3
2 7
3 9
4 10
5 14
6 16
. .
. .
I want to find the sum of the differences between consecutive pairs of rows in this table, i.e. ((7-3) + (10-9) + (16-14) + ...).
What is the best way to do this?

Using a self join along with the modulus:
SELECT SUM(t1.SomeData - t2.SomeData) AS total_diff
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.SomeID = t2.SomeID + 1
WHERE t1.SomeID % 2 = 0;
This answer assumes that the SomeID sequence in fact starts with 1 and increments by 1 with each subsequent row. If not, then we might be able to first apply ROW_NUMBER over SomeID and generate a 1 to N sequence.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY SomeID) rn
FROM yourTable
)
SELECT SUM(t1.SomeData - t2.SomeData) AS total_diff
FROM cte t1
INNER JOIN cte t2
ON t1.rn = t2.rn + 1
WHERE t1.rn % 2 = 0;
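As a rough check of the ROW_NUMBER idea, here is a minimal sketch that runs the query through Python's built-in sqlite3 module instead of SQL Server (an assumption made only so the example is self-contained; it needs SQLite 3.25 or later for window functions). The SomeID values are deliberately non-contiguous, and the pairs are joined on the generated row number rn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (SomeID INTEGER, SomeData INTEGER)")
# SomeID has gaps on purpose; only the row order matters for pairing.
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(10, 3), (20, 7), (35, 9), (40, 10), (52, 14), (60, 16)],
)

query = """
WITH cte AS (
    SELECT SomeData, ROW_NUMBER() OVER (ORDER BY SomeID) AS rn
    FROM yourTable
)
SELECT SUM(t1.SomeData - t2.SomeData)
FROM cte t1
INNER JOIN cte t2 ON t1.rn = t2.rn + 1  -- pair each even row with the row before it
WHERE t1.rn % 2 = 0
"""
total_diff = conn.execute(query).fetchone()[0]
print(total_diff)  # (7-3) + (10-9) + (16-14)
```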

You can use the ROW_NUMBER window function to generate a serial number, take it modulo 2 to split the rows into the two expected groups, and then use conditional aggregation.
Query 1:
SELECT SUM(CASE WHEN rn = 0 THEN SomeData END) - SUM(CASE WHEN rn = 1 THEN SomeData END)
FROM (
SELECT SomeData,ROW_NUMBER() over(order by SomeID) % 2 rn
FROM t t1
) t1
Results:
| |
|---|
| 7 |

Related

Rolling Average in SQL with Partition [duplicate]

create table #t
(
id int,
SomeNumt int
)
insert into #t
select 1,10
union
select 2,12
union
select 3,3
union
select 4,15
union
select 5,23
select * from #t
The above select returns the following:
id SomeNumt
1 10
2 12
3 3
4 15
5 23
How do I get the following:
id srome CumSrome
1 10 10
2 12 22
3 3 25
4 15 40
5 23 63
select t1.id, t1.SomeNumt, SUM(t2.SomeNumt) as sum
from #t t1
inner join #t t2 on t1.id >= t2.id
group by t1.id, t1.SomeNumt
order by t1.id
Output
| ID | SOMENUMT | SUM |
-----------------------
| 1 | 10 | 10 |
| 2 | 12 | 22 |
| 3 | 3 | 25 |
| 4 | 15 | 40 |
| 5 | 23 | 63 |
Edit: this is a generalized solution that will work across most db platforms. When there is a better solution available for your specific platform (e.g., gareth's), use it!
The latest version of SQL Server (2012) permits the following.
SELECT
RowID,
Col1,
SUM(Col1) OVER(ORDER BY RowId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Col2
FROM tablehh
ORDER BY RowId
or
SELECT
GroupID,
RowID,
Col1,
SUM(Col1) OVER(PARTITION BY GroupID ORDER BY RowId ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Col2
FROM tablehh
ORDER BY RowId
This is even faster. Partitioned version completes in 34 seconds over 5 million rows for me.
Thanks to Peso, who commented on the SQL Team thread referred to in another answer.
For SQL Server 2012 onwards it could be easy:
SELECT id, SomeNumt, sum(SomeNumt) OVER (ORDER BY id) as CumSrome FROM #t
because an ORDER BY clause for SUM means, by default, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW for the window frame (see "General Remarks" at https://msdn.microsoft.com/en-us/library/ms189461.aspx)
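The difference between the default RANGE frame and an explicit ROWS frame only shows up when the ORDER BY key has ties. The sketch below illustrates this with Python's sqlite3 module rather than SQL Server (an assumption for the sake of a runnable example; SQLite 3.25+ uses the same default frame). The table name t and the duplicated id are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, SomeNumt INTEGER)")
# id 2 appears twice on purpose, so the ORDER BY key has a tie.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 12), (2, 3), (3, 15)])

# Default frame (RANGE): tied rows are peers and receive the same total.
range_sums = sorted(row[0] for row in conn.execute(
    "SELECT SUM(SomeNumt) OVER (ORDER BY id) FROM t"))

# Explicit ROWS frame: every row gets its own running total.
rows_sums = sorted(row[0] for row in conn.execute(
    "SELECT SUM(SomeNumt) OVER (ORDER BY id "
    "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM t"))

print(range_sums)
print(rows_sums)
```

With a unique ordering key, as in the question, both frames give the same running total.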
Let's first create a table with dummy data:
Create Table CUMULATIVESUM (id tinyint , SomeValue tinyint)
Now let's insert some data into the table;
Insert Into CUMULATIVESUM
Select 1, 10 union
Select 2, 2 union
Select 3, 6 union
Select 4, 10
Here I am joining same table (self joining)
Select c1.ID, c1.SomeValue, c2.SomeValue
From CumulativeSum c1, CumulativeSum c2
Where c1.id >= c2.ID
Order By c1.id Asc
Result:
ID SomeValue SomeValue
-------------------------
1 10 10
2 2 10
2 2 2
3 6 10
3 6 2
3 6 6
4 10 10
4 10 2
4 10 6
4 10 10
Now just sum the SomeValue of c2 and we'll get the answer:
Select c1.ID, c1.SomeValue, Sum(c2.SomeValue) CumulativeSumValue
From CumulativeSum c1, CumulativeSum c2
Where c1.id >= c2.ID
Group By c1.ID, c1.SomeValue
Order By c1.id Asc
For SQL Server 2012 and above (much better performance):
Select
c1.ID, c1.SomeValue,
Sum (SomeValue) Over (Order By c1.ID )
From CumulativeSum c1
Order By c1.id Asc
Desired result:
ID SomeValue CumulativeSumValue
---------------------------------
1 10 10
2 2 12
3 6 18
4 10 28
Drop Table CumulativeSum
A CTE version, just for fun:
;
WITH abcd
AS ( SELECT id
,SomeNumt
,SomeNumt AS MySum
FROM #t
WHERE id = 1
UNION ALL
SELECT t.id
,t.SomeNumt
,t.SomeNumt + a.MySum AS MySum
FROM #t AS t
JOIN abcd AS a ON a.id = t.id - 1
)
SELECT * FROM abcd
OPTION ( MAXRECURSION 1000 ) -- limit recursion here, or 0 for no limit.
Returns:
id SomeNumt MySum
----------- ----------- -----------
1 10 10
2 12 22
3 3 25
4 15 40
5 23 63
Late answer but showing one more possibility...
Cumulative sum generation can be optimized further with CROSS APPLY.
Judging by the actual query plan, it works better than the INNER JOIN and OVER clause versions ...
/* Create table & populate data */
IF OBJECT_ID('tempdb..#TMP') IS NOT NULL
DROP TABLE #TMP
SELECT * INTO #TMP
FROM (
SELECT 1 AS id
UNION
SELECT 2 AS id
UNION
SELECT 3 AS id
UNION
SELECT 4 AS id
UNION
SELECT 5 AS id
) Tab
/* Using CROSS APPLY
Query cost relative to the batch 17%
*/
SELECT T1.id,
T2.CumSum
FROM #TMP T1
CROSS APPLY (
SELECT SUM(T2.id) AS CumSum
FROM #TMP T2
WHERE T1.id >= T2.id
) T2
/* Using INNER JOIN
Query cost relative to the batch 46%
*/
SELECT T1.id,
SUM(T2.id) CumSum
FROM #TMP T1
INNER JOIN #TMP T2
ON T1.id >= T2.id
GROUP BY T1.id
/* Using OVER clause
Query cost relative to the batch 37%
*/
SELECT T1.id,
SUM(T1.id) OVER (ORDER BY T1.id) AS CumSum -- ORDER BY, not PARTITION BY id, is what produces a running total
FROM #TMP T1
Output:-
id CumSum
------- -------
1 1
2 3
3 6
4 10
5 15
Select
*,
(Select Sum(SOMENUMT)
From #t S
Where S.id <= M.id)
From #t M
You can use this simple query for a progressive (running) calculation:
select
id
,SomeNumt
,sum(SomeNumt) over(order by id ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as CumSrome
from #t
There is a much faster CTE implementation available in this excellent post:
http://weblogs.sqlteam.com/mladenp/archive/2009/07/28/SQL-Server-2005-Fast-Running-Totals.aspx
The problem in this thread can be expressed like this:
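DECLARE @RT INT
SELECT @RT = 0
;
WITH abcd
AS ( SELECT TOP 100 percent
id
,SomeNumt
,MySum
FROM #t -- assumed source table; it needs a MySum column to receive the running total
order by id
)
update abcd
set @RT = MySum = @RT + SomeNumt
output inserted.*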
output inserted.*
For example, if you have a table with two columns, ID and Number, and want to find the cumulative sum:
SELECT ID,Number,SUM(Number)OVER(ORDER BY ID) FROM T
Once the table is created -
select
A.id, A.SomeNumt, SUM(B.SomeNumt) as sum
from #t A, #t B where A.id >= B.id
group by A.id, A.SomeNumt
order by A.id
The SQL solution which combines "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" and "SUM" did exactly what I wanted to achieve.
Thank you so much!
If it can help anyone, here was my case. I wanted to add 1 to a cumulative count whenever a maker is found as "Some Maker" (example). If not, there is no increment, but the previous count is still shown.
So this piece of SQL:
SUM( CASE [rmaker] WHEN 'Some Maker' THEN 1 ELSE 0 END)
OVER
(PARTITION BY UserID ORDER BY UserID,[rrank] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Cumul_CNT
Allowed me to get something like this:
User 1 Rank1 MakerA 0
User 1 Rank2 MakerB 0
User 1 Rank3 Some Maker 1
User 1 Rank4 Some Maker 2
User 1 Rank5 MakerC 2
User 1 Rank6 Some Maker 3
User 2 Rank1 MakerA 0
User 2 Rank2 Some Maker 1
Explanation of the above: the count of "Some Maker" starts at 0, and each time Some Maker is found we add 1. For User 1, when MakerC is found we don't add 1, so the running count of Some Maker stays at 2 until the next Some Maker row.
Partitioning is by user, so when the user changes, the cumulative count goes back to zero.
I am at work and don't want any credit for this answer; I just want to say thank you and show my example in case someone is in the same situation. I was trying to combine SUM and PARTITION, but the "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" syntax completed the task.
Thanks!
Groaker
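The conditional running count described in this answer can be tried end to end with the following minimal sketch. It uses Python's sqlite3 module instead of SQL Server (an assumption for runnability; SQLite 3.25+ is required), and the table name makers is hypothetical; the column names follow the SQL fragment in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE makers (UserID INTEGER, rrank INTEGER, rmaker TEXT)")
conn.executemany("INSERT INTO makers VALUES (?, ?, ?)", [
    (1, 1, "MakerA"), (1, 2, "MakerB"), (1, 3, "Some Maker"),
    (1, 4, "Some Maker"), (1, 5, "MakerC"), (1, 6, "Some Maker"),
    (2, 1, "MakerA"), (2, 2, "Some Maker"),
])

query = """
SELECT UserID, rrank, rmaker,
       SUM(CASE rmaker WHEN 'Some Maker' THEN 1 ELSE 0 END)
           OVER (PARTITION BY UserID ORDER BY rrank
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Cumul_CNT
FROM makers
ORDER BY UserID, rrank
"""
# Keep only the running count column; it restarts at 0 for each user.
counts = [row[3] for row in conn.execute(query)]
print(counts)
```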
Above (pre-SQL Server 2012) we see examples like this:
SELECT
T1.id, SUM(T2.id) AS CumSum
FROM
#TMP T1
JOIN #TMP T2 ON T2.id <= T1.id
GROUP BY
T1.id
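More efficient (note the LEFT JOIN and COALESCE, so that the first row, which has no smaller id to join to, is not dropped):
SELECT
T1.id, COALESCE(SUM(T2.id), 0) + T1.id AS CumSum
FROM
#TMP T1
LEFT JOIN #TMP T2 ON T2.id < T1.id
GROUP BY
T1.id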
Try this
select
t.id,
t.SomeNumt,
sum(t.SomeNumt) Over (Order by t.id asc Rows Between Unbounded Preceding and Current Row) as cum
from
#t t
group by
t.id,
t.SomeNumt
order by
t.id asc;
Try this:
CREATE TABLE #t(
[name] varchar NULL,
[val] [int] NULL,
[ID] [int] NULL
) ON [PRIMARY]
insert into #t (id,name,val) values
(1,'A',10), (2,'B',20), (3,'C',30)
select t1.id, t1.val, SUM(t2.val) as cumSum
from #t t1 inner join #t t2 on t1.id >= t2.id
group by t1.id, t1.val order by t1.id
Without using any type of JOIN, the cumulative salary for each person can be fetched with the following query (MySQL syntax):
SELECT * , (
SELECT SUM( salary )
FROM `abc` AS table1
WHERE table1.ID <= `abc`.ID
AND table1.name = `abc`.Name
) AS cum
FROM `abc`
ORDER BY Name

How to divide one column multi row into different column in oracle?

I want to divide following column into two different column
Table x
ID
1
2
.
.
10
--Output Should be like this
A B
-- --
1 6
2 7
3 8
4 9
5 10
I tried this, but it doesn't work:
SELECT (SELECT * FROM x WHERE id BETWEEN 1 AND 5),
(SELECT * FROM x WHERE id BETWEEN 6 AND 10)
FROM dual;
I also tried SUBSTR, but that doesn't work either.
Since you did not mention exactly what you want to achieve, I would write a simple query:
select case when id <= 5 then id end as col1,
case when id > 5 and id <= 10 then id end as col2
from table_name
You can achieve it with the following, assuming the total number of ids is a multiple of 2:
With cte as
(Select id, max(id) over() / 2 as mx from your_table)
Select t1.id as a, t2.id as b
From cte t1 join cte t2
On mod(t1.id, mx) = mod(t2.id, mx)
And t1.id <= mx and t2.id > mx
Order by t1.id
Cheers!!
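To see how the modulus pairing in this answer lines up the two halves, here is a minimal sketch run through Python's sqlite3 module rather than Oracle (an assumption for runnability). It swaps mod(...) for SQLite's % operator and max(id) over() for a scalar subquery; the pairing logic is otherwise the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id INTEGER)")
conn.executemany("INSERT INTO x VALUES (?)", [(i,) for i in range(1, 11)])

query = """
WITH cte AS (
    SELECT id, (SELECT MAX(id) FROM x) / 2 AS mx FROM x
)
SELECT t1.id AS a, t2.id AS b
FROM cte t1
JOIN cte t2
  ON t1.id % t1.mx = t2.id % t2.mx  -- same remainder => same position in each half
 AND t1.id <= t1.mx                 -- t1 comes from the first half
 AND t2.id > t2.mx                  -- t2 comes from the second half
ORDER BY t1.id
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```

This relies on the ids being the contiguous values 1..N with N even, as the answer assumes.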
I think you could also use the LEAD function to accomplish your task. Here is an example:
WITH x (val) AS
(
SELECT ROWNUM
FROM dual
CONNECT BY ROWNUM < 12
)
SELECT *
FROM (SELECT val AS A,
LEAD(val, (SELECT CEIL(COUNT(*)/2) FROM x)) OVER (ORDER BY val) AS B
FROM x
ORDER BY val)
WHERE ROWNUM <= (SELECT CEIL(COUNT(*)/2) FROM x);
You would just need to replace the WITH clause and its reference in the query with your actual table.
You can use row_number() and aggregation
select max(case when id <= 5 then id end) as col1,
max(case when id > 5 then id end) as col2
from (select x.*,
row_number() over (partition by case when id <= 5 then 1 else 2 end order by id) as seqnum
from x
) x
group by seqnum;

Any other alternative to write this SQL query

I need to select data based upon three conditions:
Find the latest date (StorageDate column) in the table for each record.
See if there is more than one entry for the date found in the first step for the same ID (ID column).
Then see if DuplicateTypeID = 2.
So if the table has the following data:
ID |StorageDate | DuplicateTypeID
1 |2014-10-22 | 1
1 |2014-10-22 | 2
1 |2014-10-18 | 1
2 |2014-10-12 | 1
3 |2014-10-11 | 1
4 |2014-09-02 | 1
4 |2014-09-02 | 2
Then I should get the following results:
ID
1
4
I have written the following query, but it is really slow. I was wondering if anyone has a better way to write it.
SELECT DISTINCT(TD.RecordID)
FROM dbo.MyTable TD
JOIN (
SELECT T1.RecordID, T2.MaxDate,COUNT(*) AS RecordCount
FROM MyTable T1 WITH (nolock)
JOIN (
SELECT RecordID, MAX(StorageDate) AS MaxDate
FROM MyTable WITH (nolock)
GROUP BY RecordID)T2
ON T1.RecordID = T2.RecordID AND T1.StorageDate = T2.MaxDate
GROUP BY T1.RecordID, T2.MaxDate
HAVING COUNT(*) > 1
)PT ON TD.RecordID = PT.RecordID AND TD.StorageDate = PT.MaxDate
WHERE TD.DuplicateTypeID = 2
Try this and see how the performance goes:
;WITH
tmp AS
(
SELECT *,
RANK() OVER (PARTITION BY ID ORDER BY StorageDate DESC) AS StorageDateRank,
COUNT(ID) OVER (PARTITION BY ID, StorageDate) AS StorageDateCount
FROM MyTable
)
SELECT DISTINCT ID
FROM tmp
WHERE StorageDateRank = 1 -- latest date for each ID
AND StorageDateCount > 1 -- more than 1 entry for date
AND DuplicateTypeID = 2 -- DuplicateTypeID = 2
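A quick way to sanity-check the RANK/COUNT approach is to run it against the sample rows from the question. The sketch below does that with Python's sqlite3 module instead of SQL Server (an assumption for runnability; SQLite 3.25+ is required), storing the dates as ISO text so they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ID INTEGER, StorageDate TEXT, DuplicateTypeID INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", [
    (1, "2014-10-22", 1), (1, "2014-10-22", 2), (1, "2014-10-18", 1),
    (2, "2014-10-12", 1), (3, "2014-10-11", 1),
    (4, "2014-09-02", 1), (4, "2014-09-02", 2),
])

query = """
WITH tmp AS (
    SELECT ID, DuplicateTypeID,
           RANK() OVER (PARTITION BY ID ORDER BY StorageDate DESC) AS StorageDateRank,
           COUNT(*) OVER (PARTITION BY ID, StorageDate) AS StorageDateCount
    FROM MyTable
)
SELECT DISTINCT ID
FROM tmp
WHERE StorageDateRank = 1      -- latest date for each ID
  AND StorageDateCount > 1     -- more than one entry on that date
  AND DuplicateTypeID = 2
ORDER BY ID
"""
ids = [row[0] for row in conn.execute(query)]
print(ids)
```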
You can use the analytic function RANK. Can you try this query?
Select recordId from
(
select *, rank() over ( partition by recordId order by [StorageDate] desc) as rn
from mytable
) T
where rn =1
group by recordId
having count(*) >1
and sum( case when duplicatetypeid =2 then 1 else 0 end) >=1

SQL - categorize rows

Below is the result set I am working with. What I would like is an additional column that marks each group of consecutive rows as the same. In my result set, rows 1-4 are the same (would like to mark as 1), rows 5-9 are the same (mark as 2), and row 10 is on its own (mark as 3).
How is this possible using just SQL? I can't seem to do this using rank or dense_rank functions.
ranking diff bool
-------------------- ----------- -----------
1 0 0
2 0 0
3 0 0
4 0 0
5 54 1
6 0 0
7 0 0
8 0 0
9 0 0
10 62 1
In general case you can do something like this:
select
t.ranking, t.[diff], t.[bool],
dense_rank() over(order by c.cnt) as rnk
from Table1 as t
outer apply (
select count(*) as cnt
from Table1 as t2
where t2.ranking <= t.ranking and t2.[bool] = 1
) as c
In your case you can do it even without dense_rank():
select
t.ranking, t.[diff], t.[bool],
c.cnt + 1 as rnk
from Table1 as t
outer apply (
select count(*) as cnt
from Table1 as t2
where t2.ranking <= t.ranking and t2.[bool] = 1
) as c;
Unfortunately, in SQL Server 2008 you cannot do a running total with a window function. In SQL Server 2012 it would be possible with sum([bool]) over(order by ranking).
If you have a really big number of rows and your ranking column is a unique/primary key, you can use the recursive CTE approach, like the one in this answer; it's the fastest one in SQL Server 2008 R2:
;with cte as
(
select t.ranking, t.[diff], t.[bool], t.[bool] as rnk
from Table1 as t
where t.ranking = 1
union all
select t.ranking, t.[diff], t.[bool], t.[bool] + c.rnk as rnk
from cte as c
inner join Table1 as t on t.ranking = c.ranking + 1
)
select t.ranking, t.[diff], t.[bool], 1 + t.rnk as rnk
from cte as t
option (maxrecursion 0)
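The windowed running total that this answer mentions for SQL Server 2012 can be sketched as follows. The example runs on Python's sqlite3 module instead of SQL Server (an assumption for runnability; SQLite 3.25+ is required) and uses the rows from the question; adding 1 to the running sum of bool gives the group number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ranking INTEGER, diff INTEGER, bool INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", [
    (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0), (5, 54, 1),
    (6, 0, 0), (7, 0, 0), (8, 0, 0), (9, 0, 0), (10, 62, 1),
])

query = """
SELECT ranking,
       1 + SUM(bool) OVER (ORDER BY ranking) AS rnk  -- running count of flagged rows
FROM Table1
ORDER BY ranking
"""
groups = [row[1] for row in conn.execute(query)]
print(groups)
```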

how to detect a faulty sequence column with sql?

I have this table
ID | Seq
------------
A 1
A 2
A 3
B 1
B 2
B 3
B 3 <--duplicate seq where ID=B
C 1
C 2
C 4 <--missing seq id number 3
D 1
D 2
. .
. .
Is there a way to detect if/when there is an error in the logic of the Seq column, specifically if there are jumps and/or duplicates.
Try this; it should work in both SQL Server and Oracle:
select ID,seq
from(
select ID,seq,
row_number() over (partition by id order by seq ) rn
from t_seq)a
where a.seq<>a.rn
Both of the following are vendor-agnostic SQL, so they should work in just about any RDBMS.
This will check for a break in the sequence:
select t1.id, t1.seq
from t_seq t1
where
t1.seq <> 1
and not exists (
select *
from t_seq t2
where t2.id = t1.id
and t2.seq = t1.seq - 1
)
This will check for duplicates:
select t1.id, t1.seq
from t_seq t1
group by t1.id, t1.seq
having count(*) > 1
To get the duplicates you can use the following T-SQL.
SELECT ID, Seq FROM MyTable GROUP BY ID, Seq HAVING COUNT(Seq) > 1
Edit
To find the missing sequence numbers, I have updated the code provided by njr101 as follows:
SELECT ID, Seq FROM MyTable t1 WHERE ID IN (
SELECT ID FROM MyTable
GROUP BY ID
HAVING COUNT(DISTINCT Seq) <> MAX(Seq)
) AND t1.seq <> 1 AND NOT EXISTS (
SELECT * FROM MyTable t2 WHERE t2.id=t1.id AND t2.seq = t1.seq - 1
)
ORDER BY ID
The first subquery counts the number of distinct Seq values for each ID (ignoring duplicates). If that number is the same as the maximum Seq for that ID, the values should be fine for that ID. If it is not equal, the ID is returned by the subquery.
The second part (with the help of njr101's query) filters the result set so that it contains only the ID and Seq of the rows that come right after a gap, i.e. where missing values should be inserted. Results below:
My Data
=========
A 1
A 2
A 3
A 20 <--- Missing (displayed in results)
B 1
B 2
B 3
B 3
B 4
C 1
C 2
C 4 <--- Missing (displayed in results)
C 5
C 15 <--- Missing (displayed in results)
C 16
Results
=======
A 20
C 4
C 15
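For anyone who wants to reproduce these results, here is a minimal sketch that loads the "My Data" rows and runs both the duplicate check and the gap check. It uses Python's sqlite3 module rather than SQL Server (an assumption for runnability); the queries themselves are plain GROUP BY / NOT EXISTS SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID TEXT, Seq INTEGER)")
data = {"A": [1, 2, 3, 20], "B": [1, 2, 3, 3, 4], "C": [1, 2, 4, 5, 15, 16]}
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(key, seq) for key, seqs in data.items() for seq in seqs],
)

# Duplicates: the same (ID, Seq) pair appears more than once.
duplicates = conn.execute("""
    SELECT ID, Seq FROM MyTable
    GROUP BY ID, Seq
    HAVING COUNT(Seq) > 1
""").fetchall()

# Gaps: rows whose predecessor (Seq - 1) is missing, limited to IDs
# where the distinct count does not match the maximum Seq.
gaps = conn.execute("""
    SELECT ID, Seq FROM MyTable t1
    WHERE ID IN (
        SELECT ID FROM MyTable
        GROUP BY ID
        HAVING COUNT(DISTINCT Seq) <> MAX(Seq)
    )
    AND t1.Seq <> 1
    AND NOT EXISTS (
        SELECT 1 FROM MyTable t2 WHERE t2.ID = t1.ID AND t2.Seq = t1.Seq - 1
    )
    ORDER BY ID, Seq
""").fetchall()

print(duplicates)
print(gaps)
```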