SQL - categorize rows - sql

Below is the result set I am working with. What I would like is an additional column that identifies a X number of rows as the same. In my result set, rows 1-4 are the same (would like to mark as 1), rows 5-9 are the same (mark as 2); row 10 (mark as 3)
How is this possible using just SQL? I can't seem to do this using rank or dense_rank functions.
ranking diff bool
-------------------- ----------- -----------
1 0 0
2 0 0
3 0 0
4 0 0
5 54 1
6 0 0
7 0 0
8 0 0
9 0 0
10 62 1

In general case you can do something like this:
select
t.ranking, t.[diff], t.[bool],
dense_rank() over(order by c.cnt) as rnk
from Table1 as t
outer apply (
select count(*) as cnt
from Table1 as t2
where t2.ranking <= t.ranking and t2.[bool] = 1
) as c
In your case you can do it even without dense_rank():
select
t.ranking, t.[diff], t.[bool],
c.cnt + 1 as rnk
from Table1 as t
outer apply (
select count(*) as cnt
from Table1 as t2
where t2.ranking <= t.ranking and t2.[bool] = 1
) as c;
Unfortunately, in SQL Server 2008 you cannot do running total with window function, in SQL Server 2012 it'd be possible to do it with sum([bool]) over(order by ranking).
If you have really big number of rows and your ranking column is unique/primary key, you can use recursive cte approach - like one in this answer, it's fastest one in SQL Server 2008 R2:
;with cte as
(
select t.ranking, t.[diff], t.[bool], t.[bool] as rnk
from Table1 as t
where t.ranking = 1
union all
select t.ranking, t.[diff], t.[bool], t.[bool] + c.rnk as rnk
from cte as c
inner join Table1 as t on t.ranking = c.ranking + 1
)
select t.ranking, t.[diff], t.[bool], 1 + t.rnk
from cte as t
option (maxrecursion 0)
sql fiddle demo

Related

SQL Server - Sum of difference between rows in a table

I have a table in the format :
SomeID SomeData
1 3
2 7
3 9
4 10
5 14
6 16
. .
. .
I want to find sum of difference between rows in this table. i.e ( (7-3) + (10-9) + (16-14) + ....)
Which is the best way to do this
Using a self join along with the modulus:
SELECT SUM(t1.SomeData - t2.SomeData) AS total_diff
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.SomeID = t2.SomeID + 1
WHERE t1.SomeID % 2 = 0;
Demo
This answer assumes that the SomeID sequence in fact starts with 1 and increments by 1 with each subsequent row. If not, then we might be able to first apply ROW_NUMBER over SomeID and generate a 1 to N sequence.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY SomeID) rn
FROM yourTable
)
SELECT SUM(t1.SomeData - t2.SomeData) AS total_diff
FROM cte t1
INNER JOIN cte t2
ON t1.SomeID = t2.SomeID + 1
WHERE t1.rn % 2 = 0;
You can try to use ROW_NUMBER window function to make a serial number then MOD by 2 to get your expected group then use condition aggregate function.
Query 1:
SELECT SUM(CASE WHEN rn = 0 THEN SomeData END) - SUM(CASE WHEN rn = 1 THEN SomeData END)
FROM (
SELECT SomeData,ROW_NUMBER() over(order by SomeID) % 2 rn
FROM t t1
) t1
Results:
| |
|---|
| 7 |

ROW_Number with Custom Group

I am trying to have row_number based on custom grouping but I am not able to produce it.
Below is my Query
CREATE TABLE mytbl (wid INT, id INT)
INSERT INTO mytbl Values(1,1),(2,1),(3,0),(4,2),(5,3)
Current Output
wid id
1 1
2 1
3 0
4 2
5 3
Query
SELECT *, RANK() OVER(PARTITION BY wid, CASE WHEN id = 0 THEN 0 ELSE 1 END ORDER BY ID)
FROM mytbl
I would like to rank the rows based on custom condition like if ID is 0 then I have start new group until I have non 0 ID.
Expected Output
wid id RN
1 1 1
2 1 1
3 0 1
4 2 2
5 3 2
Guessing here, as we don't have much clarification, but perhaps this:
SELECT wid,
id,
COUNT(CASE id WHEN 0 THEN 1 END) OVER (ORDER BY wid ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) +1 AS [Rank]
FROM mytbl ;
If I understand you correctly, you may use the next approach. Note, that you need to have an ordering column (I assume this is wid column):
Statement:
;WITH ChangesCTE AS (
SELECT
*,
CASE WHEN LAG(id) OVER (ORDER BY wid) = 0 THEN 1 ELSE 0 END AS ChangeIndex
FROM mytbl
), GroupsCTE AS (
SELECT
*,
SUM(ChangeIndex) OVER (ORDER BY wid) AS GroupIndex
FROM ChangesCTE
)
SELECT
wid,
id,
DENSE_RANK() OVER (ORDER BY GroupIndex) AS Rank
FROM GroupsCTE
Result:
wid id Rank
1 1 1
2 1 1
3 0 1
4 2 2
5 3 2
without much clarification on the logic required, my understanding is you want to increase the Rank by 1 whenever id = 0
select wid, id,
[Rank] = sum(case when id = 0 then 1 else 0 end) over(order by wid)
+ case when id <> 0 then 1 else 0 end
from mytbl
Try this,
CREATE TABLE #mytbl (wid INT, id INT)
INSERT INTO #mytbl Values(1,1),(2,1),(3,0)
,(4,2),(5,3),(6,0),(7,4),(8,5),(9,6)
;with CTE as
(
select *,ROW_NUMBER()over(order by wid)rn
from #mytbl where id=0
)
,CTE1 as
(
select max(rn)+1 ExtraRN from CTE
)
select a.* ,isnull(ca.rn,ca1.ExtraRN) from #mytbl a
outer apply(select top 1 * from CTE b
where a.wid<=b.wid )ca
cross apply(select ExtraRN from CTE1)ca1
drop table #mytbl
Here both OUTER APPLY and CROSS APPLY will not increase cardianility estimate.It will always return only one rows.

Running total (COUNT) SQL Server

I currently have this result
ID Code
1 AAA12
2 F5
3 GOFK568
4 G77
5 JLKJ4
6 FOG0
Now what i want to do is to create a third column that keeps a running total for codes that are above 4 in length.
Now, i have this code that gives me the sum of the code with above 4 in length.
SELECT * ,
SUM(CASE WHEN LENGTH(CODE) > 4 THEN 1 ELSE 0 END) AS [Count]
FROM Table1;
But this gives me this result
ID Code Count
1 AAA12 3
I am looking for a result like this
ID Code Running_Total
1 AAA12 1
2 F5 1
3 GOFK568 2
4 G77 2
5 JLKJ4 3
6 FOG0 3
I was working on something similar to this
SELECT * ,
CASE WHEN LENGTH(CODE) > 4 THEN (SUM(Code) OVER (PARTITION BY ID)) ELSE END
AS [Count]
FROM Table1;
But it still doesn't give me a running total.
I have an SQL Fiddle page
http://sqlfiddle.com/#!9/2746c/18
Any help would be great
Put the case in the sum:
SELECT Table1.* ,
SUM(case when len(Code) > 4 then 1 else 0 end) OVER (order BY ID) as counted
FROM Table1;
In Sql Server 2012+ you can use Sum() Over(Order by) function
SELECT Sum(CASE WHEN Len(code) > 4 THEN 1 ELSE 0 END)
OVER(ORDER BY id)
FROM Yourtable
for older versions
SELECT *
FROM Yourtable a
CROSS apply (SELECT Count(*)
FROM Yourtable b
WHERE a.ID >= b.ID
AND Len(code) > 4) cs (runn)
ANSI SQL method
SELECT ID,Code,
(SELECT count(*)
FROM Yourtable b
WHERE a.ID >= b.ID and char_length(code) > 4) AS runn
FROM Yourtable a
There are some good and efficient answers here.
But in case you want to try different approach then try following query:
SELECT
t1.*,
(Select sum(r.cnt) from
(SELECT COUNT(t2.code) as cnt FROM table1 AS t2
WHERE t2.Id <= t1.Id
group by t2.code
having len(t11.code) > 4) r
) AS Count
FROM table1 AS t1;
Here is the DEMO
Hope it helps!

SQL Select first missing value in a series

I was asked the following: I have a table (lets call it tbl) and it has one column of type int (let call it num) and it had sequential numbers in it:
num
---
1
2
4
5
6
8
9
11
Now you need to write a query that returns the first missing number (in this example that answer would be 3).
here's my answer (works):
select top 1 (num + 1)
from tbl
where (num + 1) not in (select num from tbl)
after writing this, I was asked, what if tbl contained 10 million records - how would you improve performence (because obviously myinner query would cause a full table scan).
My thoughts were about an index in on the num field and doing a not exists. but I would love to hear some alternatives.
In SQL Server 2012+, I would simply use lead():
select num
from (select num, lead(num) over (order by num) as next_num
from tbl
) t
where next_num <> num + 1;
However, I suspect that your version would have the best performance if you have an index on tbl(num). The not exists version is worth testing:
select top 1 (num + 1)
from tbl t
where not exists (select 1 from tbl t2 where t2.num = t.num + 1);
The only issue is getting the first number. You are not guaranteed that the table is read "in order". So, this will return one number. With an index (or better yet, clustered index) on num, the following should be fast and guaranteed to return the first number:
select top 1 (num + 1)
from tbl t
where not exists (select 1 from tbl t2 where t2.num = t.num + 1)
order by num;
Here is another approach using ROW_NUMBER:
SQL Fiddle
;WITH CteRN AS(
SELECT *,
RN = num - ROW_NUMBER() OVER(ORDER BY num)
FROM tbl
)
SELECT TOP 1 num - RN
FROM CteRN
WHERE RN > 0
ORDER BY num ASC
With a proper INDEX on num, here is the stats under a million row test harness.
Original - NOT IN : CPU time = 296 ms, elapsed time = 289 ms
wewesthemenace : CPU time = 0 ms, elapsed time = 0 ms
notulysses(NOT EXISTS): CPU time = 687 ms, elapsed time = 679 ms.
All the above answers are accurate alternatively if we can make an table like date dimension that has all sequential number starting from 1 to n.
insert into sequenceNumber
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11 union all
select 12 union all
select 13
select * from sequenceNumber
RESULT:
1
2
3
4
5
6
7
8
9
10
11
12
13
Now we can use below query to find out missing number.
select a.num,b.num
from sequenceNumber a
left outer join tbl b on a.num = b.num
where b.num is null
Hope it will be useful.
You can try with not exists:
select top 1 t.num + 1
from tbl t
where not exists (select * from tbl where num = t.num + 1) order by t.num
SQLFiddle
Or use a row_number:
select top 1 t.r
from (
select num
, row_number() over (order by num) as r
from tbl) t
where t.r <> t.num
order by t.num
SQLFiddle
from my side also one example
-- test data
DECLARE #Sequence AS TABLE ( Num INT )
INSERT INTO #Sequence
( Num )
VALUES ( 1 ),
( 2 ),
( 4 ),
( 5 ),
( 6 ),
( 8 ),
( 9 ),
( 11 )
--Final query
SELECT TOP 1
S.RN AS [Missing]
FROM ( SELECT RN = ROW_NUMBER() OVER ( ORDER BY num )
FROM #Sequence
) AS S
LEFT JOIN #Sequence AS S2 ON S.RN = S2.Num
WHERE S2.Num IS NULL
ORDER BY S.RN

MS Sql Server, same column with a different row neighbors

I need a little help on a SQL query. I could not get the result that I wanted.
ID I10 H 10NS HNS CC NSCC
0 1 1 1 1 14 14
1 0 1 0 1 6 2
1 0 2 0 2 12 2
1 0 3 0 3 17 4
1 0 3 0 3 18 4
1 0 3 0 3 19 4
1 0 3 0 3 20 4
What I want to have is one from each ID with highest CC
For example,
ID I10 H 10NS HNS CC NSCC
0 1 1 1 1 14 14
1 0 3 0 3 20 4
I tried with this code:
SELECT a.ID, b.name, a.i10 as[i-10-index], a.h as[h-index], 10ns as[i-10-index based on non-self-citation], a.hns as [h-index based on non-self-citation],
max(a.[Citation Count]), (a.[Non-Self-Citation Count])
FROM tbl_lpNumerical as a
join tbl_lpAcademician as b
on a.ID= (b.ID-1)
GROUP BY a.ID, b.name, a.i10, a.h, a.10ns, a.hns,
a.[Non-Self-Citation Count]
order by a.ID desc
However, I could not get the desired results.
Thank you for your time.
You can simply get all the row where not exist another row with an higher CC
SELECT n.*
FROM tbl_lpNumerical n
WHERE NOT EXISTS ( SELECT 'b'
FROM tbl_lpNumerical n2
WHERE n2.ID = n.ID
AND n2.CC > n.CC
)
In SQL Server, you can use row_number() for this. Based on your sample data`, something like:
select sd.*
from (select sd.*, row_number() over (partition by id order by cc desc) as seqnum
from sampledata sd
) sd
where seqnum = 1;
I have no idea what your query has to do with the sample data. If it generates the data, then you can use a CTE:
with sampledata as (
<some query here>
)
select sd.*
from (select sd.*, row_number() over (partition by id order by cc desc) as seqnum
from sampledata sd
) sd
where seqnum = 1;
The following query will select a single row from each ID partition: the one with the highest CC value:
SELECT *
FROM (SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CC DESC) AS rn
FROM mytable) t
WHERE t.rn = 1
If there can be multiple rows having the same CC max value and you want all of them selected, then you can replace ROW_NUMBER() with RANK().