SQL ranking groups based on the values in a field - sql

I have a table in which one column's values will be like this.
C
H
C
H
H
H
H
C
H
H
H
Each "C" record is followed by one or more "H" records.
I am trying to assign a group number to each set of one "C" and its following "H" records, like this:
C 1
H 1
C 2
H 2
H 2
H 2
H 2
C 3
H 3
H 3
H 3
I don't want to use cursors for fear of poor performance. How can I assign a unique number to each subset of one "C" and its following "H" records?

As long as your data is clean and consistent, this isn't too hard on platforms that support standard SQL window functions. You do need another column that you can meaningfully order by.
Let's build it up one piece at a time. (Written in PostgreSQL 9.3.)
create table test (
test_id serial primary key,
test_val char(1)
);
insert into test(test_val) values
('C'), ('H'),
('C'),('H'),('H'),('H'),('H'),
('C'),('H'),('H'),('H');
We can tell when a group starts by looking at the next row.
select test_id, test_val,
lead(test_val) over (order by test_id) next_test_val
from test;
First three rows from that query.
test_id test_val next_test_val
--
1 C H
2 H C
3 C H
...
By checking for the "C" then "H" combination, we can identify the start of a group. (The previous query becomes a common table expression.)
with next_vals as (
select test_id, test_val,
lead(test_val) over (order by test_id) next_test_val
from test
)
select *, case when test_val = 'C' and next_test_val = 'H' then test_id
end as grp
from next_vals;
Here are the first four rows from that result set. The id numbers are convenient for identifying a group.
test_id test_val next_test_val grp
--
1 C H 1
2 H C
3 C H 3
4 H H
...
Another window function fills in the blanks. Again, the previous query becomes a CTE. The WHERE clause guards against a "C" row followed by another "C" row.
with next_vals as (
select test_id, test_val,
lead(test_val) over (order by test_id) next_test_val
from test
), group_starts as (
select *
, case when test_val = 'C' and next_test_val = 'H' then test_id
end as grp
from next_vals
)
select test_id, test_val, max(grp) over (order by test_id) as ch_group
from group_starts
where not (test_val = 'C' and next_test_val = 'C')
order by test_id;
test_id test_val ch_group
--
1 C 1
2 H 1
3 C 3
4 H 3
5 H 3
6 H 3
7 H 3
8 C 8
9 H 8
10 H 8
11 H 8
I added some line breaks to make it easier to read.
I don't know whether this will perform better than a cursor.
For sequential group numbers . . .
with next_vals as (
select test_id, test_val,
lead(test_val) over (order by test_id) next_test_val
from test
), group_starts as (
select *
, case when test_val = 'C' and next_test_val = 'H' then test_id
end as grp
from next_vals
), grouped_values as (
select test_id, test_val, max(grp) over (order by test_id) as ch_group
from group_starts
where not (test_val = 'C' and next_test_val = 'C')
)
select test_id, test_val,
dense_rank() over (order by ch_group)
from grouped_values
order by test_id;
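If every "C" row starts a new group (the clean-data assumption stated at the top of this answer), a running count of "C" rows gives the sequential group number in one step. The following is only a sketch of that alternative, not the query used above. It runs the SQL through Python's built-in sqlite3 module, assumes SQLite 3.25 or later for window-function support, and uses a hypothetical helper name, number_groups.

```python
import sqlite3


def number_groups(values):
    """Return (test_id, test_val, ch_group) rows for the given column values.

    ch_group is the number of 'C' rows seen so far in test_id order.
    """
    con = sqlite3.connect(":memory:")
    # Same shape as the test table above; integer primary key auto-numbers from 1.
    con.execute(
        "create table test (test_id integer primary key, test_val char(1))"
    )
    con.executemany(
        "insert into test (test_val) values (?)", [(v,) for v in values]
    )
    rows = con.execute(
        """
        select test_id, test_val,
               sum(case when test_val = 'C' then 1 else 0 end)
                   over (order by test_id) as ch_group
        from test
        order by test_id
        """
    ).fetchall()
    con.close()
    return rows
```

Unlike the queries above, this sketch does not filter out a "C" row that is immediately followed by another "C"; such a row would simply form a group of its own.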

Here is one possible solution that works in MS SQL Server 2008, which doesn't have the LEAD function (it was added in a later version). This solution also numbers groups sequentially without gaps, as shown in the desired output.
It uses only the ROW_NUMBER() function and CROSS APPLY.
It requires an ID column that uniquely identifies each row and that we can sort the results by.
Create a test table with sample data:
CREATE TABLE #TT (ID int IDENTITY(1,1) PRIMARY KEY, Val char(1));
INSERT INTO #TT VALUES('C');
INSERT INTO #TT VALUES('H');
INSERT INTO #TT VALUES('C');
INSERT INTO #TT VALUES('H');
INSERT INTO #TT VALUES('H');
INSERT INTO #TT VALUES('H');
INSERT INTO #TT VALUES('H');
INSERT INTO #TT VALUES('C');
INSERT INTO #TT VALUES('H');
INSERT INTO #TT VALUES('H');
INSERT INTO #TT VALUES('H');
Get a list of all rows with the value C. Each group starts with C, so there are as many groups as there are Cs in the data. It doesn't matter what other values the column contains; they need not be just H. The query has no hard-coded H, only C.
WITH
CTE_C
AS
(
SELECT ID, Val, ROW_NUMBER() OVER(ORDER BY ID) AS rn
FROM #TT AS T
WHERE Val = 'C'
)
The output of this CTE (SELECT * FROM CTE_C) is:
ID Val rn
1 C 1
3 C 2
8 C 3
Now, for each row of the original data, we need to find the matching row in the CTE: the one with the greatest ID that is less than or equal to the row's ID, and hence the right rn. We use CROSS APPLY for this.
WITH
CTE_C
AS
(
SELECT ID, Val, ROW_NUMBER() OVER(ORDER BY ID) AS rn
FROM #TT AS T
WHERE Val = 'C'
)
SELECT T.ID, T.Val, CTE_rn.rn
FROM
#TT AS T
CROSS APPLY
(
SELECT TOP(1) CTE_C.rn
FROM CTE_C
WHERE CTE_C.ID <= T.ID
ORDER BY CTE_C.ID DESC
) AS CTE_rn
ORDER BY T.ID;
This is the final result:
ID Val rn
1 C 1
2 H 1
3 C 2
4 H 2
5 H 2
6 H 2
7 H 2
8 C 3
9 H 3
10 H 3
11 H 3
In terms of performance, you need to test the various solutions with your actual data on your actual system. ID should have a unique index. Most likely an index on Val would be beneficial as well.
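The CROSS APPLY lookup amounts to finding, for each row, the nearest "C" row at or before it. As an illustration of that logic only (it is not T-SQL and says nothing about how SQL Server executes the query), here is a small Python sketch. The function name assign_groups and the in-memory list of (ID, Val) tuples are assumptions made for the example.

```python
from bisect import bisect_right


def assign_groups(rows):
    """rows: list of (id, val) tuples sorted by id. Returns (id, val, rn) tuples."""
    # IDs of the 'C' rows in order; position + 1 plays the role of rn in CTE_C.
    c_ids = [row_id for row_id, val in rows if val == "C"]
    result = []
    for row_id, val in rows:
        # Count of 'C' rows with ID <= this ID, which is the rn of the
        # row picked by TOP(1) ... ORDER BY CTE_C.ID DESC.
        rn = bisect_right(c_ids, row_id)
        if rn > 0:  # like CROSS APPLY, drop rows that have no preceding 'C'
            result.append((row_id, val, rn))
    return result
```

Called with the eleven sample rows, it returns the same rn values as the final result above.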

Related

Reorder the rows of a table according to the numbers of similar cells in a specific column using SQL

I have a table like this:
D S
2 1
2 3
4 2
4 3
4 5
6 1
in which the symptom codes (S) of three diseases (D) are shown. I want to reorder this table (D-S) so that diseases with more symptoms come first, i.e. order it by decreasing number of symptoms, as below:
D S
4 2
4 3
4 5
2 1
2 3
6 1
Can anyone help me write the SQL for this in SQL Server?
I tried the following, but it doesn't work:
SELECT *
FROM (
select D, Count(S) cnt
from [D-S]
group by D
) Q
order by Q.cnt desc
select
D,
S
from
[D-S]
order by
count(*) over(partition by D) desc,
D,
S;
Two easy ways to approach this:
--==== Sample Data
CREATE TABLE #t (D INT, S INT);
INSERT #t VALUES(2,1),(2,3),(4,2),(4,3),(4,5),(6,1);
--==== Using Window Function
SELECT t.D, t.S
FROM (SELECT t.*, Rnk = COUNT(*) OVER (PARTITION BY t.D) FROM #t AS t) AS t
ORDER BY t.Rnk DESC;
--==== Using standard GROUP BY
SELECT t.*
FROM #t AS t
JOIN
(
SELECT t2.D, Cnt = COUNT(*)
FROM #t AS t2
GROUP BY t2.D
) AS t2 ON t.D = t2.D
ORDER BY t2.Cnt DESC;
Results:
D S
----------- -----------
4 2
4 3
4 5
2 1
2 3
6 1
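Both queries sort by the per-disease count only, so the relative order of diseases with equal counts (and of rows within a disease) is left to the engine. The sketch below illustrates the same ordering with explicit tie-breakers on D and then S, as in the ORDER BY of the first answer. It is written in Python purely to show the logic; order_by_group_size is a hypothetical name.

```python
from collections import Counter


def order_by_group_size(rows):
    """rows: list of (D, S) tuples. Sort by rows-per-D descending, then D, then S."""
    # Equivalent of COUNT(*) OVER (PARTITION BY D).
    counts = Counter(d for d, _ in rows)
    return sorted(rows, key=lambda r: (-counts[r[0]], r[0], r[1]))
```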

Perform ranking depend on category

I have a table that looks like this:
RowNum category Rank4A Rank4B
-------------------------------------------
1 A
2 A
3 B
5 A
6 B
9 B
My requirement is to add two new ranking columns, based on RowNum order and depending on category. Rank4A works like DENSE_RANK() over the rows with category = A, but a row with category B takes the most recent category A rank that precedes it in RowNum order. Rank4B follows similar logic, but in descending RowNum order. So the result would look like this (W means I don't care about the value of that cell):
RowNum category Rank4A Rank4B
-------------------------------------------
1 A 1 W
2 A 2 W
3 B 2 3
5 A 3 2
6 B W 2
9 B W 1
One additional requirement is that CROSS APPLY and CURSOR are not allowed, because the dataset is large. Any neat solutions?
Edit: Also no CTE (due to MAX 32767 limit)
You can use the following query:
SELECT RowNum, category,
SUM(CASE
WHEN category = 'A' THEN 1
ELSE 0
END) OVER (ORDER BY RowNum) AS Rank4A,
SUM(CASE
WHEN category = 'B' THEN 1
ELSE 0
END) OVER (ORDER BY RowNum DESC) AS Rank4B
FROM mytable
ORDER BY RowNum
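To see what the two running sums produce on the sample rows, here is a sketch that runs the same query through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later for window functions (the question itself targets SQL Server), and the helper name rank_by_category is hypothetical.

```python
import sqlite3


def rank_by_category(rows):
    """rows: list of (RowNum, category). Returns (RowNum, category, Rank4A, Rank4B)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table mytable (RowNum integer, category text)")
    con.executemany("insert into mytable values (?, ?)", rows)
    result = con.execute(
        """
        select RowNum, category,
               sum(case when category = 'A' then 1 else 0 end)
                   over (order by RowNum) as Rank4A,
               sum(case when category = 'B' then 1 else 0 end)
                   over (order by RowNum desc) as Rank4B
        from mytable
        order by RowNum
        """
    ).fetchall()
    con.close()
    return result
```

For the six sample rows this yields Rank4A values 1, 2, 2, 3, 3, 3 and Rank4B values 3, 3, 3, 2, 2, 1, which agree with the requested output wherever the question cares about the value.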
Giorgos Betsos' answer is better, please read it first.
Try this out. I believe each CTE is clear enough to show the steps.
IF OBJECT_ID('tempdb..#Data') IS NOT NULL
DROP TABLE #Data
CREATE TABLE #Data (
RowNum INT,
Category CHAR(1))
INSERT INTO #Data (
RowNum,
Category)
VALUES
(1, 'A'),
(2, 'A'),
(3, 'B'),
(5, 'A'),
(6, 'B'),
(9, 'B')
;WITH AscendentDenseRanking AS
(
SELECT
D.RowNum,
D.Category,
AscendentDenseRanking = DENSE_RANK() OVER (ORDER BY D.Rownum ASC)
FROM
#Data AS D
WHERE
D.Category = 'A'
),
LaggedRankingA AS
(
SELECT
D.RowNum,
AscendentDenseRankingA = MAX(A.AscendentDenseRanking)
FROM
#Data AS D
INNER JOIN AscendentDenseRanking AS A ON D.RowNum > A.RowNum
WHERE
D.Category = 'B'
GROUP BY
D.RowNum
),
DescendantDenseRanking AS
(
SELECT
D.RowNum,
D.Category,
DescendantDenseRanking = DENSE_RANK() OVER (ORDER BY D.Rownum DESC)
FROM
#Data AS D
WHERE
D.Category = 'B'
),
LaggedRankingB AS
(
SELECT
D.RowNum,
AscendentDenseRankingB = MAX(A.DescendantDenseRanking)
FROM
#Data AS D
INNER JOIN DescendantDenseRanking AS A ON D.RowNum < A.RowNum
WHERE
D.Category = 'A'
GROUP BY
D.RowNum
)
SELECT
D.RowNum,
D.Category,
Rank4A = ISNULL(RA.AscendentDenseRanking, LA.AscendentDenseRankingA),
Rank4B = ISNULL(RB.DescendantDenseRanking, LB.AscendentDenseRankingB)
FROM
#Data AS D
LEFT JOIN AscendentDenseRanking AS RA ON D.RowNum = RA.RowNum
LEFT JOIN LaggedRankingA AS LA ON D.RowNum = LA.RowNum
LEFT JOIN DescendantDenseRanking AS RB ON D.RowNum = RB.RowNum
LEFT JOIN LaggedRankingB AS LB ON D.RowNum = LB.RowNum
/*
Results:
RowNum Category Rank4A Rank4B
----------- -------- -------------------- --------------------
1 A 1 3
2 A 2 3
3 B 2 3
5 A 3 2
6 B 3 2
9 B 3 1
*/
This isn't a recursive CTE, so the 32k limit doesn't apply.

Find the Row count of each value

I have a table like this:
Name
A
B
B
C
C
C
A
A
B
B
I need a query that returns output like this:
Name count
A 1
B 2
C 3
A 2
B 2
I tried rank() and dense_rank(), but I am not able to get this output.
A simple COUNT with GROUP BY, or a window function on its own, does not solve this, because the same name can appear in several separate groups. I prefer to use two helper fields: one for the row number and one for the group number. Iterate through the table and increase the group value whenever the next name differs from the current one:
Assuming your table is:
create table tblN (Name varchar(10))
insert into tblN values
('A'),
('B'),
('B'),
('C'),
('C'),
('C'),
('A'),
('A'),
('B'),
('B');
The following query implements the explanation above:
;with cte1 as(
select 1 gp,name -- add gp for group number
from tblN
),
cte2 as(
select gp,name,
row_number() over(order by gp) rn --add rn for evaluating groups
from cte1
),
cte3 as(
select gp,name,rn from cte2 where rn=1
union all
select case --evaluate groups
when c2.name=c3.name then c3.gp
else c3.gp+1
end gp,
c2.name,c2.rn
from cte3 c3
join cte2 c2 on c3.rn+1=c2.rn
)
select gp,name from cte3 --[1]
Result:
gp name
1 A
2 B
2 B
3 C
3 C
3 C
4 A
4 A
5 B
5 B
Now, in the query above, replace line [1] with the following:
select name , count from(
select top 1000 gp,name,
count(name) count
from cte3
group by gp,name
order by gp) q
Result:
name count
A 1
B 2
C 3
A 2
B 2
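The grouping rule described above (start a new group whenever the name differs from the previous row) is a run-length count. As a sketch of that logic outside SQL, the Python below counts consecutive runs. It assumes the rows arrive in the order shown in the question, since the table itself has no ordering column; run_lengths is a hypothetical name.

```python
from itertools import groupby


def run_lengths(names):
    """Return (name, count) for each run of consecutive equal names."""
    # groupby starts a new group every time the value changes,
    # which matches incrementing gp in the recursive CTE.
    return [(name, sum(1 for _ in run)) for name, run in groupby(names)]
```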
You can achieve this very easily with the following query:
SELECT * FROM (SELECT ROW_NUMBER() OVER(PARTITION BY NAME ORDER BY NAME) AS COUNT,NAME FROM TABLENAME )T ORDER BY 1,2

SELECT records until new value SQL

I have a table
Val | Number
08 | 1
09 | 1
10 | 1
11 | 3
12 | 0
13 | 1
14 | 1
15 | 1
I need to return the last values where Number = 1 (however many that may be) until Number changes, but do not need the first instances where Number = 1. Essentially I need to select back until Number changes to 0 (15, 14, 13)
Is there a proper way to do this in MSSQL?
Based on the following:
I need to return the last values where Number = 1
Essentially I need to select back until Number changes to 0 (15, 14, 13)
Try (Fiddle demo):
select val, number
from T
where val > (select max(val)
from T
where number<>1)
EDIT: to address all possible combinations (Fiddle demo 2)
;with cte1 as
(
select 1 id, max(val) maxOne
from T
where number=1
),
cte2 as
(
select 1 id, isnull(max(val),0) maxOther
from T
where val < (select maxOne from cte1) and number<>1
)
select val, number
from T cross join
(select maxOne, maxOther
from cte1 join cte2 on cte1.id = cte2.id
) X
where val>maxOther and val<=maxOne
I think you can use window functions, something like this:
with cte as (
-- generate two row_number to enumerate distinct groups
select
Val, Number,
row_number() over(partition by Number order by Val) as rn1,
row_number() over(order by Val) as rn2
from Table1
), cte2 as (
-- get groups with Number = 1 and last group
select
Val, Number,
rn2 - rn1 as rn1, max(rn2 - rn1) over() as rn2
from cte
where Number = 1
)
select Val, Number
from cte2
where rn1 = rn2
DEMO: http://sqlfiddle.com/#!3/e7d54/23
DDL
create table T(val int identity(8,1), number int)
insert into T values
(1),(1),(1),(3),(0),(1),(1),(1),(0),(2)
DML
; WITH last_1 AS (
SELECT Max(val) As val
FROM t
WHERE number = 1
)
, last_non_1 AS (
SELECT Coalesce(Max(val), -937) As val
FROM t
WHERE EXISTS (
SELECT val
FROM last_1
WHERE last_1.val > t.val
)
AND number <> 1
)
SELECT t.val
, t.number
FROM t
CROSS
JOIN last_1
CROSS
JOIN last_non_1
WHERE t.val <= last_1.val
AND t.val > last_non_1.val
I know it's a little verbose but I've deliberately kept it that way to illustrate the methodology.
1. Find the highest val where number=1.
2. For all values where the val is less than the number found in step 1, find the largest val where the number<>1
3. Finally, find the rows that fall within the values we uncovered in steps 1 & 2.
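The three steps can be sketched outside SQL as follows. This is an illustration of the logic only, written in Python with a hypothetical function name, trailing_run; it uses negative infinity where the query uses the -937 sentinel for "no earlier non-1 row".

```python
def trailing_run(rows, target=1):
    """rows: list of (val, number). Return the last run of rows where number == target."""
    # Step 1: highest val where number equals the target.
    hits = [val for val, number in rows if number == target]
    if not hits:
        return []
    last_hit = max(hits)
    # Step 2: highest val below that whose number differs from the target.
    others = [val for val, number in rows if number != target and val < last_hit]
    last_other = max(others) if others else float("-inf")
    # Step 3: keep the rows strictly after last_other, up to and including last_hit.
    return sorted(
        (val, number) for val, number in rows if last_other < val <= last_hit
    )
```

With the ten rows from the DDL above (val 8 to 17), it returns the rows with val 13, 14 and 15.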
select val, count (number) from
yourtable
group by val
having count(number) > 1
The having clause is the key here, giving you all the vals that have more than one value of 1.
This is a common approach for getting rows until some value changes. For your specific case use desc in proper spots.
Create sample table
select * into #tmp from
(select 1 as id, 'Alpha' as value union all
select 2 as id, 'Alpha' as value union all
select 3 as id, 'Alpha' as value union all
select 4 as id, 'Beta' as value union all
select 5 as id, 'Alpha' as value union all
select 6 as id, 'Gamma' as value union all
select 7 as id, 'Alpha' as value) t
Pull top rows until value changes:
with cte as (select * from #tmp t)
select * from
(select cte.*, ROW_NUMBER() over (order by id) rn from cte) OriginTable
inner join
(
select cte.*, ROW_NUMBER() over (order by id) rn from cte
where cte.value = (select top 1 cte.value from cte order by cte.id)
) OnlyFirstValueRecords
on OriginTable.rn = OnlyFirstValueRecords.rn and OriginTable.id = OnlyFirstValueRecords.id
On the left side we put the original table. On the right side we put only the rows whose value equals the value in the first row.
The records on both sides are the same until the target value changes. After row 3, the row numbers on the right are paired with different IDs because of the offset, so those rows never join to the original table:
LEFT RIGHT
ID Value RN ID Value RN
1 Alpha 1 | 1 Alpha 1
2 Alpha 2 | 2 Alpha 2
3 Alpha 3 | 3 Alpha 3
----------------------- result set ends here
4 Beta 4 | 5 Alpha 4
5 Alpha 5 | 7 Alpha 5
6 Gamma 6 |
7 Alpha 7 |
The ID must be unique. The ordering by this ID must be the same in both ROW_NUMBER() functions.
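To make the offset argument concrete, the sketch below mimics the join in Python: it numbers all rows, numbers only the rows that carry the first value, and keeps the pairs whose row number and ID both match. It is an illustration under the same assumption (unique IDs, same ordering on both sides); leading_run is a hypothetical name.

```python
def leading_run(rows):
    """rows: list of (id, value). Return the leading rows that share the first value."""
    ordered = sorted(rows, key=lambda r: r[0])
    if not ordered:
        return []
    first_value = ordered[0][1]
    # Left side: every row, keyed by (row number, id).
    left = {
        (rn, row_id): value
        for rn, (row_id, value) in enumerate(ordered, start=1)
    }
    # Right side: only rows with the first value, numbered independently.
    matching = [r for r in ordered if r[1] == first_value]
    right = [(rn, row_id) for rn, (row_id, _) in enumerate(matching, start=1)]
    # Inner join on rn and id: the keys stop lining up once a different value appears.
    return [(row_id, left[(rn, row_id)]) for rn, row_id in right if (rn, row_id) in left]
```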

Getting all consecutive rows differing by certain value?

I am trying to get my head around doing this as it involves comparison of consecutive rows. I am trying to group values that differ by a certain number. For instance, let us say I have this table:
CREATE TABLE #TEMP (A int, B int)
-- Sample table
INSERT INTO #TEMP VALUES
(3,1),
(3,2),
(3,3),
(3,4),
(5,1),
(6,1),
(7,2),
(8,3),
(8,4),
(8,5),
(8,6)
SELECT * FROM #TEMP
DROP TABLE #TEMP
And let us say I have to group all values that differ by 1 having the same value for A. Then I am trying to get an output like this:
A B GroupNo
3 1 1
3 2 1
3 3 1
3 4 1
5 1 2
6 1 3
7 2 4
8 3 5
8 4 5
8 5 5
8 6 5
(3,1) (3,2) (3,3) (3,4) and (8,3) (8,4) (8,5) (8,6) have been put into the same group because they differ by a value of 1. I will first show my attempt:
CREATE TABLE #TEMP (A int, B int)
-- Sample table
INSERT INTO #TEMP VALUES
(3,1), (3,2), (3,3), (3,4), (5,1), (6,1), (7,2),
(8,3), (8,4), (8,5), (8,6)
-- Assign row numbers and perform a left join
-- so that we can compare consecutive rows
SELECT ROW_NUMBER() OVER (ORDER BY A ASC) ID, *
INTO #TEMP2
FROM #TEMP
;WITH CTE AS
(
SELECT X.A XA, X.B XB, Y.A YA, Y.B YB
FROM #TEMP2 X
LEFT JOIN #TEMP2 Y
ON X.ID = Y.ID - 1
WHERE X.A = Y.A AND
X.B = Y.B - 1
)
SELECT XA, XB
INTO #GROUPS
FROM CTE
UNION
SELECT YA, YB
FROM CTE
ORDER BY XA ASC
-- Finally assign group numbers
SELECT X.XA, X.XB, Y.GID
FROM #GROUPS X
INNER JOIN
(SELECT XA, ROW_NUMBER() OVER (ORDER BY XA ASC) GID
FROM #GROUPS Y
GROUP BY XA
) Y
ON X.XA = Y.XA
DROP TABLE #TEMP
DROP TABLE #TEMP2
DROP TABLE #GROUPS
I will be doing this on a large table (about 30 million rows) so I was hoping there is a better way of doing this for arbitrary values (for instance, not just differing by 1, but it could be 2 or 3 which I will incorporate later into a procedure). Any suggestions on whether my approach is bug-free and if it can be improved?
For the case where they differ by one you can use
;WITH T AS
(
SELECT *,
B - DENSE_RANK() OVER (PARTITION BY A ORDER BY B) AS Grp
FROM #TEMP
)
SELECT A,
B,
DENSE_RANK() OVER (ORDER BY A,Grp) AS GroupNo
FROM T
ORDER BY A, Grp
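The query relies on B minus its dense rank staying constant while B rises by exactly 1 per row, and jumping when there is a gap. The Python sketch below reproduces that arithmetic step by step so the intermediate Grp values can be inspected. It is an illustration only; group_consecutive is a hypothetical name.

```python
def group_consecutive(pairs):
    """pairs: list of (A, B). Return (A, B, GroupNo) using the B - dense_rank trick."""
    ordered = sorted(pairs)
    # DENSE_RANK() OVER (PARTITION BY A ORDER BY B): rank among distinct B per A.
    distinct_b = {}
    for a, b in ordered:
        bs = distinct_b.setdefault(a, [])
        if not bs or bs[-1] != b:
            bs.append(b)
    rank = {
        (a, b): i
        for a, bs in distinct_b.items()
        for i, b in enumerate(bs, start=1)
    }
    # Grp = B - rank is constant within a run of consecutive B values.
    keyed = [(a, b - rank[(a, b)], b) for a, b in ordered]
    # DENSE_RANK() OVER (ORDER BY A, Grp): number the distinct (A, Grp) keys.
    keys = sorted({(a, g) for a, g, _ in keyed})
    group_no = {key: i for i, key in enumerate(keys, start=1)}
    return [(a, b, group_no[(a, g)]) for a, g, b in keyed]
```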
And more generally
DECLARE @Interval INT = 2
;WITH T AS
(
SELECT *,
B/@Interval - DENSE_RANK() OVER (PARTITION BY A, B%@Interval ORDER BY B) AS Grp
FROM #TEMP
)
SELECT A,
B,
DENSE_RANK() OVER (ORDER BY A, B%@Interval,Grp) AS GroupNo
FROM T
ORDER BY A, GroupNo
declare @Diff int = 1
;with C as
(
select A,
B,
row_number() over(partition by A order by B) as rn
from #TEMP
),
R as
(
select C.A,
C.B,
1 as G,
C.rn
from C
where C.rn = 1
union all
select C.A,
C.B,
G + case when C.B-R.B <= @Diff
then 0
else 1
end,
C.rn
from C
inner join R
on R.rn + 1 = C.rn and
R.A = C.A
)
select A,
B,
dense_rank() over(order by A, G) as G
from R
order by A, G
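The recursive CTE walks each A partition in B order and starts a new group whenever the step from the previous B exceeds the allowed difference. A single ordered pass expresses the same rule; the Python below is a sketch of that logic, not a replacement for the query, and group_by_max_gap is a hypothetical name.

```python
def group_by_max_gap(pairs, diff=1):
    """pairs: list of (A, B). Return (A, B, G), starting a new group when
    A changes or when B exceeds the previous B by more than diff."""
    result = []
    group_no = 0
    prev = None
    for a, b in sorted(pairs):
        # New partition (A changed) or gap larger than diff: open a new group.
        if prev is None or prev[0] != a or b - prev[1] > diff:
            group_no += 1
        result.append((a, b, group_no))
        prev = (a, b)
    return result
```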