SQL rank grouping variation

I'm trying to achieve the following "rank" result, given an original dataset consisting of the columns ID and CODE:
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 4
10 A 4
Using the DENSE_RANK function over the CODE column, I get the following result (the A code gets the same rank value again after "the break" between the rows):
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 1
10 A 1
Is it possible to achieve the results as shown in the first (example) table, with the A code changing rank when there is a separation between the group formed by id: 1-2-3 and the one formed by id: 9-10 without using a cursor?
Thanks

You want to find sequences of values and give them a rank. You can do this with a difference of row numbers approach. The following assigns a different number to each grouping:
select o.*, dense_rank() over (order by grp, code)
from (select o.*,
(row_number() over (order by id) -
row_number() over (partition by code order by id)
) as grp
from original o
) o;
If you want the assignment in the same order as the original data, then you can order by the id, but that requires an additional window function:
select o.*, dense_rank() over (order by minid) as therank
from (select o.*, min(id) over (partition by grp, code) as minid
from (select o.*,
(row_number() over (order by id) -
row_number() over (partition by code order by id)
) as grp
from original o
) o
) o;
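As a rough check of the difference-of-row-numbers idea, the sketch below runs the second query against the ten rows from the question. It assumes an in-memory SQLite database (via Python's sqlite3, SQLite 3.25 or later for window functions) as a stand-in for the original database; the table setup is illustrative only.

```python
import sqlite3

# Stand-in data: the ten rows from the question, loaded into in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original (id INTEGER PRIMARY KEY, code TEXT)")
conn.executemany(
    "INSERT INTO original (id, code) VALUES (?, ?)",
    list(enumerate("AAABBCCCAA", start=1)),
)

QUERY = """
SELECT id, code, grp, DENSE_RANK() OVER (ORDER BY minid) AS therank
FROM (SELECT o.*, MIN(id) OVER (PARTITION BY grp, code) AS minid
      FROM (SELECT o.*,
                   ROW_NUMBER() OVER (ORDER BY id) -
                   ROW_NUMBER() OVER (PARTITION BY code ORDER BY id) AS grp
            FROM original o
           ) o
     ) o
ORDER BY id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # (id, code, grp, therank)
```

Note that grp on its own is not unique per island: here the C rows and the second run of A rows both get grp = 5, which is why code has to be part of the partition as well.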

Use a running SUM over a flag that is 1 whenever the current row's code differs from the previous row's, and 0 otherwise. This works from SQL Server 2012 onward.
WITH CTE AS (
SELECT id, code,
CASE Code WHEN LAG(CODE) OVER (ORDER BY id) THEN 0 ELSE 1 END AS Diff
FROM Table1)
SELECT id, code, SUM(Diff) OVER (ORDER BY id) FROM CTE
Please also see the similar question "How to make row numbering with ordering, partitioning and grouping".
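To see the flag and the running sum side by side, here is a small sketch that runs the same query shape on the question's data. It assumes in-memory SQLite (3.25 or later) through Python's sqlite3 rather than SQL Server, and the column aliases diff and rnk are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER PRIMARY KEY, code TEXT)")
conn.executemany(
    "INSERT INTO Table1 (id, code) VALUES (?, ?)",
    list(enumerate("AAABBCCCAA", start=1)),
)

QUERY = """
WITH cte AS (
    SELECT id, code,
           -- LAG() is NULL on the first row, so the comparison fails and diff is 1
           CASE code WHEN LAG(code) OVER (ORDER BY id) THEN 0 ELSE 1 END AS diff
    FROM Table1
)
SELECT id, code, diff, SUM(diff) OVER (ORDER BY id) AS rnk
FROM cte
ORDER BY id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # (id, code, diff, rnk)
```

Each 1 in diff marks the start of a new run, so the running total increases exactly once per run.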

Related

Grouping of PARTITION BY / GROUP BY only until next section to obtain a list of sections

I have a table like this:
id section
1 6
2 6
3 7
4 7
5 6
and would like to obtain a grouped list that says
section section_nr first_id
6 1 1
7 2 3
6 3 5
Using ROW_NUMBER twice I am able to obtain something close:
SELECT section, ROW_NUMBER() OVER (ORDER BY id) AS section_nr, id as first_id
FROM (
SELECT id, section, ROW_NUMBER() OVER (PARTITION BY section ORDER BY id) AS nr_within
FROM X
)
WHERE nr_within = 1
section section_nr first_id
6 1 1
7 2 3
... but of course the second section 6 is missing, since PARTITION BY groups all section=6 together. Is it somehow possible to only group until the next section?
More generally (regarding GROUP BY instead of PARTITION BY), is there a simple solution to group (1,1,2,2,1) to (1,2,1) instead of (1,2)?
This is a typical gaps and islands problem that can be solved like this:
with u as
(select id, section,
case when section = lag(section) over(order by id) then 0 else 1 end as grp
from X),
v as
(select id,
section,
sum(grp) over(order by id) as section_nr
from u)
select section,
section_nr,
min(id) as first_id
from v
group by section, section_nr;
Basically you keep tabs in a column where there is a change in section by comparing current section to section from the row above (ordered by id). Whenever there is a change, set this column to 1, when no change set it to 0. The rolling sum of this column will be the section number. Getting first_id is a simple matter of using group by.
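The flag column and its rolling sum described above can be inspected row by row with a sketch like the following. It assumes in-memory SQLite (3.25 or later) via Python's sqlite3 as a stand-in for the original database; the table X and its five rows are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (id INTEGER PRIMARY KEY, section INTEGER)")
conn.executemany(
    "INSERT INTO X (id, section) VALUES (?, ?)",
    [(1, 6), (2, 6), (3, 7), (4, 7), (5, 6)],
)

# Shared CTEs: u flags each change of section, v turns the flags into a running count.
CTES = """
WITH u AS (
    SELECT id, section,
           CASE WHEN section = LAG(section) OVER (ORDER BY id) THEN 0 ELSE 1 END AS grp
    FROM X
),
v AS (
    SELECT id, section, grp, SUM(grp) OVER (ORDER BY id) AS section_nr
    FROM u
)
"""

# Per-row view of the intermediate columns.
per_row = conn.execute(
    CTES + "SELECT id, section, grp, section_nr FROM v ORDER BY id"
).fetchall()

# Collapsed view: one row per run of equal sections.
sections = conn.execute(
    CTES + """
    SELECT section, section_nr, MIN(id) AS first_id
    FROM v
    GROUP BY section, section_nr
    ORDER BY section_nr
    """
).fetchall()

print(per_row)
print(sections)
```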
That's a classic.
P.S.
If id is indeed a series of integers without gaps, we can use it instead of rn.
select section
,row_number() over (order by min(id)) as section_nr
,min(id) as first_id
from (select id
,section
,row_number() over (order by id) as rn
,row_number() over (partition by section order by id) as rn_section
from X
)
group by section
,rn - rn_section
SECTION SECTION_NR FIRST_ID
6 1 1
7 2 3
6 3 5
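The remark about using id in place of rn can be tried out with the sketch below, which groups by id - rn_section directly. It assumes in-memory SQLite (3.25 or later) via Python's sqlite3, and the helper name sections is hypothetical. The section numbering is done in an outer query here, which is a slight restructuring of the query above, not the original form.

```python
import sqlite3

QUERY = """
SELECT section,
       ROW_NUMBER() OVER (ORDER BY first_id) AS section_nr,
       first_id
FROM (SELECT section, MIN(id) AS first_id
      FROM (SELECT id, section,
                   ROW_NUMBER() OVER (PARTITION BY section ORDER BY id) AS rn_section
            FROM X)
      GROUP BY section, id - rn_section)
ORDER BY first_id
"""


def sections(rows):
    """Load (id, section) rows into a fresh in-memory table X and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE X (id INTEGER PRIMARY KEY, section INTEGER)")
    conn.executemany("INSERT INTO X (id, section) VALUES (?, ?)", rows)
    return conn.execute(QUERY).fetchall()


# Gapless ids: id behaves like rn, so the islands come out as expected.
dense = sections([(1, 6), (2, 6), (3, 7), (4, 7), (5, 6)])
# Ids with gaps: id - rn_section differs on every row, so every row is its own island.
gapped = sections([(10, 6), (20, 6), (30, 7), (40, 7), (50, 6)])

print(dense)
print(gapped)
```

The second call shows why the shortcut depends on the ids being consecutive: with gaps, the row_number() over (order by id) column is still needed.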

increment if not same value of next column in SQL

I am trying to use the Row Number in SQL. However, it's not giving desired output.
Data :
ID Name Output should be
111 A 1
111 B 2
111 C 3
111 C 3
111 A 4
222 A 1
222 A 1
222 B 2
222 C 3
222 B 4
222 B 4
This is a gaps-and-islands problem. As a starter: for the question to make sense at all, you need a column that defines the ordering of the rows. I assumed ordering_id. Then I would recommend lag() to get the "previous" name, and a cumulative sum() that increases every time the name changes between adjacent rows:
select id, name,
sum(case when name = lag_name then 0 else 1 end) over(partition by id order by ordering_id) as rn
from (
select t.*, lag(name) over(partition by id order by ordering_id) lag_name
from mytable t
) t
SQL Server 2008 makes this much trickier because it has no lag(). You can identify the adjacent rows using a difference of row numbers. Then you can assign the minimum value of the ordering column to each island and use dense_rank():
select t.*,
dense_rank() over (partition by id order by min_ordcol) as rn
from (select t.*,
min(<ordcol>) over (partition by id, name, seqnum - seqnum_2) as min_ordcol
from (select t.*,
row_number() over (partition by id order by <ordcol>) as seqnum,
row_number() over (partition by id, name order by <ordcol>) as seqnum_2
from t
) t
) t;
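The two approaches for this question can be compared on the sample data with a sketch like the one below. It assumes an ordering column named ordering_id (the question does not show one), a table called mytable, and in-memory SQLite (3.25 or later) via Python's sqlite3 as a stand-in for SQL Server. Both queries number the islands of equal names within each id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ordering_id INTEGER PRIMARY KEY, id INTEGER, name TEXT)")
data = [(111, n) for n in "ABCCA"] + [(222, n) for n in "AABCBB"]
conn.executemany(
    "INSERT INTO mytable (ordering_id, id, name) VALUES (?, ?, ?)",
    [(i, id_, name) for i, (id_, name) in enumerate(data, start=1)],
)

# Approach 1: lag() plus a cumulative sum, restarted for each id.
LAG_QUERY = """
SELECT ordering_id,
       SUM(CASE WHEN name = lag_name THEN 0 ELSE 1 END)
           OVER (PARTITION BY id ORDER BY ordering_id) AS rn
FROM (SELECT t.*, LAG(name) OVER (PARTITION BY id ORDER BY ordering_id) AS lag_name
      FROM mytable t) t
ORDER BY ordering_id
"""

# Approach 2: difference of row numbers, then dense_rank() on each island's first row.
ROWNUM_QUERY = """
SELECT ordering_id,
       DENSE_RANK() OVER (PARTITION BY id ORDER BY min_ord) AS rn
FROM (SELECT t.*,
             MIN(ordering_id) OVER (PARTITION BY id, name, seqnum - seqnum_2) AS min_ord
      FROM (SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering_id) AS seqnum,
                   ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY ordering_id) AS seqnum_2
            FROM mytable t) t) t
ORDER BY ordering_id
"""

lag_result = [rn for _, rn in conn.execute(LAG_QUERY)]
rownum_result = [rn for _, rn in conn.execute(ROWNUM_QUERY)]
print(lag_result)
print(rownum_result)
```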

Group by data with the same group occurring multiple times

Input data
id group
1 a
1 a
1 b
1 b
1 a
1 a
1 a
expected result
id group row_number
1 a 1
1 a 1
1 b 2
1 b 2
1 a 4
1 a 4
1 a 4
I need the row_number shown in the result above: when the same group occurs a second time, it should get a different row_number. I also have one more column, a date sequence that orders the rows from top to bottom.
This is an example of a gaps-and-islands problem. Solving it, though, requires that the data be ordered -- and SQL tables represent unordered sets.
Let me assume you have such a column. Then the difference of row numbers can be used:
select t.*,
dense_rank() over (partition by id order by grp, (seqnum - seqnum_g)) as grouping
from (select t.*,
row_number() over (partition by id order by ?) as seqnum,
row_number() over (partition by id, grp order by ?) as seqnum_g
from t
) t;
This does not produce the values that you specifically request, but it does identify each group.

Select TOP 2 values for each group

I'm having a problem getting only the TOP 2 values for each group (the groups are in a column).
Example :
ID Group Value
1 A 30
2 A 150
3 A 40
4 A 70
5 B 0
6 B 100
7 B 90
I expect my output to be
ID Group Value
1 A 150
2 A 70
3 B 100
4 B 90
Simply put, for each group I want just the 2 rows with the highest Value.
Most databases support the ANSI standard row_number() function. Because group is a reserved word, the column name has to be quoted. You would use it as:
select "group", value
from (select t.*,
row_number() over (partition by "group" order by value desc) as seqnum
from t
) t
where seqnum <= 2;
To set the id you can use row_number() in the outer query:
select row_number() over (order by "group", value desc) as id,
"group", value
from (select t.*,
row_number() over (partition by "group" order by value desc) as seqnum
from t
) t
where seqnum <= 2;
However, changing the id seems suspicious.
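A quick way to check the top-2-per-group logic is the sketch below, run against the question's seven rows. It assumes in-memory SQLite (3.25 or later) via Python's sqlite3; the table name t follows the answer, the alias new_id is made up to avoid clashing with the existing id column, and the group column is double-quoted because group is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (id INTEGER PRIMARY KEY, "group" TEXT, value INTEGER)')
conn.executemany(
    'INSERT INTO t (id, "group", value) VALUES (?, ?, ?)',
    [(1, "A", 30), (2, "A", 150), (3, "A", 40), (4, "A", 70),
     (5, "B", 0), (6, "B", 100), (7, "B", 90)],
)

QUERY = """
SELECT ROW_NUMBER() OVER (ORDER BY "group", value DESC) AS new_id,
       "group", value
FROM (SELECT t.*,
             -- rank rows inside each group, highest value first
             ROW_NUMBER() OVER (PARTITION BY "group" ORDER BY value DESC) AS seqnum
      FROM t
     ) t
WHERE seqnum <= 2
ORDER BY new_id
"""

top2 = conn.execute(QUERY).fetchall()
for row in top2:
    print(row)  # (new_id, group, value)
```

If two rows tie on value, row_number() picks one of them arbitrarily; rank() or dense_rank() would keep both, at the cost of possibly returning more than two rows per group.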
You can use a CTE with the ranking function ROW_NUMBER().
Here is a query to get your result.
;WITH cte AS
( SELECT [Group], value,
ROW_NUMBER() OVER (PARTITION BY [Group] ORDER BY value DESC) AS rn
FROM test
)
SELECT [Group], value FROM cte
WHERE rn <= 2
ORDER BY [Group], value DESC

SQL grouping based on order and value

I have a table
loctype order
ACUTE 1
ACUTE 2
COM 3
COM 4
ACUTE 5
COM 6
I want a query that will apply rankings to groups in order, so my desired outcome is:
loctype order group_order
ACUTE 1 1
ACUTE 2 1
COM 3 2
COM 4 2
ACUTE 5 3
COM 6 4
Is there a way to do this as a SQL query without resorting to cursors?
One method for achieving this is a difference of row_number() to identify the groups and then dense_rank() on the minimum value. The code looks like:
select t.*, dense_rank() over (order by min_order) as group_order
from (select t.*, min([order]) over (partition by loctype, grp) as min_order
from (select t.*,
(row_number() over (order by [order]) -
row_number() over (partition by loctype order by [order])
) as grp
from t
) t
) t;
Another method (for SQL Server 2012+) is to use lag() with a cumulative sum:
select t.*,
sum(case when loctype = prev_loctype then 0 else 1 end) over
(order by [order]) as group_order
from (select t.*, lag(loctype) over (order by [order]) as prev_loctype
from t
) t
I tried the given solution for SQL Server 2008 (that's what I have to work with). Unfortunately, it didn't give quite the correct results. However, working from Gordon's example, I came up with this, which does give exactly the desired result.
SELECT
*
FROM
(
SELECT
*,
DENSE_RANK() over(order by (SELECT ISNULL(MAX(#tmp.[order]),0) FROM #tmp WHERE #tmp.[order]<t.[order] AND #tmp.loctype <> t.loctype)) as group_order
FROM
#tmp AS t
) AS u
This gives
loctype order group_order
ACUTE 1 1
ACUTE 2 1
COM 3 2
COM 4 2
ACUTE 5 3
COM 6 4
Essentially it hides an initial ordering inside the DENSE_RANK(). Without the DENSE_RANK() it looks like this:
SELECT
*
FROM
(
SELECT
*,
(SELECT ISNULL(MAX(#tmp.[order]),0) FROM #tmp WHERE #tmp.[order]<t.[order] AND #tmp.loctype <> t.loctype) as intgroup
FROM
#tmp AS t
) AS u
And gives this result:
loctype order intgroup
ACUTE 1 0
ACUTE 2 0
COM 3 2
COM 4 2
ACUTE 5 4
COM 6 5
The interim group order can then be DENSE_RANKed to give the desired outcome.
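The two steps (interim group, then DENSE_RANK over it) can be reproduced with the sketch below. It assumes in-memory SQLite (3.25 or later) via Python's sqlite3 instead of SQL Server 2008, so the temp table #tmp becomes a plain table named tmp, ISNULL becomes COALESCE, and the reserved column name order is double-quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tmp (loctype TEXT, "order" INTEGER PRIMARY KEY)')
conn.executemany(
    'INSERT INTO tmp (loctype, "order") VALUES (?, ?)',
    [("ACUTE", 1), ("ACUTE", 2), ("COM", 3), ("COM", 4), ("ACUTE", 5), ("COM", 6)],
)

QUERY = """
SELECT loctype, "order", intgroup,
       DENSE_RANK() OVER (ORDER BY intgroup) AS group_order
FROM (SELECT t.*,
             -- last earlier row with a different loctype; 0 if there is none
             (SELECT COALESCE(MAX(p."order"), 0)
              FROM tmp p
              WHERE p."order" < t."order" AND p.loctype <> t.loctype) AS intgroup
      FROM tmp t
     ) u
ORDER BY "order"
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # (loctype, order, intgroup, group_order)
```

All rows of one run share the same "last different row", so intgroup is constant within a run and strictly increasing from run to run, which is what makes the final DENSE_RANK() valid.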