SQL grouping based on order and value

I have a table
loctype order
ACUTE 1
ACUTE 2
COM 3
COM 4
ACUTE 5
COM 6
I want a query that will apply rankings to groups in order, so my desired outcome is:
loctype order group_order
ACUTE 1 1
ACUTE 2 1
COM 3 2
COM 4 2
ACUTE 5 3
COM 6 4
Is there a way to do this as a SQL query without resorting to cursors?

One method for achieving this is a difference of row_number() values to identify the groups, followed by dense_rank() on the minimum [order] value of each group. The code looks like:
select t.*, dense_rank() over (order by minorder) as group_order
from (select t.*, min([order]) over (partition by loctype, grp) as minorder
      from (select t.*,
                   (row_number() over (order by [order]) -
                    row_number() over (partition by loctype order by [order])
                   ) as grp
            from t
           ) t
     ) t;
Another method (for SQL Server 2012+) is to use lag() with a cumulative sum:
select t.*,
       sum(case when loctype = prev_loctype then 0 else 1 end) over
           (order by [order]) as group_order
from (select t.*, lag(loctype) over (order by [order]) as prev_loctype
      from t
     ) t;
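To see why the difference of row numbers isolates each run, the sketch below replays the idea on the question's six rows using Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions). The table name t comes from the answer; renaming the order column to ord and the min_ord alias are assumptions made for this example, and SQLite stands in for SQL Server.

```python
import sqlite3

# Sample rows from the question; the "order" column is called ord here
# (an assumption) to avoid quoting a reserved word.
rows = [("ACUTE", 1), ("ACUTE", 2), ("COM", 3), ("COM", 4), ("ACUTE", 5), ("COM", 6)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (loctype TEXT, ord INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Step 1: global row number minus per-loctype row number.
# The difference stays constant within a run of equal loctype values.
inner_sql = """
SELECT loctype, ord,
       ROW_NUMBER() OVER (ORDER BY ord)
       - ROW_NUMBER() OVER (PARTITION BY loctype ORDER BY ord) AS grp
FROM t
"""

# Step 2: take the smallest ord per (loctype, grp) and dense-rank it.
full_sql = f"""
SELECT loctype, ord,
       DENSE_RANK() OVER (ORDER BY min_ord) AS group_order
FROM (SELECT loctype, ord,
             MIN(ord) OVER (PARTITION BY loctype, grp) AS min_ord
      FROM ({inner_sql}) AS a) AS b
ORDER BY ord
"""

grp_rows = conn.execute(inner_sql + " ORDER BY ord").fetchall()
result = conn.execute(full_sql).fetchall()

for row in grp_rows:
    print("grp:", row)
for row in result:
    print("group_order:", row)
```

On this data grp is 2 both for the COM run and for the second ACUTE run, which is why the partition has to include loctype as well as grp.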

I tried the given solution on SQL Server 2008 (that's what I have to work with). Unfortunately it didn't give quite the correct results. However, working from Gordon's example, I came up with this, which does give exactly the desired result.
SELECT
*
FROM
(
SELECT
*,
DENSE_RANK() over(order by (SELECT ISNULL(MAX(#tmp.[order]),0) FROM #tmp WHERE #tmp.[order]<t.[order] AND #tmp.loctype <> t.loctype)) as group_order
FROM
#tmp AS t
) AS u
This gives
loctype order group_order
ACUTE 1 1
ACUTE 2 1
COM 3 2
COM 4 2
ACUTE 5 3
COM 6 4
Essentially it hides an initial ordering inside the DENSE_RANK(). Without the DENSE_RANK() it looks like this:
SELECT
*
FROM
(
SELECT
*,
(SELECT ISNULL(MAX(#tmp.[order]),0) FROM #tmp WHERE #tmp.[order]<t.[order] AND #tmp.loctype <> t.loctype) as intgroup
FROM
#tmp AS t
) AS u
And gives this result:
loctype order intgroup
ACUTE 1 0
ACUTE 2 0
COM 3 2
COM 4 2
ACUTE 5 4
COM 6 5
The interim group order can then be DENSE_RANKed to give the desired outcome.

Related

Grouping of PARTITION BY / GROUP BY only until next section to obtain a list of sections

I have a table like this:
id section
1 6
2 6
3 7
4 7
5 6
and would like to obtain a grouped list that says
section section_nr first_id
6 1 1
7 2 3
6 3 5
Using ROW_NUMBER twice I am able to obtain something close:
SELECT section, ROW_NUMBER() OVER (ORDER BY id) AS section_nr, id as first_id
FROM (
SELECT id, section, ROW_NUMBER() OVER (PARTITION BY section ORDER BY id) AS nr_within
FROM X
)
WHERE nr_within = 1
section section_nr first_id
6 1 1
7 2 3
... but of course the second section 6 is missing, since PARTITION BY groups all section=6 together. Is it somehow possible to only group until the next section?
More generally (regarding GROUP BY instead of PARTITION BY), is there a simple solution to group (1,1,2,2,1) to (1,2,1) instead of (1,2)?
This is a typical gaps and islands problem that can be solved like this:
with u as
(select id, section,
case when section = lag(section) over(order by id) then 0 else 1 end as grp
from X),
v as
(select id,
section,
sum(grp) over(order by id) as section_nr
from u)
select section,
section_nr,
min(id) as first_id
from v
group by section, section_nr;
Basically you keep tabs in a column where there is a change in section by comparing current section to section from the row above (ordered by id). Whenever there is a change, set this column to 1, when no change set it to 0. The rolling sum of this column will be the section number. Getting first_id is a simple matter of using group by.
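The same flag-and-running-sum logic can be traced outside the database. The following is only an illustrative sketch in plain Python on the question's five rows; the variable names (flags, section_nr, summary) are made up for the example and mirror the grp column, the rolling sum and the final GROUP BY respectively.

```python
from itertools import accumulate

# (id, section) rows from the question, already sorted by id.
rows = [(1, 6), (2, 6), (3, 7), (4, 7), (5, 6)]
sections = [section for _, section in rows]

# Flag is 1 when the section differs from the previous row.
# The first row has no previous row, so it always starts a new group.
flags = [
    1 if i == 0 or sections[i] != sections[i - 1] else 0
    for i in range(len(sections))
]

# The running sum of the flags is the section number.
section_nr = list(accumulate(flags))

# Equivalent of GROUP BY section, section_nr with MIN(id):
# rows are sorted by id, so the first id seen per group is the minimum.
first_ids = {}
for (row_id, section), nr in zip(rows, section_nr):
    first_ids.setdefault((section, nr), row_id)

summary = [(section, nr, first_id) for (section, nr), first_id in first_ids.items()]

print(flags)
print(section_nr)
print(summary)
```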
That's a classic.
P.S.
If id is indeed a series of integers without gaps, we can use it instead of rn
select section
,row_number() over (order by min(id)) as section_nr
,min(id) as first_id
from (select id
,section
,row_number() over (order by id) as rn
,row_number() over (partition by section order by id) as rn_section
from X
)
group by section
,rn - rn_section
SECTION SECTION_NR FIRST_ID
6 1 1
7 2 3
6 3 5

increment if not same value of next column in SQL

I am trying to use the Row Number in SQL. However, it's not giving desired output.
Data :
ID Name Output should be
111 A 1
111 B 2
111 C 3
111 C 3
111 A 4
222 A 1
222 A 1
222 B 2
222 C 3
222 B 4
222 B 4
This is a gaps-and-islands problem. As a starter: for the question to make sense, you need a column that defines the ordering of the rows. I assumed ordering_id. Then, I would recommend lag() to get the "previous" name, and a cumulative sum() that increases every time the name changes in adjacent rows:
select id, name,
sum(case when name = lag_name then 0 else 1 end) over(partition by id order by ordering_id) as rn
from (
select t.*, lag(name) over(partition by id order by ordering_id) lag_name
from mytable t
) t
SQL Server 2008 makes this much trickier. You can identify the adjacent rows using a difference of row numbers, computed within each id. Then you can assign the minimum ordering value in each island and use dense_rank():
select t.*,
       dense_rank() over (partition by id order by min_ordcol) as output
from (select t.*,
             min(<ordcol>) over (partition by id, name, seqnum - seqnum_2) as min_ordcol
      from (select t.*,
                   row_number() over (partition by id order by <ordcol>) as seqnum,
                   row_number() over (partition by id, name order by <ordcol>) as seqnum_2
            from t
           ) t
     ) t;
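As a check on the per-id difference-of-row-numbers idea, here is a hedged sketch that runs it against the question's sample data with Python's sqlite3 module (SQLite 3.25+ for window functions, standing in for SQL Server). The ordering column ord and the table name mytable are assumptions, since the question does not name an ordering column.

```python
import sqlite3

# (ord, id, name): ord is a hypothetical ordering column, assigned top to bottom.
names_111 = ["A", "B", "C", "C", "A"]
names_222 = ["A", "A", "B", "C", "B", "B"]
data = [(111, n) for n in names_111] + [(222, n) for n in names_222]
rows = [(i, row_id, name) for i, (row_id, name) in enumerate(data, start=1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ord INTEGER, id INTEGER, name TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

sql = """
SELECT ord, id, name,
       DENSE_RANK() OVER (PARTITION BY id ORDER BY min_ord) AS output
FROM (SELECT ord, id, name,
             -- smallest ord of each island identifies it within the id
             MIN(ord) OVER (PARTITION BY id, name, seqnum - seqnum_2) AS min_ord
      FROM (SELECT ord, id, name,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY ord) AS seqnum,
                   ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY ord) AS seqnum_2
            FROM mytable) AS a) AS b
ORDER BY ord
"""

result = conn.execute(sql).fetchall()
for row in result:
    print(row)
```

Every row number is partitioned by id first, so numbering restarts for each id and islands from different ids never merge.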

Group by data with the same group occurring multiple times

Input data
id group
1 a
1 a
1 b
1 b
1 a
1 a
1 a
expected result
id group row_number
1 a 1
1 a 1
1 b 2
1 b 2
1 a 4
1 a 4
1 a 4
I need the row_number shown in the result above: when the same group occurs a second time, it should get a different row_number. I also have one more column, a date, that sequences the rows from top to bottom.
This is an example of a gaps-and-islands problem. Solving it, though, requires that the data be ordered -- and SQL tables represent unordered sets.
Let me assume you have such a column. Then the difference of row numbers can be used:
select t.*,
dense_rank() over (partition by id order by grp, (seqnum - seqnum_g)) as grouping
from (select t.*,
row_number() over (partition by id order by ?) as seqnum,
row_number() over (partition by id, grp order by ?) as seqnum_g
from t
) t;
This does not produce the values that you specifically request, but it does identify each group.

SQL: How to select some (but not all) records in a query multiple times

We have a bunch of records and we assign a random number to each record whose value is between 1 and the total number of records in the following manner:
SELECT personID, ROW_NUMBER()
OVER(ORDER BY NEWID()) as RowNumber
FROM folks
Easy like pie. Let's assume that a LOWER (edit: NOT higher, sorry!) number is better for Customer's purposes, and that they like how the 'random' element here works. Trouble is, customer now says 'some people are special and we want them to get three chances, and then save their best result as their number.'
Since we don't hand out numbers serially but all at once, the approach here seems to be to select special people three times in this query, and then grab their lowest row number.
This is similar to, but one step more involved than this question (and others like it):
Select Records multiple times from table
I don't want to select ALL records three times; but I do want to do everything in one go; that is, I can't assign special people numbers, and then assign everyone else numbers - it has to be one query.
How would I construct a JOIN (and/or a CTE) to model this, assuming we can rely on a field like isSpecial = 1 on each record?
How would I then grab the 'lowest number' (i.e. first row_number appearance of that record) from the result in my SELECT statement?
Platform: Microsoft SQL 2012
SAMPLE DATA (including isSpecial in the output query just for demonstration's sake) - also, we want the minimum number here for business purposes, not the maximum
personID isSpecial
1 1
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
Current output:
SELECT personID, ROW_NUMBER()
OVER(ORDER BY NEWID()) as RowNumber, isSpecial
FROM folks
personID RowNumber isSpecial
8 1 0
2 2 0
10 3 0
1 4 1
9 5 0
3 6 0
4 7 0
6 8 0
5 9 0
7 10 0
DESIRED OUTPUT:
personID MinRowNumber isSpecial rowNumber1 rowNumber2 rowNumber3
8 1 0 1
2 2 0 2
1 3 1 4 7 3
9 5 0 5
3 6 0 6
6 8 0 8
5 9 0 9
7 10 0 10
4 11 0 11
10 12 0 12
You could do this using a tally table and some aggregation. Something along these lines.
WITH
cteTally(N) AS (select n from (values (1),(2),(3))dt(n))
select personID
, MAX(RowNumber)
from
(
SELECT personID
, ROW_NUMBER() OVER(ORDER BY NEWID()) as RowNumber
FROM folks f
join cteTally t on t.N <= case when f.IsSpecial = 1 then 3 else 1 end
) x
group by x.personID
--EDIT--
You stated you might want all rows not just the MAX one. Here is how you could do that.
WITH
cteTally(N) AS (select n from (values (1),(2),(3))dt(n))
SELECT personID
, ROW_NUMBER() OVER(ORDER BY NEWID()) as RowNumber
FROM folks f
join cteTally t on t.N <= case when f.IsSpecial = 1 then 3 else 1 end
I think you can use the UNION approach, but only apply NEWID() once:
create table folks (personID int, isSpecial int)
insert into folks values (1,1);
insert into folks values (2,0);
insert into folks values (3,0);
insert into folks values (4,0);
insert into folks values (5,0);
insert into folks values (6,0);
insert into folks values (7,0);
insert into folks values (8,0);
insert into folks values (9,0);
insert into folks values (10,0);
select * from folks;
select
personID,
min(rownumber) as min_rownumber
from
(SELECT
personID,
ROW_NUMBER() OVER(ORDER BY NEWID()) as RowNumber
FROM
(select personID from folks
union all
select personID from folks where isSpecial = 1
union all
select personID from folks where isSpecial = 1) u
) r
group by
personID
A correct way to solve the task is this.
Suppose we have O ordinary people plus S special people. Each ordinary person has one chance, each special person has 3 chances. We should generate O plus S * 3 random numbers evenly distributed in the range of [1 .. O+S*3], then order all people according to the numbers that they got. Special people will appear 3 times in this ordered list, ordinary people will appear only once.
Here is the query that does it. The code for creating the table with sample data is shown below in my first variant. CTE_Numbers is just a table with three numbers. If you want to give a different number of chances to special people, alter this query. CTE lists all ordinary people once plus all special people three times. CTE_rn assigns a random number to each row. Each special person gets three random numbers. As each special person has three rows in CTE_rn, final query groups by PersonID and leaves only one row for each special person with the minimum number. To get a better understanding how it works, examine the intermediate results of CTE_rn.
WITH
CTE_Numbers
AS
(
SELECT Number
FROM (VALUES (1),(2),(3)) AS N(Number)
)
,CTE
AS
(
-- list ordinary people only once
SELECT PersonID,IsSpecial
FROM #T
WHERE IsSpecial = 0
UNION ALL
-- list each special person three times
SELECT PersonID,IsSpecial
FROM #T CROSS JOIN CTE_Numbers
WHERE IsSpecial = 1
)
,CTE_rn
AS
(
SELECT
PersonID,IsSpecial
,ROW_NUMBER() OVER(ORDER BY CRYPT_GEN_RANDOM(4)) AS rn
FROM CTE
)
SELECT
PersonID,IsSpecial
,MIN(rn) AS FinalRank
FROM CTE_rn
GROUP BY PersonID,IsSpecial
ORDER BY FinalRank;
result
PersonID IsSpecial FinalRank
9 0 1
2 0 2
1 1 3
10 0 4
8 0 5
5 0 6
3 0 7
7 0 9
4 0 10
6 0 12
Note how FinalRank has values from 1 to 12 (not 10) and values 8 and 11 are not shown: the special person held them. The special person got random numbers 3, 8 and 11, and the final result contains only the minimum of these three.
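The "O plus S * 3 tickets" idea can also be sketched as a small simulation. This is an illustration in Python rather than T-SQL, under the assumption that a uniform shuffle stands in for ordering by a random value; the function name draw_ranks and its parameters are hypothetical.

```python
import random

def draw_ranks(people, chances_for_special=3, rng=None):
    """Return {person_id: best (lowest) position} after one random draw.

    people is a list of (person_id, is_special) pairs. Each ordinary
    person contributes one ticket, each special person several.
    """
    if rng is None:
        rng = random.Random()
    tickets = []
    for person_id, is_special in people:
        copies = chances_for_special if is_special else 1
        tickets.extend([person_id] * copies)
    rng.shuffle(tickets)  # uniform random order over all O + S*3 tickets

    best = {}
    for position, person_id in enumerate(tickets, start=1):
        # The first appearance of a person is their minimum position.
        best.setdefault(person_id, position)
    return best

# Sample data from the question: person 1 is special, persons 2..10 are not.
people = [(1, True)] + [(i, False) for i in range(2, 11)]
ranks = draw_ranks(people, rng=random.Random(0))

for person_id, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(person_id, rank)
```

With ten people and one special person there are 12 tickets, so the printed ranks fall in 1..12 with two positions unused, matching the gaps described above.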
The first variant. It works, but results are skewed.
Very straight-forward. Generate random row numbers three times, join them together and for ordinary people pick the result of the first random number, for special people pick the minimum of three runs.
Nobody promised any particular distribution of random numbers for NEWID, so you'd better not use it in this case. In this example I used CRYPT_GEN_RANDOM.
I put the same query to get random numbers in three separate CTEs, rather than using the same CTE in the join, to make sure that it is calculated three times. If you use a single CTE, the server may be smart enough to calculate random numbers only once, rather than three times, and this is not what we need here. We do need 30 calls to CRYPT_GEN_RANDOM here.
CREATE TABLE #T (PersonID int, IsSpecial bit);
INSERT INTO #T(PersonID, IsSpecial) VALUES
(1 , 1),
(2 , 0),
(3 , 0),
(4 , 0),
(5 , 0),
(6 , 0),
(7 , 0),
(8 , 0),
(9 , 0),
(10, 0);
WITH
CTE1
AS
(
SELECT PersonID, IsSpecial,
ROW_NUMBER() OVER(ORDER BY CRYPT_GEN_RANDOM(4)) AS rn
FROM #T
)
,CTE2
AS
(
SELECT PersonID, IsSpecial,
ROW_NUMBER() OVER(ORDER BY CRYPT_GEN_RANDOM(4)) AS rn
FROM #T
)
,CTE3
AS
(
SELECT PersonID, IsSpecial,
ROW_NUMBER() OVER(ORDER BY CRYPT_GEN_RANDOM(4)) AS rn
FROM #T
)
,CTE_All
AS
(
SELECT
CTE1.PersonID
,CTE1.IsSpecial
,CTE1.rn AS rn1
,CTE2.rn AS rn2
,CTE3.rn AS rn3
,CA.MinRN
FROM
CTE1
INNER JOIN CTE2 ON CTE2.PersonID = CTE1.PersonID
INNER JOIN CTE3 ON CTE3.PersonID = CTE1.PersonID
CROSS APPLY
(
SELECT MIN(A.rn) AS MinRN
FROM (VALUES (CTE1.rn), (CTE2.rn), (CTE3.rn)) AS A(rn)
) AS CA
)
SELECT
PersonID
,IsSpecial
,CASE WHEN IsSpecial = 0
THEN rn1 -- a person is not special, he gets random rank from the first run only
ELSE MinRN -- a special person, he gets a rank that is minimum of three runs
END AS FinalRank
,rn1
,rn2
,rn3
,MinRN
FROM CTE_All
ORDER BY FinalRank;
result set
PersonID IsSpecial FinalRank rn1 rn2 rn3 MinRN
8 0 1 1 1 1 1
6 0 2 2 7 2 2
5 0 3 3 5 6 3
1 1 3 9 3 4 3
4 0 4 4 6 3 3
7 0 5 5 9 10 5
3 0 6 6 8 9 6
2 0 7 7 2 8 2
10 0 8 8 10 5 5
9 0 10 10 4 7 4
You can see that special people can (by chance) get the same rank as ordinary people. You can favor special people further and make sure that they appear before ordinary people in this case. Just alter ORDER BY to be ORDER BY FinalRank, IsSpecial DESC.
How about using UNION?
SELECT personID, ROW_NUMBER()
OVER(ORDER BY NEWID()) as RowNumber
FROM folks
WHERE isSpecial = 0
UNION ALL
SELECT personID, MAX(RN)
FROM (
SELECT personID, ROW_NUMBER()
OVER(ORDER BY NEWID()) as RN
FROM folks
WHERE isSpecial = 1
UNION ALL
SELECT personID, ROW_NUMBER()
OVER(ORDER BY NEWID()) as RN
FROM folks
WHERE isSpecial = 1
UNION ALL
SELECT personID, ROW_NUMBER()
OVER(ORDER BY NEWID()) as RN
FROM folks
WHERE isSpecial = 1
) s
GROUP BY personID

SQL rank grouping variation

I'm trying to achieve the following "rank" result given the original dataset composed by the column ID and CODE.
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 4
10 A 4
Using the DENSE_RANK function over the CODE column I get the following result (with the A code getting the same rank value also after "the break" between the rows)
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 1
10 A 1
Is it possible to achieve the results as shown in the first (example) table, with the A code changing rank when there is a separation between the group formed by id: 1-2-3 and the one formed by id: 9-10 without using a cursor?
Thanks
You want to find sequences of values and give them a rank. You can do this with a difference of row numbers approach. The following assigns a different number to each grouping:
select o.*, dense_rank() over (order by grp, code)
from (select o.*,
(row_number() over (order by id) -
row_number() over (partition by code order by id)
) as grp
from original o
) o;
If you want the assignment in the same order as the original data, then you can order by the id, but that requires an additional window function:
select o.*, dense_rank() over (order by minid) as therank
from (select o.*, min(id) over (partition by grp, code) as minid
from (select o.*,
(row_number() over (order by id) -
row_number() over (partition by code order by id)
) as grp
from original o
) o
) o;
Alternatively, take a running SUM of a flag that is 0 when the current row has the same code as the previous row and 1 otherwise. Works from SQL Server 2012.
WITH CTE AS (
SELECT id, code,
CASE Code WHEN LAG(CODE) OVER (ORDER BY id) THEN 0 ELSE 1 END AS Diff
FROM Table1)
SELECT id, code, SUM(Diff) OVER (ORDER BY id) FROM CTE
Please also see similar question at How to make row numbering with ordering, partitioning and grouping
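For a quick check of the LAG-based running sum against the question's ten rows, a minimal sketch using Python's sqlite3 module follows (SQLite 3.25+ is assumed for LAG and windowed SUM, standing in for SQL Server 2012). The table name original is borrowed from the earlier answer and the column alias therank is chosen for the example.

```python
import sqlite3

codes = ["A", "A", "A", "B", "B", "C", "C", "C", "A", "A"]
rows = list(enumerate(codes, start=1))  # (id, code) with id 1..10

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original (id INTEGER, code TEXT)")
conn.executemany("INSERT INTO original VALUES (?, ?)", rows)

sql = """
WITH cte AS (
    SELECT id, code,
           -- LAG is NULL on the first row, so the comparison fails
           -- and the first row gets diff = 1
           CASE code WHEN LAG(code) OVER (ORDER BY id) THEN 0 ELSE 1 END AS diff
    FROM original
)
SELECT id, code, SUM(diff) OVER (ORDER BY id) AS therank
FROM cte
ORDER BY id
"""

result = conn.execute(sql).fetchall()
for row in result:
    print(row)
```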