Sequence grouping in T-SQL

I'm trying to group data in sequence order. Say I have the following table:
| 1 | A |
| 1 | A |
| 1 | B |
| 1 | B |
| 1 | C |
| 1 | B |
I need the SQL query to output the following:
| 1 | A | 1 |
| 1 | A | 1 |
| 1 | B | 2 |
| 1 | B | 2 |
| 1 | C | 3 |
| 1 | B | 4 |
The last column is a group number that is incremented for each new group. The important thing to note is that rows 3, 4 and 6 contain the same data (1, B), yet they should be split into 2 groups, not 1, because row 5 interrupts the run.

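For SQL Server 2008:
Suppose you have a SampleStatuses table whose RowNumber column numbers the rows 1, 2, 3, ... with no gaps:
RowNumber Status Date
1 A 2014-06-11
2 A 2014-06-14
3 B 2014-06-25
4 B 2014-07-01
5 A 2014-07-06
6 A 2014-07-19
7 B 2014-07-21
8 B 2014-08-13
9 C 2014-08-19
You write the following recursive CTE: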
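;with
cte as (
-- anchor: the first row starts group 1
select RowNumber, 1 as GroupNumber, [Status], [Date]
from SampleStatuses
where RowNumber = 1
union all
-- recursive step: move to the next row; bump the group number when Status changes
select c1.RowNumber,
case when c2.[Status] <> c1.[Status] then c2.GroupNumber + 1 else c2.GroupNumber end as GroupNumber,
c1.[Status], c1.[Date]
from cte c2 join SampleStatuses c1 on c1.RowNumber = c2.RowNumber + 1
)
select * from cte
option (maxrecursion 0); -- lift the default limit of 100 recursion levels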
you get this result:
RowNumber GroupNumber Status Date
1 1 A 2014-06-11
2 1 A 2014-06-14
3 2 B 2014-06-25
4 2 B 2014-07-01
5 3 A 2014-07-06
6 3 A 2014-07-19
7 4 B 2014-07-21
8 4 B 2014-08-13
9 5 C 2014-08-19
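To try the recursive CTE without a SQL Server instance, here is a minimal sketch that runs the same anchor/recursive structure in SQLite through Python's built-in sqlite3 module. SQLite spells it WITH RECURSIVE and needs no MAXRECURSION hint; the in-memory table is assumed to mirror SampleStatuses above.

```python
import sqlite3

# Rows from the SampleStatuses example; RowNumber is assumed to be gap-free.
rows = [
    (1, "A", "2014-06-11"), (2, "A", "2014-06-14"), (3, "B", "2014-06-25"),
    (4, "B", "2014-07-01"), (5, "A", "2014-07-06"), (6, "A", "2014-07-19"),
    (7, "B", "2014-07-21"), (8, "B", "2014-08-13"), (9, "C", "2014-08-19"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SampleStatuses (RowNumber INTEGER, Status TEXT, Date TEXT)")
con.executemany("INSERT INTO SampleStatuses VALUES (?, ?, ?)", rows)

query = """
WITH RECURSIVE cte(RowNumber, GroupNumber, Status, Date) AS (
    -- anchor: the first row opens group 1
    SELECT RowNumber, 1, Status, Date FROM SampleStatuses WHERE RowNumber = 1
    UNION ALL
    -- recursive step: step to the next row, bumping the group when Status changes
    SELECT c1.RowNumber,
           CASE WHEN c2.Status <> c1.Status THEN c2.GroupNumber + 1
                ELSE c2.GroupNumber END,
           c1.Status, c1.Date
    FROM cte c2
    JOIN SampleStatuses c1 ON c1.RowNumber = c2.RowNumber + 1
)
SELECT RowNumber, GroupNumber, Status, Date FROM cte ORDER BY RowNumber
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

The recursion visits one row per level, which is why the T-SQL version needs the MAXRECURSION option on larger tables.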

The normal way to do what you want is the dense_rank() function:
select [key], val,
dense_rank() over (order by [key], val) as GroupNum
from t
However, this does not separate the last (1, B) row from the earlier (1, B) rows.
To handle this, I have to assume there is an "id" column: SQL tables have no inherent order, so the ordering has to come from a column. If you are using SQL Server 2012 or later, you can use the lag() function to check whether the key, val pair is the same as on the previous row, flag each row that starts a new group, and take a running sum of those flags:
with t1 as (
select id, [key], val,
(case when [key] = lag([key]) over (order by id) and
val = lag(val) over (order by id)
then 0 -- same as the previous row: same group
else 1 -- first row or a change: a new group starts here
end) as IsGroupStart
from t
)
select id, [key], val,
sum(IsGroupStart) over (order by id rows unbounded preceding) as GroupNum
from t1
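Here is a minimal sketch of the lag() plus running-sum idea, run in SQLite through Python's sqlite3 module as a stand-in for SQL Server 2012 (SQLite 3.25 or later is assumed for window functions). The id column 1 to 6 is an assumption, as in the answer; "key" is double-quoted only to be safe as an identifier.

```python
import sqlite3

# The question's six rows, plus an assumed id column that defines the order.
data = [(1, 1, "A"), (2, 1, "A"), (3, 1, "B"), (4, 1, "B"), (5, 1, "C"), (6, 1, "B")]

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE t (id INTEGER, "key" INTEGER, val TEXT)')
con.executemany("INSERT INTO t VALUES (?, ?, ?)", data)

query = """
WITH t1 AS (
    SELECT id, "key", val,
           CASE WHEN "key" = LAG("key") OVER (ORDER BY id)
                 AND val = LAG(val) OVER (ORDER BY id)
                THEN 0   -- same as the previous row: same group
                ELSE 1   -- first row (LAG is NULL) or a change: new group
           END AS IsGroupStart
    FROM t
)
SELECT id, "key", val,
       SUM(IsGroupStart) OVER (ORDER BY id ROWS UNBOUNDED PRECEDING) AS GroupNum
FROM t1
ORDER BY id
"""
result = con.execute(query).fetchall()
print([r[3] for r in result])  # group numbers in id order
```

On the first row lag() returns NULL, the comparison is not true, and the CASE falls through to 1, so numbering starts at 1 without a special case.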
Without SQL Server 2012 (which added cumulative sums), you have to use a self-join to identify the beginning of each group. This assumes the ids are consecutive:
select t.*
from t left outer join
t tprev
on t.id = tprev.id + 1 and t.[key] = tprev.[key] and t.val = tprev.val
where tprev.id is null
With this, assign each row to the group whose starting id is the largest one not greater than the row's own id:
select t.id, t.[key], t.val,
max(tgrp.id) as GroupId
from t join
(select s.id
from t s left outer join
t sprev
on s.id = sprev.id + 1 and s.[key] = sprev.[key] and s.val = sprev.val
where sprev.id is null
) tgrp
on t.id >= tgrp.id
group by t.id, t.[key], t.val
If you want consecutive numbers, put this in a subquery and apply dense_rank().
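The self-join route can be checked the same way. This sketch (SQLite via Python's sqlite3, same assumed id column) finds the group starts, tags each row with the largest start at or before it, and then applies DENSE_RANK() to turn those ids into consecutive numbers.

```python
import sqlite3

# The question's rows with an assumed, gap-free id column.
data = [(1, 1, "A"), (2, 1, "A"), (3, 1, "B"), (4, 1, "B"), (5, 1, "C"), (6, 1, "B")]

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE t (id INTEGER, "key" INTEGER, val TEXT)')
con.executemany("INSERT INTO t VALUES (?, ?, ?)", data)

query = """
SELECT id, GroupId, DENSE_RANK() OVER (ORDER BY GroupId) AS GroupNum
FROM (
    SELECT t.id, MAX(tgrp.id) AS GroupId
    FROM t
    JOIN (
        -- group starts: rows whose id - 1 neighbour does not hold the same pair
        SELECT s.id
        FROM t s
        LEFT JOIN t sprev
          ON s.id = sprev.id + 1 AND s."key" = sprev."key" AND s.val = sprev.val
        WHERE sprev.id IS NULL
    ) tgrp ON t.id >= tgrp.id
    GROUP BY t.id
) g
ORDER BY id
"""
result = con.execute(query).fetchall()
print(result)  # (id, GroupId, GroupNum)
```

The group starts here are ids 1, 3, 5 and 6; the final DENSE_RANK() maps them to 1, 2, 3, 4.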

This will rank your rows by the columns, but it will not give you 1, 2, 3.
It will give you 1, 3, 6 and so on, depending on how many rows are in each grouping:
select
a,
b,
rank() over (order by a,b)
from
table1
See this SQLFiddle for a clearer idea of what I mean: http://sqlfiddle.com/#!3/0f201/2/0
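A small sketch (SQLite via Python's sqlite3, used here only as a runnable stand-in) shows where the 1, 3, 6 comes from and how dense_rank() differs. Note that both functions give all three (1, B) rows the same rank, so neither splits off the last B on its own.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (a INTEGER, b TEXT)")
con.executemany("INSERT INTO table1 VALUES (?, ?)",
                [(1, "A"), (1, "A"), (1, "B"), (1, "B"), (1, "C"), (1, "B")])

query = """
SELECT a, b,
       RANK()       OVER (ORDER BY a, b) AS rnk,   -- ties share a rank, then a gap
       DENSE_RANK() OVER (ORDER BY a, b) AS drnk   -- ties share a rank, no gap
FROM table1
ORDER BY rnk
"""
result = con.execute(query).fetchall()
distinct_ranks = sorted(set((b, rnk, drnk) for _, b, rnk, drnk in result))
print(distinct_ranks)  # [('A', 1, 1), ('B', 3, 2), ('C', 6, 3)]
```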

Related

select all rows by distinct values with limit on distinct values

Let's say we have 2 tables:
Table1:
id | t2id
----------
1 | 1
2 | 2
3 | 2
4 | 1
5 | 3
6 | 3
7 | 4
8 | 5
9 | 1
10 | 4
Table2:
id | col
----------
1 | a
2 | b
3 | c
4 | d
5 | e
6 | f
7 | g
8 | h
9 | i
10 | j
My question is:
Is there a short way to limit the number of distinct Table1.t2id values returned?
For example, with limit = 2, all rows whose t2id is one of two distinct values (1 and 2, or any other two) are selected.
Expected result (with limit = 2):
Res:
id | t2id
----------
1 | 1
2 | 2
3 | 2
4 | 1
9 | 1
Note:
Any information or suggestions are welcome.
You could just use a WHERE clause:
Select id,t2id
from table1
where t2id<=2
Or you can use WHERE ... BETWEEN:
Select id,t2id
from table1
where t2id between 1 and 2
I believe you want to:
Create a subquery with all the columns you need plus this one: DENSE_RANK() OVER (ORDER BY Table1.t2id) AS MyRank
Outside of the subquery, add a WHERE on MyRank.
Complete solution:
SELECT id, t2id
FROM (
SELECT id, t2id, DENSE_RANK() OVER (ORDER BY t2id) AS MyRank
FROM table1
) MySubQuery
WHERE MyRank <= 2
This adapts to JOINs with table2 (which may multiply the rows) and to non-consecutive values in t2id.
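To see the DENSE_RANK() filter produce the expected result, here is a minimal sketch in SQLite through Python's sqlite3 (window functions need SQLite 3.25 or later). The limit variable is a hypothetical parameter standing in for the hard-coded 2.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (id INTEGER, t2id INTEGER)")
con.executemany("INSERT INTO table1 VALUES (?, ?)",
                [(1, 1), (2, 2), (3, 2), (4, 1), (5, 3),
                 (6, 3), (7, 4), (8, 5), (9, 1), (10, 4)])

limit = 2  # hypothetical parameter: how many distinct t2id values to keep

query = """
SELECT id, t2id
FROM (
    SELECT id, t2id, DENSE_RANK() OVER (ORDER BY t2id) AS MyRank
    FROM table1
) MySubQuery
WHERE MyRank <= ?
ORDER BY id
"""
result = con.execute(query, (limit,)).fetchall()
print(result)  # [(1, 1), (2, 2), (3, 2), (4, 1), (9, 1)]
```

Because DENSE_RANK() has no gaps, MyRank <= limit always means "the first limit distinct values", even when the t2id values themselves skip numbers.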
You can also use IN (the syntax below is PostgreSQL's):
select t1.*
from table1 t1
where t1.t2id in (select t2.id from table2 t2 order by t2.id limit 2);
The advantage of this approach is that it is easy to make it random:
select t1.*
from table1 t1
where t1.t2id in (select t2.id from table2 t2 order by random() limit 2);

Select row with max value saving distinct column

I have values
- id|type|Status|Comment
- 1 | P | 1 | AAA
- 2 | P | 2 | BBB
- 3 | P | 3 | CCC
- 4 | S | 1 | DDD
- 5 | S | 2 | EEE
I want to get, for each type, the row with the max status, together with the comment from that same row:
- id|type|Status|Comment
- 3 | P | 3 | CCC
- 5 | S | 2 | EEE
None of the existing questions on SO that I found preserve the correspondence between the max status and the rest of its row.
This gives you one row per type: the one with the max status.
select * from (
select your_table.*, row_number() over(partition by type order by Status desc) as rn from your_table
) tt
where rn = 1
Corrected: the query below uses a subquery to find each type and its max status, then joins that onto the original table and keeps only the rows whose status equals the max status. Note that if several rows share the same max status, all of them are returned.
WITH T1 AS (SELECT type, MAX(status) AS max_status FROM table_name GROUP BY type)
SELECT t2.id, t2.type, t2.status, t2.comment
FROM T1 INNER JOIN table_name t2 ON t2.type = T1.type
WHERE t2.status = T1.max_status
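The two answers differ only when statuses tie. This sketch runs both in SQLite via Python's sqlite3 on the question's rows plus one hypothetical extra row (id 6) that ties with id 5. The `id` tie-breaker in the ROW_NUMBER() ordering is an addition so the result is repeatable; the answer above orders by status alone.

```python
import sqlite3

rows = [
    (1, "P", 1, "AAA"), (2, "P", 2, "BBB"), (3, "P", 3, "CCC"),
    (4, "S", 1, "DDD"), (5, "S", 2, "EEE"),
    (6, "S", 2, "FFF"),  # hypothetical extra row that ties with id 5
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE your_table (id INTEGER, type TEXT, status INTEGER, comment TEXT)")
con.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", rows)

# ROW_NUMBER: exactly one row per type; ties broken by id in this sketch.
row_number_query = """
SELECT id, type, status, comment FROM (
    SELECT your_table.*,
           ROW_NUMBER() OVER (PARTITION BY type ORDER BY status DESC, id) AS rn
    FROM your_table
) tt
WHERE rn = 1
ORDER BY id
"""

# Join to the per-type max: every row that reaches the max is kept.
max_join_query = """
WITH T1 AS (SELECT type, MAX(status) AS max_status FROM your_table GROUP BY type)
SELECT t2.id, t2.type, t2.status, t2.comment
FROM T1 INNER JOIN your_table t2 ON t2.type = T1.type
WHERE t2.status = T1.max_status
ORDER BY t2.id
"""

one_per_type = con.execute(row_number_query).fetchall()
all_ties = con.execute(max_join_query).fetchall()
print(one_per_type)  # [(3, 'P', 3, 'CCC'), (5, 'S', 2, 'EEE')]
print(all_ties)      # [(3, 'P', 3, 'CCC'), (5, 'S', 2, 'EEE'), (6, 'S', 2, 'FFF')]
```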

how to query range?

Raw Data
| ID | STATUS |
| 1 | A |
| 2 | A |
| 3 | B |
| 4 | B |
| 5 | B |
| 6 | A |
| 7 | A |
| 8 | A |
| 9 | C |
Result
| START | END |
| 1 | 2 |
| 6 | 8 |
Range of STATUS A
How can I query this?
This should give you the correct ranges:
SELECT
STATUS,
MIN(ID),
max_id
FROM (
SELECT
t1.STATUS,
t1.ID,
COALESCE(MAX(t2.ID), t1.ID) max_id
FROM
yourtable t1 LEFT JOIN yourtable t2
ON t1.STATUS=t2.STATUS AND t1.ID<t2.ID
WHERE
NOT EXISTS (SELECT NULL
FROM yourtable t3
WHERE
t3.STATUS!=t1.STATUS
AND t3.ID>t1.ID AND t3.ID<t2.ID)
GROUP BY
t1.ID,
t1.STATUS
) s
WHERE
status = 'A'
GROUP BY
STATUS,
max_id
You are probably better off with a cursor-based solution or a client-side function.
However, if you are using Oracle, the following would work:
WITH LOWER_VALS AS
( -- All the Ids with no immediate predecessor
SELECT ROWNUM AS RN, STATUS, ID AS LOWER FROM
(
SELECT STATUS, ID
FROM RAWDATA RD1
WHERE RD1.ID -1 NOT IN
(SELECT ID FROM RAWDATA PRED_TABLE WHERE PRED_TABLE.STATUS = RD1.STATUS)
ORDER BY STATUS, ID
)
) ,
UPPER_VALS AS
( -- All the Ids with no immediate successor
SELECT ROWNUM AS RN, STATUS, ID AS UPPER FROM
(
SELECT STATUS, ID
FROM RAWDATA RD2
WHERE RD2.ID +1 NOT IN
(SELECT ID FROM RAWDATA SUCC_TABLE WHERE SUCC_TABLE.STATUS = RD2.STATUS)
ORDER BY STATUS, ID
)
)
SELECT
L.STATUS, L.LOWER, U.UPPER
FROM
LOWER_VALS L
JOIN UPPER_VALS U ON
U.RN = L.RN;
This results in the set:
A 1 2
A 6 8
B 3 5
C 9 9
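The Oracle query's logic can be checked with a small pure-Python sketch: a run starts at an ID whose ID - 1 has no row with the same status, ends at an ID whose ID + 1 has none, and the n-th start pairs with the n-th end once both lists are sorted by status and ID. Like the query, it assumes the IDs are consecutive integers; the function name status_ranges is hypothetical.

```python
# (ID, STATUS) pairs from the question's raw data.
raw = [(1, "A"), (2, "A"), (3, "B"), (4, "B"), (5, "B"),
       (6, "A"), (7, "A"), (8, "A"), (9, "C")]


def status_ranges(rows):
    """Return (status, lower, upper) for each run, mirroring the Oracle query."""
    ids_by_status = {}
    for row_id, status in rows:
        ids_by_status.setdefault(status, set()).add(row_id)
    # LOWER_VALS: ids whose ID - 1 is not a row with the same status
    lowers = sorted((s, i) for i, s in rows if i - 1 not in ids_by_status[s])
    # UPPER_VALS: ids whose ID + 1 is not a row with the same status
    uppers = sorted((s, i) for i, s in rows if i + 1 not in ids_by_status[s])
    # pair the n-th lower bound with the n-th upper bound (the join on RN)
    return [(s, lo, hi) for (s, lo), (_, hi) in zip(lowers, uppers)]


for status, lo, hi in status_ranges(raw):
    print(status, lo, hi)
```

Filtering the output to status A gives the (1, 2) and (6, 8) ranges the question asks for.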
http://sqlfiddle.com/#!4/10184/2
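There is not much to go on from what you posted, but I think this might work. I am using T-SQL because I don't know which DBMS you are using. Note that it returns a single overall range for status A (1 to 8) rather than one row per run: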
SELECT
min(ID)
, max(ID)
FROM RawData
WHERE [Status] = 'A'

Grouping SQL Results based on order

I have table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each run of RowNumbers so that my result looks something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way to tell where each group starts and stops is that the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient, since the table I need to do this on has 52 million rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1, but, for example, the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
Assuming ID is the clustered index the plan for this has one scan against YourTable and avoids any sort operations.
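Here is a minimal sketch of the LAG-flag-and-running-sum query, run in SQLite via Python's sqlite3 (3.25 or later assumed). IIF is swapped for an equivalent CASE, and the data is hypothetical, built from the RowNumber patterns in the comments ("1,1,2,2,3,4" then "1,2,4,6").

```python
import sqlite3

# Hypothetical data: group 1 uses RowNumbers 1,1,2,2,3,4 and group 2 uses 1,2,4,6.
row_numbers = [1, 1, 2, 2, 3, 4, 1, 2, 4, 6]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, RowNumber INTEGER, Data TEXT)")
con.executemany("INSERT INTO YourTable VALUES (?, ?, 'Data')",
                [(i, rn) for i, rn in enumerate(row_numbers, start=1)])

query = """
WITH T1 AS (
    SELECT *, LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
    FROM YourTable
), T2 AS (
    -- CASE instead of IIF: the first row, or a drop in RowNumber, opens a group
    SELECT *,
           CASE WHEN PrevRowNumber IS NULL OR PrevRowNumber > RowNumber
                THEN 1 ELSE 0 END AS NewGroup
    FROM T1
)
SELECT ID, RowNumber,
       SUM(NewGroup) OVER (ORDER BY ID
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
ORDER BY ID
"""
result = con.execute(query).fetchall()
print([grp for _, _, grp in result])  # [1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
```

Repeated or skipped RowNumbers inside a group never trigger the flag; only a decrease does.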
If both ID and RowNumber are truly sequential (RowNumber counting 1, 2, 3, ... within each group), you can do:
select t.*,
(id - rowNumber) as grp
from t
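The ID - RowNumber trick relies on both sequences advancing by exactly 1 inside a group. This pure-Python sketch computes the same key for the question's sample and for the clarified RowNumber patterns from the comments; the helper name id_minus_rownumber is hypothetical.

```python
def id_minus_rownumber(ids, row_numbers):
    """Return ID - RowNumber for each row, the grouping key used above."""
    return [i - rn for i, rn in zip(ids, row_numbers)]


# Question's data: RowNumber restarts at 1 and counts up by 1 inside each group.
sample = id_minus_rownumber(range(1, 10), [1, 2, 3, 1, 2, 1, 2, 3, 4])
print(sample)  # [0, 0, 0, 3, 3, 5, 5, 5, 5] -> three clean groups

# Clarified data from the comments: repeated and skipped RowNumbers.
clarified = id_minus_rownumber(range(1, 11), [1, 1, 2, 2, 3, 4, 1, 2, 4, 6])
print(clarified)  # [0, 1, 1, 2, 2, 2, 6, 6, 5, 4] -> groups no longer line up
```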
You can also use a recursive CTE, which starts a new group whenever RowNumber = 1:
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
OPTION (MAXRECURSION 0) -- the default limit of 100 recursion levels is far too low here
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
select *, (select max(ID) from [Your Table] where ID <= t.ID and RowNumber = 1) as grp
from [Your Table] t
) t
order by ID
Here grp is the ID of the most recent row with RowNumber = 1, so every row is tagged with the start of its own group. This should work on SQL Server 2005 and later. You could also use rank() instead if you don't need consecutive numbers.
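A minimal sketch of the correlated-subquery version, run in SQLite via Python's sqlite3 on the question's sample, where every group starts at RowNumber = 1. The derived table is aliased sub here only to keep it apart from the inner alias t.

```python
import sqlite3

# The question's sample: ID is sequential and each group starts at RowNumber = 1.
row_numbers = [1, 2, 3, 1, 2, 1, 2, 3, 4]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, RowNumber INTEGER, Data TEXT)")
con.executemany("INSERT INTO YourTable VALUES (?, ?, 'Data')",
                [(i, rn) for i, rn in enumerate(row_numbers, start=1)])

query = """
SELECT ID, RowNumber, DENSE_RANK() OVER (ORDER BY grp) AS Grp
FROM (
    -- grp = ID of the nearest group start (RowNumber = 1) at or before this row
    SELECT t.*,
           (SELECT MAX(ID) FROM YourTable WHERE ID <= t.ID AND RowNumber = 1) AS grp
    FROM YourTable t
) sub
ORDER BY ID
"""
result = con.execute(query).fetchall()
print([grp for _, _, grp in result])  # [1, 1, 1, 2, 2, 3, 3, 3, 3]
```

The correlated subquery runs once per row, so on a 52-million-row table it depends heavily on an index covering (RowNumber, ID).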

MSSQL: Only last entry in GROUP BY (with id)

Following / copying computhomas's question, but adding some twists...
I have the following table in MSSQL2008
id | business_key | result | date
1 | 1 | 0 | 9
2 | 1 | 1 | 8
3 | 2 | 1 | 7
4 | 3 | n | 6
5 | 4 | 1 | 5
6 | 4 | 0 | 4
Now I want to group by business_key, returning the complete entry with the newest date.
So my expected result is:
id | business_key | result | date
1 | 1 | 0 | 9
3 | 2 | 1 | 7
4 | 3 | n | 6
5 | 4 | 1 | 5
I'm sure there is a way to achieve this; I just can't find / see / think of it at the moment.
Edit: sorry about this, I originally asked something different. I felt that editing this question would be better than accepting a solution and asking a new one. My original problem was that I am not filtering by id.
SELECT t.*
FROM
(
SELECT *, ROW_NUMBER() OVER
(
PARTITION BY [business_key]
ORDER BY [date] DESC
) AS [RowNum]
FROM yourTable
) AS t
WHERE t.[RowNum] = 1
SELECT
*
FROM
mytable
WHERE
ID IN (SELECT MAX(ID) FROM mytable GROUP BY business_key)
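The ROW_NUMBER() answer and the MAX(ID) answer pick different rows whenever id order and date order disagree, as they do for business_key 1 and 4 in the sample. This sketch runs both in SQLite via Python's sqlite3; row 4's result is taken as 1 (the integer version of the data used in the working example further down).

```python
import sqlite3

rows = [(1, 1, 0, 9), (2, 1, 1, 8), (3, 2, 1, 7), (4, 3, 1, 6), (5, 4, 1, 5), (6, 4, 0, 4)]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (id INTEGER, business_key INTEGER, result INTEGER, date INTEGER)")
con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

# Newest date per business_key (the ROW_NUMBER answer).
newest = con.execute("""
SELECT id FROM (
    SELECT id, ROW_NUMBER() OVER (PARTITION BY business_key ORDER BY date DESC) AS rn
    FROM mytable
) t
WHERE rn = 1
ORDER BY id
""").fetchall()

# Highest id per business_key (the MAX(ID) answer).
highest_id = con.execute("""
SELECT id FROM mytable
WHERE id IN (SELECT MAX(id) FROM mytable GROUP BY business_key)
ORDER BY id
""").fetchall()

print([r[0] for r in newest])      # [1, 3, 4, 5]
print([r[0] for r in highest_id])  # [2, 3, 4, 6]
```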
SELECT
MAX(T1.id) AS [id],
T1.business_key,
T1.result
FROM
dbo.My_Table T1
LEFT OUTER JOIN dbo.My_Table T2 ON
T2.business_key = T1.business_key AND
T2.id > T1.id
WHERE
T2.id IS NULL
GROUP BY T1.business_key,
T1.result
ORDER BY MAX(T1.id)
Edited based on clarifications
SELECT M1.*
FROM My_Table M1
INNER JOIN
(
SELECT [business_key], MAX([date]) as MaxDate
FROM My_Table
GROUP BY [business_key]
) M2 ON M1.business_key = M2.business_key AND M1.[date] = M2.MaxDate
ORDER BY M1.[id]
This assumes the combination of business_key and date is unique.
Working example (3rd time is a charm):
declare @src as table(id int, business_key int, result int, [date] int)
insert into @src
SELECT 1,1,0,9
UNION SELECT 2,1,1,8
UNION SELECT 3,2,1,7
UNION SELECT 4,3,1,6
UNION SELECT 5,4,1,5
UNION SELECT 6,4,0,4
;with bkdate(business_key,[date])
AS
(
select business_key,MAX([date])
from @src
group by business_key
)
select src.* from @src src
inner join bkdate
ON src.[date] = bkdate.[date]
and src.business_key = bkdate.business_key
order by id
How about (edited after question change):
with latestdate as (
select business_key, maxdate=max(date)
from the_table
group by business_key
), latest as (
select ID = max(id)
from the_table
inner join latestdate
on the_table.business_key=latestdate.business_key
and the_table.date=latestdate.maxdate
group by the_table.business_key
)
select the_table.*
from the_table
inner join latest
on latest.id=the_table.id