Consolidate Rows with rank


Current:
When RNK is 1, consolidate the consecutive IDs into a range as shown; when RNK is 0, keep the row as it is.
How can I do this?
Required:

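This is a gaps-and-islands problem, but you only care about the islands where rnk = 1. A convenient way to identify them is a cumulative sum of 1 - rnk, which increases by one at every rnk = 0 row. Each rnk = 0 row gets its own group number, and the rnk = 1 rows that follow it share that number, so grouping by the number together with rnk separates them. The rest is aggregation and combining the ids: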
select (case when min(id) = max(id) then min(id)
else min(id) || '-' || max(id)
end) as id,
rnk
from (select t.*, sum(1 - rnk) over (order by id) as grp
from t
) t
group by grp, rnk
order by min(id);
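As a rough check of the cumulative-sum idea, the sketch below runs the same query through Python's built-in sqlite3 module. The sample rows are hypothetical (chosen to match the shape described in the question), and it assumes the bundled SQLite is 3.25 or newer so that window functions are available.

```python
import sqlite3

# Hypothetical sample data matching the shape described in the question.
SAMPLE = [("A100", 1), ("A101", 1), ("A102", 1), ("A103", 0),
          ("A104", 0), ("A105", 1), ("A106", 1)]

# grp increases by one at every rnk = 0 row, so a run of rnk = 1 rows
# shares the group number of the rnk = 0 row that precedes it.
QUERY = """
select case when min(id) = max(id) then min(id)
            else min(id) || '-' || max(id)
       end as id,
       rnk
from (select t.*, sum(1 - rnk) over (order by id) as grp
      from t) t
group by grp, rnk
order by min(t.id)
"""

def consolidate(rows):
    """Load rows into an in-memory table t and return the consolidated rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id text, rnk integer)")
    conn.executemany("insert into t values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

consolidated = consolidate(SAMPLE)
for row in consolidated:
    print(row)
```

Grouping by rnk in addition to grp is what keeps a rnk = 0 row apart from the rnk = 1 island that immediately follows it.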

This is a gaps-and-islands problem. You want to group together adjacent rows where rnk has value 1.
Here is an approach using row_number() and conditional expressions:
select
case when min(id) <> max(id) then concat(min(id), '-', max(id)) else min(id) end id,
min(rnk) rnk
from (
select
t.*,
row_number() over(order by id) rn1,
row_number() over(partition by rnk order by id) rn2
from mytable t
) t
group by rnk, case when rnk = 1 then rn1 - rn2 else rn1 + rn2 end
order by min(id)
Demo on DB Fiddle:
id | rnk
:-------- | --:
A100-A102 | 1
A103 | 0
A104 | 0
A105-A106 | 1
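One caveat with the row-number approach: if the grouping key is only the CASE expression, the rn1 + rn2 value of a rnk = 0 row can coincide with the rn1 - rn2 value of a rnk = 1 island, and the two get merged. The sketch below illustrates this with hypothetical data (two rnk = 0 rows followed by two rnk = 1 rows) and shows that putting rnk in the grouping key avoids the collision. It uses Python's sqlite3 and assumes SQLite 3.25+ for window functions; `||` stands in for concat(), which older SQLite versions lack.

```python
import sqlite3

# Hypothetical data: two rnk = 0 rows followed by a two-row rnk = 1 island.
ROWS = [("A100", 0), ("A101", 0), ("A102", 1), ("A103", 1)]

# {key} is replaced with the grouping key under test.
TEMPLATE = """
select case when min(id) <> max(id) then min(id) || '-' || max(id)
            else min(id) end as id_range,
       min(rnk) as rnk
from (select t.*,
             row_number() over (order by id) as rn1,
             row_number() over (partition by rnk order by id) as rn2
      from mytable t) t
group by {key}
order by min(id)
"""
CASE_KEY = "case when rnk = 1 then rn1 - rn2 else rn1 + rn2 end"

def run(key):
    """Run the query with the given grouping key against ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (id text, rnk integer)")
    conn.executemany("insert into mytable values (?, ?)", ROWS)
    result = conn.execute(TEMPLATE.format(key=key)).fetchall()
    conn.close()
    return result

# A100 has rn1 + rn2 = 2, and the island A102..A103 has rn1 - rn2 = 2,
# so the CASE expression alone puts all three rows in one group.
case_only = run(CASE_KEY)
# With rnk in the key, the rnk = 0 rows and the island stay separate.
with_rnk = run("rnk, " + CASE_KEY)

print(case_only)
print(with_rnk)
```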

Related

PostgreSQL group by column with aggregate

I need to group by id and select the task with min/max seq as start and end
id | task | seq
----+------+-----
1 | aaa | 1
1 | bbb | 2
1 | ccc | 3
SELECT
id,
CASE WHEN seq = MIN(seq) THEN task AS start,
CASE WHEN seq = MAX(seq) THEN task AS end
FROM table
GROUP BY id;
But this results in
ERROR: column "seq" must appear in the GROUP BY clause or be used in an aggregate function
But I do not want to group by seq.

One method uses arrays:
SELECT id,
(ARRAY_AGG(task ORDER BY seq ASC))[1] as start_task,
(ARRAY_AGG(task ORDER BY seq DESC))[1] as end_task
FROM table
GROUP BY id;
Another method uses window functions with SELECT DISTINCT:
select distinct id,
first_value(task) over (partition by id order by seq) as start_task,
first_value(task) over (partition by id order by seq desc) as end_task
from t;
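The window-function variant can be tried out without a PostgreSQL server. The sketch below runs it through Python's sqlite3 (assuming SQLite 3.25+); the table name `tasks` and the rows for id 2 are hypothetical additions to the question's sample. SQLite has no ARRAY_AGG, so only the FIRST_VALUE form is shown.

```python
import sqlite3

# Rows for id 1 come from the question; id 2 is a hypothetical extra group.
TASK_ROWS = [(1, "aaa", 1), (1, "bbb", 2), (1, "ccc", 3),
             (2, "xxx", 1), (2, "yyy", 2)]

# Ordering by seq descending and taking first_value gives the last task.
# This sidesteps last_value, whose default frame stops at the current row.
QUERY = """
select distinct id,
       first_value(task) over (partition by id order by seq) as start_task,
       first_value(task) over (partition by id order by seq desc) as end_task
from tasks
order by id
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table tasks (id integer, task text, seq integer)")
conn.executemany("insert into tasks values (?, ?, ?)", TASK_ROWS)
start_end = conn.execute(QUERY).fetchall()
conn.close()

for row in start_end:
    print(row)
```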

You can use window functions with a derived table:
select id, task, min_seq as start, max_seq as "end"
from (
select id, task, seq,
max(seq) over (partition by id) as max_seq,
min(seq) over (partition by id) as min_seq
from the_table
) t
where seq in (max_seq, min_seq)

One option here would be to use ROW_NUMBER along with aggregation and pivoting logic:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq) rn_min,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq DESC) rn_max
FROM yourTable
)
SELECT
id,
MAX(CASE WHEN rn_min = 1 THEN task END) AS start,
MAX(CASE WHEN rn_max = 1 THEN task END) AS end
FROM cte
GROUP BY
id;

Group sequential repeated values sqlite

I have data that repeats sequentially:
A
A
A
B
B
B
A
A
A
I need to group them like this
A
B
A
What is the best approach to do so using sqlite?

Assuming that you have a column that defines the ordering of the rows, say id, you can address this gaps-and-islands problem with window functions:
select col, count(*) cnt, min(id) first_id, max(id) last_id
from (
select t.*,
row_number() over(order by id) rn1,
row_number() over(partition by col order by id) rn2
from mytable t
) t
group by col, rn1 - rn2
order by min(id)
I added a few columns to the resultset that give more information about the content of each group.
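A quick way to see the row-number difference at work is to run the query in SQLite itself. The sketch below uses Python's sqlite3 (assuming SQLite 3.25+) with the question's A/B/A sequence; the id values 1 to 9 are an assumption, since the question does not show an ordering column.

```python
import sqlite3

# The question's sequence, with assumed ids 1..9 defining the order.
VALUES = ["A", "A", "A", "B", "B", "B", "A", "A", "A"]

QUERY = """
select col, count(*) as cnt, min(id) as first_id, max(id) as last_id
from (select t.*,
             row_number() over (order by id) as rn1,
             row_number() over (partition by col order by id) as rn2
      from mytable t) t
group by col, rn1 - rn2
order by min(id)
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer primary key, col text)")
conn.executemany("insert into mytable (id, col) values (?, ?)",
                 list(enumerate(VALUES, start=1)))
islands = conn.execute(QUERY).fetchall()
conn.close()

# rn1 - rn2 is 0 for the first A run, 3 for the B run and 3 for the
# second A run; col in the key keeps the last two apart.
for row in islands:
    print(row)
```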

If you have defined a column that defines the order of the rows, like an id, you can use window function LEAD():
select col
from (
select col, lead(col, 1, '') over (order by id) next_col
from tablename
)
where col <> next_col
Results:
| col |
| --- |
| A |
| B |
| A |
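The LEAD() version can be checked the same way. This sketch, again via Python's sqlite3 with SQLite 3.25+ assumed, keeps a row only when the next row's value differs. The id column and the explicit `order by id` on the outer query are assumptions added here so that the output order is deterministic. Note that the `''` default only works if col is never empty or NULL.

```python
import sqlite3

# The question's sequence, with assumed ids 1..9 defining the order.
VALUES = ["A", "A", "A", "B", "B", "B", "A", "A", "A"]

# A row survives only if it is the last row of its run: the next value
# differs, or there is no next row (lead() then returns the '' default).
QUERY = """
select col
from (select id, col, lead(col, 1, '') over (order by id) as next_col
      from tablename)
where col <> next_col
order by id
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table tablename (id integer primary key, col text)")
conn.executemany("insert into tablename (id, col) values (?, ?)",
                 list(enumerate(VALUES, start=1)))
collapsed = [row[0] for row in conn.execute(QUERY)]
conn.close()

print(collapsed)
```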

SQL The largest number of consecutive values for each value

I have a table MatchResults:
id | player_win_id
------------------
1 | 1
2 | 1
3 | 3
4 | 1
5 | 2
6 | 3
7 | 3
8 | 1
9 | 1
10 | 1
I need to find out for each player ID the highest number of consecutive victories. I use MS SQL Server.
Expected Result
PLAYER_ID | WIN_COUNT
------------------
1 | 3
2 | 1
3 | 2

This is a type of gaps-and-islands problem. One solution uses the difference of row numbers. So, to get all streaks:
select player_win_id, count(*)
from (select t.*,
row_number() over (order by id) as seqnum,
row_number() over (partition by player_win_id order by id) as seqnum_p
from MatchResults t
) t
group by player_win_id, (seqnum - seqnum_p);
Why this works is a little tricky to explain. But if you look at the results of the subquery, you'll probably see how the difference between the row number values captures adjacent rows with the same player win id.
For the maximum, the simplest is probably just an aggregation query:
select player_win_id, max(cnt)
from (select player_win_id, count(*) as cnt
from (select t.*,
row_number() over (order by id) as seqnum,
row_number() over (partition by player_win_id order by id) as seqnum_p
from MatchResults t
) t
group by player_win_id, (seqnum - seqnum_p)
) p
group by player_win_id;
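To make the "look at the results of the subquery" advice concrete, the sketch below prints seqnum, seqnum_p and their difference for the question's data, then computes the longest streak per player. It runs the SQL through Python's sqlite3 rather than SQL Server (an assumption made for portability; SQLite 3.25+ is required), and the `order by` clauses are added only to make the output deterministic.

```python
import sqlite3

# player_win_id for match ids 1..10, as listed in the question.
WINNERS = [1, 1, 3, 1, 2, 3, 3, 1, 1, 1]

INNER = """
select t.*,
       row_number() over (order by id) as seqnum,
       row_number() over (partition by player_win_id order by id) as seqnum_p
from MatchResults t
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table MatchResults (id integer primary key, player_win_id integer)")
conn.executemany("insert into MatchResults values (?, ?)",
                 list(enumerate(WINNERS, start=1)))

# Intermediate result: within one player's streak, seqnum and seqnum_p
# both advance by one per row, so their difference stays constant.
intermediate = conn.execute(
    "select id, player_win_id, seqnum, seqnum_p, seqnum - seqnum_p as diff "
    f"from ({INNER}) order by id").fetchall()
for row in intermediate:
    print(row)

# Longest streak per player: count each island, then take the maximum.
best = conn.execute(
    "select player_win_id, max(cnt) from ("
    "  select player_win_id, count(*) as cnt "
    f"  from ({INNER}) group by player_win_id, seqnum - seqnum_p"
    ") group by player_win_id order by player_win_id").fetchall()
conn.close()

print(best)
```

The difference alone is not unique across players (several players share diff = 4 here), which is why player_win_id is part of the grouping key.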

Now I understand the previous comment. The code for my table is:
select player_win_id, max(cnt)
from (select player_win_id, count(*) as cnt
from (select *,
row_number() over (order by id) as seqnum,
row_number() over (partition by player_win_id order by id) as seqnum_p
from MatchResults ) t
group by player_win_id, (seqnum - seqnum_p)
) p
group by player_win_id;

How to count repeating values in a column in PostgreSQL?

Hi, I have a table like the one below, and I want to count the repeating values in the status column. I don't want the overall number of duplicates. For example, I just want to count how many times "Offline" appears before the value changes to "Idle".
This is the result I wanted. Thank you.

This is often called gaps-and-islands.
One way to do it is with two sequences of row numbers.
Examine each intermediate result of the query to understand how it works.
WITH
CTE_rn
AS
(
SELECT
status
,dt
,ROW_NUMBER() OVER (ORDER BY dt) as rn1
,ROW_NUMBER() OVER (PARTITION BY status ORDER BY dt) as rn2
FROM
T
)
SELECT
status
,COUNT(*) AS cnt
FROM
CTE_rn
GROUP BY
status
,rn1-rn2
ORDER BY
min(dt)
;
Result
| status | cnt |
|---------|-----|
| offline | 2 |
| idle | 1 |
| offline | 2 |
| idle | 1 |

WITH
cte1 AS ( SELECT status,
"date",
workstation,
CASE WHEN status = LAG(status) OVER (PARTITION BY workstation ORDER BY "date")
THEN 0
ELSE 1 END changed
FROM test ),
cte2 AS ( SELECT status,
"date",
workstation,
SUM(changed) OVER (PARTITION BY workstation ORDER BY "date") group_num
FROM cte1 )
SELECT status, COUNT(*) "count", workstation, MIN("date") "from", MAX("date") "till"
FROM cte2
GROUP BY group_num, status, workstation;
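The LAG-plus-running-sum technique in the query above can be sketched as follows. The table contents are hypothetical (the question's data was not preserved), the timestamp column is named `ts` here instead of "date", and an ORDER BY is added for stable output. It runs through Python's sqlite3, assuming SQLite 3.25+.

```python
import sqlite3

# Hypothetical rows: (workstation, ts, status).
EVENTS = [
    ("W1", "2021-01-01 08:00", "offline"),
    ("W1", "2021-01-01 08:05", "offline"),
    ("W1", "2021-01-01 08:10", "idle"),
    ("W1", "2021-01-01 08:15", "offline"),
    ("W1", "2021-01-01 08:20", "offline"),
    ("W1", "2021-01-01 08:25", "idle"),
    ("W2", "2021-01-01 08:00", "idle"),
    ("W2", "2021-01-01 08:05", "idle"),
]

# changed is 1 whenever the status differs from the previous row (or there
# is no previous row); its running sum numbers the runs per workstation.
QUERY = """
with cte1 as (
    select status, ts, workstation,
           case when status = lag(status) over (partition by workstation order by ts)
                then 0 else 1 end as changed
    from test),
cte2 as (
    select status, ts, workstation,
           sum(changed) over (partition by workstation order by ts) as group_num
    from cte1)
select workstation, status, count(*) as cnt
from cte2
group by workstation, group_num, status
order by workstation, min(ts)
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table test (workstation text, ts text, status text)")
conn.executemany("insert into test values (?, ?, ?)", EVENTS)
runs = conn.execute(QUERY).fetchall()
conn.close()

for row in runs:
    print(row)
```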

SQL Server : create group of N rows each and give group number for each group

I want to create a SQL query that selects an ID column and adds an extra column with a group number, as shown in the output below.
Each group consists of 3 rows and should have the MIN(ID) as a GroupID for each group. The order by should be ASC on the ID column.
ID GroupNr
------------
100 100
101 100
102 100
103 103
104 103
105 103
106 106
107 106
108 106
I've tried solutions with ROW_NUMBER() and DENSE_RANK(). And also this query:
SELECT
*, MIN(ID) OVER (ORDER BY ID ASC ROWS 2 PRECEDING) AS Groupnr
FROM
Table
ORDER BY
ID ASC

Use row_number() to enumerate the rows, arithmetic to assign the group and then take the minimum of the id:
SELECT t.*, MIN(ID) OVER (PARTITION BY grp) as groupnumber
FROM (SELECT t.*,
( (ROW_NUMBER() OVER (ORDER BY ID) - 1) / 3) as grp
FROM Table t
) t
ORDER BY ID ASC;
It is possible to do this without a subquery, but the logic is rather messy:
select t.*,
(case when row_number() over (order by id) % 3 = 0
then lag(id, 2) over (order by id)
when row_number() over (order by id) % 3 = 2
then lag(id, 1) over (order by id)
else id
end) as groupnumber
from table t
order by id;
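The subquery version relies on integer division of a zero-based row number. A minimal sketch of that arithmetic, run through Python's sqlite3 instead of SQL Server (an assumption for portability; SQLite 3.25+ is required, and like SQL Server it truncates integer division). The table name `ids` and the second data set with gaps are hypothetical.

```python
import sqlite3

# (row_number - 1) / 3 maps rows 1..3 to group 0, rows 4..6 to group 1, etc.
QUERY = """
select id, min(id) over (partition by grp) as groupnr
from (select id, (row_number() over (order by id) - 1) / 3 as grp
      from ids) t
order by id
"""

def group_numbers(values):
    """Return (id, groupnr) pairs, where groupnr is the smallest id in each group of 3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table ids (id integer)")
    conn.executemany("insert into ids values (?)", [(v,) for v in values])
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

# The question's data: ids 100..108.
question_groups = group_numbers(range(100, 109))
# Hypothetical ids with gaps: grouping follows row position, not id value,
# and the last group may hold fewer than 3 rows.
gappy_groups = group_numbers([5, 9, 10, 20, 21])

print(question_groups)
print(gappy_groups)
```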

Assuming you want the lowest value in the group, and they are always groups of 3, rather than the NTILE (as Saravantn suggests, which splits the data into that many even(ish) groups), you could use a couple of window functions:
WITH Grps AS(
SELECT V.ID,
(ROW_NUMBER() OVER (ORDER BY V.ID) -1) / 3 AS Grp
FROM (VALUES(100),
(101),
(102),
(103),
(104),
(105),
(106),
(107),
(108))V(ID))
SELECT G.ID,
MIN(G.ID) OVER (PARTITION BY G.Grp) AS GroupNr
FROM Grps G;

SELECT T2.ID, T1.ID
FROM (
SELECT MIN(ID) AS ID, GroupNr
FROM
(
SELECT ID, ( Row_number()OVER(ORDER BY ID) - 1 ) / 3 + 1 AS GroupNr
FROM Table
) AS T1
GROUP BY GroupNr
) AS T1
INNER JOIN (
SELECT ID, ( Row_number()OVER(ORDER BY ID) - 1 ) / 3 + 1 AS GroupNr
FROM Table
) T2 ON T1.GroupNr = T2.GroupNr
