Group by data with the same group occurring multiple times


Input data
id group
1 a
1 a
1 b
1 b
1 a
1 a
1 a
Expected result
id group row_number
1 a 1
1 a 1
1 b 2
1 b 2
1 a 4
1 a 4
1 a 4
I need the row_number shown in the expected result above: when the same group occurs a second time, it should get a different row_number. I also have one more column, a date sequence that orders the rows from top to bottom.

This is an example of a gaps-and-islands problem. Solving it, though, requires that the data be ordered, and SQL tables represent unordered sets, so you need a column that specifies the ordering.
Let me assume you have such an ordering column (the ? below). Then the difference of row numbers can be used:
select t.*,
       dense_rank() over (partition by id order by grp, (seqnum - seqnum_g)) as grouping
from (select t.*,
             row_number() over (partition by id order by ?) as seqnum,
             row_number() over (partition by id, grp order by ?) as seqnum_g
      from t
     ) t;
This does not produce the values that you specifically request, but it does identify each group.
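To see what the difference of row numbers looks like on the question's data, here is a minimal sketch that runs the same query shape through Python's sqlite3 module. It assumes SQLite 3.25 or later (for window functions), and the ordering column seq and the alias island are hypothetical names that are not in the original post.

```python
import sqlite3

# Hypothetical table t; seq stands in for the ordering column (the "?" above).
conn = sqlite3.connect(":memory:")
conn.execute("create table t (seq integer, id integer, grp text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "a"), (3, 1, "b"), (4, 1, "b"),
     (5, 1, "a"), (6, 1, "a"), (7, 1, "a")],
)

query = """
select seq, grp,
       seqnum - seqnum_g as diff,
       dense_rank() over (partition by id order by grp, seqnum - seqnum_g) as island
from (select t.*,
             row_number() over (partition by id order by seq) as seqnum,
             row_number() over (partition by id, grp order by seq) as seqnum_g
      from t
     ) t
order by seq
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)  # (seq, grp, diff, island)
```

On this data the two "a" runs get diff 0 and 2, so they end up with different island numbers. The "b" run also has diff 2, which is why grp has to be part of the dense_rank() ordering. The island numbers follow the sort order of (grp, diff), not the order in which the runs appear.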

Related

How to use LIMIT to sample rows dynamically

I have a table as follows:
SampleReq Group ID
2 1 _001
2 1 _002
2 1 _003
1 2 _004
1 2 _005
1 2 _006
I want my query to sample IDs based on the column SampleReq, resulting in the following output:
Group ID
1 _001
1 _003
2 _006
The query should pick any 2 IDs from group 1, any 1 ID from group 2, and so on (depending on the column SampleReq).
I tried the query using LIMIT, but this gives me an error saying that a column name cannot be used as a limit.
SELECT Group, ID
FROM Table
LIMIT SampleReq
ORDER BY RAND()

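One method is row_number(). Number the rows of each group in random order, then keep only as many rows per group as samplereq asks for (use rand() instead of random() if that is what your database provides):
select t.*
from (select t.*,
             row_number() over (partition by "group" order by random()) as seqnum
      from t
     ) t
where seqnum <= samplereq;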
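A quick way to try the row_number() idea is the sketch below, which loads the question's sample rows into an in-memory SQLite database and compares seqnum with samplereq directly. It assumes SQLite 3.25 or later and that the table is called t; the column name "group" is quoted because it is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table t (samplereq integer, "group" integer, id text)')
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(2, 1, "_001"), (2, 1, "_002"), (2, 1, "_003"),
     (1, 2, "_004"), (1, 2, "_005"), (1, 2, "_006")],
)

query = """
select "group", id
from (select t.*,
             row_number() over (partition by "group" order by random()) as seqnum
      from t
     ) t
where seqnum <= samplereq
order by "group", id
"""
sample = conn.execute(query).fetchall()

# Count how many rows were sampled per group.
counts = {}
for grp, _id in sample:
    counts[grp] = counts.get(grp, 0) + 1

print(sample)  # which IDs are picked varies from run to run
print(counts)
```

The IDs change between runs because of the random ordering, but each group always contributes exactly samplereq rows (two for group 1, one for group 2 here).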

increment if not same value of next column in SQL

I am trying to use the Row Number in SQL. However, it's not giving desired output.
Data :
ID Name Output should be
111 A 1
111 B 2
111 C 3
111 C 3
111 A 4
222 A 1
222 A 1
222 B 2
222 C 3
222 B 4
222 B 4

This is a gaps-and-islands problem. To start with: for the question to make sense at all, you need a column that defines the ordering of the rows; I assumed ordering_id. Then I would recommend lag() to get the "previous" name, and a cumulative sum() that increases every time the name changes between adjacent rows:
select id, name,
       sum(case when name = lag_name then 0 else 1 end)
           over(partition by id order by ordering_id) as rn
from (
    select t.*, lag(name) over(partition by id order by ordering_id) lag_name
    from mytable t
) t
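As a check of the lag() plus cumulative sum() approach against the question's data, here is a small sketch using Python's sqlite3 module. It assumes SQLite 3.25 or later, and the values of ordering_id are made up for illustration, since the question does not show an ordering column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (ordering_id integer, id integer, name text)")
names = [(111, "A"), (111, "B"), (111, "C"), (111, "C"), (111, "A"),
         (222, "A"), (222, "A"), (222, "B"), (222, "C"), (222, "B"), (222, "B")]
# ordering_id is a hypothetical column: here simply the row position.
conn.executemany(
    "insert into mytable values (?, ?, ?)",
    [(pos, id_, name) for pos, (id_, name) in enumerate(names, start=1)],
)

query = """
select id, name,
       sum(case when name = lag_name then 0 else 1 end)
           over (partition by id order by ordering_id) as rn
from (
    select t.*, lag(name) over (partition by id order by ordering_id) as lag_name
    from mytable t
) t
order by ordering_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # (id, name, rn)
```

The first row of each id has no previous name, so lag_name is NULL, the comparison is not true, and the flag is 1; that is what starts the counter at 1 in every partition.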

SQL Server 2008 makes this much trickier, because lag() is not available before SQL Server 2012. You can identify the adjacent rows using a difference of row numbers. Then you can assign the minimum ordering value in each island and use dense_rank():
select t.*,
       dense_rank() over (partition by id order by min_ordcol) as output_rank
from (select t.*,
             min(<ordcol>) over (partition by id, name, seqnum - seqnum_2) as min_ordcol
      from (select t.*,
                   row_number() over (partition by id order by <ordcol>) as seqnum,
                   row_number() over (partition by id, name order by <ordcol>) as seqnum_2
            from t
           ) t
     ) t;

How to find the numbers that have more than two consecutive appearances?

The source table:
id num
-------------------
1 1
2 1
3 1
4 2
5 2
6 1
The output (numbers that appear at least 2 times consecutively):
num times
--------------
1 3
2 2

Based on the additional logic defined in the comments, it appears this is what you're after:
WITH YourTable AS(
    SELECT V.id,
           V.num
    FROM (VALUES(1,1),
                (2,1),
                (3,1),
                (4,2),
                (5,2),
                (6,1),
                (7,1))V(id,num)), --Added extra row due to logic defined in comments
Grps AS(
    SELECT YT.id,
           YT.num,
           ROW_NUMBER() OVER (ORDER BY id) -
           ROW_NUMBER() OVER (PARTITION BY Num ORDER BY id) AS Grp
    FROM YourTable YT),
Counts AS(
    SELECT num,
           COUNT(num) AS Times
    FROM Grps
    GROUP BY grp,
             num)
SELECT num,
       MAX(times) AS times
FROM Counts
GROUP BY num;
This uses a CTE and ROW_NUMBER to define the groups, and then an additional CTE to get the COUNT per group. Finally you can then get the MAX COUNT per num.

I would address this with a gaps-and-islands technique:
select num, max(cnt)
from (
    select num, count(*) cnt
    from (
        select
            id,
            num,
            row_number() over(order by id) rn1,
            row_number() over(partition by num order by id) rn2
        from mytable
    ) t
    group by num, rn1 - rn2
) t
group by num
The innermost query computes row numbers over the whole table and within num groups; the difference between the row numbers identifies the group of adjacent records that each record belongs to (you can run that subquery on its own and follow how the difference evolves to understand more).
Then, the next level counts the number of records in each group of adjacent records. The outermost query takes the maximum count of adjacent records for each num.
Demo on DB Fiddle:
num | (No column name)
--: | ---------------:
1 | 3
2 | 2
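If you do not have a DB Fiddle at hand, the same query can be tried locally. The sketch below runs it against the question's six rows with Python's sqlite3 module, assuming SQLite 3.25 or later; the alias times and the final order by are additions for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer, num integer)")
conn.executemany(
    "insert into mytable values (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 1)],
)

query = """
select num, max(cnt) as times
from (
    select num, count(*) cnt
    from (
        select
            id,
            num,
            row_number() over (order by id) rn1,
            row_number() over (partition by num order by id) rn2
        from mytable
    ) t
    group by num, rn1 - rn2
) t
group by num
order by num
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 3), (2, 2)]
```

For these rows rn1 - rn2 is 0, 0, 0, 3, 3, 2, so num 1 forms two islands (sizes 3 and 1) and num 2 forms one (size 2). To keep only numbers with a run of at least two, a having max(cnt) >= 2 clause could be added to the outer query.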

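If you only need the total number of occurrences of each num, regardless of whether they are adjacent, a plain GROUP BY is enough:
select num, count(num) times
from Tabl
group by num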

Query to group based on the sorted table result

Below is my table
a 1
a 2
a 1
b 1
a 2
a 2
b 3
b 2
a 1
My Expected output is
a 4
b 1
a 4
b 5
a 1
I want them to be grouped if they are in sequence.

If your dbms supports window functions, you can use the row_number difference to assign the same group to consecutive values (which are the same) in one column. After assigning the groups, it is easy to sum the values for each group.
select col1, sum(col2)
from (select t.*,
             row_number() over(order by someid)
             - row_number() over(partition by col1 order by someid) as grp
      from tablename t
     ) x
group by col1, grp
Replace tablename, col1, col2, and someid with the appropriate table and column names. someid should be the column that defines the row order.
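Here is a minimal sketch of that query on the question's rows, run through Python's sqlite3 module (SQLite 3.25 or later assumed). The someid values are invented, since the question shows no ordering column, and the order by min(someid) is an addition so that the islands come back in their original order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tablename (someid integer, col1 text, col2 integer)")
data = [("a", 1), ("a", 2), ("a", 1), ("b", 1), ("a", 2),
        ("a", 2), ("b", 3), ("b", 2), ("a", 1)]
# someid is hypothetical: the row position in the question's listing.
conn.executemany(
    "insert into tablename values (?, ?, ?)",
    [(pos, c1, c2) for pos, (c1, c2) in enumerate(data, start=1)],
)

query = """
select col1, sum(col2) as total
from (select t.*,
             row_number() over (order by someid)
             - row_number() over (partition by col1 order by someid) as grp
      from tablename t
     ) x
group by col1, grp
order by min(someid)
"""
totals = conn.execute(query).fetchall()
print(totals)  # [('a', 4), ('b', 1), ('a', 4), ('b', 5), ('a', 1)]
```

Without an order by, the rows of a group by result come back in no guaranteed order, so sorting by the smallest someid of each island is one way to reproduce the expected output as listed.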

SQL rank grouping variation

I'm trying to achieve the following "rank" result given the original dataset composed by the column ID and CODE.
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 4
10 A 4
Using DENSE_RANK over the CODE column, I get the following result (the A code gets the same rank value even after "the break" between the rows):
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 1
10 A 1
Is it possible to achieve the results as shown in the first (example) table, with the A code changing rank when there is a separation between the group formed by id: 1-2-3 and the one formed by id: 9-10 without using a cursor?
Thanks

You want to find sequences of values and give them a rank. You can do this with a difference of row numbers approach. The following assigns a different number to each grouping:
select o.*, dense_rank() over (order by grp, code)
from (select o.*,
             (row_number() over (order by id) -
              row_number() over (partition by code order by id)
             ) as grp
      from original o
     ) o;
If you want the assignment in the same order as the original data, then you can order by the id, but that requires an additional window function:
select o.*, dense_rank() over (order by minid) as therank
from (select o.*, min(id) over (partition by grp, code) as minid
      from (select o.*,
                   (row_number() over (order by id) -
                    row_number() over (partition by code order by id)
                   ) as grp
            from original o
           ) o
     ) o;
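To check that the second query reproduces the rank column from the question, here is a small sketch using Python's sqlite3 module. It assumes SQLite 3.25 or later; the table name original comes from the answer, and the final order by id is only there to make the output easy to compare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table original (id integer, code text)")
codes = ["A", "A", "A", "B", "B", "C", "C", "C", "A", "A"]
conn.executemany(
    "insert into original values (?, ?)",
    list(enumerate(codes, start=1)),
)

query = """
select o.id, o.code, dense_rank() over (order by minid) as therank
from (select o.*, min(id) over (partition by grp, code) as minid
      from (select o.*,
                   (row_number() over (order by id) -
                    row_number() over (partition by code order by id)
                   ) as grp
            from original o
           ) o
     ) o
order by o.id
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)  # (id, code, therank)
```

Each island is labelled with its smallest id (1, 4, 6 and 9 here), and dense_rank() over those labels numbers the islands 1 to 4 in the order they appear, so the second run of A gets rank 4 rather than 1.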

Take a running SUM of a flag that is 0 when the current row has the same code as the previous row and 1 otherwise. This works from SQL Server 2012 onward.
WITH CTE AS (
    SELECT id, code,
           CASE Code WHEN LAG(CODE) OVER (ORDER BY id) THEN 0 ELSE 1 END AS Diff
    FROM Table1)
SELECT id, code, SUM(Diff) OVER (ORDER BY id) AS therank FROM CTE
Please also see similar question at How to make row numbering with ordering, partitioning and grouping
