Numbering rows from 1 to N based on a column value - sql

Sample:
id value
1 a
1 b
1 c
1 d
1 a
1 b
1 d
1 a
Expected outcome:
id value outcome
1 a 1
1 b 1
1 c 1
1 d 1
1 a 2
1 b 2
1 d 2
1 a 3
So the basic idea is that I need to number the rows I have based on the value column - whenever it reaches "d", the count starts over. I'm not sure which kind of window function I'd use to do that, so any help is appreciated! Thanks in advance!

Use the row_number window function, partitioning by value or by id and value (depending on the desired output):
-- sample data
with dataset(id, value) as(
values (1, 'a'),
(1, 'b'),
(1, 'c'),
(1, 'd'),
(1, 'a'),
(1, 'b'),
(1, 'd'),
(1, 'a')
)
-- query
select *,
row_number() over (partition by id, value) -- or (partition by value)
from dataset;
Note that if there is no column that provides a "natural" ordering for the over clause (i.e. over (partition by id, value order by some_column_like_timestamp)), then the numbering order is not guaranteed to be the same between queries (you will be able to observe this if other columns have different values within the same partition).
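To make the ordering point concrete, here is a minimal sketch that runs the same row_number query through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The seq column is hypothetical: it stands in for whatever timestamp or insertion-order column your real table has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical ordering column; the question's table does not have one.
conn.execute("CREATE TABLE dataset (seq INTEGER, id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?)",
    [(n, 1, v) for n, v in enumerate("abcdabda", start=1)],
)

rows = conn.execute("""
    SELECT id, value,
           ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY seq) AS outcome
    FROM dataset
    ORDER BY seq
""").fetchall()

outcomes = [r[2] for r in rows]
print(outcomes)  # [1, 1, 1, 1, 2, 2, 2, 3]
```

With the order by seq inside the over clause, the numbering within each value is deterministic and matches the expected outcome column from the question.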

Use row_number to number the rows within each value, then order by that number and the value.
select
*,
row_number() over ( partition by (val) ) as rn
from stuff
order by rn, val;

Related

GROUP by Largest String for all the substrings

I have a table like this where some rows have the same grp but different names. I want to group them by name such that all the substrings after removing nonalphanumeric characters are aggregated together and grouped by the largest string. The null value is considered the substring of all the strings.
grp | name | value
1 | ab&c | 10
1 | abc d e | 56
1 | ab | 21
1 | a | 23
1 | xy | 34
1 | [null] | 1
2 | fgh | 87
Desired result
grp | name | value
1 | abcde | 111
1 | xy | 34
2 | fgh | 87
My query-
Select grp,
regexp_replace(name,'[^a-zA-Z0-9]+', '', 'g') name, sum(value) value
from table
group by grp,
regexp_replace(name,'[^a-zA-Z0-9]+', '', 'g');
Result
grp | name | value
1 | abc | 10
1 | abcde | 56
1 | ab | 21
1 | a | 23
1 | xy | 34
1 | [null] | 1
2 | fgh | 87
What changes should I make in my query?
To solve this problem, I did the following (all of the code below is available on the fiddle here).
CREATE TABLE test
(
grp SMALLINT NOT NULL,
name TEXT NULL,
value SMALLINT NOT NULL
);
I populated it using your data plus some extra rows for testing:
INSERT INTO test VALUES
(1, 'ab&c', 10),
(1, 'abc d e', 56),
(1, 'ab', 21),
(1, 'a', 23),
(1, NULL, 1000000),
(1, 'r*&%$s', 100), -- added for testing.
(1, 'rs__t', 101),
(1, 'rs__tu', 101),
(1, 'xy', 1111),
(1, NULL, 1000000),
(2, 'fgh', 87),
(2, 'fgh', 13), -- For Charlieface
(2, NULL, 1000000),
(2, 'x', 50),
(2, 'x', 150),
(2, 'x----y', 100);
Then, you can use this query:
WITH t1 AS
(
SELECT
grp, n_str,
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str),
CASE
WHEN
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str) IS NULL
OR
POSITION
(
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str)
IN
n_str
) = 0
THEN 1
ELSE 0
END AS change,
value
FROM
test t1
CROSS JOIN LATERAL
(
VALUES
(
REGEXP_REPLACE(name,'[^a-zA-Z0-9]+', '', 'g')
)
) AS v(n_str)
WHERE n_str IS NOT NULL
), t2 AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY grp, s_change ORDER BY grp, n_str DESC) AS rn,
grp, n_str,
SUM(value) OVER (PARTITION BY grp, s_change) AS s_val,
MAX(LENGTH(n_str)) OVER (PARTITION BY grp) AS max_nom
FROM
(
SELECT
grp, n_str, change,
SUM(change) OVER (ORDER BY grp, n_str) AS s_change,
value
FROM
t1
ORDER BY grp, n_str DESC
) AS sub1
), t3 AS
(
SELECT
grp, SUM(value) AS null_sum
FROM
test
WHERE name IS NULL
GROUP BY grp
)
SELECT x.grp, x.n_str, x.s_val + y.null_sum
FROM t2 x
JOIN t3 y
ON x.max_nom = LENGTH(x.n_str) AND x.grp = y.grp
UNION
SELECT grp, n_str, s_val
FROM
t2 WHERE max_nom != LENGTH(n_str) AND rn = 1
ORDER BY grp, n_str;
Result:
grp n_str ?column?
1 abcde 2000110
1 rstu 302
1 xy 1111
2 fgh 1000100
2 xy 300
A few points to note:
Please always provide a fiddle when you ask questions such as this one with tables and data - it provides a single source of truth for the question and eliminates duplication of effort on the part of those trying to help you!
You haven't been very clear about what, exactly, should happen with NULLs - do the values count towards the SUM()? You can vary the CASE statement as required.
What happens when there's a tie in the number of characters in the string? I've included an example in the fiddle, where you get the draws - but you may wish to sort alphabetically (or some other method)?
There appears to be an error in your provided sums for the values (even taking account of counting or not values for NULL for the name field).
Finally, you don't want to GROUP BY the largest string - you want to GROUP BY the grp field and SUM() the values in the given grp records, and then pick out the longest alphanumeric string in that grouping. It would be interesting to know why you want to do this.
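If it helps to see the logic of the query outside SQL, here is a rough Python sketch of the same idea: strip the non-alphanumeric characters, sort the names within each grp, start a new "island" whenever the previous name is not contained in the current one, and add the NULL rows to the island with the longest name. The function name group_by_longest is made up for illustration, and two things are my assumptions rather than part of the query above: on a tie for the longest name only the first island gets the NULL values, and a grp that contains only NULL names is dropped.

```python
import re
from collections import defaultdict

def group_by_longest(rows):
    """rows: iterable of (grp, name, value); returns sorted (grp, name, total) tuples."""
    cleaned = defaultdict(list)  # grp -> [(alphanumeric-only name, value)]
    null_sum = defaultdict(int)  # grp -> sum of the values whose name is NULL
    for grp, name, value in rows:
        if name is None:
            null_sum[grp] += value
        else:
            cleaned[grp].append((re.sub(r"[^a-zA-Z0-9]+", "", name), value))

    result = []
    for grp, items in cleaned.items():
        items.sort()
        islands = []  # each island is [label, running total]
        prev = None
        for name, value in items:
            # Same test as POSITION(prev IN name) = 0 in the query.
            if prev is None or prev not in name:
                islands.append([name, 0])
            islands[-1][0] = name  # the last name in sort order labels the island
            islands[-1][1] += value
            prev = name
        # Assumption: the NULL rows go to the (first) island with the longest label.
        longest = max(islands, key=lambda island: len(island[0]))
        longest[1] += null_sum[grp]
        result.extend((grp, label, total) for label, total in islands)
    return sorted(result)

question_rows = [
    (1, "ab&c", 10), (1, "abc d e", 56), (1, "ab", 21), (1, "a", 23),
    (1, "xy", 34), (1, None, 1), (2, "fgh", 87),
]
print(group_by_longest(question_rows))
# [(1, 'abcde', 111), (1, 'xy', 34), (2, 'fgh', 87)]
```

On the question's original rows this reproduces the desired result, including the 111 for abcde once the NULL row is counted.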

SQL Occurrence of Sequence Number

I want to find any Name that has 4 or more rows with consecutive SeqNo values.
If there is a break in SeqNo but 4 or more of the rows are still consecutive, I need that Name as well.
Example:
SeqNo Name
10 | A
15 | A
16 | A
17 | A
18 | A
9 | B
10 | B
13 | B
14 | B
6 | C
7 | C
9 | C
10 | C
OUTPUT:
A
BELOW IS SCRIPT FOR ANYONE HELPING.
create table testseq (Id int, Name char)
INSERT into testseq values
(10, 'A'),
(15, 'A'),
(16, 'A'),
(17, 'A'),
(18, 'A'),
(9, 'B'),
(10, 'B'),
(13, 'B'),
(14, 'B'),
(6, 'C'),
(7, 'C'),
(9, 'C'),
(10, 'C')
SELECT * FROM testseq
You can use some gaps-and-islands techniques for this.
If you want names that have at least 4 consecutive records where seqno is increasing by 1, then you can use the difference between seqno and row_number() to define the groups, and then aggregate:
select distinct name
from (
select t.*, row_number() over(partition by name order by seqno) rn
from testseq t
) t
group by name, rn - seqno
having count(*) >= 4
Note that for your sample data, this returns A: it has 4 consecutive records where seqno increments by 1 (15 to 18), while B and C have at most two.
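To see why the difference between seqno and row_number() identifies a run, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or later). The rows for X and Y are made-up illustration data, not the question's data, and I named the column seqno to match the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testseq (seqno INTEGER, name TEXT)")
# Hypothetical data: X has the run 1..4 plus a stray 7, Y has two runs of two.
conn.executemany(
    "INSERT INTO testseq VALUES (?, ?)",
    [(1, "X"), (2, "X"), (3, "X"), (4, "X"), (7, "X"),
     (1, "Y"), (2, "Y"), (4, "Y"), (5, "Y")],
)

# seqno - rn stays constant while seqno increases by exactly 1 per row,
# so each distinct difference marks one island of consecutive numbers.
islands = conn.execute("""
    SELECT name, seqno - rn AS island, COUNT(*) AS run_length
    FROM (
        SELECT t.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY seqno) AS rn
        FROM testseq t
    ) AS t
    GROUP BY name, seqno - rn
    ORDER BY name, MIN(seqno)
""").fetchall()

names = sorted({name for name, _, run_length in islands if run_length >= 4})
print(islands)  # [('X', 0, 4), ('X', 2, 1), ('Y', 0, 2), ('Y', 1, 2)]
print(names)    # ['X']
```

The filter on run_length plays the role of the having count(*) >= 4 clause.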
I don't really view this as a "gaps-and-islands" problem. You are just looking for a minimum number of adjacent rows. This is easily handled using lag() or lead():
select t.*
from (select t.*,
lead(seqno, 3) over (partition by name order by seqno) as seqno_name_3
from t
) t
where seqno_name_3 = seqno + 3;
This compares each row with the row three positions later for the same name. If that row's seqno is exactly seqno + 3, then the four rows starting at the current one are consecutive.
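A quick way to check the behaviour of the lead() comparison is to run it on a few rows with Python's sqlite3 module (SQLite 3.25 or later). The data below is hypothetical, and the table is called testseq with a seqno column purely for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testseq (seqno INTEGER, name TEXT)")
# Hypothetical data: X has a run of five (1..5), Y never has four in a row.
conn.executemany(
    "INSERT INTO testseq VALUES (?, ?)",
    [(1, "X"), (2, "X"), (3, "X"), (4, "X"), (5, "X"), (9, "X"),
     (1, "Y"), (2, "Y"), (4, "Y"), (5, "Y")],
)

starts = conn.execute("""
    SELECT name, seqno
    FROM (
        SELECT t.*,
               LEAD(seqno, 3) OVER (PARTITION BY name ORDER BY seqno) AS seqno_name_3
        FROM testseq t
    ) AS t
    WHERE seqno_name_3 = seqno + 3
    ORDER BY name, seqno
""").fetchall()

print(starts)  # [('X', 1), ('X', 2)]
```

A run of five consecutive numbers contains two overlapping runs of four, so X shows up twice, once for each row that starts such a run.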
If you just want the name and to handle duplicates:
select distinct name
from (select t.*,
lead(seqno, 3) over (partition by name order by seqno) as seqno_name_3
from t
) t
where seqno_name_3 = seqno + 3;
If the sequence numbers can have gaps (but are otherwise adjacent):
select distinct name
from (select t.*,
lead(seqno, 3) over (partition by name order by seqno) as seqno_name_3,
lead(seqno, 3) over (order by seqno) as seqno_3
from t
) t
where seqno_name_3 = seqno_3;
A solution in plain SQL, no LAG() or LEAD() or ROW_NUMBER():
SELECT t1.Name
FROM testseq t1
WHERE (
SELECT count(t2.Id)
FROM testseq t2
WHERE t2.Name=t1.Name
and t2.Id between t1.Id and t1.Id+3
GROUP BY t2.Name)>=4
GROUP BY t1.Name;

How to get row number for each null value?

I need to get row number for each record of null by sequence. Restart number when get a value in the row.
I have tried so far
select *
, ROW_NUMBER() over (order by id) rn
from #tbl
select *
, ROW_NUMBER() over (partition by value order by id) rn
from #tbl
create table #tbl (id int, value int)
insert into #tbl values
(1, null), (2, null), (3, null), (4, 1),(5, null), (6, null), (7, 1), (8, null), (9, null), (10, null)
select *
, ROW_NUMBER() over (partition by value order by id) rn
from #tbl
I'm getting this:
id, value, rn
1 NULL 1
2 NULL 2
3 NULL 3
4 1 4
5 NULL 5
6 NULL 6
7 1 7
8 NULL 8
9 NULL 9
10 NULL 10
I want a result like this
id, value, rn
1 NULL 1
2 NULL 2
3 NULL 3
4 1 1
5 NULL 1
6 NULL 2
7 1 1
8 NULL 1
9 NULL 2
10 NULL 3
How can I get desired result with sql query?
This approach uses COUNT as an analytic function over the value column to generate "groups" for each block of NULL values. To see how this works, just run SELECT * FROM cte using the code below. Then, using this computed group, we use ROW_NUMBER to generate the sequences for the NULL values. We order ascending by value, and because SQL Server sorts NULLs first in ascending order, each NULL row number sequence always begins with 1, which is the behavior we want. For records with a non-NULL value, we just pull that value across into the rn column.
WITH cte AS (
SELECT *, COUNT(value) OVER (ORDER BY id) vals
FROM #tbl
)
SELECT id, value,
CASE WHEN value IS NULL
THEN ROW_NUMBER() OVER (PARTITION BY vals ORDER BY value, id) -- id keeps the NULL rows in id order
ELSE value END AS rn
FROM cte
ORDER BY id;
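For anyone who wants to inspect the intermediate vals column without a SQL Server instance, here is a sketch of the same query run through Python's sqlite3 module (SQLite 3.25 or later). The table is named tbl because #tbl is SQL Server temp-table syntax, and I added id as a tiebreaker in the ORDER BY so the numbering of the NULL rows is deterministic. It relies on NULLs sorting first in ascending order, which holds in both SQLite and SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, None), (2, None), (3, None), (4, 1), (5, None),
     (6, None), (7, 1), (8, None), (9, None), (10, None)],
)

rows = conn.execute("""
    WITH cte AS (
        -- COUNT(value) ignores NULLs, so the running count only moves on non-NULL rows
        SELECT *, COUNT(value) OVER (ORDER BY id) AS vals
        FROM tbl
    )
    SELECT id, value, vals,
           CASE WHEN value IS NULL
                THEN ROW_NUMBER() OVER (PARTITION BY vals ORDER BY value, id)
                ELSE value END AS rn
    FROM cte
    ORDER BY id
""").fetchall()

vals = [r[2] for r in rows]
rn = [r[3] for r in rows]
print(vals)  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
print(rn)    # [1, 2, 3, 1, 1, 2, 1, 1, 2, 3]
```

Each non-NULL row shares its vals group with the NULL rows that follow it, but because NULLs sort first it is numbered last in that group and its row number is never used.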

Return Nth Percentile row value of a column that is an aggregate value

Related to SQL-Server
I need to return the value for a column in the Nth percentile associated to multiple unique IDs in another column. For example, for the dataset below, I need the value in the 80th Percentile in COL B for each unique value in COL A:
COL A COL B
--------- --------
A 2
A 4
A 6
A 8
A 10
B 2
B 2
B 3
B 5
B 7
B 8
B 11
B 13
B 17
B 18
The desired output would be:
COL A COL B
-------- --------
A 8
B 13
This is based on the logic that:
the 80th Percentile value for COL B is the 4th row value of 8 for value A in COL A;
and that the 80th Percentile value for COL B is the 8th row value of 13 for value B in COL A
If you're on SQL Server 2012 or later, you can use PERCENTILE_DISC():
WITH cte AS (
SELECT * FROM (VALUES
('A', 2 ),
('A', 4 ),
('A', 6 ),
('A', 8 ),
('A', 10 ),
('B', 2 ),
('B', 2 ),
('B', 3 ),
('B', 5 ),
('B', 7 ),
('B', 8 ),
('B', 11 ),
('B', 13 ),
('B', 17 ),
('B', 18 )
) AS x(a, v)
)
SELECT DISTINCT a
, PERCENTILE_DISC(0.8) WITHIN GROUP (ORDER BY v) OVER (PARTITION BY a)
FROM cte
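PERCENTILE_DISC(p) returns the smallest value whose cumulative distribution (CUME_DIST) is greater than or equal to p. The sketch below spells that definition out with Python's sqlite3 module (SQLite 3.25 or later), since SQLite has CUME_DIST but no PERCENTILE_DISC. The table name samples is hypothetical, and this is only meant to illustrate the definition, not to replace the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (a TEXT, v INTEGER)")  # hypothetical table name
data = [("A", v) for v in (2, 4, 6, 8, 10)]
data += [("B", v) for v in (2, 2, 3, 5, 7, 8, 11, 13, 17, 18)]
conn.executemany("INSERT INTO samples VALUES (?, ?)", data)

# Smallest v per group whose cumulative distribution reaches 0.8.
p80 = conn.execute("""
    SELECT a, MIN(v) AS p80
    FROM (
        SELECT a, v, CUME_DIST() OVER (PARTITION BY a ORDER BY v) AS cd
        FROM samples
    )
    WHERE cd >= 0.8
    GROUP BY a
    ORDER BY a
""").fetchall()

print(p80)  # [('A', 8), ('B', 13)]
```

For A, 4 of the 5 rows are at or below 8 (4/5 = 0.8), and for B, 8 of the 10 rows are at or below 13, which is the row-counting logic described in the question.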
Here is the absolutely wretched query:
select r.t1, MIN(r.t2) FROM (SELECT TOP 20 PERCENT t1, t2 FROM tempTable where t1 = 'A' ORDER BY t2 desc ) as r
group by r.t1
union
SELECT s.t1, MIN(s.t2) FROM ( SELECT TOP 20 PERCENT t1, t2 FROM tempTable ORDER BY t2 DESC ) as s
group by s.t1
Where t1 is Col A, t2 is Col B, and tempTable is your table.
This is solely based on your table provided and is by no means generic.
EDIT: I figured out how to apply it to the OP's question by using ntile
SELECT *
FROM (
SELECT colA, colB,
NTILE(5) OVER(PARTITION BY colA ORDER BY colB DESC) AS 'tileN'
FROM tempTable t
group by colA, colB ) as n
where n.tileN = 2
What it does:
NTILE(a) splits each partition into a buckets, each holding roughly 100 / a percent of the rows. With NTILE(5) every bucket holds about 20 percent, and because the rows are ordered descending, bucket 2 begins at the 80th percentile. Then we select the top 20 percent from that query to eliminate values that would be the same.
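Here is a small sketch of the NTILE idea using Python's sqlite3 module (SQLite 3.25 or later). The table name samples and the step of taking the largest value in bucket 2 are my assumptions for the example. Note that the buckets only line up with exact 20 percent slices when the row count per group is a multiple of 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (a TEXT, v INTEGER)")  # hypothetical table name
data = [("A", v) for v in (2, 4, 6, 8, 10)]
data += [("B", v) for v in (2, 2, 3, 5, 7, 8, 11, 13, 17, 18)]
conn.executemany("INSERT INTO samples VALUES (?, ?)", data)

# Bucket 1 holds the top 20 percent of each group, bucket 2 the next 20 percent.
# Assumption: the largest value in bucket 2 is taken as the 80th percentile value.
p80 = conn.execute("""
    SELECT a, MAX(v) AS p80
    FROM (
        SELECT a, v, NTILE(5) OVER (PARTITION BY a ORDER BY v DESC) AS tile
        FROM samples
    )
    WHERE tile = 2
    GROUP BY a
    ORDER BY a
""").fetchall()

print(p80)  # [('A', 8), ('B', 13)]
```

For B, bucket 2 contains both 13 and 11, which is why some extra step is needed to reduce it to a single row per group.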

How can I group a set split by change in a field with respect to an order?

I have a set of records.
ID Value
1 a
2 b
3 b
4 b
5 a
6 a
7 b
8 b
And I would like to group them like so.
MIN(ID) MAX(ID) Value
1 1 a
2 4 b
5 6 a
7 8 b
I'm vaguely aware of oracle over() analytical function which looks to be the right direction, but I don't know what this problem is called much less how to solve it.
There is probably an easier way, but this may help to start. I ran it on Postgres, but it should work (maybe with a minor tweak) on Oracle. The innermost query puts the previous value on each row. We can use that to detect a grouping change (when value does not equal previous value). Every time there is a group change, we flag it with a "1". Sum these group changes and we now have a group id which increments every time there is a value change. Then we can perform our normal group by function.
create table x(id int, value varchar(1));
insert into x values(1, 'a');
insert into x values(2, 'b');
insert into x values(3, 'b');
insert into x values(4, 'b');
insert into x values(5, 'a');
insert into x values(6, 'a');
insert into x values(7, 'b');
insert into x values(8, 'b');
SELECT MIN(id), MAX(id), value
FROM ( SELECT id
,value
,previous_value
,SUM( CASE WHEN value = previous_value THEN 0 ELSE 1 END ) OVER(ORDER BY id) AS group_id
FROM ( SELECT id
,value
,COALESCE( LAG(value) OVER(ORDER BY id), value ) previous_value
FROM x
ORDER BY id
) y
) z
GROUP BY group_id, value
ORDER BY 1, 2;
min | max | value
-----+-----+-------
1 | 1 | a
2 | 4 | b
5 | 6 | a
7 | 8 | b
(4 rows)
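If you want to look at the intermediate group id before the final GROUP BY, this sketch runs the two inner steps through Python's sqlite3 module (SQLite 3.25 or later). The column name changed is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO x VALUES (?, ?)", list(enumerate("abbbaabb", start=1)))

steps = conn.execute("""
    SELECT id, value,
           SUM(changed) OVER (ORDER BY id) AS group_id  -- running count of changes
    FROM (
        SELECT id, value,
               -- 1 when the value differs from the previous row, 0 otherwise;
               -- COALESCE makes the first row compare equal to itself
               CASE WHEN value = COALESCE(LAG(value) OVER (ORDER BY id), value)
                    THEN 0 ELSE 1 END AS changed
        FROM x
    )
    ORDER BY id
""").fetchall()

group_ids = [r[2] for r in steps]
print(group_ids)  # [0, 1, 1, 1, 2, 2, 3, 3]
```

Grouping by group_id and taking MIN(id) and MAX(id) then gives the four ranges shown in the output above.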