SQL Occurrence of Sequence Number

I want to find every Name that has 4 or more rows whose SeqNo values are consecutive.
If there is a break in SeqNo elsewhere but 4 or more of the rows are still consecutive, I need that Name as well.
Example:
SeqNo Name
10 | A
15 | A
16 | A
17 | A
18 | A
9 | B
10 | B
13 | B
14 | B
6 | C
7 | C
9 | C
10 | C
OUTPUT:
A
Below is a script for anyone helping.
create table testseq (Id int, Name char)
INSERT into testseq values
(10, 'A'),
(15, 'A'),
(16, 'A'),
(17, 'A'),
(18, 'A'),
(9, 'B'),
(10, 'B'),
(13, 'B'),
(14, 'B'),
(6, 'C'),
(7, 'C'),
(9, 'C'),
(10, 'C')
SELECT * FROM testseq

You can use some gaps-and-islands techniques for this.
If you want names that have at least 4 consecutive records where seqno is increasing by 1, then you can use the difference between seqno (the Id column in your script) and row_number() to define the groups, and then aggregate:
select distinct name
from (
select t.*, row_number() over(partition by name order by seqno) rn
from testseq t
) t
group by name, rn - seqno
having count(*) >= 4
Note that for your sample data, this returns A: its records 15 to 18 are four consecutive records where seqno is incrementing by 1. B and C have runs of only two.
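As a quick check, here is a minimal sketch that runs the query above against the script's table using Python's built-in sqlite3 module. It assumes an SQLite build with window function support (3.25 or later), uses the Id column from the script in place of seqno, and the function name names_with_run is only illustrative.

```python
import sqlite3

# Sample rows from the question's script: (Id, Name).
SAMPLE = [
    (10, "A"), (15, "A"), (16, "A"), (17, "A"), (18, "A"),
    (9, "B"), (10, "B"), (13, "B"), (14, "B"),
    (6, "C"), (7, "C"), (9, "C"), (10, "C"),
]


def names_with_run(rows, min_run=4):
    """Return names that have at least min_run consecutive Id values."""
    con = sqlite3.connect(":memory:")
    con.execute("create table testseq (Id int, Name char)")
    con.executemany("insert into testseq values (?, ?)", rows)
    sql = """
        select distinct name
        from (
            select t.*,
                   row_number() over (partition by name order by id) as rn
            from testseq t
        ) t
        group by name, rn - id   -- rn - id stays constant inside one run
        having count(*) >= ?
        order by name
    """
    return [r[0] for r in con.execute(sql, (min_run,))]


print(names_with_run(SAMPLE))  # ['A']
```

Within a run of consecutive values, both rn and Id grow by 1 per row, so their difference is the same for every row of the run; any gap changes the difference and starts a new group.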

I don't really view this as a "gaps-and-islands" problem. You are just looking for a minimum number of adjacent rows. This is easily handled using lag() or lead():
select t.*
from (select t.*,
lead(seqno, 3) over (partition by name order by seqno) as seqno_name_3
from t
) t
where seqno_name_3 = seqno + 3;
This compares each row with the sequence number three rows later within the same name. If that value equals seqno + 3, the four rows starting at the current one are consecutive.
If you just want the name and to handle duplicates:
select distinct name
from (select t.*,
lead(seqno, 3) over (partition by name order by seqno) as seqno_name_3
from t
) t
where seqno_name_3 = seqno + 3;
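The lead() check can be tried out the same way. This is a hedged sketch with Python's sqlite3 module (SQLite 3.25 or later assumed); it uses the testseq table and Id column from the question's script instead of t and seqno, and it assumes Id values are unique within a name. The helper name names_with_four_in_a_row is made up for the example.

```python
import sqlite3


def names_with_four_in_a_row(rows):
    """Names where some row's Id + 3 equals the Id three rows later."""
    con = sqlite3.connect(":memory:")
    con.execute("create table testseq (Id int, Name char)")
    con.executemany("insert into testseq values (?, ?)", rows)
    sql = """
        select distinct name
        from (select t.*,
                     lead(id, 3) over (partition by name order by id) as id_3
              from testseq t
             ) t
        where id_3 = id + 3      -- null for the last three rows of a name
        order by name
    """
    return [r[0] for r in con.execute(sql)]


rows = [
    (10, "A"), (15, "A"), (16, "A"), (17, "A"), (18, "A"),
    (9, "B"), (10, "B"), (13, "B"), (14, "B"),
    (6, "C"), (7, "C"), (9, "C"), (10, "C"),
]
print(names_with_four_in_a_row(rows))  # ['A']
```

For A, the row with Id 15 sees 18 three rows ahead, so the comparison holds. For B and C the value three rows ahead is always larger than Id + 3.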
If the sequence numbers can have gaps (but are otherwise adjacent):
select distinct name
from (select t.*,
lead(seqno, 3) over (partition by name order by seqno) as seqno_name_3,
lead(seqno, 3) over (order by seqno) as seqno_3
from t
) t
where seqno_name_3 = seqno_3;

A solution in plain SQL, no LAG() or LEAD() or ROW_NUMBER():
SELECT t1.Name
FROM testseq t1
WHERE (
SELECT count(t2.Id)
FROM testseq t2
WHERE t2.Name=t1.Name
and t2.Id between t1.Id and t1.Id+3
GROUP BY t2.Name)>=4
GROUP BY t1.Name;

Related

Numbering rows from 1 to N based on a column value

Sample:
id value
1 a
1 b
1 c
1 d
1 a
1 b
1 d
1 a
Expected outcome:
id value outcome
1 a 1
1 b 1
1 c 1
1 d 1
1 a 2
1 b 2
1 d 2
1 a 3
So the basic idea is that I need to number the rows I have based on the value column - whenever it reaches "d", the count starts over. Not sure which kind of window function I'd use to do that, so any help is appreciated! Thanks in advance!
Use row_number window function with partition by value or by id and value (based on desired output):
-- sample data
with dataset(id, value) as(
values (1, 'a'),
(1, 'b'),
(1, 'c'),
(1, 'd'),
(1, 'a'),
(1, 'b'),
(1, 'd'),
(1, 'a')
)
-- query
select *,
row_number() over (partition by id, value) -- or (partition by value)
from dataset;
Note that if there is no column that gives a "natural" ordering for the over clause (i.e. over (partition by id, value order by some_column_like_timestamp)), then the actual order is not guaranteed between queries (you can observe this when other columns have different values within the same partition).
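To illustrate the ordering point, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or later assumed). The pos column is a hypothetical ordering column that is not in the question; it stands in for something like a timestamp, and number_occurrences is just an illustrative name.

```python
import sqlite3


def number_occurrences(values):
    """Number each value by how many times it has appeared so far."""
    con = sqlite3.connect(":memory:")
    # pos is an assumed ordering column; id is fixed to 1 as in the sample.
    con.execute("create table dataset (pos integer, id integer, value text)")
    con.executemany("insert into dataset values (?, 1, ?)",
                    list(enumerate(values)))
    sql = """
        select value,
               row_number() over (partition by id, value order by pos) as outcome
        from dataset
        order by pos
    """
    return con.execute(sql).fetchall()


print(number_occurrences(list("abcdabda")))
# [('a', 1), ('b', 1), ('c', 1), ('d', 1), ('a', 2), ('b', 2), ('d', 2), ('a', 3)]
```

With an explicit order by inside the over clause and on the final result, the numbering is deterministic and matches the expected outcome in the question.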
Use row_number to give them a unique number, then order by row_number and value.
select
*,
row_number() over ( partition by (val) ) as rn
from stuff
order by rn, val;

GROUP by Largest String for all the substrings

I have a table like this where some rows have the same grp but different names. I want to group them by name such that all the substrings after removing nonalphanumeric characters are aggregated together and grouped by the largest string. The null value is considered the substring of all the strings.
grp | name | value
1 | ab&c | 10
1 | abc d e | 56
1 | ab | 21
1 | a | 23
1 | xy | 34
1 | [null] | 1
2 | fgh | 87
Desired result
grp | name | value
1 | abcde | 111
1 | xy | 34
2 | fgh | 87
My query-
Select grp,
regexp_replace(name,'[^a-zA-Z0-9]+', '', 'g') name, sum(value) value
from table
group by grp,
regexp_replace(name,'[^a-zA-Z0-9]+', '', 'g');
Result
grp | name | value
1 | abc | 10
1 | abcde | 56
1 | ab | 21
1 | a | 23
1 | xy | 34
1 | [null] | 1
2 | fgh | 87
What changes should I make in my query?
To solve this problem, I did the following (all of the code below is available on the fiddle here).
CREATE TABLE test
(
grp SMALLINT NOT NULL,
name TEXT NULL,
value SMALLINT NOT NULL
);
and populate it using your data + extra for testing:
INSERT INTO test VALUES
(1, 'ab&c', 10),
(1, 'abc d e', 56),
(1, 'ab', 21),
(1, 'a', 23),
(1, NULL, 1000000),
(1, 'r*&%$s', 100), -- added for testing.
(1, 'rs__t', 101),
(1, 'rs__tu', 101),
(1, 'xy', 1111),
(1, NULL, 1000000),
(2, 'fgh', 87),
(2, 'fgh', 13), -- For Charlieface
(2, NULL, 1000000),
(2, 'x', 50),
(2, 'x', 150),
(2, 'x----y', 100);
Then, you can use this query:
WITH t1 AS
(
SELECT
grp, n_str,
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str),
CASE
WHEN
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str) IS NULL
OR
POSITION
(
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str)
IN
n_str
) = 0
THEN 1
ELSE 0
END AS change,
value
FROM
test t1
CROSS JOIN LATERAL
(
VALUES
(
REGEXP_REPLACE(name,'[^a-zA-Z0-9]+', '', 'g')
)
) AS v(n_str)
WHERE n_str IS NOT NULL
), t2 AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY grp, s_change ORDER BY grp, n_str DESC) AS rn,
grp, n_str,
SUM(value) OVER (PARTITION BY grp, s_change) AS s_val,
MAX(LENGTH(n_str)) OVER (PARTITION BY grp) AS max_nom
FROM
(
SELECT
grp, n_str, change,
SUM(change) OVER (ORDER BY grp, n_str) AS s_change,
value
FROM
t1
ORDER BY grp, n_str DESC
) AS sub1
), t3 AS
(
SELECT
grp, SUM(value) AS null_sum
FROM
test
WHERE name IS NULL
GROUP BY grp
)
SELECT x.grp, x.n_str, x.s_val + y.null_sum
FROM t2 x
JOIN t3 y
ON x.max_nom = LENGTH(x.n_str) AND x.grp = y.grp
UNION
SELECT grp, n_str, s_val
FROM
t2 WHERE max_nom != LENGTH(n_str) AND rn = 1
ORDER BY grp, n_str;
Result:
grp n_str ?column?
1 abcde 2000110
1 rstu 302
1 xy 1111
2 fgh 1000100
2 xy 300
A few points to note:
Please always provide a fiddle when you ask questions such as this one with tables and data - it provides a single source of truth for the question and eliminates duplication of effort on the part of those trying to help you!
You haven't been very clear about what, exactly, should happen with NULLs - do the values count towards the SUM()? You can vary the CASE statement as required.
What happens when there's a tie in the number of characters in the string? I've included an example in the fiddle, where you get the draws - but you may wish to sort alphabetically (or some other method)?
There appears to be an error in your provided sums for the values (even taking account of counting or not values for NULL for the name field).
Finally, you don't want to GROUP BY the largest string - you want to GROUP BY the grp field, SUM() the values of the matching records in that grp, and then pick out the longest alphanumeric string in that grouping. It would be interesting to know why you want to do this?

Unexpected behavior of window function first_value

I have 2 columns - order no, value. Table value constructor:
(1, null)
,(2, 5)
,(3, null)
,(4, null)
,(5, 2)
,(6, 1)
I need to get
(1, 5) -- i.e. first nonnull Value if I go from current row and order by OrderNo
,(2, 5)
,(3, 2) -- i.e. first nonnull Value if I go from current row and order by OrderNo
,(4, 2) -- analogous
,(5, 2)
,(6, 1)
This is query that I think should work.
;with SourceTable as (
select *
from (values
(1, null)
,(2, 5)
,(3, null)
,(4, null)
,(5, 2)
,(6, 1)
) as T(OrderNo, Value)
)
select
*
,first_value(Value) over (
order by
case when Value is not null then 0 else 1 end
, OrderNo
rows between current row and unbounded following
) as X
from SourceTable
order by OrderNo
The issue is that it returns exactly the same result set as SourceTable, and I don't understand why. E.g., when the first row is processed (OrderNo = 1), I'd expect column X to return 5, because the frame should include all rows (current row and unbounded following) and it is ordered by Value - non-nulls first, then by OrderNo. So the first row in the frame should be OrderNo = 2. Obviously it doesn't work like that, but I don't get why.
I'd much appreciate it if someone could explain how the first frame is constructed. I need this for SQL Server and also PostgreSQL.
Many thanks
Although probably more expensive than two window functions, you can do this without a subquery using arrays:
with SourceTable as (
select *
from (values (1, null),
(2, 5),
(3, null),
(4, null),
(5, 2),
(6, 1)
) T(OrderNo, Value)
)
select st.*,
(array_remove(array_agg(value) over (order by orderno rows between current row and unbounded following), null))[1] as x
from SourceTable st
order by OrderNo;
Here is the db<>fiddle.
Or using a lateral join:
select st.*, st2.value
from SourceTable st left join lateral
(select st2.*
from SourceTable st2
where st2.value is not null and st2.orderno >= st.orderno
order by st2.orderno asc
limit 1
) st2
on 1=1
order by OrderNo;
With the right indexes on the source table, the lateral join might be the best solution from a performance perspective (I have been surprised by the performance of lateral joins under the right circumstances).
It's pretty easy to see why first_value doesn't work if you order the results by case when Value is not null then 0 else 1 end, orderno
orderno | value | x
---------+-------+---
2 | 5 | 5
5 | 2 | 2
6 | 1 | 1
1 | |
3 | |
4 | |
(6 rows)
For orderno=1, there's nothing after it in the frame that would be not-null.
Instead, we can arrange the orders into groups using count as a window function in a sub-query. We then use max as a window function over that group (this is arbitrary, min would work just as well) to get the one non-null value in that group:
with SourceTable as (
select *
from (values
(1, null)
,(2, 5)
,(3, null)
,(4, null)
,(5, 2)
,(6, 1)
) as T(OrderNo, Value)
)
select orderno, order_group, max(value) OVER (PARTITION BY order_group) FROM (
SELECT *,
count(value) OVER (ORDER BY orderno DESC) as order_group
from SourceTable
) as sub
order by orderno;
orderno | order_group | max
---------+-------------+-----
1 | 3 | 5
2 | 3 | 5
3 | 2 | 2
4 | 2 | 2
5 | 2 | 2
6 | 1 | 1
(6 rows)
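Both behaviours can be reproduced side by side. The following is a sketch using Python's sqlite3 module as a stand-in engine (an assumption; the question targets SQL Server and PostgreSQL, and SQLite 3.25 or later is needed for window functions). The helper run and the variable names are illustrative only.

```python
import sqlite3

# The question's table value constructor as a CTE prefix.
SOURCE = """
with SourceTable(OrderNo, Value) as (
    values (1, null), (2, 5), (3, null), (4, null), (5, 2), (6, 1)
)
"""


def run(sql):
    """Run a query against the sample CTE and return all rows."""
    con = sqlite3.connect(":memory:")
    return con.execute(SOURCE + sql).fetchall()


# The original attempt: the frame starts at the current row *in window
# order*, so first_value() always sees the current row first.
attempt = run("""
    select OrderNo, Value,
           first_value(Value) over (
               order by case when Value is not null then 0 else 1 end, OrderNo
               rows between current row and unbounded following
           ) as X
    from SourceTable
    order by OrderNo
""")

# The grouping approach: count(Value) over a descending order gives the
# same group number to a non-null row and the null rows just before it.
fixed = run("""
    select OrderNo, max(Value) over (partition by order_group) as X
    from (select *, count(Value) over (order by OrderNo desc) as order_group
          from SourceTable) as sub
    order by OrderNo
""")

print(attempt)  # X equals Value on every row
print(fixed)    # [(1, 5), (2, 5), (3, 2), (4, 2), (5, 2), (6, 1)]
```

The first query shows that "current row and unbounded following" is evaluated after the window's own order by has been applied, which is why a null row never has a non-null row after it in its frame.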

SELECT TOP 20 Percent SQL

I have a query that selects the top 20 percent of employees by GrandTotal, but the result is not fair. For example, 20 percent of 10 people is 2, so the output shows this:
EmpName GrandTotal
Kelvin 50
Gem 40
But the 3rd and 4th people also have a GrandTotal of 40. I need some ideas and advice: how can I solve this problem?
SELECT TOP 20 PERCENT
EmpName,
SUM(Scoring) AS GrandTotal
FROM
[masterView]
GROUP BY
EmpName
ORDER BY
GrandTotal DESC, EmpName ASC
On SQL Server, you can use WITH TIES to include rows that tie with the last row:
SELECT TOP 20 PERCENT WITH TIES Id, sum(Score) as GrandTotal
FROM myTable GROUP BY Id
ORDER BY GrandTotal DESC
SQL Fiddle Demo
Test Data
CREATE TABLE Table1
([ID] int, [Score] int)
;
INSERT INTO Table1
([ID], [Score])
VALUES
(1, 10), (2, 20),
(3, 30), (4, 20),
(5, 10), (6, 40),
(7, 40), (8, 50),
(9, 10), (10, 5);
Query
with ranked as (
select
id,
rank() over (order by Score desc) as rnk
from Table1
),
total as (
select count(*) as total
from Table1
)
SELECT *
FROM ranked
CROSS JOIN total
WHERE ranked.rnk <= 0.2 * total.total
OUTPUT
| id | rnk | total |
|----|-----|-------|
| 8 | 1 | 10 |
| 6 | 2 | 10 |
| 7 | 2 | 10 |

How to GROUP entries BY uninterrupted sequence?

CREATE TABLE entries (
id serial NOT NULL,
title character varying,
load_sequence integer
);
and data
INSERT INTO entries(title, load_sequence) VALUES ('A', 1);
INSERT INTO entries(title, load_sequence) VALUES ('A', 2);
INSERT INTO entries(title, load_sequence) VALUES ('A', 3);
INSERT INTO entries(title, load_sequence) VALUES ('A', 6);
INSERT INTO entries(title, load_sequence) VALUES ('B', 4);
INSERT INTO entries(title, load_sequence) VALUES ('B', 5);
INSERT INTO entries(title, load_sequence) VALUES ('B', 7);
INSERT INTO entries(title, load_sequence) VALUES ('B', 8);
Is there a way in PostgreSQL to write SQL that groups the data into segments of the same title after ordering by load_sequence?
I mean:
=# SELECT id, title, load_sequence FROM entries ORDER BY load_sequence;
id | title | load_sequence
----+-------+---------------
9 | A | 1
10 | A | 2
11 | A | 3
13 | B | 4
14 | B | 5
12 | A | 6
15 | B | 7
16 | B | 8
AND I want groups:
=# SELECT title, string_agg(id::text, ',' ORDER BY id) FROM entries ???????????;
so result would be:
title | string_agg
-------+-------------
A | 9,10,11
B | 13,14
A | 12
B | 15,16
You can use the following query:
SELECT title, string_agg(id::text, ',' ORDER BY id)
FROM (
SELECT id, title,
ROW_NUMBER() OVER (ORDER BY load_sequence) -
ROW_NUMBER() OVER (PARTITION BY title
ORDER BY load_sequence) AS grp
FROM entries ) AS t
GROUP BY title, grp
The calculated grp field identifies slices of rows with the same title that are adjacent when ordered by load_sequence. Using this field in the GROUP BY clause gives the required aggregation over the id values.
There's a trick you can use with sum as a window function running over a lagged window for this.
The idea is that when you hit an edge/discontinuity you return 1, otherwise you return 0. You detect the discontinuities using the lag window function.
SELECT title, string_agg(id::text, ', ') FROM (
SELECT
id, title, load_sequence,
sum(title_changed) OVER (ORDER BY load_sequence) AS partition_no
FROM (
SELECT
id, title, load_sequence,
CASE WHEN title = lag(title, 1) OVER (ORDER BY load_sequence) THEN 0 ELSE 1 END AS title_changed FROM entries
) x
) y
GROUP BY partition_no, title;
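Here is a rough, runnable sketch of the flag-and-running-sum idea using Python's sqlite3 module (SQLite 3.25 or later assumed). Two things differ from the PostgreSQL query above and are assumptions of the sketch: group_concat() replaces string_agg(), and the ids are inserted explicitly and sorted in Python so the output matches the question's listing. The name title_segments is illustrative.

```python
import sqlite3

# (id, title, load_sequence), using the ids shown in the question's listing.
ENTRIES = [
    (9, "A", 1), (10, "A", 2), (11, "A", 3), (12, "A", 6),
    (13, "B", 4), (14, "B", 5), (15, "B", 7), (16, "B", 8),
]


def title_segments(entries):
    """Group ids into uninterrupted runs of the same title."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table entries (id integer, title text, load_sequence integer)")
    con.executemany("insert into entries values (?, ?, ?)", entries)
    sql = """
        select title, group_concat(id)
        from (
            select id, title, load_sequence,
                   sum(title_changed) over (order by load_sequence) as partition_no
            from (
                select id, title, load_sequence,
                       -- 1 marks the first row of each new segment
                       case when title = lag(title, 1) over (order by load_sequence)
                            then 0 else 1 end as title_changed
                from entries
            ) x
        ) y
        group by partition_no, title
        order by partition_no
    """
    # group_concat() does not promise an order, so sort the ids here.
    return [(title, sorted(int(i) for i in ids.split(",")))
            for title, ids in con.execute(sql)]


print(title_segments(ENTRIES))
# [('A', [9, 10, 11]), ('B', [13, 14]), ('A', [12]), ('B', [15, 16])]
```

The first row has no predecessor, so lag() yields null, the comparison is not true, and the row is flagged with 1; the running sum of those flags then numbers the segments 1, 2, 3, 4.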