SQL Server 2008: find number of contiguous rows with equal values

I have a table with multiple Ids. Each Id has values arranged by a sequential index.
create table myValues
(
id int,
ind int,
val int
)
insert into myValues
values
(21, 5, 300),
(21, 4, 310),
(21, 3, 300),
(21, 2, 300),
(21, 1, 345),
(21, 0, 300),
(22, 5, 300),
(22, 4, 300),
(22, 3, 300),
(22, 2, 300),
(22, 1, 395),
(22, 0, 300)
I am trying to find the number of consecutive values that are the same.
The value field represents some data that should change on each entry (but need not be unique overall).
The problem is to find out when there are more than two consecutive rows with the same value (given the same id).
Thus I'm looking for an output like this:
id ind val count
21 5 300 1
21 4 310 1
21 3 300 2
21 2 300 2
21 1 345 1
21 0 300 1
22 5 300 4
22 4 300 4
22 3 300 4
22 2 300 4
22 1 395 1
22 0 300 1
I'm aware this is similar to the island and gaps problem discussed here.
However, those solutions all hinge on the ability to use a partition statement with values that are supposed to be consecutively increasing.
A solution that generates the ranges of "islands" as an intermediary would work as well, e.g.
id startind endind
21 3 2
22 5 2
Note that there can be many islands for each id.
I'm sure there is a simple adaptation of the island solution, but for the life of me I can't think of it.

Find the contiguous groups first, then do a count(*) partitioned by that group:
select id, ind, val, count(*) over (partition by id, val, grp)
from
(
select *, grp = dense_rank() over (partition by id, val order by ind) - ind
from myValues
) d
order by id, ind desc
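To see why subtracting ind from the rank identifies a run, here is a minimal Python sketch of the same idea. It is only an illustration, not part of the SQL answer; the function name run_lengths is made up, and it assumes ind is unique within each id.

```python
from collections import Counter

# (id, ind, val) rows from the question
rows = [
    (21, 5, 300), (21, 4, 310), (21, 3, 300), (21, 2, 300), (21, 1, 345), (21, 0, 300),
    (22, 5, 300), (22, 4, 300), (22, 3, 300), (22, 2, 300), (22, 1, 395), (22, 0, 300),
]

def run_lengths(rows):
    # Collect the ind values of every (id, val) partition.
    partitions = {}
    for id_, ind, val in rows:
        partitions.setdefault((id_, val), []).append(ind)

    grp = {}
    for (id_, val), inds in partitions.items():
        # enumerate() over the sorted inds plays the role of dense_rank().
        for rank, ind in enumerate(sorted(inds), start=1):
            # While ind grows by exactly 1 per row, rank - ind stays constant,
            # so the difference labels one contiguous run.
            grp[(id_, ind)] = (id_, val, rank - ind)

    sizes = Counter(grp.values())  # count(*) over (partition by id, val, grp)
    return [(id_, ind, val, sizes[grp[(id_, ind)]]) for id_, ind, val in rows]

result = run_lengths(rows)
```

For the sample data this reproduces the count column requested in the question, e.g. 2 for the rows (21, 3) and (21, 2), and 4 for the rows (22, 5) down to (22, 2).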

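The other solution is obviously more elegant. I'll have to study it a little closer myself. Note that the query below uses lag() and an ordered sum() over (...), both of which require SQL Server 2012 or later, so it will not run on SQL Server 2008.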
with agg(id, min_ind, max_ind, cnt) as (
select id, min(ind), max(ind), count(*)
from
(
select id, ind, val, sum(brk) over (partition by id order by ind desc) as grp
from
(
select
id, ind, val,
coalesce(sign(lag(ind) over (partition by id, val order by ind desc) - ind - 1), 1) as brk
from myValues
) as d
) as d
group by id, grp
)
select v.id, v.ind, v.val, a.cnt
from myValues v inner join agg a on a.id = v.id and v.ind between min_ind and max_ind
order by v.id, v.ind desc;
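The break-flag idea can also be sketched procedurally. The Python below is a simplified analogue of the query above rather than a translation of it: it walks each id in descending ind order and opens a new island whenever the value changes or the index is not adjacent. The function name islands is hypothetical.

```python
from itertools import groupby

# (id, ind, val) rows from the question
rows = [
    (21, 5, 300), (21, 4, 310), (21, 3, 300), (21, 2, 300), (21, 1, 345), (21, 0, 300),
    (22, 5, 300), (22, 4, 300), (22, 3, 300), (22, 2, 300), (22, 1, 395), (22, 0, 300),
]

def islands(rows):
    out = []  # each entry: [id, startind, endind, cnt]
    ordered = sorted(rows, key=lambda r: (r[0], -r[1]))  # by id, ind desc
    for id_, group in groupby(ordered, key=lambda r: r[0]):
        prev = None  # (ind, val) of the previous row within this id
        for _, ind, val in group:
            # A break corresponds to brk = 1 in the query: first row,
            # a different value, or a gap in the index.
            is_break = prev is None or prev[1] != val or prev[0] - ind != 1
            if is_break:
                out.append([id_, ind, ind, 1])
            else:
                out[-1][2] = ind   # extend the island downwards
                out[-1][3] += 1
            prev = (ind, val)
    return [tuple(i) for i in out]

# Islands longer than one row, in the shape of the intermediate table
# the question mentions (id, startind, endind) plus the row count.
long_islands = [i for i in islands(rows) if i[3] > 1]
```

With the sample data, long_islands contains the two ranges given in the question: id 21 from ind 3 to 2, and id 22 from ind 5 to 2.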

Related

GROUP by Largest String for all the substrings

I have a table like this where some rows have the same grp but different names. I want to group them by name such that all the substrings after removing nonalphanumeric characters are aggregated together and grouped by the largest string. The null value is considered the substring of all the strings.
grp  name     value
---  -------  -----
1    ab&c     10
1    abc d e  56
1    ab       21
1    a        23
1    xy       34
1    [null]   1
2    fgh      87
Desired result
grp  name   value
---  -----  -----
1    abcde  111
1    xy     34
2    fgh    87
My query:
Select grp,
regexp_replace(name,'[^a-zA-Z0-9]+', '', 'g') name, sum(value) value
from table
group by grp,
regexp_replace(name,'[^a-zA-Z0-9]+', '', 'g');
Result
grp  name    value
---  ------  -----
1    abc     10
1    abcde   56
1    ab      21
1    a       23
1    xy      34
1    [null]  1
2    fgh     87
What changes should I make in my query?
To solve this problem, I did the following (all of the code below is available on the fiddle here).
CREATE TABLE test
(
grp SMALLINT NOT NULL,
name TEXT NULL,
value SMALLINT NOT NULL
);
and populate it using your data + extra for testing:
INSERT INTO test VALUES
(1, 'ab&c', 10),
(1, 'abc d e', 56),
(1, 'ab', 21),
(1, 'a', 23),
(1, NULL, 1000000),
(1, 'r*&%$s', 100), -- added for testing.
(1, 'rs__t', 101),
(1, 'rs__tu', 101),
(1, 'xy', 1111),
(1, NULL, 1000000),
(2, 'fgh', 87),
(2, 'fgh', 13), -- For Charlieface
(2, NULL, 1000000),
(2, 'x', 50),
(2, 'x', 150),
(2, 'x----y', 100);
Then, you can use this query:
WITH t1 AS
(
SELECT
grp, n_str,
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str),
CASE
WHEN
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str) IS NULL
OR
POSITION
(
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str)
IN
n_str
) = 0
THEN 1
ELSE 0
END AS change,
value
FROM
test t1
CROSS JOIN LATERAL
(
VALUES
(
REGEXP_REPLACE(name,'[^a-zA-Z0-9]+', '', 'g')
)
) AS v(n_str)
WHERE n_str IS NOT NULL
), t2 AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY grp, s_change ORDER BY grp, n_str DESC) AS rn,
grp, n_str,
SUM(value) OVER (PARTITION BY grp, s_change) AS s_val,
MAX(LENGTH(n_str)) OVER (PARTITION BY grp) AS max_nom
FROM
(
SELECT
grp, n_str, change,
SUM(change) OVER (ORDER BY grp, n_str) AS s_change,
value
FROM
t1
ORDER BY grp, n_str DESC
) AS sub1
), t3 AS
(
SELECT
grp, SUM(value) AS null_sum
FROM
test
WHERE name IS NULL
GROUP BY grp
)
SELECT x.grp, x.n_str, x.s_val + y.null_sum
FROM t2 x
JOIN t3 y
ON x.max_nom = LENGTH(x.n_str) AND x.grp = y.grp
UNION
SELECT grp, n_str, s_val
FROM
t2 WHERE max_nom != LENGTH(n_str) AND rn = 1
ORDER BY grp, n_str;
Result:
grp n_str ?column?
1 abcde 2000110
1 rstu 302
1 xy 1111
2 fgh 1000100
2 xy 300
A few points to note:
Please always provide a fiddle when you ask questions such as this one with tables and data - it provides a single source of truth for the question and eliminates duplication of effort on the part of those trying to help you!
You haven't been very clear about what, exactly, should happen with NULLs - do the values count towards the SUM()? You can vary the CASE statement as required.
What happens when there's a tie in the number of characters in the string? I've included an example in the fiddle, where you get the draws - but you may wish to sort alphabetically (or some other method)?
There appears to be an error in your provided sums for the values (even taking account of counting or not values for NULL for the name field).
Finally, you don't want to GROUP BY the largest string - you want to GROUP BY the grp field + the SUM() of the values in the given grp records and then pick out the longest alphanumeric string in that grouping. It would be interesting to know why you want to do this?
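The chaining step of the query above (t1 and t2) may be easier to follow as a short procedural sketch. The Python below is an illustration under assumptions, not the answer's method: it drops rows with a NULL name entirely instead of adding their values to the longest string, it relies on Python's default string ordering, and the name chain_sums is made up.

```python
import re

# (grp, name, value): the answer's test rows, with the NULL-name rows omitted
data = [
    (1, "ab&c", 10), (1, "abc d e", 56), (1, "ab", 21), (1, "a", 23),
    (1, "r*&%$s", 100), (1, "rs__t", 101), (1, "rs__tu", 101), (1, "xy", 1111),
    (2, "fgh", 87), (2, "fgh", 13), (2, "x", 50), (2, "x", 150), (2, "x----y", 100),
]

def chain_sums(rows):
    # Strip non-alphanumeric characters, then sort by (grp, cleaned name).
    cleaned = sorted((g, re.sub(r"[^a-zA-Z0-9]+", "", n), v) for g, n, v in rows)
    chains = []  # each entry: [grp, longest name so far, running total]
    for g, name, value in cleaned:
        last = chains[-1] if chains else None
        # Same test as POSITION(LAG(n_str) IN n_str) = 0 in the query:
        # continue the chain only if the previous name occurs in this one.
        if last and last[0] == g and last[1] in name:
            last[1] = name
            last[2] += value
        else:
            chains.append([g, name, value])
    return [tuple(c) for c in chains]

totals = chain_sums(data)
```

The totals match the query's result once the NULL contributions are subtracted, for example (1, 'abcde', 110) and (2, 'xy', 300).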

Select rows using group by and in each group get column values based on highest of another column value

I need to get latest field based on another field in group by
we have
Table "SchoolReview"
Id  SchoolId  Review  Point
--  --------  ------  -----
1   1         rv1     8
2   1         rv2     7
3   2         rv3     4
4   2         rv4     7
5   3         rv5     2
6   3         rv6     8
I need to group by SchoolId and, inside each group, get Review and Point from the row with the highest "Id".
I don't need the "Id" column, but it's okay if the solution returns it.
Result I am looking for shall look like this.
SchoolId  Review  Point
--------  ------  -----
1         rv2     7
2         rv4     7
3         rv6     8
Can anyone experienced in MS SQL Server help with this?
Using sample data from other answer
SELECT *
INTO #Data
FROM (VALUES
(1, 1, 'rv1', 8),
(2, 1, 'rv2', 7),
(3, 2, 'rv3', 4),
(4, 2, 'rv4', 7),
(5, 3, 'rv5', 2),
(6, 3, 'rv6', 8)
) v (Id, SchoolId, Review, Point)
SELECT S.SchoolId,
S.Review,
S.Point
FROM #Data S
INNER JOIN
(
SELECT Id = MAX(S1.Id),
S1.SchoolId
FROM #Data S1
GROUP BY SchoolId
) X ON X.Id = S.Id AND X.schoolId = S.SchoolId
ORDER BY X.SchoolId
;
You do not need to group the rows, you simply need to select the appropriate rows from the table. In this case, using ROW_NUMBER() is an option:
Table:
SELECT *
INTO Data
FROM (VALUES
(1, 1, 'rv1', 8),
(2, 1, 'rv2', 7),
(3, 2, 'rv3', 4),
(4, 2, 'rv4', 7),
(5, 3, 'rv5', 2),
(6, 3, 'rv6', 8)
) v (Id, SchoolId, Review, Point)
Statement:
SELECT SchoolId, Review, Point
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY SchoolId ORDER BY Id DESC) AS Rn
FROM Data
) t
WHERE Rn = 1
Result:
SchoolId Review Point
---------------------
1 rv2 7
2 rv4 7
3 rv6 8
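As a rough cross-check of the "pick the row with the highest Id per SchoolId" logic, the same selection can be sketched in Python. This is only an illustration; latest_per_school is a hypothetical name.

```python
# (Id, SchoolId, Review, Point) rows from the question
reviews = [
    (1, 1, "rv1", 8), (2, 1, "rv2", 7),
    (3, 2, "rv3", 4), (4, 2, "rv4", 7),
    (5, 3, "rv5", 2), (6, 3, "rv6", 8),
]

def latest_per_school(rows):
    best = {}  # SchoolId -> (Id, Review, Point) of the highest Id seen so far
    for id_, school, review, point in rows:
        if school not in best or id_ > best[school][0]:
            best[school] = (id_, review, point)
    # Drop the Id and return one row per school, ordered by SchoolId.
    return [(school, review, point)
            for school, (_, review, point) in sorted(best.items())]

latest = latest_per_school(reviews)
```

Keeping only the current best row per key is the procedural counterpart of filtering on Rn = 1 after ROW_NUMBER() ... ORDER BY Id DESC.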

Create groups for INSERT based on IDs of source table

I'm trying to regroup some Ids, and create unique ID in a grouping table (acting as a bridge table in my data warehouse).
The source table contains transactionId, UserId and OrganisationId.
What I want to do is to create "groups" of OrganisationId, depending on User association, in order to use this group in a join.
See below:
Original data
TransaID UserId OrgaId
---------- -------- --------
24011035 1 180
24011035 1 19
24011040 2 89
24011064 3 89
24011070 4 19
24011082 4 180
24011106 5 89
24011106 5 180
24011107 6 180
Desired output
OrgaGroupId OrgaId
------------- --------
1 180
1 19
2 89
3 180
3 89
4 180
I've created 1 group for combination of Orga 180 and 19, as 2 users got it.
Two simple groups for OrgaId 89 and 180, as they appear only once associated to a user.
And finally, another group for 180 and 89, as it's a new combination.
What would be the T-SQL statements to achieve the output shown above?
declare @t table(TransaID int, UserId int, OrgaId int);
insert into @t(TransaID, UserId, OrgaId)
values
(24011035, 1, 180),
(24011035, 1, 19),
(24011040, 2, 89),
(24011064, 3, 89),
(24011070, 4, 19),
(24011082, 4, 180),
(24011106, 5, 89),
(24011106, 5, 180),
(24011107, 6, 180);
-- straggr is the ordered, comma-separated list of OrgaIds per user;
-- users with the same list end up in the same group
select /*straggr,*/ OrgaId, dense_rank() over(order by min(UserId)) as grpid
from
(
select a.*,
(select distinct concat(b.OrgaId, ',') from @t as b where b.UserId = a.UserId order by concat(b.OrgaId, ',') for xml path('')) as straggr
from @t as a
) as t
group by straggr, OrgaId;
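The grouping rule behind the query above (one group per distinct set of organisations attached to a user, numbered by the first user that has it) can be sketched in Python. This is an illustrative sketch only; orga_groups is a made-up name, and the descending order of OrgaId inside a group is chosen just to match the desired output.

```python
# (TransaID, UserId, OrgaId) rows from the question
transactions = [
    (24011035, 1, 180), (24011035, 1, 19), (24011040, 2, 89),
    (24011064, 3, 89), (24011070, 4, 19), (24011082, 4, 180),
    (24011106, 5, 89), (24011106, 5, 180), (24011107, 6, 180),
]

def orga_groups(rows):
    # Collect the set of organisations for every user.
    by_user = {}
    for _, user, orga in rows:
        by_user.setdefault(user, set()).add(orga)

    # Give each distinct set the next group id, visiting users in ascending
    # order (the counterpart of dense_rank() over (order by min(UserId))).
    group_ids = {}
    for user in sorted(by_user):
        key = frozenset(by_user[user])
        group_ids.setdefault(key, len(group_ids) + 1)

    return [(gid, orga)
            for key, gid in group_ids.items()
            for orga in sorted(key, reverse=True)]

bridge = orga_groups(transactions)
```

Users 1 and 4 share the set {180, 19} and therefore map to the same group, as in the desired output.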

Remove duplicates values when all values are the same

I am using SQL workbench/J connecting to amazon redshift.
I have the following data in a table (there are more columns that need to be kept but are all the exact same values for each unique claim_id regardless of line number):
Member ID | Claim_ID | Line_Number |
1 100 1
1 100 2
1 100 1
1 100 2
2 101 13
2 101 13
2 101 13
2 101 13
3 102 12
3 102 12
1 103 2
1 103 2
I want it to become the following which will remove any duplicates based on claim_id (it does not matter which line number is kept):
Member ID | Claim_ID | Line_Number |
1 100 1
2 101 13
3 102 12
1 103 2
I have tried the following:
select er_main.member_id, er_main.claim_id, er_main.line_number,
temp.claim_id, temp.line_number
from OK_ER_30 er_main
inner join (
select row_number() over (partition by claim_id order by line_number desc) as seqnum
from
OK_ER_30 temp) temp
ON er_main.claim_id = temp.claim_id and seqnum = 1
Order by er_main.claim_id, temp.line_number
and this:
select * from ok_er_30
where claim_id in
(select distinct claim_id
from ok_er_30
group by claim_id
)
order by claim_id desc
I have checked many other ways of pulling only one row per distinct claim_id but nothing has worked.
try this
select distinct Member_ID, Claim_ID, max(Line_Number) as Line_Number from OK_ER_30 group by Member_ID, Claim_ID
Check out the following code.
-- T-SQL (SQL Server) demo: table variables and DELETE through a CTE
-- are SQL Server features, so this will not run as-is on Redshift.
declare @OK_ER_30 table(Member_ID int, Claim_ID int, Line_Number int);
insert @OK_ER_30 values
(1, 100, 1),
(1, 100, 2),
(1, 100, 1),
(1, 100, 2),
(2, 101, 13),
(2, 101, 13),
(2, 101, 13),
(2, 101, 13),
(3, 102, 12),
(3, 102, 12),
(1, 103, 2),
(1, 103, 2);
with
t as(
select *, row_number() over(
partition by Member_ID, Claim_ID order by (select 0)
) rn
from @OK_ER_30
)
-- keep one arbitrary row per (Member_ID, Claim_ID) and delete the rest
delete from t where rn > 1;
select * from @OK_ER_30;
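The "keep one row per claim, no matter which" rule can be illustrated with a few lines of Python. This is a sketch of the logic only, not Redshift code; one_row_per_claim is a hypothetical name, and it keeps the first row encountered for each Claim_ID.

```python
# (Member_ID, Claim_ID, Line_Number) rows from the question
claims = [
    (1, 100, 1), (1, 100, 2), (1, 100, 1), (1, 100, 2),
    (2, 101, 13), (2, 101, 13), (2, 101, 13), (2, 101, 13),
    (3, 102, 12), (3, 102, 12),
    (1, 103, 2), (1, 103, 2),
]

def one_row_per_claim(rows):
    first_seen = {}  # Claim_ID -> first row seen for that claim
    for row in rows:
        first_seen.setdefault(row[1], row)
    return list(first_seen.values())

deduped = one_row_per_claim(claims)
```

This is the same effect as keeping rn = 1 after numbering the rows within each claim.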
Try this,
select Member_ID, Claim_ID, max(Line_Number) as Line_Number from OK_ER_30 group by Member_ID, Claim_ID

How to filter rows on a complex filter

I have these rows in a table
ID Name Price Delivery
== ==== ===== ========
1 apple 1 1
2 apple 3 2
3 apple 6 3
4 apple 9 4
5 orange 4 6
6 orange 5 7
I want to have the price at the third delivery (Delivery=3) or the last price if there's no third delivery.
It would give me this :
ID Name Price Delivery
== ==== ===== ========
3 apple 6 3
6 orange 5 7
I don't necessarily want a full solution, but an idea of what to look for would be greatly appreciated.
SQL> create table t (id,name,price,delivery)
as
select 1, 'apple', 1, 1 from dual union all
select 2, 'apple', 3, 2 from dual union all
select 3, 'apple', 6, 3 from dual union all
select 4, 'apple', 9, 4 from dual union all
select 5, 'orange', 4, 6 from dual union all
select 6, 'orange', 5, 7 from dual
/
Table created.
SQL> select max(id) keep (dense_rank last order by nullif(delivery,3) nulls last) id
, name
, max(price) keep (dense_rank last order by nullif(delivery,3) nulls last) price
, max(delivery) keep (dense_rank last order by nullif(delivery,3) nulls last) delivery
from t
group by name
/
ID NAME PRICE DELIVERY
---------- ------ ---------- ----------
3 apple 6 3
6 orange 5 7
2 rows selected.
EDIT: Since you want "an idea of what to look for", here is a description of why I think this solution is the best, besides being the query with the fewest lines. Your expected result set indicates that you want to group your data per fruit name ("group by name"). And of each group you want to keep the values of the record with delivery = 3 or, when that number doesn't exist, the last one ("keep (dense_rank last order by nullif(delivery,3) nulls last)"). In my opinion, the query above reads just like that. And it uses only one table access to get the result, although my query is not unique in that.
Regards,
Rob.
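The ordering trick in that query (turn delivery = 3 into NULL, sort NULLs last, take the last row) can be mimicked with a sort key in Python. The sketch below is only an analogue of the idea, not Oracle code; pick_rows is a made-up name.

```python
from itertools import groupby

# (id, name, price, delivery) rows from the question
deliveries = [
    (1, "apple", 1, 1), (2, "apple", 3, 2), (3, "apple", 6, 3),
    (4, "apple", 9, 4), (5, "orange", 4, 6), (6, "orange", 5, 7),
]

def pick_rows(rows):
    by_name = sorted(rows, key=lambda r: r[1])
    picked = []
    for _, group in groupby(by_name, key=lambda r: r[1]):
        # (delivery == 3, delivery): a row with delivery 3 outranks every
        # other row; without one, the highest delivery wins. This is the
        # role played by nullif(delivery, 3) ... nulls last in the query.
        picked.append(max(group, key=lambda r: (r[3] == 3, r[3])))
    return picked

chosen = pick_rows(deliveries)
```

For the sample data this selects ID 3 for apple and ID 6 for orange, the same rows as the query output above.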
Use ROW_NUMBER twice - once to filter the rows away that are after the third delivery, and the second time to find the last row remaining (i.e. a typical max per group query).
I've implemented this using CTEs. I tested it in SQL Server but I believe that Oracle supports the same syntax.
WITH T1 AS (
SELECT
ID, Name, Price, Delivery,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Delivery) AS rn
FROM Table1
), T2 AS (
SELECT
t1.*,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Delivery DESC) AS rn2
FROM T1
WHERE rn <= 3
)
SELECT ID, Name, Price, Delivery
FROM T2
WHERE rn2 = 1
Result:
ID Name Price Delivery
3 apple 6 3
6 orange 5 7
select t3.ID, t3.Name, t3.Price, t3.Delivery
from (
select Name, max(Delivery) as MaxDelivery
from MyTable
group by Name
) t1
left outer join MyTable t2 on t1.Name = t2.Name and t2.Delivery = 3
inner join MyTable t3 on t1.Name = t3.name
and t3.Delivery = coalesce(t2.Delivery, t1.MaxDelivery)
Mark's and APC's answers work if you meant the third delivery, regardless of the Delivery number. Here's a solution using analytic functions that specifically searches for a record with Delivery = 3.
CREATE TABLE FRUITS (
ID NUMBER,
Name VARCHAR2(10),
Price INTEGER,
Delivery INTEGER);
INSERT INTO FRUITS VALUES (1, 'apple', 1, 1);
INSERT INTO FRUITS VALUES (2, 'apple', 3, 2);
INSERT INTO FRUITS VALUES (3, 'apple', 6, 3);
INSERT INTO FRUITS VALUES (4, 'apple', 9, 4);
INSERT INTO FRUITS VALUES (5, 'orange', 4, 6);
INSERT INTO FRUITS VALUES (6, 'orange', 5, 7);
INSERT INTO FRUITS VALUES (7, 'pear', 2, 5);
INSERT INTO FRUITS VALUES (8, 'pear', 4, 6);
INSERT INTO FRUITS VALUES (9, 'pear', 6, 7);
INSERT INTO FRUITS VALUES (10, 'pear', 8, 8);
SELECT ID,
Name,
Price,
Delivery
FROM (SELECT ID,
Name,
Price,
Delivery,
SUM(CASE WHEN Delivery = 3 THEN 1 ELSE 0 END)
OVER (PARTITION BY Name) AS ThreeCount,
ROW_NUMBER()
OVER (PARTITION BY Name ORDER BY Delivery DESC) AS rn
FROM FRUITS)
WHERE (ThreeCount <> 0 AND Delivery = 3) OR
(ThreeCount = 0 AND rn = 1)
ORDER BY ID;
DROP TABLE FRUITS;
And the results from Oracle XE 10g:
ID Name Price Delivery
---- ---------- ------- ----------
3 apple 6 3
6 orange 5 7
10 pear 8 8
I included a third fruit in the sample data to illustrate the effect of different interpretations of the question. The other solutions would pick ID=9 for the pear.
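To make the difference between the two readings concrete, here is a small Python sketch of the positional interpretation (third delivery in order, or the last one if there are fewer than three), run against the FRUITS rows above. It is an illustration only; third_or_last is a hypothetical name.

```python
from itertools import groupby

# (ID, Name, Price, Delivery) rows from the FRUITS table above
fruits = [
    (1, "apple", 1, 1), (2, "apple", 3, 2), (3, "apple", 6, 3), (4, "apple", 9, 4),
    (5, "orange", 4, 6), (6, "orange", 5, 7),
    (7, "pear", 2, 5), (8, "pear", 4, 6), (9, "pear", 6, 7), (10, "pear", 8, 8),
]

def third_or_last(rows):
    ordered = sorted(rows, key=lambda r: (r[1], r[3]))  # by Name, Delivery
    picked = []
    for _, group in groupby(ordered, key=lambda r: r[1]):
        first_three = list(group)[:3]   # like rn <= 3
        picked.append(first_three[-1])  # like rn2 = 1
    return picked

positional = third_or_last(fruits)
```

Under this reading the pear row is ID 9 (its third delivery), whereas the Delivery = 3 reading above returns ID 10.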