Oracle: enumerate groups of similar rows

I have the following table:
ID | X
1 | 1
2 | 2
3 | 5
4 | 6
5 | 7
6 | 9
I need to enumerate groups of rows in such a way that if rows i and i-1 differ in column X by less than 2, they should have the same group number N. See the example below.
ID | X | N
1 | 1 | 1
2 | 2 | 1
3 | 5 | 2
4 | 6 | 2
5 | 7 | 2
6 | 9 | 3
Note that X(2)-X(1)=1, so rows 1 and 2 are grouped into the first group. Then X(3)-X(2)=3, so the 3rd row starts the 2nd group, together with the 4th and 5th rows. X(6)-X(5)=2, so the 6th row is in the 3rd group.
Can anybody help me write a SQL query that returns the second table?

This should do it:
select id, x, sum(new_group) over (order by id) as group_no
from ( select id, x, case when x - prev_x = 1 then 0 else 1 end new_group
       from ( select id, x, lag(x) over (order by id) prev_x
              from mytable
            )
     );
I get the correct answer for your data with that query.
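If the data can contain gaps other than whole steps of 1, here is a minimal sketch of the same idea that tests the stated "differ by less than 2" condition directly instead of equality with 1 (still assuming the table is called mytable):
select id, x, sum(new_group) over (order by id) as group_no
from ( select id, x,
              -- lag(x) is null on the first row, so the case falls through to 1 and opens group 1
              case when x - lag(x) over (order by id) < 2 then 0 else 1 end new_group
       from mytable
     );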

SQL> create table mytable (id,x)
2 as
3 select 1, 1 from dual union all
4 select 2, 2 from dual union all
5 select 3, 5 from dual union all
6 select 4, 6 from dual union all
7 select 5, 7 from dual union all
8 select 6, 9 from dual
9 /
Table created.
SQL> select id
2 , x
3 , sum(y) over (order by id) n
4 from ( select id
5 , x
6 , case x - lag(x) over (order by id)
7 when 1 then 0
8 else 1
9 end y
10 from mytable
11 )
12 order by id
13 /
ID X N
---------- ---------- ----------
1 1 1
2 2 1
3 5 2
4 6 2
5 7 2
6 9 3
6 rows selected.
Which is essentially the same as Tony's answer, just with one inline view fewer.
Regards,
Rob.

Using basic operations only. The idea: pick the last row of each group (a row with no x + 1 successor), number those endpoints with rownum (note this relies on the rows coming back in x order, which Oracle does not guarantee without an order by), and then derive each group's lower bound from the previous group's upper bound:
create table test(id int, x int);
insert into test
select 1, 1 from dual union all
select 2, 2 from dual union all
select 3, 5 from dual union all
select 4, 6 from dual union all
select 5, 7 from dual union all
select 6, 9 from dual;
create table temp as
select rownum r, 0 min, x max
from test t
where not exists(select * from test t2 where t2.x = t.x + 1);
update temp t set min = (select max + 1 from temp t2 where t2.r = t.r - 1);
update temp t set min = 0 where min is null;
select * from temp order by r;
select t.id, t.x, x.r from test t, temp x where t.x between x.min and x.max;
drop table test;
drop table temp;

Only return rows with local max

The code below gives me something similar to the table below. What I am looking to do is only return the PROVID that has the max count per PATID.
SELECT PAT_ID AS PATID
, VISIT_PROV_ID AS PROVID
, COUNT(*) AS PROVCOUNT
FROM PAT_ENC
GROUP BY PAT_ID, VISIT_PROV_ID
ORDER BY PAT_ID, COUNT(*) DESC
PATID | PROVID | PROVCOUNT
----: | -----: | --------:
    1 |      3 |         1
    2 |      4 |         6
    2 |      3 |         2
    2 |      8 |         1
    3 |      4 |         6
    4 |      1 |         8
    4 |      2 |         3
The table below would be the desired result based on the same data from the previous table.
PATID | PROVID | PROVCOUNT
----: | -----: | --------:
    1 |      3 |         1
    2 |      4 |         6
    3 |      4 |         6
    4 |      1 |         8
We can use RANK() OVER (PARTITION BY PATID ORDER BY PROVCOUNT DESC), which gives a ranking of 1 to the row with the largest value of PROVCOUNT for each PATID, in a CTE, and then use WHERE ranking = 1 to show only those largest values.
create table PAT_ENC (
PATID int,
PROVID int,
PROVCOUNT int);
INSERT INTO PAT_ENC (PATID, PROVID, PROVCOUNT)
SELECT 1, 3, 1 FROM DUAL UNION ALL
SELECT 2, 4, 6 FROM DUAL UNION ALL
SELECT 2, 3, 2 FROM DUAL UNION ALL
SELECT 2, 8, 1 FROM DUAL UNION ALL
SELECT 3, 4, 6 FROM DUAL UNION ALL
SELECT 4, 1, 8 FROM DUAL UNION ALL
SELECT 4, 2, 3 FROM DUAL;
WITH COUNTED AS
(SELECT
PATID,
PROVID,
PROVCOUNT,
RANK() OVER (PARTITION BY PATID ORDER BY PROVCOUNT DESC) ranking
FROM PAT_ENC)
SELECT
PATID,
PROVID,
PROVCOUNT
FROM COUNTED
WHERE ranking = 1;
PATID | PROVID | PROVCOUNT
----: | -----: | --------:
1 | 3 | 1
2 | 4 | 6
3 | 4 | 6
4 | 1 | 8
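Applied to the original PAT_ENC query, where the counts come from aggregating raw visit rows rather than a pre-counted table, the same idea looks like this sketch (analytic functions run after GROUP BY, so COUNT(*) can be ranked directly):
WITH COUNTED AS
(SELECT
PAT_ID AS PATID,
VISIT_PROV_ID AS PROVID,
COUNT(*) AS PROVCOUNT,
RANK() OVER (PARTITION BY PAT_ID ORDER BY COUNT(*) DESC) ranking
FROM PAT_ENC
GROUP BY PAT_ID, VISIT_PROV_ID)
SELECT
PATID,
PROVID,
PROVCOUNT
FROM COUNTED
WHERE ranking = 1;
Note that RANK() preserves ties: if two providers share the top count for a patient, both rows come back. Use ROW_NUMBER() instead if exactly one row per PATID is required.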

create sequence of numbers on grouped column in Oracle

Consider the below table with columns a, b, c.
a b c
3 4 5
3 4 5
6 4 1
1 1 8
1 1 8
1 1 0
1 1 0
I need a select statement to get the below output, i.e. a column 'rn' that increments for each group of columns a, b, c.
a b c rn
3 4 5 1
3 4 5 1
6 4 1 2
1 1 8 3
1 1 8 3
1 1 0 4
1 1 0 4
You can use the DENSE_RANK analytic function to get a unique ID for each combination of A, B, and C. Just note that if a new value is inserted into the table, the IDs of each combination of A, B, and C will shift and may not be the same.
Query
WITH
my_table (a, b, c)
AS
(SELECT 3, 4, 5 FROM DUAL
UNION ALL
SELECT 3, 4, 5 FROM DUAL
UNION ALL
SELECT 6, 4, 1 FROM DUAL
UNION ALL
SELECT 1, 1, 8 FROM DUAL
UNION ALL
SELECT 1, 1, 8 FROM DUAL
UNION ALL
SELECT 1, 1, 0 FROM DUAL
UNION ALL
SELECT 1, 1, 0 FROM DUAL)
SELECT t.*, DENSE_RANK () OVER (ORDER BY b desc, c desc, a) as rn
FROM my_table t;
Result
A B C RN
____ ____ ____ _____
3 4 5 1
3 4 5 1
6 4 1 2
1 1 8 3
1 1 8 3
1 1 0 4
1 1 0 4
As a starter: for your question to make sense at all, you need a column that defines the ordering of the rows. Let me assume that you have such a column, called id.
Then, you can use window functions:
select a, b, c,
sum(case when a = lag_a and b = lag_b and c = lag_c then 0 else 1 end) over(order by id) rn
from (
select t.*,
lag(a) over(order by id) lag_a,
lag(b) over(order by id) lag_b,
lag(c) over(order by id) lag_c
from mytable t
) t
Assuming you have some way of ordering your rows, then you can use MATCH_RECOGNIZE:
SELECT a, b, c, rn
FROM table_name
MATCH_RECOGNIZE (
ORDER BY id
MEASURES MATCH_NUMBER() AS rn
ALL ROWS PER MATCH
PATTERN ( FIRST_ROW EQUAL_ROWS* )
DEFINE EQUAL_ROWS AS (
EQUAL_ROWS.a = PREV( EQUAL_ROWS.a )
AND EQUAL_ROWS.b = PREV( EQUAL_ROWS.b )
AND EQUAL_ROWS.c = PREV( EQUAL_ROWS.c )
)
)
So, for your test data:
CREATE TABLE table_name ( id, a, b, c ) AS
SELECT 1, 3, 4, 5 FROM DUAL UNION ALL
SELECT 2, 3, 4, 5 FROM DUAL UNION ALL
SELECT 3, 6, 4, 1 FROM DUAL UNION ALL
SELECT 4, 1, 1, 8 FROM DUAL UNION ALL
SELECT 5, 1, 1, 8 FROM DUAL UNION ALL
SELECT 6, 1, 1, 0 FROM DUAL UNION ALL
SELECT 7, 1, 1, 0 FROM DUAL;
Outputs:
A | B | C | RN
-: | -: | -: | -:
3 | 4 | 5 | 1
3 | 4 | 5 | 1
6 | 4 | 1 | 2
1 | 1 | 8 | 3
1 | 1 | 8 | 3
1 | 1 | 0 | 4
1 | 1 | 0 | 4
It can also be done without any ordering, by getting the distinct groups and numbering each group. Borrowing the first part from EJ Egjed:
WITH my_table (a, b, c) AS
(SELECT 3, 4, 5 FROM DUAL
UNION ALL
SELECT 3, 4, 5 FROM DUAL
UNION ALL
SELECT 6, 4, 1 FROM DUAL
UNION ALL
SELECT 1, 1, 8 FROM DUAL
UNION ALL
SELECT 1, 1, 8 FROM DUAL
UNION ALL
SELECT 1, 1, 0 FROM DUAL
UNION ALL
SELECT 1, 1, 0 FROM DUAL)
, groups as (select distinct a, b, c
from my_table)
, groupnums as (select rownum as num, a, b, c
from groups)
select a, b, c, num
from my_table join groupnums using(a,b,c);
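Since rownum here is applied to an unordered distinct result, the group numbers come out in an arbitrary order. If you do have an ordering column (the id assumed in the other answers) and want the numbers to follow it, here is a sketch that numbers each group by its first id, using the table_name table from the MATCH_RECOGNIZE answer:
select t.a, t.b, t.c, g.num
from table_name t
join ( select a, b, c,
              row_number() over (order by min(id)) as num
       from table_name
       group by a, b, c
     ) g
  on t.a = g.a and t.b = g.b and t.c = g.c
order by t.id;
Note that this assigns one number per distinct combination, so if the same (a, b, c) combination reappeared later in the ordering it would reuse its number, unlike the lag and MATCH_RECOGNIZE approaches.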

BigQuery: select the nth smallest value in window, ordered by another value

My table has two integer columns: a and b. For each row, I want to select the nth smallest value of b among the rows whose a value is less than or equal to the current row's. Here's a sample input/output, with n=2.
Input:
a | b
-------
1 | 4
2 | 2
3 | 5
4 | 3
5 | 9
6 | 1
7 | 7
8 | 6
9 | 0
Output:
a | 2nd min b
-------------
1 | null ← only 1 element in [4], no 2nd min
2 | 4 ← 2nd min between [4,2]
3 | 4 ← 2nd min between [4,2,5]
4 | 3 ← 2nd min between [4,2,5,3]
5 | 3 ← etc.
6 | 2
7 | 2
8 | 2
9 | 1
I used n=2 here to keep it simple, but in practice, I want the 2000th smallest value (or some other large-ish constant). The column a can be assumed to contain distinct integers (and even 1, 2, 3, … if that's easier).
The problem is that if I use ORDER BY b in my window clause and NTH_VALUE, it just computes the answer on the wrong set of values:
WITH data AS (
SELECT 1 AS a, 4 AS b
UNION ALL SELECT 2 AS a, 2 AS b
UNION ALL SELECT 3 AS a, 5 AS b
UNION ALL SELECT 4 AS a, 3 AS b
UNION ALL SELECT 5 AS a, 9 AS b
UNION ALL SELECT 6 AS a, 1 AS b
)
SELECT nth_value(b, 2) over (order by a)
from data
returns [null, 2, 2, 2, 2, 2]: the values are ordered by a (so in the same order as they appear), so the value b=2 is always the one in second place. I want to order by a and then take the nth smallest value of b. Any idea how to write this in BigQuery (preferably Standard SQL)?
Below is for BigQuery Standard SQL and produces the correct result for the given example.
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 a, 4 b UNION ALL
SELECT 2, 2 UNION ALL
SELECT 3, 5 UNION ALL
SELECT 4, 3 UNION ALL
SELECT 5, 9 UNION ALL
SELECT 6, 1 UNION ALL
SELECT 7, 7 UNION ALL
SELECT 8, 6 UNION ALL
SELECT 9, 0
)
SELECT
a,
(SELECT b FROM
(SELECT b FROM UNNEST(c) b ORDER BY b LIMIT 2)
ORDER BY b DESC LIMIT 1
) b2
FROM (
SELECT a, IF(ARRAY_LENGTH(c) > 1, c, [NULL]) c
FROM (
SELECT a, ARRAY_AGG(b) OVER (ORDER BY a) c
FROM `project.dataset.table`
)
)
-- ORDER BY a
with the expected result as below:
Row a b2
1 1 null
2 2 4
3 3 4
4 4 3
5 5 3
6 6 2
7 7 2
8 8 2
9 9 1
Note: to make it work for the 2000th element, change 2 to 2000 in LIMIT 2.
Meanwhile, I admit it looks a little ugly/messy to me, and I am not sure about scalability, but you can give it a shot.
Quick Update
Below is a slightly less ugly-looking version (same output, of course):
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 a, 4 b UNION ALL
SELECT 2, 2 UNION ALL
SELECT 3, 5 UNION ALL
SELECT 4, 3 UNION ALL
SELECT 5, 9 UNION ALL
SELECT 6, 1 UNION ALL
SELECT 7, 7 UNION ALL
SELECT 8, 6 UNION ALL
SELECT 9, 0
)
SELECT a, c[SAFE_ORDINAL(2)] b2 FROM (
SELECT x.a, ARRAY_AGG(y.b ORDER BY y.b LIMIT 2) c
FROM `project.dataset.table` x
CROSS JOIN `project.dataset.table` y
WHERE y.a <= x.a
GROUP BY x.a
)
-- ORDER BY a
For the 2000th element, replace 2 with 2000 in LIMIT 2 and SAFE_ORDINAL(2).
There is still a potential scalability issue because of the (now explicit) CROSS JOIN.
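For what it's worth, here is a shorter sketch of the same array idea (reusing the sample `project.dataset.table` from above): pick the nth smallest with LIMIT 1 OFFSET n-1 inside a scalar subquery. A scalar subquery over an empty result yields NULL, which covers the rows that have seen fewer than n values so far:
#standardSQL
SELECT
  a,
  (SELECT x FROM UNNEST(bs) x ORDER BY x LIMIT 1 OFFSET 1) b2  -- OFFSET n-1, here n = 2
FROM (
  SELECT a, ARRAY_AGG(b) OVER (ORDER BY a) bs
  FROM `project.dataset.table`
)
-- ORDER BY a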

To create a column summing the values from another column in the same view

View:
A | B
10 1
15 2
12 3
5 2
2 1
2 1
Output View:
A | B | C
10 1 14
15 2 20
12 3 12
5 2 20
2 1 14
2 1 14
I need to sum the values from column A, grouped by column B. So, for all rows where column B has value 1, take the values from column A and put their sum in column C; likewise for the other values of B.
I don't see the point, but:
SELECT t.a
,t.b
,sumtab.c
FROM [yourtable] t
INNER JOIN (
SELECT t.b
,sum(t.a) AS C
FROM [yourtable] t
GROUP BY t.b
) AS sumtab
ON t.b = sumtab.b
You could use SUM() OVER like this
DECLARE #SampleData AS TABLE
(
A int,
B int
)
INSERT INTO #SampleData
(
A,
B
)
VALUES
( 10, 1),
( 15, 2),
( 12, 3),
( 5 , 2),
( 2 , 1),
( 2 , 1)
SELECT *,
sum(sd.A) OVER(PARTITION BY sd.B) AS C
FROM #SampleData sd
Returns
A B C
-----------
10 1 14
2 1 14
2 1 14
15 2 20
5 2 20
12 3 12
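The windowed version carries over directly to Oracle and most other databases; here is a sketch against the view from the question, assuming it is named yourview:
select a, b,
       sum(a) over (partition by b) as c  -- per-group total attached to every row
from yourview;
The window form also avoids the self-join of the first answer, since SUM() OVER (PARTITION BY ...) attaches the group total without collapsing the rows.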

Split words and insert into a new table with counts of these words

I have the following table and data.
I need to:
1- split each comma-separated entry in each row into a new row
2- count the entries in each row that match one another by soundex on the last part of the entry
create table a (id number(9), words varchar(500));
insert into a values(1,'UK,LONDON,YEMEN,JOHN,CAIRO,OMAR ALI,EGYPT,Cairo,YEMAN,OMAR AMR ALI,LONDAN');
insert into a values(2,'UK,SUDAI,SUDAIN,AYHAM SHAHER YAFOOZ,ALI YAFOOZ');
insert into a values(3,'MALAYSIA, AHMED ALI,MALYSIAN');
expected output
create table temp_words(id number(9),words varchar2(100), count_words number(9));
id words count_words
1 UK 1
1 LONDON 2
1 YEMEN 2
1 CAIRO 2
1 OMAR ALI 2
1 JOHN 1
2 UK 1
2 SUDAI 2
2 AYHAM SHAHER YAFOOZ 2
3 MALAYSIA 2
3 AHMED ALI 1
regards
To split the data as you want, you can use a "connect by" as a row generator.
SQL> with src as (select id,',' || words || ',' as words,
2 length(words) - length(translate(words, '.,', '.')) + 1 no_of_words
3 from a)
4 select a.id,
5 substr(a.words,
6 instr(words, ',', 1, r) + 1,
7 instr(words, ',', 1, r + 1) - instr(words, ',', 1, r) - 1) word,
8 a.no_of_words
9 from (select level r
10 from dual
11 connect by level <= (select max(no_of_words) from src)) d
12 inner join src a
13 on d.r <= a.no_of_words
14 where a.no_of_words is not null
15 order by a.id, d.r
16 /
ID WORD NO_OF_WORDS
---------- -------------------- -----------
1 UK 11
1 LONDON 11
1 YEMEN 11
1 JOHN 11
1 CAIRO 11
1 OMAR ALI 11
1 EGYPT 11
1 Cairo 11
1 YEMAN 11
1 OMAR AMR ALI 11
1 LONDAN 11
2 UK 5
2 SUDAI 5
2 SUDAIN 5
2 AYHAM SHAHER YAFOOZ 5
2 ALI YAFOOZ 5
3 MALAYSIA 3
3 AHMED ALI 3
3 MALYSIAN 3
19 rows selected.
SQL>
Here is a SQLFiddle demo
select id,words,
case when i=0 then
SUBSTR(words,
1,
case when INSTR(words,',', 1, 1)=0
then 100000
else
INSTR(words,',', 1, 1)-1
end
)
ELSE
SUBSTR(words,
INSTR(words,',', 1, i)+1,
case when INSTR(words,',', 1, i+1)=0
then 100000
else
INSTR(words,',', 1, i+1)-INSTR(words,',', 1, i)-1
end
)
END word,
i+1 COUNTWORDS
from a,
(
select * from
(
select 0 i from dual
union
select 1 i from dual
union
select 2 i from dual
union
select 3 i from dual
union
select 4 i from dual
union
select 5 i from dual
union
select 6 i from dual
union
select 7 i from dual
union
select 8 i from dual
union
select 9 i from dual
union
select 10 i from dual
union
select 11 i from dual
union
select 12 i from dual
)
)
table_i
where
case when i>0 then INSTR(words,',', 1, i)
else 100000 end <>0
order by id,i
Another approach (using the regexp_count and regexp_substr regular expression functions):
SQL> with Occurence(oc) as(
2 select level
3 from ( select max(regexp_count(words, '[^,]+')) ml
4 from a
5 ) t
6 connect by level <= t.ml
7 )
8 select id
9 , word
10 , count(word) over(partition by id, soundex(word) order by id) as count_words
11 From ( select a.id
12 , regexp_substr(words, '[^,]+', 1, o.oc) as word
13 from occurence o
14 cross join a
15 ) s
16 where s.word is not null
17 order by id
18 ;
ID WORD COUNT_WORDS
---------- -------------------- -----------
1 Cairo 2
1 CAIRO 2
1 EGYPT 1
1 JOHN 1
1 LONDAN 2
1 LONDON 2
1 OMAR ALI 1
1 OMAR AMR ALI 1
1 UK 1
1 YEMEN 2
1 YEMAN 2
2 ALI YAFOOZ 1
2 AYHAM SHAHER YAFOOZ 1
2 SUDAI 1
2 SUDAIN 1
2 UK 1
3 AHMED ALI 1
3 MALAYSIA 1
3 MALYSIAN 1
19 rows selected
You would need to insert your data as separate records. You can keep them as a concatenated string, if you like, but it'll just make your life very difficult. So:
create table words (
id number,
w varchar2(100),
s varchar2(4)
);
create or replace trigger words_auto
before insert or update on words
for each row
begin
select trim(upper(:new.w)), soundex(:new.w)
into :new.w, :new.s
from dual;
end;
insert into words (id, w) values (1, 'UK');
insert into words (id, w) values (1, 'LONDON');
...
insert into words (id, w) values (3, ' AHMED ALI');
insert into words (id, w) values (3, 'MALYSIAN');
You could write a procedure to split your concatenated string and populate the words table appropriately. Note that I have created a trigger that normalises your input to uppercase, removes extraneous whitespace, and automatically produces the Soundex codes.
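A minimal sketch of that splitting procedure, assuming the source table a(id, words) from the question and at most 20 comma-separated values per row (the trigger above takes care of the trimming, uppercasing and Soundex codes):
create or replace procedure populate_words is
begin
  insert into words (id, w)
  select a.id, regexp_substr(a.words, '[^,]+', 1, g.n)
  from a,
       (select level n from dual connect by level <= 20) g  -- assumed upper bound on values per row
  where regexp_substr(a.words, '[^,]+', 1, g.n) is not null;
end;
/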
Now here's a question: You want to group words by their Soundex codes; however, how do you determine the baseline? For example, 'LONDON' and 'LONDAN' both have code 'L535', but how do you know which record is the 'main' one?... You can't, without further lookup tables! As such, the best you can do is to group by the Soundex codes. This doesn't have to be stored in a table and it makes more sense to be a view:
create or replace view word_counts as
select id,
s soundex,
count(w) count_rows
from words
group by id,
s;
Note that I've called the count field count_rows as it counts records, rather than distinct rows. That is: records of 'LONDON', 'LONDAN' and 'LONDON' will show with a count of 3, not 2 (which you might be expecting). Anyway, with your data, the view will look something like this:
id soundex count_rows
----- --------- -----------
1 U200 1
1 L535 2
... ... ...
3 M420 2
3 A534 1
As I say, that's really the best you can expect without further infrastructure.
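If you do want a label for each Soundex group, a cheap option is an arbitrary but deterministic representative spelling, for example the alphabetically first one:
create or replace view word_counts as
select id,
       s soundex,
       min(w) sample_word,  -- an arbitrary representative spelling for the group
       count(w) count_rows
from words
group by id, s;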