How to identify pattern in SQL

This is my table. It consists of columns A, B and C, and exactly one of them is true in any given row.
My task is to identify a pattern based on the latest five rows.
I need to search the entire table to find every place where these five values were repeated.
If they were repeated, I want to know what the next value after each occurrence of the pattern was, and how many times A, B and C each appeared after the pattern.
How can this be done in SQL? I am using Oracle 11g. Thanks.

You can convert each row's a, b, c values to a single ternary digit (0, 1 or 2), then treat the digits of a row and the four rows before it as a 5-digit ternary number. Rows with the same number end the same five-row pattern, so analytic functions can find the next occurrence and count the occurrences:
SELECT id,
       a,
       b,
       c,
       CASE
         WHEN grp_value IS NULL
         THEN NULL
         ELSE MIN(id) OVER (
                PARTITION BY grp_value
                ORDER BY id
                ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
              ) + 1
       END AS row_after_next_match,
       CASE
         WHEN grp_value IS NULL
         THEN 0
         ELSE COUNT(id) OVER ( PARTITION BY grp_value )
       END AS num_matches
FROM   (
  SELECT id,
         a,
         b,
         c,
         value,
         81 * LAG(value,4) OVER ( ORDER BY id ) +
         27 * LAG(value,3) OVER ( ORDER BY id ) +
          9 * LAG(value,2) OVER ( ORDER BY id ) +
          3 * LAG(value,1) OVER ( ORDER BY id ) +
          1 * value AS grp_value
  FROM   (
    SELECT id,
           a,
           b,
           c,
           DECODE(1,a,0,b,1,c,2) AS value
    FROM   table_name
  )
)
ORDER BY id
Which, for the sample data:
CREATE TABLE table_name (
  id PRIMARY KEY,
  a,
  b,
  c,
  CHECK (a IN (0,1)),
  CHECK (b IN (0,1)),
  CHECK (c IN (0,1)),
  CHECK (a+b+c = 1)
) AS
SELECT 1, 1, 0, 0 FROM DUAL UNION ALL
SELECT 2, 1, 0, 0 FROM DUAL UNION ALL
SELECT 3, 0, 1, 0 FROM DUAL UNION ALL
SELECT 4, 1, 0, 0 FROM DUAL UNION ALL
SELECT 5, 0, 1, 0 FROM DUAL UNION ALL
SELECT 6, 0, 0, 1 FROM DUAL UNION ALL
SELECT 7, 1, 0, 0 FROM DUAL UNION ALL
SELECT 8, 0, 1, 0 FROM DUAL UNION ALL
SELECT 9, 1, 0, 0 FROM DUAL UNION ALL
SELECT 10, 0, 1, 0 FROM DUAL UNION ALL
SELECT 11, 0, 0, 1 FROM DUAL UNION ALL
SELECT 12, 1, 0, 0 FROM DUAL UNION ALL
SELECT 13, 1, 0, 0 FROM DUAL UNION ALL
SELECT 14, 1, 0, 0 FROM DUAL UNION ALL
SELECT 15, 1, 0, 0 FROM DUAL UNION ALL
SELECT 16, 1, 0, 0 FROM DUAL UNION ALL
SELECT 17, 1, 0, 0 FROM DUAL UNION ALL
SELECT 18, 1, 0, 0 FROM DUAL UNION ALL
SELECT 19, 1, 0, 0 FROM DUAL UNION ALL
SELECT 20, 1, 0, 0 FROM DUAL
Outputs:
ID  A  B  C  ROW_AFTER_NEXT_MATCH  NUM_MATCHES
--  -  -  -  --------------------  -----------
 1  1  0  0  null                            0
 2  1  0  0  null                            0
 3  0  1  0  null                            0
 4  1  0  0  null                            0
 5  0  1  0  null                            1
 6  0  0  1  12                              2
 7  1  0  0  13                              2
 8  0  1  0  null                            1
 9  1  0  0  null                            1
10  0  1  0  null                            1
11  0  0  1  null                            2
12  1  0  0  null                            2
13  1  0  0  null                            1
14  1  0  0  null                            1
15  1  0  0  null                            1
16  1  0  0  18                              5
17  1  0  0  19                              5
18  1  0  0  20                              5
19  1  0  0  21                              5
20  1  0  0  null                            5
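The query above reports matches for every window but stops short of the per-letter tally the question asks for. As a cross-check, here is a minimal Python sketch of the same ternary-window idea that counts which letter follows each earlier occurrence of the latest five-row pattern. It is not part of the original answer; the names `window_codes`, `next_value_counts` and `sample` are made up for illustration, and the sample rows are assumed to be already reduced to digits (A=0, B=1, C=2).

```python
from collections import Counter

LETTERS = "ABC"  # digit 0 -> A, 1 -> B, 2 -> C, as in DECODE(1,a,0,b,1,c,2)


def window_codes(digits, width=5):
    """Base-3 value of each row's digit plus the width-1 digits before it.

    Rows that do not yet have a full window get None (like grp_value IS NULL).
    """
    codes = []
    for i in range(len(digits)):
        if i + 1 < width:
            codes.append(None)
            continue
        code = 0
        for d in digits[i - width + 1 : i + 1]:
            code = code * 3 + d  # oldest digit ends up most significant
        codes.append(code)
    return codes


def next_value_counts(digits, width=5):
    """Count which letter follows each earlier occurrence of the latest window."""
    codes = window_codes(digits, width)
    if not codes or codes[-1] is None:
        return Counter()
    target = codes[-1]
    counts = Counter()
    for i in range(len(digits) - 1):  # the last row has no successor
        if codes[i] == target:
            counts[LETTERS[digits[i + 1]]] += 1
    return counts


# The 20 sample rows, written as digits (A=0, B=1, C=2).
sample = [0, 0, 1, 0, 1, 2, 0, 1, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(next_value_counts(sample))
```

For the sample data the latest pattern is five A rows, which also ends at ids 16 to 19, and each of those occurrences is followed by another A.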

Related

Order by multiple columns in the SELECT query

How can I order the results of my SELECT query so that they look like this?
1, 1, 0
1, 2, 0
1, 3, 0
1, 1, 1
1, 2, 1
1, 3, 1
2, 1, 0
2, 2, 0
2, 1, 1
2, 2, 1
I tried this query but the result is not what I'm looking for:
select * from my_table order by col1, col2, col3
In which col1 represents the first number, col2 is the second one and col3 is the last number in the above example.
This query returns:
1, 1, 0
1, 1, 1
1, 2, 0
1, 2, 1
...
Thanks

Sort should be 1-3-2, I'd say. See line #15.
SQL> with test (c1, c2, c3) as
2 (select 2, 1, 0 from dual union all
3 select 1, 3, 1 from dual union all
4 select 1, 1, 1 from dual union all
5 select 1, 1, 0 from dual union all
6 select 1, 2, 0 from dual union all
7 select 2, 2, 0 from dual union all
8 select 2, 2, 1 from dual union all
9 select 2, 1, 1 from dual union all
10 select 1, 3, 0 from dual union all
11 select 1, 2, 1 from dual
12 )
13 select *
14 from test
15 order by c1, c3, c2;
C1 C2 C3
---------- ---------- ----------
1 1 0
1 2 0
1 3 0
1 1 1
1 2 1
1 3 1
2 1 0
2 2 0
2 1 1
2 2 1
10 rows selected.
SQL>
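The same 1-3-2 ordering can be checked outside the database. This is only an illustrative Python sketch using the rows from the test CTE above; the names `rows` and `ordered` are assumptions made for the example.

```python
# Rows from the test CTE, as (c1, c2, c3) tuples.
rows = [
    (2, 1, 0), (1, 3, 1), (1, 1, 1), (1, 1, 0), (1, 2, 0),
    (2, 2, 0), (2, 2, 1), (2, 1, 1), (1, 3, 0), (1, 2, 1),
]

# Sort by c1, then c3, then c2, mirroring ORDER BY c1, c3, c2.
ordered = sorted(rows, key=lambda r: (r[0], r[2], r[1]))

for row in ordered:
    print(row)
```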

Google Bigquery: Retain Previous Value of Column

I have 2 columns named claim_no and n_Proc_rank and am trying to apply the logic below. Please help.
Logic
a) if claim_no=proc_rank then linenum=1
b) if claim_no<>proc_rank then a+1
c) if claim_no=proc_rank then value of b
d) if claim_no<>proc_rank then c+1
I tried the LAG function with a CASE statement but did not get the desired results, and recursive queries are not supported by Google BigQuery.

Below is for BigQuery Standard SQL
#standardSQL
SELECT *, 1 + COUNTIF(claim_no != n_Proc_rank) OVER(ORDER BY ts) linenum
FROM `project.dataset.table`
Applied to the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 ts, 1 claim_no, 1 n_Proc_rank UNION ALL
SELECT 2, 0, 0 UNION ALL
SELECT 3, 0, 0 UNION ALL
SELECT 4, 1, 1 UNION ALL
SELECT 5, 0, 1 UNION ALL
SELECT 6, 0, 0 UNION ALL
SELECT 7, 0, 1 UNION ALL
SELECT 8, 0, 1 UNION ALL
SELECT 9, 0, 1 UNION ALL
SELECT 10, 0, 1 UNION ALL
SELECT 11, 0, 0 UNION ALL
SELECT 12, 0, 1 UNION ALL
SELECT 13, 0, 0 UNION ALL
SELECT 14, 0, 1 UNION ALL
SELECT 15, 0, 1 UNION ALL
SELECT 16, 0, 1 UNION ALL
SELECT 17, 0, 1
)
SELECT *, 1 + COUNTIF(claim_no != n_Proc_rank) OVER(ORDER BY ts) linenum
FROM `project.dataset.table`
-- ORDER BY ts
result is
Row ts claim_no n_Proc_rank linenum
1 1 1 1 1
2 2 0 0 1
3 3 0 0 1
4 4 1 1 1
5 5 0 1 2
6 6 0 0 2
7 7 0 1 3
8 8 0 1 4
9 9 0 1 5
10 10 0 1 6
11 11 0 0 6
12 12 0 1 7
13 13 0 0 7
14 14 0 1 8
15 15 0 1 9
16 16 0 1 10
17 17 0 1 11
Note: you must have some extra column that defines the order of processing; in my example I added the column ts. It can be anything: an integer position, a date/timestamp, etc.
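The window expression is a running count of mismatches plus one. A small Python sketch of that logic, assuming the rows are already sorted by ts, reproduces the linenum column shown above. The function name `line_numbers` and the `sample` list are illustrative only.

```python
def line_numbers(rows):
    """Return 1 + the running count of rows where claim_no != n_Proc_rank.

    rows must already be in processing order (the ts column in the answer).
    """
    linenum = 1
    result = []
    for claim_no, proc_rank in rows:
        if claim_no != proc_rank:
            linenum += 1  # a mismatch on the current row is counted immediately
        result.append(linenum)
    return result


# (claim_no, n_Proc_rank) pairs for ts = 1..17 from the sample data.
sample = [
    (1, 1), (0, 0), (0, 0), (1, 1), (0, 1), (0, 0), (0, 1), (0, 1), (0, 1),
    (0, 1), (0, 0), (0, 1), (0, 0), (0, 1), (0, 1), (0, 1), (0, 1),
]
print(line_numbers(sample))
```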

oracle - group by - no aggregation

I have an Oracle table with columns ref_id, data, ORN and flag, where flag is the type of data and ORN is the order of the data within each ref_id:
ref_id data ORN flag
1 100 0 0
1 200 1 0
1 300 2 0
1 400 3 0
1 110 0 1
1 210 1 1
1 150 0 2
1 250 1 2
1 350 2 2
1 450 3 2
2 500 0 0
2 600 1 0
2 700 2 0
2 800 3 0
2 120 0 1
2 220 1 1
2 320 1 1
2 420 1 1
2 170 0 2
2 270 1 2
2 370 2 2
2 470 3 2
I need to group the data so that, for each ref_id, I get the last data value for flag 0 and the last data value for flag 2.
The new table should look something like this:
ref_id data_1 data_2
1 400 450
2 800 470
Any hint on how to accomplish this without using loops?

You can use an analytic function and GROUP BY as follows:
SELECT REF_ID,
MAX(CASE WHEN FLAG = 0 THEN DATA END) AS DATA_0,
MAX(CASE WHEN FLAG = 2 THEN DATA END) AS DATA_2
FROM
(
SELECT REF_ID, DATA, ORN, FLAG,
ROW_NUMBER() OVER (PARTITION BY REF_ID, FLAG ORDER BY ORN DESC) AS RN
FROM YOUR_TABLE
WHERE FLAG IN (0,2)
)
WHERE RN = 1
GROUP BY REF_ID

Alternatively, use a two-step approach: first (in the CTE) select only the values of the DATA column that correspond to the last ORN within each REF_ID and FLAG.
Note that if ORN is not unique you may get more than one row, potentially with different values.
In the next step, simply aggregate on REF_ID. I'm using the MAX function, so this returns the highest value of DATA in case of ties.
If the combination of REF_ID, FLAG and ORN is unique you may use MIN and MAX interchangeably, but it is good to know that they will give different results if duplicates are allowed.
with agg as (
select
REF_ID,FLAG, DATA, ORN,
case when flag = 0 and ORN = max(ORN) over (partition by REF_ID, FLAG) then data end as data_0,
case when flag = 2 and ORN = max(ORN) over (partition by REF_ID, FLAG) then data end as data_2
from tab
)
select REF_ID,
max(data_0) as data_0,
max(data_2) as data_2
from agg
group by REF_ID
order by 1;
Here is the result of the CTE:
REF_ID FLAG DATA ORN DATA_0 DATA_2
---------- ---------- ---------- ---------- ---------- ----------
1 0 100 0
1 0 200 1
1 0 300 2
1 0 400 3 400
1 1 110 0
1 1 210 1
1 2 150 0
1 2 250 1
1 2 350 2
1 2 450 3 450
...
and here is the result of the final query:
REF_ID DATA_0 DATA_2
---------- ---------- ----------
1 400 450
2 800 470
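The remark about MIN and MAX differing on ties can be seen with flag 1 of ref_id 2 in the question's data, where three rows share ORN = 1. The following Python sketch is only an illustration of that behaviour, not the answer's SQL; `last_data` and its `pick` parameter are hypothetical names.

```python
def last_data(rows, flag, pick=max):
    """For each ref_id, take the data at the highest ORN for the given flag.

    rows are (ref_id, data, orn, flag) tuples. When several rows share the
    highest ORN, pick (max or min) decides which data value is returned.
    """
    best = {}  # ref_id -> (highest orn seen, data values at that orn)
    for ref_id, data, orn, row_flag in rows:
        if row_flag != flag:
            continue
        top_orn, datas = best.get(ref_id, (None, []))
        if top_orn is None or orn > top_orn:
            best[ref_id] = (orn, [data])
        elif orn == top_orn:
            datas.append(data)  # tie on ORN: remember every candidate
    return {ref_id: pick(datas) for ref_id, (_, datas) in best.items()}


# Sample rows from the question: (ref_id, data, ORN, flag).
sample = [
    (1, 100, 0, 0), (1, 200, 1, 0), (1, 300, 2, 0), (1, 400, 3, 0),
    (1, 110, 0, 1), (1, 210, 1, 1),
    (1, 150, 0, 2), (1, 250, 1, 2), (1, 350, 2, 2), (1, 450, 3, 2),
    (2, 500, 0, 0), (2, 600, 1, 0), (2, 700, 2, 0), (2, 800, 3, 0),
    (2, 120, 0, 1), (2, 220, 1, 1), (2, 320, 1, 1), (2, 420, 1, 1),
    (2, 170, 0, 2), (2, 270, 1, 2), (2, 370, 2, 2), (2, 470, 3, 2),
]

print(last_data(sample, 0))            # no ties for flag 0
print(last_data(sample, 1, pick=max))  # ties on ORN = 1 for ref_id 2
print(last_data(sample, 1, pick=min))
```

For flags 0 and 2 the choice of `pick` makes no difference because ORN is unique there; for flag 1 the two calls disagree for ref_id 2.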

You can use the FIRST/LAST aggregate functions for this purpose:
https://docs.oracle.com/database/121/SQLRF/functions074.htm#SQLRF00641
https://docs.oracle.com/database/121/SQLRF/functions095.htm#SQLRF00653
with t (ref_id,data,ORN,flag) as (
select 1, 100, 0, 0 from dual union all
select 1, 200, 1, 0 from dual union all
select 1, 300, 2, 0 from dual union all
select 1, 400, 3, 0 from dual union all
select 1, 110, 0, 1 from dual union all
select 1, 210, 1, 1 from dual union all
select 1, 150, 0, 2 from dual union all
select 1, 250, 1, 2 from dual union all
select 1, 350, 2, 2 from dual union all
select 1, 450, 3, 2 from dual union all
select 2, 500, 0, 0 from dual union all
select 2, 600, 1, 0 from dual union all
select 2, 700, 2, 0 from dual union all
select 2, 800, 3, 0 from dual union all
select 2, 120, 0, 1 from dual union all
select 2, 220, 1, 1 from dual union all
select 2, 320, 1, 1 from dual union all
select 2, 420, 1, 1 from dual union all
select 2, 170, 0, 2 from dual union all
select 2, 270, 1, 2 from dual union all
select 2, 370, 2, 2 from dual union all
select 2, 470, 3, 2 from dual
)
select
ref_id
, max(decode(flag, 0, data)) keep (dense_rank last order by decode(flag, 0, 100, 50), orn ) x
, max(decode(flag, 2, data)) keep (dense_rank last order by decode(flag, 2, 100, 50), orn ) y
-- or
, min(decode(flag, 0, data)) keep (dense_rank first order by decode(flag, 0, 50, 100), orn desc) xx
, min(decode(flag, 2, data)) keep (dense_rank first order by decode(flag, 2, 50, 100), orn desc) yy
from t
group by ref_id
REF_ID X Y XX YY
---------- ---------- ---------- ---------- ----------
1 400 450 400 450
2 800 470 800 470

BigQuery SQL : Rolling count distinct bounded between two conditions

I am trying to find the rolling distinct count of ip_var bounded between two events (in two different columns) in BigQuery SQL.
For example, I have a table of the form:
id TIME_STAMP event_1 event_2 ip_var
A 1 0 0 1
A 2 1 0 1
A 2 0 0 2
A 3 0 0 2
A 4 0 0 3
A 5 0 1 4
A 6 0 0 1
A 7 0 0 1
B 1 0 0 2
B 2 0 0 2
B 2 1 0 3
B 3 0 0 3
B 4 0 0 3
B 4 0 1 4
B 6 0 0 5
B 7 0 0 6
For each id, I need the distinct count of ip_var from when event_1 happens until event_2 happens. It is always guaranteed that event_2 happens after event_1.
I have tried using a rolling count for the problem without much success.
The final output looks like:
id bounded_count
A 2
B 1

Below is for BigQuery Standard SQL
#standardSQL
SELECT id, COUNT(DISTINCT ip_var) bounded_count
FROM (
SELECT *,
COUNTIF(event_1 = 1) OVER(win ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) grp,
COUNTIF(event_1 = 1) OVER(win ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) != COUNTIF(event_2 = 1) OVER(win) qualify
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY id ORDER BY time_stamp)
)
WHERE qualify
GROUP BY id, grp
Applied to the sample data from your question, the result is:
Row id bounded_count
1 A 2
2 B 1
Note: the above solution also works if you have multiple qualifying pairs, as in the example below (same code, I just added more rows to the sample data):
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'A' id, 1 time_stamp, 0 event_1, 0 event_2, 1 ip_var UNION ALL
SELECT 'A', 2, 1, 0, 1 UNION ALL
SELECT 'A', 2, 0, 0, 2 UNION ALL
SELECT 'A', 3, 0, 0, 2 UNION ALL
SELECT 'A', 4, 0, 0, 3 UNION ALL
SELECT 'A', 5, 0, 1, 4 UNION ALL
SELECT 'A', 6, 0, 0, 1 UNION ALL
SELECT 'A', 7, 0, 0, 1 UNION ALL
SELECT 'A', 12, 1, 0, 1 UNION ALL
SELECT 'A', 13, 0, 0, 2 UNION ALL
SELECT 'A', 14, 0, 0, 3 UNION ALL
SELECT 'A', 15, 0, 0, 4 UNION ALL
SELECT 'A', 16, 0, 0, 5 UNION ALL
SELECT 'A', 17, 0, 1, 1 UNION ALL
SELECT 'A', 18, 0, 0, 1 UNION ALL
SELECT 'B', 1, 0, 0, 2 UNION ALL
SELECT 'B', 2, 0, 0, 2 UNION ALL
SELECT 'B', 2, 1, 0, 3 UNION ALL
SELECT 'B', 3, 0, 0, 3 UNION ALL
SELECT 'B', 4, 0, 0, 3 UNION ALL
SELECT 'B', 5, 0, 1, 4 UNION ALL
SELECT 'B', 6, 0, 0, 5 UNION ALL
SELECT 'B', 7, 0, 0, 6
)
SELECT id, COUNT(DISTINCT ip_var) bounded_count, grp
FROM (
SELECT *,
COUNTIF(event_1 = 1) OVER(win ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) grp,
COUNTIF(event_1 = 1) OVER(win ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) != COUNTIF(event_2 = 1) OVER(win) qualify
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY id ORDER BY time_stamp)
)
WHERE qualify
GROUP BY id, grp
with result
Row id bounded_count grp
1 A 2 1
2 A 4 2
3 B 1 1

Hmmm . . . You can use window functions to calculate the timestamps for each event. The rest is just filtering and aggregation:
WITH t as (
SELECT "A" as id, 1 as time_stamp, 0 as event_1, 0 as event_2, 1 as ip_var UNION ALL
SELECT "A", 2, 1, 0, 1 UNION ALL
SELECT "A", 2, 0, 0, 2 UNION ALL
SELECT "A", 3, 0, 0, 2 UNION ALL
SELECT "A", 4, 0, 0, 3 UNION ALL
SELECT "A", 5, 0, 1, 4 UNION ALL
SELECT "A", 6, 0, 0, 1 UNION ALL
SELECT "A", 7, 0, 0, 1 UNION ALL
SELECT "B", 1, 0, 0, 2 UNION ALL
SELECT "B", 2, 0, 0, 2 UNION ALL
SELECT "B", 2, 1, 0, 3 UNION ALL
SELECT "B", 3, 0, 0, 3 UNION ALL
SELECT "B", 4, 0, 0, 3 UNION ALL
SELECT "B", 4, 0, 1, 4 UNION ALL
SELECT "B", 6, 0, 0, 5 UNION ALL
SELECT "B", 7, 0, 0, 6
)
select id, count(distinct ip_var) as bounded_count
from (select t.*,
min(case when event_1 = 1 then time_stamp end) over (partition by id) as timestamp_1,
max(case when event_2 = 1 then time_stamp end) over (partition by id) as timestamp_2
from t
) t
where time_stamp > timestamp_1 and time_stamp < timestamp_2
group by id

One way to do it is:
1. Find start_time and end_time for each ID.
2. For each ID, filter out events that are not in the counting window.
3. Count distinct ip_var.
To print the intermediate step, I used temp tables to demonstrate the idea. For efficiency, you should turn the second temp table, id_start_end, into a WITH clause.
CREATE TEMP TABLE t as
SELECT "A" id, 1 time_stamp, 0 event_1, 0 event_2, 1 ip_var UNION ALL
SELECT "A", 2, 1, 0, 1 UNION ALL
SELECT "A", 2, 0, 0, 2 UNION ALL
SELECT "A", 3, 0, 0, 2 UNION ALL
SELECT "A", 4, 0, 0, 3 UNION ALL
SELECT "A", 5, 0, 1, 4 UNION ALL
SELECT "A", 6, 0, 0, 1 UNION ALL
SELECT "A", 7, 0, 0, 1 UNION ALL
SELECT "B", 1, 0, 0, 2 UNION ALL
SELECT "B", 2, 0, 0, 2 UNION ALL
SELECT "B", 2, 1, 0, 3 UNION ALL
SELECT "B", 3, 0, 0, 3 UNION ALL
SELECT "B", 4, 0, 0, 3 UNION ALL
SELECT "B", 4, 0, 1, 4 UNION ALL
SELECT "B", 6, 0, 0, 5 UNION ALL
SELECT "B", 7, 0, 0, 6;
CREATE TEMP TABLE id_start_end AS
SELECT ids.id, t_start.time_stamp as start_time, t_end.time_stamp as end_time FROM
(SELECT DISTINCT id FROM t) ids
JOIN t AS t_start ON ids.id = t_start.id AND t_start.event_1 = 1
JOIN t AS t_end ON ids.id = t_end.id AND t_end.event_2 = 1;
SELECT * FROM id_start_end;
SELECT t.id, COUNT(DISTINCT ip_var)
FROM t JOIN id_start_end
ON t.id = id_start_end.id
AND t.time_stamp < id_start_end.end_time
AND t.time_stamp > id_start_end.start_time
GROUP BY t.id
Output table id_start_end:
+----+------------+----------+
| id | start_time | end_time |
+----+------------+----------+
| A | 2 | 5 |
| B | 2 | 4 |
+----+------------+----------+
Final output:
+----+-----+
| id | f0_ |
+----+-----+
| B | 1 |
| A | 2 |
+----+-----+
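The three steps (find the start and end time per id, keep only rows strictly between them, count distinct ip_var) can also be sketched in plain Python for checking the expected numbers. This is an illustration under the same assumption as the answers, namely one event_1/event_2 pair per id; `bounded_distinct` and `sample` are made-up names.

```python
def bounded_distinct(rows):
    """Count distinct ip_var strictly between event_1 and event_2 per id.

    rows are (id, time_stamp, event_1, event_2, ip_var) tuples.
    """
    start, end = {}, {}
    for id_, ts, event_1, event_2, _ in rows:
        if event_1 == 1:
            start[id_] = min(ts, start.get(id_, ts))  # earliest event_1
        if event_2 == 1:
            end[id_] = max(ts, end.get(id_, ts))      # latest event_2
    seen = {}
    for id_, ts, _, _, ip_var in rows:
        if id_ in start and id_ in end and start[id_] < ts < end[id_]:
            seen.setdefault(id_, set()).add(ip_var)
    return {id_: len(values) for id_, values in seen.items()}


# Sample rows from the question.
sample = [
    ("A", 1, 0, 0, 1), ("A", 2, 1, 0, 1), ("A", 2, 0, 0, 2), ("A", 3, 0, 0, 2),
    ("A", 4, 0, 0, 3), ("A", 5, 0, 1, 4), ("A", 6, 0, 0, 1), ("A", 7, 0, 0, 1),
    ("B", 1, 0, 0, 2), ("B", 2, 0, 0, 2), ("B", 2, 1, 0, 3), ("B", 3, 0, 0, 3),
    ("B", 4, 0, 0, 3), ("B", 4, 0, 1, 4), ("B", 6, 0, 0, 5), ("B", 7, 0, 0, 6),
]
print(bounded_distinct(sample))
```

Because the comparison is strict on both sides, rows that share a timestamp with either event are excluded, which is what the `<` and `>` conditions in the SQL do as well.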

Oracle PL/SQL: How to find duplicate sequences in large table?

I have a ~20000 row table like this (seq = sequence):
id seq_num seq_count seq_id a b c d
----------------------------------------------------
1 1 3 A400 1 0 0 0
2 2 3 A400 0 1 0 0
3 3 3 A400 0 0 1 0
4 1 2 V2303 1 1 1 1
5 2 2 V2303 1 1 1 1
6 1 3 G2 1 0 0 0
7 2 3 G2 0 1 0 0
8 3 3 G2 0 0 1 0
9 1 3 U900 1 0 0 0
10 2 3 U900 2 2 1 1
11 3 3 U900 5 3 8 5
I want to find the seq_id of a-b-c-d sequences that have duplicates in the table; the output could just be a dbms_output.put_line or anything. So as you can see, seq_id G2 is a duplicate of A400 because all of their rows match up, but U900 has no duplicates even though one row matches A400 and G2.
Is there a good way to check for duplicates like this on large sets of data? I cannot create new tables to temporarily hold data. So far I've been trying with cursors mostly but no luck.
Thank you, let me know if you need any more info about my problem.

Oracle Setup:
CREATE TABLE table_name ( id, seq_num, seq_count, seq_id, a, b, c, d ) AS
SELECT 1, 1, 3, 'A400', 1, 0, 0, 0 FROM DUAL UNION ALL
SELECT 2, 2, 3, 'A400', 0, 1, 0, 0 FROM DUAL UNION ALL
SELECT 3, 3, 3, 'A400', 0, 0, 1, 0 FROM DUAL UNION ALL
SELECT 4, 1, 2, 'V2303', 1, 1, 1, 1 FROM DUAL UNION ALL
SELECT 5, 2, 2, 'V2303', 1, 1, 1, 1 FROM DUAL UNION ALL
SELECT 6, 1, 3, 'G2', 1, 0, 0, 0 FROM DUAL UNION ALL
SELECT 7, 2, 3, 'G2', 0, 1, 0, 0 FROM DUAL UNION ALL
SELECT 8, 3, 3, 'G2', 0, 0, 1, 0 FROM DUAL UNION ALL
SELECT 9, 1, 3, 'U900', 1, 0, 0, 0 FROM DUAL UNION ALL
SELECT 10, 2, 3, 'U900', 2, 2, 1, 1 FROM DUAL UNION ALL
SELECT 11, 3, 3, 'U900', 5, 3, 8, 5 FROM DUAL;
Query:
SELECT s.seq_id,
t.seq_id AS matched_seq_id
FROM table_name s
INNER JOIN
table_name t
ON ( s.seq_num = t.seq_num
AND s.seq_count = t.seq_count
AND s.seq_id < t.seq_id
AND s.a = t.a
AND s.b = t.b
AND s.c = t.c
AND s.d = t.d )
GROUP BY
t.seq_id,
s.seq_id
HAVING COUNT( DISTINCT t.seq_num ) = MAX( t.seq_count );
Results:
SEQ_ID MATCHED_SEQ_ID
------ --------------
A400 G2

Assuming results fit in a string about 2000 characters long, the fastest way is probably to use listagg():
select abcds, listagg(seq_id, ',') within group (order by seq_id)
from (select seq_id, listagg(a||b||c||d, ',') within group (order by seq_num) as abcds
from table_name
group by seq_id
) t
group by abcds
having count(*) >= 2;
This returns the matches as a comma-delimited list.
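The grouping idea behind the listagg() query can be sketched in Python by building one signature per seq_id and grouping equal signatures. This is an illustrative sketch rather than a translation of either answer; `duplicate_sequences` is a hypothetical name. It uses tuples for the signature, which sidesteps a possible ambiguity of plain string concatenation when values have more than one digit (for example 1 and 12 versus 11 and 2).

```python
from collections import defaultdict


def duplicate_sequences(rows):
    """Group seq_ids whose ordered (a, b, c, d) rows are identical.

    rows are (id, seq_num, seq_count, seq_id, a, b, c, d) tuples.
    Returns a sorted list of groups that contain at least two seq_ids.
    """
    per_seq = defaultdict(list)
    for _id, seq_num, _seq_count, seq_id, a, b, c, d in rows:
        per_seq[seq_id].append((seq_num, (a, b, c, d)))

    by_signature = defaultdict(list)
    for seq_id, items in per_seq.items():
        # Order rows by seq_num, then keep only the value tuples.
        signature = tuple(values for _, values in sorted(items))
        by_signature[signature].append(seq_id)

    return sorted(sorted(ids) for ids in by_signature.values() if len(ids) >= 2)


# Sample rows from the question.
sample = [
    (1, 1, 3, "A400", 1, 0, 0, 0), (2, 2, 3, "A400", 0, 1, 0, 0),
    (3, 3, 3, "A400", 0, 0, 1, 0), (4, 1, 2, "V2303", 1, 1, 1, 1),
    (5, 2, 2, "V2303", 1, 1, 1, 1), (6, 1, 3, "G2", 1, 0, 0, 0),
    (7, 2, 3, "G2", 0, 1, 0, 0), (8, 3, 3, "G2", 0, 0, 1, 0),
    (9, 1, 3, "U900", 1, 0, 0, 0), (10, 2, 3, "U900", 2, 2, 1, 1),
    (11, 3, 3, "U900", 5, 3, 8, 5),
]
print(duplicate_sequences(sample))
```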

