create window group based on value of preceding row - sql

I have a table like so:
#standardSQL
WITH k AS (
SELECT 1 id, 1 subgrp, 'stuff1' content UNION ALL
SELECT 2, 2, 'stuff2' UNION ALL
SELECT 3, 3, 'stuff3' UNION ALL
SELECT 4, 4, 'stuff4' UNION ALL
SELECT 5, 1, 'ostuff1' UNION ALL
SELECT 6, 2, 'ostuff2' UNION ALL
SELECT 7, 3, 'ostuff3' UNION ALL
SELECT 8, 4, 'ostuff4'
)
and would like to group rows based on the subgrp value to re-create the missing grp: if a row's subgrp value is smaller than the previous row's, it starts a new group.
Intermediate result would be:
| id | grp | subgrp | content |
|----|-----|--------|---------|
| 1 | 1 | 1 | stuff1 |
| 2 | 1 | 2 | stuff2 |
| 3 | 1 | 3 | stuff3 |
| 4 | 1 | 4 | stuff4 |
| 5 | 2 | 1 | ostuff1 |
| 6 | 2 | 2 | ostuff2 |
| 7 | 2 | 3 | ostuff3 |
| 8 | 2 | 4 | ostuff4 |
on which I can then apply
SELECT grp, ARRAY_AGG(STRUCT(subgrp, content) ORDER BY subgrp) rcd
FROM k
GROUP BY grp
ORDER BY grp
to get a nice nested structure.
Notes:
with 'id' ordered, subgrp is always in sequence so no 3 before 2
groups do not always have 4 subgrps; this is just an illustration, so the group size cannot be hardcoded
Problem: how can I (re)create the grp column here? I played with several window functions to no avail.
EDIT
Although Gordon's answer works, it took 3 min over 104M records to run, and I had to remove an ORDER BY on the final result set because of: Resources exceeded during execution: The query could not be executed in the allotted memory. ORDER BY operator used too much memory.
Does anyone have an alternative solution for a large dataset?

A simple way to assign the group is to do a cumulative count of the subgrp = 1 values:
select k.*,
sum(case when subgrp = 1 then 1 else 0 end) over (order by id) as grp
from k;
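A minimal runnable sketch of the cumulative-count idea, written for SQLite 3.25 or later (an assumption: the question is BigQuery, and SQLite is used here only because it supports the same window functions). The view name k_grouped is hypothetical. Each row with subgrp = 1 adds one to a running total ordered by id, and that running total is the group number:

```sql
-- Sample data from the question, as a plain table instead of a WITH clause
CREATE TABLE k (id INTEGER, subgrp INTEGER, content TEXT);
INSERT INTO k (id, subgrp, content) VALUES
  (1, 1, 'stuff1'), (2, 2, 'stuff2'), (3, 3, 'stuff3'), (4, 4, 'stuff4'),
  (5, 1, 'ostuff1'), (6, 2, 'ostuff2'), (7, 3, 'ostuff3'), (8, 4, 'ostuff4');

-- Each subgrp = 1 row adds 1 to the running total, so the total is the group number
CREATE VIEW k_grouped AS
SELECT k.*,
       SUM(CASE WHEN subgrp = 1 THEN 1 ELSE 0 END) OVER (ORDER BY id) AS grp
FROM k;

SELECT * FROM k_grouped ORDER BY id;
```

This relies on every group starting at subgrp = 1; if a group could start at another value, comparing each row with the previous one via LAG() is the safer test.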
You can also do it your way, using lag() and a cumulative sum. That requires a subquery:
select k.*,
sum(case when subgrp > prev_subgrp then 0 else 1 end) over (order by id) as grp
from (select k.*,
lag(subgrp) over (order by id) as prev_subgrp
from k
) k
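The same grouping with LAG(), sketched under the same assumptions (SQLite 3.25+, hypothetical view name k_lag_grouped). A row opens a new group when its subgrp is not greater than the previous row's; on the first row LAG() returns NULL, the comparison is not true, and the CASE falls through to 1:

```sql
CREATE TABLE k (id INTEGER, subgrp INTEGER, content TEXT);
INSERT INTO k (id, subgrp, content) VALUES
  (1, 1, 'stuff1'), (2, 2, 'stuff2'), (3, 3, 'stuff3'), (4, 4, 'stuff4'),
  (5, 1, 'ostuff1'), (6, 2, 'ostuff2'), (7, 3, 'ostuff3'), (8, 4, 'ostuff4');

CREATE VIEW k_lag_grouped AS
SELECT id, subgrp, content,
       -- 0 while subgrp keeps increasing, 1 when it drops (or on the first row)
       SUM(CASE WHEN subgrp > prev_subgrp THEN 0 ELSE 1 END) OVER (ORDER BY id) AS grp
FROM (SELECT k.*,
             LAG(subgrp) OVER (ORDER BY id) AS prev_subgrp
      FROM k) AS t;

SELECT * FROM k_lag_grouped ORDER BY id;
```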

The query below can potentially perform better, but it has a limitation: I assume there are no gaps in the numbering within subgroups or in the respective ids.
#standardSQL
WITH k AS (
SELECT 1 id, 1 subgrp, 'stuff1' content UNION ALL
SELECT 2, 2, 'stuff2' UNION ALL
SELECT 3, 3, 'stuff3' UNION ALL
SELECT 4, 4, 'stuff4' UNION ALL
SELECT 5, 1, 'ostuff1' UNION ALL
SELECT 6, 2, 'ostuff2' UNION ALL
SELECT 7, 3, 'ostuff3' UNION ALL
SELECT 8, 4, 'ostuff4'
)
SELECT
ROW_NUMBER() OVER(ORDER BY id) grp,
rcd
FROM (
SELECT
MIN(id) id,
ARRAY_AGG(STRUCT(subgrp, content)) rcd
FROM k
GROUP BY id - subgrp
)
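The result is:
| Row | grp | rcd.subgrp | rcd.content |
|-----|-----|------------|-------------|
| 1 | 1 | 1 | stuff1 |
| | | 2 | stuff2 |
| | | 3 | stuff3 |
| | | 4 | stuff4 |
| 2 | 2 | 1 | ostuff1 |
| | | 2 | ostuff2 |
| | | 3 | ostuff3 |
| | | 4 | ostuff4 |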
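Why GROUP BY id - subgrp works: when neither column has gaps, id and subgrp both grow by 1 from row to row inside a group, so their difference stays constant within a group (0 for the first group and 4 for the second in the sample) and changes when a new group starts. A sketch of the same grouping in SQLite 3.25+ (an assumption), reporting each group's first id and row count instead of BigQuery's nested ARRAY_AGG(STRUCT(...)); the view name k_islands is hypothetical:

```sql
CREATE TABLE k (id INTEGER, subgrp INTEGER, content TEXT);
INSERT INTO k (id, subgrp, content) VALUES
  (1, 1, 'stuff1'), (2, 2, 'stuff2'), (3, 3, 'stuff3'), (4, 4, 'stuff4'),
  (5, 1, 'ostuff1'), (6, 2, 'ostuff2'), (7, 3, 'ostuff3'), (8, 4, 'ostuff4');

CREATE VIEW k_islands AS
SELECT ROW_NUMBER() OVER (ORDER BY first_id) AS grp,  -- number groups by first id
       first_id,
       n_rows
FROM (SELECT id - subgrp AS island,   -- constant inside a gap-free group
             MIN(id)  AS first_id,
             COUNT(*) AS n_rows
      FROM k
      GROUP BY id - subgrp) AS t;

SELECT * FROM k_islands ORDER BY grp;
```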

Related

Counting Rows under a Specific Header Row

I am trying to count the number of rows under specific "header rows" - for example, I have a table that looks like this:
Row # | Description | Repair_Code | Data Type
1 | FRONT LAMP | (null) | Header
2 | left head lamp | 1235 | Database
3 | right head lamp | 1236 | Database
4 | ROOF | (null) | Header
5 | headliner | 1567 | Database
6 | WHEELS | (null) | Header
7 | right wheel | 1145 | Database
Rows 1, 4 and 6 are header rows (categories) and the others are descriptors under each of those categories. The Data Type column denotes if the row is a header or not.
I want to be able to count the number of rows under the header rows to return something that looks like:
Header | Occurrences
FRONT LAMP | 2
ROOF | 1
WHEELS | 1
Thank you for the help!
The data model looks wrong. If this is some kind of hierarchy, the table should have another column that represents a "parent row#".
As it is now, it is questionable whether you can do what you want at all. The only thing you can rely on is row#, which is sequential in your example. If it is not sequential in your real data, you have a problem.
So: if you apply the LEAD analytic function to the header rows, you could do something like this (sample data in lines #1 - 7; the query that might help begins at line #8):
SQL> with test (rn, description, code) as
2 (select 1, 'front lamp' , null from dual union all
3 select 2, 'left head lamp' , 1235 from dual union all
4 select 3, 'right head lamp', 1236 from dual union all
5 select 4, 'roof' , null from dual union all
6 select 5, 'headliner' , 1567 from dual
7 ),
8 hdr as
9 -- header rows
10 (select rn,
11 description,
12 lead(rn) over (order by rn) next_rn
13 from test
14 where code is null
15 )
16 select h.description,
17 count(*)
18 from hdr h join test t on t.rn > h.rn
19 and (t.rn < h.next_rn or h.next_rn is null)
20 group by h.description;
DESCRIPTION COUNT(*)
--------------- ----------
front lamp 2
roof 1
SQL>
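A portable sketch of the LEAD() range idea, assuming SQLite 3.25+ and hypothetical names (table parts, columns rn and repair_code in place of Row # and Repair_Code, view header_counts). Each header row learns where the next header starts, and the rows strictly between the two are counted:

```sql
CREATE TABLE parts (rn INTEGER, description TEXT, repair_code INTEGER);
INSERT INTO parts (rn, description, repair_code) VALUES
  (1, 'FRONT LAMP', NULL), (2, 'left head lamp', 1235), (3, 'right head lamp', 1236),
  (4, 'ROOF', NULL), (5, 'headliner', 1567),
  (6, 'WHEELS', NULL), (7, 'right wheel', 1145);

CREATE VIEW header_counts AS
WITH hdr AS (
  -- header rows plus the rn of the next header (NULL for the last header)
  SELECT rn, description,
         LEAD(rn) OVER (ORDER BY rn) AS next_rn
  FROM parts
  WHERE repair_code IS NULL
)
SELECT h.description, COUNT(*) AS occurrences
FROM hdr h
JOIN parts t ON t.rn > h.rn
            AND (t.rn < h.next_rn OR h.next_rn IS NULL)
GROUP BY h.rn, h.description;

SELECT * FROM header_counts;
```

As in the Oracle version, the inner join drops a header that has no rows under it.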
If the data model were different (note the parent_rn column), you wouldn't depend on sequential row# values:
SQL> with test (rn, description, code, parent_rn) as
2 (select 0, 'items' , null, null from dual union all
3 select 1, 'front lamp' , null, 0 from dual union all
4 select 2, 'left head lamp' , 1235, 1 from dual union all
5 select 3, 'right head lamp', 1236, 1 from dual union all
6 select 4, 'roof' , null, 0 from dual union all
7 select 5, 'headliner' , 1567, 4 from dual
8 ),
9 calc as
10 (select parent_rn,
11 sum(case when code is null then 0 else 1 end) cnt
12 from test
13 connect by prior rn = parent_rn
14 start with parent_rn is null
15 group by parent_rn
16 )
17 select t.description,
18 c.cnt
19 from test t join calc c on c.parent_rn = t.rn
20 where nvl(c.parent_rn, 0) <> 0;
DESCRIPTION CNT
--------------- ----------
front lamp 2
roof 1
SQL>
I would approach this using window functions. Assign a group to each header by doing a cumulative count of the NULL values of repair_code. Then aggregate:
select max(case when repair_code is null then description end) as description,
count(repair_code) as cnt
from (select t.*,
sum(case when repair_code is null then 1 else 0 end) over (order by row#) as grp
from t
) t
group by grp
order by min(row#);
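The cumulative-count approach, sketched for SQLite 3.25+ with the same hypothetical parts table (columns rn, description, repair_code) and a hypothetical view name header_counts_grp. Every header row (NULL repair_code) bumps a running count, so all rows up to the next header share one grp value:

```sql
CREATE TABLE parts (rn INTEGER, description TEXT, repair_code INTEGER);
INSERT INTO parts (rn, description, repair_code) VALUES
  (1, 'FRONT LAMP', NULL), (2, 'left head lamp', 1235), (3, 'right head lamp', 1236),
  (4, 'ROOF', NULL), (5, 'headliner', 1567),
  (6, 'WHEELS', NULL), (7, 'right wheel', 1145);

CREATE VIEW header_counts_grp AS
SELECT MAX(CASE WHEN repair_code IS NULL THEN description END) AS description,
       COUNT(repair_code) AS cnt,   -- counts only the non-header rows
       MIN(rn) AS first_rn          -- kept so callers can sort groups
FROM (SELECT p.*,
             -- each header row bumps the running count, starting a new grp
             SUM(CASE WHEN repair_code IS NULL THEN 1 ELSE 0 END) OVER (ORDER BY rn) AS grp
      FROM parts p) AS t
GROUP BY grp;

SELECT description, cnt FROM header_counts_grp ORDER BY first_rn;
```

Unlike the join-based version, a header with nothing under it still appears here with cnt = 0, because COUNT(repair_code) ignores the header row's NULL.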

create sequence of numbers on grouped column in Oracle

Consider the table below, with columns a, b, c.
a b c
3 4 5
3 4 5
6 4 1
1 1 8
1 1 8
1 1 0
1 1 0
I need a SELECT statement that produces the output below, i.e. one that increments column 'rn' for each group of columns a, b, c.
a b c rn
3 4 5 1
3 4 5 1
6 4 1 2
1 1 8 3
1 1 8 3
1 1 0 4
1 1 0 4
You can use the DENSE_RANK analytic function to get a unique ID for each combination of A, B, and C. Just note that if a new value is inserted into the table, the IDs of each combination of A, B, and C will shift and may not be the same.
Query
WITH
my_table (a, b, c)
AS
(SELECT 3, 4, 5 FROM DUAL
UNION ALL
SELECT 3, 4, 5 FROM DUAL
UNION ALL
SELECT 6, 4, 1 FROM DUAL
UNION ALL
SELECT 1, 1, 8 FROM DUAL
UNION ALL
SELECT 1, 1, 8 FROM DUAL
UNION ALL
SELECT 1, 1, 0 FROM DUAL
UNION ALL
SELECT 1, 1, 0 FROM DUAL)
SELECT t.*, DENSE_RANK () OVER (ORDER BY b desc, c desc, a) as rn
FROM my_table t;
Result
A B C RN
____ ____ ____ _____
3 4 5 1
3 4 5 1
6 4 1 2
1 1 8 3
1 1 8 3
1 1 0 4
1 1 0 4
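A sketch of the DENSE_RANK() approach in SQLite 3.25+ (an assumption; the view name ranked is hypothetical). Identical (a, b, c) combinations get the same rank, and the numbering follows the window's ORDER BY rather than the physical row order; ORDER BY b DESC, c DESC, a happens to reproduce the expected rn for this sample:

```sql
CREATE TABLE my_table (a INTEGER, b INTEGER, c INTEGER);
INSERT INTO my_table (a, b, c) VALUES
  (3, 4, 5), (3, 4, 5), (6, 4, 1), (1, 1, 8), (1, 1, 8), (1, 1, 0), (1, 1, 0);

CREATE VIEW ranked AS
SELECT a, b, c,
       -- equal (a, b, c) combinations share a rank; ranks have no gaps
       DENSE_RANK() OVER (ORDER BY b DESC, c DESC, a) AS rn
FROM my_table;

SELECT * FROM ranked ORDER BY rn;
```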
As a start: for your question to make sense at all, you need a column that defines the ordering of the rows. Let me assume that you have such a column, called id.
Then, you can use window functions:
select a, b, c,
sum(case when a = lag_a and b = lag_b and c = lag_c then 0 else 1 end) over(order by id) rn
from (
select t.*,
lag(a) over(order by id) lag_a,
lag(b) over(order by id) lag_b,
lag(c) over(order by id) lag_c
from mytable t
) t
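A runnable sketch of the LAG() approach, assuming SQLite 3.25+ and the ordering column id that the answer presupposes (the id values and the view name numbered are illustrative). Unlike DENSE_RANK(), a combination that reappeared later, after a different one, would get a new number:

```sql
CREATE TABLE mytable (id INTEGER, a INTEGER, b INTEGER, c INTEGER);
INSERT INTO mytable (id, a, b, c) VALUES
  (1, 3, 4, 5), (2, 3, 4, 5), (3, 6, 4, 1), (4, 1, 1, 8),
  (5, 1, 1, 8), (6, 1, 1, 0), (7, 1, 1, 0);

CREATE VIEW numbered AS
SELECT id, a, b, c,
       -- add 1 whenever any of a, b, c differs from the previous row;
       -- a NULL in a, b or c makes the equality unknown, so that row starts a new number
       SUM(CASE WHEN a = lag_a AND b = lag_b AND c = lag_c THEN 0 ELSE 1 END)
         OVER (ORDER BY id) AS rn
FROM (SELECT t.*,
             LAG(a) OVER (ORDER BY id) AS lag_a,
             LAG(b) OVER (ORDER BY id) AS lag_b,
             LAG(c) OVER (ORDER BY id) AS lag_c
      FROM mytable t) AS t;

SELECT * FROM numbered ORDER BY id;
```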
Assuming you have some way of ordering your rows, you can use MATCH_RECOGNIZE:
SELECT a, b, c, rn
FROM table_name
MATCH_RECOGNIZE (
ORDER BY id
MEASURES MATCH_NUMBER() AS rn
ALL ROWS PER MATCH
PATTERN ( FIRST_ROW EQUAL_ROWS* )
DEFINE EQUAL_ROWS AS (
EQUAL_ROWS.a = PREV( EQUAL_ROWS.a )
AND EQUAL_ROWS.b = PREV( EQUAL_ROWS.b )
AND EQUAL_ROWS.c = PREV( EQUAL_ROWS.c )
)
)
So, for your test data:
CREATE TABLE table_name ( id, a, b, c ) AS
SELECT 1, 3, 4, 5 FROM DUAL UNION ALL
SELECT 2, 3, 4, 5 FROM DUAL UNION ALL
SELECT 3, 6, 4, 1 FROM DUAL UNION ALL
SELECT 4, 1, 1, 8 FROM DUAL UNION ALL
SELECT 5, 1, 1, 8 FROM DUAL UNION ALL
SELECT 6, 1, 1, 0 FROM DUAL UNION ALL
SELECT 7, 1, 1, 0 FROM DUAL;
Outputs:
A | B | C | RN
-: | -: | -: | -:
3 | 4 | 5 | 1
3 | 4 | 5 | 1
6 | 4 | 1 | 2
1 | 1 | 8 | 3
1 | 1 | 8 | 3
1 | 1 | 0 | 4
1 | 1 | 0 | 4
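It can also be done without any ordering column, by getting the distinct groups and numbering each group (note that the number each group receives is then arbitrary, because DISTINCT does not guarantee any order). Borrowing the first part from EJ Egjed: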
WITH my_table (a, b, c) AS
(SELECT 3, 4, 5 FROM DUAL
UNION ALL
SELECT 3, 4, 5 FROM DUAL
UNION ALL
SELECT 6, 4, 1 FROM DUAL
UNION ALL
SELECT 1, 1, 8 FROM DUAL
UNION ALL
SELECT 1, 1, 8 FROM DUAL
UNION ALL
SELECT 1, 1, 0 FROM DUAL
UNION ALL
SELECT 1, 1, 0 FROM DUAL)
, groups as (select distinct a, b, c
from my_table)
, groupnums as (select rownum as num, a, b, c
from groups)
select a, b, c, num
from my_table join groupnums using(a,b,c);

Select rows when a value appears multiple times

I have a table like this one:
+------+------+
| ID | Cust |
+------+------+
| 1 | A |
| 1 | A |
| 1 | B |
| 1 | B |
| 2 | A |
| 2 | A |
| 2 | A |
| 2 | B |
| 3 | A |
| 3 | B |
| 3 | B |
+------+------+
I would like to get the IDs that have A at least twice and B at least twice. So in my example, the query should return only ID 1.
Thanks!
In MySQL:
SELECT id
FROM test
GROUP BY id
HAVING GROUP_CONCAT(cust ORDER BY cust SEPARATOR '') LIKE '%AA%BB%'
In Oracle
WITH cte AS ( SELECT id, LISTAGG(cust, '') WITHIN GROUP (ORDER BY cust) custs
FROM test
GROUP BY id )
SELECT id
FROM cte
WHERE custs LIKE '%AA%BB%'
I would just use two levels of aggregation:
select id
from (select id, cust, count(*) as cnt
from t
where cust in ('A', 'B')
group by id, cust
) ic
group by id
having count(*) = 2 and -- both customers are in the result set
min(cnt) >= 2 -- and there are at least two instances
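A runnable sketch of the two-level aggregation, written for SQLite (an assumption; the table name t and the view name ids_with_two_each are hypothetical). The inner query counts each (id, cust) pair; the outer HAVING keeps ids where both customers are present and the smaller of the two counts is at least 2:

```sql
CREATE TABLE t (id INTEGER, cust TEXT);
INSERT INTO t (id, cust) VALUES
  (1, 'A'), (1, 'A'), (1, 'B'), (1, 'B'),
  (2, 'A'), (2, 'A'), (2, 'A'), (2, 'B'),
  (3, 'A'), (3, 'B'), (3, 'B');

CREATE VIEW ids_with_two_each AS
SELECT id
FROM (SELECT id, cust, COUNT(*) AS cnt   -- one row per (id, cust) with its count
      FROM t
      WHERE cust IN ('A', 'B')
      GROUP BY id, cust) AS ic
GROUP BY id
HAVING COUNT(*) = 2      -- both A and B are present
   AND MIN(cnt) >= 2;    -- and each appears at least twice

SELECT * FROM ids_with_two_each;
```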
This is one option; lines #1 - 13 represent sample data. The query you might be interested in begins at line #14.
SQL> with test (id, cust) as
2 (select 1, 'a' from dual union all
3 select 1, 'a' from dual union all
4 select 1, 'b' from dual union all
5 select 1, 'b' from dual union all
6 select 2, 'a' from dual union all
7 select 2, 'a' from dual union all
8 select 2, 'a' from dual union all
9 select 2, 'b' from dual union all
10 select 3, 'a' from dual union all
11 select 3, 'b' from dual union all
12 select 3, 'b' from dual
13 )
14 select id
15 from (select
16 id,
17 sum(case when cust = 'a' then 1 else 0 end) suma,
18 sum(case when cust = 'b' then 1 else 0 end) sumb
19 from test
20 group by id
21 )
22 where suma >= 2
23 and sumb >= 2;
ID
----------
1
SQL>
You can use GROUP BY and HAVING for the relevant Cust values ('A', 'B')
and then query the result twice (I chose to use WITH to avoid repeating the subquery and to cache it):
with more_than_2 as
(
select Id, Cust, count(*) c
from tab
where Cust in ('A', 'B')
group by Id, Cust
having count(*) >= 2
)
select *
from tab
where exists ( select 1 from more_than_2 where more_than_2.Id = tab.Id and more_than_2.Cust = 'A')
and exists ( select 1 from more_than_2 where more_than_2.Id = tab.Id and more_than_2.Cust = 'B')
What you want is a perfect candidate for match_recognize. Here you go:
select id from t
match_recognize
(
-- partition by id so that a match cannot span two different ids
partition by id
order by cust
measures match_number() as mno
one row per match
pattern (A{2,} B{2,})
define A as cust = 'A',
B as cust = 'B'
)
Output:
Regards,
Ranagal

oracle query to obtain all major version segmented message values

I need to create an Oracle SQL query that obtains the highest version of each segmented message value.
I have the following tables and relationships, already filled with example records:
*MESSAGE_TABLE*
ID NAME
1 hello
2 bye
*SEGMENT_TABLE*
ID VALUE
1 development
2 production
*MESSAGE_VALUE_TABLE*
ID ID_MESSAGE ID_SEGMENT VERSION VALUE
1 1 1 2 hello
2 1 1 1 hi
3 1 2 1 hi
4 1 null 3 hi
5 1 null 4 hello
6 2 1 1 bye
7 2 1 2 good bye
The unique constraint on MESSAGE_VALUE_TABLE is (ID_MESSAGE, ID_SEGMENT, VERSION).
ID_SEGMENT is nullable because a null segment indicates default values.
VERSION is a simple number field.
The query has to obtain the highest version of each segmented message value (the results must include the segment value).
The rows that should be selected from MESSAGE_VALUE_TABLE are:
ID ID_MESSAGE ID_SEGMENT VERSION VALUE
1 1 1 2 hello
3 1 2 1 hi
5 1 null 4 hello
7 2 1 2 good bye
The query should return the following values (in the same order as the selected rows listed above):
NAME(MESSAGE_TABLE) VALUE (SEGMENT_TABLE) VALUE (MESSAGE_VALUE_TABLE)
hello development hello
hello production hi
hello null / empty hello
bye development good bye
Here is the solution, thanks to San, who did the hard work:
WITH tab AS (SELECT ID,
id_message,
id_segment,
CASE WHEN lead(nvl(id_segment, -1)) over (partition by id_message ORDER BY id_segment, version) IS NULL
THEN 1
WHEN (nvl(id_segment, -1) != lead(nvl(id_segment, -1)) over (partition by id_message ORDER BY id_segment, version))
THEN 1
ELSE 0
END change_ind,
version,
VALUE
FROM MESSAGE_VALUE_TABLE)
SELECT b.NAME, nvl(c.VALUE, 'null/empty'), a.VALUE
FROM tab a
JOIN MESSAGE_TABLE b ON (b.ID=a.id_message)
LEFT OUTER JOIN SEGMENT_TABLE c ON (c.ID=a.id_segment)
WHERE change_ind = 1
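A sketch of the change-indicator idea, assuming SQLite 3.25+ (COALESCE stands in for Oracle's NVL, and -1 is assumed never to be a real segment id, as in the original; the view name latest_by_change is hypothetical). Ordering each message's rows by segment and then version keeps every segment's rows together with the highest version last, so the rows to keep are those whose next row belongs to another segment, or that have no next row:

```sql
CREATE TABLE message_value_table (
  id INTEGER, id_message INTEGER, id_segment INTEGER, version INTEGER, value TEXT);
INSERT INTO message_value_table (id, id_message, id_segment, version, value) VALUES
  (1, 1, 1, 2, 'hello'), (2, 1, 1, 1, 'hi'), (3, 1, 2, 1, 'hi'),
  (4, 1, NULL, 3, 'hi'), (5, 1, NULL, 4, 'hello'),
  (6, 2, 1, 1, 'bye'), (7, 2, 1, 2, 'good bye');

CREATE VIEW latest_by_change AS
SELECT id, id_message, id_segment, version, value
FROM (SELECT v.*,
             -- segment of the next row in (segment, version) order; NULL after the last row
             LEAD(COALESCE(id_segment, -1)) OVER (
               PARTITION BY id_message
               ORDER BY COALESCE(id_segment, -1), version) AS next_seg
      FROM message_value_table v) AS t
WHERE next_seg IS NULL                      -- last row of the message
   OR next_seg <> COALESCE(id_segment, -1); -- last row of a segment block

SELECT * FROM latest_by_change ORDER BY id;
```

Oracle sorts NULL segments last in ascending order, while this sketch sorts them first (as -1); either way each segment's rows stay contiguous, which is all the LEAD() comparison needs.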
As I understand it, you can get the first part (the selected rows) as:
WITH MESSAGE_VALUE_TABLE(ID,ID_MESSAGE,ID_SEGMENT,VERSION,VALUE) as
(select 1,1,1,1,'hi' from dual union all
select 2,1,1,2,'hello' from dual union all
select 3,1,2,1,'hi' from dual union all
select 4,1,null,3,'hi' from dual union all
select 5,1,null,4,'hello' from dual union all
select 6,1,1,1,'hi' from dual union all
select 7,1,1,2,'hello' from dual union all
select 8,1,2,3,'hi' from dual union all
select 9,1,null,1,'hi' from dual union all
select 10,1,null,2,'hello' from dual union all
select 11,2,1,1,'bye' from dual union all
select 12,2,1,2,'good bye' from dual)
------
---End of data
------
SELECT id, id_message, id_segment, version, VALUE
from (
SELECT ID,
id_message,
id_segment,
CASE WHEN lead(nvl(id_segment, -1)) over (partition by id_message ORDER BY ID) IS NULL
THEN 1
WHEN (nvl(id_segment, -1) != lead(nvl(id_segment, -1)) over (partition by id_message ORDER BY ID))
THEN 1
ELSE 0
END change_ind,
version,
VALUE
FROM MESSAGE_VALUE_TABLE)
where change_ind = 1;
Output:
| ID | ID_MESSAGE | ID_SEGMENT | VERSION | VALUE |
|----|------------|------------|---------|----------|
| 2 | 1 | 1 | 2 | hello |
| 3 | 1 | 2 | 1 | hi |
| 5 | 1 | (null) | 4 | hello |
| 7 | 1 | 1 | 2 | hello |
| 8 | 1 | 2 | 3 | hi |
| 10 | 1 | (null) | 2 | hello |
| 12 | 2 | 1 | 2 | good bye |
And the second part as:
WITH MESSAGE_VALUE_TABLE(ID,ID_MESSAGE,ID_SEGMENT,VERSION,VALUE) as
(select 1,1,1,1,'hi' from dual union all
select 2,1,1,2,'hello' from dual union all
select 3,1,2,1,'hi' from dual union all
select 4,1,null,3,'hi' from dual union all
select 5,1,null,4,'hello' from dual union all
select 6,1,1,1,'hi' from dual union all
select 7,1,1,2,'hello' from dual union all
select 8,1,2,3,'hi' from dual union all
select 9,1,null,1,'hi' from dual union all
select 10,1,null,2,'hello' from dual union all
select 11,2,1,1,'bye' from dual union all
select 12,2,1,2,'good bye' from dual),
MESSAGE_TABLE(ID,NAME) AS
(SELECT 1 , 'hello' FROM dual UNION ALL
SELECT 2, 'bye' FROM dual),
SEGMENT_TABLE(ID,VALUE) AS
(SELECT 1,'development' FROM dual UNION ALL
SELECT 2,'production' FROM dual),
------
---End of data
------
tab AS (SELECT ID,
id_message,
id_segment,
CASE WHEN lead(nvl(id_segment, -1)) over (partition by id_message ORDER BY ID) IS NULL
THEN 1
WHEN (nvl(id_segment, -1) != lead(nvl(id_segment, -1)) over (partition by id_message ORDER BY ID))
THEN 1
ELSE 0
END change_ind,
version,
VALUE
FROM MESSAGE_VALUE_TABLE)
SELECT b.NAME, nvl(c.VALUE, 'null/empty') C_VALUE, a.VALUE
FROM tab a
JOIN MESSAGE_TABLE b ON (b.ID=a.id_message)
LEFT OUTER JOIN SEGMENT_TABLE c ON (c.ID=a.id_segment)
WHERE change_ind = 1
ORDER BY a.ID
Output:
| NAME | C_VALUE | VALUE |
|-------|-------------|----------|
| hello | development | hello |
| hello | production | hi |
| hello | null/empty | hello |
| hello | development | hello |
| hello | production | hi |
| hello | null/empty | hello |
| bye | development | good bye |
So your final query will be:
WITH tab AS (SELECT ID,
id_message,
id_segment,
CASE WHEN lead(nvl(id_segment, -1)) over (partition by id_message ORDER BY ID) IS NULL
THEN 1
WHEN (nvl(id_segment, -1) != lead(nvl(id_segment, -1)) over (partition by id_message ORDER BY ID))
THEN 1
ELSE 0
END change_ind,
version,
VALUE
FROM MESSAGE_VALUE_TABLE)
SELECT b.NAME, nvl(c.VALUE, 'null/empty'), a.VALUE
FROM tab a
JOIN MESSAGE_TABLE b ON (b.ID=a.id_message)
LEFT OUTER JOIN SEGMENT_TABLE c ON (c.ID=a.id_segment)
WHERE change_ind = 1
ORDER BY a.ID
I'm not fully sure what you are trying to achieve, but try an analytic query:
select message_name, segment_value, message_value from (
select m.name message_name, s.value segment_value, v.value message_value,
dense_rank() over (partition by v.id_message, v.id_segment order by version desc nulls last) version_rank
from message_value_table v
inner join message_table m on v.id_message = m.id
left outer join segment_table s on v.id_segment = s.id
) where version_rank = 1
;
http://sqlfiddle.com/#!4/35aa7/10
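A runnable sketch of the ranking approach with all three tables, assuming SQLite 3.25+ (the view name latest_values is hypothetical, and NULLS LAST is dropped on the assumption that VERSION is never NULL). Rows with a NULL segment fall into a single partition per message, so the default values get their own highest version:

```sql
CREATE TABLE message_table (id INTEGER, name TEXT);
CREATE TABLE segment_table (id INTEGER, value TEXT);
CREATE TABLE message_value_table (
  id INTEGER, id_message INTEGER, id_segment INTEGER, version INTEGER, value TEXT);
INSERT INTO message_table (id, name) VALUES (1, 'hello'), (2, 'bye');
INSERT INTO segment_table (id, value) VALUES (1, 'development'), (2, 'production');
INSERT INTO message_value_table (id, id_message, id_segment, version, value) VALUES
  (1, 1, 1, 2, 'hello'), (2, 1, 1, 1, 'hi'), (3, 1, 2, 1, 'hi'),
  (4, 1, NULL, 3, 'hi'), (5, 1, NULL, 4, 'hello'),
  (6, 2, 1, 1, 'bye'), (7, 2, 1, 2, 'good bye');

CREATE VIEW latest_values AS
SELECT id, message_name, segment_value, message_value
FROM (SELECT v.id, m.name AS message_name, s.value AS segment_value,
             v.value AS message_value,
             -- rank 1 = highest version within each (message, segment) pair
             DENSE_RANK() OVER (PARTITION BY v.id_message, v.id_segment
                                ORDER BY v.version DESC) AS version_rank
      FROM message_value_table v
      JOIN message_table m ON v.id_message = m.id
      LEFT JOIN segment_table s ON v.id_segment = s.id) AS t
WHERE version_rank = 1;

SELECT message_name, COALESCE(segment_value, 'null/empty') AS segment_value, message_value
FROM latest_values
ORDER BY id;
```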

Sorting by max value [duplicate]

This question already has answers here:
How to select records with maximum values in two columns?
I have a table that looks like this in an Oracle DB:
TransactionID Customer_id Sequence Activity
---------- ------------- ---------- -----------
1 85 1 Forms
2 51 2 Factory
3 51 1 Forms
4 51 3 Listing
5 321 1 Forms
6 321 2 Forms
7 28 1 Text
8 74 1 Escalate
I want to select all rows where sequence is the highest for each customer_id.
Is there a MAX() function I could use on sequence, but per customer_id?
I would like the result of the query to look like this:
TransactionID Customer_id Sequence Activity
---------- ------------- ---------- -----------
1 85 1 Forms
4 51 3 Listing
6 321 2 Forms
7 28 1 Text
8 74 1 Escalate
select t1.*
from your_table t1
inner join
(
select customer_id, max(Sequence) mseq
from your_table
group by customer_id
) t2 on t1.customer_id = t2.customer_id and t1.sequence = t2.mseq
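A runnable sketch of the join-to-MAX approach, assuming SQLite (your_table is kept from the answer; the view name latest_per_customer is hypothetical). The derived table finds each customer's highest sequence, and joining back on both columns returns the full rows:

```sql
CREATE TABLE your_table (
  transactionid INTEGER, customer_id INTEGER, sequence INTEGER, activity TEXT);
INSERT INTO your_table (transactionid, customer_id, sequence, activity) VALUES
  (1, 85, 1, 'Forms'), (2, 51, 2, 'Factory'), (3, 51, 1, 'Forms'),
  (4, 51, 3, 'Listing'), (5, 321, 1, 'Forms'), (6, 321, 2, 'Forms'),
  (7, 28, 1, 'Text'), (8, 74, 1, 'Escalate');

CREATE VIEW latest_per_customer AS
SELECT t1.*
FROM your_table t1
JOIN (SELECT customer_id, MAX(sequence) AS mseq   -- highest sequence per customer
      FROM your_table
      GROUP BY customer_id) t2
  ON t1.customer_id = t2.customer_id AND t1.sequence = t2.mseq;

SELECT * FROM latest_per_customer ORDER BY transactionid;
```

If a customer had two rows tied at the highest sequence, both rows would be returned.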
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE tbl ( TransactionID, Customer_id, Sequence, Activity ) AS
SELECT 1, 85, 1, 'Forms' FROM DUAL
UNION ALL SELECT 2, 51, 2, 'Factory' FROM DUAL
UNION ALL SELECT 3, 51, 1, 'Forms' FROM DUAL
UNION ALL SELECT 4, 51, 3, 'Listing' FROM DUAL
UNION ALL SELECT 5, 321, 1, 'Forms' FROM DUAL
UNION ALL SELECT 6, 321, 2, 'Forms' FROM DUAL
UNION ALL SELECT 7, 28, 1, 'Text' FROM DUAL
UNION ALL SELECT 8, 74, 1, 'Escalate' FROM DUAL;
Query 1:
SELECT
MAX( TransactionID ) KEEP ( DENSE_RANK LAST ORDER BY Sequence ) AS TransactionID,
Customer_ID,
MAX( Sequence ) KEEP ( DENSE_RANK LAST ORDER BY Sequence ) AS Sequence,
MAX( Activity ) KEEP ( DENSE_RANK LAST ORDER BY Sequence ) AS Activity
FROM tbl
GROUP BY Customer_ID
ORDER BY TransactionID
Results:
| TRANSACTIONID | CUSTOMER_ID | SEQUENCE | ACTIVITY |
|---------------|-------------|----------|----------|
| 1 | 85 | 1 | Forms |
| 4 | 51 | 3 | Listing |
| 6 | 321 | 2 | Forms |
| 7 | 28 | 1 | Text |
| 8 | 74 | 1 | Escalate |
Please try this:
with cte as
(
select Customer_id, MAX(Sequence) as p from Tablename group by Customer_id
)
select b.*
from cte a
join Tablename b on a.Customer_id = b.Customer_id and a.p = b.Sequence
order by b.TransactionID