how to find the maximum occurence of a string in Oracle SQL developer - sql

i have 2 columns in a table. Data looks like this
Folio_no | Flag
1145 R
201 S
1145 FR
300 E
1145 R
201 E
201 S
Expected Output:
Folio_No | Flag
1145 R
201 S
300 E
The output should give the folio_no along with the flag which occured maximum number of times for that particular folio number.
i tried doing the below but it throws an error
select folio_no, max(count(flag)) from table group by folio_no;

We can use an aggregation:
WITH cte AS (
SELECT Folio_No, Flag, COUNT(*) AS cnt
FROM yourTable
GROUP BY Folio_No, Flag
),
cte2 AS (
SELECT t.*, RANK() OVER (PARTITION BY Folio_No ORDER BY cnt DESC, Flag) rnk
FROM cte t
)
SELECT Folio_No, Flag
FROM cte2
WHERE rnk = 1;
Note that I assume should two flags within a given folio number be tied for the max frequency, that you want to report the earlier flag.
Here is a working demo.

If you want the flag(s) that have the maximum occurrence for each folio then you can use:
SELECT Folio_No, Flag
FROM (
SELECT Folio_No,
Flag,
RANK() OVER (PARTITION BY Folio_No ORDER BY COUNT(*) DESC) AS rnk
FROM table_name
GROUP BY Folio_No, Flag
)
WHERE rnk = 1;
Which, for the sample data:
CREATE TABLE table_name (folio_no, flag) AS
SELECT 1145, 'R' FROM DUAL UNION ALL
SELECT 201, 'S' FROM DUAL UNION ALL
SELECT 1145, 'FR' FROM DUAL UNION ALL
SELECT 300, 'E' FROM DUAL UNION ALL
SELECT 1145, 'R' FROM DUAL UNION ALL
SELECT 201, 'E' FROM DUAL UNION ALL
SELECT 201, 'S' FROM DUAL UNION ALL
SELECT 201, 'S' FROM DUAL UNION ALL
SELECT 1, 'A' FROM DUAL UNION ALL
SELECT 1, 'A' FROM DUAL UNION ALL
SELECT 1, 'B' FROM DUAL UNION ALL
SELECT 1, 'B' FROM DUAL UNION ALL
SELECT 1, 'C' FROM DUAL UNION ALL
SELECT 1, 'D' FROM DUAL;
Outputs:
FOLIO_NO
FLAG
1
A
1
B
201
S
300
E
1145
R
If you want only a single flag with the maximum occurrence for each folio, and if there are ties then the first folio alphabetically in each folio, then:
SELECT Folio_No, Flag
FROM (
SELECT Folio_No,
Flag,
ROW_NUMBER() OVER (PARTITION BY Folio_No ORDER BY COUNT(*) DESC, flag) AS rn
FROM table_name
GROUP BY Folio_No, Flag
)
WHERE rn = 1;
Which, for the sample data outputs:
FOLIO_NO
FLAG
1
A
201
S
300
E
1145
R
db<>fiddle here

Related

create date range from day based data

i have following source data...
id date value
1 01.08.22 a
1 02.08.22 a
1 03.08.22 a
1 04.08.22 b
1 05.08.22 b
1 06.08.22 a
1 07.08.22 a
2 01.08.22 a
2 02.08.22 a
2 03.08.22 c
2 04.08.22 a
2 05.08.22 a
and i would like to have the following output...
id date_from date_until value
1 01.08.22 03.08.22 a
1 04.08.22 05.08.22 b
1 06.08.22 07.08.22 a
2 01.08.22 02.08.22 a
2 03.08.22 03.08.22 c
2 04.08.22 05.08.22 a
Is this possible with Oracle SQL? Which functions do I need for this?
Based on the link provided by #astentx, try this solution:
SELECT
id, MIN("date") AS date_from, MAX("date") AS date_until, MAX(value) AS value
FROM (
SELECT
t1.*,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY "date") -
ROW_NUMBER() OVER(PARTITION BY id, value ORDER BY "date") AS rn
FROM yourtable t1
)
GROUP BY id, rn
See db<>fiddle
WITH CTE (id, dateD,valueD)
AS
(
SELECT 1, TO_DATE('01.08.22','DD.MM.YY'), 'a' FROM DUAL UNION ALL
SELECT 1, TO_DATE('02.08.22','DD.MM.YY'), 'a'FROM DUAL UNION ALL
SELECT 1, TO_DATE('03.08.22','DD.MM.YY'), 'a'FROM DUAL UNION ALL
SELECT 1, TO_DATE('04.08.22','DD.MM.YY'), 'b'FROM DUAL UNION ALL
SELECT 1, TO_DATE('05.08.22','DD.MM.YY'), 'b'FROM DUAL UNION ALL
SELECT 2, TO_DATE('01.08.22','DD.MM.YY'), 'a'FROM DUAL UNION ALL
SELECT 2, TO_DATE('02.08.22','DD.MM.YY'), 'a'FROM DUAL UNION ALL
SELECT 2, TO_DATE('03.08.22','DD.MM.YY'), 'c'FROM DUAL
)
SELECT C.ID,C.VALUED,MIN(C.DATED)AS MIN_DATE,MAX(C.DATED)AS MAX_DATE
FROM CTE C
GROUP BY C.ID,C.VALUED
ORDER BY C.ID
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=47c87d60445ce262cd371177e31d5d63

Last changed Data with T as status in Oracle sql

My Data is given below
In the below sample latest record has T and last occurrence of T was updated on 3-Apr-17 so that row needs to be displayed
EMP EFFDT STATUS
11367 15-Apr-15 A
11367 14-Jun-15 A
11367 10-Aug-15 T
11367 2-Apr-17 A
11367 3-Apr-17 T *
11367 10-Apr-17 T
In the below sample latest record has T and last occurrence of T was updated on 23-Feb-18 so that row needs to be displayed
EMP EFFDT STATUS
20612 4-Sep-16 A
20612 23-Feb-18 T *
20612 20-Jul-18 T
In the below sample latest record has T and that is the only occurrence so display it
EMP EFFDT STATUS
20644 12-Jul-15 A
20644 8-Aug-16 A
20644 6-Oct-16 T*
In the below sample latest record does not has T so no need to display
EMP EFFDT STATUS
21155 18-May-17 T
21155 21-Jun-17 A
21155 13-Mar-18 T
21155 15-Aug-18 A
My Desired Output should be (* marked records)
EMP EFFDT STATUS
11367 3-Apr-17 T
20612 23-Feb-18 T
20644 6-Oct-16 T
This is an island and gap problem.
In the cte you try to found out what island have T as last update (t=0)
SQL DEMO
WITH cte as (
SELECT "EMP",
"EFFDT",
SUM(CASE WHEN "STATUS" <> 'T'
THEN 1
ELSE 0
END) OVER (partition by "EMP" ORDER BY "EFFDT" DESC) as t
FROM Table1
)
SELECT "EMP", MIN("EFFDT") as "EFFDT", MAX('T') as "STATUS"
FROM cte
WHERE t = 0
GROUP BY "EMP"
OUTPUT
| EMP | EFFDT | STATUS |
|-------|-----------------------|--------|
| 11367 | 2017-04-03 00:00:00.0 | T |
| 20612 | 2018-02-23 00:00:00.0 | T |
| 20644 | 2016-10-06 00:00:00.0 | T |
For debug you can try
SELECT *
FROM cte
to see how t values are created
WITH cte1
AS (
SELECT A.*
,lag(STATUS, 1, 0) OVER (
PARTITION BY EMP ORDER BY EFFDT
) AS PRIOR_STATUS
FROM Table1 A
)
SELECT EMP
,STATUS
,MAX(EFFDT) AS EFFDT
FROM cte1 A
WHERE A.STATUS = 'T'
AND A.PRIOR_STATUS <> 'T'
GROUP BY EMP
,STATUS
SQL Fiddle here: http://sqlfiddle.com/#!4/458733/18
alter session set nls_date_format = 'dd-Mon-rr';
Solution (including simulated data in with clause):
with
simulated_data (EMP, EFFDT, STATUS) as (
select 11367, to_date('15-Apr-15'), 'A' from dual union all
select 11367, to_date('14-Jun-15'), 'A' from dual union all
select 11367, to_date('10-Aug-15'), 'T' from dual union all
select 11367, to_date( '2-Apr-17'), 'A' from dual union all
select 11367, to_date( '3-Apr-17'), 'T' from dual union all
select 11367, to_date('10-Apr-17'), 'T' from dual union all
select 20612, to_date( '4-Sep-16'), 'A' from dual union all
select 20612, to_date('23-Feb-18'), 'T' from dual union all
select 20612, to_date('20-Jul-18'), 'T' from dual union all
select 20644, to_date('12-Jul-15'), 'A' from dual union all
select 20644, to_date( '8-Aug-16'), 'A' from dual union all
select 20644, to_date( '6-Oct-16'), 'T' from dual union all
select 21155, to_date('18-May-17'), 'T' from dual union all
select 21155, to_date('21-Jun-17'), 'A' from dual union all
select 21155, to_date('13-Mar-18'), 'T' from dual union all
select 21155, to_date('15-Aug-18'), 'A' from dual
)
-- End of simulated data (for testing only).
-- SQL query (solution) begins BELOW THIS LINE.
select emp, min(effdt) as eff_dt, 'T' as status
from (
select emp, effdt, status,
row_number() over (partition by emp, status
order by effdt desc) as rn,
min(status) keep (dense_rank last order by effdt)
over (partition by emp) as last_status
from simulated_data
)
where last_status = 'T' and status = 'T' and rn <= 2
group by emp
;
Output:
EMP EFF_DT STATUS
---------- --------- ------
11367 03-Apr-17 T
20612 23-Feb-18 T
20644 06-Oct-16 T
Explanation:
In the subquery, we add two columns to the input data. Column RN gives a rank within each partition by EMPNO and STATUS, in descending order by EFFDT. LAST_STATUS used the analytic version of the LAST() function to assign either T or A as the last status for each EMP (and it attaches this value to EVERY row for the EMP, regardless of each row's own STATUS).
In the outer query, we are only interested to retain the EMP where the last status was T. For those rows, we only want to retain the rows where the actual status of the row is in fact T (we know this will always include the last row for that EMP, by the way, and it will have RN = 1). Moreover, we are only interested in those rows where RN is 1 or possibly 2 (if there are at least two rows with status T for that EMP). Of these either one or two rows with status T for a given EMP, we want to get the EARLIEST date. That will be the ONLY date if there is no row with RN = 2 for that partition; otherwise, it will be the date from the earlier row, with RN = 2.
In the outer SELECT we select the EMP, the earliest date, and the status we already know, it is T (so we don't need any work for this - actually it is not clear why the third column is even needed, since it is known beforehand it will be T in all rows).
Assuming that A and T are the only statuses, this should work.
WITH cte1
AS (
SELECT A.EMP, A.EFFDT, A.STATUS
,min(STATUS) OVER (
PARTITION BY EMP ORDER BY EFFDT RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS MIN_STATUS
FROM Table1 A
)
SELECT
cte1.EMP
,MIN(cte1.EFFDT) AS EFFDT
,MIN(cte1.STATUS) as STATUS
FROM cte1
WHERE cte1.MIN_STATUS = 'T'
GROUP BY EMP
EDIT: well, if you have another statues, let's make it more robust. Actually, it's almost the same as juan-carlos-oropeza proposed, but he missed "RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING" part.
Ooops, it IS the same solution: juan-carlos-oropeza used order by DESC istead of unbounded following.
with emp_status_log (EMP, EFFDT, STATUS) as
(
select 11367, to_date('15-Apr-15', 'dd-Mon-yy'), 'A' from dual union all
select 11367, to_date('14-Jun-15', 'dd-Mon-yy'), 'A' from dual union all
select 11367, to_date('10-Aug-15', 'dd-Mon-yy'), 'T' from dual union all
select 11367, to_date( '2-Apr-17', 'dd-Mon-yy'), 'A' from dual union all
select 11367, to_date( '3-Apr-17', 'dd-Mon-yy'), 'T' from dual union all
select 11367, to_date('10-Apr-17', 'dd-Mon-yy'), 'T' from dual union all
select 20612, to_date( '4-Sep-16', 'dd-Mon-yy'), 'A' from dual union all
select 20612, to_date('23-Feb-18', 'dd-Mon-yy'), 'T' from dual union all
select 20612, to_date('20-Jul-18', 'dd-Mon-yy'), 'T' from dual union all
select 20644, to_date('12-Jul-15', 'dd-Mon-yy'), 'A' from dual union all
select 20644, to_date( '8-Aug-16', 'dd-Mon-yy'), 'A' from dual union all
select 20644, to_date( '6-Oct-16', 'dd-Mon-yy'), 'T' from dual union all
select 21155, to_date('18-May-17', 'dd-Mon-yy'), 'T' from dual union all
select 21155, to_date('21-Jun-17', 'dd-Mon-yy'), 'A' from dual union all
select 21155, to_date('13-Mar-18', 'dd-Mon-yy'), 'T' from dual union all
select 21155, to_date('15-Aug-18', 'dd-Mon-yy'), 'A' from dual
)
,
-- End of simulated data (for testing only).
/* SQL query (solution) begins BELOW THIS LINE.
with--*/
cte1 as
(
select sl.*
,sum(decode(sl.STATUS, 'T', 0, 1)) OVER (
PARTITION BY sl.EMP ORDER BY sl.EFFDT RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS non_t_count
from emp_status_log sl
)
select
cte1.emp
, min(cte1.effdt) as effdt
, min(cte1.status) as status
from cte1
where cte1.non_t_count = 0
group by cte1.emp

Average sum ignoring max or min

I want to use AVG to get the average of some values, but ignoring the max and min values only if they are 1.5 bellow or above the second max and min values. I will put some examples:
Example 1:
SELECT *
FROM (
SELECT 100.5 v FROM DUAL UNION
SELECT 101.5 v FROM DUAL UNION
SELECT 103.1 v FROM DUAL ) D
I need this result, ignoring the 103.1 value:
100.5
101.5
Example 2:
SELECT *
FROM (
SELECT 100.5 v FROM DUAL UNION
SELECT 101.5 v FROM DUAL UNION
SELECT 103.1 v FROM DUAL UNION
SELECT 106.2 v FROM DUAL) D
I need this result, ignoring only the 106.2 value:
100.5
101.1
103.1
Example 3:
SELECT *
FROM (
SELECT 100.0 v FROM DUAL UNION
SELECT 102.0 v FROM DUAL UNION
SELECT 103.0 v FROM DUAL UNION
SELECT 105.0 v FROM DUAL UNION
SELECT 107.0 v FROM DUAL) D
I need this result, ignoring 100.0 and 107.0 values:
102.0
103.0
105.0
When there is only two values it doesnt matter.
With the right result, I can AVG(value) correctly.
You need a combination of analytic functions (lead/lag) and conditional aggregation. Here's what I came up with. Note that I allow for multiple groups, the "adjusted" average must be computed for each group separately (a common task in statistics when you must throw out the outliers in each group, when they exist):
with
inputs ( id, val ) as (
select 101, 33 from dual union all
select 102, 23 from dual union all
select 102, 22.8 from dual union all
select 103, 30 from dual union all
select 103, 40 from dual union all
select 104, 90 from dual union all
select 104, 92 from dual union all
select 104, 92 from dual union all
select 104, 91.5 from dual union all
select 104, 91.7 from dual
)
-- End of simulated inputs (for testing only, not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your actual table and column names.
select id,
avg ( case when cnt >= 3
and ( lag_val is null and lead_val - val >= 1.5
or
lead_val is null and val - lag_val >= 1.5
)
then null
else val
end
) as adjusted_avg_val
from (
select id, val, count(val) over (partition by id) as cnt,
lag ( val ) over ( partition by id order by val ) as lag_val,
lead ( val ) over ( partition by id order by val ) as lead_val
from inputs
)
group by id
;
Output:
ID ADJUSTED_AVG_VAL
--- ----------------
101 33
102 22.9
103 35
104 91.8
Try using the following combination of row_number lead and lag
with cte as (
SELECT 100.5 v FROM DUAL UNION ALL
SELECT 101.5 v FROM DUAL UNION ALL
SELECT 103.1 v FROM DUAL UNION ALL
SELECT 106.2 v FROM DUAL)
-- end of sample data
select avg(v)
from
(
select row_number() over (order by v desc) arn,
row_number() over (order by v) drn,
lag(v) over (order by v) av,
lead(v) over (order by v) dv,
v
from cte
) t
where (arn != 1 and drn != 1) or -- if they are no maximum nor minumum
(drn = 1 and v + 1.5 > dv) or -- if they are minimum
(arn = 1 and v - 1.5 < av) or -- if they are maximum
(av is null and arn < 3) or -- if there are just two ore one value
(dv is null and drn < 3) -- if there are just two ore one value
In SQL you simply just need to express the result, so ...
WITH D as(
SELECT 100.0 v FROM DUAL UNION
SELECT 102.0 FROM DUAL UNION
SELECT 103.0 FROM DUAL UNION
SELECT 105.0 FROM DUAL UNION
SELECT 107.0 FROM DUAL)
SELECT avg(v)
FROM D
where (v < (select max(v) from D )
and ((select max(v) from D )
-(select max(v) from D where v !=
(select max(v) from D ) ) > 1.5))
or
(v > (select min(v) from D )
and ((select min(v) from D )
+(select min(v) from D where v !=
(select min(v) from D ) ) > 1.5))
... should do the trick!
But thinking ahead... the below version may also be useful ;)
WITH D as(
SELECT 1 PK, 100.0 v FROM DUAL UNION
SELECT 1,102.0 FROM DUAL UNION
SELECT 1,103.0 FROM DUAL UNION
SELECT 1,105.0 FROM DUAL UNION
SELECT 1,107.0 FROM DUAL)
SELECT PK,avg(v)
FROM D
where (v < (select max(v) from D group by PK)
and ((select max(v) from D group by PK)
-(select max(v) from D where v !=
(select max(v) from D group by PK) group by PK) > 1.5))
or
(v > (select min(v) from D group by PK)
and ((select min(v) from D group by PK)
+(select min(v) from D where v !=
(select min(v) from D group by PK) group by PK) > 1.5))
GROUP BY PK
In real life though you would consider the execution plans of the above in a large dataset too (homework).
For any further clarifications, I am at your disposal via comments.
Sincerely,
Ted

How to count consecutive duplicates in a table?

I have below question:
Want to find the consecutive duplicates
SLNO NAME PG
1 A1 NO
2 A2 YES
3 A3 NO
4 A4 YES
6 A5 YES
7 A6 YES
8 A7 YES
9 A8 YES
10 A9 YES
11 A10 NO
12 A11 YES
13 A12 NO
14 A14 NO
We will consider the value of PG column and I need the output as 6 which is the count of maximum consecutive duplicates.
It can be done with Tabibitosan method. Run this, to understand it:
with a as(
select 1 slno, 'A' pg from dual union all
select 2 slno, 'A' pg from dual union all
select 3 slno, 'B' pg from dual union all
select 4 slno, 'A' pg from dual union all
select 5 slno, 'A' pg from dual union all
select 6 slno, 'A' pg from dual
)
select slno, pg, newgrp, sum(newgrp) over (order by slno) grp
from(
select slno,
pg,
case when pg <> nvl(lag(pg) over (order by slno),1) then 1 else 0 end newgrp
from a
);
Newgrp means a new group is found.
Result:
SLNO PG NEWGRP GRP
1 A 1 1
2 A 0 1
3 B 1 2
4 A 1 3
5 A 0 3
6 A 0 3
Now, just use a group by with count, to find the group with maximum number of occurrences:
with a as(
select 1 slno, 'A' pg from dual union all
select 2 slno, 'A' pg from dual union all
select 3 slno, 'B' pg from dual union all
select 4 slno, 'A' pg from dual union all
select 5 slno, 'A' pg from dual union all
select 6 slno, 'A' pg from dual
),
b as(
select slno, pg, newgrp, sum(newgrp) over (order by slno) grp
from(
select slno, pg, case when pg <> nvl(lag(pg) over (order by slno),1) then 1 else 0 end newgrp
from a
)
)
select max(cnt)
from (
select grp, count(*) cnt
from b
group by grp
);
with test as (
select 1 slno,'A1' name ,'NO' pg from dual union all
select 2,'A2','YES' from dual union all
select 3,'A3','NO' from dual union all
select 4,'A4','YES' from dual union all
select 6,'A5','YES' from dual union all
select 7,'A6','YES' from dual union all
select 8,'A7','YES' from dual union all
select 9,'A8','YES' from dual union all
select 10,'A9','YES' from dual union all
select 11,'A10','NO' from dual union all
select 12,'A11','YES' from dual union all
select 13,'A12','NO' from dual union all
select 14,'A14','NO' from dual),
consecutive as (select row_number() over(order by slno) rr, x.*
from test x)
select x.* from Consecutive x
left join Consecutive y on x.rr = y.rr+1 and x.pg = y.pg
where y.rr is not null
order by x.slno
And you can control output with condition in where.
where y.rr is not null query returns duplicates
where y.rr is null query returns "distinct" values.
Just for completeness, here's the actual Tabibitosan method:
with sample_data as (select 1 slno, 'A1' name, 'NO' pg from dual union all
select 2 slno, 'A2' name, 'YES' pg from dual union all
select 3 slno, 'A3' name, 'NO' pg from dual union all
select 4 slno, 'A4' name, 'YES' pg from dual union all
select 6 slno, 'A5' name, 'YES' pg from dual union all
select 7 slno, 'A6' name, 'YES' pg from dual union all
select 8 slno, 'A7' name, 'YES' pg from dual union all
select 9 slno, 'A8' name, 'YES' pg from dual union all
select 10 slno, 'A9' name, 'YES' pg from dual union all
select 11 slno, 'A10' name, 'NO' pg from dual union all
select 12 slno, 'A11' name, 'YES' pg from dual union all
select 13 slno, 'A12' name, 'NO' pg from dual union all
select 14 slno, 'A14' name, 'NO' pg from dual)
-- end of mimicking a table called "sample_data" containing your data; see SQL below:
select max(cnt) max_pg_in_queue
from (select count(*) cnt
from (select slno,
name,
pg,
row_number() over (order by slno)
- row_number() over (partition by pg
order by slno) grp
from sample_data)
where pg = 'YES'
group by grp);
MAX_PG_IN_QUEUE
---------------
6
SELECT MAX(consecutives) -- Block 1
FROM (
SELECT t1.pg, t1.slno, COUNT(*) AS consecutives -- Block 2
FROM test t1 INNER JOIN test t2 ON t1.pg = t2.pg
WHERE t1.slno <= t2.slno
AND NOT EXISTS (
SELECT * -- Block 3
FROM test t3
WHERE t3.slno > t1.slno
AND t3.slno < t2.slno
AND t3.pg != t1.pg
)
GROUP BY t1.pg, t1.slno
);
The query calculates the result in following way:
Extract all couples of records that don't have a record with different value of PG in between (blocks 2 and 3)
Group them by PG value and starting SLNO value -> this counts the consecutive values for any [PG, (starting) SLNO] couple (block 2);
Extract Maximum value from query 2 (block 1)
Note that the query may be simplified if the slno field in table contains consecutive values, but this seems not your case (in your example record with SLNO = 5 is missing)
Only requiring a single aggregation query and no joins (the rest of the calculation can be done with ROW_NUMBER, LAG and LAST_VALUE):
SELECT MAX( num_before_in_queue ) AS max_sequential_in_queue
FROM (
SELECT rn - LAST_VALUE( has_changed ) IGNORE NULL OVER ( ORDER BY ROWNUM ) + 1
AS num_before_in_queue
FROM (
SELECT pg,
ROW_NUMBER() OVER ( ORDER BY slno ) AS rn,
CASE pg WHEN LAG( pg ) OVER ( ORDER BY slno )
THEN NULL
ELSE ROW_NUMBER() OVER ( ORDER BY sl_no )
END AS change
FROM table_name
)
WHERE pg = 'Y'
);
Try to use row_number()
select
SLNO,
Name,
PG,
row_number() over (partition by PG order by PG) as 'Consecutive'
from
<table>
order by
SLNO,
NAME,
PG
This is should work with minor tweaking.
--EDIT--
Sorry, partiton by PG.
The partitioning tells the row_number when to start a new sequence.

SQL Grouping by Ranges

I have a data set that has timestamped entries over various sets of groups.
Timestamp -- Group -- Value
---------------------------
1 -- A -- 10
2 -- A -- 20
3 -- B -- 15
4 -- B -- 25
5 -- C -- 5
6 -- A -- 5
7 -- A -- 10
I want to sum these values by the Group field, but parsed as it appears in the data. For example, the above data would result in the following output:
Group -- Sum
A -- 30
B -- 40
C -- 5
A -- 15
I do not want this, which is all I've been able to come up with on my own so far:
Group -- Sum
A -- 45
B -- 40
C -- 5
Using Oracle 11g, this is what I've hobbled togther so far. I know that this is wrong, by I'm hoping I'm at least on the right track with RANK(). In the real data, entries with the same group could be 2 timestamps apart, or 100; there could be one entry in a group, or 100 consecutive. It does not matter, I need them separated.
WITH SUB_Q AS
(SELECT K_ID
, GRP
, VAL
-- GET THE RANK FROM TIMESTAMP TO SEPARATE GROUPS WITH SAME NAME
, RANK() OVER(PARTITION BY K_ID ORDER BY TMSTAMP) AS RNK
FROM MY_TABLE
WHERE K_ID = 123)
SELECT T1.K_ID
, T1.GRP
, SUM(CASE
WHEN T1.GRP = T2.GRP THEN
T1.VAL
ELSE
0
END) AS TOTAL_VALUE
FROM SUB_Q T1 -- MAIN VALUE
INNER JOIN SUB_Q T2 -- TIMSTAMP AFTER
ON T1.K_ID = T2.K_ID
AND T1.RNK = T2.RNK - 1
GROUP BY T1.K_ID
, T1.GRP
Is it possible to group in this way? How would I go about doing this?
I approach this problem by defining a group which is the different of two row_number():
select group, sum(value)
from (select t.*,
(row_number() over (order by timestamp) -
row_number() over (partition by group order by timestamp)
) as grp
from my_table t
) t
group by group, grp
order by min(timestamp);
The difference of two row numbers is constant for adjacent values.
A solution using LAG and windowed analytic functions:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TEST ( "Timestamp", "Group", Value ) AS
SELECT 1, 'A', 10 FROM DUAL
UNION ALL SELECT 2, 'A', 20 FROM DUAL
UNION ALL SELECT 3, 'B', 15 FROM DUAL
UNION ALL SELECT 4, 'B', 25 FROM DUAL
UNION ALL SELECT 5, 'C', 5 FROM DUAL
UNION ALL SELECT 6, 'A', 5 FROM DUAL
UNION ALL SELECT 7, 'A', 10 FROM DUAL;
Query 1:
WITH changes AS (
SELECT t.*,
CASE WHEN LAG( "Group" ) OVER ( ORDER BY "Timestamp" ) = "Group" THEN 0 ELSE 1 END AS hasChangedGroup
FROM TEST t
),
groups AS (
SELECT "Group",
VALUE,
SUM( hasChangedGroup ) OVER ( ORDER BY "Timestamp" ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS grp
FROM changes
)
SELECT "Group",
SUM( VALUE )
FROM Groups
GROUP BY "Group", grp
ORDER BY grp
Results:
| Group | SUM(VALUE) |
|-------|------------|
| A | 30 |
| B | 40 |
| C | 5 |
| A | 15 |
This is typical "star_of_group" problem (see here: https://timurakhmadeev.wordpress.com/2013/07/21/start_of_group/)
In your case, it would be as follows:
with t as (
select 1 timestamp, 'A' grp, 10 value from dual union all
select 2, 'A', 20 from dual union all
select 3, 'B', 15 from dual union all
select 4, 'B', 25 from dual union all
select 5, 'C', 5 from dual union all
select 6, 'A', 5 from dual union all
select 7, 'A', 10 from dual
)
select min(timestamp), grp, sum(value) sum_value
from (
select t.*
, sum(start_of_group) over (order by timestamp) grp_id
from (
select t.*
, case when grp = lag(grp) over (order by timestamp) then 0 else 1 end
start_of_group
from t
) t
)
group by grp_id, grp
order by min(timestamp)
;