SQL Grouping by Ranges

SQL Grouping by Ranges - sql

I have a data set that has timestamped entries over various sets of groups.
Timestamp -- Group -- Value
---------------------------
1 -- A -- 10
2 -- A -- 20
3 -- B -- 15
4 -- B -- 25
5 -- C -- 5
6 -- A -- 5
7 -- A -- 10
I want to sum these values by the Group field, but parsed as it appears in the data. For example, the above data would result in the following output:
Group -- Sum
A -- 30
B -- 40
C -- 5
A -- 15
I do not want this, which is all I've been able to come up with on my own so far:
Group -- Sum
A -- 45
B -- 40
C -- 5
Using Oracle 11g, this is what I've hobbled togther so far. I know that this is wrong, by I'm hoping I'm at least on the right track with RANK(). In the real data, entries with the same group could be 2 timestamps apart, or 100; there could be one entry in a group, or 100 consecutive. It does not matter, I need them separated.
WITH SUB_Q AS
(SELECT K_ID
, GRP
, VAL
-- GET THE RANK FROM TIMESTAMP TO SEPARATE GROUPS WITH SAME NAME
, RANK() OVER(PARTITION BY K_ID ORDER BY TMSTAMP) AS RNK
FROM MY_TABLE
WHERE K_ID = 123)
SELECT T1.K_ID
, T1.GRP
, SUM(CASE
WHEN T1.GRP = T2.GRP THEN
T1.VAL
ELSE
0
END) AS TOTAL_VALUE
FROM SUB_Q T1 -- MAIN VALUE
INNER JOIN SUB_Q T2 -- TIMSTAMP AFTER
ON T1.K_ID = T2.K_ID
AND T1.RNK = T2.RNK - 1
GROUP BY T1.K_ID
, T1.GRP
Is it possible to group in this way? How would I go about doing this?

I approach this problem by defining a group which is the different of two row_number():
select group, sum(value)
from (select t.*,
(row_number() over (order by timestamp) -
row_number() over (partition by group order by timestamp)
) as grp
from my_table t
) t
group by group, grp
order by min(timestamp);
The difference of two row numbers is constant for adjacent values.

A solution using LAG and windowed analytic functions:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TEST ( "Timestamp", "Group", Value ) AS
SELECT 1, 'A', 10 FROM DUAL
UNION ALL SELECT 2, 'A', 20 FROM DUAL
UNION ALL SELECT 3, 'B', 15 FROM DUAL
UNION ALL SELECT 4, 'B', 25 FROM DUAL
UNION ALL SELECT 5, 'C', 5 FROM DUAL
UNION ALL SELECT 6, 'A', 5 FROM DUAL
UNION ALL SELECT 7, 'A', 10 FROM DUAL;
Query 1:
WITH changes AS (
SELECT t.*,
CASE WHEN LAG( "Group" ) OVER ( ORDER BY "Timestamp" ) = "Group" THEN 0 ELSE 1 END AS hasChangedGroup
FROM TEST t
),
groups AS (
SELECT "Group",
VALUE,
SUM( hasChangedGroup ) OVER ( ORDER BY "Timestamp" ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS grp
FROM changes
)
SELECT "Group",
SUM( VALUE )
FROM Groups
GROUP BY "Group", grp
ORDER BY grp
Results:
| Group | SUM(VALUE) |
|-------|------------|
| A | 30 |
| B | 40 |
| C | 5 |
| A | 15 |

This is typical "star_of_group" problem (see here: https://timurakhmadeev.wordpress.com/2013/07/21/start_of_group/)
In your case, it would be as follows:
with t as (
select 1 timestamp, 'A' grp, 10 value from dual union all
select 2, 'A', 20 from dual union all
select 3, 'B', 15 from dual union all
select 4, 'B', 25 from dual union all
select 5, 'C', 5 from dual union all
select 6, 'A', 5 from dual union all
select 7, 'A', 10 from dual
)
select min(timestamp), grp, sum(value) sum_value
from (
select t.*
, sum(start_of_group) over (order by timestamp) grp_id
from (
select t.*
, case when grp = lag(grp) over (order by timestamp) then 0 else 1 end
start_of_group
from t
) t
)
group by grp_id, grp
order by min(timestamp)
;

Related

create date range from day based data

i have following source data...
id date value
1 01.08.22 a
1 02.08.22 a
1 03.08.22 a
1 04.08.22 b
1 05.08.22 b
1 06.08.22 a
1 07.08.22 a
2 01.08.22 a
2 02.08.22 a
2 03.08.22 c
2 04.08.22 a
2 05.08.22 a
and i would like to have the following output...
id date_from date_until value
1 01.08.22 03.08.22 a
1 04.08.22 05.08.22 b
1 06.08.22 07.08.22 a
2 01.08.22 02.08.22 a
2 03.08.22 03.08.22 c
2 04.08.22 05.08.22 a
Is this possible with Oracle SQL? Which functions do I need for this?

Based on the link provided by #astentx, try this solution:
SELECT
id, MIN("date") AS date_from, MAX("date") AS date_until, MAX(value) AS value
FROM (
SELECT
t1.*,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY "date") -
ROW_NUMBER() OVER(PARTITION BY id, value ORDER BY "date") AS rn
FROM yourtable t1
)
GROUP BY id, rn
See db<>fiddle

WITH CTE (id, dateD,valueD)
AS
(
SELECT 1, TO_DATE('01.08.22','DD.MM.YY'), 'a' FROM DUAL UNION ALL
SELECT 1, TO_DATE('02.08.22','DD.MM.YY'), 'a'FROM DUAL UNION ALL
SELECT 1, TO_DATE('03.08.22','DD.MM.YY'), 'a'FROM DUAL UNION ALL
SELECT 1, TO_DATE('04.08.22','DD.MM.YY'), 'b'FROM DUAL UNION ALL
SELECT 1, TO_DATE('05.08.22','DD.MM.YY'), 'b'FROM DUAL UNION ALL
SELECT 2, TO_DATE('01.08.22','DD.MM.YY'), 'a'FROM DUAL UNION ALL
SELECT 2, TO_DATE('02.08.22','DD.MM.YY'), 'a'FROM DUAL UNION ALL
SELECT 2, TO_DATE('03.08.22','DD.MM.YY'), 'c'FROM DUAL
)
SELECT C.ID,C.VALUED,MIN(C.DATED)AS MIN_DATE,MAX(C.DATED)AS MAX_DATE
FROM CTE C
GROUP BY C.ID,C.VALUED
ORDER BY C.ID
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=47c87d60445ce262cd371177e31d5d63

Oracle SQL row concatenation by periods: maximum period

I have the below table:
LAUFD
ID
NEXDT
ORDER_ROW
20140305
C1
20140310
14
20140226
C1
20140305
13
20131125
C1
20131126
12
20131021
C1
20131022
11
20130821
C1
20130828
10
20130814
C1
20130821
9
20130807
C1
20130814
8
20130731
C1
20130807
7
20130724
C1
20130731
6
20130710
C1
20130724
5
20130708
C1
20130709
4
20130624
C1
20130707
3
20130603
C1
20130608
2
20130527
C1
20130603
1
I would like to have the below output:
ID
START
END
C1
20140226
20140310
The logic is: if, ordering ID by order_row, the field NEXDT is equal or equal+1 or equal+2 to the field LAUFD of the next order_row, then continue with the next entry. If not, generate an entry in the output table with the start (earliest LAUFD) and end (latest NEXDT).
Basically, it's the same question as in Oracle SQL row concatenation by periods but I'd like just the latest period as an output.

Looks like this is what you need:
with t (LAUFD, ID, NEXDT, ORDER_ROW) as (
select 20140305,'C1', 20140310, 14 from dual union all
select 20140226,'C1', 20140305, 13 from dual union all
select 20131125,'C1', 20131126, 12 from dual union all
select 20131021,'C1', 20131022, 11 from dual union all
select 20130821,'C1', 20130828, 10 from dual union all
select 20130814,'C1', 20130821, 9 from dual union all
select 20130807,'C1', 20130814, 8 from dual union all
select 20130731,'C1', 20130807, 7 from dual union all
select 20130724,'C1', 20130731, 6 from dual union all
select 20130710,'C1', 20130724, 5 from dual union all
select 20130708,'C1', 20130709, 4 from dual union all
select 20130624,'C1', 20130707, 3 from dual union all
select 20130603,'C1', 20130608, 2 from dual union all
select 20130527,'C1', 20130603, 1 from dual
)
,t1 as (select id, order_row, to_date(laufd,'yyyymmdd') as laufd_dt, to_date(nexdt,'yyyymmdd') as nexdt_dt from t)
select *
from t1
match_recognize (
partition by id
order by order_row desc
measures
min(x.laufd_dt) as dt_start,
max(a.nexdt_dt) as dt_end,
x.laufd_dt-next(x.nexdt_dt) as dates_diff
one row per match
pattern(a x+ y* z*)
define
x as x.order_row=prev(order_row)-1 and prev(laufd_dt)-nexdt_dt<=3
,y as x.order_row=prev(order_row)-1
);

For just the latest period, you could use the previous solution. But instead, look for the first "break". Then only use the rows since that break;
select id, min(laufd), max(nextdt),
row_number() over (partition by id order by min(laufd)) as period
from (select t.*,
sum(case when prev_nextdt >= laufd - interval '2' day then 0 else 1 end) over
(partition by id order by order_row range desc) as grp,
sum(case when prev_nextdt >= laufd - interval '2' day then 0 else 1 end) over (partition by id) as num_grps
from (select t.id, t.order_row, -- any other columns you need
to_date(laufd, 'YYYYMMDD') as laufd,
to_date(nextdt, 'YYYYMMDD') as next_dt,
lag(to_date(nextdt, 'YYYYMMDD')) over (partition by id order by order_row) as prev_nextdt
from t
) t
) t
where num_grps = grp
group by id;
This is basically the same logic. It just keeps the first group.

select rows between two character values of a column

I have a table which shows as below:
S.No | Action
1 | New
2 | Dependent
3 | Dependent
4 | Dependent
5 | New
6 | Dependent
7 | Dependent
8 | New
9 | Dependent
10 | Dependent
I here want to select the rows between the first two 'New' values in the Action column, including the first row with the 'New' action. Like [New,New)
For example:
In this case, I want to select rows 1,2,3,4.
Please let me know how to do this.

Hmmm. Let's count up the cumulative number of times that New appears as a value and use that:
select t.*
from (select t.*,
sum(case when action = 'New' then 1 else 0 end) over (order by s_no) as cume_new
from t
) t
where cume_new = 1;

you can do some magic with analytic functions
1 select group of NEW actions, to get min and max s_no
2 select lead of 2 rows
3 select get between 2 sno (min and max)
with t as (
select 1 sno, 'New' action from dual union
select 2,'Dependent' from dual union
select 3,'Dependent' from dual union
select 4,'Dependent' from dual union
select 5,'New' from dual union
select 6,'Dependent' from dual union
select 7,'Dependent' from dual union
select 8,'New' from dual union
select 9,'Dependent' from dual union
select 10,'Dependent' from dual
)
select *
from (select *
from (select sno, lead(sno) over (order by sno) a
from ( select row_number() over (partition by action order by Sno) t,
t.sno
from t
where t.action = 'New'
) a
where t <=2 )
where a is not null) a, t
where t.sno >= a.sno and t.sno < a.a

Split a column into multiple columns

select distinct account_num from account order by account_num;
The above query gave the below result
account_num
1
2
4
7
12
18
24
37
45
59
I want to split the account_num column into tuple of three account_num's like (1,2,4);(7,12,18);(24,37,45),(59); The last tuple has only one entry as there are no more account_num's left. Now I want a query to output the min and max of each tuple. (please observe that the max of one tuple is less than the min of the next tuple). Output desired is shown below
1 4
7 18
24 45
59 59
Edit: I have explained my requirement in the best way I could

You can use the example below as a scratch, this is only based on information you have provided so far. For further documentation, you can consult Oracle's analytical functions docs:
with src as( --create a source data
select 1 col from dual union
select 2 from dual union
select 4 from dual union
select 7 from dual union
select 12 from dual union
select 18 from dual union
select 24 from dual union
select 37 from dual union
select 45 from dual union
select 59 from dual
)
select
col,
decode(col_2, 0, max_col, col_2) col_2 -- for the last row we get the maximum value for the row
from (
select
col,
lead(col, 2, 0) over (order by col) col_2, -- we get the values from from two rows behind
max(col) over () max_col, -- we get the max value to be used for the last row in the result
rownum rn from src -- we get the rownum to handle the final output
) where mod(rn - 1, 3) = 0 -- only get rows having a step of two

This is another solution.
SELECT *
FROM (SELECT DISTINCT MIN(val) over(PARTITION BY gr) min_,
MAX(val) over(PARTITION BY gr) max_
FROM (SELECT val,
decode(trunc(rn / 3), rn / 3, rn / 3, ceil(rn / 3)) gr
FROM (SELECT val,
row_number() over(ORDER BY val) rn
FROM (select distinct account_num from account order by account_num)))) ORDER BY min_
UPDATED
Solution without analytic function.
SELECT MIN(val) min_,
MAX(val) max_
FROM (SELECT val,
ceil(rn / 3) gr
FROM (SELECT val,
rownum rn
FROM A_DEL_ME)) GROUP BY gr

Please add more information on what you want to do. What is the connection between account_number 1 and number 4, 7 and 18? Is there any? If not, why would you want to split this into two columns and what is the rule for splitting it?
With what you have posted, you could do something like this:
select 1 as account_num, 4 as account_num1 from dual
union all select 7 as account_num, 18 as account_num1 from dual
...
and so on, but I don't see the use for this.

Get distinct rows based on priority?

I have a table as below.i am using oracle 10g.
TableA
------
id status
---------------
1 R
1 S
1 W
2 R
i need to get distinct ids along with their status. if i query for distinct ids and their status i get all 4 rows.
but i should get only 2. one per id.
here id 1 has 3 distinct statuses. here i should get only one row based on priority.
first priority is to 'S' , second priority to 'W' and third priority to 'R'.
in my case i should get two records as below.
id status
--------------
1 S
2 R
How can i do that? Please help me.
Thanks!

select
id,
max(status) keep (dense_rank first order by instr('SWR', status)) as status
from TableA
group by id
order by 1
fiddle

select id , status from (
select TableA.*, ROW_NUMBER()
OVER (PARTITION BY TableA.id ORDER BY DECODE(
TableA.status,
'S',1,
'W',2,
'R',3,
4)) AS row_no
FROM TableA)
where row_no = 1

This is first thing i would do, but there may be a better way.
Select id, case when status=1 then 'S'
when status=2 then 'W'
when status=3 then 'R' end as status
from(
select id, max(case when status='S' then 3
when status='W' then 2
when status='R' then 1
end) status
from tableA
group by id
);

To get it done you can write a similar query:
-- sample of data from your question
SQL> with t1(id , status) as (
2 select 1, 'R' from dual union all
3 select 1, 'S' from dual union all
4 select 1, 'W' from dual union all
5 select 2, 'R' from dual
6 )
7 select id -- actual query
8 , status
9 from ( select id
10 , status
11 , row_number() over(partition by id
12 order by case
13 when upper(status) = 'S'
14 then 1
15 when upper(status) = 'W'
16 then 2
17 when upper(status) = 'R'
18 then 3
19 end
20 ) as rn
21 from t1
22 ) q
23 where q.rn = 1
24 ;
ID STATUS
---------- ------
1 S
2 R

select id,status from
(select id,status,decode(status,'S',1,'W',2,'R',3) st from table) where (id,st) in
(select id,min(st) from (select id,status,decode(status,'S',1,'W',2,'R',3) st from table))

Something like this???
SQL> with xx as(
2 select 1 id, 'R' status from dual UNION ALL
3 select 1, 'S' from dual UNION ALL
4 select 1, 'W' from dual UNION ALL
5 select 2, 'R' from dual
6 )
7 select
8 id,
9 DECODE(
10 MIN(
11 DECODE(status,'S',1,'W',2,'R',3)
12 ),
13 1,'S',2,'W',3,'R') "status"
14 from xx
15 group by id;
ID s
---------- -
1 S
2 R
Here, logic is quite simple.
Do a DECODE for setting the 'Priority', then find the MIN (i.e. one with Higher Priority) value and again DECODE it back to get its 'Status'

Using MOD() example with added values:
SELECT id, val, distinct_val
FROM
(
SELECT id, val
, ROW_NUMBER() OVER (ORDER BY id) row_seq
, MOD(ROW_NUMBER() OVER (ORDER BY id), 2) even_row
, (CASE WHEN id = MOD(ROW_NUMBER() OVER (ORDER BY id), 2) THEN NULL ELSE val END) distinct_val
FROM
(
SELECT 1 id, 'R' val FROM dual
UNION
SELECT 1 id, 'S' val FROM dual
UNION
SELECT 1 id, 'W' val FROM dual
UNION
SELECT 2 id, 'R' val FROM dual
UNION -- comment below for orig data
SELECT 3 id, 'K' val FROM dual
UNION
SELECT 4 id, 'G' val FROM dual
UNION
SELECT 1 id, 'W' val FROM dual
))
WHERE distinct_val IS NOT NULL
/
ID VAL DISTINCT_VAL
--------------------------
1 S S
2 R R
3 K K
4 G G

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL Grouping by Ranges - sql

Related

create date range from day based data

Oracle SQL row concatenation by periods: maximum period

select rows between two character values of a column

Split a column into multiple columns

Get distinct rows based on priority?

Categories

Resources