I want to use AVG to get the average of some values, but ignoring the max and min values only if they are 1.5 bellow or above the second max and min values. I will put some examples:
Example 1:
SELECT *
FROM (
SELECT 100.5 v FROM DUAL UNION
SELECT 101.5 v FROM DUAL UNION
SELECT 103.1 v FROM DUAL ) D
I need this result, ignoring the 103.1 value:
100.5
101.5
Example 2:
SELECT *
FROM (
SELECT 100.5 v FROM DUAL UNION
SELECT 101.5 v FROM DUAL UNION
SELECT 103.1 v FROM DUAL UNION
SELECT 106.2 v FROM DUAL) D
I need this result, ignoring only the 106.2 value:
100.5
101.1
103.1
Example 3:
SELECT *
FROM (
SELECT 100.0 v FROM DUAL UNION
SELECT 102.0 v FROM DUAL UNION
SELECT 103.0 v FROM DUAL UNION
SELECT 105.0 v FROM DUAL UNION
SELECT 107.0 v FROM DUAL) D
I need this result, ignoring 100.0 and 107.0 values:
102.0
103.0
105.0
When there is only two values it doesnt matter.
With the right result, I can AVG(value) correctly.
You need a combination of analytic functions (lead/lag) and conditional aggregation. Here's what I came up with. Note that I allow for multiple groups, the "adjusted" average must be computed for each group separately (a common task in statistics when you must throw out the outliers in each group, when they exist):
with
inputs ( id, val ) as (
select 101, 33 from dual union all
select 102, 23 from dual union all
select 102, 22.8 from dual union all
select 103, 30 from dual union all
select 103, 40 from dual union all
select 104, 90 from dual union all
select 104, 92 from dual union all
select 104, 92 from dual union all
select 104, 91.5 from dual union all
select 104, 91.7 from dual
)
-- End of simulated inputs (for testing only, not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your actual table and column names.
select id,
avg ( case when cnt >= 3
and ( lag_val is null and lead_val - val >= 1.5
or
lead_val is null and val - lag_val >= 1.5
)
then null
else val
end
) as adjusted_avg_val
from (
select id, val, count(val) over (partition by id) as cnt,
lag ( val ) over ( partition by id order by val ) as lag_val,
lead ( val ) over ( partition by id order by val ) as lead_val
from inputs
)
group by id
;
Output:
ID ADJUSTED_AVG_VAL
--- ----------------
101 33
102 22.9
103 35
104 91.8
Try using the following combination of row_number lead and lag
with cte as (
SELECT 100.5 v FROM DUAL UNION ALL
SELECT 101.5 v FROM DUAL UNION ALL
SELECT 103.1 v FROM DUAL UNION ALL
SELECT 106.2 v FROM DUAL)
-- end of sample data
select avg(v)
from
(
select row_number() over (order by v desc) arn,
row_number() over (order by v) drn,
lag(v) over (order by v) av,
lead(v) over (order by v) dv,
v
from cte
) t
where (arn != 1 and drn != 1) or -- if they are no maximum nor minumum
(drn = 1 and v + 1.5 > dv) or -- if they are minimum
(arn = 1 and v - 1.5 < av) or -- if they are maximum
(av is null and arn < 3) or -- if there are just two ore one value
(dv is null and drn < 3) -- if there are just two ore one value
In SQL you simply just need to express the result, so ...
WITH D as(
SELECT 100.0 v FROM DUAL UNION
SELECT 102.0 FROM DUAL UNION
SELECT 103.0 FROM DUAL UNION
SELECT 105.0 FROM DUAL UNION
SELECT 107.0 FROM DUAL)
SELECT avg(v)
FROM D
where (v < (select max(v) from D )
and ((select max(v) from D )
-(select max(v) from D where v !=
(select max(v) from D ) ) > 1.5))
or
(v > (select min(v) from D )
and ((select min(v) from D )
+(select min(v) from D where v !=
(select min(v) from D ) ) > 1.5))
... should do the trick!
But thinking ahead... the below version may also be useful ;)
WITH D as(
SELECT 1 PK, 100.0 v FROM DUAL UNION
SELECT 1,102.0 FROM DUAL UNION
SELECT 1,103.0 FROM DUAL UNION
SELECT 1,105.0 FROM DUAL UNION
SELECT 1,107.0 FROM DUAL)
SELECT PK,avg(v)
FROM D
where (v < (select max(v) from D group by PK)
and ((select max(v) from D group by PK)
-(select max(v) from D where v !=
(select max(v) from D group by PK) group by PK) > 1.5))
or
(v > (select min(v) from D group by PK)
and ((select min(v) from D group by PK)
+(select min(v) from D where v !=
(select min(v) from D group by PK) group by PK) > 1.5))
GROUP BY PK
In real life though you would consider the execution plans of the above in a large dataset too (homework).
For any further clarifications, I am at your disposal via comments.
Sincerely,
Ted
Related
i have following source data...
id date value
1 01.08.22 a
1 02.08.22 a
1 03.08.22 a
1 04.08.22 b
1 05.08.22 b
1 06.08.22 a
1 07.08.22 a
2 01.08.22 a
2 02.08.22 a
2 03.08.22 c
2 04.08.22 a
2 05.08.22 a
and i would like to have the following output...
id date_from date_until value
1 01.08.22 03.08.22 a
1 04.08.22 05.08.22 b
1 06.08.22 07.08.22 a
2 01.08.22 02.08.22 a
2 03.08.22 03.08.22 c
2 04.08.22 05.08.22 a
Is this possible with Oracle SQL? Which functions do I need for this?
Based on the link provided by #astentx, try this solution:
SELECT
id, MIN("date") AS date_from, MAX("date") AS date_until, MAX(value) AS value
FROM (
SELECT
t1.*,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY "date") -
ROW_NUMBER() OVER(PARTITION BY id, value ORDER BY "date") AS rn
FROM yourtable t1
)
GROUP BY id, rn
See db<>fiddle
WITH CTE (id, dateD,valueD)
AS
(
SELECT 1, TO_DATE('01.08.22','DD.MM.YY'), 'a' FROM DUAL UNION ALL
SELECT 1, TO_DATE('02.08.22','DD.MM.YY'), 'a'FROM DUAL UNION ALL
SELECT 1, TO_DATE('03.08.22','DD.MM.YY'), 'a'FROM DUAL UNION ALL
SELECT 1, TO_DATE('04.08.22','DD.MM.YY'), 'b'FROM DUAL UNION ALL
SELECT 1, TO_DATE('05.08.22','DD.MM.YY'), 'b'FROM DUAL UNION ALL
SELECT 2, TO_DATE('01.08.22','DD.MM.YY'), 'a'FROM DUAL UNION ALL
SELECT 2, TO_DATE('02.08.22','DD.MM.YY'), 'a'FROM DUAL UNION ALL
SELECT 2, TO_DATE('03.08.22','DD.MM.YY'), 'c'FROM DUAL
)
SELECT C.ID,C.VALUED,MIN(C.DATED)AS MIN_DATE,MAX(C.DATED)AS MAX_DATE
FROM CTE C
GROUP BY C.ID,C.VALUED
ORDER BY C.ID
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=47c87d60445ce262cd371177e31d5d63
i have 2 columns in a table. Data looks like this
Folio_no | Flag
1145 R
201 S
1145 FR
300 E
1145 R
201 E
201 S
Expected Output:
Folio_No | Flag
1145 R
201 S
300 E
The output should give the folio_no along with the flag which occured maximum number of times for that particular folio number.
i tried doing the below but it throws an error
select folio_no, max(count(flag)) from table group by folio_no;
We can use an aggregation:
WITH cte AS (
SELECT Folio_No, Flag, COUNT(*) AS cnt
FROM yourTable
GROUP BY Folio_No, Flag
),
cte2 AS (
SELECT t.*, RANK() OVER (PARTITION BY Folio_No ORDER BY cnt DESC, Flag) rnk
FROM cte t
)
SELECT Folio_No, Flag
FROM cte2
WHERE rnk = 1;
Note that I assume should two flags within a given folio number be tied for the max frequency, that you want to report the earlier flag.
Here is a working demo.
If you want the flag(s) that have the maximum occurrence for each folio then you can use:
SELECT Folio_No, Flag
FROM (
SELECT Folio_No,
Flag,
RANK() OVER (PARTITION BY Folio_No ORDER BY COUNT(*) DESC) AS rnk
FROM table_name
GROUP BY Folio_No, Flag
)
WHERE rnk = 1;
Which, for the sample data:
CREATE TABLE table_name (folio_no, flag) AS
SELECT 1145, 'R' FROM DUAL UNION ALL
SELECT 201, 'S' FROM DUAL UNION ALL
SELECT 1145, 'FR' FROM DUAL UNION ALL
SELECT 300, 'E' FROM DUAL UNION ALL
SELECT 1145, 'R' FROM DUAL UNION ALL
SELECT 201, 'E' FROM DUAL UNION ALL
SELECT 201, 'S' FROM DUAL UNION ALL
SELECT 201, 'S' FROM DUAL UNION ALL
SELECT 1, 'A' FROM DUAL UNION ALL
SELECT 1, 'A' FROM DUAL UNION ALL
SELECT 1, 'B' FROM DUAL UNION ALL
SELECT 1, 'B' FROM DUAL UNION ALL
SELECT 1, 'C' FROM DUAL UNION ALL
SELECT 1, 'D' FROM DUAL;
Outputs:
FOLIO_NO
FLAG
1
A
1
B
201
S
300
E
1145
R
If you want only a single flag with the maximum occurrence for each folio, and if there are ties then the first folio alphabetically in each folio, then:
SELECT Folio_No, Flag
FROM (
SELECT Folio_No,
Flag,
ROW_NUMBER() OVER (PARTITION BY Folio_No ORDER BY COUNT(*) DESC, flag) AS rn
FROM table_name
GROUP BY Folio_No, Flag
)
WHERE rn = 1;
Which, for the sample data outputs:
FOLIO_NO
FLAG
1
A
201
S
300
E
1145
R
db<>fiddle here
I need a graceful way to select the max value from a field holding a comma delimited list.
Expected Values:
List_1 | Last
------ | ------
A,B,C | C
B,D,C | D
I'm using the following query and I'm not getting what's expected.
select
list_1,
(
select max(values) WITHIN GROUP (order by 1)
from (
select
regexp_substr(list_1,'[^,]+', 1, level) as values
from dual
connect by regexp_substr(list_1, '[^,]+', 1, level) is not null)
) as last
from my_table
Anyone have any ideas to fix my query?
with
test_data ( id, list_1 ) as (
select 101, 'A,B,C' from dual union all
select 102, 'B,D,C' from dual union all
select 105, null from dual union all
select 122, 'A' from dual union all
select 140, 'A,B,B' from dual
)
-- end of simulated table (for testing purposes only, not part of the solution)
select id, list_1, max(token) as max_value
from ( select id, list_1,
regexp_substr(list_1, '([^,])(,|$)', 1, level, null, 1) as token
from test_data
connect by level <= 1 + regexp_count(list_1, ',')
and prior id = id
and prior sys_guid() is not null
)
group by id, list_1
order by id
;
ID LIST_1_ MAX_VAL
---- ------- -------
101 A,B,C C
102 B,D,C D
105
122 A A
140 A,B,B B
In Oracle 12.1 or higher, this can be re-written using the LATERAL clause:
select d.id, d.list_1, x.max_value
from test_data d,
lateral ( select max(regexp_substr(list_1, '([^,]*)(,|$)',
1, level, null, 1)) as max_value
from test_data x
where x.id = d.id
connect by level <= 1 + regexp_count(list_1, ',')
) x
order by d.id
;
This might be a simple but I need to apply the logic in other:
WITH t(col) AS (
SELECT 1 FROM dual
UNION SELECT 2 FROM dual
UNION SELECT 3 FROM dual
UNION SELECT 4 FROM dual
UNION SELECT 5 FROM dual
)
SELECT col , --- will works as usual
(SELECT col FROM t WHERE col = outer_q.col) new_col, --working as well
(
SELECT sum (latest_col)
from
(
SELECT col latest_col FROM t WHERE col = outer_q.col
UNION ALL
SELECT col FROM t WHERE col = outer_q.col
)
)newest_col -- need to get an output "4"
from t outer_q where col = 2;
An simple output like:
COL NEW_COL NEWEST_COL
---------- ---------- ----------
2 2 4
I just need to use the outer most value to the inner I used for the third column
EDITING-- sample with more data:
WITH
t(col) AS
( SELECT 1 FROM dual
UNION
SELECT 2 FROM dual
UNION
SELECT 3 FROM dual
UNION
SELECT 4 FROM dual
UNION
SELECT 5 FROM dual
),
t1(amount, col) AS
(SELECT 100 , 2 FROM dual
UNION
SELECT 200, 3 FROM dual
)
SELECT col,
(SELECT col FROM t WHERE col = outer_q.col
) new_col,
(SELECT SUM(x)
FROM
(SELECT col x FROM t
UNION ALL
SELECT amount x FROM t1
)
WHERE col = outer_q.col
) newest_col -- gives 315 as it takes whole `SUM`
FROM t outer_q
WHERE col = 2;
An output is expected like:
COL NEW_COL NEWEST_COL
---------- ---------- ----------
2 2 102
Thanks in advance for any help.
Well, you can if you refactor a but your query:
WITH t(col) AS (
SELECT 1 FROM dual
UNION SELECT 2 FROM dual
UNION SELECT 3 FROM dual
UNION SELECT 4 FROM dual
UNION SELECT 5 FROM dual
)
SELECT col,
(SELECT col FROM t WHERE col = outer_q.col) new_col,
(SELECT sum (latest_col)
from
(
SELECT col latest_col FROM t
UNION ALL
SELECT col FROM t
) x
where x.latest_col = outer_q.col
) newest_col -- need to get an output "4"
from t outer_q where col = 2;
This is possible here because outer_q is now in the where clause of the sub-query. It was used before in the sub-sub-query (the one with the UNION ALL), and this one was hiding it.
To try to make things clearer, now we have something like:
with t as (...)
select col,
(SELECT col FROM t WHERE col = outer_q.col) new_col,
(SELECT col FROM (Something more complex) WHERE ... = outer_q.col) new_col,
from t outer_q where col = 2;
So we now have the same level of "interiority".
EDIT: to answer the updated question, there is a little adaptation needed:
WITH t(col) AS
(
SELECT 1 FROM dual
UNION
SELECT 2 FROM dual
UNION
SELECT 3 FROM dual
UNION
SELECT 4 FROM dual
UNION
SELECT 5 FROM dual
),
t1(amount, col) AS
(
SELECT 100, 2 FROM dual
UNION
SELECT 200, 3 FROM dual
)
SELECT col,
(SELECT col FROM t WHERE col = outer_q.col) new_col,
(SELECT SUM(amount)
FROM
(SELECT col, col amount FROM t -- row is (1, 1), then (2, 2) etc
UNION ALL
SELECT col, amount FROM t1 -- row is (2, 100), then (3, 200) etc
)
WHERE col = outer_q.col
) newest_col -- gives 102 as it takes whole `SUM`
FROM t outer_q
WHERE col = 2;
The part to understand is in the innermost query: you want to sum both the column and the amount value, so you repeat the col value as if it was an amount.
Another way to obtain the same result (with more performance, I guess) would be to sum col and amount on the same row:
WITH t(col) AS
(
SELECT 1 FROM dual
UNION
SELECT 2 FROM dual
UNION
SELECT 3 FROM dual
UNION
SELECT 4 FROM dual
UNION
SELECT 5 FROM dual
),
t1(amount, col) AS
(
SELECT 100, 2 FROM dual
UNION
SELECT 200, 3 FROM dual
)
SELECT col,
(SELECT col FROM t WHERE col = outer_q.col) new_col,
(SELECT SUM(all_amount)
FROM
(SELECT col, col + amount all_amount FROM t1)
WHERE col = outer_q.col
) newest_col -- gives 315 as it takes whole `SUM`
FROM t outer_q
WHERE col = 2;
The inner query fails because you tried to push the outer_q.col reference two levels down. Correlated query goes only 1 level down
Reference: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1853075500346799932
Are there any techniques that would allow a row set like this
WITH
base AS
(
SELECT 1 N FROM DUAL UNION ALL
SELECT 2 N FROM DUAL UNION ALL
SELECT 3 N FROM DUAL UNION ALL
SELECT 6 N FROM DUAL UNION ALL
SELECT 7 N FROM DUAL UNION ALL
SELECT 17 N FROM DUAL UNION ALL
SELECT 18 N FROM DUAL UNION ALL
SELECT 19 N FROM DUAL UNION ALL
SELECT 21 N FROM DUAL
)
SELECT a.N
FROM base a
to yield results
1 3
6 7
17 19
21 21
It is in effect a rows to ranges operation.
I'm playing in Oracle Land, and would appreciate any suggestions.
I feel like this can probably be improved on, but it works:
WITH base AS (
SELECT 1 N FROM DUAL UNION ALL
SELECT 2 N FROM DUAL UNION ALL
SELECT 3 N FROM DUAL UNION ALL
SELECT 6 N FROM DUAL UNION ALL
SELECT 7 N FROM DUAL UNION ALL
SELECT 17 N FROM DUAL UNION ALL
SELECT 18 N FROM DUAL UNION ALL
SELECT 19 N FROM DUAL UNION ALL
SELECT 21 N FROM DUAL
)
, lagged AS
(
SELECT n, LAG(n) OVER (ORDER BY n) lag_n FROM base
)
, groups AS
(
SELECT n, row_number() OVER (ORDER BY n) groupnum
FROM lagged
WHERE lag_n IS NULL OR lag_n < n-1
)
, grouped AS
(
SELECT n, (SELECT MAX(groupnum) FROM groups
WHERE groups.n <= base.n
) groupnum
FROM base
)
SELECT groupnum, MIN(n), MAX(n)
FROM grouped
GROUP BY groupnum
ORDER BY groupnum
Another way:
WITH base AS
(
SELECT 1 N FROM DUAL UNION ALL
SELECT 2 N FROM DUAL UNION ALL
SELECT 3 N FROM DUAL UNION ALL
SELECT 6 N FROM DUAL UNION ALL
SELECT 7 N FROM DUAL UNION ALL
SELECT 17 N FROM DUAL UNION ALL
SELECT 18 N FROM DUAL UNION ALL
SELECT 19 N FROM DUAL UNION ALL
SELECT 21 N FROM DUAL
)
select min(n), max(n) from
(
select n, connect_by_root n root from base
connect by prior n = n-1
start with n not in (select n from base b
where exists (select 1 from base b1 where b1.n = b.n-1)
)
)
group by root
order by root
Yet another way:
with base as (
select 1 n from dual union all
select 2 n from dual union all
select 3 n from dual union all
select 6 n from dual union all
select 7 n from dual union all
select 17 n from dual union all
select 18 n from dual union all
select 19 n from dual union all
select 21 n from dual)
select a,b
from (select a
,case when b is not null and a is not null
then b
else lead(n) over (order by n)
end b
from (select n
,a
,b
from (select n
,case n-1 when lag (n) over (order by n) then null else n end a
,case n+1 when lead (n) over (order by n) then null else n end b
from base)
where a is not null
or b is not null))
where a is not null
order by a