Oracle SQL - Displaying only net effect (unmatched) rows

Following is my sample table structure
Name Amount
A 100
A 100
A -100
A 100
A 100
A -100
B 10
A 100
There is no Primary Key in this table.
Desired Output:
Name Amount
A 100
A 100
B 10
A 100
Explanation:
I need to cancel out matching rows, i.e., one -100 nullifies one +100.
Therefore I need to display only the rows that are not offset / not nullified one to one.
This can be done in PL/SQL by populating the rows into a temporary table and deleting one positive row for every corresponding negative row. However, I need to do this on the fly using SQL statements.
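For reference, the temporary-table approach just described might look roughly like this (a sketch only; temp_t is a hypothetical scratch table with the same Name/Amount columns, and the source table is called t here):
BEGIN
  DELETE FROM temp_t;
  INSERT INTO temp_t SELECT name, amount FROM t;

  -- for each negative row, delete it together with one offsetting positive row
  FOR r IN (SELECT name, amount FROM t WHERE amount < 0) LOOP
    DELETE FROM temp_t
    WHERE name = r.name AND amount = r.amount AND ROWNUM = 1;   -- the negative row

    DELETE FROM temp_t
    WHERE name = r.name AND amount = -r.amount AND ROWNUM = 1;  -- one matching positive row
  END LOOP;
END;
/
-- temp_t now holds only the rows that are not offset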
Regards,
Raghu

You can enumerate the rows using row_number() and then use that to "cancel" them:
select t.name, t.amount
from (select t.*,
             sum(amount) over (partition by name, abs(amount), seqnum) as sum_amount
      from (select t.*,
                   row_number() over (partition by name, amount order by name) as seqnum
            from t
           ) t
     ) t
where sum_amount <> 0;
Here is a db<>fiddle.

You can give each row a ROW_NUMBER unique within its name/amount pair, then count how many rows share each name/ABS(amount)/ROW_NUMBER combination, and discard the rows where there are two (one positive and one negative):
SELECT name,
       amount
FROM   (
         SELECT name,
                amount,
                COUNT( amount ) OVER ( PARTITION BY name, ABS( amount ), rn ) AS num_matches
         FROM   (
                  SELECT t.*,
                         ROW_NUMBER() OVER ( PARTITION BY name, amount ORDER BY ROWNUM ) AS rn
                  FROM   table_name t
                )
       )
WHERE  num_matches = 1
So, for your sample data:
CREATE TABLE table_name ( Name, Amount ) AS
SELECT 'A', +100 FROM DUAL UNION ALL
SELECT 'A', +100 FROM DUAL UNION ALL
SELECT 'A', -100 FROM DUAL UNION ALL
SELECT 'A', +100 FROM DUAL UNION ALL
SELECT 'A', +100 FROM DUAL UNION ALL
SELECT 'A', -100 FROM DUAL UNION ALL
SELECT 'B', +10 FROM DUAL UNION ALL
SELECT 'A', +100 FROM DUAL;
This outputs:
NAME | AMOUNT
:--- | -----:
A | 100
A | 100
A | 100
B | 10
db<>fiddle here

If there are never more negative values than positive ones, it's a task for EXCEPT ALL. Oracle doesn't support it, but this is a rewrite:
select name, amount
from
(
  select name, amount, row_number() over (partition by name, amount order by amount)
  from tab
  where amount > 0
  minus
  select name, -amount, row_number() over (partition by name, amount order by amount)
  from tab
  where amount < 0
) dt
or
with cte as
(
  select name, amount, row_number() over (partition by name, amount order by amount) as rn
  from tab
)
select name, amount
from
(
  select name, amount, rn
  from cte
  where amount > 0
  minus
  select name, -amount, rn
  from cte
  where amount < 0
) dt
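For comparison, in a database that does support EXCEPT ALL (PostgreSQL, for instance -- this is not valid Oracle syntax), the same idea would be simply:
select name, amount
from tab
where amount > 0
except all
select name, -amount
from tab
where amount < 0;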

Related

Count distinct in rolling window function Oracle error ORA-30487 [duplicate]

I have the task of determining whether events on three different accounts occur within a 1-hour window.
The solution could be something like
count(distinct account_id) over (order by time_key range between 20 PRECEDING and CURRENT ROW)
and check that the count is >= 3.
But Oracle can't use DISTINCT in an analytic function that has an ORDER BY clause:
ORA-30487: ORDER BY not allowed here
I have the solution below, but it seems clumsy:
with t_data as (
  select 1 as account_id, 1000 as time_key from dual union
  select 1 as account_id, 1010 as time_key from dual union
  select 1 as account_id, 1020 as time_key from dual union
  select 1 as account_id, 1030 as time_key from dual union
  select 2 as account_id, 1040 as time_key from dual union
  select 3 as account_id, 1050 as time_key from dual union
  select 3 as account_id, 1060 as time_key from dual union
  select 3 as account_id, 1070 as time_key from dual union
  select 3 as account_id, 1080 as time_key from dual union
  select 3 as account_id, 1090 as time_key from dual
  order by time_key
)
select *
from (
  select account_id,
         time_key,
         max(case when account_id = 1 then 1 else 0 end)
           over (order by time_key range between 20 PRECEDING and CURRENT ROW) as m1,
         max(case when account_id = 2 then 1 else 0 end)
           over (order by time_key range between 20 PRECEDING and CURRENT ROW) as m2,
         max(case when account_id = 3 then 1 else 0 end)
           over (order by time_key range between 20 PRECEDING and CURRENT ROW) as m3
  from t_data
)
where m1 = 1 and m2 = 1 and m3 = 1
What is a simpler way to determine the number of distinct accounts in a sliding window?
It is not immediately obvious to me how you do this with window functions. You can use a correlated subquery:
select t.*,
       (select count(distinct t2.account_id)
        from t_data t2
        where t2.time_key >= t.time_key - 20 and t2.time_key <= t.time_key
       )
from t_data t;
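If you only want the rows where at least three distinct accounts fall inside the window, you can wrap the same correlated subquery and filter on it (a sketch; the num_accounts alias is added here only so the outer WHERE can reference it):
select *
from (select t.*,
             (select count(distinct t2.account_id)
              from t_data t2
              where t2.time_key >= t.time_key - 20 and t2.time_key <= t.time_key
             ) as num_accounts
      from t_data t
     )
where num_accounts >= 3;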
Another method -- which could conceivably have better performance -- is to treat the problem as a gaps-and-islands problem. The following version returns the number of simultaneous distinct accounts at each time key:
with t as (
  select account_id, min(time_key) as min_time_key, max(time_key + 20) as max_time_key
  from (select t.*,
               sum(case when time_key - prev_time_key <= 20 then 0 else 1 end) over (order by time_key) as grp
        from (select t.*,
                     lag(time_key) over (partition by account_id order by time_key) as prev_time_key
              from t_data t
             ) t
       ) t
  group by account_id
)
select td.account_id, td.time_key, count(distinct t.account_id) as num_distinct
from t_data td join
     t
     on td.time_key between t.min_time_key and t.max_time_key
group by td.account_id, td.time_key;
Finally, if you have only 3 (or 2) account ids that you want to find and you only care about getting some examples where the max is hit, then you can do the following:
select t.*
from (select t.*,
             min(account_id) over (order by time_key range between 20 preceding and 1 preceding) as min_account_id,
             max(account_id) over (order by time_key range between 20 preceding and 1 preceding) as max_account_id
      from t_data t
     ) t
where min_account_id <> max_account_id and
      account_id <> min_account_id and
      account_id <> max_account_id;
This gets the max and min account ids from the preceding 20 rows -- excluding the current row. If these are different from the current value, then you have three different values.
Here is a super-simple way to do it. We can work on the performance, maybe if you want to post some details about the size of your table.
select t1.account_id, t1.time_key, count(distinct t2.account_id) cnt
from t_data t1 cross join t_data t2
where t2.time_key between t1.time_key - 20 and t1.time_key
group by t1.account_id, t1.time_key
having count(distinct t2.account_id) >= 3;
If you are really hell-bent on using only a single windowing clause, here is a way:
with product_of_primes as (
  select t.*,
         round(exp(sum(ln(decode(account_id,1,2,2,3,3,5)))
               over (order by time_key range between 20 preceding and current row))) as product
  from t_data t
)
select account_id, time_key from product_of_primes
where mod(product, 2*3*5) = 0;
Explanation:
Convert each distinct account_id into a prime number. So, the 1st account_id gets 2, the next gets 3, the next gets 5.
Take the natural log of that number
Sum the natural logs for all the events in the last hour (i.e., in our window), remembering that ln(a)+ln(b) = ln(a*b)
Take e to the power of the sum
(So far, this is just a long-winded way to multiply all the prime numbers we mapped our account_ids to.)
Any row where this result is evenly divisible by all three prime numbers we used (2, 3, 5 -- so, divisible by 30) has all three distinct account_ids in its window.
If you were on my team and you wrote this, I would kill you.
Full example with data:
with t_data as (
select 1 as account_id, 1000 as time_key from dual union
select 1 as account_id, 1010 as time_key from dual union
select 1 as account_id, 1020 as time_key from dual union
select 1 as account_id, 1030 as time_key from dual union
select 2 as account_id, 1040 as time_key from dual union
select 3 as account_id, 1050 as time_key from dual union
select 3 as account_id, 1060 as time_key from dual union
select 3 as account_id, 1070 as time_key from dual union
select 3 as account_id, 1080 as time_key from dual union
select 3 as account_id, 1090 as time_key from dual
order by time_key
),
product_of_primes as (
  select t.*,
         round(exp(sum(ln(decode(account_id,1,2,2,3,3,5)))
               over (order by time_key range between 20 preceding and current row))) as product
  from t_data t
)
select account_id, time_key from product_of_primes
where mod(product, 2*3*5) = 0;
Results:
+------------+----------+---------+
| ACCOUNT_ID | TIME_KEY | PRODUCT |
+------------+----------+---------+
| 3 | 1050 | 30 |
+------------+----------+---------+

Running count but reset on some column value in select query

I want to compute a running count, but it should reset whenever a specific column value appears.
Here is my select statement:
with tbl(emp,salary,ord) as
(
select 'A',1000,1 from dual union all
select 'B',1000,2 from dual union all
select 'K',1000,3 from dual union all
select 'A',1000,4 from dual union all
select 'B',1000,5 from dual union all
select 'D',1000,6 from dual union all
select 'B',1000,7 from dual
)
select * from tbl
I want to reset the count on emp 'B': whenever the column value is 'B', the count resets to 0 starting from the next row and then increments by 1 again:
emp salary ord running_count
A 1000 1 0
B 1000 2 1
K 1000 3 0
A 1000 4 1
B 1000 5 2
D 1000 6 0
B 1000 7 1
Here order column is ord.
I want to achieve the whole thing with a SELECT statement, not with a cursor.
You want to define groups where the counting takes place. Within a group, the solution is row_number().
You can define the group by doing a cumulative sum of 'B' values. Because 'B' ends a group, you want to count the number of 'B' rows at or after each record (hence the sum in descending ord order).
This results in:
select t.*,
       row_number() over (partition by grp order by ord) - 1 as running_count
from (select t.*,
             sum(case when emp = 'B' then 1 else 0 end) over (order by ord desc) as grp
      from tbl t
     ) t;

How can I select rows with MAX(Column) when another column has distinct values in oracle?

I have four columns like this.
Material Description Quantity Date
a 133 200 26-09-2016 12:33
a 133 400 27-09-2016 10:33
I need to take the quantity for that material at its MAX(Date).
I tried this, but because the quantity values are distinct it shows both rows.
Select material , description , quantity , max(date)
FROM materials
group by material, description , quantity
Use that condition in a WHERE clause, like:
Select material , description , quantity
FROM materials
WHERE "Date" = (select max("Date") from materials)
Use the RANK() analytic function:
SELECT *
FROM   (
         SELECT material,
                description,
                quantity,
                "Date",
                RANK() OVER ( PARTITION BY material ORDER BY "Date" DESC ) AS rnk
         FROM   materials
       )
WHERE  rnk = 1;
This will get multiple rows if there are rows with the same material and maximum date value - if you only want a single row then use ROW_NUMBER() instead of RANK().
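A minimal sketch of that ROW_NUMBER() variant (identical apart from the analytic function; the date column is written as "Date" on the assumption that it is a quoted identifier, as in the answer above):
SELECT *
FROM   (
         SELECT material,
                description,
                quantity,
                "Date",
                ROW_NUMBER() OVER ( PARTITION BY material ORDER BY "Date" DESC ) AS rn
         FROM   materials
       )
WHERE  rn = 1;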
You can use row_number() like this (I've added material 'b', in case you need to find the quantity for every material, both 'a' and 'b'):
WITH a(Material, Description , Quantity , sDate) AS
(SELECT 'b', 133, 1200 , to_date('26-09-2016 12:33','dd-mm-yyyy hh24:mi') FROM dual UNION ALL
SELECT 'b', 133, 2200 , to_date('29-09-2016 12:33','dd-mm-yyyy hh24:mi') FROM dual UNION ALL
SELECT 'a', 133, 200 , to_date('26-09-2016 12:33','dd-mm-yyyy hh24:mi') FROM dual UNION ALL
SELECT 'a', 133, 400 , to_date('27-09-2016 10:33','dd-mm-yyyy hh24:mi') FROM dual )
SELECT *
FROM (SELECT a.*,
row_number() over(partition BY material order by sdate DESC) rn
FROM a)
WHERE rn = 1
MATERIAL DESCRIPTION QUANTITY SDATE RN
-------- ----------- ---------- --------- ----------
a 133 400 27-SEP-16 1
b 133 2200 29-SEP-16 1

select rows between two character values of a column

I have a table which shows as below:
S.No | Action
1 | New
2 | Dependent
3 | Dependent
4 | Dependent
5 | New
6 | Dependent
7 | Dependent
8 | New
9 | Dependent
10 | Dependent
Here I want to select the rows between the first two 'New' values in the Action column, including the first 'New' row but excluding the second, i.e. [New, New).
For example:
In this case, I want to select rows 1,2,3,4.
Please let me know how to do this.
Hmmm. Let's count up the cumulative number of times that New appears as a value and use that:
select t.*
from (select t.*,
sum(case when action = 'New' then 1 else 0 end) over (order by s_no) as cume_new
from t
) t
where cume_new = 1;
You can do some magic with analytic functions:
1. select the 'New' actions, to get the relevant sno values
2. use LEAD to pair each 'New' sno with the next one, keeping only the first two
3. select the rows between those two sno values (>= the first, < the second)
with t as (
  select 1 sno, 'New' action from dual union
  select 2,'Dependent' from dual union
  select 3,'Dependent' from dual union
  select 4,'Dependent' from dual union
  select 5,'New' from dual union
  select 6,'Dependent' from dual union
  select 7,'Dependent' from dual union
  select 8,'New' from dual union
  select 9,'Dependent' from dual union
  select 10,'Dependent' from dual
)
select *
from (select *
      from (select sno, lead(sno) over (order by sno) a
            from (select row_number() over (partition by action order by sno) t,
                         t.sno
                  from t
                  where t.action = 'New'
                 ) a
            where t <= 2
           )
      where a is not null
     ) a, t
where t.sno >= a.sno and t.sno < a.a

SQL Grouping by Ranges

I have a data set that has timestamped entries over various sets of groups.
Timestamp -- Group -- Value
---------------------------
1 -- A -- 10
2 -- A -- 20
3 -- B -- 15
4 -- B -- 25
5 -- C -- 5
6 -- A -- 5
7 -- A -- 10
I want to sum these values by the Group field, but as separate consecutive runs, in the order they appear in the data. For example, the above data would result in the following output:
Group -- Sum
A -- 30
B -- 40
C -- 5
A -- 15
I do not want this, which is all I've been able to come up with on my own so far:
Group -- Sum
A -- 45
B -- 40
C -- 5
Using Oracle 11g, this is what I've hobbled together so far. I know that this is wrong, but I'm hoping I'm at least on the right track with RANK(). In the real data, entries with the same group could be 2 timestamps apart, or 100; there could be one entry in a group, or 100 consecutive. It does not matter, I need them separated.
WITH SUB_Q AS
  (SELECT K_ID
        , GRP
        , VAL
        -- GET THE RANK FROM TIMESTAMP TO SEPARATE GROUPS WITH SAME NAME
        , RANK() OVER (PARTITION BY K_ID ORDER BY TMSTAMP) AS RNK
   FROM MY_TABLE
   WHERE K_ID = 123)
SELECT T1.K_ID
     , T1.GRP
     , SUM(CASE WHEN T1.GRP = T2.GRP THEN T1.VAL ELSE 0 END) AS TOTAL_VALUE
FROM SUB_Q T1         -- MAIN VALUE
INNER JOIN SUB_Q T2   -- TIMESTAMP AFTER
   ON T1.K_ID = T2.K_ID
  AND T1.RNK = T2.RNK - 1
GROUP BY T1.K_ID
     , T1.GRP
Is it possible to group in this way? How would I go about doing this?
I approach this problem by defining a group which is the difference of two row_number() values:
select group, sum(value)
from (select t.*,
             (row_number() over (order by timestamp) -
              row_number() over (partition by group order by timestamp)
             ) as grp
      from my_table t
     ) t
group by group, grp
order by min(timestamp);
The difference of two row numbers is constant for adjacent values.
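To see why, here are the two row numbers and their difference, worked out by hand for the sample data above (rn_overall is row_number() over the whole set, rn_in_group is row_number() within the Group):
Timestamp -- Group -- rn_overall -- rn_in_group -- grp (difference)
1 -- A -- 1 -- 1 -- 0
2 -- A -- 2 -- 2 -- 0
3 -- B -- 3 -- 1 -- 2
4 -- B -- 4 -- 2 -- 2
5 -- C -- 5 -- 1 -- 4
6 -- A -- 6 -- 3 -- 3
7 -- A -- 7 -- 4 -- 3
Each consecutive run gets its own constant difference (A: 0, B: 2, C: 4, A again: 3), so grouping by Group and grp separates the runs.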
A solution using LAG and windowed analytic functions:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TEST ( "Timestamp", "Group", Value ) AS
SELECT 1, 'A', 10 FROM DUAL
UNION ALL SELECT 2, 'A', 20 FROM DUAL
UNION ALL SELECT 3, 'B', 15 FROM DUAL
UNION ALL SELECT 4, 'B', 25 FROM DUAL
UNION ALL SELECT 5, 'C', 5 FROM DUAL
UNION ALL SELECT 6, 'A', 5 FROM DUAL
UNION ALL SELECT 7, 'A', 10 FROM DUAL;
Query 1:
WITH changes AS (
  SELECT t.*,
         CASE WHEN LAG( "Group" ) OVER ( ORDER BY "Timestamp" ) = "Group" THEN 0 ELSE 1 END AS hasChangedGroup
  FROM   TEST t
),
groups AS (
  SELECT "Group",
         VALUE,
         SUM( hasChangedGroup ) OVER ( ORDER BY "Timestamp" ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS grp
  FROM   changes
)
SELECT   "Group",
         SUM( VALUE )
FROM     groups
GROUP BY "Group", grp
ORDER BY grp
Results:
| Group | SUM(VALUE) |
|-------|------------|
| A | 30 |
| B | 40 |
| C | 5 |
| A | 15 |
This is a typical "start_of_group" problem (see here: https://timurakhmadeev.wordpress.com/2013/07/21/start_of_group/)
In your case, it would be as follows:
with t as (
  select 1 timestamp, 'A' grp, 10 value from dual union all
  select 2, 'A', 20 from dual union all
  select 3, 'B', 15 from dual union all
  select 4, 'B', 25 from dual union all
  select 5, 'C', 5 from dual union all
  select 6, 'A', 5 from dual union all
  select 7, 'A', 10 from dual
)
select min(timestamp), grp, sum(value) sum_value
from (
  select t.*
       , sum(start_of_group) over (order by timestamp) grp_id
  from (
    select t.*
         , case when grp = lag(grp) over (order by timestamp) then 0 else 1 end start_of_group
    from t
  ) t
)
group by grp_id, grp
order by min(timestamp);