Oracle Group by based on next row value - sql

We are trying to get a GROUP BY result by checking the next row's value.
Sample Data:
Table A
COL1 COL2 COL3
---- ---- ----
B BUY 1
B SELL 1.2
B SELL 2
C BUY 3
C SELL 4
C BUY 5
Result:
COL1 COL2 COUNT(1)
---- ---- --------
B BUY 1
B SELL 2
C BUY 1
C SELL 1
C BUY 1

You appear to have ordered by COL3; if this is the case then:
SELECT col1,
col2,
change - COALESCE( LAG( change ) OVER ( PARTITION BY col1 ORDER BY change ), 0 )
AS cnt
FROM (
SELECT col1,
col2,
CASE LEAD( col2 ) OVER ( PARTITION BY col1 ORDER BY col3 )
WHEN col2
THEN NULL
ELSE ROW_NUMBER() OVER ( PARTITION BY col1 ORDER BY col3 )
END AS change
FROM a
)
WHERE change IS NOT NULL;

If I understand correctly, you can do this with a difference of row numbers approach:
select col1, col2, count(*)
from (select t.*,
row_number() over (partition by col1 order by col3) as seqnum,
row_number() over (partition by col1, col2 order by col3) as seqnum_2
from t
) t
group by col1, col2, (seqnum - seqnum_2);
This identifies groups of adjacent col2 values based on the ordering in col3.
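To see why the difference of the two row numbers identifies each run, here is a minimal sketch that runs the same query against the question's sample data. It uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption; it needs SQLite 3.25 or later for window functions), and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# In-memory stand-in for the question's table A (SQLite, not Oracle).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (col1 TEXT, col2 TEXT, col3 REAL)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [("B", "BUY", 1), ("B", "SELL", 1.2), ("B", "SELL", 2),
     ("C", "BUY", 3), ("C", "SELL", 4), ("C", "BUY", 5)],
)

# seqnum - seqnum_2 stays constant within each run of identical col2 values,
# so grouping by it separates the two BUY runs for col1 = 'C'.
query = """
SELECT col1, col2, COUNT(*) AS cnt
FROM (SELECT a.*,
             ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col3) AS seqnum,
             ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3) AS seqnum_2
      FROM a) AS t
GROUP BY col1, col2, (seqnum - seqnum_2)
ORDER BY col1, MIN(col3)
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

With the sample data this should reproduce the five rows of the expected result, including the two separate BUY groups for C.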

Related

Select equal number of random rows with respect to a column

Suppose I have a table like this:
Col1 || Col2
-------------
a || 0
b || 0
c || 1
d || 1
e || 0
How can I select rows from it so that I have an equal number of 1s and 0s? The following would be one possible result:
Col1 || Col2
-------------
a || 0
c || 1
d || 1
e || 0
The rows left out can be chosen at random, and deleting from the existing table would work as well.
For each col2 partition, you can give each row a row number and then find those rows where there is only one instance of the row number and delete them:
DELETE FROM table_name
WHERE ROWID IN (
SELECT MIN(ROWID)
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY col2 ORDER BY DBMS_RANDOM.VALUE)
AS rn
FROM table_name
)
GROUP BY rn
HAVING COUNT(*) < 2
);
If you just want to SELECT the rows then you can use a similar technique:
SELECT col1, col2
FROM (
SELECT col1,
col2,
COUNT(*) OVER (PARTITION BY rn) AS cnt
FROM (
SELECT col1,
col2,
ROW_NUMBER() OVER (PARTITION BY col2 ORDER BY DBMS_RANDOM.VALUE)
AS rn
FROM table_name
)
)
WHERE cnt = 2;
How can I select rows from it so that I have equal number of 1s and 0s?
Yet another option might be to count the COL2 values and use the lesser of the two counts (as the final result has to have an equal number of 0s and 1s) in a UNION set operation. Something like this:
Sample data:
SQL> select * from test;
COL1 COL2
---- ----------
a 0
b 0
c 1
d 1
e 0
Query & result:
SQL> with cnts as
-- count rows by COL2 value
(select sum(case when col2 = 0 then 1 else 0 end) cnt_0,
sum(case when col2 = 1 then 1 else 0 end) cnt_1
from test
)
select t.* from test t cross join cnts c
where t.col2 = 0 and rownum <= least(c.cnt_0, c.cnt_1)
union all
select t.* from test t cross join cnts c
where t.col2 = 1 and rownum <= least(c.cnt_0, c.cnt_1);
COL1 COL2
---- ----------
a 0
b 0
c 1
d 1
SQL>
You can do this with only one subquery/CTE. The following returns the smaller of the counts of 0s and 1s (which determines the number of rows returned for each value):
least( sum(col2), sum(1 - col2) ) as num_rows
Then, you can incorporate this into a window function with row_number():
select col1, col2
from (select t.*,
least(sum(col2) over (), sum(1-col2) over ()) as num_rows,
row_number() over (partition by col2 order by dbms_random.value) as seqnum
from t
) t
where seqnum <= num_rows;
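A minimal sketch of the same idea, run against the question's sample data with Python's built-in sqlite3 module instead of Oracle. The substitutions are assumptions made for the sketch: SQLite's two-argument MIN() stands in for LEAST(), RANDOM() stands in for dbms_random.value, and the two window sums are kept as separate columns for readability. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 0), ("b", 0), ("c", 1), ("d", 1), ("e", 0)],
)

# n_ones / n_zeros are table-wide counts; seqnum numbers the rows of each
# col2 value in random order, so keeping seqnum <= the smaller count
# returns the same number of 0s and 1s.
query = """
SELECT col1, col2
FROM (SELECT t.*,
             SUM(col2) OVER () AS n_ones,
             SUM(1 - col2) OVER () AS n_zeros,
             ROW_NUMBER() OVER (PARTITION BY col2 ORDER BY RANDOM()) AS seqnum
      FROM t) AS s
WHERE seqnum <= MIN(n_ones, n_zeros)
"""
balanced = conn.execute(query).fetchall()
zeros = [r for r in balanced if r[1] == 0]
ones = [r for r in balanced if r[1] == 1]
print(len(zeros), len(ones))
```

Which of the three 0 rows is dropped varies from run to run, but the two counts should always match.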
Use window functions to compute, for each row, a row number within its col2 partition and the frequency of its col2 value. Then take the minimum frequency and keep the rows whose row number is less than or equal to it.
with data as
(
select t.*, row_number() over (partition by col2 order by dbms_random.value) as rn, count(*) over (partition by col2) as freq from test t
),
data2 as
(
select min(freq) as cnt from data
)
select col1, col2 from data, data2 where rn <= cnt;
This analytic function checks whether there are more zeroes or ones in the table:
sum(decode(col2,0,-1,col2)) over()
Depending on the result, compute a cumulative sum that starts with the col2 value that appears less often, mapping it (using decode) to -1 and the other value to 1.
The filter cum_sum <= 0 then returns the same number of 0s and 1s.
with t1 as (
select
col1, col2,
case when sum(decode(col2,0,-1,col2)) over() <= 0 then
/* more zeroes */
sum(decode(col2,0,1,1,-1)) over(order by col2 desc, col1)
else
sum(decode(col2,0,-1,col2)) over(order by col2 , col1)
end as cum_sum
from tab)
select col1, col2
from t1
where cum_sum <= 0;

How to keep track of values which are present in a group as well as in all previous group in oracle SQL?

Let's say I have a table with col1 and col2
I group by col1 and order by col1
From the first group, I want to have all values of col2 but from the second group, I want to have only those values which were present in the first group and so on with the consecutive groups.
sample table
col1 col2
1 A
1 B
1 C
1 D
2 E
2 A
2 B
2 G
3 B
3 D
And the output should be
col1 col2
1 A
1 B
1 C
1 D
2 A
2 B
3 B
You can use window functions to avoid reading the same table twice:
Number the groups to make sure to have 1, 2, 3, ... without gaps.
Get a rolling count of each col2 value, in other words the cumulative number of its appearances so far.
Only show rows where the group number equals the count.
The query:
select col1, col2
from
(
select
col1, col2,
dense_rank() over (order by col1) as rn,
count(*) over (partition by col2 order by col1) as cnt
from mytable
) numbered_and_counted
where rn = cnt
order by col1, col2;
Demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=f0cc6a211a1a4c767c9e3ce9deb8c28f
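For readers without an Oracle instance, the three steps can be sketched with Python's built-in sqlite3 module (an assumption; it requires SQLite 3.25 or later for window functions). The table name and data are taken from the question and answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "C"), (1, "D"),
     (2, "E"), (2, "A"), (2, "B"), (2, "G"),
     (3, "B"), (3, "D")],
)

# rn  = gapless group number; cnt = how many groups up to and including this
# one contain the col2 value. A value has appeared in every group so far
# exactly when the two are equal.
query = """
SELECT col1, col2
FROM (SELECT col1, col2,
             DENSE_RANK() OVER (ORDER BY col1) AS rn,
             COUNT(*) OVER (PARTITION BY col2 ORDER BY col1) AS cnt
      FROM mytable) AS numbered_and_counted
WHERE rn = cnt
ORDER BY col1, col2
"""
kept = conn.execute(query).fetchall()
for row in kept:
    print(row)
```

Note that D is dropped from group 3: it appears in groups 1 and 3 but not 2, so its count (2) does not reach the group number (3).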

If two rows have same id but different col2, how can you keep only the ones that have max col3?

I have a table with four columns (id, col2, col3, col4) where col2 is A or B and col3 and col4 are integers. My problem is, there are many rows that have the same id and a different col2 value, and I want to select ONLY the rows that have a maximum value in col3.
For instance, if we have:
id | col2 | col3 | col4
1 | A | 3 | 2
1 | B | 5 | 3
2 | A | 6 | 2
...
I want to keep only the tuple (1, B, 5, 3). How can I achieve this?
I've tried this:
SELECT id, col2, MAX(col3), col4 FROM t GROUP BY id;
but I get an error saying that this is not a valid GROUP BY statement.
You can use keep:
SELECT id,
MAX(col2) KEEP (DENSE_RANK FIRST ORDER BY col3 DESC) as col2,
MAX(col3),
MAX(col4) KEEP (DENSE_RANK FIRST ORDER BY col3 DESC) as col4
FROM t
GROUP BY id;
Or:
SELECT id, col2, col3, col4
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY col3 DESC) as seqnum
FROM t
) t
WHERE seqnum = 1;
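As a quick check of the ROW_NUMBER() variant, here is a minimal sketch using Python's built-in sqlite3 module in place of Oracle (an assumption; SQLite 3.25 or later is needed, and it has no equivalent of KEEP, so only the second query is shown). The rows are the three from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col2 TEXT, col3 INTEGER, col4 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, "A", 3, 2), (1, "B", 5, 3), (2, "A", 6, 2)],
)

# Within each id, the row with the largest col3 gets seqnum = 1.
query = """
SELECT id, col2, col3, col4
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY col3 DESC) AS seqnum
      FROM t) AS s
WHERE seqnum = 1
ORDER BY id
"""
top_rows = conn.execute(query).fetchall()
for row in top_rows:
    print(row)
```

This keeps one row per id, so id 2 is returned as well; if two rows of the same id tie on col3, ROW_NUMBER() picks one of them arbitrarily.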
This query:
select t.*
from tablename t inner join (
select id, max(col3) col3
from tablename
group by id
having count(distinct col2) > 1
) g on g.id = t.id and g.col3 = t.col3
returns for each id that has different values in col2 only 1 row: the one containing the maximum value of col3.
If you also want the other rows where each id does not have different values in col2, then use UNION ALL:
select t.*
from tablename t inner join (
select id, max(col3) col3
from tablename
group by id
having count(distinct col2) > 1
) g on g.id = t.id and g.col3 = t.col3
union all
select t.* from tablename t
where not exists (
select 1 from tablename
where id = t.id and col2 <> t.col2
)
select * from TableName where col3 = (select max(col3) from TableName)

SQL Adding row numbers

I am looking for a way to add row numbers, repeating the same row number when one of the columns contains duplicates.
Logic
* For each new Col1 value, restart RowNo from 1
* When Col1 + Col2 are the same, use the same RowNo
Table1
Col1 Col2
1 A
1 B
1 B
2 C
2 D
2 E
3 F
4 G
Output should be
Col1 Col2 RowNo
1 A 1
1 B 2
1 B 2
2 C 1
2 D 2
2 E 3
3 F 1
4 G 1
I have tried this, but the output is not correct:
select col1,col2
,row_number() over(partition by (col1+col2) order by col1)
from Table1
Use DENSE_RANK():
SELECT Col1, Col2,
DENSE_RANK() OVER (PARTITION BY Col1 ORDER BY Col2) RowNo
FROM yourTable
ORDER BY Col1, Col2;
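A minimal sketch of the DENSE_RANK() query against the question's sample data, using Python's built-in sqlite3 module as a stand-in database (an assumption; SQLite 3.25 or later is needed for window functions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Col1 INTEGER, Col2 TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "B"), (2, "C"),
     (2, "D"), (2, "E"), (3, "F"), (4, "G")],
)

# DENSE_RANK restarts at 1 for every Col1 and gives equal Col2 values
# the same number without leaving gaps.
query = """
SELECT Col1, Col2,
       DENSE_RANK() OVER (PARTITION BY Col1 ORDER BY Col2) AS RowNo
FROM Table1
ORDER BY Col1, Col2
"""
numbered = conn.execute(query).fetchall()
for row in numbered:
    print(row)
```

The two (1, B) rows should both get RowNo 2, matching the requested output.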
You can use the row_number window function, partitioning on col1 and ordering by col2:
select t.*,
row_number() over (partition by col1 order by col2) as col3
from your_table t;

SQL Server : get max of the column2 and column3 value must be 1

I have an output of some part of my stored procedure like this:
col1 col2 col3 col4
--------------------------
2016-05-05 1 2 2
2016-05-05 1 3 32
2016-05-12 2 1 11
2016-05-12 3 1 31
Now I need to get the result based on this condition:
(col2 = 1 and col3 = max) or (col3 = 1 and col2 = max)
The final result should be
col1 col2 col3 col4
-------------------------
2016-05-05 1 3 32
2016-05-12 3 1 31
Not sure if that's the most efficient way, but you can use ROW_NUMBER():
SELECT * FROM (
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY t.col1 ORDER BY t.col3 DESC) as rnk
FROM your_table t
WHERE t.col2 = 1
UNION ALL
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY t.col1 ORDER BY t.col2 DESC) as rnk
FROM your_table t
WHERE t.col3 = 1) tt
WHERE rnk = 1
This will give you all the records with
(col2=1 and col3=max) or (col3=1 and col2=max)
This is a bit tricky. Your data has no ambiguities, such as duplicate maxima in col4 or "1" values in both col2 and col3.
The following is a direct translation of the logic in your question:
select t.*
from t
where t.col4 = (select max(t2.col4)
from t t2
where t2.col1 = t.col1 and (t2.col2 = 1 or t2.col3 = 1)
);
Try this. Note that if more than one row shares the same maximum value, all of them are returned. It works for all scenarios, even when col1 is not in sync with col2 and col3.
I first rank the values of col2 and col3 so that the highest of each gets rank 1. Then, in the outer query, I apply your condition. The demo was created for Postgres because SQL Server wasn't available.
SQLFiddle Demo
select col1,col2,col3,col4
from
(
select t.*,
RANK() OVER(ORDER BY col3 DESC) as col3_max,
RANK() OVER(ORDER BY col2 DESC) as col2_max
from your_table t
) t1
where
(col2=1 and col3_max=1)
OR
(col3=1 and col2_max=1)
Alternative way:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY iif(col2 = 1, col3, col2) DESC) as r
FROM tbl) t
WHERE r = 1
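The single-pass variant can be tried on the question's sample data with the sketch below. It uses Python's built-in sqlite3 module rather than SQL Server (an assumption; SQLite 3.25 or later is needed), and it replaces iif(...) with an equivalent CASE expression, because iif is not available in every SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 TEXT, col2 INTEGER, col3 INTEGER, col4 INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [("2016-05-05", 1, 2, 2), ("2016-05-05", 1, 3, 32),
     ("2016-05-12", 2, 1, 11), ("2016-05-12", 3, 1, 31)],
)

# Per col1, sort by col3 when col2 = 1 and by col2 otherwise,
# then keep the row with the largest sort key.
query = """
SELECT col1, col2, col3, col4
FROM (SELECT tbl.*,
             ROW_NUMBER() OVER (
                 PARTITION BY col1
                 ORDER BY CASE WHEN col2 = 1 THEN col3 ELSE col2 END DESC
             ) AS r
      FROM tbl) AS t
WHERE r = 1
ORDER BY col1
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Like the original, this relies on the rows within one col1 value all following the same case (either col2 = 1 or col3 = 1); mixed data within a date would compare col3 values against col2 values.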