Select rows without the maximum amount of rows from grouping - sql

I have a table called MyTable which has columns A, B and then multiple other columns where the values don't matter.
What I want to do is filter out, for each value of A, all the rows that belong to the B group with the most rows. It is probably easier to explain with an example. If the data looked like this:
A B ...
a f ...
a f ...
a f ...
a g ...
a g ...
b h ...
b h ...
b i ...
b i ...
c j ...
c j ...
The output would be
A B ...
a g ...
a g ...
b i ...
b i ...
This filters out all the rows with (a, f) because there are 3 of them compared to only 2 of (a, g).
It filters out (b, h) because there are 2 of them compared to 2 of (b, i); in this case it makes no difference which one we filter out, as long as it's one of them.
It filters out (c, j) because it is the only grouping and therefore still the maximum.
In terms of how to implement this, I'm thinking we need to do something like this at some point to get the count for each grouping:
SELECT A, B, count(*)
FROM MyTable
GROUP BY A, B
This should initially give something of the form:
A B count
a f 3
a g 2
b h 2
b i 2
c j 2
Not sure at this point how to get the maximum for each A then apply it when selecting from the original table?

If your RDBMS supports window functions, use row_number in a CTE. This finds the largest group for each grouper, i.e. the groups to be filtered out:
with cte as(
select
grouper,
value,
count(value) "number",
row_number() over (partition by grouper order by count(value) desc) rn
from t
group by
grouper,
value)
select * from cte where rn = 1;
grouper | value | number | rn
:------ | :---- | -----: | -:
a | f | 3 | 1
b | h | 2 | 1
c | j | 2 | 1
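Going from the ranked groups back to the rows the question asks for could look like the sketch below. It is not taken from either answer: it assumes SQLite 3.25 or later (for window functions) driven from Python's sqlite3 module, reuses the question's MyTable, A and B names, and adds B as a tie-breaker so that equal counts such as (b, h) and (b, i) resolve the same way on every run. The name rows_to_keep is made up for illustration.

```python
import sqlite3

SAMPLE = ([("a", "f")] * 3 + [("a", "g")] * 2 + [("b", "h")] * 2
          + [("b", "i")] * 2 + [("c", "j")] * 2)

QUERY = """
WITH counts AS (              -- one row per (A, B) with its row count
    SELECT A, B, COUNT(*) AS cnt
    FROM MyTable
    GROUP BY A, B
),
ranked AS (                   -- rn = 1 marks the largest group within each A
    SELECT A, B,
           ROW_NUMBER() OVER (PARTITION BY A ORDER BY cnt DESC, B) AS rn
    FROM counts
)
SELECT m.A, m.B
FROM MyTable AS m
JOIN ranked AS r ON r.A = m.A AND r.B = m.B
WHERE r.rn > 1                -- drop every row of the largest group
ORDER BY m.A, m.B
"""


def rows_to_keep(rows=SAMPLE):
    """Return the rows that do not belong to the largest B group of their A."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (A TEXT, B TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(rows_to_keep())
```

On the sample data this keeps the two (a, g) rows and the two (b, i) rows. Every c row is dropped because (c, j) is the only group for c.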

Here is one option:
with tabl1 as(
select col1,col2,count(*)over(partition by col1,col2) cnt
from test1 t1)
select col1,max(col2) from tabl1 t1
WHERE exists(
select *
from tabl1 t2
where t1.cnt <= t2.cnt and t1.col1 = t2.col1 and t1.col2 != t2.col2
)
group by col1
Sample:
create table test1 (col1 varchar(1),col2 varchar(1));
insert into test1 values ('a', 'f');
insert into test1 values ('a', 'f');
insert into test1 values ('a', 'f');
insert into test1 values ('a', 'g');
insert into test1 values ('a', 'g');
insert into test1 values ('b', 'h');
insert into test1 values ('b', 'h');
insert into test1 values ('b', 'i');
insert into test1 values ('b', 'i');
insert into test1 values ('c', 'j');
insert into test1 values ('c', 'j');
Result:
COL1 | MAX(COL2)
b | i
a | g

Related

Multiple conditions on multiple columns

I have table that looks like this
WO | PS | C
----------------
12 | 1 | a
12 | 2 | b
12 | 2 | b
12 | 2 | c
13 | 1 | a
I want to find values from the WO column where PS has value 1 and C has value a AND PS has value 2 and C has value b. So I need multiple conditions on the same columns, and they have to be checked per WO value. If a WO does not match both pairs of conditions, I don't want it included.
I tried using condition:
WHERE PS = 1 AND C = a AND PS = 2 AND C = b
but it does not work, and it has no connection to the WO column as described above.
Edit:
I need to find WO which has (PS = 1 AND C = a) and at the same time it also has rows where (PS = 2 and C = b).
The result should be:
WO | PS | C
----------------
12 | 1 | a
12 | 2 | b
12 | 2 | b
If either of the rows (PS = 1 and C = a) or (PS = 2 and C = b) does not exist, then nothing should be returned.
Try this condition:
WHERE (PS = 1 AND C = 'a') OR (PS = 2 AND C = 'b')
As I understand this, you need two IN clauses or two EXISTS clauses, something like this:
SELECT DISTINCT wo, ps, c
FROM yourtable
WHERE wo IN
(SELECT wo FROM yourtable WHERE ps = 1 and c = 'a')
AND wo IN
(SELECT wo FROM yourtable WHERE ps = 2 and c = 'b');
This will produce this outcome:
WO | PS | C
----------------
12 | 1 | a
12 | 2 | b
12 | 2 | c
Please note that in the last row of the result, the column C has value c instead of b as you have shown in your question. I guess this was your mistake when creating the sample outcome?
If I have understood your question incorrectly, please let me know and explain what's wrong, and I will review it.
Edit: To create the same result as shown in your question, this query would do:
SELECT wo, ps, c
FROM yourtable
WHERE ps IN (1,2) AND c IN ('a','b')
AND wo IN
(SELECT wo FROM yourtable WHERE ps = 1 and c = 'a')
AND wo IN
(SELECT wo FROM yourtable WHERE ps = 2 and c = 'b');
But I really don't believe this is what you were looking for ;)
I think you can use an EXISTS criterion here to filter your rows correctly, though I would like to see a wider sample data set to be sure.
select *
from t
where ps in (1,2) and C in ('a','b')
and exists (
select * from t t2 where t2.WO = t.WO
and t2.PS != t.PS and t2.C != t.C
);
Just to throw in one more solution, you can do this with a single reference to your table, but this may not necessarily mean that it is more efficient. The first part is to filter based on the combinations you want:
DECLARE @T TABLE (WO INT, PS INT, C CHAR(1))
INSERT @T (WO, PS, C)
VALUES (12, 1, 'a'), (12, 2, 'b'), (12, 2, 'b'), (12, 2, 'c'), (13, 1, 'a');
SELECT *
FROM @T AS t
WHERE (t.PS = 1 AND t.C = 'a')
OR (t.PS = 2 AND t.C = 'b');
WO | PS | C
----------------
12 | 1 | a
12 | 2 | b
12 | 2 | b
13 | 1 | a
But you want to exclude WO 13 because it doesn't have both combinations, so what we ideally need is a distinct count of PS and C, to find the WOs with a distinct count of 2. You can't do COUNT(DISTINCT ..) in a windowed function directly, but you can do this indirectly with DENSE_RANK():
DECLARE @T TABLE (WO INT, PS INT, C CHAR(1))
INSERT INTO @T (WO, PS, C)
VALUES (12, 1, 'a'), (12, 2, 'b'), (12, 2, 'b'), (12, 2, 'c'), (13, 1, 'a');
SELECT *,
CntDistinct = DENSE_RANK() OVER(PARTITION BY t.WO ORDER BY t.PS, t.C) +
DENSE_RANK() OVER(PARTITION BY t.WO ORDER BY t.PS DESC, t.C DESC) - 1
FROM @T AS t
WHERE (t.PS = 1 AND t.C = 'a')
OR (t.PS = 2 AND t.C = 'b');
Which gives:
WO | PS | C | CntDistinct
-------------------------
12 | 1 | a | 2
12 | 2 | b | 2
12 | 2 | b | 2
13 | 1 | a | 1
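To see why the two DENSE_RANK() calls add up to a distinct count: the ascending rank is the number of distinct (PS, C) values at or before the current row, the descending rank is the number at or after it, and the current value is counted in both, hence the - 1. The sketch below checks that against a plain grouped count. It is only an illustration and assumes SQLite 3.25 or later via Python's sqlite3 module rather than SQL Server, with an ordinary table T standing in for the table variable. The name distinct_counts is made up.

```python
import sqlite3

ROWS = [(12, 1, "a"), (12, 2, "b"), (12, 2, "b"), (12, 2, "c"), (13, 1, "a")]
WANTED = "(PS = 1 AND C = 'a') OR (PS = 2 AND C = 'b')"


def distinct_counts():
    """Return the per-WO distinct counts from both formulations."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (WO INT, PS INT, C TEXT)")
    conn.executemany("INSERT INTO T VALUES (?, ?, ?)", ROWS)

    # Windowed version: ascending rank + descending rank - 1.
    windowed = conn.execute(f"""
        SELECT DISTINCT WO,
               DENSE_RANK() OVER (PARTITION BY WO ORDER BY PS, C)
             + DENSE_RANK() OVER (PARTITION BY WO ORDER BY PS DESC, C DESC)
             - 1 AS CntDistinct
        FROM T
        WHERE {WANTED}
        ORDER BY WO
    """).fetchall()

    # Reference version: deduplicate first, then count per WO.
    grouped = conn.execute(f"""
        SELECT WO, COUNT(*)
        FROM (SELECT DISTINCT WO, PS, C FROM T WHERE {WANTED})
        GROUP BY WO
        ORDER BY WO
    """).fetchall()

    conn.close()
    return windowed, grouped


windowed, grouped = distinct_counts()
print(windowed, grouped)
```

Both lists come out as 2 for WO 12 and 1 for WO 13, which is what the filter on the count relies on.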
You can then put this in a subquery and choose only the rows with a count of 2:
DECLARE @T TABLE (WO INT, PS INT, C CHAR(1))
INSERT INTO @T (WO, PS, C)
VALUES (12, 1, 'a'), (12, 2, 'b'), (12, 2, 'b'), (12, 2, 'c'), (13, 1, 'a');
SELECT t.WO, t.PS, t.C
FROM ( SELECT t.*,
CntDistinct = DENSE_RANK() OVER(PARTITION BY t.WO ORDER BY t.PS, t.C) +
DENSE_RANK() OVER(PARTITION BY t.WO ORDER BY t.PS DESC, t.C DESC) - 1
FROM @T AS t
WHERE (t.PS = 1 AND t.C = 'a')
OR (t.PS = 2 AND t.C = 'b')
) AS t
WHERE t.CntDistinct = 2;
Finally, if the combinations are likely to change, or there are a lot more than 2 of them, you may find that building a table of the combinations you are looking for is a more maintainable solution:
DECLARE @T TABLE (WO INT, PS INT, C CHAR(1))
INSERT INTO @T (WO, PS, C)
VALUES (12, 1, 'a'), (12, 2, 'b'), (12, 2, 'b'), (12, 2, 'c'), (13, 1, 'a');
DECLARE @Combinations TABLE (PS INT, C CHAR(1), PRIMARY KEY (PS, C));
INSERT @Combinations(PS, C)
VALUES (1, 'a'), (2, 'b');
SELECT t.WO, t.PS, t.C
FROM ( SELECT t.*,
CntDistinct = DENSE_RANK() OVER(PARTITION BY t.WO ORDER BY t.PS, t.C) +
DENSE_RANK() OVER(PARTITION BY t.WO ORDER BY t.PS DESC, t.C DESC) - 1
FROM @T AS t
INNER JOIN @Combinations AS c
ON c.PS = t.PS
AND c.C = t.C
) AS t
WHERE t.CntDistinct = (SELECT COUNT(*) FROM @Combinations);
Let's chat about demo data. You provided some useful data that helps us see what your problem is, but no DDL. If you provide your demo data similar to this, it makes it easier for us to understand the issue:
DECLARE @table TABLE (WO INT, PS INT, C NVARCHAR(10))
INSERT INTO @table (WO, PS, C) VALUES
(12, 1, 'a'), (12, 2, 'b'),
(12, 2, 'b'), (12, 2, 'c'),
(13, 1, 'a')
Now on to your question. It looks to me like you just need composite conditions, one of which needs to evaluate to fully true. Consider this:
SELECT *
FROM @table
WHERE (
PS = 1
AND C = 'a'
)
OR (
PS = 2
AND C = 'b'
)
The predicates wrapped in the parens are evaluated as a whole in the WHERE clause. If either predicate inside a pair of parens is false, that whole composite is false. If either composite evaluates to true, we return the row.
WO PS C
---------
12 1 a
12 2 b
12 2 b
13 1 a
This result set does include WO 13, as by your definition it should be there. I don't know if there are additional things you wanted to evaluate which may exclude it, but it does have a PS of 1 and a C of a.
Edit:
If the question is, as discussed in the comments, that a single WO must contain BOTH, then this may be the answer:
SELECT *
FROM @table t
INNER JOIN (
SELECT t1.WO
FROM @table t1
INNER JOIN @table t2
ON t1.WO = t2.WO
WHERE t1.PS = 1
AND t1.C = 'a'
AND t2.PS = 2
AND t2.C = 'b'
GROUP BY t1.WO
) a
ON t.WO = a.WO
WHERE (
t.PS = 1
AND t.C = 'a'
)
OR (
t.PS = 2
AND t.C = 'b'
)
WO PS C WO
--------------
12 1 a 12
12 2 b 12
12 2 b 12

Remove duplicate value in different categories in same table SQL but keep the first category value

Let's say I have a table with id and category like the table below
D_id | D_category
-----------------
1 | A
2 | A
3 | A
1 | B
2 | B
4 | B
5 | B
1 | C
2 | C
4 | C
5 | C
6 | C
Hence the rules are like this
values in category A should not appear in category B or category C
values in category B should not appear in category C
The end result should be like this
D_id | D_category
-----------------
1 | A
2 | A
3 | A
4 | B
5 | B
6 | C
I will provide a solution that works, but it's not ideal. Can anyone help me with a better solution that also handles more categories? The rule stays the same: values in earlier categories should not appear in any later category.
DECLARE @A TABLE(
D_id INT NOT NULL,
D_category VARCHAR(MAX));
INSERT INTO @A(D_id,D_category)
VALUES (1, 'A'),
(2, 'A'),
(3, 'A'),
(1, 'B'),
(2, 'B'),
(4, 'B'),
(5, 'B'),
(1, 'C'),
(2, 'C'),
(4, 'C'),
(5, 'C'),
(6, 'C')
DELETE t
FROM @A t
WHERE t.D_category = 'B' AND EXISTS (SELECT 1 FROM @A t2 WHERE t2.D_category = 'A' and t.D_id = t2.D_id)
DELETE t
FROM @A t
WHERE t.D_category = 'C' AND EXISTS (SELECT 1 FROM @A t2 WHERE t2.D_category = 'B' and t.D_id = t2.D_id)
DELETE t
FROM @A t
WHERE t.D_category = 'C' AND EXISTS (SELECT 1 FROM @A t2 WHERE t2.D_category = 'A' and t.D_id = t2.D_id)
select * from @A
Just check that the specified record doesn't exist earlier in the sequence.
select *
from @A A1
where not exists (
select 1
from @A A2
where A2.D_id = A1.D_id
and A2.D_category < A1.D_category
)
Or just make use of row_number():
select *
from
(
select *, r = row_number() over (partition by D_id order by D_category)
from @A
) a
where a.r = 1
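As a quick check that the two queries agree, here is a sketch that runs both against the question's data. It assumes SQLite 3.25 or later via Python's sqlite3 module instead of SQL Server, uses a regular table named A in place of the table variable, and the name first_category_rows is made up for illustration.

```python
import sqlite3

ROWS = [(1, "A"), (2, "A"), (3, "A"),
        (1, "B"), (2, "B"), (4, "B"), (5, "B"),
        (1, "C"), (2, "C"), (4, "C"), (5, "C"), (6, "C")]

# Keep a row only if the same D_id has no row in an earlier category.
NOT_EXISTS = """
SELECT D_id, D_category
FROM A AS a1
WHERE NOT EXISTS (SELECT 1 FROM A AS a2
                  WHERE a2.D_id = a1.D_id
                    AND a2.D_category < a1.D_category)
ORDER BY D_category, D_id
"""

# Number the rows of each D_id by category and keep the first one.
ROW_NUMBER = """
SELECT D_id, D_category
FROM (SELECT D_id, D_category,
             ROW_NUMBER() OVER (PARTITION BY D_id ORDER BY D_category) AS r
      FROM A)
WHERE r = 1
ORDER BY D_category, D_id
"""


def first_category_rows(query):
    """Run one of the two queries against a fresh copy of the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (D_id INT NOT NULL, D_category TEXT)")
    conn.executemany("INSERT INTO A VALUES (?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(first_category_rows(NOT_EXISTS))
print(first_category_rows(ROW_NUMBER))
```

Both variants rely on the category names sorting in priority order (A before B before C). If the real categories do not sort that way, an explicit priority column would be needed in place of D_category in the comparison and in the ORDER BY.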
Delete using the join syntax:
delete a
from my_table a
join my_table b on a.D_id = b.D_id
and a.D_category > b.D_category

postgresql count rows with special columns

my table looks like this:
table1:
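ident | A | B | C | D
---------------------
1 |   | 2 | 1 |
2 | 3 |   |   |
3 | 1 | 2 | 1 | 5
4 |   | 4 |   |
5 | 4 | 1 |   | 3
6 |   | 3 | 2 |
7 |   |   | 3 |
8 | 1 |   |   |
9 | 1 |   |   |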
Now I need something like an analysis of that table.
It should look like this:
table2:
name | just_name
----------------
A | 3
B | 1
C | 1
D | 0
The column just_name counts the rows of table1 in which that column is the only one with an entry, apart from the ident column.
In the real table there are more than 4 columns, so I'd rather not write a WHERE condition for every other column. :)
Thanks
If you are OK with putting the column names in a column list, the query below gets your desired result. It's possible to make that part dynamic, but if you know your column names and they don't change, this is the better approach. Please let me know if you want that part to be dynamic as well.
Schema:
create table mytable1(ident int, A int, B int, C int, D int);
insert into mytable1 values(1,null,2,1,null);
insert into mytable1 values(2,3,null,null,null);
insert into mytable1 values(3,1,2,1,5);
insert into mytable1 values(4,null,4,null,null);
insert into mytable1 values(5,4,1,null,3);
insert into mytable1 values(6,null,3,2,null);
insert into mytable1 values(7,null,null,3,null);
insert into mytable1 values(8,1,null,null,null);
insert into mytable1 values(9,1,null,null,null);
Query:
with cte as (SELECT
unnest(array['A', 'B', 'C','D']) AS Columns,
unnest(array[A, B, C,D]) AS Values,
row_number()over(order by 1)rn
FROM mytable1),
cte2 as (
select rn,max(cte.columns)col,count(*) from cte
where values is not null
group by rn
having count(*)=1)
select distinct columns as name,coalesce(just_name,0) from cte left join (select col,count(rn) just_name from cte2
group by col)t on cte.columns=t.col
Output:
name | coalesce
---------------
A | 3
C | 1
D | 0
B | 1
I would do this as columns:
select count(*) filter (where A is not null and B is null and C is null and d is null),
count(*) filter (where A is null and B is not null and C is null and d is null),
count(*) filter (where A is null and B is null and C is not null and d is null),
count(*) filter (where A is null and B is null and C is null and d is not null)
from t;
You could also express this as:
select c.colname, count(*) filter (where c.num_vals = 1 and c.colval is not null)
from t cross join lateral
(select colname, colval, count(colval) over () as num_vals
from (values ('a', t.a), ('b', t.b), ('c', t.c), ('d', t.d)) v(colname, colval)
) c
group by c.colname;
This returns the values in separate rows. And it is a bit easier to generalize.
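Since the question mentions that the real table has more than 4 columns, the per-column conditions could also be generated instead of typed by hand. The sketch below is one way to do that and is not part of either answer. It assumes Python's sqlite3 module with SQLite standing in for PostgreSQL, uses the portable COUNT(CASE WHEN ... END) form rather than FILTER, and the names just_name_counts and demo are made up. The column names are pasted into the SQL text, so they must come from a trusted list and never from user input.

```python
import sqlite3

ROWS = [(1, None, 2, 1, None), (2, 3, None, None, None), (3, 1, 2, 1, 5),
        (4, None, 4, None, None), (5, 4, 1, None, 3), (6, None, 3, 2, None),
        (7, None, None, 3, None), (8, 1, None, None, None),
        (9, 1, None, None, None)]


def just_name_counts(conn, table, columns):
    """Count, per column, the rows where it is the only non-NULL column."""
    parts = []
    for col in columns:
        conditions = [f"{col} IS NOT NULL"]
        conditions += [f"{other} IS NULL" for other in columns if other != col]
        # COUNT ignores NULL, so only rows matching every condition are counted.
        parts.append(f"COUNT(CASE WHEN {' AND '.join(conditions)} THEN 1 END)")
    row = conn.execute(f"SELECT {', '.join(parts)} FROM {table}").fetchone()
    return dict(zip(columns, row))


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable1 (ident INT, A INT, B INT, C INT, D INT)")
    conn.executemany("INSERT INTO mytable1 VALUES (?, ?, ?, ?, ?)", ROWS)
    counts = just_name_counts(conn, "mytable1", ["A", "B", "C", "D"])
    conn.close()
    return counts


print(demo())
```

On the sample data this gives 3 for A, 1 for B, 1 for C and 0 for D, matching table2 in the question.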

Subquery in select - non-grouped values in 'IN' clause

Assume the following simplified schema:
create table main_table
(
a number,
b number,
c number
);
create table other_table
(
c number,
d number
)
Now, what I want to achieve:
I have a query on main_table that groups by a, b.
I need to use all values of c within each group in a subquery in the select clause, to get some data from other tables.
Unfortunately, I can't join to the other table.
Pseudocode would be:
select mt.a,
mt.b,
(select /* some aggregated value */
from other_table ot
where ot.c in (all_values_of_c_within_group)
)
from main_table mt
group by mt.a, mt.b
There are two ways i know it's possible to handle this:
Use join on other_table and then aggregate values from there - unfortunately i can't do it, because of how the real query is structured (3 nested views, 800 sloc, 30 values in group by - long story)
Use listagg and then 'delistagg' it with 'instr'. Pseudocode:
/*(...)*/
(select /* some_aggregated_value */
from other_table ot
where instr(',' || listagg(
to_char(mt.c), ',') within group (order by 1),
',' || ot.c) > 0
)
/*(...)*/
But that's just terrible code, and it automatically prevents using any potentially existing indexes on other_table.c.
Is there a syntax to properly get "all values of a column within the group"?
It is unclear without some data and expected results what you are trying to achieve, but I think you can do what you want using collections:
Oracle 11g R2 Schema Setup:
create table main_table( a, b, c ) AS
SELECT 1, 1, 1 FROM DUAL UNION ALL
SELECT 1, 1, 2 FROM DUAL UNION ALL
SELECT 1, 1, 3 FROM DUAL
/
create table other_table( c, d ) AS
SELECT 1, 4 FROM DUAL UNION ALL
SELECT 3, 6 FROM DUAL UNION ALL
SELECT 5, 8 FROM DUAL
/
CREATE TYPE number_table AS TABLE OF NUMBER
/
Query 1:
SELECT a,
b,
( SELECT LISTAGG( d, ',' ) WITHIN GROUP ( ORDER BY d )
FROM other_table
WHERE c MEMBER OF m.cs
) ds
FROM (
SELECT a,
b,
CAST( COLLECT( c ) AS number_table ) AS cs
FROM main_table
GROUP BY a, b
) m
Results:
| A | B | DS |
|---|---|-----|
| 1 | 1 | 4,6 |
Query 2: But it seems simpler to just use a LEFT OUTER JOIN:
SELECT a,
b,
LISTAGG( d, ',' ) WITHIN GROUP ( ORDER BY d ) ds
FROM main_table m
LEFT OUTER JOIN other_table o
ON ( m.c = o.c )
GROUP BY a, b
Results:
| A | B | DS |
|---|---|-----|
| 1 | 1 | 4,6 |
You may just be able to aggregate the subquery, e.g. with sum as the aggregate function:
select mt.a,
mt.b,
sum(
(select d
from other_table ot
where ot.c = mt.c)
) as sum_d
from main_table mt
group by mt.a, mt.b;
With some made-up data:
insert into main_table values (1, 2, 3);
insert into main_table values (1, 2, 4);
insert into main_table values (2, 3, 4);
insert into main_table values (2, 3, 5);
insert into main_table values (2, 3, 6);
insert into other_table values (3, 10);
insert into other_table values (4, 11);
insert into other_table values (5, 12);
insert into other_table values (6, 13);
that query gives:
A B SUM_D
---------- ---------- ----------
2 3 36
1 2 21
As you noted, with an extra row:
insert into main_table values (2, 3, 4);
that query counts a matching c's d value multiple times, so you get 47 instead of 36:
A B SUM_D
---------- ---------- ----------
2 3 47
1 2 21
You can add a distinct:
select mt.a,
mt.b,
sum(distinct
(select d
from other_table ot
where ot.c = mt.c)
) as sum_d
from main_table mt
group by mt.a, mt.b;
A B SUM_D
---------- ---------- ----------
1 2 21
2 3 36
This assumes that c, or at least the combination of c, d, is unique in other_table.
This should work, and should not impose the uniqueness requirements on other_table that Alex's answer does.
select mt.a,
mt.b,
(select sum(d) /* some aggregated value */
from other_table ot
where ot.c in ( SELECT mt2.c
FROM main_table mt2
WHERE mt2.a = mt.a AND mt2.b = mt.b
)
) agg
from main_table mt
group by mt.a, mt.b;
It has to go to main_table again for each group, but considering you already are accessing those records, we should be talking about extra logical I/O instead of extra physical I/O.
Using Alex Poole's test data (with the duplicate MAIN_TABLE row), I get this in 12c:
+---+---+-----+
| A | B | AGG |
+---+---+-----+
| 2 | 3 | 36 |
| 1 | 2 | 21 |
+---+---+-----+
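The difference between summing a per-row subquery and filtering with a correlated IN can be reproduced outside Oracle. The sketch below assumes SQLite via Python's sqlite3 module, reuses the made-up data from the earlier answer including the duplicate (2, 3, 4) row, and the names PER_ROW, PER_GROUP and run are only for illustration.

```python
import sqlite3

# Sample data from the answer above, with the duplicate (2, 3, 4) row last.
MAIN = [(1, 2, 3), (1, 2, 4), (2, 3, 4), (2, 3, 5), (2, 3, 6), (2, 3, 4)]
OTHER = [(3, 10), (4, 11), (5, 12), (6, 13)]

# Sums one looked-up d per main_table row, so duplicate c values count twice.
PER_ROW = """
SELECT mt.a, mt.b,
       SUM((SELECT d FROM other_table ot WHERE ot.c = mt.c)) AS sum_d
FROM main_table mt
GROUP BY mt.a, mt.b
ORDER BY mt.a, mt.b
"""

# Sums other_table rows whose c occurs in the group, each row counted once.
PER_GROUP = """
SELECT mt.a, mt.b,
       (SELECT SUM(d) FROM other_table ot
        WHERE ot.c IN (SELECT mt2.c FROM main_table mt2
                       WHERE mt2.a = mt.a AND mt2.b = mt.b)) AS agg
FROM main_table mt
GROUP BY mt.a, mt.b
ORDER BY mt.a, mt.b
"""


def run(query):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE main_table (a INT, b INT, c INT)")
    conn.execute("CREATE TABLE other_table (c INT, d INT)")
    conn.executemany("INSERT INTO main_table VALUES (?, ?, ?)", MAIN)
    conn.executemany("INSERT INTO other_table VALUES (?, ?)", OTHER)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(PER_ROW))
print(run(PER_GROUP))
```

With the duplicate row present, the per-row version returns 47 for group (2, 3) while the correlated IN version returns 36, in line with the figures quoted above.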

Oracle SQL: Group rows by identical fields without aggregate function

I'm quite sure it's possible, but I can't quite remember how.
Consider following table:
A B C
1 1 A
1 2 A
1 2 B
2 1 C
2 2 A
2 2 B
2 2 C
I would like to present it as:
A B C
1 1 A
1 2 A
    B
2 1 C
2 2 A
    B
    C
In other words, group on a unique (A,B).
I was thinking along the lines of GROUP BY ROLLUP, but I can't really figure out how to just make rows null without a group by function.
(note: I imagine this has been asked before, but I just can't find the right search terms to find it. Thanks)
Try this:
create table t
(a number,
b number,
c varchar2(1));
insert into t values(1, 1, 'A');
insert into t values(1, 2, 'A');
insert into t values(1, 2, 'B');
insert into t values(2, 1, 'C');
insert into t values(2, 2, 'A');
insert into t values(2, 2, 'B');
insert into t values(2, 2, 'C');
select case when rn = 1
then a
else null end as a,
case when rn = 1
then b
else null end as b,
c
from (select a, b, c,
row_number() over (partition by a, b order by c) as rn,
row_number() over (order by a, b, c) as rn_total
from t)
order by rn_total;
A B C
- - -
1 1 A
1 2 A
    B
2 1 C
2 2 A
    B
    C
And finally, clean your test environment:
drop table t purge;
You can do it even without a subquery:
select case when row_number() over (partition by a, b order by c) = 1
then a
else null end as a,
case when row_number() over (partition by a, b order by c) = 1
then b
else null end as b,
c
from t
order by t.a, t.b, c ;
Tested at SQL-Fiddle
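For anyone without an Oracle instance at hand, the same blanking technique can be tried with the sketch below. It assumes SQLite 3.25 or later via Python's sqlite3 module, uses the subquery form with output aliases a_shown and b_shown (made-up names) so that the ORDER BY clearly refers to the underlying columns, and returns None where Oracle would show an empty cell.

```python
import sqlite3

ROWS = [(1, 1, "A"), (1, 2, "A"), (1, 2, "B"),
        (2, 1, "C"), (2, 2, "A"), (2, 2, "B"), (2, 2, "C")]

QUERY = """
SELECT CASE WHEN rn = 1 THEN a END AS a_shown,   -- NULL after the first row
       CASE WHEN rn = 1 THEN b END AS b_shown,   -- of each (a, b) group
       c
FROM (SELECT a, b, c,
             ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY c) AS rn
      FROM t)
ORDER BY a, b, c
"""


def blank_repeated_keys():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INT, b INT, c TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in blank_repeated_keys():
    print(row)
```

Sorting by the real a and b rather than the blanked output columns is what keeps the blank rows directly under their group.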