Removing entries with duplicates in specific columns SQL


I have 3 columns: A B C. I only want rows that share the same value in col A but different values for both B and C.
1 | item1 | Jan | Amy
2 | item1 | Feb | Amy
3 | item2 | Mar | Bob
4 | item2 | Mar | Bill
5 | item3 | Apr | Charles
6 | item3 | May | Doug
7 | item4 | Jun | Felix
From the example above, I want it to show rows 5, 6 and 7.
Is there any good way of doing this?

If I understand your need correctly, this is one way to do it, with a single scan of the table:
with test(id, a, b, c) as
(
select 1, 'item1', 'Jan', 'Amy' from dual union all
select 2, 'item1', 'Feb', 'Amy' from dual union all
select 3, 'item2', 'Mar', 'Bob' from dual union all
select 4, 'item2', 'Mar', 'Bill' from dual union all
select 5, 'item3', 'Apr', 'Charles' from dual union all
select 6, 'item3', 'May', 'Doug' from dual union all
select 7, 'item4', 'Jun', 'Felix' from dual
)
select id, a, b, c
from (
select id, a, b, c,
count(distinct b) over (partition by a) count_b,
count(distinct c) over (partition by a) count_c,
count(1) over (partition by a) count_a
from test
)
where count_a = count_b
and count_a = count_c
The result:
ID A B C
---------- ----- --- -------
5 item3 Apr Charles
6 item3 May Doug
7 item4 Jun Felix

Use not exists:
select t.*
from t
where not exists (select 1
from t t2
where t2.a = t.a and
(t2.b = t.b or t2.c = t.c) and
t2.id <> t.id
);
This assumes that the id column uniquely identifies each row. If you don't have such a column and the table has no fully duplicated rows, you can use:
select t.*
from t
where not exists (select 1
from t t2
where t2.a = t.a and
(t2.b = t.b or t2.c = t.c) and
not (t2.b = t.b and t2.c = t.c)
);
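As a quick check, the first NOT EXISTS query above can be run against the question's sample rows. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for the real database, and the table name t and the column types are assumptions.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "item1", "Jan", "Amy"),
        (2, "item1", "Feb", "Amy"),
        (3, "item2", "Mar", "Bob"),
        (4, "item2", "Mar", "Bill"),
        (5, "item3", "Apr", "Charles"),
        (6, "item3", "May", "Doug"),
        (7, "item4", "Jun", "Felix"),
    ],
)

# Keep a row only if no other row with the same a shares its b or its c.
query = """
SELECT t.id
FROM t
WHERE NOT EXISTS (
    SELECT 1
    FROM t AS t2
    WHERE t2.a = t.a
      AND (t2.b = t.b OR t2.c = t.c)
      AND t2.id <> t.id
)
ORDER BY t.id
"""
kept_ids = [row[0] for row in conn.execute(query)]
print(kept_ids)
```

Rows 1 and 2 drop out because they share C, and rows 3 and 4 because they share B, which leaves rows 5, 6 and 7.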

Related

Presto SQL group by COL1 and concat COL2 values

Say I have this
col_1 | col_2
------------
1 | a
1 | b
1 | c
2 | d
2 | e
I want result like this
col_1 | col_2_concat
-------------------
1 | a,b,c
2 | d,e
something like this I would guess:
select
col_1,
join_by_comma(col2)
from tbl
group by col_1

I think you need something like:
WITH x AS (
SELECT
'a' AS c,
'b' AS c2
UNION ALL
SELECT
'a',
'b2'
UNION ALL
SELECT
'a2',
'b3'
UNION ALL
SELECT
'a2',
'b4'
)
SELECT
c,
ARRAY_JOIN(ARRAY_AGG(c2), ',') as c2
FROM x
GROUP BY
c
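The same group-and-concatenate idea can be tried locally without a Presto cluster. The sketch below swaps in SQLite's GROUP_CONCAT (run through Python's sqlite3 module) for Presto's ARRAY_JOIN(ARRAY_AGG(...)); it is a different function with the same effect on this data, and the table name tbl comes from the question. SQLite does not guarantee the order of the concatenated values, so the result is sorted before comparing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col_1 INTEGER, col_2 TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e")],
)

# GROUP_CONCAT is SQLite's aggregate; Presto would use ARRAY_JOIN(ARRAY_AGG(col_2), ',').
rows = conn.execute(
    "SELECT col_1, GROUP_CONCAT(col_2, ',') FROM tbl GROUP BY col_1 ORDER BY col_1"
).fetchall()

# Concatenation order is not guaranteed, so normalise it for display.
grouped = {col_1: sorted(joined.split(",")) for col_1, joined in rows}
for col_1, values in grouped.items():
    print(col_1, ",".join(values))
```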

Select rows when a value appears multiple times

I have a table like this one:
+------+------+
| ID | Cust |
+------+------+
| 1 | A |
| 1 | A |
| 1 | B |
| 1 | B |
| 2 | A |
| 2 | A |
| 2 | A |
| 2 | B |
| 3 | A |
| 3 | B |
| 3 | B |
+------+------+
I would like to get the IDs that have at least two A rows and at least two B rows. So in my example, the query should return only ID 1.
Thanks!

In MySQL:
SELECT id
FROM test
GROUP BY id
HAVING GROUP_CONCAT(cust ORDER BY cust SEPARATOR '') LIKE '%AA%BB%'
In Oracle (LIKE is case-sensitive there, so the pattern must match the case of the stored values):
WITH cte AS ( SELECT id, LISTAGG(cust, '') WITHIN GROUP (ORDER BY cust) custs
FROM test
GROUP BY id )
SELECT id
FROM cte
WHERE custs LIKE '%AA%BB%'

I would just use two levels of aggregation:
select id
from (select id, cust, count(*) as cnt
from t
where cust in ('A', 'B')
group by id, cust
) ic
group by id
having count(*) = 2 and -- both customers are in the result set
min(cnt) >= 2 -- and there are at least two instances
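The two-level aggregation above can be checked against the question's sample data. This is a minimal sketch that uses Python's sqlite3 module as a stand-in database; the table name t follows the answer, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, cust TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "A"), (1, "A"), (1, "B"), (1, "B"),
        (2, "A"), (2, "A"), (2, "A"), (2, "B"),
        (3, "A"), (3, "B"), (3, "B"),
    ],
)

# Inner level: one row per (id, cust) with its count.
# Outer level: keep ids where both customers appear and each appears at least twice.
query = """
SELECT id
FROM (
    SELECT id, cust, COUNT(*) AS cnt
    FROM t
    WHERE cust IN ('A', 'B')
    GROUP BY id, cust
) AS ic
GROUP BY id
HAVING COUNT(*) = 2 AND MIN(cnt) >= 2
ORDER BY id
"""
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)
```

ID 2 fails because B appears only once, and ID 3 fails because A appears only once.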

This is one option; lines #1 - 13 represent sample data. Query you might be interested in begins at line #14.
SQL> with test (id, cust) as
2 (select 1, 'a' from dual union all
3 select 1, 'a' from dual union all
4 select 1, 'b' from dual union all
5 select 1, 'b' from dual union all
6 select 2, 'a' from dual union all
7 select 2, 'a' from dual union all
8 select 2, 'a' from dual union all
9 select 2, 'b' from dual union all
10 select 3, 'a' from dual union all
11 select 3, 'b' from dual union all
12 select 3, 'b' from dual
13 )
14 select id
15 from (select
16 id,
17 sum(case when cust = 'a' then 1 else 0 end) suma,
18 sum(case when cust = 'b' then 1 else 0 end) sumb
19 from test
20 group by id
21 )
22 where suma >= 2
23 and sumb >= 2;
ID
----------
1
SQL>

You can use GROUP BY and HAVING restricted to the relevant Cust values ('A', 'B'), then check the result twice, once per customer.
I used a WITH clause so that the aggregation is written only once:
with more_than_2 as
(
select Id, Cust, count(*) c
from tab
where Cust in ('A', 'B')
group by Id, Cust
having count(*) >= 2
)
select *
from tab
where exists ( select 1 from more_than_2 where more_than_2.Id = tab.Id and more_than_2.Cust = 'A')
and exists ( select 1 from more_than_2 where more_than_2.Id = tab.Id and more_than_2.Cust = 'B')

What you want is a good candidate for match_recognize. Partitioning by id keeps a match from spanning two IDs:
select id from t
match_recognize
(
partition by id
order by cust
measures count(*) as cnt
pattern (A{2,} B{2,})
define A as cust = 'A',
B as cust = 'B'
)
Regards,
Ranagal

If a then b check in where clause

I am performing some data quality checks to identify bad data, but I cannot figure out how to check that the data is mapped correctly based on Value 1 vs Value 2.
I ultimately need to identify all IDs in T1 that have an incorrect mapping in T2. I have used the following code, but it doesn't give the desired result. The mapping is not stored in the database; it is a rule that determines how the data should be entered.
- When value in: Apples,Bananas,Cherries,Pears,Kiwis - then it should be mapped to Fruit
- when value in: Cheese - then Cheese
- when value in: Cashews,Almonds - then Nuts
- when value in: Skittles - then Candy
- when value in: Chocolate - then null
Edit: I have added the desired output.
SELECT t1.id, t2.*
FROM t1,t2,t3
WHERE
t1.id = t2.id
AND (
(t2.value1_id IN (01,04,05,08,09) AND t2.value2_id <> 2)
OR (t2.value1_id = 02 and t2.value2_id <> 3)
OR (t2.value1_id IN (03,10) and t2.value2_id <> 1)
OR (t2.value1_id = 06 AND t2.value2_id <> 4)
OR (t2.value1_id = 07 AND t2.value_id IS NOT NULL)
)
T1
ID
1
2
3
4
5
6
7
T2
T1.ID Value1_ID Value2_ID
1 01 2
1 02 3
1 03 1
2 04 2
2 05 2
2 02 3
2 06 4
2 07
3 08 2
3 02 3
4 09 2
4 10 1
5 02 2
5 10 1
6 04 3
6 10 2
7 07 2
T3
ID Value1
01 Apples
02 Cheese
03 Cashews
04 Bananas
05 Cherries
06 Skittles
07 Chocolate
08 Pears
09 Kiwis
10 Almonds
T4
ID Value2
1 Nuts
2 Fruit
3 Cheese
4 Candy
Desired Output:
T1.ID Value1_ID Value2_ID
5 02 2
6 04 3
6 10 2
7 07 2
T1.ID 5, value1_id 02 is in the desired output as Cheese is mapped to Fruit
T1.ID 6, value1_id 04 - Bananas is mapped to Cheese
T1.ID 6, value1_id 10 - Almonds is mapped to Fruit
T1.ID 7, value1_id 07 - Chocolate is mapped to Fruit when it should be null

One of the problems is that - when looking at T2 - it is not easy to tell whether a "mapping" is correct or not.
When creating the test data for T1 and T2, we have used CHARs for VALUE1_IDs, in order to make the subsequent queries a bit more "readable".
Tables
create table T1( id primary key )
as
select 1 from dual union all
select 2 from dual union all
select 3 from dual ;
create table T2 ( id, value1_id, value2_id )
as
select 1, '01', 2 from dual union all
select 1, '02', 3 from dual union all
select 1, '03', 1 from dual union all
select 2, '04', 2 from dual union all
select 2, '05', 2 from dual union all
select 2, '02', 3 from dual union all
select 2, '06', 4 from dual union all
select 2, '07', null from dual union all
select 3, '08', 2 from dual union all
select 3, '02', 3 from dual union all
select 4, '09', 2 from dual union all
select 4, '10', 1 from dual ;
Refactored query
--
-- find incorrect mappings
--
select t2.*, 'T1 id not valid' as status
from t2
where t2.id not in ( select id from T1 )
union all
select t2.*, 'value1_id <-> value2_id mapping incorrect '
from t1 join t2 on t1.id = t2.id
where
( t2.value1_id in ('01','04','05','08','09') and t2.value2_id <> 2 )
or
( t2.value1_id = '02' and t2.value2_id <> 3 )
or
( t2.value1_id in ('03','10') and t2.value2_id <> 1 )
or
( t2.value1_id = '06' and t2.value2_id <> 4 )
or
-- Chocolate ('07') must map to null, so a non-null value2_id is the error
( t2.value1_id = '07' and t2.value2_id is not null )
;
-- result
ID VALUE1_ID VALUE2_ID STATUS
4 10 1 T1 id not valid
4 09 2 T1 id not valid
ALTERNATIVE
Another possibility may be: create a table, containing all valid mappings, in "human readable" form, and use it to validate the mappings stored in T2. However, use whatever approach you are more comfortable with - as long as you get the correct results. Example (tested w/ Oracle 12c, 18c)
-- in addition to tables T1, T2, T3, and T4: table with correct mappings
create table map( category, product )
as
select 'Fruit', 'Apples' from dual union all
select 'Cheese', 'Cheese' from dual union all
select 'Nuts', 'Cashews' from dual union all
select 'Fruit', 'Bananas' from dual union all
select 'Fruit', 'Cherries' from dual union all
select 'Candy', 'Skittles' from dual union all
select 'Candy', 'Chocolate' from dual union all
select 'Fruit', 'Pears' from dual union all
select 'Fruit', 'Kiwis' from dual union all
select 'Nuts', 'Almonds' from dual;
-- make sure that the entries in the MAP table tie in with T3 and T4
alter table map
add (
constraint m_pk primary key ( category, product )
, constraint m_category_fk foreign key ( category ) references T4 ( value2 )
, constraint m_product_fk foreign key ( product ) references T3 ( value1 )
) ;
Find incorrect mappings
-- T2 rows containing incorrect (invalid) mappings
-- -> all rows MINUS the correct ones
select T2.id, T2.value1_id, T2.value2_id
from T2
minus (
select T2.id, T2.value1_id, T2.value2_id
from T2
join (
--
select T4.id categoryid, T3.id productid, M.category, M.product
from T4
join map M on T4.value2 = M.category
join T3 on T3.value1 = M.product
--
) C -- correct mappings
on
C.productid = T2.value1_id
and C.categoryid = T2.value2_id
) ;
-- result
ID VALUE1_ID VALUE2_ID
2 07 NULL

I would highly recommend that you create a table to represent the one-to-many relationship between T4 and T3. This would be a first step towards fixing your design, and it provides a simple way to answer your current question.
Here is a CREATE TABLE ... AS SELECT statement that initializes such a table with your sample data:
create table cat AS
SELECT 1 t3_id, 2 t4_id FROM DUAL
UNION ALL SELECT 4, 2 FROM DUAL
UNION ALL SELECT 5, 2 FROM DUAL
UNION ALL SELECT 8, 2 FROM DUAL
UNION ALL SELECT 9, 2 FROM DUAL
UNION ALL SELECT 2, 3 FROM DUAL
UNION ALL SELECT 3, 1 FROM DUAL
UNION ALL SELECT 10, 1 FROM DUAL
UNION ALL SELECT 6, 4 FROM DUAL
UNION ALL SELECT 7, NULL FROM DUAL
;
With this table in place, identifying incorrectly mapped records is as simple as:
SELECT t2.*
FROM t2
WHERE t2.Value2_ID IS NOT NULL AND NOT EXISTS (
SELECT 1 FROM cat WHERE cat.t3_id = t2.Value1_ID AND cat.t4_id = t2.Value2_ID
)
Run against your sample data, this yields:
T1_ID | VALUE1_ID | VALUE2_ID
----: | --------: | --------:
5 | 2 | 2
6 | 4 | 3
6 | 10 | 2
7 | 7 | 2
Hint to further improve your design: you have a one-to-many relationship between T4 (food categories) and T3 (foods). The classic way to represent this is to add a column in the child table (T3) that references the parent table.

If you can't create a table with the mappings from fruit to categories and you know that the values are static then just include the mappings into your query using a nested sub-query or a sub-query factoring clause:
Oracle Setup:
create table T2 ( id, value1_id, value2_id ) as
select 1, '01', 2 from dual union all
select 1, '02', 3 from dual union all
select 1, '03', 1 from dual union all
select 2, '04', 2 from dual union all
select 2, '05', 2 from dual union all
select 2, '02', 3 from dual union all
select 2, '06', 4 from dual union all
select 2, '07', null from dual union all
select 3, '08', 2 from dual union all
select 3, '02', 3 from dual union all
select 4, '09', 2 from dual union all
select 4, '10', 1 from dual union all
select 5, '02', 2 from dual union all
select 5, '10', 1 from dual union all
select 6, '04', 3 from dual union all
select 6, '10', 2 from dual union all
select 7, '07', 2 from dual;
Query:
WITH mappings ( name, category ) AS (
SELECT '01', 2 FROM DUAL UNION ALL
SELECT '02', 3 FROM DUAL UNION ALL
SELECT '03', 1 FROM DUAL UNION ALL
SELECT '04', 2 FROM DUAL UNION ALL
SELECT '05', 2 FROM DUAL UNION ALL
SELECT '06', 4 FROM DUAL UNION ALL
SELECT '07', NULL FROM DUAL UNION ALL
SELECT '08', 2 FROM DUAL UNION ALL
SELECT '09', 2 FROM DUAL UNION ALL
SELECT '10', 1 FROM DUAL
)
SELECT *
FROM T2 t
WHERE NOT EXISTS (
SELECT 1
FROM mappings m
WHERE t.value1_id = m.name
AND ( t.value2_id = m.category
OR ( t.value2_id IS NULL AND m.category IS NULL ) )
);
Results:
ID | VALUE1_ID | VALUE2_ID
-: | :-------- | --------:
5 | 02 | 2
6 | 04 | 3
6 | 10 | 2
7 | 07 | 2
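The inline-mapping query above can be reproduced outside Oracle as a quick check. The sketch below runs it in SQLite through Python's sqlite3 module (an assumption, chosen only so the example is self-contained), replacing the SELECT ... FROM DUAL rows with a VALUES list. The null-safe comparison is kept exactly as in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T2 (id INTEGER, value1_id TEXT, value2_id INTEGER)")
conn.executemany(
    "INSERT INTO T2 VALUES (?, ?, ?)",
    [
        (1, "01", 2), (1, "02", 3), (1, "03", 1),
        (2, "04", 2), (2, "05", 2), (2, "02", 3), (2, "06", 4), (2, "07", None),
        (3, "08", 2), (3, "02", 3),
        (4, "09", 2), (4, "10", 1),
        (5, "02", 2), (5, "10", 1),
        (6, "04", 3), (6, "10", 2),
        (7, "07", 2),
    ],
)

# A row is flagged when no mapping matches it; two NULLs count as a match.
query = """
WITH mappings (name, category) AS (
    VALUES ('01', 2), ('02', 3), ('03', 1), ('04', 2), ('05', 2),
           ('06', 4), ('07', NULL), ('08', 2), ('09', 2), ('10', 1)
)
SELECT t.id, t.value1_id, t.value2_id
FROM T2 AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM mappings AS m
    WHERE t.value1_id = m.name
      AND ( t.value2_id = m.category
            OR ( t.value2_id IS NULL AND m.category IS NULL ) )
)
ORDER BY t.id, t.value1_id
"""
bad_rows = conn.execute(query).fetchall()
for row in bad_rows:
    print(row)
```

The row (2, '07', NULL) is not flagged, because Chocolate is expected to map to null.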

Multiple criteria on the same column

How do I search with multiple criteria on the same column?
T1 has the IDs.
T2:
ID T1_ID(FK) Value
1 1 Apple
2 1 Orange
3 1 Kiwi
4 2 Orange
5 2 Kiwi
6 3 Pear
7 3 Berry
8 3 Orange
9 4 Apple
10 5 Apple
11 5 Apple
12 5 Kiwi
Output:
T2_ID(FK) Value
1 Apple
1 Orange
1 Kiwi
select t2.t1_id, t2.value
from t1, t2
where t1.id = t2.id
and t2.value in ('Apple','Orange','Kiwi')
group by t1.id having count(t2.value)=3
Is this query correct? Doesn't it also bring t2_id = 5 because #5 matches with apple and kiwi although apple is duplicate?

You don't need to join t1 and you need COUNT(DISTINCT column_name):
Oracle 11g R2 Schema Setup:
CREATE TABLE t2 (ID, T1_ID, Value ) AS
SELECT 1, 1, 'Apple' FROM DUAL UNION ALL
SELECT 2, 1, 'Orange' FROM DUAL UNION ALL
SELECT 3, 1, 'Kiwi' FROM DUAL UNION ALL
SELECT 4, 2, 'Orange' FROM DUAL UNION ALL
SELECT 5, 2, 'Kiwi' FROM DUAL UNION ALL
SELECT 6, 3, 'Pear' FROM DUAL UNION ALL
SELECT 7, 3, 'Berry' FROM DUAL UNION ALL
SELECT 8, 3, 'Orange' FROM DUAL UNION ALL
SELECT 9, 4, 'Apple' FROM DUAL UNION ALL
SELECT 10, 5, 'Apple' FROM DUAL UNION ALL
SELECT 11, 5, 'Apple' FROM DUAL UNION ALL
SELECT 12, 5, 'Kiwi' FROM DUAL;
Query 1:
select t1_id,
LISTAGG( value, ',' ) WITHIN GROUP ( ORDER BY value ) As "values"
from t2
where value in ('Apple','Orange','Kiwi')
group by t1_id
having count( DISTINCT value) = 3
Results:
| T1_ID | values |
|-------|-------------------|
| 1 | Apple,Kiwi,Orange |
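To see why DISTINCT matters here, the sketch below runs both variants of the HAVING clause on the sample data. It uses Python's sqlite3 module as a stand-in for Oracle (an assumption), and leaves out LISTAGG so that only the counting logic is compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t2 (id INTEGER, t1_id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?, ?)",
    [
        (1, 1, "Apple"), (2, 1, "Orange"), (3, 1, "Kiwi"),
        (4, 2, "Orange"), (5, 2, "Kiwi"),
        (6, 3, "Pear"), (7, 3, "Berry"), (8, 3, "Orange"),
        (9, 4, "Apple"),
        (10, 5, "Apple"), (11, 5, "Apple"), (12, 5, "Kiwi"),
    ],
)

template = """
SELECT t1_id
FROM t2
WHERE value IN ('Apple', 'Orange', 'Kiwi')
GROUP BY t1_id
HAVING {count_expr} = 3
ORDER BY t1_id
"""

# Plain COUNT counts the duplicate Apple of T1_ID 5 as a third match.
plain = [r[0] for r in conn.execute(template.format(count_expr="COUNT(value)"))]
# COUNT(DISTINCT ...) requires three different values.
distinct = [r[0] for r in conn.execute(template.format(count_expr="COUNT(DISTINCT value)"))]
print(plain, distinct)
```

The plain count also returns T1_ID 5 (Apple, Apple, Kiwi), which is exactly the false positive the question asks about; the distinct count returns only T1_ID 1.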
Query 2:
You can also do it using collections:
CREATE TYPE STRINGLIST IS TABLE OF VARCHAR2(10);
/
SELECT *
FROM (
SELECT t1_id,
CAST( COLLECT( value ORDER BY value ) AS STRINGLIST ) AS "values"
FROM t2
GROUP BY t1_id
)
WHERE STRINGLIST( 'Apple', 'Kiwi', 'Orange' ) SUBMULTISET OF "values"
Results:
| T1_ID | values |
|-------|-------------------|
| 1 | Apple,Kiwi,Orange |

T-SQL ORDER BY base on MIN of a group's column

Hi, take the following data as an example:
id | value
----------
A | 3
A | 9
B | 7
B | 2
C | 4
C | 5
I want to list all the data ordered by the min value of each id group, so that the expected output is
id | value
----------
B | 2
B | 7
A | 3
A | 9
C | 4
C | 5
i.e. the min of group A is 3, of group B is 2, and of group C is 4, so group B comes first, with its rows in ascending order, followed by group A and then group C.
I tried this, but that's not what I want:
SELECT * FROM (
SELECT 'A' AS id, '3' AS value
UNION SELECT 'A', '9' UNION SELECT 'B', '7' UNION SELECT 'B', '2'
UNION SELECT 'C', '4' UNION SELECT 'C', '5') data
GROUP BY id, value
ORDER BY MIN(value)
Please help! Thank you

SELECT * FROM (
SELECT 'A' AS id, '3' AS value
UNION SELECT 'A', '9' UNION SELECT 'B', '7' UNION SELECT 'B', '2'
UNION SELECT 'C', '4' UNION SELECT 'C', '5') data
ORDER BY MIN(value) OVER(PARTITION BY id), id, value
See OVER Clause (Transact-SQL) in the documentation.
Add the OVER() expression to your query output and you can see what it does for you.
SELECT *,
MIN(value) OVER(PARTITION BY id) OrderedBy FROM (
SELECT 'A' AS id, '3' AS value
UNION SELECT 'A', '9' UNION SELECT 'B', '7' UNION SELECT 'B', '2'
UNION SELECT 'C', '4' UNION SELECT 'C', '5') data
ORDER BY MIN(value) OVER(PARTITION BY id), id, value
Result:
id value OrderedBy
---- ----- ---------
B 2 2
B 7 2
A 3 3
A 9 3
C 4 4
C 5 4
