SQL - Multiple duplicates in some columns, 1 column has to be unique

I'm quite a novice with SQL and would really appreciate some help with this for a project I'm working on. I'm using SQLite, in case that makes a difference.
I need to write a query that outputs a row if three of its columns match another row but one column is different:
Columns 2, 3, and 4 combined must have a duplicate in another row,
but
columns 1, 2, 3, and 4 combined must not have any duplicates in any other row.
An example table:
ROW 1 : 12345 | Test1 | Something1 | And1 (I don't want this, it's a full row duplicate with row 2)
ROW 2 : 12345 | Test1 | Something1 | And1 (I don't want this, it's a full row duplicate with row 1)
ROW 3 : 12344 | Test1 | Something1 | And3 (I don't want this, it's not a full row duplicate but col 2, 3 and 4 combined doesn't exist anywhere else in the table)
ROW 4 : 12222 | Test2 | Something1 | And2 (I want this! It's not a full row duplicate and columns 2, 3 and 4 combined exists in row 9)
ROW 5 : 12222 | Test3 | Something1 | And3
ROW 6 : 12222 | Test3 | Something1 | And3
ROW 7 : 12224 | Test3 | Something1 | And3
ROW 8 : 12222 | Test3 | Something2 | And3
ROW 9 : 12000 | Test2 | Something1 | And2
The output I'd want for this is:
12222 | Test2 | Something1 | And2
12224 | Test3 | Something1 | And3
12000 | Test2 | Something1 | And2
I hope this makes sense to someone. Thanks in advance for any help.

I think you want exists:
select t.*
from t
where exists (select 1
              from t t2
              where t2.col2 = t.col2 and t2.col3 = t.col3 and t2.col4 = t.col4 and
                    t2.col1 <> t.col1
             );
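To check the EXISTS query against the question's sample rows, here is a minimal sketch using Python's built-in sqlite3 module (the asker is on SQLite). The table name t and the column names col1 to col4 follow the answer; the extra GROUP BY ... HAVING COUNT(*) = 1 step is my own assumption about one way to also drop full-row duplicates, not part of the answer.

```python
import sqlite3

# Sample rows copied from the question (col1, col2, col3, col4).
ROWS = [
    (12345, "Test1", "Something1", "And1"),
    (12345, "Test1", "Something1", "And1"),
    (12344, "Test1", "Something1", "And3"),
    (12222, "Test2", "Something1", "And2"),
    (12222, "Test3", "Something1", "And3"),
    (12222, "Test3", "Something1", "And3"),
    (12224, "Test3", "Something1", "And3"),
    (12222, "Test3", "Something2", "And3"),
    (12000, "Test2", "Something1", "And2"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT, col3 TEXT, col4 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

# Condition from the answer: another row matches on col2..col4 but differs in col1.
HAS_PARTNER = """
    EXISTS (SELECT 1
            FROM t t2
            WHERE t2.col2 = t.col2 AND t2.col3 = t.col3 AND t2.col4 = t.col4
              AND t2.col1 <> t.col1)
"""

# The answer's query as written.
exists_only = conn.execute(
    f"SELECT t.* FROM t WHERE {HAS_PARTNER} ORDER BY t.col1"
).fetchall()

# Assumed extension: group on all four columns and keep groups of size 1,
# which removes rows that are duplicated in full.
exists_and_unique = conn.execute(
    f"""
    SELECT t.col1, t.col2, t.col3, t.col4
    FROM t
    WHERE {HAS_PARTNER}
    GROUP BY t.col1, t.col2, t.col3, t.col4
    HAVING COUNT(*) = 1
    ORDER BY t.col1
    """
).fetchall()

print(len(exists_only))
for row in exists_and_unique:
    print(row)
```

On this data the plain EXISTS query also returns the two identical 12222 | Test3 | Something1 | And3 rows, because 12224 is a partner for them; the grouped variant returns only the three rows the question asks for.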

We can join to a subquery which identifies duplicate groups, and restrict using it:
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
    SELECT col2, col3, col4
    FROM yourTable
    GROUP BY col2, col3, col4
    HAVING COUNT(*) = COUNT(DISTINCT col1) AND COUNT(*) > 1
) t2
    ON t1.col2 = t2.col2 AND t1.col3 = t2.col3 AND t1.col4 = t2.col4;
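To see what the HAVING clause keeps, the sketch below (Python's sqlite3, with the question's sample rows loaded into a table named yourTable as in the answer) prints the two counts per group and then runs the join. The variable names are mine.

```python
import sqlite3

# Sample rows copied from the question (col1, col2, col3, col4).
ROWS = [
    (12345, "Test1", "Something1", "And1"),
    (12345, "Test1", "Something1", "And1"),
    (12344, "Test1", "Something1", "And3"),
    (12222, "Test2", "Something1", "And2"),
    (12222, "Test3", "Something1", "And3"),
    (12222, "Test3", "Something1", "And3"),
    (12224, "Test3", "Something1", "And3"),
    (12222, "Test3", "Something2", "And3"),
    (12000, "Test2", "Something1", "And2"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (col1 INTEGER, col2 TEXT, col3 TEXT, col4 TEXT)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", ROWS)

# Per-group counts that the HAVING clause compares.
groups = conn.execute(
    """
    SELECT col2, col3, col4, COUNT(*) AS n, COUNT(DISTINCT col1) AS n_distinct
    FROM yourTable
    GROUP BY col2, col3, col4
    ORDER BY col2, col3, col4
    """
).fetchall()

# The answer's join, with an ORDER BY added so the output is stable.
joined = conn.execute(
    """
    SELECT t1.*
    FROM yourTable t1
    INNER JOIN (
        SELECT col2, col3, col4
        FROM yourTable
        GROUP BY col2, col3, col4
        HAVING COUNT(*) = COUNT(DISTINCT col1) AND COUNT(*) > 1
    ) t2
        ON t1.col2 = t2.col2 AND t1.col3 = t2.col3 AND t1.col4 = t2.col4
    ORDER BY t1.col1
    """
).fetchall()

for g in groups:
    print(g)
for row in joined:
    print(row)
```

Only the Test2 | Something1 | And2 group has n = n_distinct with n > 1, so the join returns 12000 and 12222 for that group. The Test3 | Something1 | And3 group has three rows but two distinct col1 values, so 12224 is not returned on this data, which differs from the output the question lists.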

Try this:
select
    col1,
    col2,
    col3,
    col4
from (
    SELECT
        *,
        LEAD(valid, 1, 1) OVER (PARTITION BY col2, col3, col4 ORDER BY col1) as valid_next,
        LEAD(invalid, 1, 1) OVER (PARTITION BY col2, col3, col4 ORDER BY col1) as invalid_next
    FROM (
        SELECT
            *,
            ROW_NUMBER() OVER (PARTITION BY col2, col3, col4 ORDER BY col1) AS valid,
            ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY col1) AS invalid
        FROM tb1
    ) x
) y
where valid <> valid_next and invalid = invalid_next
ORDER BY col1;
The logic here is to create two columns, valid and invalid, that number the rows within 1) each group sharing the 3 duplicated columns, and 2) each group sharing all 4 columns. Then use LEAD to track the change from each row to the next. If there is a change then we know there are duplicates, otherwise the row would be unique to the columns partitioned.
Output table:
+--------+--------+-------------+------+
| col1 | col2 | col3 | col4 |
+--------+--------+-------------+------+
| 12000 | Test2 | Something1 | And2 |
| 12222 | Test2 | Something1 | And2 |
| 12224 | Test3 | Something1 | And3 |
+--------+--------+-------------+------+
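A related sketch that expresses the same two counts directly with COUNT(*) OVER instead of comparing row numbers. This is a variation of mine, not the answer's query; it assumes SQLite 3.25 or later (where window functions were added) and reuses the answer's table name tb1. Because it counts rather than compares neighbouring rows, it does not depend on the order in which tied rows are visited.

```python
import sqlite3

# Sample rows copied from the question (col1, col2, col3, col4).
ROWS = [
    (12345, "Test1", "Something1", "And1"),
    (12345, "Test1", "Something1", "And1"),
    (12344, "Test1", "Something1", "And3"),
    (12222, "Test2", "Something1", "And2"),
    (12222, "Test3", "Something1", "And3"),
    (12222, "Test3", "Something1", "And3"),
    (12224, "Test3", "Something1", "And3"),
    (12222, "Test3", "Something2", "And3"),
    (12000, "Test2", "Something1", "And2"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb1 (col1 INTEGER, col2 TEXT, col3 TEXT, col4 TEXT)")
conn.executemany("INSERT INTO tb1 VALUES (?, ?, ?, ?)", ROWS)

# same_three: rows sharing col2..col4; same_four: rows sharing all four columns.
# Window functions need SQLite 3.25+ (an assumption about the runtime).
result = conn.execute(
    """
    SELECT col1, col2, col3, col4
    FROM (
        SELECT
            *,
            COUNT(*) OVER (PARTITION BY col2, col3, col4) AS same_three,
            COUNT(*) OVER (PARTITION BY col1, col2, col3, col4) AS same_four
        FROM tb1
    ) x
    WHERE same_three > 1 AND same_four = 1
    ORDER BY col1
    """
).fetchall()

for row in result:
    print(row)
```

On the sample data this returns the same three rows as the output table above.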

Related

Fetch duplicate rows from a table

I have the following table
Col1 | Col2
2 | jim
2 | jam
3 | raw
3 | cooked
3 | boiled
5 | none
6 | yum
From this table I want to fetch the records whose col1 value appears more than once, like:
Col1 | Col2
2 | jim
2 | jam
3 | raw
3 | cooked
3 | boiled
This should work for you as an alternative to using EXISTS (note that CROSS APPLY is SQL Server syntax):
select t.*
from <table> t
cross apply (select 1 ex
from <table> t2
where t2.Col1=t.Col1
group by t2.Col1
having count(t2.Col1) > 1) tmp
Use exists:
select col1, col2
from t
where exists (select 1
from t t2
where t2.col1 = t.col1 and t2.col2 <> t.col2
);
Use this query
select *
from t
where col1 in
( select col1
from t
group by col1
having count(*) > 1
)
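The IN / GROUP BY / HAVING version is plain SQL, so it can be tried quickly in SQLite. A minimal sketch with Python's sqlite3 and the question's rows; the table name t follows the answer, and the ORDER BY on rowid is my addition (it relies on SQLite's implicit rowid) to keep insertion order within each col1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(2, "jim"), (2, "jam"), (3, "raw"), (3, "cooked"),
     (3, "boiled"), (5, "none"), (6, "yum")],
)

# Keep rows whose col1 value occurs more than once in the table.
dupes = conn.execute(
    """
    SELECT col1, col2
    FROM t
    WHERE col1 IN (SELECT col1
                   FROM t
                   GROUP BY col1
                   HAVING COUNT(*) > 1)
    ORDER BY col1, rowid
    """
).fetchall()

for row in dupes:
    print(row)
```

Unlike the EXISTS version with t2.col2 <> t.col2, this one also returns rows when the same (col1, col2) pair is repeated verbatim.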

PSQL select either a or b in order but not both a and b if both a and b exists

I have a PSQL table
+--------+------+------+------+
| Col1 | Col2 | Col3 | Col4 |
+--------+------+------+------+
| 001 | 00A | 00B | 001 |
| 001001 | 00A | 00B | 001 |
| 002 | 00X | 00Y | 002 |
| 002002 | 00X | 00Y | 002 |
+--------+------+------+------+
I have the following PSQL query:
select *
from my_table
where (Col1 = '001' or Col4 = '001')
and Col2 = '00A'
order by Col3 asc;
I get the first two rows.
What happens here is that rows matching either side of the OR are returned. I need to match only one of the conditions: if the first condition (Col1='001001') is true, the next condition should not be evaluated.
I need to select only the 2nd row (| 001001 | 00A | 00B | 001 |)
I have built another query using EXCEPT:
select *
from my_table
where (Col1 = '001' or Col4 = '001')
and Col2 = '00A'
except (select *
from my_table
where Col1 != '001'
and Col2 = '00A')
order by Col3 asc
limit 1;
I would like to know whether there is a more elegant query for this job.
Your explanation is confusing, as you say col1 = '001001' in one place and use '001' in the query. But I presume you want a hierarchy of comparisons, returning the highest-priority row for each group (col2, col3, col4). Use DISTINCT ON, and change the conditions in whichever way you like to return the appropriate row.
SELECT DISTINCT ON (col2, col3, col4) *
FROM my_table WHERE col2 = '00A'
ORDER BY col2,
col3,
col4,
CASE
WHEN col1 = '001001' THEN 1
WHEN col4 = '001' THEN 2
END;
Does this give you what you want?
select *
from my_table
where (Col1 = '001' and Col2 != '00A')
or ((Col1 is null or Col1 = '') and Col4 = '001' and Col2 = '00A')
order by Col3 asc;
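DISTINCT ON is PostgreSQL-specific, so it cannot be run outside Postgres. As a rough stand-in for the "first matching condition wins" idea, the sketch below uses SQLite (via Python's sqlite3) with ORDER BY CASE ... LIMIT 1. It is an illustration under my own assumptions: it returns a single row overall rather than one row per (col2, col3, col4) group, and it uses '001001' as the preferred Col1 value, as in the DISTINCT ON answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        ("001", "00A", "00B", "001"),
        ("001001", "00A", "00B", "001"),
        ("002", "00X", "00Y", "002"),
        ("002002", "00X", "00Y", "002"),
    ],
)

# Rank rows by which condition they satisfy, then keep the best-ranked one.
best = conn.execute(
    """
    SELECT col1, col2, col3, col4
    FROM my_table
    WHERE col2 = '00A' AND (col1 = '001001' OR col4 = '001')
    ORDER BY CASE
                 WHEN col1 = '001001' THEN 1
                 WHEN col4 = '001' THEN 2
             END
    LIMIT 1
    """
).fetchall()

print(best)
```

Both '00A' rows pass the WHERE clause, but only the '001001' row gets rank 1, so it is the one returned.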

Efficient way to remove rows with empty column if more than one row already exists

I'm developing an app that utilizes SQLite database.
Considering the following data set:
1 | TEST1 | <NULL>
2 | TEST1 | TEST1
3 | TEST1 | <NULL>
4 | TEST1 | TEST123
...
I want to remove the rows with NULLs if there's at least one row for TEST1 with a non-NULL value.
In the above example, the wanted result is:
2 | TEST1 | TEST1
4 | TEST1 | TEST123
...
And for the following example:
1 | TEST1 | <NULL>
2 | TEST1 | TEST123
...
The wanted result is the same example as above.
As I see it, I have a few options:
Avoid INSERT statements for such rows when a condition like the one in the first example exists.
DELETE the rows after they have already been inserted.
Can you please advise on how to achieve each of these options?
Thanks!
Something like this?
select t.*
from t
where t.col3 is not null
union all
select t.*
from t
where t.col3 is null and
      not exists (select 1 from t t2 where t2.col2 = t.col2 and t2.col3 is not null);
That is, select all non-null values. Then select null values where there is no other value.
If you are deleting rows, then:
delete from t
where t.col3 is null and
exists (select 1 from t t2 where t2.col2 = t.col2 and t2.col3 is not null);
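A minimal sketch of the DELETE run against the first example, using Python's sqlite3 since the app is on SQLite. The column names col1 to col3 follow the answer; the extra TEST2 row is a hypothetical addition of mine to show that a NULL row survives when its col2 value has no non-NULL sibling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "TEST1", None),
        (2, "TEST1", "TEST1"),
        (3, "TEST1", None),
        (4, "TEST1", "TEST123"),
        (5, "TEST2", None),  # hypothetical row: only a NULL exists for TEST2
    ],
)

# Delete NULL rows only where the same col2 also has a non-NULL col3.
conn.execute(
    """
    DELETE FROM t
    WHERE t.col3 IS NULL AND
          EXISTS (SELECT 1 FROM t t2
                  WHERE t2.col2 = t.col2 AND t2.col3 IS NOT NULL)
    """
)

remaining = conn.execute("SELECT col1, col2, col3 FROM t ORDER BY col1").fetchall()
for row in remaining:
    print(row)
```

Rows 1 and 3 are removed, while the lone TEST2 row is kept.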

How to add a column to a row in a select

Say I have this table
| Col |
-------
| ABC |
| DEF |
What query should I write to obtain this result (not literally this result, but a general way to do that)?
| Col | Col2 |
--------------
| ABC | 0 |
| ABC | 1 |
| DEF | 0 |
| DEF | 1 |
Unless I'm missing something, this should give you the results you're looking for:
Select Col, Col2
From YourTable
Cross Join (Select 0 As Col2 Union Select 1 As Col2) X
Order By Col, Col2
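The cross join can be tried as-is in SQLite. A small sketch with Python's sqlite3, using the answer's YourTable name and the question's two rows; I use UNION ALL for the derived table, which behaves the same as UNION here because 0 and 1 are distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Col TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?)", [("ABC",), ("DEF",)])

# Pair every row of YourTable with each value of the two-row derived table X.
pairs = conn.execute(
    """
    SELECT Col, Col2
    FROM YourTable
    CROSS JOIN (SELECT 0 AS Col2 UNION ALL SELECT 1) AS X
    ORDER BY Col, Col2
    """
).fetchall()

for row in pairs:
    print(row)
```

Each source row appears once per value in X, giving four rows for the two-row input.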
I would guess that you want to pair two columns, for each combination. Your question is vague and not specific to a problem. That's my assumption.
I guess this query could do:
Select Table1.Col1, Table2.Col2 from Table1 LEFT JOIN Table2 on 1=1
This way, you pair up every row from table1 with every row from table2.
Edit, without table2:
Select Table1.Col1, Constructed.Col1 from Table1 LEFT JOIN
(Select 1 as Col1 UNION Select 2 as Col1 UNION
Select 7 as Col1 UNION Select 14 as Col1) Constructed on 1=1
Can you test this query? Is this what you want?
select col1, 0 as b from YourTable
union all
select col1, 1 as b from YourTable
order by 1, 2;

select distinct col1 with min(col2) and max(col3) from table

My table looks like this with duplicates in col1
col1, col2, col3, col4
1, 1, 0, a
1, 2, 1, a
1, 3, 1, a
2, 4, 1, b
3, 5, 0, c
I want to select distinct col1 with max (col3) and min(col2);
so result set will be:
col1, col2, col3, col4
1, 2, 1, a
2, 4, 1, b
3, 5, 0, c
I have a solution, but I'm looking for better ideas.
SELECT col1, MAX(col3) AS col3, MIN(col2) AS col2, MAX(col4) AS col4
FROM MyTable
GROUP BY col1;
You showed in your example that you wanted a col4 included, but you didn't say which value you want. You have to put that column either in an aggregate function or in the GROUP BY clause. I assumed that taking the max for the group would be acceptable.
update: Thanks for the clarification. You're asking about a variation of the greatest-n-per-group problem that comes up frequently on Stack Overflow. Here's my usual solution:
SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t3
ON t1.col1 = t3.col1 AND t1.col3 < t3.col3
WHERE t3.col1 IS NULL;
In English: show me the row (t1) for which no row exists with the same col1 and a greater value in col3. Some people write this using a NOT EXISTS subquery predicate, but I prefer the JOIN syntax.
Here's the output from my test given your example data:
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| 1 | 2 | 1 | a |
| 1 | 3 | 1 | a |
| 2 | 4 | 1 | b |
| 3 | 5 | 0 | c |
+------+------+------+------+
Notice that there are two rows for col1 value 1, because both rows satisfy the join condition; no other row exists with a greater value in col3.
So we need to add another condition to resolve the tie. You want to compare to rows with a lesser value in col2 and if no such rows exist, then we've found the row with the least value in col2.
SELECT t1.*
FROM MyTable t1
LEFT OUTER JOIN MyTable t3
ON t1.col1 = t3.col1 AND t1.col3 < t3.col3
LEFT OUTER JOIN MyTable t2
ON t1.col1 = t2.col1 AND t1.col3 = t2.col3 AND t1.col2 > t2.col2
WHERE t2.col1 IS NULL AND t3.col1 IS NULL;
Here's the output from my test given your example data:
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| 1 | 2 | 1 | a |
| 2 | 4 | 1 | b |
| 3 | 5 | 0 | c |
+------+------+------+------+
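For anyone who wants to reproduce this, here is a minimal sketch that loads the question's five rows into SQLite through Python's sqlite3 and runs the two-join query. The table name MyTable follows the answer; the ORDER BY is my addition so the output order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(1, 1, 0, "a"), (1, 2, 1, "a"), (1, 3, 1, "a"), (2, 4, 1, "b"), (3, 5, 0, "c")],
)

# t3 looks for a row with a greater col3; t2 looks for a tied row with a lesser col2.
# A row is kept only when neither exists.
best = conn.execute(
    """
    SELECT t1.*
    FROM MyTable t1
    LEFT OUTER JOIN MyTable t3
        ON t1.col1 = t3.col1 AND t1.col3 < t3.col3
    LEFT OUTER JOIN MyTable t2
        ON t1.col1 = t2.col1 AND t1.col3 = t2.col3 AND t1.col2 > t2.col2
    WHERE t2.col1 IS NULL AND t3.col1 IS NULL
    ORDER BY t1.col1
    """
).fetchall()

for row in best:
    print(row)
```

If two rows tie on both col3 and col2 within the same col1, both are still returned, so a further tie-breaker would be needed in that case.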
PS: By the way, it's customary on Stack Overflow to edit your original question and add detail, instead of adding answers to your own question that only clarify the question. But I know some actions aren't available to you until you have more than 1 reputation point.