Fetch duplicate rows from a table - sql

I have the following table
Col1 | Col2
2 | jim
2 | jam
3 | raw
3 | cooked
3 | boiled
5 | none
6 | yum
From this table I want to fetch the records whose col1 value appears in more than one row, like:
Col1 | Col2
2 | jim
2 | jam
3 | raw
3 | cooked
3 | boiled

This should work for you, an alternative to using EXISTS:
select t.*
from <table> t
cross apply (select 1 ex
from <table> t2
where t2.Col1=t.Col1
group by t2.Col1
having count(t2.Col1) > 1) tmp

Use exists:
select col1, col2
from t
where exists (select 1
from t t2
where t2.col1 = t.col1 and t2.col2 <> t.col2
);

Use this query
select *
from t
where col1 in
( select col1
from t
group by col1
having count(*) > 1
)
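To check the last two queries against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The table name t and the in-memory database are assumptions for illustration only; the cross apply version is not included because SQLite does not support that syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(2, "jim"), (2, "jam"), (3, "raw"), (3, "cooked"),
     (3, "boiled"), (5, "none"), (6, "yum")],
)

# IN + GROUP BY/HAVING: keep rows whose col1 occurs more than once.
in_rows = conn.execute(
    """
    SELECT col1, col2
    FROM t
    WHERE col1 IN (SELECT col1 FROM t GROUP BY col1 HAVING COUNT(*) > 1)
    ORDER BY rowid
    """
).fetchall()

# EXISTS: keep rows that have a sibling with the same col1 but another col2.
exists_rows = conn.execute(
    """
    SELECT col1, col2
    FROM t
    WHERE EXISTS (SELECT 1 FROM t t2
                  WHERE t2.col1 = t.col1 AND t2.col2 <> t.col2)
    ORDER BY t.rowid
    """
).fetchall()

print(in_rows)
print(exists_rows)
```

Both queries return the same five rows here. They would differ if two rows were identical in both columns, because the EXISTS version requires a different col2.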

Related

SQL - Multiple duplicates in some columns, 1 column has to be unique

I'm quite novice with SQL and would really appreciate some help on this one for a project I'm working on. I'm using SQLite, not sure if that makes a difference!
I need to write a query that outputs a row if 3 columns are the same but 1 column is different.
Column 2, 3, and 4 combined must have a duplicate in another row,
but
Column 1, 2, 3, and 4 combined must not have any duplicates in any other rows.
An example database:
ROW 1 : 12345 | Test1 | Something1 | And1 (I don't want this, it's a full row duplicate with row 2)
ROW 2 : 12345 | Test1 | Something1 | And1 (I don't want this, it's a full row duplicate with row 1)
ROW 3 : 12344 | Test1 | Something1 | And3 (I don't want this, it's not a full row duplicate but col 2, 3 and 4 combined doesn't exist anywhere else in the table)
ROW 4 : 12222 | Test2 | Something1 | And2 (I want this! It's not a full row duplicate and columns 2, 3 and 4 combined exists in row 9)
ROW 5 : 12222 | Test3 | Something1 | And3
ROW 6 : 12222 | Test3 | Something1 | And3
ROW 7 : 12224 | Test3 | Something1 | And3
ROW 8 : 12222 | Test3 | Something2 | And3
ROW 9 : 12000 | Test2 | Something1 | And2
The output I'd want for this is:
12222 | Test2 | Something1 | And2
12224 | Test3 | Something1 | And3
12000 | Test2 | Something1 | And2
I hope this makes sense to someone. Thanks in advance for any help.
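I think you want exists. Note that on its own this also returns rows that are full-row duplicates (rows 5 and 6 in your example), so those would still need to be filtered out: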
select t.*
from t
where exists (select 1
from t t2
where t2.col2 = t.col2 and t2.col3 = t.col3 and t2.col4 = t.col4 and
t2.col1 <> t.col1
);
We can join to a subquery which identifies duplicate groups, and restrict using it:
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT col2, col3, col4
FROM yourTable
GROUP BY col2, col3, col4
HAVING COUNT(*) = COUNT(DISTINCT col1) AND COUNT(*) > 1
) t2
ON t1.col2 = t2.col2 AND t1.col3 = t2.col3 AND t1.col4 = t2.col4;
Try this:
select
col1,
col2,
col3,
col4
from (
SELECT
*,
LEAD(valid, 1, 1) OVER (PARTITION BY col2, col3, col4 ORDER BY col1) as valid_next,
LEAD(invalid, 1, 1) OVER (PARTITION BY col2, col3, col4 ORDER BY col1) as invalid_next
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY col2, col3, col4 ORDER BY col1) AS valid,
ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY col1) AS invalid
FROM tb1
) x ) y
where valid <> valid_next and invalid = invalid_next
ORDER BY col1;
The logic here is to create two columns (valid and invalid) that number the occurrences of 1) rows sharing the 3 columns col2, col3, col4, and 2) rows sharing all 4 columns. Then use LEAD to track the change from one row to the next. If there is a change then we know there are duplicates, otherwise the row would be unique to the columns partitioned.
Output table:
+--------+--------+-------------+------+
| col1 | col2 | col3 | col4 |
+--------+--------+-------------+------+
| 12000 | Test2 | Something1 | And2 |
| 12222 | Test2 | Something1 | And2 |
| 12224 | Test3 | Something1 | And3 |
+--------+--------+-------------+------+
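For comparison, here is a sketch of a simpler formulation that avoids window functions: require that another row shares col2, col3 and col4 with a different col1, and that the full row itself occurs only once. This is not taken from any of the answers above; the table name tb1 is borrowed from the last answer and the sqlite3 harness is just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb1 (col1 INTEGER, col2 TEXT, col3 TEXT, col4 TEXT)")
conn.executemany("INSERT INTO tb1 VALUES (?, ?, ?, ?)", [
    (12345, "Test1", "Something1", "And1"),
    (12345, "Test1", "Something1", "And1"),
    (12344, "Test1", "Something1", "And3"),
    (12222, "Test2", "Something1", "And2"),
    (12222, "Test3", "Something1", "And3"),
    (12222, "Test3", "Something1", "And3"),
    (12224, "Test3", "Something1", "And3"),
    (12222, "Test3", "Something2", "And3"),
    (12000, "Test2", "Something1", "And2"),
])

query = """
SELECT t.col1, t.col2, t.col3, t.col4
FROM tb1 t
WHERE EXISTS (            -- another row shares col2..col4 with a different col1
        SELECT 1 FROM tb1 o
        WHERE o.col2 = t.col2 AND o.col3 = t.col3 AND o.col4 = t.col4
          AND o.col1 <> t.col1)
  AND (SELECT COUNT(*)    -- the full row itself occurs exactly once
       FROM tb1 f
       WHERE f.col1 = t.col1 AND f.col2 = t.col2
         AND f.col3 = t.col3 AND f.col4 = t.col4) = 1
ORDER BY t.col1
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the example data this yields the three rows the question asks for (12000, 12222 with Test2, and 12224).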

SQL help on count in certain ID

Do you know how to display only the rows of an ID for which no row has 'X' in col3?
For example, in the following table, it should display only ID 2 (as all of its col3 values are null)
ID | col1 | col2 | col3
---+------+------+-----
1 | 0 | 0 | X
1 | D | C | null
1 | D | C | null
2 | 0 | 0 | null
2 | D | C | null
2 | D | C | null
It should work for any number of IDs with many rows per ID, and return only the IDs whose rows all have null in col3.
If you are looking to get records where ID does not have at least one X in col 3 for other records:
SELECT Y.*
FROM Your_Table Y
WHERE Y.ID NOT IN (SELECT X.ID FROM YOUR_TABLE X WHERE X.ID=Y.ID AND X.COL3='X')
Most DBMSs use three-valued logic: true, false, and unknown. NULL <> 3 is unknown, since NULL stands for an unknown value, so a row whose col3 is NULL does not pass a plain <> comparison. You need to handle NULLs explicitly.
SELECT *
FROM Your_Table
WHERE col3 <> 'X'
OR col3 IS NULL;
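A small sqlite3 sketch of the NULL behaviour described in that answer. The table layout is trimmed to two columns and the value 'Y' is a made-up row added only to show a non-NULL match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, "X"), (1, None), (2, None), (2, "Y")],  # 'Y' is hypothetical
)

# Comparing NULL with anything yields NULL (unknown), not true or false.
unknown = conn.execute("SELECT NULL <> 'X'").fetchone()[0]

# Without explicit NULL handling the NULL rows are silently dropped.
naive = conn.execute(
    "SELECT id, col3 FROM your_table WHERE col3 <> 'X' ORDER BY rowid"
).fetchall()

# With IS NULL the NULL rows are kept.
explicit = conn.execute(
    "SELECT id, col3 FROM your_table "
    "WHERE col3 <> 'X' OR col3 IS NULL ORDER BY rowid"
).fetchall()

print(unknown, naive, explicit)
```

Note that this filters row by row. It does not remove the other rows of an ID that has an 'X' somewhere.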
select * from table
where (col1 = col2) and (col3 <> 'X')
Use window functions or not exists:
select t.*
from t
where not exists (select 1 from t t2 where t2.id = t.id and t2.col3 = 'X');
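Here is a quick way to try the not exists query on the question's data, sketched with Python's sqlite3 module. The table name t follows the query above; the in-memory database is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, "0", "0", "X"),
    (1, "D", "C", None),
    (1, "D", "C", None),
    (2, "0", "0", None),
    (2, "D", "C", None),
    (2, "D", "C", None),
])

# Keep a row only if no row with the same id has col3 = 'X'.
kept = conn.execute(
    """
    SELECT t.id, t.col1, t.col2, t.col3
    FROM t
    WHERE NOT EXISTS (SELECT 1 FROM t t2
                      WHERE t2.id = t.id AND t2.col3 = 'X')
    ORDER BY t.rowid
    """
).fetchall()
print(kept)
```

Only the three rows of ID 2 come back, because ID 1 has one row with 'X' and that disqualifies all of its rows.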

How to add a column to a row in a select

Say I have this table
| Col |
-------
| ABC |
| DEF |
What query should I write to obtain this result (not literally this result, but a general way to do that)?
| Col | Col2 |
--------------
| ABC | 0 |
| ABC | 1 |
| DEF | 0 |
| DEF | 1 |
Unless I'm missing something, this should give you the results you're looking for:
Select Col, Col2
From YourTable
Cross Join (Select 0 As Col2 Union Select 1 As Col2) X
Order By Col, Col2
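The cross join idea can be tried out with a short sqlite3 sketch. The table name your_table is a placeholder, and UNION ALL is used instead of UNION since the two constants are already distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (col TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?)", [("ABC",), ("DEF",)])

# Pair every row with each value of a small derived table (0 and 1).
rows = conn.execute(
    """
    SELECT col, col2
    FROM your_table
    CROSS JOIN (SELECT 0 AS col2 UNION ALL SELECT 1) AS x
    ORDER BY col, col2
    """
).fetchall()
print(rows)
```

Adding more SELECT branches to the derived table yields more copies of each row, one per value.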
I would guess that you want to pair two columns, for each combination. Your question is vague and not specific to a problem. That's my assumption.
I guess this query could do:
Select Table1.Col1, Table2.Col2 from Table1 LEFT JOIN Table2 on 1=1
This way, you pair up every row from table1 with every row from table2.
Edit, without table2:
Select Table1.Col1, Constructed.Col1 from Table1 LEFT JOIN
(Select 1 as Col1 UNION Select 2 as Col1 UNION
Select 7 as Col1 UNION Select 14 as Col1) Constructed on 1=1
Can you test this query? Is this what you want?
select * from
(select col1, 0 b from table) table1
union all (select col1, 1 b from table) order by 1;

Update unique rows in SQL

I have a table
id | col1 | col3| col4
1 | x | r |
2 | y | m |
3 | z | p |
4 | x | r |
I have to update all unique rows of this table, i.e.
id | col1 | col3| col4
1 | x | r | 1
2 | y | m | 1
3 | z | p | 1
4 | x | r | 0
I can fetch the unique rows with
select distinct col1, col3 from table
But how can I identify these rows in order to update them? Please help.
You can use GROUP BY to pick one row per unique combination:
SELECT MIN(ID) AS ID FROM TABLE GROUP BY COL1, COL3;
id | col1 | col3
1 | x | r
2 | y | m
3 | z | p
Then
UPDATE TABLE SET col4 = 1 WHERE ID IN (SELECT MIN(ID) FROM TABLE GROUP BY COL1, COL3);
Restriction is that the id column should be unique.
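As a rough sketch, the MIN(ID) approach can be extended to set both values in a single statement with a CASE expression. This is a variation on the answer above, not the answer itself. The table name tbl is a placeholder (TABLE is a reserved word), and id is assumed to be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 TEXT, col3 TEXT, col4 INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl (id, col1, col3) VALUES (?, ?, ?)",
    [(1, "x", "r"), (2, "y", "m"), (3, "z", "p"), (4, "x", "r")],
)

# The lowest id of each (col1, col3) group gets 1, every other row gets 0.
conn.execute(
    """
    UPDATE tbl
    SET col4 = CASE
        WHEN id IN (SELECT MIN(id) FROM tbl GROUP BY col1, col3) THEN 1
        ELSE 0
    END
    """
)
updated = conn.execute(
    "SELECT id, col1, col3, col4 FROM tbl ORDER BY id"
).fetchall()
print(updated)
```

Some databases, MySQL for instance, reject a subquery on the table being updated, so this form may need a derived table or a join there.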
If it is a small enough table, here is what you can do
Step 1: Update everything to 1
Update Table Set Col4 = 1
Step 2: Update all duplicates to 0 (off the top of my head)
Update Table
Set Col4 = 0
From
(
Select Col1, Min (Id) FirstId
From Table
Group By Col1
Having Count (*) > 1
) Duplicates
Where Table.Col1 = Duplicates.Col1
And Table.Id <> Duplicates.FirstId
You can also try:
UPDATE table_name
SET col4 = 1
WHERE id IN
(
SELECT t1.id
FROM table_name t1
LEFT JOIN table_name t2
ON t2.id < t1.id
AND t2.col1 = t1.col1
AND t2.col3 = t1.col3
WHERE t2.id IS NULL
)
One more slightly convoluted option, to set both 0 and 1 values in one hit:
update my_table mt
set col4 = (
select case when rn = 1 then 1 else 0 end
from (
select id,
row_number() over (partition by col1, col3 order by id) as rn
from my_table) tt
where tt.id = mt.id);
4 rows updated.
select * from my_table order by id;
ID COL1 COL3 COL4
---------- ---- ---- ----------
1 x r 1
2 y m 1
3 z p 1
4 x r 0
This is just using row_number() to decide which of the unique combinations is first, arbitrarily using the lowest id, assigning that the value of one, and everything else zero.

select distinct col1 with min(col2) and max(col3) from table

My table looks like this with duplicates in col1
col1, col2, col3, col4
1, 1, 0, a
1, 2, 1, a
1, 3, 1, a
2, 4, 1, b
3, 5, 0, c
I want to select distinct col1 with max (col3) and min(col2);
so result set will be:
col1, col2, col3, col4
1, 2, 1, a
2, 4, 1, b
3, 5, 0, c
I have a solution, but I'm looking for better ideas.
SELECT col1, MAX(col3) AS col3, MIN(col2) AS col2, MAX(col4) AS col4
FROM MyTable
GROUP BY col1;
You showed in your example that you wanted a col4 included, but you didn't say which value you want. You have to put that column either in an aggregate function or in the GROUP BY clause. I assumed that taking the max for the group would be acceptable.
update: Thanks for the clarification. You're asking about a variation of the greatest-n-per-group problem that comes up frequently on Stack Overflow. Here's my usual solution:
SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t3
ON t1.col1 = t3.col1 AND t1.col3 < t3.col3
WHERE t3.col1 IS NULL;
In English: show me the row (t1) for which no row exists with the same col1 and a greater value in col3. Some people write this using a NOT EXISTS subquery predicate, but I prefer the JOIN syntax.
Here's the output from my test given your example data:
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| 1 | 2 | 1 | a |
| 1 | 3 | 1 | a |
| 2 | 4 | 1 | b |
| 3 | 5 | 0 | c |
+------+------+------+------+
Notice that there are two rows for col1 value 1, because both rows satisfy the join condition; no other row exists with a greater value in col3.
So we need to add another condition to resolve the tie. You want to compare to rows with a lesser value in col2 and if no such rows exist, then we've found the row with the least value in col2.
SELECT t1.*
FROM MyTable t1
LEFT OUTER JOIN MyTable t3
ON t1.col1 = t3.col1 AND t1.col3 < t3.col3
LEFT OUTER JOIN MyTable t2
ON t1.col1 = t2.col1 AND t1.col3 = t2.col3 AND t1.col2 > t2.col2
WHERE t2.col1 IS NULL AND t3.col1 IS NULL;
Here's the output from my test given your example data:
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| 1 | 2 | 1 | a |
| 2 | 4 | 1 | b |
| 3 | 5 | 0 | c |
+------+------+------+------+
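For anyone who wants to reproduce this, here is a minimal sketch that runs the two-join query against the example data using Python's sqlite3 module. The in-memory database is only an assumption for illustration; the query itself is the one shown above with an ORDER BY added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 TEXT)"
)
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", [
    (1, 1, 0, "a"),
    (1, 2, 1, "a"),
    (1, 3, 1, "a"),
    (2, 4, 1, "b"),
    (3, 5, 0, "c"),
])

# t3 looks for a row with a greater col3; t2 looks for a row with the
# same col3 but a smaller col2. A row wins when neither exists.
winners = conn.execute(
    """
    SELECT t1.col1, t1.col2, t1.col3, t1.col4
    FROM MyTable t1
    LEFT OUTER JOIN MyTable t3
      ON t1.col1 = t3.col1 AND t1.col3 < t3.col3
    LEFT OUTER JOIN MyTable t2
      ON t1.col1 = t2.col1 AND t1.col3 = t2.col3 AND t1.col2 > t2.col2
    WHERE t2.col1 IS NULL AND t3.col1 IS NULL
    ORDER BY t1.col1
    """
).fetchall()
print(winners)
```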
PS: By the way, it's customary on Stack Overflow to edit your original question and add detail, instead of adding answers to your own question that only clarify the question. But I know some actions aren't available to you until you have more than 1 reputation point.