Return rows where specific column has duplicate values - sql

From the table below I want to show the rows where the value in column 3 is duplicated:
ID Col2 Col3
1 a 123
2 b 123
3 c 14
4 d 65
5 e 65
This means that the query that I need should return rows with ID 1, 2 and 4, 5.
I wrote a query using HAVING:
SELECT *
FROM t1
INNER JOIN (SELECT col3 FROM t1
GROUP BY col3
HAVING COUNT(*) > 1) a
ON t1.col3 = a.col3
However, this query only returns, for example, rows 1 and 4, not all of the duplicates.
I would appreciate any help.

Your query should work, but I would suggest window functions:
select t1.*
from (select t1.*, count(*) over (partition by col3) as cnt
from t1
) t1
where cnt > 1;
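To check both approaches against the sample data, here is a minimal sketch that runs them in an in-memory SQLite database from Python. The choice of SQLite is an assumption made only to keep the example self-contained (the question does not name a database), and the window-function query needs SQLite 3.25 or later.

```python
import sqlite3

# In-memory database holding the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, "a", 123), (2, "b", 123), (3, "c", 14), (4, "d", 65), (5, "e", 65)],
)

# Approach from the question: join back to the col3 values that occur more than once.
join_ids = [row[0] for row in conn.execute("""
    SELECT t1.id
    FROM t1
    INNER JOIN (SELECT col3 FROM t1 GROUP BY col3 HAVING COUNT(*) > 1) a
        ON t1.col3 = a.col3
    ORDER BY t1.id
""")]

# Approach from the answer: count the rows per col3 value with a window function.
window_ids = [row[0] for row in conn.execute("""
    SELECT id
    FROM (SELECT t1.*, COUNT(*) OVER (PARTITION BY col3) AS cnt FROM t1) t
    WHERE cnt > 1
    ORDER BY id
""")]

print(join_ids)
print(window_ids)
```

On this data both lists contain the ids 1, 2, 4 and 5, which supports the remark that the original join query should already work.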

Related

Sql Query for Unique and Duplicates in oracle sql?

I need to display unique records in one column and duplicates in another column in Oracle?
COL1 COL2
1 10
1 10
2 20
3 30
3 30
unique in one set    duplicate in one set
col1 col2            col1 col2
2    20              1    10
                     1    10
                     3    30
                     3    30
You can use the group by for both cases with the having clause:
Unique records
select *
from table t
inner join (
select col1, col2, count(*) as times
from table
group by col1, col2
having count(*) = 1) t2 ON t.col1 = t2.col1 and t.col2 = t2.col2
Duplicate records:
select *
from table t
inner join (
select col1, col2, count(*) as times
from table
group by col1, col2
having count(*) > 1) t2 ON t.col1 = t2.col1 and t.col2 = t2.col2
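As a quick check of the GROUP BY / HAVING idea, the sketch below runs both variants on the sample rows. It uses an in-memory SQLite database from Python rather than Oracle, purely so it can be run anywhere; the table name `sample` and the helper `rows_by_group_size` are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 20), (3, 30), (3, 30)],
)


def rows_by_group_size(conn, comparison):
    """Return the rows whose (col1, col2) group size satisfies `comparison`.

    `comparison` is only ever one of the two fixed strings used below
    ("= 1" or "> 1"), so interpolating it into the SQL text is safe here.
    """
    query = f"""
        SELECT t.col1, t.col2
        FROM sample t
        INNER JOIN (
            SELECT col1, col2
            FROM sample
            GROUP BY col1, col2
            HAVING COUNT(*) {comparison}
        ) t2 ON t.col1 = t2.col1 AND t.col2 = t2.col2
        ORDER BY t.col1, t.col2
    """
    return conn.execute(query).fetchall()


unique_rows = rows_by_group_size(conn, "= 1")
duplicate_rows = rows_by_group_size(conn, "> 1")
print(unique_rows)
print(duplicate_rows)
```

The join condition has to compare col1 with col1 and col2 with col2; with that in place the unique set is the single row (2, 20) and the duplicate set holds both copies of (1, 10) and (3, 30).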
Would something like this do? See comments within code.
with
test (col1, col2) as
-- sample data
(select 1, 10 from dual union all
select 1, 10 from dual union all
select 2, 20 from dual union all
select 3, 30 from dual union all
select 3, 30 from dual
),
uni as
-- unique values
(select col1, col2
from test
group by col1, col2
having count(*) = 1
),
dup as
-- duplicate values
(select col1, col2
from test
group by col1, col2
having count(*) > 1
)
-- the final result
select u.col1 ucol1,
u.col2 ucol2,
d.col1 dcol1,
d.col2 dcol2
from uni u full outer join dup d on u.col1 = d.col1;

     UCOL1      UCOL2      DCOL1      DCOL2
---------- ---------- ---------- ----------
                               1         10
                               3         30
         2         20
You can identify the duplicate values using window functions and then filter on the count. To get the unique records:
select col1, col2
from (select t.*, count(*) over (partition by col1) as cnt
from t
) t
where cnt = 1;
To get duplicates:
select col1, col2
from (select t.*, count(*) over (partition by col1) as cnt
from t
) t
where cnt > 1;

select query to fetch rows corresponding to all values in a column

Consider this example table "Table1".
Col1 Col2
A 1
B 1
A 4
A 5
A 3
A 2
D 1
B 2
C 3
B 4
I am trying to fetch those values from Col1 which corresponds to all values (in this case, 1,2,3,4,5). Here the result of the query should return 'A' as none of the others have all values 1,2,3,4,5 in Col2.
Note that the values in Col2 are decided by other parameters in the query and they will always return some numeric values. Out of those values the query needs to fetch values from Col1 corresponding to all in Col2. The values in Col2 could be 11,12,1,2,3,4 for instance (meaning not necessarily in sequence).
I have tried the following select query:
select distinct Col1 from Table1 where Col1 in (1,2,3,4,5);
select distinct Col1 from Table1 where Col1 exists (select distinct Col2 from Table1);
and different variations of these. But the problem is that I need to apply an 'and' for Col2, not an 'or'.
Something like: return a value from Col1 where Col2 'contains' all values between 1 and 5.
I would appreciate any suggestions.
You could use the analytic ROW_NUMBER() function.
SQL Fiddle for a setup and working demonstration.
SELECT col1
FROM
(SELECT col1,
col2,
row_number() OVER(PARTITION BY col1 ORDER BY col2) rn
FROM your_table
WHERE col2 IN (1,2,3,4,5)
)
WHERE rn =5;
UPDATE As requested by OP, some explanation about how the query works.
The inner sub-query gives you the following resultset:
SELECT col1,
col2,
row_number() OVER(PARTITION BY col1 ORDER BY col2) rn
FROM t
WHERE col2 IN (1,2,3,4,5);

C       COL2         RN
- ---------- ----------
A          1          1
A          2          2
A          3          3
A          4          4
A          5          5
B          1          1
B          2          2
B          4          3
C          3          1
D          1          1

10 rows selected.
The PARTITION BY clause groups the rows by col1, and ORDER BY sorts col2 within each group. The sub-query therefore numbers the rows of each col1 group in order. Because the WHERE clause keeps only the five wanted values, a group contains all of them only if its row number reaches 5, so all the outer query needs to do is filter with WHERE rn = 5. This assumes each (col1, col2) pair appears at most once; duplicate pairs would inflate the row number.
You can use the LISTAGG function, for example:
SELECT Col1
FROM
(select Col1,listagg(Col2,',') within group (order by Col2) Col2List from Table1
group by Col1)
WHERE Col2List = '1,2,3,4,5'
You can also use the following:
SELECT COL1
FROM TABLE_NAME
GROUP BY COL1
HAVING COUNT(COL1) = 5
AND SUM(
(CASE WHEN COL2 = 1 THEN 1 ELSE 0 END)
+ (CASE WHEN COL2 = 2 THEN 1 ELSE 0 END)
+ (CASE WHEN COL2 = 3 THEN 1 ELSE 0 END)
+ (CASE WHEN COL2 = 4 THEN 1 ELSE 0 END)
+ (CASE WHEN COL2 = 5 THEN 1 ELSE 0 END)
) = 5
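Since the question says the list of required values changes from query to query, a variation that does not hard-code the number 5 may be handy. The sketch below is not one of the answers above; it filters to the required values and compares COUNT(DISTINCT Col2) with the size of the list, which also tolerates duplicate (Col1, Col2) pairs. It runs on an in-memory SQLite database from Python for convenience, and the helper name `col1_covering` is made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Col1 TEXT, Col2 INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("A", 1), ("B", 1), ("A", 4), ("A", 5), ("A", 3),
     ("A", 2), ("D", 1), ("B", 2), ("C", 3), ("B", 4)],
)


def col1_covering(conn, required):
    """Return the Col1 values that have a row for every value in `required`."""
    required = sorted(set(required))
    # One bound parameter per required value, plus one for the expected count.
    placeholders = ", ".join("?" for _ in required)
    query = f"""
        SELECT Col1
        FROM Table1
        WHERE Col2 IN ({placeholders})
        GROUP BY Col1
        HAVING COUNT(DISTINCT Col2) = ?
        ORDER BY Col1
    """
    return [row[0] for row in conn.execute(query, (*required, len(required)))]


print(col1_covering(conn, [1, 2, 3, 4, 5]))
print(col1_covering(conn, [1, 2]))
```

With the sample table only 'A' covers 1 through 5, while both 'A' and 'B' cover the shorter list 1, 2.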

require to form a sql query

I was working on preparing a query where I was stuck.
Consider tables below:
table1
id key col1
-- --- -----
1 1 abc
2 2 d
3 3 s
4 4 xyz
table2
id col1 foreignkey
-- ---- ----------
1 12 1
2 13 1
3 14 1
4 12 2
5 13 2
Now what I need is to select only those records from table1 for which the corresponding entries in table2 do not have a given col1 value, say 12.
The challenge is that after applying a join, the row with col1 equal to 12 is skipped, but the same foreignkey still has other rows with values such as 13 and 14, so the id is returned anyway. What I want is that if there is even a single row with the value 12, that id should not be picked from table1 at all.
How can I form a query for this?
For example, with the table structure above, I want to get those records from table1 for which col1 in table2 never has the value 14.
So my query should return only row 2 from table1 and not row 1.
Another way of doing that. The first two CTEs only build the sample data.
;WITH t1(id ,[key] ,col1) AS
(
SELECT 1 , 1 , 'abc' UNION ALL
SELECT 2 , 2 , 'd' UNION ALL
SELECT 3 , 3 , 's' UNION ALL
SELECT 4 , 4 , 'xyz'
)
,t2(id ,col1, foreignkey) AS
(
SELECT 1 , 12 , 1 UNION ALL
SELECT 2 , 13 , 1 UNION ALL
SELECT 3 , 14 , 1 UNION ALL
SELECT 4 ,12 , 2 UNION ALL
SELECT 5 ,13 , 2
)
SELECT id, [key], col1
FROM t1
WHERE id NOT IN (SELECT t2.foreignkey
FROM t2
WHERE t2.col1 = 14)
This is a typical case for NOT EXISTS:
SELECT id, [key], col1
FROM table1 t1
WHERE NOT EXISTS (SELECT 1
FROM table2 t2
WHERE t2.foreignkey = t1.id AND t2.col1 = 14)
The above query will not select a row from table1 if there is a single correlated row in table2 having col1 = 14.
Output:
id key col1
-------------
2 2 d
3 3 s
4 4 xyz
If you want to return records that, in addition to the criterion set above, also have correlated records in table2, then you can use the following query:
SELECT t1.id, MAX(t1.[key]) AS [key], MAX(t1.col1) AS col1
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.foreignkey
GROUP BY t1.id
HAVING COUNT(CASE WHEN t2.col1 = 14 THEN 1 END) = 0
Output:
id key col1
-------------
2 2 d
You can also achieve the same result as the second query using a combination of EXISTS and NOT EXISTS:
SELECT id, [key], col1
FROM table1 t1
WHERE EXISTS (SELECT 1
FROM table2 t2
WHERE t2.foreignkey = t1.id)
AND
NOT EXISTS (SELECT 1
FROM table2 t3
WHERE t3.foreignkey = t1.id AND t3.col1 = 14)
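To see the difference between the plain NOT EXISTS query and the EXISTS plus NOT EXISTS combination on the sample data, here is a small sketch. It uses an in-memory SQLite database from Python instead of SQL Server, only so that it is easy to run; the column name key is quoted with double quotes rather than brackets for that reason.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, "key" INTEGER, col1 TEXT);
    CREATE TABLE table2 (id INTEGER, col1 INTEGER, foreignkey INTEGER);
    INSERT INTO table1 VALUES (1, 1, 'abc'), (2, 2, 'd'), (3, 3, 's'), (4, 4, 'xyz');
    INSERT INTO table2 VALUES (1, 12, 1), (2, 13, 1), (3, 14, 1), (4, 12, 2), (5, 13, 2);
""")

# Rows of table1 with no related table2 row whose col1 is 14.
# Rows without any related table2 row qualify as well.
no_14 = [row[0] for row in conn.execute("""
    SELECT id
    FROM table1 t1
    WHERE NOT EXISTS (SELECT 1
                      FROM table2 t2
                      WHERE t2.foreignkey = t1.id AND t2.col1 = 14)
    ORDER BY id
""")]

# Same filter, but additionally requiring at least one related table2 row.
related_and_no_14 = [row[0] for row in conn.execute("""
    SELECT id
    FROM table1 t1
    WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.foreignkey = t1.id)
      AND NOT EXISTS (SELECT 1
                      FROM table2 t3
                      WHERE t3.foreignkey = t1.id AND t3.col1 = 14)
    ORDER BY id
""")]

print(no_14)
print(related_and_no_14)
```

The first list contains ids 2, 3 and 4, the second only id 2, matching the two outputs shown above.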
select t1.id,t1.key,
(select ROW_NUMBER() OVER(PARTITION BY col1 ORDER BY col1 DESC) AS Row,* into
#Temp from table1)
from table1 t1
inner join table2 t2 on t1.id=t2.foreignkey
where t2.col1=(select col1 from #temp where row>1)

Find matching column data between two rows in the same table

I want to find the matching value between two rows in the same sqlite table. For example, if I have the following table:
rowid, col1, col2, col3
----- ---- ---- ----
1 5 3 1
2 3 6 9
3 9 12 5
So comparing row 1 and 2, I get the value 3.
Row 2 and 3 will give 9.
Row 3 and 1 will give 5.
There will always be one and only one matching value between any two rows in the table.
What is the correct SQLite query for this?
I hardcoded the values for the rows because I do not know how to declare variables in SQLite.
select t1.rowid as r1, t2.rowid as r2, t2.col as matchvalue from <yourtable> t1 join
(
select rowid, col1 col from <yourtable> where rowid = 3 union all
select rowid, col2 from <yourtable> where rowid = 3 union all
select rowid, col3 from <yourtable> where rowid = 3
) t2
on t2.col in (t1.col1, t1.col2, t1.col3)
and t1.rowid < t2.rowid -- you don't need this if you have two specific rows
and t1.rowid = 1
select col from
(
select rowid as rid, col1 as col from yourtable
union
select rowid, col2 from yourtable
union
select rowid, col3 from yourtable
) v
where rid in (3,2)
group by col
order by COUNT(*) desc
limit 1
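Instead of hardcoding the row numbers, the two rows can be passed as bound parameters from the calling code. The sketch below does that with Python's sqlite3 module; the table name `triples`, the explicit `id` column (used in place of the implicit rowid) and the helper `matching_value` are all made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER, col3 INTEGER)"
)
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?, ?)",
    [(1, 5, 3, 1), (2, 3, 6, 9), (3, 9, 12, 5)],
)


def matching_value(conn, first_id, second_id):
    """Return the value shared by the two rows, or None if there is none."""
    query = """
        SELECT u.col
        FROM (
            -- Unpivot the three columns of the first row into one column.
            SELECT col1 AS col FROM triples WHERE id = :a
            UNION
            SELECT col2 FROM triples WHERE id = :a
            UNION
            SELECT col3 FROM triples WHERE id = :a
        ) u
        -- Keep the unpivoted value that also occurs somewhere in the second row.
        JOIN triples t ON t.id = :b AND u.col IN (t.col1, t.col2, t.col3)
    """
    row = conn.execute(query, {"a": first_id, "b": second_id}).fetchone()
    return None if row is None else row[0]


print(matching_value(conn, 1, 2))
print(matching_value(conn, 2, 3))
print(matching_value(conn, 3, 1))
```

For the sample table the three calls give 3, 9 and 5, as described in the question.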

Tricky delete. How do i?

I have a table with 2 columns (both INT) and 400,000 records.
The first column holds random numbers in ascending order. The second column follows a rule (which is not important right now).
In the table there are 1,000 records that are exceptions: instead of following the rule, their second column is -1.
How can I delete the other ~399,000 records, so that the table keeps only the rows with -1 and their "neighbors" (the records immediately before and after each -1 row)?
UPDATE
SQL Server 2005
The first column values are unique, but they are not IDs (not auto-incremented).
example:
before:
20022518 13
20022882 364
20022885 -1
20022887 5
20022905 18
20023200 295
20023412 212
20023696 284
20024112 416
20025015 903
20025400 385
20025401 -1
20025683 283
20025981 298
20025989 8
20026752 763
20027779 1027
20028344 565
20028350 6
20028896 546
20028921 25
20028924 -1
20028998 77
20029031 33
20029051 20
20029492 441
20029530 38
20029890 360
after:
20022882 364
20022885 -1
20022887 5
20025400 385
20025401 -1
20025683 283
20028921 25
20028924 -1
20028998 77
If I understand correctly, you want to keep all records with col2 = -1 plus, for each of them, the records with the closest col1 below and above it. Assuming no duplicates in col1, I would do something like this:
delete from table where not col1 in
(
(select col1 from table where col2 = -1)
union
(select (select max(t2.col1) from table t2 where t2.col1 < t1.col1) from table t1 where t1.col2 = -1)
union
(select (select min(t4.col1) from table t4 where t4.col1 > t3.col1) from table t3 where t3.col2 = -1)
)
Edit:
t4.col1 < t3.col1 should be t4.col1 > t3.col1
I created a test-table with col1 and col2, both int, col1 is PK, but not autonumber
SELECT * from adjacent
Gives
col1 col2
1 5
3 4
4 2
7 -1
11 8
12 2
With the above subselects:
SELECT * from adjacent
where
col1 in
(
(select col1 from adjacent where col2 = -1)
union
(select (select max(t2.col1) from adjacent t2 where t2.col1 < t1.col1) from adjacent t1 where t1.col2 = -1)
union
(select (select min(t4.col1) from adjacent t4 where t4.col1 > t3.col1) from adjacent t3 where t3.col2 = -1)
)
gives
col1 col2
4 2
7 -1
11 8
With NOT IN instead:
SELECT * from adjacent
where
col1 not in
(
(select col1 from adjacent where col2 = -1)
union
(select (select max(t2.col1) from adjacent t2 where t2.col1 < t1.col1) from adjacent t1 where t1.col2 = -1)
union
(select (select min(t4.col1) from adjacent t4 where t4.col1 > t3.col1) from adjacent t3 where t3.col2 = -1)
)
gives
col1 col2
1 5
3 4
12 2
Finally a delete and select
delete from adjacent
where
col1 not in
(
(select col1 from adjacent where col2 = -1)
union
(select (select max(t2.col1) from adjacent t2 where t2.col1 < t1.col1) from adjacent t1 where t1.col2 = -1)
union
(select (select min(t4.col1) from adjacent t4 where t4.col1 > t3.col1) from adjacent t3 where t3.col2 = -1)
)
select * from adjacent
gives
col1 col2
4 2
7 -1
11 8
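One caveat with the NOT IN form: when a -1 row is the very first or very last row, the MAX or MIN sub-select returns NULL, and a NULL inside a NOT IN list stops every non-matching row from being deleted. The sketch below adds an IS NOT NULL guard around the three sub-selects. It runs on an in-memory SQLite database from Python rather than SQL Server, purely for convenience, and the helper name `prune` is made up for this illustration.

```python
import sqlite3

# The three sub-selects from the answer, wrapped so that NULL neighbours
# (a -1 row with nothing before or after it) are dropped from the keep list.
KEEP_QUERY = """
    SELECT keep FROM (
        SELECT col1 AS keep FROM adjacent WHERE col2 = -1
        UNION
        SELECT (SELECT MAX(t2.col1) FROM adjacent t2 WHERE t2.col1 < t1.col1)
        FROM adjacent t1 WHERE t1.col2 = -1
        UNION
        SELECT (SELECT MIN(t4.col1) FROM adjacent t4 WHERE t4.col1 > t3.col1)
        FROM adjacent t3 WHERE t3.col2 = -1
    )
    WHERE keep IS NOT NULL
"""


def prune(rows):
    """Load `rows`, delete all but the -1 rows and their neighbours, return the rest."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE adjacent (col1 INTEGER PRIMARY KEY, col2 INTEGER)")
    conn.executemany("INSERT INTO adjacent VALUES (?, ?)", rows)
    conn.execute(f"DELETE FROM adjacent WHERE col1 NOT IN ({KEEP_QUERY})")
    return conn.execute("SELECT col1, col2 FROM adjacent ORDER BY col1").fetchall()


remaining = prune([(1, 5), (3, 4), (4, 2), (7, -1), (11, 8), (12, 2)])
print(remaining)

# Edge case: the -1 row is the first row, so it has no previous neighbour.
print(prune([(1, -1), (2, 5), (3, 6), (4, 7)]))
```

On the test table this leaves the same three rows as above (4, 7 and 11); in the edge case only the -1 row and its single following neighbour remain.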
Assuming SQL Server here. Your best bet, if you are keeping a very small dataset, is to insert into a new table. I.E.:
SELECT *
INTO MyTable2
FROM MyTable
WHERE ColumnB = -1
DROP TABLE MyTable
exec sp_rename MyTable2 MyTable
Under the simple or bulk-logged recovery model this will be a minimally logged operation and will typically run in a fraction of the time of a DELETE.
Without another key there is no way to ensure you get the "neighbors" since this is not really a valid concept in a relational DB. If the first column is "random" you can't tell which ones are "before" and "after" a row with a -1 value.
If by "random" you mean it's like an IDENTITY column that increases automatically, AND YOU HAVE NO MISSING VALUES IN THE SEQUENCE you can do something like:
SELECT *
INTO MyTable2
FROM MyTable mt
WHERE ColumnB = -1
OR EXISTS (
SELECT * FROM MyTable mt2
WHERE mt2.ColumnB = -1
AND (mt2.id = mt.id + 1
OR mt2.id = mt.id - 1))
DROP TABLE MyTable
exec sp_rename MyTable2 MyTable
The solution is to number the records first, identify those adjacent to the -1 rules and then use UNION to assemble the final result:
WITH Numbered(seq, id, ruleno) AS (
SELECT
ROW_NUMBER() OVER (ORDER BY id), id, ruleno
FROM
Tricky
),
Brothers(id, ruleno) AS (
SELECT
b.id, b.ruleno
FROM
Numbered a INNER JOIN Numbered b
ON a.ruleno = -1 AND
abs(a.seq - b.seq) = 1
),
Triplets(id, ruleno) AS (
SELECT
id, ruleno
FROM
Tricky
WHERE
ruleno = -1
UNION ALL
SELECT
id, ruleno
FROM
Brothers
)
-- Display results
SELECT
id, ruleno
FROM
Triplets
ORDER BY
id
Result:
id ruleno
20022882 364
20022885 -1
20022887 5
20025400 385
20025401 -1
20025683 283
20028921 25
20028924 -1
20028998 77
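Finally, with the same WITH clause (the Numbered, Brothers and Triplets definitions above) placed in front of the statement:
DELETE FROM
Tricky
WHERE
id NOT IN (
SELECT
id
FROM
Triplets
)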
Use this query.
For this I created a table with the statement below:
create table t1 (val int, val2 int)
GO
-- Below is the statement itself. It deletes every row that is neither a -1 row nor directly next to one:
With CTE as(select val, val2, row_number() over (order by val ASC) as rnum
from t1)
DELETE t1
From t1 inner join cte a
ON t1.val = a.val
WHERE NOT EXISTS (SELECT 1 FROM cte b
where b.val2 = -1
and a.rnum between b.rnum - 1 and b.rnum + 1)
For more information about CTEs please see this post:
http://blog.sqlauthority.com/2009/08/08/sql-server-multiple-cte-in-one-select-statement-query/