Output all duplicate rows (SQL Server) - sql

I have a table which holds what I consider duplicate rows. The values in these records may not be exactly the same, but fuzzy logic has flagged them as possible duplicates. For example:
RecordCD key_in key_out
---------------------------
1 1 2
2 2 2
3 3 3
4 4 6
5 5 5
6 6 6
7 7 7
8 8 11
9 9 9
10 10 10
11 11 11
The key_in column holds the unique ID of the record.
The key_out column holds the ID of a possible duplicate when it is not equal to key_in.
I need my output to look like this, listing all of the possible duplicates:
RecordCD key_in key_out
---------------------------
1 1 2
2 2 2
4 4 6
6 6 6
8 8 11
11 11 11
However, I'm struggling to construct a query that would do that.
Thanks.

I think this is what you want:
select t.*
from t
where exists (select 1
from t t2
where t2.key_out = t.key_out and t2.key_in <> t.key_in
)
order by t.key_out;
Here is a db<>fiddle.
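To check the EXISTS query against the sample data without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in (an assumption, not part of the question), and the extra t.key_in in the ORDER BY is added purely to make the row order deterministic.

```python
import sqlite3

# Rebuild the question's sample table in an in-memory SQLite database.
# SQLite stands in for SQL Server; the query itself is plain standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (RecordCD INTEGER, key_in INTEGER, key_out INTEGER)")
pairs = [(1, 2), (2, 2), (3, 3), (4, 6), (5, 5), (6, 6),
         (7, 7), (8, 11), (9, 9), (10, 10), (11, 11)]
# In the sample, RecordCD happens to equal key_in on every row.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(k_in, k_in, k_out) for k_in, k_out in pairs])

# Keep a row only if some other row (different key_in) shares its key_out.
query = """
    SELECT t.*
    FROM t
    WHERE EXISTS (SELECT 1
                  FROM t t2
                  WHERE t2.key_out = t.key_out AND t2.key_in <> t.key_in)
    ORDER BY t.key_out, t.key_in
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Rows such as (3, 3, 3) drop out because no other row shares their key_out.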

It seems that whenever key_in and key_out differ, you want every row whose key_in matches either of the two values.
I would collect the values from all rows where key_in and key_out differ (in a temp table or, as here, a subquery); call each such value bad_match.
Any row whose key_in equals a bad_match value is then included in the output.
select mytable.* from mytable
where key_in in
(select key_in bad_match from mytable where key_in <> key_out
union all
select key_out from mytable where key_in <> key_out);
This sample builds your schema and returns the desired output
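The two steps of this approach can be looked at separately. The sketch below assumes Python's sqlite3 module as a stand-in for SQL Server: it first lists the bad_match values on their own, then runs the IN query from the answer. The first step uses UNION rather than UNION ALL only so the printed list has no repeats; for the IN test either works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (RecordCD INTEGER, key_in INTEGER, key_out INTEGER)")
pairs = [(1, 2), (2, 2), (3, 3), (4, 6), (5, 5), (6, 6),
         (7, 7), (8, 11), (9, 9), (10, 10), (11, 11)]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [(k_in, k_in, k_out) for k_in, k_out in pairs])

# Step 1: every value that appears in a mismatched row, from either column.
bad_match = sorted(v for (v,) in conn.execute("""
    SELECT key_in FROM mytable WHERE key_in <> key_out
    UNION
    SELECT key_out FROM mytable WHERE key_in <> key_out
"""))

# Step 2: keep the rows whose key_in is one of those values.
rows = conn.execute("""
    SELECT * FROM mytable
    WHERE key_in IN (SELECT key_in FROM mytable WHERE key_in <> key_out
                     UNION ALL
                     SELECT key_out FROM mytable WHERE key_in <> key_out)
    ORDER BY key_in
""").fetchall()

print(bad_match)
print(rows)
```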

Related

SQL How to SUM rows in second column if first column contain

View of a table
ID kWh
1 3
1 10
1 8
1 11
2 12
2 4
2 7
2 8
3 3
3 4
3 5
I want to receive
ID kWh
1 32
2 31
3 12
The real table is larger and more complex, but this is the gist. How can this be done? I can't know the ID values in the first column in advance.
SELECT T.ID, SUM(T.KWH) AS SUM_KWH
FROM YOUR_TABLE T
GROUP BY T.ID
Do you need this one?
Let's assume your database name is 'testdb' and the table name is 'table1'.
SELECT * FROM testdb.table1;
SELECT id, SUM(kwh) AS "kwh2"
FROM testdb.table1
WHERE id = 1;
Keep running this query with each ID in turn and you will get the desired output.
Hope this helps.
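For what it's worth, the GROUP BY form is the one that copes with IDs you don't know in advance, since it produces one total per distinct ID in a single query. A minimal sketch, assuming Python's sqlite3 module as a stand-in database and a hypothetical table name readings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "readings" is a hypothetical table name; the data is the question's sample.
conn.execute("CREATE TABLE readings (id INTEGER, kwh INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [
    (1, 3), (1, 10), (1, 8), (1, 11),
    (2, 12), (2, 4), (2, 7), (2, 8),
    (3, 3), (3, 4), (3, 5),
])

# One output row per distinct id, whatever ids happen to be in the table.
totals = conn.execute("""
    SELECT id, SUM(kwh) AS sum_kwh
    FROM readings
    GROUP BY id
    ORDER BY id
""").fetchall()
print(totals)  # [(1, 32), (2, 31), (3, 12)]
```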

distinct value row from the table in SQL

There is a table with values as below,
Id Value
1 1
2 1
3 2
4 2
5 3
6 4
7 4
Now I need to write a query to retrieve values from the table, and the output should look like this:
ID Value
1 1
3 2
5 3
6 4
Any suggestions?
The query you want has nothing to do with being distinct; it's a simple aggregation of value with the minimum ID for each:
select Min(id) Id, value
from table
group by value
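A quick way to confirm this against the sample data, sketched with Python's sqlite3 module as a stand-in database. The table name items is hypothetical (table itself is a reserved word, so the query above needs a real name substituted), and the ORDER BY is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "items" is a hypothetical name standing in for the real table.
conn.execute("CREATE TABLE items (id INTEGER, value INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 4), (7, 4)])

# For each distinct value, keep the smallest id that carries it.
rows = conn.execute("""
    SELECT MIN(id) AS id, value
    FROM items
    GROUP BY value
    ORDER BY value
""").fetchall()
print(rows)  # [(1, 1), (3, 2), (5, 3), (6, 4)]
```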

Combining values from one column by the key value from another column

I need to combine all the values in one column, grouped by the key in another column. Can someone help me with this, please?
Here is a short example of my problem. The first table is the source data and the second is the result I want.
CUST_ID CUST_REL_ID
100 1
100 2
100 3
100 4
200 5
200 6
200 7
CUST_ID CUST_REL_ID
1 1
1 2
1 3
1 4
2 1
2 2
2 3
2 4
...
5 5
5 6
5 7
I think you just want a self-join:
select t1.cust_rel_id, t2.cust_rel_id
from t t1 join
t t2
on t1.cust_id = t2.cust_id
order by t1.cust_rel_id, t2.cust_rel_id;
I don't understand your naming conventions. The column called cust_id in the result set looks nothing like the column called cust_id in the source data. But this appears to be what you want to do.
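To see what the self-join produces on the sample data, here is a small sketch using Python's sqlite3 module as a stand-in database (an assumption; the question does not name one). Every cust_rel_id is paired with every cust_rel_id that shares its cust_id, including itself, so the group of four yields 16 rows and the group of three yields 9.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cust_id INTEGER, cust_rel_id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(100, 1), (100, 2), (100, 3), (100, 4),
                  (200, 5), (200, 6), (200, 7)])

# Pair each cust_rel_id with every cust_rel_id under the same cust_id.
rows = conn.execute("""
    SELECT t1.cust_rel_id, t2.cust_rel_id
    FROM t t1
    JOIN t t2 ON t1.cust_id = t2.cust_id
    ORDER BY t1.cust_rel_id, t2.cust_rel_id
""").fetchall()

print(len(rows))
print(rows[:4])
```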

Querying duplicates table into related sets

We have a process that creates a table of duplicate records based on some arbitrary rules (details not relevant).
Every record gets checked against all other records and if a suspected duplicate is found both it and the duplicate are stored in a dupes table to be manually reviewed.
This results in a table something like this:
dupId, originalId, duplicateId
1 1 2
2 1 3
3 1 4
4 2 3
5 2 4
6 3 4
7 5 6
8 5 7
9 6 7
10 8 9
You can see here record #1 has 3 other records it is similar to (#2,#3 and #4) and they are each similar to each other.
Record #5 has 2 duplicates (#6 and #7) and record #8 has only 1 (#9).
I want to query the duplicates into sets, so my results would look something like this:
setId recordId
1 1
1 2
1 3
1 4
2 5
2 6
2 7
3 8
3 9
But I am too old/slow/tired/rubbish and a bit out of my depth here.
Currently, when checking for duplicates if the record pairing is already in the table we don't insert it twice (i.e. you don't see both sides of the duplicate pairing) but can easily do so if it makes the querying simpler.
Any advice much appreciated!
Duplicates seem to be transitive, so you have all pairs. That is, the "original" id has the information you need.
But it is not included in the duplicates and you want that. So:
select dense_rank() over (order by originalid) as setid, duplicateid
from ((select originalid, duplicateid
from t
where not exists (select 1 from t t2 where t.originalid = t2.duplicateid)
) union all
(select distinct originalid, originalid
from t
where not exists (select 1 from t t2 where t.originalid = t2.duplicateid)
)
) i
order by setid;
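The query can be tried on the sample data with the sketch below, which assumes Python's sqlite3 module as a stand-in database with SQLite 3.25 or later (needed for DENSE_RANK). Two small adaptations are mine, not the answer's: SQLite does not accept parentheses around the individual SELECTs of a UNION, so they are dropped, and duplicateid is aliased to recordid and added to the ORDER BY so the output matches the requested layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dupId INTEGER, originalId INTEGER, duplicateId INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 1, 2), (2, 1, 3), (3, 1, 4), (4, 2, 3), (5, 2, 4),
    (6, 3, 4), (7, 5, 6), (8, 5, 7), (9, 6, 7), (10, 8, 9),
])

# A "root" is an originalid that never appears as anyone's duplicateid.
# The first branch lists each root's duplicates, the second adds the root itself.
sets = conn.execute("""
    SELECT DENSE_RANK() OVER (ORDER BY originalid) AS setid,
           duplicateid AS recordid
    FROM (SELECT originalid, duplicateid
          FROM t
          WHERE NOT EXISTS (SELECT 1 FROM t t2 WHERE t.originalid = t2.duplicateid)
          UNION ALL
          SELECT DISTINCT originalid, originalid
          FROM t
          WHERE NOT EXISTS (SELECT 1 FROM t t2 WHERE t.originalid = t2.duplicateid)
         ) i
    ORDER BY setid, recordid
""").fetchall()
for row in sets:
    print(row)
```

Note that this relies on every pair within a set being present in the table, as the answer assumes; it is not a general connected-components search.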

Remove duplicate two way linked rows SQL

I have two columns like the following and I want to delete the duplicates.
Column1 Column2
1 10
2 9
3 8
4 7
5 6
6 5
7 4
8 3
9 2
10 1
I want to delete half of these entries so that there are only 5 rows like this:
Column1 Column2
1 10
2 9
3 8
4 7
5 6
Any ideas? I know how I could do it in C# by checking for duplicates and deleting them, but I want to do it in SQL. The values represent IDs and a relationship between the IDs. Order doesn't matter in the relationship, so 1-10 is the same as 10-1. In that sense there are duplicate relationships.
One way would be as follows:
DELETE t FROM MyTable t
WHERE t.Column1 > t.Column2 AND EXISTS (
SELECT * FROM MyTable tt
WHERE t.Column1=tt.Column2 AND t.Column2=tt.Column1
)
The t.Column1 > t.Column2 says that if there is a pair of matching rows, delete the one where Column1 is greater than Column2.
Demo.
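The same idea can be exercised locally with the sketch below, assuming Python's sqlite3 module as a stand-in database. SQLite does not accept the DELETE t FROM MyTable t alias form, so the statement is rewritten to reference MyTable by name; the logic is unchanged. The extra row (20, 15) is hypothetical data added to show that a row with Column1 > Column2 survives when it has no mirrored partner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Column1 INTEGER, Column2 INTEGER)")
pairs = [(i, 11 - i) for i in range(1, 11)]   # (1, 10), (2, 9), ..., (10, 1)
pairs.append((20, 15))                        # hypothetical row with no mirror
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", pairs)

# Delete the "larger first" half of each mirrored pair, and only when the
# mirror row actually exists.
conn.execute("""
    DELETE FROM MyTable
    WHERE Column1 > Column2
      AND EXISTS (SELECT 1 FROM MyTable tt
                  WHERE MyTable.Column1 = tt.Column2
                    AND MyTable.Column2 = tt.Column1)
""")

remaining = conn.execute(
    "SELECT Column1, Column2 FROM MyTable ORDER BY Column1").fetchall()
print(remaining)
```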