Check duplicates in sql table and replace the duplicates ID in another table - sql

I have a table with duplicate entries (I forgot to make NAME column unique)
So I now have this Duplicate entry table called 'table 1'
ID NAME
1 John F Smith
2 Sam G Davies
3 Tom W Mack
4 Bob W E Jone
5 Tom W Mack
IE ID 3 and 5 are duplicates
Table 2
ID NAMEID ORDERS
1 2 item4
2 1 item5
3 4 item6
4 3 item23
5 5 item34
NAMEID are ID from table 1. Table 2 ID 4 and 5 I want to have NAMEID of 3 (Tom W Mack's Orders) like so
Table 2 (correct version)
ID NAMEID ORDERS
1 2 item4
2 1 item5
3 4 item6
4 3 item23
5 3 item34
Is there an easy way to find and update the duplicates NAMEID in table 2 then remove the duplicates from table 1

In this case what you can do is.
You can find how many duplicate records you have.
In Order to find duplicate records you can use.
SELECT ID, NAME,COUNT(1) as CNT FROM TABLE1 GROUP BY ID, NAME
This is will give you the count and you find all the duplicate records
and delete them manually.
Don't forget to alter your table after removing all the duplicate records.

Here's how you can do it:
-- set up the environment
create table #t (ID int, NAME varchar(50))
insert #t values
(1, 'John F Smith'),
(2, 'Sam G Davies'),
(3, 'Tom W Mack'),
(4, 'Bob W E Jone'),
(5, 'Tom W Mack')
create table #t2 (ID int, NAMEID int, ORDERS varchar(10))
insert #t2 values
(1, 2, 'item4'),
(2, 1, 'item5'),
(3, 4, 'item6'),
(4, 3, 'item23'),
(5, 5, 'item34')
go
-- update the referencing table first
;with x as (
select id,
first_value(id) over(partition by name order by id) replace_with
from #t
),
y as (
select #t2.nameid, x.replace_with
FROM #t2
join x on #t2.nameid = x.id
where #t2.nameid <> x.replace_with
)
update y set nameid = replace_with
-- delete duplicates from referenced table
;with x as (
select *, row_number() over(partition by name order by id) rn
from #t
)
delete x where rn > 1
select * from #t
select * from #t2
Pls, test first for performance and validity.

Let's use the example data
INSERT INTO TableA
(`ID`, `NAME`)
VALUES
(1, 'NameA'),
(2, 'NameB'),
(3, 'NameA'),
(4, 'NameC'),
(5, 'NameB'),
(6, 'NameD')
and
INSERT INTO TableB
(`ID`, `NAMEID`, `ORDERS`)
VALUES
(1, 2, 'itemB1'),
(2, 1, 'itemA1'),
(3, 4, 'itemC1'),
(4, 3, 'itemA2'),
(5, 5, 'itemB2'),
(5, 6, 'itemD1')
(makes it a bit easier to spot the duplicates and check the result)
Let's start with a simple query to get the smallest ID for a given NAME
SELECT
NAME, min(ID)
FROM
tableA
GROUP BY
NAME
And the result is [NameA,1], [NameB,2], [NameC,4], [NameD,6]
Now if you use that as an uncorrelated subquery for a JOIN with the base table like
SELECT
keep.kid, dup.id
FROM
tableA as dup
JOIN
(
SELECT
NAME, min(ID) as kid
FROM
tableA
GROUP BY
NAME
) as keep
ON
keep.NAME=dup.NAME
AND keep.kid<dup.id
It finds all duplicates that have the same name as in the result of the subquery but a different id + it also gives you the id of the "original", i.e. the smallest id for that name.
For the example it's [1,3], [2,5]
Now you can use that in an UPDATE query like
UPDATE
TableB as b
JOIN
tableA as dup
JOIN
(
SELECT
NAME, min(ID) as kid
FROM
tableA
GROUP BY
NAME
) as keep
ON
keep.NAME=dup.NAME
AND keep.kid<dup.id
SET
b.NAMEID=keep.kid
WHERE
b.NAMEID=dup.id
And the result is
ID,NAMEID,ORDERS
1, 2, itemB1
2, 1, itemA1
3, 4, itemC1
4, 1, itemA2 <- now has NAMEID=1
5, 2, itemB2 <- now has NAMEID=2
5, 6, itemD1
To eleminate the duplicates from tableA you can use the first query again.

Related

1 distinct row having max value

This is the data I have
I need Unique ID(1 row) with max(Price). So, the output would be:
I have tried the following
select * from table a
join (select b.id,max(b.price) from table b
group by b.id) c on c.id=a.id;
gives the Question as output, because there is no key. I did try the other where condition as well, which gives the original table as output.
You could try something like this in SQL Server:
Table
create table ex1 (
id int,
item char(1),
price int,
qty int,
usr char(2)
);
Data
insert into ex1 values
(1, 'a', 7, 1, 'ab'),
(1, 'a', 7, 2, 'ac'),
(2, 'b', 6, 1, 'ab'),
(2, 'b', 6, 1, 'av'),
(2, 'b', 5, 1, 'ab'),
(3, 'c', 5, 2, 'ab'),
(4, 'd', 4, 2, 'ac'),
(4, 'd', 3, 1, 'av');
Query
select a.* from ex1 a
join (
select id, max(price) as maxprice, min(usr) as minuser
from ex1
group by id
) c
on c.id = a.id
and a.price = c.maxprice
and a.usr = c.minuser
order by a.id, a.usr;
Result
id item price qty usr
1 a 7 1 ab
2 b 6 1 ab
3 c 5 2 ab
4 d 4 2 ac
Explanation
In your dataset, ID 1 has 2 records with the same price. You have to make a decision which one you want. So, in the above example, I am showing a single record for the user whose name is lowest alphabetically.
Alternate method
SQL Server has ranking function row_number over() that can be used as well:
select * from (
select row_number() over( partition by id order by id, price desc, usr) as sr, *
from ex1
) c where sr = 1;
The subquery says - give me all records from the table and give each row a serial number starting with 1 unique to each ID. The rows should be sorted by ID first, then price descending and then usr. The outer query picks out records with sr number 1.
Example here: https://rextester.com/KZCZ25396

Select TOP columns from table1, join table2 with their names

I have a TABLE1 with these two columns, storing departure and arrival identifiers from flights:
dep_id arr_id
1 2
6 2
6 2
6 2
6 2
3 2
3 2
3 2
3 4
3 4
3 6
3 6
and a TABLE2 with the respective IDs containing their ICAO codes:
id icao
1 LPPT
2 LPFR
3 LPMA
4 LPPR
5 LLGB
6 LEPA
7 LEMD
How can i select the top count of TABLE1 (most used departure id and most used arrival id) and group it with the respective ICAO code from TABLE2, so i can get from the provided example data:
most_arrivals most_departures
LPFR LPMA
It's simple to get ONE of them, but mixing two or more columns doesn't seem to work for me no matter what i try.
You can do it like this.
Create and populate tables.
CREATE TABLE dbo.Icao
(
id int NOT NULL PRIMARY KEY,
icao nchar(4) NOT NULL
);
CREATE TABLE dbo.Flight
(
dep_id int NOT NULL
FOREIGN KEY REFERENCES dbo.Icao(id),
arr_id int NOT NULL
FOREIGN KEY REFERENCES dbo.Icao(id)
);
INSERT INTO dbo.Icao (id, icao)
VALUES
(1, N'LPPT'),
(2, N'LPFR'),
(3, N'LPMA'),
(4, N'LPPR'),
(5, N'LLGB'),
(6, N'LEPA'),
(7, N'LEMD');
INSERT INTO dbo.Flight (dep_id, arr_id)
VALUES
(1, 2),
(6, 2),
(6, 2),
(6, 2),
(6, 2),
(3, 2),
(3, 2),
(3, 2),
(3, 4),
(3, 4),
(3, 6),
(3, 6);
Then do a SELECT using two subqueries.
SELECT
(SELECT TOP 1 I.icao
FROM dbo.Flight AS F
INNER JOIN dbo.Icao AS I
ON I.id = F.arr_id
GROUP BY I.icao
ORDER BY COUNT(*) DESC) AS 'most_arrivals',
(SELECT TOP 1 I.icao
FROM dbo.Flight AS F
INNER JOIN dbo.Icao AS I
ON I.id = F.dep_id
GROUP BY I.icao
ORDER BY COUNT(*) DESC) AS 'most_departures';
Click this button on the toolbar to include the actual execution plan, when you execute the query.
And this is the graphical execution plan for the query. Each icon represents an operation that will be performed by the SQL Server engine. The arrows represent data flows. The direction of flow is from right to left, so the result is the leftmost icon.
try this one:
select
(select name
from table2 where id = (
select top 1 arr_id
from table1
group by arr_id
order by count(*) desc)
) as most_arrivals,
(select name
from table2 where id = (
select top 1 dep_id
from table1
group by dep_id
order by count(*) desc)
) as most_departures

query to count number of unique relations

I have 3 tables:
t_user (id, name)
t_user_deal (id, user_id, deal_id)
t_deal (id, title)
multiple user can be linked to the same deal. (I'm using oracle but it should be similar, I can adapt it)
How can I get all the users (name) with the number of unique user he made a deal with.
let's explain with some data:
t_user:
id, name
1, joe
2, mike
3, John
t_deal:
id, title
1, deal number 1
2, deal number 2
t_user_deal:
id, user_id, deal_id
1, 1, 1
2, 2, 1
3, 1, 2
4, 3, 2
the result I expect:
user_name, number of unique user he made a deal with
Joe, 2
Mike, 1
John, 1
I've try this but I didn't get the expected result:
SELECT tu.name,
count(tu.id) AS nbRelations
FROM t_user tu
INNER JOIN t_user_deal tud ON tu.id = tud.user_id
INNER JOIN t_deal td ON tud.deal_id = td.id
WHERE
(
td.id IN
(
SELECT DISTINCT td.id
FROM t_user_deal tud2
INNER JOIN t_deal td2 ON tud2.deal_id = td2.id
WHERE tud.id <> tud2.user_id
)
)
GROUP BY tu.id
ORDER BY nbRelations DESC
thanks for your help
This should get you the result
SELECT id1, count(id2),name
FROM (
SELECT distinct tud1.user_id id1 , tud2.user_id id2
FROM t_user_deal tud1, t_user_deal tud2
WHERE tud1.deal_id = tud2.deal_id
and tud1.user_id <> tud2.user_id) as tab, t_user tu
WHERE tu.id = id1
GROUP BY id1,name
Something like
select name, NVL (i.ud, 0) ud from t_user join (
SELECT user_id, count(*) ud from t_user_deal group by user_id) i on on t_user.id = i.user_id
where i.ud > 0
Unless I'm missing somethig here. It actually sounds like your question references having a second user in the t_user_deal table. The model you've described here doesn't include that.
PostgreSQL example:
create table t_user (id int, name varchar(255)) ;
create table t_deal (id int, title varchar(255)) ;
create table t_user_deal (id int, user_id int, deal_id int) ;
insert into t_user values (1, 'joe'), (2, 'mike'), (3, 'john') ;
insert into t_deal values (1, 'deal 1'), (2, 'deal 2') ;
insert into t_user_deal values (1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 3, 2) ;
And the query.....
SELECT
name, COUNT(DISTINCT deal_id)
FROM
t_user INNER JOIN t_user_deal ON (t_user.id = t_user_deal.user_id)
GROUP BY
user_id, name ;
The DISTINCT might not be necessary (in the COUNT(), that is). Depends on how clean your data is (e.g., no duplicate rows!)
Here's the result in PostgreSQL:
name | count
------+-------
joe | 2
mike | 1
john | 1
(3 rows)

Find rows with same ID and have a particular set of names

EDIT:
I have a table with 3 rows like so.
ID NAME REV
1 A 0
1 B 0
1 C 0
2 A 1
2 B 0
2 C 0
3 A 1
3 B 1
I want to find the ID wich has a particular set of Names and the REV is same
example:
Edit2: GBN's solution would have worked perfectly, but since i do not have the access to create new tables. The added constraint is that no new tables can be created.
if input = A,B then output is 3
if input = A ,B,C then output is 1 and not 1,2 since the rev level differs in 2.
The simplest way is to compare a COUNT per ID with the number of elements in your list:
SELECT
ID
FROM
MyTable
WHERE
NAME IN ('A', 'B', 'C')
GROUP BY
ID
HAVING
COUNT(*) = 3;
Note: ORDER BY isn't needed and goes after the HAVING if needed
Edit, with question update. In MySQL, it's easier to use a separate table for search terms
DROP TABLE IF EXISTS gbn;
CREATE TABLE gbn (ID INT, `name` VARCHAR(100), REV INT);
INSERT gbn VALUES (1, 'A', 0);
INSERT gbn VALUES (1, 'B', 0);
INSERT gbn VALUES (1, 'C', 0);
INSERT gbn VALUES (2, 'A', 1);
INSERT gbn VALUES (2, 'B', 0);
INSERT gbn VALUES (2, 'C', 0);
INSERT gbn VALUES (3, 'A', 0);
INSERT gbn VALUES (3, 'B', 0);
DROP TABLE IF EXISTS gbn1;
CREATE TABLE gbn1 ( `name` VARCHAR(100));
INSERT gbn1 VALUES ('A');
INSERT gbn1 VALUES ('B');
SELECT
gbn.ID
FROM
gbn
LEFT JOIN
gbn1 ON gbn.`name` = gbn1.`name`
GROUP BY
gbn.ID
HAVING
COUNT(*) = (SELECT COUNT(*) FROM gbn1)
AND MIN(gbn.REV) = MAX(gbn.REV);
INSERT gbn1 VALUES ('C');
SELECT
gbn.ID
FROM
gbn
LEFT JOIN
gbn1 ON gbn.`name` = gbn1.`name`
GROUP BY
gbn.ID
HAVING
COUNT(*) = (SELECT COUNT(*) FROM gbn1)
AND MIN(gbn.REV) = MAX(gbn.REV);
Edit 2, without extra table, use a derived (inline) table:
SELECT
gbn.ID
FROM
gbn
LEFT JOIN
(SELECT 'A' AS `name`
UNION ALL SELECT 'B'
UNION ALL SELECT 'C'
) gbn1 ON gbn.`name` = gbn1.`name`
GROUP BY
gbn.ID
HAVING
COUNT(*) = 3 -- matches number of elements in gbn1 derived table
AND MIN(gbn.REV) = MAX(gbn.REV);
Similar to gbn, but allowing for the possibility of duplicate ID/Name combinations:
SELECT ID
FROM MyTable
WHERE NAME IN ('A', 'B', 'C')
GROUP BY ID
HAVING COUNT(DISTINCT NAME) = 3;
OKAY!... I solved my problem ! I modified GBN's logic to do it without a search table using the IN clause
1 flaw with doing MAX(rev) = MIN(REV) is: if i have a data like so .
ID NAME REV
1 A 0
1 B 1
1 A 1
then when I use a query like
Select ID from TABLE
where NAME in {A,B}
groupby ID
having count(*) = 2
and MIN(REV) = MAX(REV)
it will not show me the ID 1 as the min and max are different and the count is 3.
So i simply add another column to the groupby
so the final query is
Select ID from TABLE
where NAME in {A,B}
groupby ID,REV
having count(*) = 2
and MIN(REV) = MAX(REV)
Thanks,to all that helped. !

SQL if statement with two tables

Given the following tables:
table objects
id Name rating
1 Megan 9
2 Irina 10
3 Vanessa 7
4 Samantha 9
5 Roxanne 1
6 Sonia 8
swap table
id swap_proposalid counterpartyid
1 4 2
2 3 2
Everyone wants the ten. I would like to make a list for Irina of possible swaps where id 4 and 3 don't appear because the propositions are already there.
output1
id Name rating
1 Megan 9
5 Roxanne 1
6 Sonia 8
Thanks
This should do the trick:
SELECT o.id, o.Name, o.rating
FROM objects o
LEFT JOIN swap s on o.id = s.swap_proposalid
WHERE s.id IS NULL
AND o.Name != 'Irina'
This works
SELECT mt2.ID, mt2.Name, mt2.Rating
FROM [MyTable] mt2 -- Other Candidates
, [MyTable] mt1 -- Candidate / Subject (Irina)
WHERE mt2.ID NOT IN
(
SELECT st.swap_proposalid
FROM SwapTable st
WHERE
st.counterpartyid = mt1.ID
)
AND mt1.ID <> mt2.ID -- Don't match Irina with Irina
AND mt1.Name = 'Irina' -- Find other swaps for Irina
-- Test Data
CREATE TABLE MyTable
(
ID INT,
Name VARCHAR(100),
Rating INT
)
GO
CREATE TABLE SwapTable
(
ID INT,
swap_proposalid INT,
counterpartyid INT
)
GO
INSERT INTO MyTable VALUES(1 ,'Megan', 9)
INSERT INTO MyTable VALUES(2 ,'Irina', 10)
INSERT INTO MyTable VALUES(3 ,'Vanessa', 7)
INSERT INTO MyTable VALUES(4 ,'Samantha', 9)
INSERT INTO MyTable VALUES(5 ,'Roxanne', 1)
INSERT INTO MyTable VALUES(6 ,'Sonia', 8)
INSERT INTO SwapTable(ID, swap_proposalid, counterpartyid)
VALUES (1, 4, 2)
INSERT INTO SwapTable(ID, swap_proposalid, counterpartyid)
VALUES (1, 3, 2)
Guessing that the logic involves identifying the objects EXCEPT the highest rated object EXCEPT propositions with the highest rated object e.g. (using sample DDL and data kindly posted by #nonnb):
WITH ObjectHighestRated
AS
(
SELECT ID
FROM MyTable
WHERE Rating = (
SELECT MAX(T.Rating)
FROM MyTable T
)
),
PropositionsForHighestRated
AS
(
SELECT swap_proposalid AS ID
FROM SwapTable
WHERE counterpartyid IN (SELECT ID FROM ObjectHighestRated)
),
CandidateSwappersForHighestRated
AS
(
SELECT ID
FROM MyTable
EXCEPT
SELECT ID
FROM ObjectHighestRated
EXCEPT
SELECT ID
FROM PropositionsForHighestRated
)
SELECT *
FROM MyTable
WHERE ID IN (SELECT ID FROM CandidateSwappersForHighestRated);