SQL Query to get rid of duplicate records [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How Do You Delete Duplicate Records In SQL
I have a table with columns:
FirstName
LastName
AddressLine1
AddressLine2
City
State
In each row, AddressLine1 is different from AddressLine2. However, there are some duplicate rows, in which the AddressLine1 of one record matches the AddressLine2 of another record.
I want to get rid of these duplicate records.

This will get all duplicate records:
SELECT P.*
FROM table P INNER JOIN
table S ON P.FirstName = S.FirstName
AND P.LastName = S.LastName
WHERE P.AddressLine1 = S.AddressLine2
If your table has an Id column, you could write a DELETE to remove the duplicates like this:
DELETE FROM table
WHERE Id IN (
SELECT P.Id
FROM table P INNER JOIN
table S ON P.FirstName = S.FirstName
AND P.LastName = S.LastName
WHERE P.AddressLine1 = S.AddressLine2
)
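Below is a minimal, runnable sketch of the Id-based DELETE. It uses Python's built-in sqlite3 module and an in-memory table named `people`, which is a hypothetical name because `table` itself is a reserved word. It assumes an integer `Id` column and the common case where two rows hold the same two addresses in swapped order. For such a mirrored pair, both rows satisfy the join condition, so the plain `IN` delete removes both copies. The sketch shows that behaviour and then adds a tie-breaker (`P.Id > S.Id`) so the lower `Id` of each mirrored pair survives.

```python
import sqlite3

# Hypothetical schema mirroring the question's columns, plus an Id key.
SCHEMA = """
CREATE TABLE people (
    Id INTEGER PRIMARY KEY,
    FirstName TEXT, LastName TEXT,
    AddressLine1 TEXT, AddressLine2 TEXT,
    City TEXT, State TEXT
)
"""

ROWS = [
    (1, "Ann", "Lee", "1 Main St", "Apt 2", "Springfield", "IL"),
    (2, "Ann", "Lee", "Apt 2", "1 Main St", "Springfield", "IL"),  # swapped copy of row 1
    (3, "Bob", "Ray", "9 Oak Ave", "Unit 5", "Dayton", "OH"),      # no duplicate
]


def remaining_ids(keep_one: bool) -> list[int]:
    """Run the duplicate-removing DELETE and return the Ids that are left."""
    con = sqlite3.connect(":memory:")
    con.execute(SCHEMA)
    con.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    # Optional tie-breaker: only the higher Id of each mirrored pair is deleted.
    tie_breaker = "AND P.Id > S.Id" if keep_one else ""
    con.execute(f"""
        DELETE FROM people
        WHERE Id IN (
            SELECT P.Id
            FROM people P
            INNER JOIN people S
                ON P.FirstName = S.FirstName
               AND P.LastName = S.LastName
            WHERE P.AddressLine1 = S.AddressLine2
            {tie_breaker}
        )
    """)
    ids = [row[0] for row in con.execute("SELECT Id FROM people ORDER BY Id")]
    con.close()
    return ids


print(remaining_ids(keep_one=False))  # [3]: both copies of Ann Lee are gone
print(remaining_ids(keep_one=True))   # [1, 3]: one copy of Ann Lee is kept
```

The tie-breaker only helps with mirrored pairs. If a row's AddressLine1 matches another row's AddressLine2 but not the other way round, check which row you actually want to keep before deleting.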

Join the table to itself
DELETE a
FROM Table a
JOIN Table b
ON a.AddressLine1 = b.AddressLine2

Replace UNIQUE_IDENTIFIER with an ID, a name, or whatever lets you easily identify each row later. You can then delete rows by hand as needed, or modify the query below into an UPDATE or DELETE statement.
SELECT
t1.UNIQUE_IDENTIFIER,
t2.UNIQUE_IDENTIFIER
FROM
table t1,
table t2
WHERE
t1.AddressLine1 = t2.AddressLine2

Related

Updating fields with indirectly related data

I have four tables:
student table with id_student, firstname and lastname fields
rent table with id_student and id_book
book table with id_book, book_name, author_name
booking table with id_book, current_student
The rent table only has IDs and links between student and book tables.
How can I update the booking.current_student field with the concatenation of the firstname and lastname fields from the student table (like 'john doe')? In other words, for each id_book in the booking table, I want to set booking.current_student from student.firstname and student.lastname.
Since the booking table has no id_student column, how can I update booking.current_student from the student table?

You need a correlated update, which uses a subquery to get the new value to use in the set clause for each row; and that subquery needs to join the rent and student tables:
update booking b
set current_student = (
select s.firstname || ' ' || s.lastname
from rent r
join student s on s.id_student = r.id_student
where r.id_book = b.id_book
);
The 'correlation' part is that the subquery filters on r.id_book = b.id_book, so it correlates with the outer booking (b) table which is being updated.
If any rows in booking don't have a matching rent row, they will be set to null. If you have multiple booking rows for the same book ID, they will all be updated to the same student name. If you have multiple rent rows for the same book ID, the update will fail with an error, because the subquery returns more than one row.
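The correlated update can be tried in SQLite with the sketch below, which uses Python's sqlite3 module and made-up rows. The sample data includes a booking row with no rent row, so it shows the NULL behaviour. To stay on safe syntax, the UPDATE is written without the `b` alias and refers to `booking.id_book` directly. SQLite differs from Oracle on one point: a scalar subquery that returns several rows silently yields the first one, so the multiple-rent-row error described above does not reproduce there.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE student (id_student INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE rent (id_student INTEGER, id_book INTEGER);
CREATE TABLE booking (id_book INTEGER, current_student TEXT);

INSERT INTO student VALUES (1, 'john', 'doe'), (2, 'jane', 'roe');
INSERT INTO rent VALUES (1, 10), (2, 20);
-- Book 30 is in booking but has no rent row.
INSERT INTO booking (id_book) VALUES (10), (20), (30);
""")

# Correlated update: the subquery is evaluated once per booking row,
# filtered by that row's id_book.
con.execute("""
    UPDATE booking
    SET current_student = (
        SELECT s.firstname || ' ' || s.lastname
        FROM rent r
        JOIN student s ON s.id_student = r.id_student
        WHERE r.id_book = booking.id_book
    )
""")

result = con.execute(
    "SELECT id_book, current_student FROM booking ORDER BY id_book"
).fetchall()
print(result)  # [(10, 'john doe'), (20, 'jane roe'), (30, None)]
```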
It's generally not a good idea to duplicate data like this. It would require less maintenance if you replaced the booking table with a view:
create view booking (id_book, current_student) as
select r.id_book, s.firstname || ' ' || s.lastname
from rent r
join student s on s.id_student = r.id_student;
Then as rows are added to or removed from the rent table, or if a student changes their name, the view will automatically reflect the changes without you having to do anything.
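The sketch below checks the view's self-maintaining behaviour in SQLite. The view is named `booking_view`, a hypothetical name chosen so it cannot clash with an existing booking table. Only the base tables are changed after the view is created, and the view reflects the new name and the new rent row with no extra work.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE student (id_student INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE rent (id_student INTEGER, id_book INTEGER);
INSERT INTO student VALUES (1, 'john', 'doe');
INSERT INTO rent VALUES (1, 10);

-- Hypothetical view name, so it does not clash with a booking table.
CREATE VIEW booking_view (id_book, current_student) AS
SELECT r.id_book, s.firstname || ' ' || s.lastname
FROM rent r
JOIN student s ON s.id_student = r.id_student;
""")


def snapshot():
    """Read the current contents of the view."""
    return con.execute("SELECT * FROM booking_view ORDER BY id_book").fetchall()


before = snapshot()
# Change only the base tables; the view has nothing to update.
con.execute("UPDATE student SET lastname = 'smith' WHERE id_student = 1")
con.execute("INSERT INTO rent VALUES (1, 20)")
after = snapshot()

print(before)  # [(10, 'john doe')]
print(after)   # [(10, 'john smith'), (20, 'john smith')]
```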

SELECT Statement in CASE

Please don't downvote this; it is a bit complex for me to explain. I'm working on a data migration, so some of the structures look weird because someone else designed them that way.
For example, I have a table Person with PersonID and PersonName columns. The table contains duplicates.
I also have a Details table with a PersonName column. A given PersonName may or may not exist in the Person table. I need to retrieve the PersonID from the matching record, or otherwise use a hardcoded value for PersonID.
I can't write the query below, because PersonName is duplicated in the Person table, so the join doubles the rows whenever there is a matching record:
SELECT d.Fields, PersonID
FROM Details d
JOIN Person p ON d.PersonName = p.PersonName
The query below works, but I don't know how to replace the NULL with a value of my choice:
SELECT d.Fields, (SELECT TOP 1 PersonID FROM Person where PersonName = d.PersonName )
FROM Details d
So some PersonNames in the Details table do not exist in the Person table. How do I write a CASE WHEN for this?
I tried the following, but it didn't work:
SELECT d.Fields,
CASE WHEN (SELECT TOP 1 PersonID
FROM Person
WHERE PersonName = d.PersonName) = null
THEN 123
ELSE (SELECT TOP 1 PersonID
FROM Person
WHERE PersonName = d.PersonName) END Name
FROM Details d
This query still shows the same output as the second query. Please advise, and let me know if anything is unclear. Thanks
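The attempt above fails because it tests for a missing value with `= null`. In standard SQL (and in SQL Server under its default ANSI_NULLS ON setting), comparing anything to NULL with `=` yields unknown rather than true. The CASE therefore always falls through to ELSE, which returns the same NULL as the second query. The small SQLite check below contrasts `=` with `IS NULL`.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL (unknown), so the THEN branch is never taken.
eq_null = con.execute(
    "SELECT CASE WHEN NULL = NULL THEN 'matched' ELSE 'not matched' END"
).fetchone()[0]

# IS NULL is the correct test for a missing value.
is_null = con.execute(
    "SELECT CASE WHEN NULL IS NULL THEN 'matched' ELSE 'not matched' END"
).fetchone()[0]

print(eq_null)  # not matched
print(is_null)  # matched
```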

Well, I figured out that I can wrap the subquery in ISNULL to make it work (the subquery needs its own parentheses):
SELECT d.Fields,
ISNULL((SELECT TOP 1 p.PersonID
FROM Person p where p.PersonName = d.PersonName), 124) id
FROM Details d
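Here is a runnable sketch of the same idea in SQLite, which has no `TOP` or `ISNULL`. It swaps in `LIMIT 1` and the standard `COALESCE`. It also adds `ORDER BY p.PersonID` so the duplicate that gets picked is deterministic; the original picks an arbitrary match, so this ordering is an assumption. The sample rows are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Person (PersonID INTEGER, PersonName TEXT);
CREATE TABLE Details (Fields TEXT, PersonName TEXT);
-- 'Ann' appears twice in Person, which is what doubled the rows in the JOIN.
INSERT INTO Person VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob');
INSERT INTO Details VALUES ('d1', 'Ann'), ('d2', 'Bob'), ('d3', 'Cid');
""")

rows = con.execute("""
    SELECT d.Fields,
           COALESCE(
               (SELECT p.PersonID
                FROM Person p
                WHERE p.PersonName = d.PersonName
                ORDER BY p.PersonID
                LIMIT 1),
               124) AS id   -- 124 is the hardcoded default for unmatched names
    FROM Details d
    ORDER BY d.Fields
""").fetchall()

print(rows)  # [('d1', 1), ('d2', 3), ('d3', 124)]
```

Because the scalar subquery yields at most one value per Details row, the result has exactly one row per detail, even when a name is duplicated in Person.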

A simple left outer join that pulls back all persons with an optional match in the Details table, combined with a CASE expression, should give you the desired result.
SELECT
*
FROM
(
SELECT
Instance=ROW_NUMBER() OVER (PARTITION BY p.PersonName ORDER BY p.PersonID),
PersonID=CASE WHEN d.PersonName IS NULL THEN 123 ELSE p.PersonID END, -- the default must have the same type as PersonID
d.Fields
FROM
Person p
LEFT OUTER JOIN Details d on d.PersonName=p.PersonName
)AS X
WHERE
Instance=1

Ooh goody, a chance to use two LEFT JOINs. The first will list the IDs where they exist, and insert a default otherwise; the second will eliminate the duplicates.
SELECT d.Fields, ISNULL(p1.PersonID, 123)
FROM Details d
LEFT JOIN Person p1 ON d.PersonName = p1.PersonName
LEFT JOIN Person p2 ON p2.PersonName = p1.PersonName
AND p2.PersonID < p1.PersonID
WHERE p2.PersonID IS NULL
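The two-LEFT-JOIN trick can be checked in SQLite with the sketch below, using made-up rows. SQLite's `IFNULL` stands in for SQL Server's `ISNULL`. The second join looks for a row with the same name and a smaller ID, so `WHERE p2.PersonID IS NULL` keeps only the lowest PersonID per name. Unmatched details survive because `p1`, and therefore `p2`, is NULL for them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Person (PersonID INTEGER, PersonName TEXT);
CREATE TABLE Details (Fields TEXT, PersonName TEXT);
INSERT INTO Person VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob');
INSERT INTO Details VALUES ('d1', 'Ann'), ('d2', 'Bob'), ('d3', 'Cid');
""")

rows = con.execute("""
    SELECT d.Fields, IFNULL(p1.PersonID, 123) AS PersonID
    FROM Details d
    -- p1: every matching Person row (possibly several per name)
    LEFT JOIN Person p1 ON d.PersonName = p1.PersonName
    -- p2: a row with the same name and a smaller ID, if one exists
    LEFT JOIN Person p2 ON p2.PersonName = p1.PersonName
                       AND p2.PersonID < p1.PersonID
    -- keep p1 only when no smaller ID exists for that name
    WHERE p2.PersonID IS NULL
    ORDER BY d.Fields
""").fetchall()

print(rows)  # [('d1', 1), ('d2', 3), ('d3', 123)]
```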

You could use common table expressions to build up the missing datasets, i.e. your complete Person table, and then join that to your Details table as follows:
declare @n int;
-- set your default PersonID here
set @n = 123;
-- Make sure the previous SQL statement is terminated with a semicolon so that the WITH clause parses successfully.
-- First build our unique list of names from table Detail.
with cteUniqueDetailPerson
(
[PersonName]
)
as
(
select distinct [PersonName]
from [Details]
)
-- Second get unique Person entries and record the most recent PersonID value as the active Person.
, cteUniquePersonPerson
(
[PersonID]
, [PersonName]
)
as
(
select
max([PersonID]) -- if you wanted the original Person record instead of the last, change this to min.
, [PersonName]
from [Person]
group by [PersonName]
)
-- Third, join the unique datasets to get the PersonID when there is a match, otherwise use our default id @n.
-- NB: this also includes records for a Person with no Details rows (they are filtered out by the final inner join).
, cteSudoPerson
(
[PersonID]
, [PersonName]
)
as
(
select
coalesce(upp.[PersonID],@n) as [PersonID]
, coalesce(upp.[PersonName],udp.[PersonName]) as [PersonName]
from cteUniquePersonPerson upp
full outer join cteUniqueDetailPerson udp
on udp.[PersonName] = upp.[PersonName]
)
-- Fourth, join Details to the pseudo-person set (cteSudoPerson), which holds either the original ID or our default ID.
select
d.[Fields]
, sp.[PersonID]
from [Details] d
inner join cteSudoPerson sp
on sp.[PersonName] = d.[PersonName];
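The CTE approach can be tried end to end in SQLite with made-up rows, under two assumptions. First, the default ID is a literal 123 rather than the `@n` variable, because SQLite has no `DECLARE`. Second, `cteSudoPerson` uses a LEFT JOIN from the distinct detail names instead of a FULL OUTER JOIN. The final inner join keeps only names that occur in Details, so the LEFT JOIN gives the same final result, and it also runs on SQLite versions older than 3.39.0, where FULL OUTER JOIN is unavailable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Person (PersonID INTEGER, PersonName TEXT);
CREATE TABLE Details (Fields TEXT, PersonName TEXT);
INSERT INTO Person VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob'), (4, 'Dee');
INSERT INTO Details VALUES ('d1', 'Ann'), ('d2', 'Bob'), ('d3', 'Cid');
""")

rows = con.execute("""
    WITH cteUniqueDetailPerson (PersonName) AS (
        SELECT DISTINCT PersonName FROM Details
    ),
    cteUniquePersonPerson (PersonID, PersonName) AS (
        -- most recent (highest) PersonID per name, as in the T-SQL version
        SELECT MAX(PersonID), PersonName FROM Person GROUP BY PersonName
    ),
    cteSudoPerson (PersonID, PersonName) AS (
        -- LEFT JOIN stands in for the FULL OUTER JOIN; 123 is the default ID
        SELECT COALESCE(upp.PersonID, 123), udp.PersonName
        FROM cteUniqueDetailPerson udp
        LEFT JOIN cteUniquePersonPerson upp ON upp.PersonName = udp.PersonName
    )
    SELECT d.Fields, sp.PersonID
    FROM Details d
    INNER JOIN cteSudoPerson sp ON sp.PersonName = d.PersonName
    ORDER BY d.Fields
""").fetchall()

print(rows)  # [('d1', 2), ('d2', 3), ('d3', 123)]
```

Note that this picks the highest PersonID per name (MAX), whereas the two-LEFT-JOIN answer picks the lowest; switch to MIN if you want the original record.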

JOIN Two tables one of which has where clause

I want to join two tables so that the Person rows with Street S1 are merged with the Student table, adding a new field (STUDENT/NONSTUDENT).
The student table has 1 million rows; the filtered person table has at most 100 rows.
What is the best-performing SQL to merge them?
student table (name, age)
A-12
B-23
C-24
person table (name, street, live)
A-S1-L
B-S2-NL
D-S1-L
At the end I want such result
A-12-Student
D-NULL-NOTSTUDENT

This should work:
select p.name,
s.age,
case when s.name is null then 'NotStudent'
else 'Student' end as IsStudent
from person p
left join student s on p.name = s.name
where p.Street = 'S1'
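The sketch below runs the answer's LEFT JOIN in SQLite against the question's sample data. Declaring `student.name` as PRIMARY KEY is an assumption made here. It gives SQLite an index, so each of the roughly 100 filtered person rows can be looked up by index instead of scanning the million student rows. On a real server, an index on `student.name` should serve the same purpose. The filter uses `'S1'` exactly as it appears in the data, because SQLite's default text comparison is case-sensitive.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE student (name TEXT PRIMARY KEY, age INTEGER);
CREATE TABLE person (name TEXT, street TEXT, live TEXT);
INSERT INTO student VALUES ('A', 12), ('B', 23), ('C', 24);
INSERT INTO person VALUES ('A', 'S1', 'L'), ('B', 'S2', 'NL'), ('D', 'S1', 'L');
""")

rows = con.execute("""
    SELECT p.name,
           s.age,
           CASE WHEN s.name IS NULL THEN 'NotStudent' ELSE 'Student' END AS IsStudent
    FROM person p
    LEFT JOIN student s ON p.name = s.name   -- keep every filtered person row
    WHERE p.street = 'S1'
    ORDER BY p.name
""").fetchall()

print(rows)  # [('A', 12, 'Student'), ('D', None, 'NotStudent')]
```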

Only return results from first table with two joins

I have 3 tables. I need to get lastname, firstname, and employee number from the first table and name from another table.
In order for me to get the name on table s there needs to be a match between the slsrep columns on table s and table sw.
The issue is that I only want to return the rows from the first table (p). There are only 700 records in the first table, but the query returns 900.
Basically, I just want to look at each row in the table p and match the name from table s.
This is what I currently have:
SELECT p.LastName,
p.FirstName,
p.EmpNo,
s.Name
FROM PDDA..PhoneDirectory p
LEFT OUTER JOIN nxtsql..swsmsn sw
ON p.EmpNo = sw.EmpNo
JOIN NxtSQL..SMSN s
ON sw.slsrep = s.slsrep
WHERE sw.statustype = 1
ORDER BY
p.LastName

There are lots of ways to do this. One is to use a sub-select to get s.Name:
SELECT p.LastName, p.FirstName, p.EmpNo, (
SELECT TOP 1 s.Name
FROM NxtSQL..SMSN s
INNER JOIN nxtsql..swsmsn sw
ON sw.slsrep = s.slsrep
WHERE p.EmpNo = sw.EmpNo
AND sw.statustype = 1
) AS Name
FROM PDDA..PhoneDirectory p
ORDER By p.LastName
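The following SQLite sketch shows that the sub-select returns exactly one row per PhoneDirectory row. All three tables live in one in-memory database, so the `PDDA..` and `nxtsql..` database prefixes are dropped, and `LIMIT 1` replaces `TOP 1`. The `ORDER BY s.Name` inside the subquery is an added assumption that makes the choice of the first name deterministic. The sample rows are made up: employee 1 has two active reps, which is the kind of data that inflates the row count in a join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PhoneDirectory (LastName TEXT, FirstName TEXT, EmpNo INTEGER);
CREATE TABLE swsmsn (EmpNo INTEGER, slsrep TEXT, statustype INTEGER);
CREATE TABLE SMSN (slsrep TEXT, Name TEXT);

INSERT INTO PhoneDirectory VALUES ('Adams', 'Al', 1), ('Baker', 'Bea', 2), ('Cole', 'Cy', 3);
-- Employee 1 has two active reps; employee 2 only has an inactive one.
INSERT INTO swsmsn VALUES (1, 'R1', 1), (1, 'R2', 1), (2, 'R3', 0);
INSERT INTO SMSN VALUES ('R1', 'North'), ('R2', 'South'), ('R3', 'East');
""")

rows = con.execute("""
    SELECT p.LastName, p.FirstName, p.EmpNo, (
        SELECT s.Name
        FROM SMSN s
        INNER JOIN swsmsn sw ON sw.slsrep = s.slsrep
        WHERE p.EmpNo = sw.EmpNo
          AND sw.statustype = 1
        ORDER BY s.Name   -- deterministic choice when several names match
        LIMIT 1
    ) AS Name
    FROM PhoneDirectory p
    ORDER BY p.LastName
""").fetchall()

print(rows)
# [('Adams', 'Al', 1, 'North'), ('Baker', 'Bea', 2, None), ('Cole', 'Cy', 3, None)]
```

Employees without an active rep still appear, with a NULL Name, so the row count always equals the size of the first table.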

Insert primary keys from two newly created tables in a relationship table

I had one table with 4 columns (PersonName, CityName, CityState, CityCountry), which I have now split into two tables. One of them has personID and personName; the other has Cityid, CityName, CityState and CityCountry.
Now I have created a third table that holds PersonId and CityId. How can I populate it with the person and city IDs, given that the data is now split? I want to get the IDs from the newly created tables based on the relationship the rows had in the original table.

Can you not just join back to the original table?
INSERT PersonCity (PersonID, CityID)
SELECT p.PersonID, c.CityID
FROM OriginalTable o
INNER JOIN Person p
ON p.PersonName = o.Personname
INNER JOIN City c
ON c.CityName = o.CityName
AND c.CityState = o.CityState
AND c.CityCountry = o.CityCountry;
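Below is an end-to-end SQLite sketch of the join-back approach, using made-up rows. SQLite requires `INSERT INTO`, whereas SQL Server also accepts the bare `INSERT` used above. The sample data contains two different cities called Springfield, which shows why the join matches on all three city columns. The sketch assumes PersonName is unique in Person. If two people share a name, the join would pair each of them with the other's city too.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE OriginalTable (PersonName TEXT, CityName TEXT, CityState TEXT, CityCountry TEXT);
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, PersonName TEXT);
CREATE TABLE City (CityID INTEGER PRIMARY KEY, CityName TEXT, CityState TEXT, CityCountry TEXT);
CREATE TABLE PersonCity (PersonID INTEGER, CityID INTEGER);

INSERT INTO OriginalTable VALUES
    ('Ann', 'Springfield', 'IL', 'USA'),
    ('Bob', 'Springfield', 'MO', 'USA'),
    ('Cid', 'Springfield', 'IL', 'USA');

-- Populate the split tables from the original data (IDs assigned automatically).
INSERT INTO Person (PersonName) SELECT DISTINCT PersonName FROM OriginalTable;
INSERT INTO City (CityName, CityState, CityCountry)
    SELECT DISTINCT CityName, CityState, CityCountry FROM OriginalTable;

-- Join back to the original table to recover each person/city pair.
INSERT INTO PersonCity (PersonID, CityID)
SELECT p.PersonID, c.CityID
FROM OriginalTable o
INNER JOIN Person p ON p.PersonName = o.PersonName
INNER JOIN City c
    ON c.CityName = o.CityName
   AND c.CityState = o.CityState
   AND c.CityCountry = o.CityCountry;
""")

pairs = con.execute("""
    SELECT p.PersonName, c.CityState
    FROM PersonCity pc
    JOIN Person p ON p.PersonID = pc.PersonID
    JOIN City c ON c.CityID = pc.CityID
    ORDER BY p.PersonName
""").fetchall()

print(pairs)  # [('Ann', 'IL'), ('Bob', 'MO'), ('Cid', 'IL')]
```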

We Keep Coding
