TSQL Delete Duplicates in table after comparing results found in duplicate search - sql

I have duplicate data in a single table.
Table Layout
accountNumber | firstName | lastName | address | zip
SMI2365894511 | Paul | Smith | 1245 Rd | 89120
SMI2365894511 | Paul | Smith | |
I have the below query to find and display the duplicates.
select *
from tableA a
join (select accountNumber
from tableA
group by accountNumber
having count(*) > 1 ) b
on a.accountNumber = b.accountNumber
What I would like to do is compare the results of the above query and remove the duplicate that doesn't have any address information. I'm using MS SQL Server 2014
EDIT** I have the query the way it is so that I can see both duplicate rows.

delete a
from XmaCustomerDetails a
join ( select accountNumber
from XmaCustomerDetails
group by accountNumber
having count(*) > 1 ) b
on a.accountNumber = b.accountNumber
-- only remove the duplicate that carries no address (NULL or empty string)
where a.address is null or a.address = ''
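One way to sanity-check this kind of delete before running it on real data is to replay it on the sample rows in an in-memory SQLite database from Python. This is only a sketch under assumptions: SQLite does not accept the T-SQL `DELETE ... FROM ... JOIN` form, so the same condition is written with an `IN` subquery; the blank address is assumed to be either NULL or an empty string; and the `JON0000000001` row is a made-up non-duplicate added to show that it is left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA (accountNumber TEXT, firstName TEXT, "
    "lastName TEXT, address TEXT, zip TEXT)"
)
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?, ?, ?)",
    [
        ("SMI2365894511", "Paul", "Smith", "1245 Rd", "89120"),
        ("SMI2365894511", "Paul", "Smith", None, None),
        # Hypothetical row: no address, but not a duplicate, so it must survive.
        ("JON0000000001", "Ann", "Jones", None, None),
    ],
)

# Delete only rows that (a) belong to a duplicated accountNumber
# and (b) have no address information.
conn.execute(
    """
    DELETE FROM tableA
    WHERE accountNumber IN (SELECT accountNumber
                            FROM tableA
                            GROUP BY accountNumber
                            HAVING COUNT(*) > 1)
      AND (address IS NULL OR address = '')
    """
)

remaining = conn.execute(
    "SELECT accountNumber, address FROM tableA ORDER BY accountNumber"
).fetchall()
print(remaining)
```

Note that if every copy of a duplicated account lacks an address, this condition removes all of them, so it is worth running the equivalent SELECT first.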

Related

Why no similar ids in the results set when query with a correlated query inside where clause

I have a table with columns id, forename, surname, created (date).
I have a table such as the following:
ID | Forename | Surname | Created
---------------------------------
1 | Tom | Smith | 2008-01-01
1 | Tom | Windsor | 2008-02-01
2 | Anne | Thorn | 2008-01-05
2 | Anne | Baker | 2008-03-01
3 | Bill | Sykes | 2008-01-20
Basically, I want this to return the most recent name for each ID, so it would return:
ID | Forename | Surname | Created
---------------------------------
1 | Tom | Windsor | 2008-02-01
2 | Anne | Baker | 2008-03-01
3 | Bill | Sykes | 2008-01-20
I get the desired result with this query.
SELECT id, forename, surname, created
FROM name n
WHERE created = (SELECT MAX(created)
FROM name
GROUP BY id
HAVING id = n.id);
I am getting the result I want but I fail to understand WHY THE IDS ARE NOT BEING REPEATED in the result set. What I understand about a correlated subquery is that it takes one row from the outer query table and runs the inner subquery for it. Shouldn't it repeat "id" when ids repeat in the outer query? Can someone explain to me what exactly is happening behind the scenes?
First, your subquery does not need a GROUP BY. It is more commonly written as:
SELECT n.id, n.forename, n.surname, n.created
FROM name n
WHERE n.created = (SELECT MAX(n2.created)
FROM name n2
WHERE n2.id = n.id
);
You should get in the habit of qualifying all column references, especially when your query has multiple table references.
I think you are asking why this works. Well, each row in the outer query is tested for the condition. The condition is: "is my created the same as the maximum created for all rows in the name table with the same id". In your data, only one row per id matches that condition, so ids are not repeated.
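The row-by-row test described above can be made visible with a small sketch. It assumes an in-memory SQLite database driven from Python (the question does not name a DBMS) and stores `created` as ISO date text, which compares correctly as a string. The loop repeats by hand what the correlated subquery does for each outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE name (id INTEGER, forename TEXT, surname TEXT, created TEXT)"
)
rows = [
    (1, "Tom", "Smith", "2008-01-01"),
    (1, "Tom", "Windsor", "2008-02-01"),
    (2, "Anne", "Thorn", "2008-01-05"),
    (2, "Anne", "Baker", "2008-03-01"),
    (3, "Bill", "Sykes", "2008-01-20"),
]
conn.executemany("INSERT INTO name VALUES (?, ?, ?, ?)", rows)

# The correlated query from the answer.
latest = conn.execute(
    """
    SELECT n.id, n.forename, n.surname, n.created
    FROM name n
    WHERE n.created = (SELECT MAX(n2.created)
                       FROM name n2
                       WHERE n2.id = n.id)
    ORDER BY n.id
    """
).fetchall()

# The same test done by hand: every outer row is visited (ids do repeat here),
# but only the row whose created equals the maximum for its id passes.
kept = []
for row in rows:
    row_id, _, surname, created = row
    max_created = conn.execute(
        "SELECT MAX(created) FROM name WHERE id = ?", (row_id,)
    ).fetchone()[0]
    passes = created == max_created
    print(row_id, surname, created, "max for id:", max_created, "kept:", passes)
    if passes:
        kept.append(row)

print(latest)
```

Both ids 1 and 2 are tested twice, once per row, but each time only one of the two rows satisfies the condition, which is why each id shows up once in the result.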
You can also consider joining the table to a derived table holding the max(created) value per id, matching on both id and created:
SELECT n.id, n.forename, n.surname, n.created
FROM name n
JOIN ( SELECT id, MAX(created) as created FROM name GROUP BY id ) t
ON n.id = t.id AND n.created = t.created;
or using the IN operator with a row value (supported by PostgreSQL, MySQL and Oracle, but not by SQL Server):
SELECT id, forename, surname, created
FROM name n
WHERE ( id, created ) IN (SELECT id, MAX(created)
FROM name
GROUP BY id );
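or using EXISTS with a HAVING clause in the subquery (the subquery must be correlated on id, otherwise a row could match the maximum of a different id):
SELECT id, forename, surname, created
FROM name n
WHERE EXISTS (SELECT n2.id
FROM name n2
WHERE n2.id = n.id
GROUP BY n2.id
HAVING MAX(n2.created) = n.created
);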

postgresql find duplicates in column with ID

For instance, I have a table say "Name" with duplicate records in it:
Id | Firstname
--------------------
1 | John
2 | John
3 | Marc
4 | Jammie
5 | John
6 | Marc
How can I fetch duplicate records and display them with their respective primary key ID?
Use Count()Over() window aggregate function
Select * from
(
select Id, Firstname, count(1)over(partition by Firstname) as Cnt
from yourtable
)a
Where Cnt > 1
SELECT t.*
FROM t
INNER JOIN
(SELECT firstname
FROM t
GROUP BY firstname
HAVING COUNT(*) > 1) sub
ON t.firstname = sub.firstname
A sub-query would do the trick. Select the first names that are found more than once in your table, t. Then join these names back to the main table to pull in the primary key.
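The sub-query approach can be tried on the sample data with a short sketch. It assumes SQLite through Python's `sqlite3` module rather than PostgreSQL (the GROUP BY / HAVING / JOIN syntax used here is the same in both), and it reuses the table name `t` from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Id INTEGER PRIMARY KEY, Firstname TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "John"), (2, "John"), (3, "Marc"), (4, "Jammie"), (5, "John"), (6, "Marc")],
)

# Inner query: first names that occur more than once.
# Outer join: pull back every row (with its Id) carrying one of those names.
duplicates = conn.execute(
    """
    SELECT t.Id, t.Firstname
    FROM t
    INNER JOIN (SELECT Firstname
                FROM t
                GROUP BY Firstname
                HAVING COUNT(*) > 1) sub
            ON t.Firstname = sub.Firstname
    ORDER BY t.Id
    """
).fetchall()
print(duplicates)
```

Jammie (Id 4) appears only once, so it is the only row filtered out.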

Count how many times a value appears in tables SQL

Here's the situation:
So, in my database, a person is "responsible" for job X and "linked" to job Y. What I want is a query that returns: the name of the person, his ID and the number of jobs he is linked to or responsible for. So far I got this:
select id_job, count(id_job) number_jobs
from
(
select responsible.id
from responsible
union all
select linked.id
from linked
GROUP BY id
) id_job
GROUP BY id_job
And it returns a table with the id in the first column and the number of occurrences in the second. Now, what I can't do is associate the name of the person with the table. When I put that in the "select" at the beginning, it gives me all the possible combinations... How can I solve this? Thanks in advance!
Example data and desirable output:
| Person |
id | name
1 | John
2 | Francis
3 | Chuck
4 | Anthony
| Responsible |
process_no | id
100 | 2
200 | 2
300 | 1
400 | 4
| Linked |
process_no | id
101 | 4
201 | 1
301 | 1
401 | 2
OUTPUT:
| OUTPUT |
id | name | number_jobs
1 | John | 3
2 | Francis | 3
3 | Chuck | 0
4 | Anthony | 2
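Try this way (a left join keeps people with no jobs, such as Chuck, and counting a.process_no gives them 0):
select prs.id, prs.name, count(a.process_no) as number_jobs from Person prs
left join(select process_no, id
from Responsible res
Union all
select process_no, id
from Linked lin ) a on a.id=prs.id
group by prs.id, prs.name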
I would recommend aggregating each of the tables by the person and then joining the results back to the person table:
select p.*, coalesce(r.cnt, 0) + coalesce(l.cnt, 0) as numjobs
from person p left join
(select id, count(*) as cnt
from responsible
group by id
) r
on r.id = p.id left join
(select id, count(*) as cnt
from linked
group by id
) l
on l.id = p.id;
select id, name, count(process_no) FROM (
select pr.id, pr.name, res.process_no from Person pr
LEFT JOIN Responsible res on pr.id = res.id
UNION
select pr.id, pr.name, lin.process_no from Person pr
LEFT JOIN Linked lin on pr.id = lin.id) src
group by id, name
order by id
Query ain't tested, give it a shot, but this is the way you want to go
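To check the expected output against the sample data, here is a minimal sketch of the "aggregate each table first, then join" approach. It assumes an in-memory SQLite database driven from Python, since the question does not name a DBMS; the column alias `number_jobs` is taken from the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE responsible (process_no INTEGER, id INTEGER);
    CREATE TABLE linked (process_no INTEGER, id INTEGER);

    INSERT INTO person VALUES (1, 'John'), (2, 'Francis'), (3, 'Chuck'), (4, 'Anthony');
    INSERT INTO responsible VALUES (100, 2), (200, 2), (300, 1), (400, 4);
    INSERT INTO linked VALUES (101, 4), (201, 1), (301, 1), (401, 2);
    """
)

# Each derived table has at most one row per person, so the two left joins
# cannot multiply rows; COALESCE turns "no match" into a count of 0.
result = conn.execute(
    """
    SELECT p.id, p.name,
           COALESCE(r.cnt, 0) + COALESCE(l.cnt, 0) AS number_jobs
    FROM person p
    LEFT JOIN (SELECT id, COUNT(*) AS cnt FROM responsible GROUP BY id) r
           ON r.id = p.id
    LEFT JOIN (SELECT id, COUNT(*) AS cnt FROM linked GROUP BY id) l
           ON l.id = p.id
    ORDER BY p.id
    """
).fetchall()
print(result)
```

Joining person directly to both raw tables and counting afterwards would pair every responsible row with every linked row of the same person, which is the "all the possible combinations" effect described in the question.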

How to select all attributes (*) with distinct values in a particular column(s)?

Here is link to the w3school database for learners:
W3School Database
If we execute the following query:
SELECT DISTINCT city FROM Customers
it returns us a list of different City attributes from the table.
What should we do if we want all the columns, as returned by the SELECT * FROM Customers query, but with only one row for each unique value of the City attribute?
DISTINCT when used with multiple columns, is applied for all the columns together. So, the set of values of all columns is considered and not just one column.
If you want to have distinct values, then concatenate all the columns, which will make it distinct.
Or, you could group the rows using GROUP BY.
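You need to select all rows from the customers table where city is unique. Logically, I came up with the query below; note, however, that the IN condition is true for every row with a non-NULL city, so it returns the whole table rather than one row per city:
SELECT * FROM `customers` WHERE `city` in (SELECT DISTINCT `city` FROM `customers`)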
I think you want something like this:
(change PK field to your Customers Table primary key or index like Id)
In SQL Server (and standard SQL)
SELECT
*
FROM (
SELECT
*, ROW_NUMBER() OVER (PARTITION BY City ORDER BY PK) rn
FROM
Customers ) Dt
WHERE
(rn = 1)
In MySQL
SELECT
*
FROM (
SELECT
a.City, a.PK, count(*) as rn
FROM
Customers a
JOIN
Customers b ON a.City = b.City AND a.PK >= b.PK
GROUP BY a.City, a.PK ) As DT
WHERE (rn = 1)
This query, I hope, will return each city only once and also show the other columns.
You can use GROUP BY clause for getting distinct values in a particular column. Consider the following table - 'contact':
+---------+------+---------+
| id | name | city |
+---------+------+---------+
| 1 | ABC | Chennai |
+---------+------+---------+
| 2 | PQR | Chennai |
+---------+------+---------+
| 3 | XYZ | Mumbai |
+---------+------+---------+
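To select all columns with distinct values in the City attribute, use the following query. It relies on MySQL with the ONLY_FULL_GROUP_BY mode disabled (the mode is enabled by default since MySQL 5.7.5, and most other databases reject the query), and the values returned for the non-grouped columns are not guaranteed to come from any particular row: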
SELECT *
FROM contact
GROUP BY city;
This will give you the output as follows:
+---------+------+---------+
| id | name | city |
+---------+------+---------+
| 1 | ABC | Chennai |
+---------+------+---------+
| 3 | XYZ | Mumbai |
+---------+------+---------+
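For a result that does not depend on MySQL's relaxed GROUP BY handling, one portable alternative is to keep, for each city, the row with the lowest id. The sketch below is a swapped-in technique (a correlated MIN(id) subquery, not the GROUP BY query above) and assumes that `id` is unique; it is run against SQLite from Python purely so it can be executed as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO contact VALUES (?, ?, ?)",
    [(1, "ABC", "Chennai"), (2, "PQR", "Chennai"), (3, "XYZ", "Mumbai")],
)

# For every row, ask: is my id the smallest id among rows with the same city?
# Exactly one row per city passes, and all of its columns are returned.
one_per_city = conn.execute(
    """
    SELECT c.id, c.name, c.city
    FROM contact c
    WHERE c.id = (SELECT MIN(c2.id)
                  FROM contact c2
                  WHERE c2.city = c.city)
    ORDER BY c.id
    """
).fetchall()
print(one_per_city)
```

Choosing MIN(id) is arbitrary; any rule that picks a single row per city (for example MAX(id)) works the same way.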

SQL delete almost identical rows

I have a table that has 5 columns, and instead of updating, I inserted all the rows again (stupid mistake). How do I get rid of the duplicated records? They are identical except for the id. I can't remove all records, but I want to delete half of them.
ex. table:
+-----+-------+--------+-------+
| id | name | name2 | user |
+-----+-------+--------+-------+
| 1 | nameA | name2A | u1 |
| 12 | nameA | name2A | u1 |
| 2 | nameB | name2B | u2 |
| 192 | nameB | name2B | u2 |
+-----+-------+--------+-------+
How to do this?
I'm using Microsoft Sql Server.
Try the following.
DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY Name, Name2, [User])
That is untested so may need adapting.
This is a more specific query than @TechDo's, as I find duplicates where name, name2 and user are all identical, not only name.
with duplicates as
(
select t.id, ROW_NUMBER() over (partition by t.name, t.name2, t.[user] order by t.id) as RowNumber
from YourTable t
)
delete duplicates
where RowNumber > 1
Please try:
with c as
(
select
*, row_number() over(partition by name, name2, [user] order by id) as n
from YourTable
)
delete from c
where n > 1;
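The effect of these deletes can be rehearsed on the sample rows before touching the real table. The sketch below is an assumption-laden stand-in: it uses SQLite from Python instead of SQL Server, and because deleting through a CTE is a T-SQL feature, it expresses the same "keep the lowest id in each group" rule with `NOT IN (SELECT MIN(id) ...)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE YourTable (id INTEGER PRIMARY KEY, name TEXT, name2 TEXT, "user" TEXT)'
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        (1, "nameA", "name2A", "u1"),
        (12, "nameA", "name2A", "u1"),
        (2, "nameB", "name2B", "u2"),
        (192, "nameB", "name2B", "u2"),
    ],
)

# Keep the row with the smallest id in each (name, name2, user) group
# and delete every other row of that group.
cursor = conn.execute(
    """
    DELETE FROM YourTable
    WHERE id NOT IN (SELECT MIN(id)
                     FROM YourTable
                     GROUP BY name, name2, "user")
    """
)
deleted = cursor.rowcount

survivors = conn.execute(
    'SELECT id, name, name2, "user" FROM YourTable ORDER BY id'
).fetchall()
print(deleted, survivors)
```

Wrapping the real delete in a transaction and checking the affected row count before committing gives the same safety net on SQL Server.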