Deleting duplicates in a table based on a criteria only in SQL - sql

Let's say I have a table with columns:
CustomerNumber
Lastname
Firstname
PurchaseDate
...and other columns that do not change anything in the question if they're not shown here.
In this table I could have many rows for the same customer with different purchase dates (I know, poorly designed... I'm only trying to fix an issue for reporting, not really trying to fix the root of the problem).
How, in SQL, can I keep one record per customer with the latest date, and delete the rest? A group by doesn't seem to be working for my case

;with a as
(
select row_number() over (partition by CustomerNumber, Lastname, Firstname order by PurchaseDate desc) rn
from <table>
)
delete from a where rn > 1

This worked for me (on DB2):
DELETE FROM my_table
WHERE (CustomerNumber, Lastname, Firstname, PurchaseDate)
NOT IN (
SELECT CustomerNumber, Lastname, Firstname, MAX(PurchaseDate)
FROM my_table
GROUP BY CustomerNumber, Lastname, FirstName
)

SELECT CustomerNumber, Lastname, Firstname, MAX(PurchaseDate) LatestPurchaseDate
FROM Table
GROUP BY CustomerNumber, Lastname, Firstname
The MAX will select the highest (latest) date and show that date for each unique combination of the GROUP BY columns.
EDIT: I misunderstood that you wanted to delete records for all but the latest purchase date.
WITH Keep AS
(
SELECT CustomerNumber, Lastname, Firstname, MAX(PurchaseDate) LatestPurchaseDate
FROM Table
GROUP BY CustomerNumber, Lastname, Firstname
)
DELETE FROM Table
WHERE NOT EXISTS
(
SELECT *
FROM Keep
WHERE Table.CustomerNumber = Keep.CustomerNumber
AND Table.Lastname = Keep.Lastname
AND Table.Firstname = Keep.Firstname
AND Table.PurchaseDate = Keep.LatestPurchaseDate
)
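
To try the MAX-based delete on sample data, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server or DB2. The table name purchases and the sample rows are made up for illustration, and the delete is written as a correlated subquery keyed on CustomerNumber only, which assumes the customer number alone identifies a customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE purchases (
        CustomerNumber INTEGER,
        Lastname TEXT,
        Firstname TEXT,
        PurchaseDate TEXT  -- ISO dates, so text order equals date order
    )
""")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        (1, "Smith", "Ann", "2020-01-05"),
        (1, "Smith", "Ann", "2021-03-10"),
        (2, "Jones", "Bob", "2019-07-01"),
        (2, "Jones", "Bob", "2019-07-01"),  # same date as the customer's latest row
        (3, "Brown", "Cy", "2022-02-02"),
    ],
)

# Delete every row that is older than that customer's latest purchase.
conn.execute("""
    DELETE FROM purchases
    WHERE PurchaseDate < (
        SELECT MAX(p2.PurchaseDate)
        FROM purchases AS p2
        WHERE p2.CustomerNumber = purchases.CustomerNumber
    )
""")

remaining = conn.execute(
    "SELECT CustomerNumber, PurchaseDate FROM purchases "
    "ORDER BY CustomerNumber, PurchaseDate"
).fetchall()
print(remaining)
```

Customer 2 still has two rows afterwards: any approach that compares against MAX(PurchaseDate) keeps every row that shares the latest date, whereas the ROW_NUMBER() version keeps exactly one row per partition.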

Related

SQL Query to Obtain the Oldest People

I am trying to find the oldest customers in my database. I want just their full names and their ages, but my current results are outputting all customers and their ages (not just the oldest). What am I doing wrong here?
SELECT
LTRIM(CONCAT(' ' + Prefix, ' ' + FirstName,
' ' + MiddleName, ' ' + LastName, ', ' + Suffix)),
MAX(DATEDIFF(year, BirthDate, GETDATE()))
FROM
Customers
WHERE
BirthDate is not null
GROUP BY
Prefix, FirstName, MiddleName, LastName, Suffix
ORDER BY
MAX(DATEDIFF(year, BirthDate, GETDATE())) desc
Note that there seem to be multiple customers with the same oldest age.
You have not defined what you mean by "oldest customers", so here are a few options you could try.
To see a list of customers with the oldest on top, use a simple query like this (the oldest customers have the earliest birth dates, so sort ascending):
SELECT FirstName, LastName, Suffix, BirthDate
FROM Customers
WHERE BirthDate is not null
ORDER BY BirthDate asc
To restrict the result to a number of rows, for example the 10 oldest, use top 10:
SELECT top 10
FirstName, LastName, Suffix, BirthDate
FROM Customers
WHERE BirthDate is not null
ORDER BY BirthDate asc
To restrict the result to all customers born before a certain date, add a condition to the where clause:
SELECT FirstName, LastName, Suffix, BirthDate
FROM Customers
WHERE BirthDate is not null
and BirthDate < '19920101'
ORDER BY BirthDate asc
The first thing you need to do before you do anything else is define a unique numeric primary key on the Customers table.
ALTER TABLE Customers ADD Cust_Id int IDENTITY(1,1);
ALTER TABLE Customers ADD CONSTRAINT PK_Customers PRIMARY KEY (Cust_Id);
After you've done that, the following code will give you the "oldest customer (or customers) in your database".
With qry1 As (
SELECT Cust_Id,
DATEDIFF(year, BirthDate, GETDATE()) As Age
FROM Customers
WHERE BirthDate is not null
),
qry2 As (
SELECT Max(Age) As Max_Age
FROM qry1
)
SELECT Customers.Cust_Id,
Customers.Prefix,
Customers.FirstName,
Customers.MiddleName,
Customers.LastName,
Customers.Suffix,
Qry1.Age
FROM Customers
Inner Join Qry1 On Customers.Cust_Id = Qry1.Cust_Id
Inner Join Qry2 On Qry1.Age = Qry2.Max_Age
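
The two-CTE query above can be tried on sample data with a rough SQLite translation, run here through Python's sqlite3 module. This is only a sketch: the rows are invented, SQLite has no DATEDIFF, so the age is approximated by subtracting calendar years, and the reference date is passed in as a hypothetical :today parameter to keep the result repeatable. Like DATEDIFF(year, ...), the year subtraction counts year boundaries crossed rather than completed years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customers (
        Cust_Id INTEGER PRIMARY KEY,
        FirstName TEXT,
        LastName TEXT,
        BirthDate TEXT
    )
""")
conn.executemany(
    "INSERT INTO Customers (FirstName, LastName, BirthDate) VALUES (?, ?, ?)",
    [
        ("Ann", "Lee", "1930-03-01"),
        ("Bob", "Ray", "1930-11-20"),
        ("Cy", "Dow", "1985-06-15"),
        ("Di", "Fox", None),  # unknown birth date, excluded by the filter
    ],
)

# qry1 computes an age per customer; the outer query keeps only the
# customers whose age equals the maximum age, so ties are all returned.
oldest = conn.execute(
    """
    WITH qry1 AS (
        SELECT Cust_Id,
               CAST(strftime('%Y', :today) AS INTEGER)
             - CAST(strftime('%Y', BirthDate) AS INTEGER) AS Age
        FROM Customers
        WHERE BirthDate IS NOT NULL
    )
    SELECT c.FirstName, c.LastName, q.Age
    FROM Customers AS c
    JOIN qry1 AS q ON q.Cust_Id = c.Cust_Id
    WHERE q.Age = (SELECT MAX(Age) FROM qry1)
    ORDER BY c.Cust_Id
    """,
    {"today": "2024-01-01"},
).fetchall()
print(oldest)
```

Both customers born in 1930 come back, which matches the asker's note that several customers share the oldest age.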

SQL Select column which is not used in select section of subquery which find duplicates

I am trying to find records in my database that have duplicated fields like name, surname and type.
Example:
SELECT name, surname, type, COUNT(*)
FROM customers
GROUP BY name, surname
HAVING COUNT(*)>1
Query results:
Robb|Stark|1|2
Tyrion|Lannister|1|3
So the customer "Robb Stark" appears 2 times and "Tyrion Lannister" appears 3 times.
Now, I want to know the id of these records.
I found a similar problem described here:
Finding duplicate values in a SQL table
There is an answer but no example.
Use COUNT as an analytic function:
WITH cte AS (
SELECT *, COUNT(*) OVER (PARTITION BY name, surname) cnt
FROM customers
)
SELECT * -- return all columns
FROM cte
WHERE cnt > 1
ORDER BY name, surname;
The simplest way will be to use the EXISTS as follows:
SELECT t.*
FROM customers t
where exists
(select 1 from customers tt
where tt.name = t.name
and tt.surname = t.surname
and tt.id <> t.id)
Or use your original query in IN clause as follows:
select * from customers where (name, surname) in
(SELECT name, surname
FROM customers
GROUP BY name, surname
HAVING COUNT(*)>1)
If you want one row per group of duplicates, with the list of ids in a comma-separated string, you can just use string aggregation with your existing query:
SELECT name, surname, COUNT(*) as cnt,
STRING_AGG(id, ',') WITHIN GROUP (ORDER BY id) as all_ids
FROM customers
GROUP BY name, surname
HAVING COUNT(*) > 1
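
As a quick check of the EXISTS approach, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows and the integer id column are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, surname TEXT)"
)
conn.executemany(
    "INSERT INTO customers (id, name, surname) VALUES (?, ?, ?)",
    [
        (1, "Robb", "Stark"),
        (2, "Robb", "Stark"),
        (3, "Tyrion", "Lannister"),
        (4, "Tyrion", "Lannister"),
        (5, "Tyrion", "Lannister"),
        (6, "Jon", "Snow"),  # unique, should not be returned
    ],
)

# A row is a duplicate if another row with a different id
# has the same name and surname.
dupes = conn.execute("""
    SELECT t.id, t.name, t.surname
    FROM customers AS t
    WHERE EXISTS (
        SELECT 1
        FROM customers AS tt
        WHERE tt.name = t.name
          AND tt.surname = t.surname
          AND tt.id <> t.id
    )
    ORDER BY t.name, t.surname, t.id
""").fetchall()
print(dupes)
```

Every duplicated row comes back with its own id, so the five matching rows can be inspected individually instead of being collapsed into one row per group.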

No column was specified for column 1 of 'T1' when using a sub-select with a group by

I have a working query:
SELECT
COUNT(*), ACCOUNT_ID
FROM
CDS_PLAYER
GROUP BY
ACCOUNT_ID
HAVING
COUNT(*) > 1
Output
No column name Account_ID
----------------------------
'2' '12345'
I'm trying to add names to these accounts (all from the same table) but with no luck. The only query that gets me close is:
SELECT
LASTNAME, FIRSTNAME, COUNT(ACCOUNT_ID) AS NUMBER
FROM
(SELECT
COUNT(*), ACCOUNT_ID
FROM
CDS_PLAYER
GROUP BY
ACCOUNT_ID
HAVING
COUNT(*) > 1) AS T1
GROUP BY
LASTNAME, FIRSTNAME, PLAYER_ID
But I get an error:
No column was specified for column 1 of 'T1'
Like I said VERY NEW AT THIS. My boss of 4 months wanted me to learn and so I'm self taught (books and google). Any help at all to get me where I need to be would be appreciated!
(I'm using Windows Server 2003 and SQL Server 2000)
The error message can be resolved as below
SELECT LASTNAME, FIRSTNAME, COUNT(ACCOUNT_ID) AS NUMBER
FROM
(SELECT COUNT(*) AS Total, ACCOUNT_ID FROM CDS_PLAYER GROUP BY ACCOUNT_ID HAVING
COUNT(*) > 1) AS T1
GROUP BY LASTNAME, FIRSTNAME, PLAYER_ID
Add AS Total after the COUNT(*) so that every column of the derived table T1 has a name.
Does this do what you want?
SELECT COUNT(*), ACCOUNT_ID, LASTNAME, FIRSTNAME, PLAYER_ID
FROM CDS_PLAYER
GROUP BY ACCOUNT_ID, LASTNAME, FIRSTNAME, PLAYER_ID
HAVING COUNT(*) > 1;
You should also update your version of SQL Server. It is like 15 years out of date and hasn't been supported in many years. You can download a free version of SQL Server Express from Microsoft.
You want to select LASTNAME and FIRSTNAME, but you haven't selected them in your subselect. You can only access fields that are in the subselect's result set.
Solution: add them to the subselect's column list and its GROUP BY clause.
For example:
SELECT
LASTNAME, FIRSTNAME, COUNT(ACCOUNT_ID) AS NUMBER
FROM
(SELECT COUNT(*) AS Total, LASTNAME, FIRSTNAME, ACCOUNT_ID
FROM CDS_PLAYER
GROUP BY ACCOUNT_ID, LASTNAME, FIRSTNAME
HAVING COUNT(*) > 1) AS T1
GROUP BY
LASTNAME, FIRSTNAME
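
If the goal is to list the names attached to each account that appears more than once, one possible shape is to join the aggregated subquery back to the table. The sketch below is an assumption about what the asker wants, run in SQLite through Python with made-up rows. SQLite does not raise the "No column was specified" error, but the AS Total alias is the part SQL Server requires for a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CDS_PLAYER (
        PLAYER_ID INTEGER PRIMARY KEY,
        ACCOUNT_ID TEXT,
        LASTNAME TEXT,
        FIRSTNAME TEXT
    )
""")
conn.executemany(
    "INSERT INTO CDS_PLAYER (ACCOUNT_ID, LASTNAME, FIRSTNAME) VALUES (?, ?, ?)",
    [
        ("12345", "Doe", "Jane"),
        ("12345", "Doe", "John"),
        ("67890", "Roe", "Rick"),  # account used once, filtered out
    ],
)

# T1 holds one row per account with more than one player; every column
# in it has a name. Joining back to CDS_PLAYER recovers the player names.
players = conn.execute("""
    SELECT p.ACCOUNT_ID, p.LASTNAME, p.FIRSTNAME, T1.Total
    FROM CDS_PLAYER AS p
    JOIN (
        SELECT ACCOUNT_ID, COUNT(*) AS Total
        FROM CDS_PLAYER
        GROUP BY ACCOUNT_ID
        HAVING COUNT(*) > 1
    ) AS T1 ON T1.ACCOUNT_ID = p.ACCOUNT_ID
    ORDER BY p.PLAYER_ID
""").fetchall()
print(players)
```

Keeping the names out of the inner GROUP BY matters here: grouping by ACCOUNT_ID, LASTNAME and FIRSTNAME together would only flag rows where the same person is listed twice on one account.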

MySQL, return only rows where there are duplicates among two columns

I have a table in MySQL of contact information: first name, last name, address, etc.
I would like to run a query on this table that will return only rows with first and last name combinations which appear in the table more than once.
I do not want to group the "duplicates" (which may only be duplicates of the first and last name, but not other information like address or birthdate) -
I want to return all the "duplicate" rows so I can look over the results and determine if they are dupes or not. This seemed like it would be a simple thing to do, but it has not been.
Every solution I can find either groups the dupes and gives me a count only (which is not useful for what I need to do with the results) or doesn't work at all.
Is this kind of logic even possible in a query ? Should I try and do this in Python or something?
You should be able to do this with the GROUP BY approach in a sub-query.
SELECT t.first_name, t.last_name, t.address
FROM your_table t
JOIN ( SELECT first_name, last_name
FROM your_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
) t2
ON ( t.first_name = t2.first_name AND t.last_name = t2.last_name )
The sub-query returns all names (first_name and last_name) that exist more than once, and the JOIN returns all records that match these names.
You could do it with a GROUP BY / HAVING and a sub-select. Something like:
SELECT t.*
FROM Table t INNER JOIN
(
SELECT FirstName, LastName
FROM Table
GROUP BY FirstName, LastName
HAVING COUNT(*) > 1
) Dups ON t.FirstName = Dups.FirstName
AND t.LastName = Dups.LastName
select * from people
join (select firstName, lastName
from people
group by firstName, lastName
having count(*) > 1
) dupe
using (firstName, lastName)

How we can use CTE in subquery in sql server?

How we can use a CTE in a subquery in SQL Server?
like:
SELECT id (I want to use CTE here), name FROM table_name
Just define your CTE on top and access it in the subquery?
WITH YourCTE(blubb) AS
(
SELECT 'Blubb'
)
SELECT id,
(SELECT blubb FROM YourCTE),
name
FROM table_name
It doesn't work:
select id (I want to use CTE here), name from table_name
It's not possible to define a CTE inside a subquery.
As a workaround, you can wrap the CTE in a view:
CREATE VIEW MyCTEView AS ..here comes your CTE-Statement.
Then you are able to do this:
select id, (select id from MyCTEView), name from table_name
Create a view from a CTE, or from multiple CTEs combined with UNION:
CREATE VIEW [dbo].[_vEmployees]
AS
WITH
TEST_CTE(EmployeeID, FirstName, LastName, City, Country)
AS (
SELECT EmployeeID, FirstName, LastName, City, Country FROM Employees WHERE EmployeeID = 4
),
TEST_CTE2
AS (
SELECT EmployeeID, FirstName, LastName, City, Country FROM Employees WHERE EmployeeID = 7
)
SELECT EmployeeID, FirstName, LastName, City, Country FROM TEST_CTE UNION SELECT * FROM TEST_CTE2
GO
Now use it in a subquery:
SELECT * FROM Employees WHERE EmployeeID IN (SELECT EmployeeID FROM _vEmployees)
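
For comparison, the first answer's approach (define the CTE at the top of the statement, then reference it from the subquery) can be tried without creating a view. The sketch below uses SQLite through Python, with an invented CTE name picked and sample rows. As far as I know the same statement shape is accepted by SQL Server, where the WITH clause has to open the statement but the CTE name can then be used inside subqueries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT)"
)
conn.executemany(
    "INSERT INTO Employees (EmployeeID, FirstName) VALUES (?, ?)",
    [(4, "Margaret"), (5, "Steven"), (7, "Robert")],
)

# The CTE is declared once at the top and referenced inside the IN subquery.
matched = conn.execute("""
    WITH picked AS (
        SELECT EmployeeID FROM Employees WHERE EmployeeID IN (4, 7)
    )
    SELECT EmployeeID, FirstName
    FROM Employees
    WHERE EmployeeID IN (SELECT EmployeeID FROM picked)
    ORDER BY EmployeeID
""").fetchall()
print(matched)
```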