Using SQL to keep only a single record where both the name field and the address field repeat in 5+ records - sql

I am trying to delete all but one record from a table where the name field and the address field repeat the same values in 5 or more records. So if there are 5 records where the name field and address field are the same for all 5, then I would like to delete 4 out of the 5. An example:
id name address
1 john 6440
2 john 6440
3 john 6440
4 john 6440
5 john 6440
I would only want to return 1 record from the 5 records above.
I'm still having problems with this.
1) I create a table called KeepThese and give it a primary key id.
2) I create a query called delete_1 and copy this into it:
INSERT INTO KeepThese
SELECT ID FROM
(
SELECT Min(ID) AS ID
FROM Print_Ready
GROUP BY names_1, addresses
HAVING COUNT(*) >=5
UNION ALL
SELECT ID FROM Print_Ready as P
INNER JOIN
(SELECT Names_1, addresses
FROM Print_ready
GROUP BY Names_1, addresses
HAVING COUNT(*) < 5) as ThoseLessThan5
ON ThoseLessThan5.Names_1 = P.Names_1
AND ThoseLessThan5.addresses = P.addresses
)
3) I create a query called delete_2 and copy this into it:
DELETE P.* FROM Print_Ready as P
LEFT JOIN KeepThese as K
ON K.ID = P.ID
WHERE K.ID IS NULL
4) Then I run delete_1. I get a message that says "circular reference caused by alias ID". So I change this piece:
FROM (SELECT Min(ID) AS ID
to say this:
FROM (SELECT Min(ID) AS ID2
Then I double-click it again and a popup appears saying "Enter Parameter Value" for ID. This indicates that it doesn't know what ID is. But Print_Ready is only a query, and while it has an ID, it is in reality the ID of another table that got filtered into this query.
Not sure what to do at this point.
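One possible fix (a sketch, untested against Access) is to rename the aggregate alias consistently in both UNION branches and in the outer SELECT, and to give the derived table its own alias, so the outer query no longer prompts for ID:
INSERT INTO KeepThese (ID)
SELECT T.ID2
FROM
(
    SELECT Min(P1.ID) AS ID2
    FROM Print_Ready AS P1
    GROUP BY P1.names_1, P1.addresses
    HAVING COUNT(*) >= 5
    UNION ALL
    SELECT P.ID AS ID2
    FROM Print_Ready AS P
    INNER JOIN
        (SELECT Names_1, addresses
         FROM Print_Ready
         GROUP BY Names_1, addresses
         HAVING COUNT(*) < 5) AS ThoseLessThan5
    ON ThoseLessThan5.Names_1 = P.Names_1
    AND ThoseLessThan5.addresses = P.addresses
) AS T;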

I'm not sure CREATE TABLE isolate_duplicates AS ... works for Access; besides, you would need to give a name to COUNT(*) in the new table.
This may work:
SELECT name, address
INTO isolate_duplicate
FROM print_ready
GROUP BY name, address
HAVING COUNT(*) > 4;

DELETE FROM print_ready
WHERE name + address IN
    (SELECT name + address
     FROM isolate_duplicate);

INSERT INTO print_ready (name, address)
SELECT name, address
FROM isolate_duplicate;

DROP TABLE isolate_duplicate;
Not tested.

Related

Set Duplicate Values to Null in PostgreSQL, retaining one of the values

I have a database like this:
id name email
0 Bill bill#fakeemail.com
1 John john#fakeemail.com
2 Susan susan#fakeemail.com
3 Susan J susan#fakeemail.com
I want to remove duplicate emails by setting the value to null, but retain at least 1 email on one of the rows (doesn't really matter which one).
So that the resulting database would look like this:
id name email
0 Bill bill#fakeemail.com
1 John john#fakeemail.com
2 Susan susan#fakeemail.com
3 Susan J
I was able to target the rows like this
SELECT COUNT(email) as count FROM users WHERE count > 1
But can't figure out how to set the value to null while still retaining at least 1.
Update the rows which have the same email but greater id:
update my_table t1
set email = null
where exists (
select from my_table t2
where t1.email = t2.email and t1.id > t2.id
);
Working example in rextester.
You can use a windowed partition to assign a row number to each email group, and then use that generated row number to modify all rows except for one. Something like this:
WITH annotated_persons AS (
    SELECT
        id,
        name,
        email,
        ROW_NUMBER() OVER (PARTITION BY email) AS i
    FROM persons
)
UPDATE persons
SET email = null
FROM annotated_persons
WHERE persons.id = annotated_persons.id AND annotated_persons.i <> 1
You may have to use another subquery in order to gather the IDs of persons whose row number != 1, and then change your update query to
WHERE id IN person_ids
It's been a while since I've used a window function.
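For reference, a sketch of that subquery variant, assuming PostgreSQL and the same persons table as above:
UPDATE persons
SET email = null
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY email) AS i
        FROM persons
    ) AS ranked
    WHERE i <> 1
);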

SQL - Removing Duplicates without 'hard' coding?

Here's my scenario.
I have a table with 3 columns I want to return within a stored procedure: email, name and id. id must equal 3 or 4, and each email must appear only once, as some users have multiple entries.
I have a Select statement as follows
SELECT
DISTINCT email,
name,
id
from table
where
id = 3
or id = 4
OK, fairly simple, but some users have entries with both 3 and 4, so they appear twice; if they appear twice I want to keep only the row with id 4. I'll give another example below as it's hard to explain.
Table -
Email Name Id
jimmy#domain.com jimmy 4
brian#domain.com brian 4
kevin#domain.com kevin 3
jimmy#domain.com jimmy 3
So in the above scenario I would want to ignore the jimmy row with the id of 3. Any way of doing this without hard coding?
Thanks
SELECT
email,
name,
max(id)
from table
where
id in( 3, 4 )
group by email, name
Is this what you want to achieve?
SELECT Email, Name, MAX(Id) FROM Table WHERE Id IN (3, 4) GROUP BY Email, Name;
Sometimes using HAVING COUNT(*) > 1 may be useful to find duplicated records.
SELECT Email FROM table GROUP BY Email HAVING COUNT(*) > 1
or
SELECT Email FROM table GROUP BY Email HAVING COUNT(*) > 1 AND MAX(Id) > 3
The solution provided before with SELECT MAX(Id) FROM table sounds good for this case.
This may be an alternative solution.
What RDBMS are you using? This will return only one "Jimmy", using RANK():
SELECT A.email, A.name, A.id
FROM SO_Table A
INNER JOIN (
    SELECT email, name, id,
           RANK() OVER (PARTITION BY name ORDER BY id DESC) AS COUNTER
    FROM SO_Table B
) X ON X.id = A.id AND X.name = A.name
WHERE X.COUNTER = 1
Returns:
email name id
------------------------------
jimmy#domain.com jimmy 4
brian#domain.com brian 4
kevin#domain.com kevin 3

SQL: Tree structure without parent key

Note: the data schema cannot be changed. I'm stuck with it.
Database: SQLite
I have a simple tree structure, without parent keys, that is only 1 level deep. I have simplified the data for clarity:
ID Content Title
1 Null Canada
2 25 Toronto
3 33 Vancouver
4 Null USA
5 45 New York
6 56 Dallas
The structure is ordinal as well, so all Canadian cities have IDs greater than Canada's ID of 1 and less than the USA's ID of 4.
Question: How do I select all a nation's Cities when I do not know how many there are?
My query assigns every city to its country, which may be more than you asked for, but:
http://sqlfiddle.com/#!5/94d63/3
SELECT *
FROM (
    SELECT
        place.Title AS country_name,
        place.ID AS id,
        (SELECT MIN(ID)
         FROM place AS next_place
         WHERE next_place.ID > place.ID
           AND next_place.Content IS NULL
        ) AS next_id
    FROM place
    WHERE place.Content IS NULL
) AS country
INNER JOIN place
    ON place.ID > country.id
   AND CASE WHEN country.next_id IS NOT NULL
            THEN place.ID < country.next_id
            ELSE 1 END
select * from tbl
where id > 1
and id < (select min(id) from tbl where content is null and id > 1)
EDIT
I just realized the above does not work if there are no countries with greater ID. This should fix it.
select * from tbl a
where a.id > 4
and a.id < (select coalesce(min(b.id), a.id + 1)
            from tbl b
            where b.content is null and b.id > a.id)
Edit 2 - Also made subquery fully correlated, so only have to change country id in one place.
There are a couple of things to consider here, mainly whether your data is going to change. If it isn't going to change, two solutions work; if it is, only one does.
If your data is organized as shown in your example, you can do a SELECT TOP 3, i.e.:
SELECT * FROM CITIES WHERE ID NOT IN (SELECT TOP 3 ID FROM CITIES)
You can also create another table where you specify which city belongs to which parent, and maintain the hierarchy yourself (sketched below).
I recommend the second option.
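A sketch of that second suggestion, purely as illustration since the question says the schema cannot change; the place_parent table and its contents are hypothetical, and the place table name follows the query above:
-- Hypothetical mapping table pairing each city with its country
CREATE TABLE place_parent (
    child_id  INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL
);

INSERT INTO place_parent (child_id, parent_id) VALUES
    (2, 1), (3, 1),   -- Toronto, Vancouver -> Canada
    (5, 4), (6, 4);   -- New York, Dallas -> USA

-- All cities belonging to Canada (ID 1)
SELECT p.*
FROM place AS p
INNER JOIN place_parent AS pp ON pp.child_id = p.ID
WHERE pp.parent_id = 1;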

write a query to identify discrepancy

I have a table with Student IDs and Student Names. There have been issues with assigning unique Student IDs to students, and hence I want to find the duplicates.
Here is the sample Table:
Student ID Student Name
1 Jack
1 John
1 Bill
2 Amanda
2 Molly
3 Ron
4 Matt
5 James
6 Kathy
6 Will
Here I want a third column "Duplicate_Count" to display the count of duplicate records.
For example, "Duplicate_Count" would display "3" for Student ID = 1, and so on. How can I do this?
Thanks in advance
Select StudentId, Count(*) as DupCount
From Table
Group By StudentId
Having Count(*) > 1
Order By Count(*) desc
Select
aa.StudentId, aa.StudentName, bb.DupCount
from
Table as aa
join
(
Select StudentId, Count(*) as DupCount from Table group by StudentId
) as bb
on aa.StudentId = bb.StudentId
The virtual table gives the count for each StudentId; this is joined back to the original table to add the count to each student record.
If you want to add a column to the table to hold DupCount, this query can be used in an UPDATE statement to populate that column, as sketched below.
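A sketch of that update, assuming SQL Server-style UPDATE ... FROM syntax, a hypothetical Students table, and a Duplicate_Count column already added to it:
UPDATE aa
SET aa.Duplicate_Count = bb.DupCount
FROM Students AS aa
INNER JOIN (
    SELECT StudentId, COUNT(*) AS DupCount
    FROM Students
    GROUP BY StudentId
) AS bb
    ON aa.StudentId = bb.StudentId;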
This should work:
update mytable
set duplicate_count = (select count(*) from mytable t where t.id = mytable.id)
UPDATE:
As mentioned by #HansUp, adding a new column with the duplicate count probably doesn't make sense, but that really depends on what the OP originally thought of using it for. I'm leaving the answer in case it is of help for someone else.

Sql COALESCE entire rows?

I just learned about COALESCE and I'm wondering if it's possible to COALESCE an entire row of data between two tables? If not, what's the best approach to the following ramblings?
For instance, I have these two tables and assuming that all columns match:
tbl_Employees
Id Name Email Etc
-----------------------------------
1 Sue ... ...
2 Rick ... ...
tbl_Customers
Id Name Email Etc
-----------------------------------
1 Bob ... ...
2 Dan ... ...
3 Mary ... ...
And a table with id's:
tbl_PeopleInCompany
Id CompanyId
-----------------
1 1
2 1
3 1
And I want to query the data in a way that gets rows from the first table for matching ids, but falls back to the second table if no match is found.
So the resulting query would look like:
Id Name Email Etc
-----------------------------------
1 Sue ... ...
2 Rick ... ...
3 Mary ... ...
Where Sue and Rick were taken from the first table, and Mary from the second.
SELECT Id, Name, Email, Etc FROM tbl_Employees
WHERE Id IN (SELECT Id FROM tbl_PeopleInCompany)
UNION ALL
SELECT Id, Name, Email, Etc FROM tbl_Customers
WHERE Id IN (SELECT Id FROM tbl_PeopleInCompany) AND
Id NOT IN (SELECT Id FROM tbl_Employees)
Depending on the number of rows, there are several different ways to write these queries (with JOIN and EXISTS), but try this first.
This query first selects all the people from tbl_Employees that have an Id value in your target list (the table tbl_PeopleInCompany). It then adds to the "bottom" of this bunch of rows the results of the second query. The second query gets all tbl_Customers rows with Ids in your target list, excluding any with Ids that appear in tbl_Employees.
The total list contains the people you want — all Ids from tbl_PeopleInCompany, with preference given to Employees and missing records pulled from Customers.
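For reference, a sketch of one of the EXISTS rewrites mentioned above (same result as the UNION ALL query; table names taken from the question):
SELECT E.Id, E.Name, E.Email, E.Etc
FROM tbl_Employees AS E
WHERE EXISTS (SELECT 1 FROM tbl_PeopleInCompany P WHERE P.Id = E.Id)
UNION ALL
SELECT C.Id, C.Name, C.Email, C.Etc
FROM tbl_Customers AS C
WHERE EXISTS (SELECT 1 FROM tbl_PeopleInCompany P WHERE P.Id = C.Id)
  AND NOT EXISTS (SELECT 1 FROM tbl_Employees E2 WHERE E2.Id = C.Id);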
You can also do this:
1) Outer Join the two tables on tbl_Employees.Id = tbl_Customers.Id. This will give you all the rows from tbl_Employees and leave the tbl_Customers columns null if there is no matching row.
2) Use CASE WHEN to select either the tbl_Employees column or tbl_Customers column, based on whether tbl_Customers.Id IS NULL, like this:
CASE WHEN tbl_Customers.Id IS NULL THEN tbl_Employees.Name ELSE tbl_Customers.Name END AS Name
(My syntax might not be perfect there, but the technique is sound).
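A sketch of that outer-join approach, assuming FULL OUTER JOIN is available and restricting the result to the ids listed in tbl_PeopleInCompany:
SELECT
    COALESCE(E.Id, C.Id) AS Id,
    CASE WHEN E.Id IS NULL THEN C.Name  ELSE E.Name  END AS Name,
    CASE WHEN E.Id IS NULL THEN C.Email ELSE E.Email END AS Email,
    CASE WHEN E.Id IS NULL THEN C.Etc   ELSE E.Etc   END AS Etc
FROM tbl_Employees AS E
FULL OUTER JOIN tbl_Customers AS C
    ON E.Id = C.Id
WHERE COALESCE(E.Id, C.Id) IN (SELECT Id FROM tbl_PeopleInCompany);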
This should be pretty performant. It uses a CTE to basically build a small table of Customers that have no matching Employee records, and then it simply UNIONs that result with the Employee records.
;WITH FilteredCustomers (Id, Name, Email, Etc)
AS
(
    SELECT C.Id, C.Name, C.Email, C.Etc
    FROM tbl_Customers C
    INNER JOIN tbl_PeopleInCompany PIC
        ON C.Id = PIC.Id
    LEFT JOIN tbl_Employees E
        ON C.Id = E.Id
    WHERE E.Id IS NULL
)
SELECT E.Id, E.Name, E.Email, E.Etc
FROM tbl_Employees E
INNER JOIN tbl_PeopleInCompany PIC
    ON E.Id = PIC.Id
UNION
SELECT Id, Name, Email, Etc
FROM FilteredCustomers
Using the IN Operator can be rather taxing on large queries as it might have to evaluate the subquery for each record being processed.
I don't think the COALESCE function can be used for what you're thinking. COALESCE is similar to ISNULL, except it allows you to pass in multiple columns, and will return the first non-null value:
SELECT Name, Class, Color, ProductNumber,
COALESCE(Class, Color, ProductNumber) AS FirstNotNull
FROM Production.Product
This article should explain its application:
http://msdn.microsoft.com/en-us/library/ms190349.aspx
It sounds like Larry Lustig's answer is more along the lines of what you need though.